Exploring neural networks for facial recognition

Issued by Saratoga
Johannesburg, 27 Jun 2019

Jason Elder, Senior Technical Consultant at Saratoga

As a technical consultant, Jason Elder has worked on various client projects, delivering functional solutions in a range of industries. A recent project at a large corporate organisation not only took him out of his comfort zone, but also set him on the path of working on solving the many challenges of facial recognition.

This journey began when Elder was asked to solve a computer vision problem using OpenCV, a library of programming functions for computer vision, and its cascading classifiers, in which the output of each classifier is passed on as additional information to the next classifier in the cascade. In hindsight, this technology stack was ill-suited to the needs of the project. Not knowing this at the time, he pushed the technology to its limits and achieved a detection rate of over 80% on average, as well as near real-time recognition.
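To illustrate the idea of a cascade, the following is a minimal Python sketch of the early-rejection form, in which an image window is passed through a chain of cheap checks and discarded as soon as one of them fails. It does not use OpenCV itself; the stage functions, their thresholds and the flat list of pixel values are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Sequence

# A stage is any cheap test that says whether a window may still contain
# the object. Real cascades use trained classifiers; these are stand-ins.
Stage = Callable[[Sequence[float]], bool]


def run_cascade(window: Sequence[float], stages: Sequence[Stage]) -> bool:
    """Return True only if every stage accepts the window.

    Evaluation stops at the first rejecting stage, so most non-matching
    windows are discarded after very little work.
    """
    for stage in stages:
        if not stage(window):
            return False
    return True


def bright_enough(window: Sequence[float]) -> bool:
    # Hypothetical first stage: reject windows that are almost black.
    return sum(window) / len(window) > 0.2


def has_contrast(window: Sequence[float]) -> bool:
    # Hypothetical second stage: reject windows that are nearly uniform.
    return max(window) - min(window) > 0.3


stages = [bright_enough, has_contrast]
print(run_cascade([0.1, 0.9, 0.5, 0.6], stages))
```

Ordering the cheapest checks first is what makes this structure fast, since later and more expensive stages only run on windows that survived the earlier ones.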

When the limits of these technologies ultimately became clear, a decision was taken to put the project on hold until an alternative solution could be found. After researching possible alternatives, Elder decided to switch to a neural network-based approach. In his own time he started studying object recognition using neural networks and began experimenting with the technology in order to understand how he could apply it to the original business case.

Neural networks

Neural networks are computing systems inspired by, but not necessarily identical to, animal brains. The benefit is that these networks can ‘learn’ to perform tasks such as image recognition by being shown positive and negative images. For example, a neural network can be trained to seek out ‘heads’ in an image: ‘head’ images are labelled manually and fed to the system together with negative images containing no heads, and the trained network is then used to identify ‘heads’ in other images.

The biggest challenge is that it takes hundreds of thousands of images, if not more, to try to cater for the wide range of hairstyles, clothing, caps, glasses, lighting and camera angles when training a neural network to recognise ‘head’ images.

Elder then came up with the idea of combining his neural network with a basic facial recognition system that would search for facial features in order to reduce the false positives. He theorised that with this approach he could more accurately identify false positives (negative images) and separate those from the positive images (images correctly identified as heads). This would, in theory, allow Elder to feed those images back into further training of the network, while ensuring that he still retained enough variance to prevent overfitting and/or network bias. The goal was to improve the overall accuracy of the model, as well as to tweak the network parameters to reduce the need to run on expensive hardware.

Following this, Elder implemented his own version of the approach described in Google’s FaceNet research paper. In a nutshell, the researchers proposed using a convolutional neural network that would, in its simplest form, return a 128-dimensional vector embedding for each face. These embeddings could then be used in facial recognition, clustering and verification. The Google FaceNet method proved to be accurate and handled facial occlusion very well, specifically in support of clustering and verification. This, in turn, proved Elder's theory, helped his network to more accurately detect heads and allowed him to optimise training.
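As an illustration of how such embeddings can support verification, the sketch below compares two embedding vectors by Euclidean distance after normalising them to unit length. It is a minimal example, not Elder's implementation: the function names and the threshold of 1.0 are assumptions, and in practice the threshold would be tuned on labelled pairs of faces.

```python
import math
from typing import Sequence


def l2_normalise(vector: Sequence[float]) -> list:
    """Scale a vector to unit length so distances are comparable."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


def embedding_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two unit-normalised embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_unit, b_unit = l2_normalise(a), l2_normalise(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a_unit, b_unit)))


def same_person(a: Sequence[float], b: Sequence[float],
                threshold: float = 1.0) -> bool:
    # The threshold is a hypothetical value chosen for this example.
    return embedding_distance(a, b) < threshold


# Toy 128-dimensional vectors standing in for real network outputs.
face_a = [1.0] * 128
face_b = [1.0] * 127 + [0.9]          # almost identical to face_a
face_c = [1.0] * 64 + [-1.0] * 64     # orthogonal to face_a
print(same_person(face_a, face_b), same_person(face_a, face_c))
```

Small distances suggest the same identity and large distances suggest different identities, which is also what makes the embeddings usable for clustering.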

Facial recognition

During the course of this journey, Elder realised that his client had a different business case to solve, one that could leverage some of what he had learned. The new business case was to create a proof of concept for a cost-effective solution that could accurately detect faces and run facial recognition in under five seconds. At the time, off-the-shelf systems did not quite meet the client’s requirements in terms of cost-effectiveness.

Challenges

Fortunately, as Elder had already looked into combining basic facial detection with his solution, this enabled him to leverage the work he had already done as a foundation for the new project.

The most challenging part of this journey was not the facial recognition but rather the ability to detect a face accurately and consistently. In the same way that his previous work required detecting heads, the new system needed to be able to find faces accurately prior to running recognition. Low-quality, badly lit and blurry images all make it harder to detect a face. These were significant challenges, as the system cannot run facial recognition before detecting actual faces. The recognition requirement itself was solved by extracting a facial embedding from the detected face and passing this on to the FaceNet implementation, where the embedding can be matched against a database.
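Matching an embedding against a database can be pictured as a nearest-neighbour search. The following is a minimal sketch under stated assumptions: the database is a plain dictionary mapping a profile name to one or more stored embeddings, the vectors are three-dimensional toys rather than real 128-dimensional outputs, and the `max_distance` cut-off is a hypothetical value. The storage and search method actually used in the project are not described here.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query: Sequence[float],
             database: Dict[str, List[Sequence[float]]],
             max_distance: float = 0.8) -> Tuple[Optional[str], float]:
    """Return the closest profile and its distance.

    Each profile may hold several sample embeddings. If even the best
    match is further away than max_distance, the face is reported as
    unknown (None) rather than forced onto the nearest profile.
    """
    best_name, best_distance = None, math.inf
    for name, embeddings in database.items():
        for stored in embeddings:
            d = euclidean(query, stored)
            if d < best_distance:
                best_name, best_distance = name, d
    if best_distance > max_distance:
        return None, best_distance
    return best_name, best_distance


# Hypothetical toy database: two profiles, three stored samples.
database = {
    "alice": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "bob": [[0.0, 1.0, 0.0]],
}
print(identify([0.95, 0.05, 0.0], database)[0])
print(identify([0.0, 0.0, 1.0], database)[0])
```

A linear scan like this is adequate for a database of around a thousand samples; much larger databases would usually call for an indexed search structure.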

To overcome this challenge, Elder's solution was to break down the detection process into various stages, with each stage passing through its own specialised neural network. Each of these stages looks for specific features, extracts those and feeds its results into the next stage. During the last stage, the final facial data is extracted as a 128-dimensional embedding and is run through his recognition method. This approach improved not only the accuracy but also the overall detection of frontal, partial and occluded faces.
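The staged structure can be sketched as a simple pipeline in which each stage receives the candidates that survived the previous one. In this minimal Python example the stages are ordinary functions standing in for the specialised neural networks; the stage names, the score and landmark fields, the thresholds and the all-zero placeholder embedding are assumptions made purely for illustration.

```python
from typing import Any, Callable, Dict, List

Candidate = Dict[str, Any]
Stage = Callable[[List[Candidate]], List[Candidate]]


def run_pipeline(candidates: List[Candidate],
                 stages: List[Stage]) -> List[Candidate]:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        candidates = stage(candidates)
        if not candidates:
            break  # nothing left to refine, skip the remaining stages
    return candidates


def propose_regions(candidates: List[Candidate]) -> List[Candidate]:
    # Stand-in for a first network: keep regions with a plausible score.
    return [c for c in candidates if c["score"] >= 0.5]


def check_landmarks(candidates: List[Candidate]) -> List[Candidate]:
    # Stand-in for a second network: require enough facial landmarks.
    return [c for c in candidates if c["landmarks"] >= 3]


def embed(candidates: List[Candidate]) -> List[Candidate]:
    # Stand-in for the final network: attach a placeholder
    # 128-dimensional embedding (all zeros, not a real embedding).
    return [dict(c, embedding=[0.0] * 128) for c in candidates]


stages = [propose_regions, check_landmarks, embed]
regions = [
    {"box": (10, 10, 60, 60), "score": 0.9, "landmarks": 5},
    {"box": (80, 20, 40, 40), "score": 0.3, "landmarks": 5},
    {"box": (5, 90, 30, 30), "score": 0.8, "landmarks": 1},
]
faces = run_pipeline(regions, stages)
print(len(faces), len(faces[0]["embedding"]))
```

Keeping each stage narrow means it can be replaced or retrained on its own, and weak candidates are filtered out before the more expensive embedding step runs.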

The approach taken by Elder achieved great results on a relatively modest Nvidia GT1030 graphics card. The system managed to average an impressive 1.4 seconds to accurately match an unknown face in an image to a sample database of approximately 1 000 faces and 487 unique profiles.

As many organisations continue to explore the possibilities of new technologies, these kinds of projects will become increasingly prevalent. Elder found the experience of creating this proof of concept a rewarding one. It has sparked what he calls "a lifetime of learning on this journey into the world of AI".

Jason Elder is a Senior Technical Consultant at Saratoga. Connect with us at www.saratoga.co.za for more insights from our technology experts and thought leaders.
