Face and Face Landmarks Detection Using the Vision Framework in iOS 11
Some time ago, I made a tutorial on how to use face detection in AVFoundation. This time, I would like to show you how to do this using Apple’s new Vision framework, presented at WWDC 2017.
You may be thinking “why should I use something new if the old stuff works quite nicely?”. Well, now we are able to do much more: we can detect a face’s landmarks! Let’s do it!
How do we start?
First of all, we need to set up camera support by configuring an AVCaptureSession and a previewLayer, and by implementing the captureOutput delegate method.
Let me skip the beginning parts and focus on the captureOutput method. The project is available on our GitHub account, and a link to it is at the bottom of this page.
In this method, we need to convert the sampleBuffer into a CIImage object. What is important here is that we need to provide the right orientation, because face detection is very sensitive to it, and a rotated image may produce no results.
let ciImage = CIImage(cvImageBuffer: pixelBuffer!, options: attachments as! [String : Any]?)

//leftMirrored for front camera
let ciImageWithOrientation = ciImage.applyingOrientation(Int32(UIImageOrientation.leftMirrored.rawValue))
For the front camera, we have to use the leftMirrored orientation.
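For context, here is a minimal sketch of how the surrounding delegate method might look. It assumes this code lives in the class acting as the AVCaptureVideoDataOutputSampleBufferDelegate, and that detectFace(on:) is a hypothetical helper (sketched in the next section), not necessarily the original project’s exact code.

import AVFoundation
import UIKit
import Vision

func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    // Grab the pixel buffer and its attachments from the incoming sample buffer.
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault,
                                                    sampleBuffer,
                                                    kCMAttachmentMode_ShouldPropagate)

    let ciImage = CIImage(cvImageBuffer: pixelBuffer,
                          options: attachments as? [String : Any])

    // leftMirrored for the front camera
    let ciImageWithOrientation = ciImage.applyingOrientation(
        Int32(UIImageOrientation.leftMirrored.rawValue))

    // Hand the correctly oriented image over to the Vision requests.
    detectFace(on: ciImageWithOrientation)
}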
Face detection
To be able to detect specific landmarks of a face, we first need to detect the whole face. Using the Vision framework for this is really easy. We need to create two objects: one for the face rectangle request and one as a handler of that request.
let faceDetection = VNDetectFaceRectanglesRequest()
let faceDetectionRequest = VNSequenceRequestHandler()
How can we detect the face? Simply call perform on the request handler, passing the face detection request, and check the results.
try? faceDetectionRequest.perform([faceDetection], on: image)
let results = faceDetection.results as? [VNFaceObservation]
The result of this perform call is an array of VNFaceObservation objects. Each observation adds a single property of its own, landmarks, of VNFaceLandmarks2D type, on top of the boundingBox it inherits from VNDetectedObjectObservation, which we will use later.
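As a rough sketch of how this ties back to the captureOutput code above, the hypothetical detectFace(on:) helper could run the request and forward the observations. The names are placeholders, and detectLandmarks(on:for:) is another hypothetical helper for the next section.

func detectFace(on image: CIImage) {
    try? faceDetectionRequest.perform([faceDetection], on: image)
    guard let results = faceDetection.results as? [VNFaceObservation],
          !results.isEmpty else { return }

    // Each observation carries a normalized boundingBox; the landmark request
    // in the next section fills in the landmarks property.
    detectLandmarks(on: image, for: results)
}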
Landmarks detection
Once we have our face detected, we are able to start looking for some landmarks. The full list is quite long: we can detect landmarks like the face contour, eyes, eyebrows, nose, inner and outer lips, and a few more.
If we want to detect any of these, we need to create a new request and a new request handler, this time focused on landmarks detection.
let faceLandmarks = VNDetectFaceLandmarksRequest()
let faceLandmarksDetectionRequest = VNSequenceRequestHandler()
Another really important thing here is setting the inputFaceObservations property on faceLandmarks to the face observations found in the previous step. Only by setting this are we able to detect anything more than the whole face.
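A minimal sketch, assuming faceObservations is the [VNFaceObservation] array returned by the face rectangle request above:

// Constrain the landmark request to the faces we already found with the rectangle request.
faceLandmarks.inputFaceObservations = faceObservations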
The usage of VNDetectFaceLandmarksRequest is exactly the same as in the previous case. Just call:
try? faceLandmarksDetectionRequest.perform([faceLandmarks], on: image)
if let landmarksResults = faceLandmarks.results as? [VNFaceObservation] {
    for observation in landmarksResults {
        ...
and we can iterate through landmarksResults. Let’s assume, for example, that we want to draw the faceContour region using a UIBezierPath.
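As a sketch, picking individual regions out of an observation could look like the snippet below. convertPointsForFace is the helper also discussed in the comments, and faceBoundingBox is assumed to be the observation’s bounding box already scaled to the drawing layer (see the bounding-box sketch further down); neither name is guaranteed to match the original project exactly.

// Each region is a VNFaceLandmarkRegion2D, or nil if that region was not detected.
if let faceContour = observation.landmarks?.faceContour {
    convertPointsForFace(faceContour, faceBoundingBox)
}
if let outerLips = observation.landmarks?.outerLips {
    convertPointsForFace(outerLips, faceBoundingBox)
}
if let leftEye = observation.landmarks?.leftEye {
    convertPointsForFace(leftEye, faceBoundingBox)
}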
Each landmark region is a VNFaceLandmarkRegion2D object, which contains a C array of points together with a pointCount. The points are of type UnsafePointer<vector_float2>, so we need to convert them before they are used. I’ve prepared a conversion method which creates an array of tuples consisting of two CGFloat values.
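That conversion method isn’t shown here, but a sketch of such a helper against the beta-era UnsafePointer<vector_float2> API could look like this (the shipping iOS 11 API replaced this with a normalizedPoints array of CGPoint, as the comments below point out):

import simd
import CoreGraphics

// Turn a C array of vector_float2 values into an array of (CGFloat, CGFloat) tuples.
func convert(_ points: UnsafePointer<vector_float2>, with count: Int) -> [(x: CGFloat, y: CGFloat)] {
    var convertedPoints = [(x: CGFloat, y: CGFloat)]()
    for i in 0..<count {
        let point = points[i]
        convertedPoints.append((x: CGFloat(point.x), y: CGFloat(point.y)))
    }
    return convertedPoints
}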
The next thing we need to do is perform some calculations, because the point values are normalized, meaning they fall between 0.0 and 1.0 relative to the face’s bounding box.
let faceLandmarkPoints = convertedPoints.map { (point: (x: CGFloat, y: CGFloat)) -> (x: CGFloat, y: CGFloat) in
    let pointX = point.x * boundingBox.width + boundingBox.origin.x
    let pointY = point.y * boundingBox.height + boundingBox.origin.y
    return (x: pointX, y: pointY)
}
As you can see, I’m using an object called boundingBox, which is the bounding box of the detected face.
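The article doesn’t show how boundingBox ends up in view coordinates. One way to do it, assuming previewSize is the size of the preview/drawing layer, is Vision’s VNImageRectForNormalizedRect helper:

// observation.boundingBox is normalized (0...1, origin in the bottom-left corner),
// so scale it to the drawing surface before mapping landmark points through it.
let boundingBox = VNImageRectForNormalizedRect(observation.boundingBox,
                                               Int(previewSize.width),
                                               Int(previewSize.height))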
At the end, just draw a UIBezierPath built from the converted points.
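A sketch of such a drawing helper, assuming a CAShapeLayer called shapeLayer laid over the camera preview (names are placeholders, not necessarily the project’s):

// Build a closed path from the converted points and hand it to the shape layer.
func draw(points: [(x: CGFloat, y: CGFloat)]) {
    guard let first = points.first else { return }

    let path = UIBezierPath()
    path.move(to: CGPoint(x: first.x, y: first.y))
    for point in points.dropFirst() {
        path.addLine(to: CGPoint(x: point.x, y: point.y))
    }
    path.close()

    shapeLayer.path = path.cgPath
    shapeLayer.strokeColor = UIColor.red.cgColor
    shapeLayer.fillColor = UIColor.clear.cgColor
    shapeLayer.lineWidth = 2.0
}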
It’s worth remembering that the Vision framework uses a flipped coordinate system, which means we need to do the same with our drawing layer; otherwise, the face contour will come out upside-down and mirrored. To do this, just call:
shapeLayer.setAffineTransform(CGAffineTransform(scaleX: -1, y: -1))
And that’s all. The detected face landmarks should now be drawn over the face in the live camera preview.
Conclusions
You can find the final project here on our GitHub account.
Here’s the result of our working example.
Vision is a nice framework which is also really easy to implement, but only if you remember these small details.
I see many improvements compared to the old face detection tools. So, if you need really good precision or more facial details detected, I recommend giving Vision a chance and playing with it. Once you’re a little more familiar with it, implementing it in a working project shouldn’t be a big problem.
Comments

They changed the API slightly, and it seems like this is causing an issue. They have normalizedPoints instead of points, and I think we don’t need to do the conversion? I can’t make it work.
Hey there, I managed to get the project to build with the following function:
func convertPointsForFace(_ landmark: VNFaceLandmarkRegion2D?, _ boundingBox: CGRect) {
    if let points = landmark?.normalizedPoints, let count = landmark?.pointCount {
        // let convertedPoints = convert(points, with: count)
        let faceLandmarkPoints = points.map { (point: CGPoint) -> (x: CGFloat, y: CGFloat) in
            let pointX = point.x * boundingBox.width + boundingBox.origin.x
            let pointY = point.y * boundingBox.height + boundingBox.origin.y
            return (x: pointX, y: pointY)
        }
        DispatchQueue.main.async {
            self.draw(points: faceLandmarkPoints)
        }
    }
}
What is the frame rate?
Hi, I tried it on Xcode 9 and iOS 11, but I’m running into problems here: https://uploads.disquscdn.com/images/241c4ad4a9d513931a62ba06b761b2a4f535837f1184e1bd261a887024d8eb6a.png Also, I’d like to know whether it’s possible to track the person’s distance from the camera and use all of this in an ARKit session without lag or other issues.
Hi, you need to use normalizedPoints for this.
I have changed the function:
func convertPointsForFace(_ landmark: VNFaceLandmarkRegion2D?, _ boundingBox: CGRect) {
    if let points = landmark?.normalizedPoints, let count = landmark?.pointCount {
        let convertedPoints = convert(points, with: count)
        let faceLandmarkPoints = convertedPoints.map { (point: (x: CGFloat, y: CGFloat)) -> (x: CGFloat, y: CGFloat) in
            let pointX = point.x * boundingBox.width + boundingBox.origin.x
            let pointY = point.y * boundingBox.height + boundingBox.origin.y
            return (x: pointX, y: pointY)
        }
        DispatchQueue.main.async {
            self.draw(points: faceLandmarkPoints)
        }
    }
}
I’m still getting this with the new function:
Terminating app due to uncaught exception ‘NSInvalidArgumentException’, reason: ‘-[VNFaceLandmarkRegion2D normalizedPoints]: unrecognized selector sent to instance’
Any ideas?
Can this be used to identify different people?
Any idea how to convert the landmarks to a CGRect?
What is the frame rate? FPS?