Face tracking with AVFoundation
Face tracking is an interesting feature which has been available on iOS since version 5. In this tutorial, I would like to show you how to implement it in Swift 3.0.
The biggest issue with implementing this feature is that you have to build the camera support yourself using AVFoundation. It is an alternative to UIImagePicker that is much more customizable and lets you do almost anything with the cameras, but it also requires a bit more time…
Ok so let’s code something.
First of all, we create an instance of CIDetector, which will be used later for our face-detecting features. We create it by setting the detector type to CIDetectorTypeFace (in the same way we are able to detect rectangles, QR codes or text) and by specifying its accuracy (which can be low or high).
let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy: CIDetectorAccuracyLow])
Now it's time to focus on the camera support implementation. We have to create an AVCaptureSession instance and set its sessionPreset, which defines the quality of the captured images.
session = AVCaptureSession()
[...]
session.sessionPreset = AVCaptureSessionPresetPhoto
The next step is getting access to an AVCaptureDevice. For face tracking purposes we will use the front camera. We can get it by filtering the list of all available devices (cameras) on a real device.
lazy var frontCamera: AVCaptureDevice? = {
    guard let devices = AVCaptureDevice.devices(withMediaType: AVMediaTypeVideo) as? [AVCaptureDevice] else { return nil }
    return devices.filter { $0.position == .front }.first
}()
Once we have an instance of the capture device (the front camera), we have to create an AVCaptureDeviceInput, which will be added to our AVCaptureSession.
We should lock the session for our changes by using the beginConfiguration method.
A good and safe way to add a new input to our session is to check whether we can add it (via canAddInput) before calling the addInput method.
let deviceInput = try AVCaptureDeviceInput(device: captureDevice)

session.beginConfiguration()

if session.canAddInput(deviceInput) {
    session.addInput(deviceInput)
}
Once the input is created, we have to create an output to capture data from the camera.
We can use an AVCaptureVideoDataOutput instance for that. We should also set some video settings, like the pixel format type, and add the output to the session in the same way as the input.
let output = AVCaptureVideoDataOutput()
output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: NSNumber(value: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]

output.alwaysDiscardsLateVideoFrames = true

if session.canAddOutput(output) {
    session.addOutput(output)
}

session.commitConfiguration()
Once we are finished with the configuration, we have to call commitConfiguration to let our session instance know that everything is set.
Because our output will be collecting data all the time while the session is running, we have to create a dedicated dispatch queue for it.
let queue = DispatchQueue(label: "output.queue")
output.setSampleBufferDelegate(self, queue: queue)
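One more thing worth remembering: no frames will reach the delegate until the session is actually started. In the example app the configuration above lives in a sessionPrepare() method called from viewDidLoad, followed by startRunning() (a minimal sketch):

override func viewDidLoad() {
    super.viewDidLoad()
    sessionPrepare()        // the AVCaptureSession configuration shown above
    session?.startRunning() // start delivering frames to the output
}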
Next, we have to implement AVCaptureVideoDataOutputSampleBufferDelegate to get access to the raw data gathered by the camera.
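A minimal sketch of that conformance, assuming the camera code lives in a view controller called ViewController as in the example project:

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ captureOutput: AVCaptureOutput!,
                       didOutputSampleBuffer sampleBuffer: CMSampleBuffer!,
                       from connection: AVCaptureConnection!) {
        // face detection for every captured frame goes here (see below)
    }
}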
Now the magic begins…
Our face detector is able to look for features in an instance of CIImage, so we have to convert the sampleBuffer from the delegate method into one.
By features I mean mouths, eyes and heads (yes, we can detect more than one person at once).
To get the CIImage in which faceDetector will look for the features, we have to use CMSampleBufferGetImageBuffer.
We also have to create an options dictionary which defines what exactly we want to look for on the detected faces.
In the example app, which is available on our GitHub (link at the bottom of the page), I focused on detecting smiles and eye blinks.
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
    let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate)
    let ciImage = CIImage(cvImageBuffer: pixelBuffer!, options: attachments as! [String : Any]?)

    let options: [String : Any] = [CIDetectorImageOrientation: exifOrientation(orientation: UIDevice.current.orientation),
                                   CIDetectorSmile: true,
                                   CIDetectorEyeBlink: true]
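The exifOrientation(orientation:) call above is not an SDK function; it is a small helper that maps the current UIDeviceOrientation to an EXIF orientation value that CIDetector understands. A minimal sketch, assuming a portrait app using the front camera:

func exifOrientation(orientation: UIDeviceOrientation) -> Int {
    switch orientation {
    case .portraitUpsideDown:
        return 8
    case .landscapeLeft:
        return 3
    case .landscapeRight:
        return 1
    default:
        return 6 // portrait, front camera
    }
}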
Once we have our ciImage and options set up, we can start the real tracking. The CIDetector object has a features(in:options:) method which returns an array with all the found features.
let allFeatures = faceDetector?.features(in: ciImage, options: options)
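The snippet in the next step also uses two values that have not been introduced yet: features (the unwrapped detector result) and cleanAperture (the video's clean aperture, read from the sample buffer's format description). Still inside the same delegate method, they can be obtained like this:

let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
let cleanAperture = CMVideoFormatDescriptionGetCleanAperture(formatDescription!, false)

guard let features = allFeatures else { return }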
Now we can loop through the array to examine the bounds of each face and of each feature in the faces. Here I focus only on displaying details about one person.
To get access to properties like mouthPosition, hasSmile or the left/right eye being closed, we first need to cast the feature to CIFaceFeature.
Inside the for loop I also call a helper function which calculates the proper face rect and updates the label.
for feature in features {
    if let faceFeature = feature as? CIFaceFeature {
        let faceRect = calculateFaceRect(facePosition: faceFeature.mouthPosition, faceBounds: faceFeature.bounds, clearAperture: cleanAperture)
        let featureDetails = ["has smile: \(faceFeature.hasSmile)",
                              "has closed left eye: \(faceFeature.leftEyeClosed)",
                              "has closed right eye: \(faceFeature.rightEyeClosed)"]
        update(with: faceRect, text: featureDetails.joined(separator: "\n"))
    }
}
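The update(with:text:) helper itself does nothing detector-specific: it hops back onto the main queue and moves a bordered overlay view with a label over the detected face. A minimal sketch, assuming a detailsView property (a plain UIView subclass with a detailsLabel):

func update(with faceRect: CGRect, text: String) {
    DispatchQueue.main.async {
        UIView.animate(withDuration: 0.2) {
            self.detailsView.detailsLabel.text = text
            self.detailsView.alpha = 1.0
            self.detailsView.frame = faceRect
        }
    }
}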
So as you can see, the implementation of face feature tracking itself is really easy, but only once the AVFoundation camera support is in place.
You can find the complete source code on Droids on Roids’s GitHub repository.
Hi Paweł,
I am experimenting with your code and find it very cool! I wonder if you can help me with a problem I am having. I would like to be able to click on the red square of the person when they are highlighted which will then crop their face image to be used in another method I have.
I have most of it working, but I find that for some reason, the touchesBegan method isn’t called very reliably within the parameters of the red box. It is called some of the time, but it seems like the camera interferes with it. When I slow down the frame rate of the capture device (in the commented lines), it seems to be a little more reliable, but still not 100%.
Here is my ViewController code which will show an XY coordinate if you click outside the red box, and should also show the message “Click Detected within Face Bounds” if you are in the box.
Have you seen problems like this where the touchesBegan event doesn’t work on the face detected events? Thanks for any insight you can provide and keep up the awesome code samples!
Thanks,
Dave
Here are my 2 files:
//
// Globals.swift
//
import UIKit
var CapturedImage:UIImage = UIImage();
var CapturedFaceRect:CGRect = CGRect()
var wasEventCaptured:Bool = false
//
// ViewController.swift
// AutoCamera
//
// Created by Pawel Chmiel on 26.09.2016.
// Copyright © 2016 Pawel Chmiel. All rights reserved.
//
import Foundation
import AVFoundation
import UIKit
class ViewController: UIViewController {
var session: AVCaptureSession?
var stillOutput = AVCaptureStillImageOutput()
var borderLayer: CAShapeLayer?
let detailsView: DetailsView = {
let detailsView = DetailsView()
detailsView.setup()
return detailsView
}()
lazy var previewLayer: AVCaptureVideoPreviewLayer? = {
var previewLay = AVCaptureVideoPreviewLayer(session: self.session!)
previewLay?.videoGravity = AVLayerVideoGravityResizeAspectFill
return previewLay
}()
lazy var frontCamera: AVCaptureDevice? = {
guard let devices = AVCaptureDevice.devices(withMediaType: AVMediaTypeVideo) as? [AVCaptureDevice] else { return nil }
return devices.filter { $0.position == .front }.first
}()
let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy : CIDetectorAccuracyHigh])
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
if let touch = touches.first
{
let position:CGPoint = touch.location(in: detailsView)
print(position.x)
print(position.y)
if (wasEventCaptured){
CapturedImage = imageRotatedByDegrees(oldImage: CapturedImage, deg: 90.0)
wasEventCaptured = false
}
}
}
func imageRotatedByDegrees(oldImage: UIImage, deg degrees: CGFloat) -> UIImage {
//Calculate the size of the rotated view’s containing box for our drawing space
let rotatedViewBox: UIView = UIView(frame: CGRect(x: 0, y: 0, width: oldImage.size.width, height: oldImage.size.height))
let t: CGAffineTransform = CGAffineTransform(rotationAngle: degrees * CGFloat(Double.pi / 180))
rotatedViewBox.transform = t
let rotatedSize: CGSize = rotatedViewBox.frame.size
//Create the bitmap context
UIGraphicsBeginImageContext(rotatedSize)
let bitmap: CGContext = UIGraphicsGetCurrentContext()!
//Move the origin to the middle of the image so we will rotate and scale around the center.
bitmap.translateBy(x: rotatedSize.width / 2, y: rotatedSize.height / 2)
//Rotate the image context
bitmap.rotate(by: (degrees * CGFloat(Double.pi / 180)))
//Now, draw the rotated/scaled image into the context
bitmap.scaleBy(x: 1.0, y: -1.0)
bitmap.draw(oldImage.cgImage!, in: CGRect(x: -oldImage.size.width / 2, y: -oldImage.size.height / 2, width: oldImage.size.width, height: oldImage.size.height))
let newImage: UIImage = UIGraphicsGetImageFromCurrentImageContext()!
UIGraphicsEndImageContext()
return newImage
}
override func viewDidLayoutSubviews() {
super.viewDidLayoutSubviews()
previewLayer?.frame = view.frame
}
override func viewDidAppear(_ animated: Bool) {
super.viewDidAppear(animated)
guard let previewLayer = previewLayer else { return }
view.layer.addSublayer(previewLayer)
view.addSubview(detailsView)
}
override func viewDidLoad() {
super.viewDidLoad()
sessionPrepare()
session?.startRunning()
self.view.isUserInteractionEnabled = true
}
}
class DetailsView: UIView {
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
print("Click Detected within Face Bounds")
wasEventCaptured = true
super.touchesBegan(touches, with: event)
}
func setup() {
layer.borderColor = UIColor.red.withAlphaComponent(0.7).cgColor
layer.borderWidth = 5.0
}
}
extension ViewController {
func sessionPrepare() {
session = AVCaptureSession()
guard let session = session, let captureDevice = frontCamera else { return }
session.sessionPreset = AVCaptureSessionPresetPhoto
do {
let deviceInput = try AVCaptureDeviceInput(device: captureDevice)
session.beginConfiguration()
if session.canAddInput(deviceInput) {
session.addInput(deviceInput)
}
//Framerate throttle
// try captureDevice.lockForConfiguration()
// captureDevice.activeVideoMinFrameDuration = CMTimeMake(1, 2)
// captureDevice.activeVideoMaxFrameDuration = CMTimeMake(1, 2)
// captureDevice.unlockForConfiguration()
let output = AVCaptureVideoDataOutput()
output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String : NSNumber(value: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]
output.alwaysDiscardsLateVideoFrames = false
if session.canAddOutput(output) {
session.addOutput(output)
}
session.commitConfiguration()
let queue = DispatchQueue(label: "output.queue")
output.setSampleBufferDelegate(self, queue: queue)
} catch {
print("error with creating AVCaptureDeviceInput")
}
}
}
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate)
let ciImage = CIImage(cvImageBuffer: pixelBuffer!, options: attachments as! [String : Any]?)
let options: [String : Any] = [CIDetectorImageOrientation: exifOrientation(orientation: UIDevice.current.orientation),
CIDetectorSmile: true,
CIDetectorEyeBlink: true]
let allFeatures = faceDetector?.features(in: ciImage, options: options)
let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
let cleanAperture = CMVideoFormatDescriptionGetCleanAperture(formatDescription!, false)
guard let features = allFeatures else { return }
for feature in features {
if let faceFeature = feature as? CIFaceFeature {
let faceRect = calculateFaceRect(facePosition: faceFeature.mouthPosition, faceBounds: faceFeature.bounds, clearAperture: cleanAperture)
// let featureDetails = ["has smile: \(faceFeature.hasSmile)",
//                       "has closed left eye: \(faceFeature.leftEyeClosed)",
//                       "has closed right eye: \(faceFeature.rightEyeClosed)"]
//
// update(with: faceRect, text: featureDetails.joined(separator: "\n"))
update(with: faceRect, text: "")
}
}
if features.count == 0 {
DispatchQueue.main.async {
self.detailsView.alpha = 0.0
}
}
else
{
//Capture Image contained within Bounds
let myOrigin = CapturedFaceRect.origin
let myTopRight = CGPoint(x: CapturedFaceRect.maxX, y: CapturedFaceRect.minY)
let myBottomLeft = CGPoint(x: CapturedFaceRect.minX, y: CapturedFaceRect.maxY)
let myBottomRight = CGPoint(x: CapturedFaceRect.maxX, y: CapturedFaceRect.maxY)
let croppedFace = cropFaceForPoints(image: ciImage, topLeft: myOrigin, topRight: myTopRight, bottomLeft: myBottomLeft, bottomRight: myBottomRight)
CapturedImage = convert(cmage: croppedFace)
}
}
func cropFaceForPoints(image: CIImage, topLeft: CGPoint, topRight: CGPoint, bottomLeft: CGPoint, bottomRight: CGPoint) -> CIImage {
var newImage: CIImage
newImage = image.applyingFilter(
"CIPerspectiveTransformWithExtent",
withInputParameters: [
"inputExtent": CIVector(cgRect: image.extent),
"inputTopLeft": CIVector(cgPoint: topLeft),
"inputTopRight": CIVector(cgPoint: topRight),
"inputBottomLeft": CIVector(cgPoint: bottomLeft),
"inputBottomRight": CIVector(cgPoint: bottomRight)])
newImage = image.cropping(to: newImage.extent)
return newImage
}
func convert(cmage:CIImage) -> UIImage
{
let context:CIContext = CIContext.init(options: nil)
let cgImage:CGImage = context.createCGImage(cmage, from: cmage.extent)!
let image:UIImage = UIImage.init(cgImage: cgImage)
return image
}
func exifOrientation(orientation: UIDeviceOrientation) -> Int {
switch orientation {
case .portraitUpsideDown:
return 8
case .landscapeLeft:
return 3
case .landscapeRight:
return 1
default:
return 6
}
}
func videoBox(frameSize: CGSize, apertureSize: CGSize) -> CGRect {
let apertureRatio = apertureSize.height / apertureSize.width
let viewRatio = frameSize.width / frameSize.height
var size = CGSize.zero
if (viewRatio > apertureRatio) {
size.width = frameSize.width
size.height = apertureSize.width * (frameSize.width / apertureSize.height)
} else {
size.width = apertureSize.height * (frameSize.height / apertureSize.width)
size.height = frameSize.height
}
var videoBox = CGRect(origin: .zero, size: size)
if (size.width < frameSize.width) {
videoBox.origin.x = (frameSize.width - size.width) / 2.0
} else {
videoBox.origin.x = (size.width - frameSize.width) / 2.0
}
if (size.height < frameSize.height) {
videoBox.origin.y = (frameSize.height - size.height) / 2.0
} else {
videoBox.origin.y = (size.height - frameSize.height) / 2.0
}
return videoBox
}
func calculateFaceRect(facePosition: CGPoint, faceBounds: CGRect, clearAperture: CGRect) -> CGRect {
let parentFrameSize = previewLayer!.frame.size
let previewBox = videoBox(frameSize: parentFrameSize, apertureSize: clearAperture.size)
var faceRect = faceBounds
swap(&faceRect.size.width, &faceRect.size.height)
swap(&faceRect.origin.x, &faceRect.origin.y)
let widthScaleBy = previewBox.size.width / clearAperture.size.height
let heightScaleBy = previewBox.size.height / clearAperture.size.width
faceRect.size.width *= widthScaleBy
faceRect.size.height *= heightScaleBy
faceRect.origin.x *= widthScaleBy
faceRect.origin.y *= heightScaleBy
faceRect = faceRect.offsetBy(dx: 0.0, dy: previewBox.origin.y)
let frame = CGRect(x: parentFrameSize.width - faceRect.origin.x - faceRect.size.width / 2.0 - previewBox.origin.x / 2.0, y: faceRect.origin.y, width: faceRect.width, height: faceRect.height)
CapturedFaceRect = faceBounds
return frame
}
}
extension ViewController {
func update(with faceRect: CGRect, text: String) {
DispatchQueue.main.async {
UIView.animate(withDuration: 0.2) {
//self.detailsView.detailsLabel.text = text
self.detailsView.alpha = 1.0
self.detailsView.frame = faceRect
}
}
}
}
Hi, Paweł Chmiel. I am interested in the -(CGRect)calculateFaceRectFacePosition:(CGPoint)facePosition FaceBounds:(CGRect)faceBounds ClearAperture:(CGRect)clearAperture function. Can you elaborate on the principle behind it?
Awesome.