Face tracking with AVFoundation
Face tracking is an interesting feature which has been available on iOS since version 5. In this tutorial, I would like to show you how to implement it in Swift 3.0.
The biggest issue with implementing this feature is that you have to build the camera support yourself using AVFoundation. It is an alternative to UIImagePicker that is much more customizable and lets you do almost anything with the cameras, but it also requires a bit more time…
Ok so let’s code something.
First of all, we create an instance of CIDetector, which will be used later for our face-detecting features. We create it by setting the detector type to CIDetectorTypeFace (in the same way we are able to detect rectangles, QR codes or text) and by specifying its accuracy (which can be low or high).
let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy: CIDetectorAccuracyLow])
Now it's time to focus on the camera support implementation. We have to create an AVCaptureSession instance and set its sessionPreset, which defines the quality of the captured images.
session = AVCaptureSession()
[...]
session.sessionPreset = AVCaptureSessionPresetPhoto
The next step is getting access to an AVCaptureDevice. For face tracking purposes we will use the front camera. We can get it by filtering the list of all available devices (cameras) on a real device.
lazy var frontCamera: AVCaptureDevice? = {
    guard let devices = AVCaptureDevice.devices(withMediaType: AVMediaTypeVideo) as? [AVCaptureDevice] else { return nil }
    return devices.filter { $0.position == .front }.first
}()
Once we have an instance of the capture device (the front camera), we have to create an AVCaptureDeviceInput, which will be added to our AVCaptureSession.
We should lock the session for our changes by using the beginConfiguration method.
A good and safe way to add a new input to our session is to check whether we can add it (via canAddInput) before calling the addInput method.
let deviceInput = try AVCaptureDeviceInput(device: captureDevice)

session.beginConfiguration()

if session.canAddInput(deviceInput) {
    session.addInput(deviceInput)
}
Once the input is created, we have to create an output to capture data from the camera.
We can use an AVCaptureVideoDataOutput instance for that. We should also set some video settings, like the pixel format type, and add the output to the session in the same way as the input.
let output = AVCaptureVideoDataOutput()
output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: NSNumber(value: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]

output.alwaysDiscardsLateVideoFrames = true

if session.canAddOutput(output) {
    session.addOutput(output)
}

session.commitConfiguration()
Once we are finished with the configuration, we have to call commitConfiguration to let our session instance know that everything is set.
Because our output will be collecting data all the time while the session is running, we have to create a dedicated dispatch queue for it.
let queue = DispatchQueue(label: "output.queue")
output.setSampleBufferDelegate(self, queue: queue)
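One more thing worth remembering: no frames will reach the delegate until the session is actually started. In the example app the configuration above lives in a sessionPrepare() method called from viewDidLoad, followed by startRunning() (a minimal sketch):

override func viewDidLoad() {
    super.viewDidLoad()
    sessionPrepare()        // the AVCaptureSession configuration shown above
    session?.startRunning() // start delivering frames to the output
}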
Next, we have to implement AVCaptureVideoDataOutputSampleBufferDelegate to get access to the raw data gathered by the camera.
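A minimal sketch of that conformance, assuming the camera code lives in a view controller called ViewController as in the example project:

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ captureOutput: AVCaptureOutput!,
                       didOutputSampleBuffer sampleBuffer: CMSampleBuffer!,
                       from connection: AVCaptureConnection!) {
        // face detection for every captured frame goes here (see below)
    }
}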
Now the magic begins…
Our face detector is able to look for features in an instance of CIImage, so we have to convert the sampleBuffer from the delegate method into one.
By features I mean mouths, eyes and heads (yes, we can detect more than one person at once).
To get the CIImage in which faceDetector will look for the features, we have to use CMSampleBufferGetImageBuffer.
We also have to create an options dictionary which defines what exactly we want to look for on the detected faces.
In the example app, which is available on our GitHub (link at the bottom of the page), I focused on detecting smiles and eye blinks.
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
    let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate)
    let ciImage = CIImage(cvImageBuffer: pixelBuffer!, options: attachments as! [String : Any]?)

    let options: [String : Any] = [CIDetectorImageOrientation: exifOrientation(orientation: UIDevice.current.orientation),
                                   CIDetectorSmile: true,
                                   CIDetectorEyeBlink: true]
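The exifOrientation(orientation:) call above is not an SDK function; it is a small helper that maps the current UIDeviceOrientation to an EXIF orientation value that CIDetector understands. A minimal sketch, assuming a portrait app using the front camera:

func exifOrientation(orientation: UIDeviceOrientation) -> Int {
    switch orientation {
    case .portraitUpsideDown:
        return 8
    case .landscapeLeft:
        return 3
    case .landscapeRight:
        return 1
    default:
        return 6 // portrait, front camera
    }
}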
Once we have our ciImage and options set up, we can start the real tracking. The CIDetector object has a features(in:options:) method which returns an array with all the found features.
let allFeatures = faceDetector?.features(in: ciImage, options: options)
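The snippet in the next step also uses two values that have not been introduced yet: features (the unwrapped detector result) and cleanAperture (the video's clean aperture, read from the sample buffer's format description). Still inside the same delegate method, they can be obtained like this:

let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
let cleanAperture = CMVideoFormatDescriptionGetCleanAperture(formatDescription!, false)

guard let features = allFeatures else { return }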
Now we can loop through the array to examine the bounds of each face and of each feature in the faces. Here I focus only on displaying details about one person.
To get access to properties like mouthPosition, hasSmile or the left/right eye being closed, we first need to cast the feature to CIFaceFeature.
Inside the for loop I also call a helper function which calculates the proper face rect and updates the label.
for feature in features {
    if let faceFeature = feature as? CIFaceFeature {
        let faceRect = calculateFaceRect(facePosition: faceFeature.mouthPosition, faceBounds: faceFeature.bounds, clearAperture: cleanAperture)
        let featureDetails = ["has smile: \(faceFeature.hasSmile)",
                              "has closed left eye: \(faceFeature.leftEyeClosed)",
                              "has closed right eye: \(faceFeature.rightEyeClosed)"]
        update(with: faceRect, text: featureDetails.joined(separator: "\n"))
    }
}
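The update(with:text:) helper itself does nothing detector-specific: it hops back onto the main queue and moves a bordered overlay view with a label over the detected face. A minimal sketch, assuming a detailsView property (a plain UIView subclass with a detailsLabel):

func update(with faceRect: CGRect, text: String) {
    DispatchQueue.main.async {
        UIView.animate(withDuration: 0.2) {
            self.detailsView.detailsLabel.text = text
            self.detailsView.alpha = 1.0
            self.detailsView.frame = faceRect
        }
    }
}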
So as you can see, the implementation of face feature tracking itself is really easy, but only once the AVFoundation camera support is in place.
You can find the complete source code on Droids on Roids’s GitHub repository.
Hi Paweł,
I am experimenting with your code and find it very cool! I wonder if you can help me with a problem I am having. I would like to be able to click on the red square of the person when they are highlighted which will then crop their face image to be used in another method I have.
I have most of it working, but I find that for some reason, the touchesBegan method isn’t called very reliably within the parameters of the red box. It is called some of the time, but it seems like the camera interferes with it. When I slow down the frame rate of the capture device (in the commented lines), it seems to be a little more reliable, but still not 100%.
Here is my ViewController code which will show an XY coordinate if you click outside the red box, and should also show the message “Click Detected within Face Bounds” if you are in the box.
Have you seen problems like this where the touchesBegan event doesn’t work on the face detected events? Thanks for any insight you can provide and keep up the awesome code samples!
Thanks,
Dave
Here are my 2 files:
//
// Globals.swift
//
import UIKit
var CapturedImage:UIImage = UIImage();
var CapturedFaceRect:CGRect = CGRect()
var wasEventCaptured:Bool = false
//
// ViewController.swift
// AutoCamera
//
// Created by Pawel Chmiel on 26.09.2016.
// Copyright © 2016 Pawel Chmiel. All rights reserved.
//
import Foundation
import AVFoundation
import UIKit
class ViewController: UIViewController {
var session: AVCaptureSession?
var stillOutput = AVCaptureStillImageOutput()
var borderLayer: CAShapeLayer?
let detailsView: DetailsView = {
let detailsView = DetailsView()
detailsView.setup()
return detailsView
}()
lazy var previewLayer: AVCaptureVideoPreviewLayer? = {
var previewLay = AVCaptureVideoPreviewLayer(session: self.session!)
previewLay?.videoGravity = AVLayerVideoGravityResizeAspectFill
return previewLay
}()
lazy var frontCamera: AVCaptureDevice? = {
guard let devices = AVCaptureDevice.devices(withMediaType: AVMediaTypeVideo) as? [AVCaptureDevice] else { return nil }
return devices.filter { $0.position == .front }.first
}()
let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy : CIDetectorAccuracyHigh])
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
if let touch = touches.first
{
let position:CGPoint = touch.location(in: detailsView)
print(position.x)
print(position.y)
if (wasEventCaptured){
CapturedImage = imageRotatedByDegrees(oldImage: CapturedImage, deg: 90.0)
wasEventCaptured = false
}
}
}
func imageRotatedByDegrees(oldImage: UIImage, deg degrees: CGFloat) -> UIImage {
//Calculate the size of the rotated view’s containing box for our drawing space
let rotatedViewBox: UIView = UIView(frame: CGRect(x: 0, y: 0, width: oldImage.size.width, height: oldImage.size.height))
let t: CGAffineTransform = CGAffineTransform(rotationAngle: degrees * CGFloat(Double.pi / 180))
rotatedViewBox.transform = t
let rotatedSize: CGSize = rotatedViewBox.frame.size
//Create the bitmap context
UIGraphicsBeginImageContext(rotatedSize)
let bitmap: CGContext = UIGraphicsGetCurrentContext()!
//Move the origin to the middle of the image so we will rotate and scale around the center.
bitmap.translateBy(x: rotatedSize.width / 2, y: rotatedSize.height / 2)
//Rotate the image context
bitmap.rotate(by: (degrees * CGFloat(Double.pi / 180)))
//Now, draw the rotated/scaled image into the context
bitmap.scaleBy(x: 1.0, y: -1.0)
bitmap.draw(oldImage.cgImage!, in: CGRect(x: -oldImage.size.width / 2, y: -oldImage.size.height / 2, width: oldImage.size.width, height: oldImage.size.height))
let newImage: UIImage = UIGraphicsGetImageFromCurrentImageContext()!
UIGraphicsEndImageContext()
return newImage
}
override func viewDidLayoutSubviews() {
super.viewDidLayoutSubviews()
previewLayer?.frame = view.frame
}
override func viewDidAppear(_ animated: Bool) {
super.viewDidAppear(animated)
guard let previewLayer = previewLayer else { return }
view.layer.addSublayer(previewLayer)
view.addSubview(detailsView)
}
override func viewDidLoad() {
super.viewDidLoad()
sessionPrepare()
session?.startRunning()
self.view.isUserInteractionEnabled = true
}
}
class DetailsView: UIView {
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
print("Click Detected within Face Bounds")
wasEventCaptured = true
super.touchesBegan(touches, with: event)
}
func setup() {
layer.borderColor = UIColor.red.withAlphaComponent(0.7).cgColor
layer.borderWidth = 5.0
}
}
extension ViewController {
func sessionPrepare() {
session = AVCaptureSession()
guard let session = session, let captureDevice = frontCamera else { return }
session.sessionPreset = AVCaptureSessionPresetPhoto
do {
let deviceInput = try AVCaptureDeviceInput(device: captureDevice)
session.beginConfiguration()
if session.canAddInput(deviceInput) {
session.addInput(deviceInput)
}
//Framerate throttle
// try captureDevice.lockForConfiguration()
// captureDevice.activeVideoMinFrameDuration = CMTimeMake(1, 2)
// captureDevice.activeVideoMaxFrameDuration = CMTimeMake(1, 2)
// captureDevice.unlockForConfiguration()
let output = AVCaptureVideoDataOutput()
output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String : NSNumber(value: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]
output.alwaysDiscardsLateVideoFrames = false
if session.canAddOutput(output) {
session.addOutput(output)
}
session.commitConfiguration()
let queue = DispatchQueue(label: "output.queue")
output.setSampleBufferDelegate(self, queue: queue)
} catch {
print("error with creating AVCaptureDeviceInput")
}
}
}
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate)
let ciImage = CIImage(cvImageBuffer: pixelBuffer!, options: attachments as! [String : Any]?)
let options: [String : Any] = [CIDetectorImageOrientation: exifOrientation(orientation: UIDevice.current.orientation),
CIDetectorSmile: true,
CIDetectorEyeBlink: true]
let allFeatures = faceDetector?.features(in: ciImage, options: options)
let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
let cleanAperture = CMVideoFormatDescriptionGetCleanAperture(formatDescription!, false)
guard let features = allFeatures else { return }
for feature in features {
if let faceFeature = feature as? CIFaceFeature {
let faceRect = calculateFaceRect(facePosition: faceFeature.mouthPosition, faceBounds: faceFeature.bounds, clearAperture: cleanAperture)
// let featureDetails = ["has smile: \(faceFeature.hasSmile)",
//                       "has closed left eye: \(faceFeature.leftEyeClosed)",
//                       "has closed right eye: \(faceFeature.rightEyeClosed)"]
//
// update(with: faceRect, text: featureDetails.joined(separator: "\n"))
update(with: faceRect, text: "")
}
}
if features.count == 0 {
DispatchQueue.main.async {
self.detailsView.alpha = 0.0
}
}
else
{
//Capture Image contained within Bounds
let myOrigin = CapturedFaceRect.origin
let myTopRight = CGPoint(x: CapturedFaceRect.maxX, y: CapturedFaceRect.minY)
let myBottomLeft = CGPoint(x: CapturedFaceRect.minX, y: CapturedFaceRect.maxY)
let myBottomRight = CGPoint(x: CapturedFaceRect.maxX, y: CapturedFaceRect.maxY)
let croppedFace = cropFaceForPoints(image: ciImage, topLeft: myOrigin, topRight: myTopRight, bottomLeft: myBottomLeft, bottomRight: myBottomRight)
CapturedImage = convert(cmage: croppedFace)
}
}
func cropFaceForPoints(image: CIImage, topLeft: CGPoint, topRight: CGPoint, bottomLeft: CGPoint, bottomRight: CGPoint) -> CIImage {
var newImage: CIImage
newImage = image.applyingFilter(
"CIPerspectiveTransformWithExtent",
withInputParameters: [
"inputExtent": CIVector(cgRect: image.extent),
"inputTopLeft": CIVector(cgPoint: topLeft),
"inputTopRight": CIVector(cgPoint: topRight),
"inputBottomLeft": CIVector(cgPoint: bottomLeft),
"inputBottomRight": CIVector(cgPoint: bottomRight)])
newImage = image.cropping(to: newImage.extent)
return newImage
}
func convert(cmage:CIImage) -> UIImage
{
let context:CIContext = CIContext.init(options: nil)
let cgImage:CGImage = context.createCGImage(cmage, from: cmage.extent)!
let image:UIImage = UIImage.init(cgImage: cgImage)
return image
}
func exifOrientation(orientation: UIDeviceOrientation) -> Int {
switch orientation {
case .portraitUpsideDown:
return 8
case .landscapeLeft:
return 3
case .landscapeRight:
return 1
default:
return 6
}
}
func videoBox(frameSize: CGSize, apertureSize: CGSize) -> CGRect {
let apertureRatio = apertureSize.height / apertureSize.width
let viewRatio = frameSize.width / frameSize.height
var size = CGSize.zero
if (viewRatio > apertureRatio) {
size.width = frameSize.width
size.height = apertureSize.width * (frameSize.width / apertureSize.height)
} else {
size.width = apertureSize.height * (frameSize.height / apertureSize.width)
size.height = frameSize.height
}
var videoBox = CGRect(origin: .zero, size: size)
if (size.width < frameSize.width) {
videoBox.origin.x = (frameSize.width - size.width) / 2.0
} else {
videoBox.origin.x = (size.width - frameSize.width) / 2.0
}
if (size.height < frameSize.height) {
videoBox.origin.y = (frameSize.height - size.height) / 2.0
} else {
videoBox.origin.y = (size.height - frameSize.height) / 2.0
}
return videoBox
}
func calculateFaceRect(facePosition: CGPoint, faceBounds: CGRect, clearAperture: CGRect) -> CGRect {
let parentFrameSize = previewLayer!.frame.size
let previewBox = videoBox(frameSize: parentFrameSize, apertureSize: clearAperture.size)
var faceRect = faceBounds
swap(&faceRect.size.width, &faceRect.size.height)
swap(&faceRect.origin.x, &faceRect.origin.y)
let widthScaleBy = previewBox.size.width / clearAperture.size.height
let heightScaleBy = previewBox.size.height / clearAperture.size.width
faceRect.size.width *= widthScaleBy
faceRect.size.height *= heightScaleBy
faceRect.origin.x *= widthScaleBy
faceRect.origin.y *= heightScaleBy
faceRect = faceRect.offsetBy(dx: 0.0, dy: previewBox.origin.y)
let frame = CGRect(x: parentFrameSize.width - faceRect.origin.x - faceRect.size.width / 2.0 - previewBox.origin.x / 2.0, y: faceRect.origin.y, width: faceRect.width, height: faceRect.height)
CapturedFaceRect = faceBounds
return frame
}
}
extension ViewController {
func update(with faceRect: CGRect, text: String) {
DispatchQueue.main.async {
UIView.animate(withDuration: 0.2) {
//self.detailsView.detailsLabel.text = text
self.detailsView.alpha = 1.0
self.detailsView.frame = faceRect
}
}
}
}
Hi, Paweł Chmiel. I am interested in the -(CGRect)calculateFaceRectFacePosition:(CGPoint)facePosition FaceBounds:(CGRect)faceBounds ClearAperture:(CGRect)clearAperture function. Can you elaborate on the principle behind it?
Awesome.