Fast-DiM: Towards Fast Diffusion Morphs
Abstract
Diffusion Morphs (DiM) are a recent state-of-the-art method for creating high quality face morphs; however, they require a high number of network function evaluations (NFE) to create the morphs. We propose a new DiM pipeline, Fast-DiM, which can create morphs of a similar quality but with fewer NFE. We investigate the ODE solvers used to solve the Probability Flow ODE and the impact they have on the the creation of face morphs. Additionally, we employ an alternative method for encoding images into the latent space of the Diffusion model by solving the Probability Flow ODE as time runs forwards. Our experiments show that we can reduce the NFE by upwards of 85% in the encoding process while experiencing only 1.6% reduction in Mated Morph Presentation Match Rate (MMPMR). Likewise, we showed we could cut NFE, in the sampling process, in half with only a maximal reduction of 0.23% in MMPMR.
Index Terms:
Morphing Attack, Face Recognition, Diffusion Models, Numerical Methods, Probability Flow ODE, Score-based Generative Models, ODE SolversI Introduction
Face recognition (FR) systems are a common biometric modality used for identity verification across a diverse range of applications, from simple tasks such as unlocking a smart phone to official businesses such as banking, e-commerce, and law enforcement. Unfortunately, while FR systems can reach excellent performance with low false rejection and acceptance rates, they are uniquely vulnerable to a new class of attacks, that is, the face morphing attack [1]. Face morphing attacks aim to compromise one of the most fundamental properties of biometric security, i.e., the one-to-one mapping from biometric data to the associated identity. To achieve this the attacker creates a morphed face which contains biometric data of both identities. Then one morphed image, when presented, forces the FR system to register a match with two disjoint identities, violating this fundamental principle, see Figure 1 for an example.
Face morphing attacks, thus, pose a significant threat towards FR systems. One notable affected area by this attack is the e-passport, wherein the applicant submits a passport photo either in digital or printed format. This is particularly relevant for countries where e-passports are used for both issuance and renewal of documents. Critically, an adversary who is blacklisted from accessing a certain system, such as e-passport, can create a morph to gain access as a non-blacklisted individual.
In response to the severity of face morphing attacks, an abundance of algorithms have been developed to identify these attacks [1]. There are two broad classes of Morphing Attack Detection (MAD) algorithms based on the scenario in which they operate. The first scenario is where the MAD algorithm is only shown a single image and tasked with deciding if the particular image is a morphed image or a bona fide image [2]. Algorithms which solve this problem are known as Single image-based MAD (S-MAD) algorithms. The second scenario is where the MAD algorithm is presented two images, of which one image is verified to be a bona fide image, e.g., through live capture, and the other is the unknown image that the model is tasked to classify. Algorithms which solve this problem are known as Differential MAD (D-MAD) algorithms [2]. By construction the S-MAD problem is much more difficult that the D-MAD problem, as the D-MAD algorithm has the guaranteed bona fide image to compare against, whereas the S-MAD problem offers no such luxury [1].



A plethora of morphing attacks have been developed, for the purposes of this work we broadly categorize them into two categories: landmark-based morphing attacks and representation-based morphing attacks [3]. Landmark-based morphing attacks use local features to create the morphed image by warping and aligning the landmarks within each face then followed by pixel-wise compositing. Landmark-based attacks have been shown to be highly effective against FR systems [5]. In contrast, representation-based morphing attacks use a machine learning model to embed the original bona fide faces into a representation space which are then combined to produce a new representation that contains information from both identities. This new representation is then used by a generative model to construct the morphed image. Recently, there has been an explosion of work exploring deep-learning based face morphing using generative models like Generative Adversarial Networks (GANs) [2]. Currently, FR systems seem especially vulnerable to landmark-based attacks [2, 3]; however, landmark-based attacks are prone to more noticeable artefacts than representation-based attacks [3].
Recent work has shown that Diffusion Morphs (DiM) can achieve state-of-the-art performance rivaling that of GAN-based methods [3]. However, DiM requires a high number of Network Function Evaluations (NFE), incurring great computation demand and complexity. This renders the integration of further techniques, like identity-based optimization, much more difficult. To draw samples from Diffusion models, an initial image of white noise is deployed and the Probability Flow Ordinary Differential Equation (PF-ODE) is solved as time runs backwards [6]. However, for DiM models an additional encoding step from the original image back to noise is needed, which also uses many NFE to calculate this encoding [7, 3]. We posit that this encoding step is identical to solving the PF-ODE as time runs forwards and propose to use an additional ODE solver to accomplish this encoding. Additionally, we propose to use ODE solvers with faster convergence guarantees in solving the PF-ODE, in lieu of the slower ODE solver used by previous work [3]. We then study the impact of these design choice on the application of face morphing. We summarize our contributions in this work as follows:
-
1.
We propose a novel morphing method named Fast-DiM which can achieve similar performance to DiM but with greatly reduced number of NFE.
-
2.
We perform an extensive study on the impact of the ODE solvers on DiMs.
-
3.
We compare our method to state-of-the-art morphing attacks via a vulnerability and detectability study.
-
4.
To the best of our knowledge, we are the first to study the impact of ODE solvers for the Probability Flow ODE as time runs forward on autoencoding tasks.
II Prior Work
A naïve but simple approach to construct face morphs is to simply take a pixel-wise average of the two images. Unsurprisingly, this approach often yields significant artefacts. These artefacts are especially apparent when the images are not aligned, resulting in strange deformations, e.g., the mouth of one subject overlaps with the nose of another or the morphed image containing four eyes. A simple remedy to this problem is to align the faces so that the key landmarks of the faces overlap, i.e., the nose of subject one aligns with the nose of subject two and so forth with each landmark. The approaches which use this system of warping and aligning the images so the landmarks overlap for each face before taking a pixel-wise average to construct the morph are known as Landmark-based morphs. These Landmark-based morphs often exhibit artefacts outside the core area of the face, enabling easy detection of the morphing attack through simple visual inspection or with MAD algorithms [3, 5].
Later work explored the use of deep generative models to create face morphs. The key idea in this approach is to perform the morphing at the representation-level rather than at the pixel-level. Initial work in this direction pursued the use of Generative Adversarial Networks (GANs) for this purpose. GANs train a generator network through an adversarial strategy. This generator network maps latent vectors from, a typically low dimensional manifold, to the image space. Now to enable face morphing attacks, an encoding strategy which can embed images into the latent space of the generator is needed. This encoding strategy could be an additional encoding network, or something else like optimization with Stochastic Gradient Descent (SGD). Additionally, this encoding strategy needs to have low distortion on the inversion, i.e., an image that is encoded and then mapped back to the image space via the generator should be “very close” to the original image. Using this encoding strategy, the latent representations for two identities are then averaged to produce a new latent representation, i.e., the morphed latent. This morphed latent is then used as the input to the generator which maps the morphed latent back into the image space, yielding the morphed image.
The MIPGAN model by Zhang et al. [2] proposes an extension on GAN-based approach by adding an identity-based loss function derived from an FR system and using it to optimize the morph creation process. Two bona fide images are embedded into the latent space using an encoding network which predicts the latents from the original images. The two latent representations from this procedure are denoted as for subjects and , separately. The morphed latent representation is initially constructed as a linear interpolation between these two latents, i.e., . This initial representation is used as the starting point for a second optimization procedure, wherein the optimal morphed latent is found such that the output of generator network is minimized with respect to a loss function which measures the similarity between the morphed face and two bona fide faces via an FR system and additional perceptual loss metrics. At the end of this optimization procedure, a morphed face should have been found which fools the FR system used to guide the optimization procedure.
Blasingame et al. [3] propose DiM, a novel face morphing approach which uses Diffusion Autoencoders [7] to construct morphed faces. Unlike GANs, which learn a mapping from the latent space to the image space, Diffusion models consider a Stochastic Differential Equation (SDE) which perturbs the initial image distribution into an isotropic Gaussian on the image space over time, given by the Itô SDE
(1) |
where , are the drift and diffusion coefficients and is the Brownian motion. Let be the noise schedule of the diffusion process where denotes how much of the original image is present at time and denotes how much noise is present, such that at any time
(2) |
for some Gaussian noise . Then the drift and diffusion coefficients are
(3) |
Song et al. [6] show that there exists a reverse Ordinary Differential Equation (ODE) known as the Probability Flow ODE (PF-ODE) with the same marginals as exists with the form
(4) |
as time flows backwards from to where is called the score function. Diffusion models learn to model this score function with a neural net, often a U-Net, . By using the learned score function, a wide array of numerical ODE solvers can be deployed to solve the PF-ODE, enabling sampling of the data distribution by drawing an initial condition from the isotropic Gaussian and running the ODE solver.
The Diffusion Autoencoder model consists of a conditioned noise prediction U-Net and encoder network which learns the latent representation for an image [7]. This model uses the deterministic version of the Denoising Diffusion Implicit Model (DDIM) solver where is the time schedule used for sampling with inference steps. Additionally, the deterministic DDIM solver is reversed to introduce the “stochastic encoder” . In this work we refer to this “stochastic encoder” as a DiffAE forward solver, since it has a similar objective to solving Equation 4 as time runs forwards from to .
Let denote face images of two bona fide subjects, and , separately. The DiM morphing procedure is outlined in Algorithm 1. At a high level this approach encodes the bona fide images into stochastic latent codes and by running the reverse DDIM solver. These stochastic latent codes are then morphed using spherical interpolation111For a vector space and two vectors , the spherical interpolation by a factor of is given as where . to give the morphed stochastic latent code . The semantic latent codes are averaged to obtain . These morphed latent codes are then used with the DDIM solver to generate the morphed image . This approach is labeled variant A in [3], while the authors also recommend another approach called variant C which uses an additional “pre-morph” stage to alter the bona fide images before encoding. We call these two approaches DiM-A and DiM-C, respectively. In this work we primarily investigate and improve upon two aspects of the DiM framework, which are the mechanism for encoding into , i.e., the forward ODE solver , and the mechanism for generating the morphed image from , i.e., the PF-ODE solver .
III Experimental Setup
Here we outline the structure of our experiments, giving details on how we compare our proposed morphing attack Fast-DiM against other methods. We explain the dataset we use for evaluation, the FR systems we evaluate against, and the metrics we use to assess the vulnerability of the FR systems to morphing attacks.
III-A Dataset
To evaluate the effectiveness of the morphing algorithms explored in this paper, we use the SYN-MAD 2022222https://github.com/marcohuber/SYN-MAD-2022 competition dataset [5]. The SYN-MAD 2022 dataset consists of pairs of identities used for face morphing from the Face Research Lab London (FRLL) dataset [4]. The FRLL dataset consists of high-quality samples of 102 different individuals with two images per subject, one of a “neutral” expression and the other of a “smiling” expression. The ElasticFace [8] FR system was used to calculate the embedding of all the frontal images of the FRLL dataset. Once the embedding were calculated, the top 250 most similar pairs for each gender, in terms of cosine similarity, were selected [5]. These 500 pairs are used to create the morphed images. In this work we use only the 500 pairs of “neutral” images when creating and evaluating the morphs.
The SYN-MAD 2022 creates morphs with three landmark-based approaches and two GAN-based approaches. These are the open-source OpenCV333https://learnopencv.com/face-morph-using-opencv-cpp-python/, commercial-of-the-shelf (COTS) FaceMorpher444https://www.luxand.com/facemorpher/, and online-tool Webmorph555https://webmorph.org/ landmark-based morphing algorithms. Note the FaceMorpher from SYN-MAD 2022 is not the same as the FaceMorpher from [3] which is another landmark-based open-source face morphing algorithm of the same name. The two GAN-based algorithms are the MIPGAN-I and MIPGAN-II models from [2]. We run the DiM algorithm on this dataset using variants A and C from [3] on the same 500 pairs to evaluate against previous Diffusion-based work. The OpenCV morphs from the SYN-MAD 2022 dataset consist of only 489 morphs due to technical issues with the other 11 morphs. To ensure a fair comparison all evaluation is done on this subset of the SYN-MAD 2022 dataset.
We align all the bona fide images from FRLL and the landmark-based morphs are aligned and cropped to the face using the dlib library based on the alignment pre-processing used to create FFHQ dataset [9]. As the MIPGAN and DiM algorithms use the alignment script when creating their morphs, it was not necessary to re-run the alignment script on the morphs created from these algorithms.
III-B Face Recognition Systems
Three publicly available FR systems were used to evaluate the effectiveness of the face morphing attacks. In particular, the ArcFace666https://github.com/deepinsight/insightface [10], AdaFace777https://github.com/mk-minchul/AdaFace [11], and ElasticFace888https://github.com/fdbtrs/ElasticFace [8] FR systems were used. All systems convert the input image into an embedding within a high-dimensional vector space. The distance between the embeddings of the probe and target images are then compared to determine if the probe image belongs to the same identity as the target image. If this distance is sufficiently “small”, the probe image is said to belong to the same identity as the target image.
The ArcFace system is based on the Improved ResNet (IResNet-100) architecture trained on the MS1M-RetinaFace dataset999https://github.com/deepinsight/insightface. The IResNet architecture is able to improve from the baseline ResNet architecture without increasing the number of parameters or computational costs. The ArcFace system uses an additive angular margin loss which aims to enforce intra-class compactness and inter-class distance.
ElasticFace [8] builds upon the work of ArcFace by relaxing the fixed penalty margin used by ArcFace and proposes to use an elastic penalty margin loss. These improvements allow ElasticFace to achieve state-of-the-art performance. The ElasticFace system used in this paper is based on the IResNet-100 architecture trained on the MS1M-ArcFace dataset101010See Footnote 9..
AdaFace uses an adaptive margin loss by approximating the image quality with feature norms [11]. This approximation of image quality is used to give less weight to misclassified samples during training that have “low” quality. This improvement to the loss allows the system to achieve state-of-the-art recognition performance. The AdaFace system used in this paper is based on the IResNet-100 architecture trained on the MS1M-ArcFace dataset.
All three FR systems require an input of pixels. Every image is resized such that the shortest side of the image is 112 pixels long. The resulting image is then cropped into a pixel grid. The image is then normalized so the pixels take values in . Lastly, the AdaFace system was trained on BGR images so the image tensor is shuffled from the RGB to the BGR format for the AdaFace system.

III-C Metrics
The Mated Morph Presentation Match Rate (MMPMR) metric is widely used as a measure of vulnerability of FR systems when facing morphing attacks. The MMPMR metric proposed by Scherhag et al. [12] is defined as
(5) |
where is the similarity score of the -th subject of morph , is the total number of morphed images, is the verification threshold, and is the total number of subjects contributing to morph . In practice the verification threshold is set to achieve a pre-specified False Match Rate (FMR) for the given FR system. The similarity score, or conversely distance score, is a measure of the difference between the embeddings for the morphed image and bona fide image. For our experiments we use the cosine distance to measure the distance between embeddings.
Morphing Attack Potential (MAP) is an extension on the MMPMR metric proposed by Ferrara et al. [13] which aims to provide a more comprehensive assessment of the risk a particular morphing attack poses to FR systems. The MAP metric is a matrix such that denotes the proportion of morphed images that successfully trigger a match decision against at least attempts for each contributing subject by at least of the FR systems [13]. Since the SYN-MAD 2022 only has one probe image per subject, as we exclude the bona fide image used in the creation of the morph, we simply report which still provides insight into the generality of a morphing attack.
Additionally, we use the Learned Perceptual Image Patch Similarity (LPIPS) [14] metric to assess the similarity between images. LPIPS computes the similarity between the activations of two images for some neural network. This measure has been shown to correlate well with human assessment of image similarity [14]. We use the VGG network as the backbone for the LPIPS metric in our experiments.
IV Fast-DiM
We present our novel morphing algorithm, Fast-DiM, as a series of design considerations and changes from the original DiM model, which we summarize in Table I. In our initial design exploration we found that DiM-A outperformed DiM-C slightly, in contrast with the recommendation of [3]. This could be in part due to differences in FR systems as we chose a more modern set of FR systems to evaluate on. Nevertheless, because of this initial strength of DiM-A over DiM-C in our own testing, we start from developing our Fast-DiM model from DiM-A. For our experiments we measure the MMPMR values on the SYN-MAD 2022 dataset across the ArcFace, ElasticFace, and AdaFace FR systems in addition to reporting the NFE for each model. Note, the False Match Rate (FMR) is set at 0.1% for each FR system when reporting the MMPMR.
IV-A The ODE Solver
We begin by swapping out the DDIM ODE solver for a faster ODE solver as Preechakul et al. [7] recommend 100 iterations using the DDIM solver. Thankfully, much research has been done on developing numerical ODE solvers which can solve the PF-ODE, see Equation 4, with fewer steps. Lu et al. [15] proposed the DPM++ solver which is a high-order multi-step ODE solver developed specifically for solving the PF-ODE. We implement the DPM++ 2M solver to work with the U-Net from the Diffusion Autoencoder model to allow for faster sampling. DPM++ 2M is a second-order multi-step solver which achieves state-of-the-art performance compared to other ODE solvers [15]. Following the algorithm outlined in Algorithm 1, the original DDIM solver is swapped out with the DPM++ 2M solver. Figure 2111111Note that for illustrative purposes our figures use the DiM-C variant as it has less visible artefacts. provides an illustration of two morphs, one generated with the original DDIM solver with steps and the other generated using the DPM++ 2M solver with steps. Remarkably, there is little difference upon visual inspection, providing great hope that the DPM++ 2M solver can simply be used in lieu of the DDIM solver.
MMPMR() | ||||
---|---|---|---|---|
ODE Solver | NFE() | AdaFace | ArcFace | ElasticFace |
DDIM | 100 | 92.23 | 90.18 | 93.05 |
DPM++ 2M | 50 | 92.02 | 90.18 | 93.05 |
DPM++ 2M | 20 | 91.62 | 89.98 | 93.25 |
The impact of NFE and the choice of ODE solver on the potency of a morph is outlined in Table II. Noticeably, there is little to no reduction in MMPMR across all three FR systems when using DPM++ 2M with steps versus the DDIM solver with steps, meaning that switching to this solver is a straightforward improvement in NFE while essentially sacrificing no morphing performance. However, there is a slight reduction in MMPMR values when using steps. As such we recommend the DPM++ 2M solver with as it reduces NFE by half while requiring no compromise in morphing performance.



IV-B Noise Injection
Next we ask if it is even necessary to use the forward ODE solver, , or if we can simply start the morph from some random noise, . As is intended to contain all the semantic details, we wonder if all the information necessary to create an image which would fool an FR system was contained within or if it was also contained in . In Figure 3 we illustrate the differences between the constructed from running the DiffAE forward solver, Equation (8) in [7], versus simply sampling white noise. Interestingly, a silhouette of the head is clearly visible in the encoded ; moreover, there exists bands of relative uniformity emanating from this silhouette. Clearly, the output of this forward solver is not the unit Gaussian we would expect for the formulation in Equation 1. However, as evidenced by the second row, this deviation from pure white noise is what enables the excellent reconstruction abilities of the Diffusion Autoencoder. In the second row of Figure 3 the output of the Diffusion Autoencoder model is shown. As expected the image using the from the DiffAE forward solver has a more faithful reconstruction whereas the image generated from white noise has noticeable variations. The primary question is if these variations are enough to cause an FR system to reject the image.
In addition to investigating starting from pure white noise, we also examine only adding partial noise to the image. From Equation 2 we can sample arbitrary levels of added noise by following the noise schedule, . Therefore, instead of starting the Diffusion model from , we could start from for some . In Figure 4 we illustrate the morphing process. First we perform a pixel-wise average of the two aligned bona fide images. We then inject noise in accordance to the noise schedule at time to get a noisy version of the pixel-wise morph. The Diffusion model is then run from time back to to remove the added noise. Note, the model is still conditioned on the morphed latent representation. The goal is that the added noise can mask some of the artefacts from a pixel-wise average while retaining some of the low-frequency information that could be helpful to the generative process.
Noise Level | AdaFace | ArcFace | ElasticFace |
---|---|---|---|
1.0 | 4.5 | 3.48 | 2.04 |
0.6 | 9.41 | 6.75 | 4.91 |
0.5 | 15.13 | 12.27 | 9.41 |
0.4 | 27.61 | 21.68 | 21.27 |
0.3 | 45.81 | 40.49 | 37.01 |
In Table III we present the MMPMR values associated with this technique at various noise levels. We define the noise level to be for any given timestep . We discover that the forward ODE solver is paramount to the success of creating high quality Diffusion morphs, as evidenced by the abysmal MMPMR numbers resulting from white noise over the encoded . We notice that as the noise level decreases the MMPMR does improve; however, we attribute this to the morphed images converging to the pixel-wise average rather than any merit to this particular idea. While upon visual inspection we find images generated from white noise to have less artefacts in the morphed images and to retain more high-frequency details, we conclude that to create an effective morph using Diffusion Autoencoders, it is necessary to calculate through some forward ODE solver.
LPIPS() | MSE() | |||||||
---|---|---|---|---|---|---|---|---|
Forward ODE Solver | ||||||||
DiffAE, Equation 9 | 0.2370 | 0.1404 | 0.1211 | 0.1113 | 0.0037 | 0.0020 | 0.0014 | 0.0010 |
DDIM, Equation 11 | 0.2953 | 0.1843 | 0.1173 | 0.0760 | 0.0055 | 0.0015 | 0.0005 | 0.0004 |
DPM++ 2M, Section IV-C | 0.3247 | 0.1159 | 0.1120 | 0.0752 | 0.0082 | 0.0009 | 0.0005 | 0.0004 |
IV-C Solving the Forward ODE
Motivated by our findings in Section IV-B we decide to explore the forward ODE solver to see if we can achieve a reduction in NFE from this encoding. The goal of the forward ODE solver is to find an such that when used as the starting point for the Diffusion model the output is . In order to discuss the forward ODE solver we briefly revisit the DDIM solver used to sample Diffusion models. Using the conventions of Lu et al. [15] the original DDIM update equation can be written as follows
(6) |
where is the data prediction model which can be found from the noise prediction model via
(7) |
and and is the log Signal to Noise Ratio (log-SNR). While Equation 6 provides a way to estimate from , the goal of the forward ODE solver is to find from , i.e., to move forward in time. The forward update equation used by Diffusion Autoencoders and DiM can be found by rearranging Equation 6 to find the next sample from .
(8) |
However, this formulation clearly can’t work for the forward pass as the calculation of depends on an evaluation of the data prediction model on . Preechakul et al. [7] remedy this by evaluating the network on instead, turning Equation 8 into
(9) |
Note, for more consistent notation we wrote Equation 9 in terms of finding from .
Doubtful of the validity substitution used to construct Equation 9 from Equation 8, we propose an alternative formulation to solving the forward ODE. We observe that the aim of “stochastic encoder” from [3, 7] is quite similar to solving the PF-ODE, Equation 4, as time runs forward from to . We propose to instead construct an additional ODE solver with the initial condition that solves the PF-ODE forwards in time. We propose two formulations: one using the first-order single-step DDIM solver, and the other using the second-order multi-step DPM++ 2M solver.
In Proposition 4.1 of [15] Lu et al. show that an exact solution of the PF-ODE is given by
(10) |
given some initial value and where is a change of variables from time to log-SNR . Setting and we construct a first-order approximation of Equation 10
(11) |
This first-order approximation of Equation 10 is the update equation of the DDIM solver as time runs forward, so we call it the DDIM solver for the forward ODE.
While at first glance this may appear similar to the DiffAE forward solver, we show that the local difference at step between our solver and the DiffAE forward solver is given by the following equation
(12) |
where is the network evaluation at time . This reveals that while similar in goal, our strategy is meaningfully different from that in [7].

Following the derivations of Lu et al. [15] we construct a second-order multi-step approximation of Equation 10 as time runs forward:
(13) |
This set of equations represent the heart of DPM++ 2M solver, hence we call this the DPM++ 2M solver for the forward ODE.
We believe that our formulations for solving the forward ODE are more principled than the one in [7] and we disagree with the validity of the substitution used to find Equation 9. To verify our theoretical intuition we experimentally compare our two formulations for the forward ODE solver against the DiffAE forward solver. First, we assess the impact the forward ODE solver plays on the reconstruction ability of the Diffusion Autoencoder. In Table IV we measure the LPIPS metric and Mean Squared Error (MSE) between the original and reconstructed images from the FRLL dataset. We use the same DPM++ 2M PF-ODE solver across all three different forward ODE solvers. We find that our proposed formulations vastly outperform the DiffAE forward solver in both LPIPS and MSE metrics. Interestingly, the DiffAE forward solver performs best at steps, however, the reconstruction error is too high across all three solvers to be useful for autoencoding or morphing applications. We notice the DPM++ 2M forward ODE solver at steps achieves almost the same performance as the DiffAE forward solver at steps. Our proposed implementation allows us to cut the NFE down by 200 to keep similar performance or down by 150 to achieve superior performance.
MMPMR() | ||||
---|---|---|---|---|
ODE Solver | NFE() | AdaFace | ArcFace | ElasticFace |
DiffAE | 250 | 92.02 | 90.18 | 93.05 |
DDIM | 100 | 91.82 | 88.75 | 91.21 |
DPM++ 2M | 100 | 90.59 | 87.12 | 90.8 |
DDIM | 50 | 89.78 | 86.3 | 89.37 |
DPM++ 2M | 50 | 90.18 | 86.5 | 88.96 |
In addition, we evaluate the impact the choice of forward ODE solver has on morphing performance. In Table V we measure the MMPMR with different forward ODE solvers using the same DPM++ 2M PF-ODE solver with steps for all experiments. Note, we report the NFE only for solving the forward ODE. Unfortunately, the superior autoencoding performance of the DDIM and DPM++ 2M ODE solvers does not seem to be reflected in the MMPMR numbers with the MMPMR experiencing a slight drop in performance across all three FR systems. Interestingly, while DPM++ 2M slightly outperforms DDIM at steps which aligns with our experimental observations in Table V, we find that DDIM actually outperforms DPM++ 2M at . In light of these results, we recommend the DDIM solver with steps as the forward ODE solver, because it greatly reduces the NFE with only a slight decrease in MMPMR.
Figure 5 illustrates the impact on the morphing process the different forward ODE solvers play. We notice that DDIM produces sharper and crisper images than DiffAE, preserving more of the high-frequency content of the original bona fide images. DPM++ 2M at steps begins to create noticeable artefacts in the image which is only amplified when we reduced the number of steps to . This visual assessment seems to roughly correlate with the performance observed in Table V; however, our human assessment seems to favor the morphs using the DDIM solver over the DiffAE solver.
V Results
For completeness we compare our Fast-DiM algorithm against other face morphing algorithms on the SYN-MAD 2022 dataset. We compare against three landmark-based techniques: OpenCV, Webmorph, and FaceMorpher; in addition to the MIPGAN-I, MIPGAN-II, DiM-A, and DiM-C morphing attacks. We denote our proposed model from Section IV-A as Fast-DiM and our proposed model from Section IV-C as Fast-DiM-ode as it uses the forward ODE solver. The Fast-DiM model uses steps with the DPM++ 2M solver and the DiffAE forward solver with . Likewise, the Fast-DiM-ode model uses the same PF-ODE solver, but the DDIM forward ODE solver with .
For DiM models we chose to report the total NFE across both solving the forward ODE and PF-ODE as rather than , as one can simply batch the encoding of the two bona fide images into a single image tensor, exchanging time for memory. We believe is a better representation of the NFE for these models as it represents the minimal NFE. The MIPGAN family of models use 150 optimization steps when constructing the morphed image, so we report NFE = 150 for the MIPGAN models.
MMPMR() | ||||
---|---|---|---|---|
Morphing Attack | NFE() | AdaFace | ArcFace | ElasticFace |
FaceMorpher [5] | - | 89.78 | 87.73 | 89.57 |
Webmorph [5] | - | 97.96 | 96.93 | 98.36 |
OpenCV [5] | - | 94.48 | 92.43 | 94.27 |
MIPGAN-I [2] | 150 | 72.19 | 77.51 | 66.46 |
MIPGAN-II [2] | 150 | 70.55 | 72.19 | 65.24 |
DiM-A [3] | 350 | 92.23 | 90.18 | 93.05 |
DiM-C [3] | 350 | 89.57 | 83.23 | 86.3 |
Fast-DiM | 300 | 92.02 | 90.18 | 93.05 |
Fast-DiM-ode | 150 | 91.82 | 88.75 | 91.21 |
V-A Vulnerability
Table VI provides the MMPMR at FMR = 0.1% for all the evaluated morphing attacks and report the NFE, if applicable. We notice that, unsurprisingly, the landmark-based morphing attacks are highly effective, often outshining their representation-based counterparts. This trend has been noticed in prior works and is consistent with the current state of face morphing research [1, 2, 5, 3]. Fast-DiM represents only the slightest decline in performance from DiM-A on the AdaFace FR system but otherwise retains the excellent performance for a representation-based attack. Fast-DiM-ode, however, experiences a decline in MMPMR of no more than 1.6% while still maintaining superior performance to DiM-C and the MIPGAN models. It even outperforms the landmark-based COTS FaceMorpher attack. Fast-DiM, and Fast-DiM-ode, manage to outperform all other studied representation-based attacks with the exception of DiM-A.
Number of FR Systems | ||||
---|---|---|---|---|
Morphing Attack | NFE() | 1 | 2 | 3 |
FaceMorpher [5] | - | 92.23 | 89.57 | 85.28 |
Webmorph [5] | - | 98.77 | 98.36 | 96.11 |
OpenCV [5] | - | 97.55 | 93.87 | 89.78 |
MIPGAN-I [2] | 150 | 85.07 | 72.39 | 58.69 |
MIPGAN-II [2] | 150 | 80.37 | 69.73 | 57.87 |
DiM-A [3] | 350 | 96.93 | 92.43 | 86.09 |
DiM-C [3] | 350 | 92.84 | 87.53 | 78.73 |
Fast-DiM | 300 | 97.14 | 92.43 | 85.69 |
Fast-DiM-ode | 150 | 95.91 | 91.21 | 84.66 |
We present the values for the different morphing attacks in Table VII. We observe that Fast-DiM achieves slightly higher performance in fooling a single FR system than DiM-A; however, it quickly loses out in the case of multiple FR systems. The Fast-DiM and Fast-DiM-ode models outperform all representation-based morphing attacks other than DiM-A and even outperform the FaceMorpher attack. Despite the dramatic reduction in NFE, the Fast-DiM and Fast-DiM-ode models manage to pose a potent threat to FR systems, managing to fool all three FR systems at least 84% of the time.
Dataset-A | Dataset-B | Dataset-C | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
APCER @ BPCER() | APCER @ BPCER() | APCER @ BPCER() | ||||||||||
Morphing Attack | EER() | 0.1% | 1.0% | 5.0% | EER() | 0.1% | 1.0% | 5.0% | EER() | 0.1% | 1.0% | 5.0% |
FaceMorpher [5] | 0 | 0.49 | 0 | 0 | 91.67 | 100 | 100 | 100 | 5.39 | 46.08 | 16.18 | 5.88 |
OpenCV [5] | 0 | 0 | 0 | 0 | 46.57 | 100 | 97.55 | 89.71 | 0 | 0 | 0 | 0 |
Webmorph [5] | 0 | 0 | 0 | 0 | 46.57 | 99.51 | 98.04 | 91.18 | 0 | 0 | 0 | 0 |
MIPGAN-I [2] | 2.94 | 17.65 | 9.31 | 1.47 | 0.98 | 3.43 | 0.98 | 0.49 | 0 | 0.49 | 0 | 0 |
MIPGAN-II [2] | 3.92 | 69.61 | 17.16 | 2.45 | 0.98 | 2.45 | 0.98 | 0.49 | 0 | 0 | 0 | 0 |
DiM-A [3] | 34.31 | 96.57 | 91.18 | 77.94 | 75.49 | 100 | 100 | 99.02 | 0.98 | 5.88 | 1.47 | 0 |
DiM-C [3] | 27.94 | 92.16 | 87.25 | 64.71 | 66.18 | 100 | 99.51 | 98.04 | 0 | 0.49 | 0 | 0 |
Fast-DiM | 46.57 | 98.53 | 98.04 | 89.22 | 68.14 | 100 | 100 | 99.02 | 1.47 | 5.88 | 2.94 | 0 |
Fast-DiM-ode | 39.71 | 96.08 | 92.16 | 79.41 | 70.59 | 100 | 100 | 99.51 | 2.45 | 7.35 | 5.39 | 0.49 |
V-B Detectability Study
To study the detectability of the Fast-DiM attacks, we implement an S-MAD detector trained on various morphing attacks. We follow the approach of [3] in designing our detectability study and use a SE-ResNeXt101-32x4d network pre-trained on the ImageNet dataset by NVIDIA as the backbone for our S-MAD detector. The SE-ResNeXt101-32x4d is a state-of-the-art image recognition model based on the ResNeXt architecture with additional squeeze and excitation layers added. We employ a stratified -fold cross validation strategy in performing the detectability study to ensure fair reporting of the results and to preserve the class balance between morphed and bona fide images in each fold. We opt to use for our experiments. We fine tune our detection model on three different subsets of the SYN-MAD 2022 dataset, each representing a different scenario for a potential S-MAD algorithm. We enumerate these datasets as
-
1.
Dataset-A: consisting of FaceMorpher, OpenCV, and Webmorph.
-
2.
Dataset-B: consisting of MIPGAN-I and MIPGAN-II.
-
3.
Dataset-C: consisting of OpenCV, MIPGAN-II, and DiM-C morphs.
We develop Dataset-A to illustrate an S-MAD algorithm trained on landmark-based attacks which may reflect an older S-MAD system. We then develop Dataset-B to illustrate an S-MAD algorithm only trained on GAN-based attacks. Lastly, we present Dataset-C to illustrate a realistic scenario for a strong S-MAD algorithm wherein the S-MAD algorithm is trained on a blend of different morphing attacks, one landmark-based, one GAN-based, and one Diffusion-based. As we use a powerful pre-trained SE-ResNeXt101-32x4d model as our backbone, we only fine tune for 3 epochs and employ an exponential learning rate scheduler with differential learning rates to combat any potential overfitting of the model. Additionally, we use a label smoothing with rate in our cross entropy loss function to further combat overfitting. The S-MAD detection algorithm achieves a minimum of 98% class balanced accuracy on each training fold before evaluating. Importantly, with our -fold strategy none of the bona fide images and morphs made from those bona fides used during training are used in evaluation.
To assess the detectability of the morphing attacks by the S-MAD system, we measure the Attack Presentation Classification Error Rate (APCER) and Bona fide Presentation Classification Error Rate (BPCER), in addition to the Equal Error Rate (EER). In Table VIII we present the results of the detectability study of the different morphing attacks evaluated against the S-MAD system trained on datasets, A, B, and C. In Dataset-A we observe that just like the DiM models the Fast-DiM models are very difficult to detect when no DiM variant is present in the training algorithm, potentially posing a grave threat to pre-existing S-MAD systems. Surprisingly, the MIPGAN variants are easily detected in this scenario as well even though the S-MAD model was only trained on landmark-based morphs. In a similar vein, when training on the MIPGAN variants, Dataset-B, the performance of the S-MAD detector is abysmal with high error rates across the board with exception of the MIPGAN training set. Interestingly, the FaceMorpher attack does exceedingly well in this scenario, which may be due in part to the uniqueness of the proprietary technique deployed by this morphing attack. Lastly, in our most challenging scenario found in Dataset-C we observe that the inclusion of a DiM-C into the training set greatly reduced the effectiveness of all morphing attacks, showing that training on a diverse set of high quality morphing attacks is essential to achieving state-of-the-art S-MAD performance. The FaceMorpher attack does better than rest in this scenario as well which we attribute to the same reasoning from before. Both Fast-DiM variants have slightly higher error rates than their DiM counterparts. We believe this is due to the difference in ODE solvers which we observed that give the images a sharper appearance compared to the more blurry appearance of the DiM models, see Figure 5 for an illustration.
VI Conclusion
In this paper we have introduced Fast-DiM, an approach for generating high quality face morphs with lower NFE than existing models. We have empirically demonstrated that our proposed model can use fewer NFE than previous Diffusion-based methods for face morphing while remaining a potent representation-based morphing attack. We have shown that by replacing the DDIM PF-ODE solver with the DPM++ 2M PF-ODE solver in combination with solving the PF-ODE as time runs forwards using DDIM, we can achieve a remarkable reduction in NFE over prior methods. Our results show that we can cut the NFE for solving the PF-ODE in half while retaining the same high quality morphing performance and that we can achieve an upwards of reduction in NFE for solving the PF-ODE as time runs forwards with only a maximal reduction in MMPMR. We hope that this work will enable future exploration on techniques that leverage the iterative process of Diffusion models for face morphing that were once prohibitive due to the high computational demands of the previous methods.
Acknowledgment
This material is based upon work supported by the Center for Identification Technology Research and National Science Foundation under Grant #1650503.
References
- [1] Z. Blasingame and C. Liu, “Leveraging adversarial learning for the detection of morphing attacks,” 2021 IEEE International Joint Conference on Biometrics (IJCB), pp. 1–8, 2021.
- [2] H. Zhang, S. Venkatesh, R. Ramachandra, K. Raja, N. Damer, and C. Busch, “Mipgan—generating strong and high quality morphing attacks using identity prior driven gan,” IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 3, no. 3, pp. 365–383, 2021.
- [3] Z. W. Blasingame and C. Liu, “Leveraging diffusion for strong and high quality face morphing attacks,” IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 6, no. 1, pp. 118–131, 2024.
- [4] L. DeBruine and B. Jones, “Face Research Lab London Set,” 5 2017. [Online]. Available: https://figshare.com/articles/dataset/Face_Research_Lab_London_Set/5047666
- [5] M. Huber, F. Boutros, A. T. Luu, K. Raja, R. Ramachandra, N. Damer, P. C. Neto, T. Gonçalves, A. F. Sequeira, J. S. Cardoso, J. Tremoço, M. Lourenço, S. Serra, E. Cermeño, M. Ivanovska, B. Batagelj, A. Kronovšek, P. Peer, and V. Štruc, “Syn-mad 2022: Competition on face morphing attack detection based on privacy-aware synthetic training data,” in 2022 IEEE International Joint Conference on Biometrics (IJCB), 2022, pp. 1–10.
- [6] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=PxTIG12RRHS
- [7] K. Preechakul, N. Chatthee, S. Wizadwongsa, and S. Suwajanakorn, “Diffusion autoencoders: Toward a meaningful and decodable representation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 10 619–10 629.
- [8] F. Boutros, N. Damer, F. Kirchbuchner, and A. Kuijper, “Elasticface: Elastic margin loss for deep face recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2022, pp. 1578–1587.
- [9] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4396–4405.
- [10] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “Arcface: Additive angular margin loss for deep face recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4690–4699.
- [11] M. Kim, A. K. Jain, and X. Liu, “Adaface: Quality adaptive margin for face recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
- [12] U. Scherhag, A. Nautsch, C. Rathgeb, M. Gomez-Barrero, R. N. J. Veldhuis, L. Spreeuwers, M. Schils, D. Maltoni, P. Grother, S. Marcel, R. Breithaupt, R. Ramachandra, and C. Busch, “Biometric systems under morphing attacks: Assessment of morphing techniques and vulnerability reporting,” in 2017 International Conference of the Biometrics Special Interest Group (BIOSIG), 2017, pp. 1–7.
- [13] M. Ferrara, A. Franco, D. Maltoni, and C. Busch, “Morphing attack potential,” in 2022 International Workshop on Biometrics and Forensics (IWBF), 2022, pp. 1–6.
- [14] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 586–595.
- [15] C. Lu, Y. Zhou, F. Bao, J. Chen, C. Li, and J. Zhu, “Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models,” 2023.