VerifyML: Obliviously Checking Model Fairness Resilient to Malicious Model Holder
Abstract
In this paper, we present VerifyML, the first secure inference framework to check the fairness degree of a given machine learning (ML) model. VerifyML is generic and is immune to any obstruction by the malicious model holder during the verification process. We rely on secure two-party computation (2PC) technology to implement VerifyML, and carefully customize a series of optimization methods to boost its performance for both linear and nonlinear layer execution. Specifically, (1) VerifyML allows the vast majority of the overhead to be performed offline, thus meeting the low latency requirements for online inference. (2) To speed up offline preparation, we design novel homomorphic parallel computing techniques to accelerate the generation of authenticated Beaver's triples (including matrix-vector and convolution triples), achieving substantial computation speedup and markedly lower communication overhead compared to state-of-the-art work. (3) We also present a new cryptographic protocol to evaluate the activation functions of nonlinear layers, which is significantly faster and requires less communication than existing 2PC protocols against malicious parties. In fact, VerifyML even beats the state-of-the-art semi-honest ML secure inference system! We provide formal theoretical analysis of VerifyML's security and demonstrate its performance superiority on mainstream ML models including ResNet-18 and LeNet.
Index Terms:
Privacy Protection, Deep Learning, Cryptography.

1 Introduction
Machine learning (ML) systems are increasingly being used to inform and influence people's decisions, leading to algorithmic outcomes that have powerful implications for individuals and society. For example, most personal loan default risks are calculated by automated ML tools. This approach greatly speeds up the decision-making process, but as with any decision-making algorithm, there is a tendency to provide accurate results for the majority while leaving certain individuals and minority groups disadvantaged [1, 41]. This problem is widely defined as the unfairness of the ML model. It often stems from inherent human bias in the training samples, and a trained ML model amplifies this bias to the point of causing discriminatory decisions about certain groups and individuals.
Actually, the unfairness of ML models entangles every corner of society, not only financial risk control. A prime example comes from COMPAS [18], automated software used in US courts to assess the probability of criminals reoffending. An investigation of the software revealed a bias against African-Americans, i.e., COMPAS has a higher false positive rate for African-American offenders than for white offenders, owing to incorrectly estimating their risk of reoffending. Similar model decision biases pervade other real-world applications including childcare systems [7], employment matching [33], AI chatbots, and ad serving algorithms [16]. As mentioned earlier, these unfair decisions stem from neglected biases and discrimination hidden in data and algorithms.
To alleviate the above problems, a series of recent works [4, 24, 34, 32, 31] have proposed formal measures of fairness for classification models, as well as their variants, aiming to provide guidance for verifying the fairness of a given model. Several evaluation tools have also been released that facilitate automated checks for discriminatory decisions in a given model. For example, Aequitas [36] is a toolkit that tests models against several bias and fairness metrics corresponding to different population subgroups. It feeds back test reports to developers, researchers, and governments to assist them in making conscious decisions that avoid harming specific population groups. IBM also offers the AI Fairness 360 toolkit [3], which aims to bring fairness research algorithms to the industrial setting, create a benchmark on which all fairness algorithms can be evaluated, and provide an environment for researchers to share their ideas.
Existing efforts in theory and tools have led the entire research community to work towards unbiased verification of ML model fairness. However, existing verification mechanisms either require white-box access to the target model or require clients to send queries in plaintext to the model holder, which is impractical as it incurs a range of privacy concerns. Specifically, model holders are often reluctant to disclose model details because training a commercial model requires substantial human cost, resources, and experience. Therefore, ML models, as precious intellectual property, need to be properly protected to ensure a company's competitiveness in the market. On the other hand, the queries that clients use to test model fairness naturally contain sensitive information, including loan records, disease history, and even criminal records. These highly private data should clearly be kept confidential throughout the verification process. Hence, these privacy requirements raise a challenging but meaningful question:
Can we design a verification framework that only returns the fairness of the model to the client and the parties cannot gain any private information?
We materialize the above question in a scenario where a client interacts with the model holder to verify the fairness of the model. Specifically, before using the target model's inference service, the client sends a set of queries for testing fairness to the model holder, which returns inference results to the client, enabling it to locally evaluate how fair the model is. In such a scenario, the client is generally considered to be semi-honest, since it needs to evaluate the model correctly for the subsequent service. The model holder may be malicious: it may trick the client into believing that the model is highly fair by arbitrarily violating the verification process. A natural solution to such concerns is to leverage state-of-the-art generic 2PC tools [22, 20, 6] that provide malicious security. They guarantee that if either entity behaves maliciously, it will be caught and the protocol aborted, protecting privacy. However, directly grafting these standard tools incurs enormous redundant overhead, including heavy reliance on zero-knowledge proofs [11] and tedious computational authentication and interaction [15] (see Section 3 for more details).
To reduce this overhead, we propose VerifyML, a 2PC-based secure verification framework designed for the model holder-malicious threat model. In this model, the client is considered semi-honest while the model holder is malicious and can arbitrarily violate the specification of the protocol. We adaptively customize a series of optimization methods for VerifyML, which show much better performance than the fully malicious baseline. Our key insight is to move the vast majority of operations to the client to bypass cumbersome data integrity verification and reduce the frequency of interaction between entities. Further, we design highly optimized methods to perform the linear and nonlinear layer functions of ML, which bring substantial speedup compared to state-of-the-art techniques. Overall, our contributions are as follows:
- We leverage a hybrid HE-GC design for VerifyML. In VerifyML, the execution of the ML model's linear layers is implemented with homomorphic encryption (HE) while the nonlinear layers are performed with garbled circuits (GC). VerifyML allows more than 95% of the operations to be completed in the offline phase, thus providing very low latency in the online inference phase. In fact, VerifyML's online phase even beats DELPHI [29], the state-of-the-art scheme for secure ML inference against only semi-honest adversaries.
- We design a series of optimization methods to reduce the overhead of the offline stage. Specifically, we design new homomorphic parallel computation methods to generate authenticated Beaver's triples, including matrix-vector and convolution triples, in a Single Instruction Multiple Data (SIMD) manner. Compared to existing techniques, we generate matrix-vector multiplication triples without any homomorphic rotation operation, which is very computationally expensive compared to other homomorphic operations such as addition and multiplication. Besides, we reduce the communication complexity of generating convolution triples (aka matrix multiplication triples) from cubic to quadratic, with faster computing performance.
- We design a computation-friendly GC to perform the activation functions of nonlinear layers (mainly ReLU). Our key idea is to minimize the number of expensive multiplication operations in the GC. We then use the GC output labels as one-time pads to simplify verifying the integrity of the input from the model holder. Compared to state-of-the-art works, our nonlinear layer protocol achieves at least an order of magnitude performance improvement.
- We provide formal theoretical analysis of VerifyML's security and demonstrate its performance superiority on various datasets and mainstream ML models including ResNet-18 and LeNet. Compared to state-of-the-art work, our experiments show that VerifyML achieves substantial computation speedup and communication savings for linear layer computation. For the nonlinear layers, VerifyML is also markedly faster and requires less communication than existing 2PC protocols against malicious parties. Meanwhile, VerifyML demonstrates an encouraging online runtime boost (more than an order of magnitude) over existing works on LeNet and ResNet-18, and at least an order of magnitude communication cost reduction.
2 Preliminaries
2.1 Threat Model
We consider a secure ML inference scenario, where a model holder and a client interact with each other to evaluate the fairness of the target model. In this model holder-malicious threat model, the model holder holds the model while the client owns the private test set used to verify the fairness of the model. The client is generally considered to be semi-honest; that is, it follows the protocol's specification during the interaction so as to evaluate the fairness of the model without bias. However, it may try to infer model parameters by passively analyzing data streams captured during the interaction. The model holder is malicious: it may arbitrarily violate the specification of the protocol to trick the client into believing that it holds a high-fairness model. The network architecture is assumed to be known to both parties. VerifyML aims to construct a secure inference framework that enables the client to correctly evaluate the fairness of the model without learning any details of the model parameters, while the model holder learns nothing about the client's input. We provide a formal definition of the threat model in Appendix A.
2.2 Notations
We use $\lambda$ and $s$ to denote the computational security parameter and the statistical security parameter, respectively. $[n]$ represents the set $\{1, 2, \ldots, n\}$. In VerifyML, all arithmetic operations are calculated in the field $\mathbb{Z}_p$, where $p$ is a prime, and we define $\ell = \lceil \log p \rceil$. This means that there is a natural mapping from elements of $\mathbb{Z}_p$ to $\ell$-bit strings; for example, $x[k]$ indicates the $k$-th bit of $x$ under this mapping, i.e., $x = \sum_{k} x[k] \cdot 2^{k}$. Given two vectors $\mathbf{x}$ and $\mathbf{y}$, and an element $c$, $\mathbf{x} + \mathbf{y}$ indicates element-wise addition, while $c + \mathbf{x}$ and $c \cdot \mathbf{x}$ mean that each component of $\mathbf{x}$ is added to and multiplied by $c$, respectively. $\mathbf{x} \cdot \mathbf{y}$ represents the inner product of $\mathbf{x}$ and $\mathbf{y}$. Similarly, given any function $f$, $f(\mathbf{x})$ denotes the evaluation of $f$ on each component of $\mathbf{x}$. $\mathbf{x} \| \mathbf{y}$ represents the concatenation of $\mathbf{x}$ and $\mathbf{y}$. Finally, $x \leftarrow S$ is used to represent sampling $x$ uniformly from the set $S$.
For ease of exposition, we consider an ML model, usually a neural network, consisting of alternating linear and nonlinear layers, where the linear layers are specified by weight matrices and the nonlinear layers by activation functions. Given an initial input (i.e., a query) $x$, the model holder sequentially evaluates the linear and nonlinear layers and finally outputs the inference result $h(x)$.
2.3 ML Fairness Measurement
Let $\mathcal{X}$ be the set of possible inputs and $\mathcal{Y}$ be the set of all possible labels. In addition, let $\mathcal{A}$ be a finite set of fairness-related groups (e.g., ethnic groups). We assume that samples are drawn from the probability space $\mathcal{X} \times \mathcal{A} \times \mathcal{Y}$ with an unknown distribution $\mathcal{D}$, and use $h(x)$ to denote the model inference result given an input $x$. Based on these, we review the empirical fairness gap (EFG) [38], which is widely used to measure the fairness of ML models with respect to specific groups. To formalize EFG, we first describe the definition of conditional risk as follows:

$R_a(h) = \mathbb{E}_{(x,a',y)\sim\mathcal{D}}\big[\mathbb{1}\{h(x)\neq y\} \mid a'=a\big].$    (1)

Given samples drawn from the distribution $\mathcal{D}$, $R_a(h)$ is the expected fraction of misclassified entries that belong to group $a$, where $\mathbb{1}\{\cdot\}$ represents the indicator function of a predicate. Given an independent sample set $S$, the empirical conditional risk is defined as follows:

$\hat{R}_a(h) = \frac{1}{N_a}\sum_{(x,a',y)\in S,\ a'=a} \mathbb{1}\{h(x)\neq y\},$    (2)

where $N_a$ indicates the number of samples in $S$ from group $a$. Then, we describe the fairness gap (FG), which is used to measure the maximum margin between any two groups; specifically,

$\mathrm{FG}(h) = \max_{a_1,a_2\in\mathcal{A}} \big|R_{a_1}(h) - R_{a_2}(h)\big|.$    (3)

Likewise, the empirical fairness gap (EFG) is defined as

$\mathrm{EFG}(h) = \max_{a_1,a_2\in\mathcal{A}} \big|\hat{R}_{a_1}(h) - \hat{R}_{a_2}(h)\big|.$    (4)

Lastly, we say an ML model $h$ is $(\epsilon,\delta)$-fair on $\mathcal{D}$ if its fairness gap is smaller than $\epsilon$ with confidence $1-\delta$. Formally, an $(\epsilon,\delta)$-fair $h$ satisfies the following condition:

$\Pr\big[\mathrm{FG}(h) \le \epsilon\big] \ge 1-\delta.$    (5)
In practice, we usually replace FG in Eqn. (5) with EFG to facilitate the measurement of fairness. Note that once the client has obtained enough predictions from the target model, it can locally evaluate the fairness of the model according to Eqn. (5).
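To make the measurement concrete, the following is a minimal plaintext sketch (in C++, the language of our implementation) of how a client could compute the empirical conditional risks and the EFG from a set of predictions. The struct and function names are illustrative and not part of VerifyML's codebase.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

// One test record: protected group id, ground-truth label, model prediction.
struct Sample { int group; int label; int prediction; };

// Empirical conditional risk per group (Eqn. (2)): the fraction of
// misclassified samples among those belonging to each group.
std::map<int, double> empiricalConditionalRisk(const std::vector<Sample>& S) {
    std::map<int, int> total, wrong;
    for (const Sample& s : S) {
        total[s.group]++;
        if (s.prediction != s.label) wrong[s.group]++;
    }
    std::map<int, double> risk;
    for (const auto& kv : total) risk[kv.first] = double(wrong[kv.first]) / kv.second;
    return risk;
}

// Empirical fairness gap (Eqn. (4)): the maximum margin between the
// empirical conditional risks of any two groups.
double empiricalFairnessGap(const std::vector<Sample>& S) {
    auto risk = empiricalConditionalRisk(S);
    double lo = 1.0, hi = 0.0;
    for (const auto& kv : risk) { lo = std::min(lo, kv.second); hi = std::max(hi, kv.second); }
    return hi - lo;
}

int main() {
    // Two groups: group 0 is misclassified half the time, group 1 never.
    std::vector<Sample> S = {{0, 1, 1}, {0, 1, 0}, {1, 0, 0}, {1, 1, 1}};
    std::printf("EFG = %.2f\n", empiricalFairnessGap(S)); // prints 0.50
}
```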
2.4 Fully Homomorphic Encryption
Let the plaintext space be $\mathbb{Z}_p$. Informally, a fully homomorphic encryption (FHE) scheme in the public-key setting usually contains the following algorithms:

- KeyGen(1^λ) → (pk, sk). Taking the security parameter λ as input, KeyGen is a randomized algorithm that outputs the public key pk and the corresponding secret key sk required for homomorphic encryption.

- Enc(pk, m) → ct. Given pk and a plaintext m, the algorithm outputs a ciphertext ct encrypting m.

- Dec(sk, ct) → m. Taking sk and a ciphertext ct as input, the algorithm decrypts ct and outputs the corresponding plaintext m.

- Eval(pk, ct_1, ct_2, f) → ct'. Given pk, two ciphertexts ct_1 and ct_2 encrypting m_1 and m_2, and a function f, the algorithm outputs a ciphertext ct' encrypting f(m_1, m_2).
We require FHE to satisfy correctness, semantic security, and functional privacy. (Functional privacy ensures that a ciphertext obtained by homomorphically evaluating a function f on encrypted shares is indistinguishable from a fresh ciphertext encrypting a share of the same value.) In VerifyML, we use the SEAL library [37] to implement fully homomorphic encryption. In addition, we utilize ciphertext packing technology (CPT) [39] to encrypt multiple plaintexts into a single ciphertext, thus enabling homomorphic computation in a SIMD manner. Specifically, given two plaintext vectors, we can pack each of them into a single ciphertext containing multiple plaintext slots. Homomorphic operations between the two ciphertexts, including addition and multiplication, are equivalent to performing the same element-wise operations on the corresponding plaintext slots.
FHE also provides a rotation algorithm to handle operations between data located in different plaintext slots. Informally, given a single ciphertext encrypting a plaintext vector, rotation transforms it into another ciphertext whose encrypted plaintext vector is a cyclic shift of the original. In this way, data in different plaintext slots can be moved to the same position to enable element-wise operations under the ciphertext. In FHE, rotation operations are computationally expensive compared to homomorphic addition and multiplication. Therefore, the optimization criterion for homomorphic SIMD operations is to minimize the number of rotations.
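As a plaintext illustration of these SIMD semantics (an emulation for clarity, not actual SEAL code; the modulus and names are ours), the following sketch mimics how packed addition, multiplication, and rotation act on slots:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Emulated plaintext semantics of ciphertext packing: a "ciphertext"
// holds a vector of slots, and homomorphic add/mult/rotate act on it
// exactly like the element-wise and cyclic-shift operations below.
using Packed = std::vector<uint64_t>;
const uint64_t p = 4293918721ULL; // an illustrative modulus

Packed simdAdd(const Packed& a, const Packed& b) {
    Packed c(a.size());
    for (size_t i = 0; i < a.size(); ++i) c[i] = (a[i] + b[i]) % p;
    return c;
}
Packed simdMul(const Packed& a, const Packed& b) {
    Packed c(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        c[i] = (unsigned __int128)a[i] * b[i] % p;
    return c;
}
// Rotation cyclically shifts the slots so that data in different slots
// can be aligned for subsequent element-wise operations.
Packed rotate(const Packed& a, size_t k) {
    Packed c(a.size());
    for (size_t i = 0; i < a.size(); ++i) c[i] = a[(i + k) % a.size()];
    return c;
}

int main() {
    Packed x = {1, 2, 3, 4}, y = {10, 20, 30, 40};
    assert(simdAdd(x, y)[2] == 33);
    assert(simdMul(x, y)[3] == 160);
    assert(rotate(x, 1)[0] == 2);
}
```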
2.5 Parallel Matrix Homomorphic Multiplication
We review the parallel homomorphic multiplication method between arbitrary matrices proposed by Jiang et al. [17], which will be used to accelerate the generation of authenticated triples for convolution in VerifyML. We take the homomorphic multiplication of two $d \times d$ matrices as an example. Specifically, given a $d \times d$ matrix $A$, we first define four useful permutations, $\sigma$, $\tau$, $\phi$, and $\psi$, over $\mathbb{Z}_p^{d\times d}$. Let $\sigma(A)_{i,j} = A_{i,\,i+j}$, $\tau(A)_{i,j} = A_{i+j,\,j}$, $\phi(A)_{i,j} = A_{i,\,j+1}$, and $\psi(A)_{i,j} = A_{i+1,\,j}$, where all indices are taken modulo $d$. Then for two square matrices $A$ and $B$ of order $d$, we can calculate the matrix product of the two by the following formula:
$A \cdot B = \sum_{k=0}^{d-1} \big(\phi^{k} \circ \sigma(A)\big) \odot \big(\psi^{k} \circ \tau(B)\big),$    (6)
where $\odot$ denotes element-wise multiplication. We provide a toy example of the multiplication of two matrices in Figure 1 for ease of understanding.
We can convert a $d \times d$ matrix $A$ into a vector of length $d^2$ by the row-major encoding map $\iota: \mathbb{Z}_p^{d\times d} \to \mathbb{Z}_p^{d^2}$. A ciphertext is said to encrypt a matrix $A$ if it encrypts the corresponding plaintext vector $\iota(A)$. Therefore, given two square matrices $A$ and $B$, the multiplication of the two under the ciphertext is calculated as follows:
$ct_{A\cdot B} = \sum_{k=0}^{d-1} \mathsf{Mult}\big(\phi^{k}(\sigma(ct_A)),\ \psi^{k}(\tau(ct_B))\big),$    (7)

where the permutations $\sigma$, $\tau$, $\phi$, and $\psi$ are applied to the packed ciphertexts via homomorphic rotations, and $\mathsf{Mult}$ denotes element-wise homomorphic multiplication.

[Figure 1: A toy example of multiplying two $d \times d$ matrices with the permutation-based method of Eqn. (6).]
In the following sections, matrix multiplication between two ciphertexts is performed as above, and we use $\odot$ to denote element-wise homomorphic multiplication between two ciphertexts. In Section 4.1.2, we describe how to utilize the parallel homomorphic multiplication described above to boost the generation of authenticated convolution triples.
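Because the identity in Eqn. (6) underpins the convolution-triple generation of Section 4.1.2, the following plaintext sketch (our rendering of the identity from [17], with illustrative names) verifies that the sum of element-wise products of permuted matrices equals the ordinary matrix product:

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

using Mat = std::vector<std::vector<long long>>;

// The four permutations (indices mod d): sigma(A)[i][j] = A[i][i+j],
// tau(B)[i][j] = B[i+j][j]; phi cyclically shifts columns by one and
// psi cyclically shifts rows by one.
Mat sigma(const Mat& A) {
    int d = A.size(); Mat R(d, std::vector<long long>(d));
    for (int i = 0; i < d; ++i) for (int j = 0; j < d; ++j) R[i][j] = A[i][(i + j) % d];
    return R;
}
Mat tau(const Mat& B) {
    int d = B.size(); Mat R(d, std::vector<long long>(d));
    for (int i = 0; i < d; ++i) for (int j = 0; j < d; ++j) R[i][j] = B[(i + j) % d][j];
    return R;
}
Mat phi(const Mat& A) {
    int d = A.size(); Mat R(d, std::vector<long long>(d));
    for (int i = 0; i < d; ++i) for (int j = 0; j < d; ++j) R[i][j] = A[i][(j + 1) % d];
    return R;
}
Mat psi(const Mat& B) {
    int d = B.size(); Mat R(d, std::vector<long long>(d));
    for (int i = 0; i < d; ++i) for (int j = 0; j < d; ++j) R[i][j] = B[(i + 1) % d][j];
    return R;
}

int main() {
    const int d = 4;
    Mat A(d, std::vector<long long>(d)), B(d, std::vector<long long>(d));
    for (int i = 0; i < d; ++i)
        for (int j = 0; j < d; ++j) { A[i][j] = i * d + j + 1; B[i][j] = (i * 7 + j * 3) % 11; }

    // Eqn. (6): accumulate (phi^k o sigma(A)) ⊙ (psi^k o tau(B)) over k.
    Mat SA = sigma(A), TB = tau(B), C(d, std::vector<long long>(d, 0));
    for (int k = 0; k < d; ++k) {
        for (int i = 0; i < d; ++i)
            for (int j = 0; j < d; ++j) C[i][j] += SA[i][j] * TB[i][j];
        SA = phi(SA); TB = psi(TB);
    }

    // Check against the schoolbook matrix product.
    for (int i = 0; i < d; ++i)
        for (int j = 0; j < d; ++j) {
            long long s = 0;
            for (int t = 0; t < d; ++t) s += A[i][t] * B[t][j];
            assert(s == C[i][j]);
        }
    std::puts("Eqn. (6) verified");
}
```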
2.6 Secret Sharing
- Additive Secret Sharing. Given any $x \in \mathbb{Z}_p$, a 2-out-of-2 additive secret sharing of $x$ is a pair $(\langle x\rangle_0, \langle x\rangle_1)$, where $\langle x\rangle_0$ is a random value uniformly selected from $\mathbb{Z}_p$ and $\langle x\rangle_1 = x - \langle x\rangle_0$. Additive secret sharing is perfectly hiding; that is, given only one share $\langle x\rangle_0$ or $\langle x\rangle_1$, $x$ is perfectly hidden.
- Authenticated Shares. Given a random value $\alpha$ (known as the MAC key) uniformly chosen from $\mathbb{Z}_p$, for any $x \in \mathbb{Z}_p$, the authenticated shares of $x$ under $\alpha$ denote that each party $P_i$ ($i \in \{0,1\}$) holds a pair $(\langle x\rangle_i, \langle \alpha x\rangle_i)$ (we sometimes omit the MAC component for brevity), where $\langle x\rangle_0 + \langle x\rangle_1 = x$ and $\langle \alpha x\rangle_0 + \langle \alpha x\rangle_1 = \alpha x$. While in the general malicious 2PC setting $\alpha$ must be generated randomly through interaction between all parties, in our model holder-malicious model $\alpha$ can simply be picked by the client and kept secret from the model holder. Authenticated sharing provides statistical security: informally, if a malicious model holder tries to forge the shared $x$ into $x + e$ for a non-zero $e$ by tampering with its shares, the probability that the parties still hold a valid authenticated sharing of $x + e$ is at most $1/p$.
2.7 Authenticated Beaver’s Triples
In VerifyML, we require the technique of authenticated Beaver's triples to detect possible breaches of the protocol by the malicious model holder. In more detail, an authenticated Beaver's multiplication triple denotes that each party $P_i$ holds a tuple $(\langle a\rangle_i, \langle b\rangle_i, \langle c\rangle_i)$ together with the corresponding MAC shares $(\langle \alpha a\rangle_i, \langle \alpha b\rangle_i, \langle \alpha c\rangle_i)$, where $a, b$ are uniform in $\mathbb{Z}_p$ and $c = ab$. Given that the parties hold authenticated shares of $x$ and $y$, i.e., $(\langle x\rangle_i, \langle \alpha x\rangle_i)$ and $(\langle y\rangle_i, \langle \alpha y\rangle_i)$, to compute the authenticated share of the product $z = xy$, the parties first reveal $\delta = x - a$ and $\epsilon = y - b$, and then each party locally computes its authenticated share of $z$ as follows:

$\langle z\rangle_i = \langle c\rangle_i + \delta\langle b\rangle_i + \epsilon\langle a\rangle_i + i\cdot\delta\epsilon,$
$\langle \alpha z\rangle_i = \langle \alpha c\rangle_i + \delta\langle \alpha b\rangle_i + \epsilon\langle \alpha a\rangle_i + \langle \alpha\rangle_i\cdot\delta\epsilon.$    (8)
Authenticated Beaver's multiplication triples are independent of the user's input in the actual execution of the secure computation protocol, and thus can be generated offline (see Section 4) to speed up online secure multiplication. Inspired by existing work that constructs custom triples for specific mathematical operations [30] to improve performance, we generalize traditional Beaver's triples to the matrix-vector multiplication and convolution domains. We provide the definitions of matrix-vector and convolution triples below and defer the description of generating them to Section 4.
- Authenticated matrix-vector triples: each party $P_i$ holds a tuple $(\langle A\rangle_i, \langle \mathbf{b}\rangle_i, \langle \mathbf{c}\rangle_i)$ together with the corresponding MAC shares, where $A$ is a matrix uniformly chosen from $\mathbb{Z}_p^{m\times n}$, $\mathbf{b}$ is a vector uniformly selected from $\mathbb{Z}_p^{n}$, and $\mathbf{c} = A\mathbf{b}$, where $m$ and $n$ are determined by the ML model architecture.
- Authenticated convolution triples (aka matrix multiplication triples; a convolution can be reduced to a matrix multiplication by transforming its inputs appropriately, as described in Section 4): each party $P_i$ holds a tuple $(\langle a\rangle_i, \langle b\rangle_i, \langle c\rangle_i)$ together with the corresponding MAC shares, where $a$ and $b$ are tensors uniformly chosen from the input and kernel spaces, respectively, and $c$ is the convolution $a * b$, whose dimensions are determined by the model architecture.
2.8 Oblivious Transfer
We use OT to denote the 1-out-of-2 Oblivious Transfer [13, 10]. In OT, the inputs of the sender (the client, in our setting) are two strings $(s_0, s_1)$, and the input of the receiver (the model holder) is a selection bit $b$. At the end of the OT execution, the receiver learns $s_b$ while the sender learns nothing. In this paper, we require OT instances that are secure against a semi-honest sender and a malicious receiver. We exploit [21] to implement OT with low amortized communication.
2.9 Garbled Circuits
The garbling scheme [35, 8] for Boolean circuits representing arbitrary functions consists of a pair of algorithms (Garble, Eval) defined as follows:

- Garble(1^λ, C) → (GC, {lab_in}, {lab_out}). Given the security parameter λ and an arbitrary Boolean circuit C, the algorithm outputs a garbled circuit GC, a set of input labels {lab_in}, and a set of output labels {lab_out}. For any input x, we refer to the labels of {lab_in} selected by the bits of x as the garbled input of x, and the corresponding labels of {lab_out} as the garbled output of C(x).

- Eval(GC, garbled input of x) → garbled output. Given the garbled circuit GC and the garbled input of x, the algorithm outputs the garbled output of C(x).

The above garbling scheme (Garble, Eval) is required to satisfy the following properties:

- Correctness. Eval, faithfully performed on GC with the garbled input of x, correctly outputs the garbled result of C(x). Formally, for any Boolean circuit C and input x, Eval(GC, garbled input of x) equals the garbled output of C(x).

- Security. Given 1^λ, the garbled circuit of C and the garbled input of any x can be simulated by a probabilistic polynomial-time simulator Sim. Formally, for any circuit C and input x, we have (GC, garbled input of x) ≈_c Sim(1^λ, C), where ≈_c indicates computational indistinguishability.

- Authenticity. Given only the garbled input of x and GC, it is infeasible to produce the garbled output of any value other than C(x).
Without loss of generality, the garbled scheme described above can be naturally extended to securely implement Boolean circuits with multi-bit outputs. In VerifyML, we utilize state-of-the-art optimization strategies, including point-and-permute [12], free-XOR [23] and half-gates [42] to construct the garbling scheme.
3 Technical Intuition
VerifyML is essentially a 2PC protocol in the model holder-malicious threat model, where the client learns the inference results on a given test set without bias, thereby faithfully evaluating the fairness of the target model locally. To boost the performance of the 2PC protocol execution, we customize a series of optimization methods by fully exploiting the advantages of the cryptographic primitives and their natural ties in the inference process. Below we present a high-level technical overview of VerifyML's design.
3.1 Offline-Online Paradigm
Consistent with state-of-the-art work in the semi-honest setting [29], VerifyML is deconstructed into an offline stage and an online stage, where the preprocessing of the offline stage is independent of the inputs of the model holder and the client. In this way, the majority (over 95%) of the computation can be performed offline to minimize the overhead of the online process. Figure 2 provides an overview of VerifyML, where we describe the computation required in the offline and online phases, respectively.
Offline Phase. In this phase the client and the model holder pre-compute data in preparation for the subsequent online execution; it is independent of the inputs of all parties. That is, VerifyML can run this phase without knowing the client's input or the model holder's model.

- Preprocessing for the linear layers. The client interacts with the model holder to generate authenticated triples for matrix-vector multiplication and convolution.

- Preprocessing for the nonlinear layers. The client constructs a garbled circuit for the circuit C representing ReLU. The client sends the garbled circuit and a set of ciphertexts to the model holder for generating the authenticated shares of ReLU's results.
Online Phase. This phase is divided into the following parts.

- Preamble. The client secretly shares its input with the model holder, and similarly, the model holder shares the model parameters with the client. Thus both the model holder and the client hold authenticated shares of the input and the model. Note that the sharing of the model can be done offline if the model to be verified is known in advance.

- Layer evaluation. Let v_i be the result of evaluating the first i layers of the model on the client's input. At the beginning of the i-th layer, both the client and the model holder hold authenticated shares of v_{i-1} and of the i-th layer parameters. 1. Linear layer: the client interacts with the model holder to compute authenticated shares of the linear layer output, where both parties securely compute matrix-vector multiplication and convolution with the aid of the triples generated in the preprocessing phase. 2. Nonlinear layer: after the linear layer, the two parties hold authenticated shares of its output. The client and the model holder invoke OT to deliver the garbled input to the model holder. The model holder evaluates the garbled circuit, and eventually the two parties obtain authenticated shares of the ReLU result.

- Consistency check. The client interacts with the model holder to check any malicious behavior of the model holder during the entire inference process, using the properties of authenticated sharing. If the check passes, the client locally computes the fairness of the target model; otherwise, the client aborts.

Figure 2: Overview of VerifyML.
3.2 Linear Layer Optimization
As described in Figure 2, we move almost all linear-layer operations into the offline phase, where we construct customized triples for matrix-vector multiplication and convolution to accelerate linear execution. Specifically, (1) we design an efficient construction of matrix-vector multiplication triples instead of generating Beaver's multiplication triples for individual multiplications (see Section 4.1.1). Our core insight is a new packed homomorphic multiplication method for matrices and vectors: we explore the inherent connection between secret sharing and homomorphic encryption to remove all rotation operations from the parallel homomorphic computation. (2) We extend the idea of generating matrix multiplication triples in the semi-honest setting [30] to the convolution domain under the model holder-malicious threat model (see Section 4). The core of our construction is derived from E2DM [17], which proposes a state-of-the-art method for parallel homomorphic multiplication between arbitrary matrices. We further optimize E2DM to achieve a substantial computational speedup compared to its naive use.
Our optimization techniques for linear layer computation exhibit superior advantages compared to state-of-the-art methods [22, 20]. (Several efficient parallel homomorphic computation methods [19, 43] with packed ciphertexts have been proposed for secure inference in semi-honest or client-malicious models [29, 26, 5]. It may be possible to transfer these techniques to our method to speed up triple generation, but this is certainly non-trivial and we leave it for future work.) In more detail, we reduce the communication overhead from cubic to quadratic (for both the offline and online phases) compared to Overdrive [22], the mainstream tool for generating authenticated multiplication triples in the malicious adversary model (see Section 4 for a detailed analysis).
3.3 Non-linear Layer Optimization
We use the garbled circuit to achieve secure computation of nonlinear functions (mainly ReLU) in ML models. Specifically, assume that the model holder and the client hold the authenticated sharing of x after executing a linear layer. Then x is used as the input of ReLU (denoted ReLU(x) for brevity) in the following nonlinear layer, so that both parties learn the authenticated sharing of ReLU(x). However, constructing such a garbling scheme raises the following intractable problems.
- How to validate input from the malicious model holder. Since the model holder is malicious, it must be ensured that the input the model holder feeds to the GC is consistent with the share obtained from the previous linear layer. In the traditional malicious adversary model [22, 20, 6], a standard approach is to verify inside the GC the correctness of the authenticated sharing of all inputs from malicious entities. However, this is very expensive: it takes tens of seconds or even minutes to process a single ReLU function. This obviously does not meet the practicality requirements of ML inference, because a modern ML model usually contains thousands of ReLU functions.

- How to minimize the number of multiplications encapsulated in the GC. For each nonlinear layer, we need to compute the authenticated shares of the ReLU output. This requires at least two multiplications over the field if all computations are encapsulated in the GC. Note that performing arithmetic multiplication in the GC is expensive and incurs significant communication overhead.
We design novel protocols to remedy the above problems through the following insights. (1) Garbled circuits already achieve malicious security against the garbled circuit evaluator (i.e., the model holder in our setting) [26]. This means that we only need to construct a lightweight method to check the consistency between the malicious adversary's input to the nonlinear layer and the result obtained from the previous linear layer. This method can then be integrated with the GC to achieve end-to-end nonlinear secure computation (see Section 4). (2) It is enough to calculate, inside the GC, the output label for each bit of the result's share, rather than obtaining the exact arithmetic share of the result [5]. Moreover, we can parse the ReLU function as ReLU(x) = x · sign(x), where sign(x) equals 1 if x > 0 and 0 otherwise. Hence, we only encapsulate the nonlinear part of ReLU (i.e., sign(x)) into the GC, thereby substantially minimizing the number of multiplication operations.
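As a plaintext illustration of insight (2), the sketch below decomposes ReLU under the standard field encoding of signed values (our assumption about the encoding, not VerifyML's exact circuit); only signOf would need to live inside the GC, while the single product is computed outside via an authenticated triple:

```cpp
#include <cassert>
#include <cstdint>

// Standard field encoding (assumed): values in [0, p/2) are non-negative,
// values in (p/2, p) represent negatives, i.e., x encodes x - p.
const uint64_t p = 2147483647ULL; // illustrative prime

// sign(x) = 1 if x > 0 and 0 otherwise; this is the only piece that
// must be evaluated inside the garbled circuit.
uint64_t signOf(uint64_t x) { return (x != 0 && x < p / 2) ? 1 : 0; }

// ReLU(x) = x * sign(x): the product is a single field multiplication,
// which is done outside the GC using an authenticated Beaver triple.
uint64_t relu(uint64_t x) { return signOf(x) * x % p; }

int main() {
    assert(relu(5) == 5);     // positive input passes through
    assert(relu(p - 5) == 0); // encoded -5 is clipped to zero
    assert(relu(0) == 0);
}
```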
Compared with works [22, 20, 6] in the malicious adversary setting, VerifyML substantially reduces the communication overhead of each ReLU function. Our experiments show that VerifyML achieves considerable computation speedup and markedly less communication overhead for nonlinear layer computation.
Remark 3.1. Beyond the above optimization strategies, we also adopt a series of implementation-level strategies to reduce overhead, including removing the reliance on distributed decryption primitives used in previous works [22, 20, 6] and minimizing the number of calls to zero-knowledge proofs over ciphertexts. In the following sections, we provide a comprehensive technical description of the proposed method.
4 The VerifyML Framework
4.1 Offline Phase
In this section, we describe the technical details of VerifyML. As described above, VerifyML is divided into offline and online phases. We first describe the operations that need to be precomputed in the offline phase, including generating triples for matrix-vector multiplication and convolution, and constructing garbled circuits for the objective nonlinear functions. Then, we introduce the technical details of the online phase.
4.1.1 Generating matrix-vector multiplication triple
Figure 3 depicts the interaction between the model holder M and the client C to generate triples for matrix-vector multiplication. Succinctly, M first uniformly selects its shares of the random matrix and vector and sends their encryptions to C, along with zero-knowledge proofs about these ciphertexts, where the vector share is transformed into a matrix before encryption (step 2 in Figure 3). C recovers the encrypted matrix and vector under the ciphertext and then computes the masked products (step 3 in Figure 3). It returns the corresponding ciphertexts to M, which decrypts them and derives its authenticated shares (step 4 in Figure 3).
Input: M holds A_M uniformly chosen from Z_p^{m×n} and b_M uniformly chosen from Z_p^n; C likewise holds A_C and b_C, so that A = A_M + A_C and b = b_M + b_C. In addition, C holds a MAC key α uniformly chosen from Z_p.

Output: the parties obtain authenticated shares of c, where c = A · b.

Procedure:
1. M and C participate in a secure two-party computation such that M obtains an FHE public-secret key pair (pk, sk) while C obtains the public key pk. This process is performed only once.
2. M first converts b_M into an m × n matrix B_M in which each row is a copy of b_M. Then, M sends the encryptions Enc(A_M) and Enc(B_M) to C, along with zero-knowledge (ZK) proofs of plaintext knowledge of the two ciphertexts. (A ZK proof of knowledge for ciphertexts states that they are valid ciphertexts generated from the given FHE cryptosystem; readers can refer to [22, 6] for more details.)
3. C also converts b_C into an m × n matrix B_C in which each row is a copy of b_C, and recovers Enc(A) and Enc(B) under the ciphertext. Then it samples random mask matrices from Z_p^{m×n} and computes the masked ciphertexts of A ⊙ B, α(A ⊙ B), αA, and αB. C sends these ciphertexts to M.
4. M decrypts the received ciphertexts to obtain the masked matrices. Then, both M (on the decrypted matrices) and C (on its masks) sum the elements of each row of the matrices derived from A ⊙ B and α(A ⊙ B) to form their respective shares of the vectors c and αc. (For the matrices derived from αB, only the first row is taken by default.)
5. M outputs its authenticated shares, where c = A · b.

Figure 3: Protocol for generating authenticated matrix-vector multiplication triples.

[Figure 4: An illustrative example of generating a matrix-vector multiplication triple, steps (a)-(d).]
Figure 4 provides an example of the multiplication of a matrix and a vector to facilitate understanding. To compute the additive sharing of A·b (step (a) in Figure 4), b is first transformed into a matrix B by copying, where each row of B contains a copy of b. The element-wise multiplication of A and B is then performed under the ciphertext (step (b) in Figure 4). To construct the additive sharing, C uniformly chooses a random matrix R and computes Enc(A ⊙ B) − R (step (c) in Figure 4). C sends the resulting ciphertext to M, which decrypts it and sums each row in plaintext to obtain its share of the vector A·b (step (d) in Figure 4); similarly, C performs the same row-summing operation on R to obtain its own share. A plaintext sketch of this trick follows Remark 4.1 below.
Remark 4.1: Compared to generating multiplication triples for single multiplications [22, 20], our matrix-vector triples make the communication overhead independent of the number of multiplications and related only to the size of the input. This reduces the amount of data that needs to be exchanged between M and C. In addition, we move the majority of the computation to the semi-honest party, which avoids the need for distributed decryption and frequent zero-knowledge proofs common in malicious adversary settings. Compared to existing parallel homomorphic computation methods [17, 14], our matrix-vector multiplication does not involve any rotation operation, which is very computationally expensive compared to other homomorphic operations. This stems from our observation of the inner tie between HE and secret sharing: since the final ciphertext result needs to be secretly shared between M and C, we can first perform the secret sharing under the ciphertext (steps (c) and (d) in Figure 4), and then perform all rotation-and-summation operations in plaintext.
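The following plaintext sketch mirrors steps (a)-(d) of Figure 4 (illustrative names; steps (b)-(c) run under packed HE in the real protocol) and checks that the two row-sums indeed form additive shares of A·b:

```cpp
#include <cassert>
#include <random>
#include <vector>

using Mat = std::vector<std::vector<long long>>;
using Vec = std::vector<long long>;

int main() {
    std::mt19937 g(7);
    const int m = 3, n = 4;
    Mat A(m, Vec(n));
    Vec b(n);
    for (auto& row : A) for (auto& v : row) v = g() % 100;
    for (auto& v : b) v = g() % 100;

    // Step (a): copy b into every row of an m x n matrix B.
    Mat B(m, b);

    // Steps (b)-(c): element-wise product P = A ⊙ B, then masking with a
    // random matrix R (both done under packed HE in the real protocol:
    // the client keeps R, the model holder receives and decrypts P - R).
    Mat P(m, Vec(n)), R(m, Vec(n)), Q(m, Vec(n));
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            P[i][j] = A[i][j] * B[i][j];
            R[i][j] = g() % 100;
            Q[i][j] = P[i][j] - R[i][j];
        }

    // Step (d): each party row-sums its own matrix in plaintext; the two
    // resulting vectors are additive shares of c = A*b, and no homomorphic
    // rotation was needed anywhere.
    for (int i = 0; i < m; ++i) {
        long long shareM = 0, shareC = 0, truth = 0;
        for (int j = 0; j < n; ++j) {
            shareM += Q[i][j]; shareC += R[i][j]; truth += A[i][j] * b[j];
        }
        assert(shareM + shareC == truth);
    }
}
```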
Security. Our protocol for generating matrix-vector multiplication triples is secure against the malicious model holder M and the semi-honest client C. We provide the following theorem and prove it in Appendix B.
Theorem 4.1.
Let the fully homomorphic encryption used in the protocol have the properties defined in Section 2.4. Then the protocol for generating matrix-vector multiplication triples is secure against the malicious model holder and the semi-honest client.
4.1.2 Generating convolution triple
We describe the technical details of generating authenticated triples for convolution. Briefly, for a given convolution operation, we first convert it to equivalent matrix multiplications, and then generate triples for the matrix multiplications. We start by reviewing the definition of convolution and how to translate it into the equivalent matrix multiplication. Then, we explain how to generate authenticated triples.
① Convolution. Assume an input tensor of size u_w × u_h with c channels, denoted X[i, j, k], where (i, j) are spatial coordinates and k is the channel index. Let f kernels of size k_w × k_h be denoted by the tensor K[Δi, Δj, k, l], where (Δi, Δj) are shifts of the spatial coordinates, and k and l index the channels and the kernels, respectively. The convolution between X and K (i.e., Y = X ∗ K) is defined as below:

$Y[i', j', l] = \sum_{\Delta i, \Delta j, k} X[s\,i' + \Delta i - t,\ s\,j' + \Delta j - t,\ k] \cdot K[\Delta i, \Delta j, k, l].$    (9)

The resulting tensor Y has spatial coordinates (i', j') and f channels. We have u'_w = (u_w − k_w + 2t)/s + 1 and u'_h = (u_h − k_h + 2t)/s + 1, where t represents the number of turns to zero-pad the input and s represents the stride of the kernel movement [25]. Note that the entries of X are taken to be zero if the spatial indices fall outside the ranges [0, u_w) and [0, u_h), respectively.
② Conversion between convolution and matrix multiplication. Based on Eqn. (9), we can easily convert a convolution into an equivalent matrix multiplication. Specifically, we construct a matrix X̄ of dimension (u'_w · u'_h) × (k_w · k_h · c), where each row contains the entries of one receptive field of X. Similarly, we construct a matrix K̄ of dimension (k_w · k_h · c) × f whose columns are the flattened kernels. Then, the original convolution is transformed into Ȳ = X̄ · K̄, where Ȳ has dimension (u'_w · u'_h) × f. In the appendix, we provide a detailed example of this transformation, and a minimal sketch follows below.
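The following minimal im2col-style sketch illustrates the conversion for a single channel with stride 1 and no padding (a simplification of the general case above; all names are illustrative):

```cpp
#include <cassert>
#include <vector>

using Mat = std::vector<std::vector<int>>;

// Flatten a convolution (single channel, stride 1, no padding) into a
// matrix product: each row of Xbar is one receptive field of X, and
// Kbar is the flattened kernel, so the product equals the convolution.
int main() {
    const int uw = 4, uh = 4, kw = 2, kh = 2;
    const int ow = uw - kw + 1, oh = uh - kh + 1; // output spatial size
    Mat X(uh, std::vector<int>(uw));
    for (int i = 0; i < uh; ++i)
        for (int j = 0; j < uw; ++j) X[i][j] = i * uw + j;
    std::vector<int> Kbar = {1, 2, 3, 4}; // flattened 2x2 kernel

    // Build Xbar: one row per output position, one column per kernel entry.
    Mat Xbar(oh * ow, std::vector<int>(kh * kw));
    for (int i = 0; i < oh; ++i)
        for (int j = 0; j < ow; ++j)
            for (int di = 0; di < kh; ++di)
                for (int dj = 0; dj < kw; ++dj)
                    Xbar[i * ow + j][di * kw + dj] = X[i + di][j + dj];

    // The matrix-vector product reproduces the direct convolution.
    for (int i = 0; i < oh; ++i)
        for (int j = 0; j < ow; ++j) {
            int viaMatrix = 0, direct = 0;
            for (int t = 0; t < kh * kw; ++t)
                viaMatrix += Xbar[i * ow + j][t] * Kbar[t];
            for (int di = 0; di < kh; ++di)
                for (int dj = 0; dj < kw; ++dj)
                    direct += X[i + di][j + dj] * Kbar[di * kw + dj];
            assert(viaMatrix == direct);
        }
}
```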
Input: M holds tensor shares a_M and b_M uniformly chosen from the input and kernel spaces; C likewise holds a_C and b_C. In addition, C holds a MAC key α uniformly chosen from Z_p.

Output: the parties obtain authenticated shares of c, where c is the convolution a ∗ b.

Procedure:
1. M and C participate in a secure two-party computation such that M obtains an FHE public-secret key pair (pk, sk) while C obtains the public key pk. This process is performed only once.
2. M first converts a_M and b_M into the equivalent matrices Ā_M and B̄_M as described above. Then, M sends the encryptions Enc(Ā_M) and Enc(B̄_M) to C, along with zero-knowledge (ZK) proofs of plaintext knowledge of the two ciphertexts.
3. C also converts a_C and b_C into the equivalent matrices Ā_C and B̄_C and recovers Enc(Ā) and Enc(B̄) under the ciphertext. Then it samples random masks and computes the masked ciphertexts of Ā·B̄, α(Ā·B̄), αĀ, and αB̄, which it sends to M.
4. M decrypts the received ciphertexts to obtain its masked matrices. Then both parties convert these matrices back into tensors.
5. M outputs its authenticated shares, where c = a ∗ b.

Figure 5: Protocol for generating authenticated convolution triples.
③ Generating convolution triples. Figure 5 depicts the interaction between the model holder M and the client C to generate convolution triples. Succinctly, M first uniformly selects its tensor shares and sends their encryptions to C, along with zero-knowledge proofs about these ciphertexts (step 2 in Figure 5). C recovers the equivalent matrices under the ciphertext and then computes the masked products (step 3 in Figure 5). It returns the corresponding ciphertexts to M, which decrypts them and derives its authenticated shares (step 4 in Figure 5). Finally, M outputs its authenticated shares, where c = a ∗ b.
Remark 4.2: We utilize the method in [17] to perform the homomorphic multiplications involved in generating convolution triples in parallel. Given the multiplication of two d × d matrices, it reduces the computational complexity from O(d²) to O(d) compared with the existing method [14]. Besides, [17] requires only one ciphertext to represent a single matrix whereas existing work [14] requires d ciphertexts (assuming the number of plaintext slots in the FHE scheme is greater than d²). In addition, compared to generating multiplication triples for single multiplications [22, 20], the communication overhead of our method is independent of the number of multiplications and related only to the size of the input, i.e., it reduces the communication cost from cubic to quadratic (in both the offline and online phases).
Remark 4.3: We further exploit the semi-honesty of the client to improve the performance of generating convolution triples. Specifically, for the multiplication of two matrices, the permutations σ and τ can be applied in plaintext beforehand, which halves the number of rotations compared to the original method (see Section 3.2 in [17] for comparison). Moreover, we move the majority of the computation to the semi-honest party, which avoids the need for distributed decryption and frequent zero-knowledge proofs in malicious adversary settings.
Security. Our protocol for generating authenticated convolution triples is secure against the malicious model holder M and the semi-honest client C. We provide the following theorem.
Theorem 4.2.
Let the fully homomorphic encryption used in the protocol have the properties defined in Section 2.4. Then the protocol for generating convolution triples is secure against the malicious model holder and the semi-honest client.
Proof.
The proof logic of this theorem is very similar to that of Theorem 4.1; we omit it for brevity. ∎
4.1.3 Preprocessing for the nonlinear layer
This process is performed by the client to generate garbled circuits of nonlinear functions for the model holder. Note that we do not generate the GC for the whole ReLU but only for the nonlinear part of ReLU, i.e., sign(x) for an arbitrary input x. We first define a truncation function that outputs the last ℓ bits of its input. Then, the client generates the garbled circuit and a set of random ciphertexts and sends them to the model holder as follows.
- Given the security parameter λ and the Boolean circuit C representing the nonlinear part of ReLU, C computes (GC, {lab_in}, {lab_out}) ← Garble(1^λ, C), where GC is the garbled circuit of C and {lab_in}, {lab_out} represent all possible garbled input and output labels, respectively. C sends GC to the model holder M.

- C uniformly selects fresh random values from Z_p for every output bit; these will serve as C's shares of the output bits and of their MACs.

- C parses every output label into its components, for every output bit and each of the two possible bit values.

- For every output bit and each of its two possible labels, C sends to M the corresponding ciphertext, i.e., the selected random values one-time-padded with the output labels.
Security. We defer the explanation of the above ciphertexts sent by C to M to the following sections. Here we briefly describe the security of the preprocessing for nonlinear layers. It is easy to see that the above preprocessing is secure against the semi-honest client C and the malicious model holder M. Specifically, for the client C, since the entire preprocessing does not require the participation of the model holder, C cannot obtain any private information about the model holder. Similarly, for the malicious model holder M, since the preprocessing is non-interactive and the generated ciphertexts satisfy the security defined in Section 2.9, M cannot obtain the plaintexts corresponding to the ciphertexts sent by the client.
4.2 Online Phase
In this section, we describe the online phase of VerifyML. We first explain how VerifyML utilizes the triples generated in the offline phase to generate authenticated shares for matrix-vector multiplication and convolution. Then, we describe the technical details of the nonlinear operation.
4.2.1 Perform linear layers in the online phase
Preamble: Consider a neural network (NN) consisting of alternating linear and nonlinear layers, where the linear layers are specified by weight matrices W_i and the nonlinear layers are ReLU.

Input: M holds the weights W_i for the linear layers. C holds the input x of the NN and a random MAC key α from Z_p to be used throughout the protocol execution.

Output: the parties obtain authenticated shares of the output of every linear layer.

Procedure:

Input Sharing:
1. To share M's input W_i, the parties pick a fresh authenticated element R of the same dimension as W_i.
2. R is opened to M, which then sends W_i − R to C.
3. Each party locally computes its authenticated share of W_i from its share of R and the public value W_i − R.
4. To share C's input x, C randomly selects two masks r and r' of the same dimension as x. Then, it sends (x − r, αx − r') to M, which sets these values as its shares; C's shares are (r, r').
5. For each linear layer:
- Matrix-vector multiplication: to compute the authenticated sharing of the product between a matrix and a vector generated in the inference process, M and C take a fresh authenticated matrix-vector triple of dimensions consistent with the operands. Both parties open the differences between the operands and the triple components, and each party locally computes its authenticated share of the product based on Eqn. (8).
- Convolution: to compute the authenticated sharing of the convolution between tensors generated in the inference process, M and C take a fresh authenticated convolution triple of consistent dimensions. Both parties open the corresponding differences, and each party locally computes its authenticated share based on Eqn. (8).
6. The parties obtain authenticated shares of each linear layer's output.

Figure 6: Protocol for evaluating linear layers in the online phase.
Figure 6 depicts the interaction of the model holder and the client to perform linear layer operations in the online phase. Specifically, given the model holder's weights and the client's input, both parties first generate authenticated shares of their respective inputs (steps 1-4 in Figure 6). Since the client is considered semi-honest, its input is shared more efficiently than the model holder's, i.e., only local computations on randomly selected masks are required, while the sharing process for the model holder's input is consistent with previous malicious settings [22, 20, 6]. After that, the model holder and the client use the triples generated in the offline phase (i.e., matrix-vector multiplication triples and convolution triples) to generate authenticated shares of the linear layer results (step 5 in Figure 6).
Security. Our protocol for performing linear layer operations in the online phase is secure against the malicious model holder M and the semi-honest client C. We provide the following theorem.
Theorem 4.3.
Assuming the triples used in the protocol are generated as described in Sections 4.1.1 and 4.1.2, the protocol for the online linear layers is secure against the malicious model holder and the semi-honest client.
4.2.2 Perform non-linear layers in the online phase
Input: for each nonlinear layer, M and C hold authenticated shares of the preceding linear layer's output x. In addition, C holds the MAC key α.

Output: M and C obtain authenticated shares of ReLU(x) and of the recomputed x.

Procedure (taking a single x as an example):
1. Garbled Circuit Phase: C and M invoke OT (see Section 2.8), where C's inputs are the pairs of input labels of GC and M's input is the bits of its share of x. Hence, M learns the garbled input of its share. C also sends M the garbled input of its own share. With these, M evaluates GC.
2. Authentication Phase 1: using the output labels it obtained, M decrypts the corresponding ciphertexts sent by C in the offline phase and recovers its shares of each output bit and of the associated MACs.
3. Local Computation Phase: each party locally weights its bit shares by powers of two and sums them, obtaining its shares of the recomputed x, of sign(x), and of their MACs.
4. Authentication Phase 2: for the product ReLU(x) = x · sign(x), the parties take a fresh authenticated triple, reveal the differences between the operands and the triple components, and locally compute their authenticated shares based on Eqn. (8). M and C thus obtain authenticated shares of ReLU(x).

Figure 7: Protocol for evaluating nonlinear layers (ReLU) in the online phase.
In this section, we present the technical details of the execution of nonlinear functions in the online phase. We mainly focus on how to securely compute the activation function ReLU, the most representative nonlinear function in deep neural networks. As shown in Figure 6, the result obtained from each linear layer is held by both parties in the form of authenticated shares. Similarly, for the ReLU function in each nonlinear layer, the goal of VerifyML is to securely compute it and share the result with the model holder and the client in the authenticated sharing manner. We describe the details in Figure 7.
Garbled Circuit Phase. As described in Section 4.1.3, in the offline phase, C constructs a GC for the nonlinear part of ReLU (i.e., sign(x) for an arbitrary input x) and sends it to M. In the online phase, C and M invoke OT, where C, as the sender, inputs the pairs of input labels while M (as the receiver) inputs the bits of its share of x. As a result, M gets the garbled input of its share of x. Then, M evaluates GC with the garbled inputs and learns the set of output labels for the bits of the outputs.
Authentication Phase 1. This phase aims to calculate the authenticated share of each output bit, based on the previous phase. We sketch how to compute the share of one output bit. It is clear that the share of the bit is determined by whether the bit is 0 or 1. Recall that the output of the GC is one of two output labels for each output bit (one for the bit being 0 and one for it being 1). To calculate the shares of the bit, C randomly selects a pad in the offline phase, encrypts it under the label for the bit being 0, and encrypts its counterpart under the label for the bit being 1. C sends the two ciphertexts to M and sets its own share from the pad. Since M has obtained exactly one of the two labels in the previous phase, it can decrypt exactly one ciphertext and thereby obtain its own share of the bit. The computation of the MAC shares follows a similar logic, utilizing the random values sent by C to M in the offline phase.
Local Computation Phase. This process is used to calculate the shares of x, sign(x), and their MACs from the per-bit results learned in the previous phase. For example, to compute the share of x, each party locally multiplies its share of each bit by the corresponding power of two and sums all the resulting values. Each party computes the shares of sign(x) and of the MACs in a similar manner. A toy sketch of this recomposition follows below.
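A toy sketch of the recomposition (illustrative names, MACs omitted): each party weights its bit shares by powers of two and sums, and the two sums are additive shares of the value itself:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

const uint64_t p = 2147483647ULL; // illustrative prime modulus

int main() {
    std::mt19937_64 g(3);
    uint64_t y = 0b101101; // the value whose bits were shared via the GC
    uint64_t share0 = 0, share1 = 0;
    for (int k = 0; k < 32; ++k) {
        // Suppose the GC phase left the parties with additive shares of bit y_k.
        uint64_t bit = (y >> k) & 1;
        uint64_t b0 = g() % p, b1 = (bit + p - b0) % p;
        uint64_t pow2 = (1ULL << k) % p;
        // Local computation: weight each bit share by 2^k and accumulate.
        share0 = (share0 + (unsigned __int128)b0 * pow2 % p) % p;
        share1 = (share1 + (unsigned __int128)b1 * pow2 % p) % p;
    }
    // The two accumulated sums are additive shares of y itself.
    assert((share0 + share1) % p == y);
}
```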
Authentication Phase 2. We compute the shares of ReLU(x) = x · sign(x) and its MAC. Since each party holds the authenticated shares of x and sign(x), we can achieve this based on Eqn. (8).
Remark 4.4. We adopt two methods to minimize the number of multiplication operations involved in the GC. One is to compute the garbled output per bit of the result in the GC. The other is to encapsulate only the nonlinear part of ReLU into the GC. In this way, we avoid computing the product x · sign(x) and its MAC inside the GC, which would be multiplication-intensive. Compared with works [22, 20, 6] in the malicious adversary setting, VerifyML substantially reduces the communication overhead of each ReLU function.
Remark 4.5. We devise a lightweight method to check whether the model holder's input to the nonlinear layer is consistent with what it learned from the previous layer. Specifically, at the end of evaluating a linear layer, both parties learn shares of its output x. Then, x is used as the input of the following nonlinear layer. To check that the GC is fed the correct input, we require x to be recomputed inside the GC and shared again to both parties. Therefore, after evaluating each nonlinear layer, both parties hold two independent sharings of x. This provides a way to determine whether M provided the correct input, by verifying that the two independent sharings are consistent (see Section 4.3 for more details).
Correctness. We analyze the correctness of our protocol as follows. Based on the correctness of OT, the model holder obtains the correct garbled input of its share of x. Using the correctness of the garbling scheme (Garble, Eval) for the circuit C, the model holder learns exactly one output label for each output bit, and hence, by Authentication Phase 1, the parties hold valid shares of every output bit and of its MAC. By the Local Computation Phase, summing the bit shares weighted by powers of two yields shares of the recomputed x, of sign(x), and of their MACs. Since each party holds the authenticated shares of x and sign(x), the parties can compute the shares of ReLU(x) = x · sign(x) and of its MAC via Eqn. (8). This concludes the correctness proof.
Security. Our protocol for performing nonlinear layer operations in the online phase is secure against the malicious model holder M and the semi-honest client C. We provide the corresponding theorem and prove it in Appendix D.
4.3 Consistency Check
VerifyML performs the linear and nonlinear protocols alternately in the online phase to produce the inference result for a given input, where all intermediate results output by the nonlinear and linear layers are held by M and C in an authenticated sharing manner. To verify correctness, the client needs to perform a consistency check on all computed results. If the verification passes, C locally evaluates the fairness of the ML model based on Eqn. (2); otherwise, it aborts. In more detail, for sharing M's input and executing each linear layer, VerifyML picks up a large number of fresh authenticated elements or triples (see Figure 6) and opens them for computation. Let the set of all opened values be {u_j}; since C holds the MAC key α and the parties hold shares of the MACs, we must verify that each opened u_j is consistent with its MAC, i.e., that the MAC shares reconstruct to α·u_j. Besides, for executing each nonlinear layer, the inputs of the GC are shares of the previous linear layer's output x. To check that the GC is fed the correct input, we require x to be recomputed in the GC and shared again to both parties, and we must also verify that the two independent sharings of x are consistent.
Input: M holds its shares of all opened values and their MACs, as well as its shares of the doubly-shared nonlinear-layer inputs, accumulated over the whole execution. C holds the MAC key α and its corresponding shares.

Output: C obtains the inference results if verification passes; otherwise, abort.

Procedure:
1. For all values to be checked, C uniformly samples random coefficients {r_j} and sends them to M.
2. M computes the random linear combination σ_M of its MAC shares (and of the differences between the two sharings of each nonlinear-layer input), weighted by {r_j}, and sends σ_M to C.
3. C computes the combined check value σ from σ_M, its own shares, and α. C aborts if σ ≠ 0. Otherwise, C locally evaluates the fairness of the ML model based on Eqn. (2) by reconstructing the inference results.

Figure 9: Protocol for the consistency check.
Figure 9 presents the details of the consistency check, where we combine all the above checks into a single check by using random scalars picked by C. The correctness of the check can be easily deduced by inspecting the protocol. Specifically, by the correctness of the linear-layer protocol, the MAC relation holds for every opened value in every linear layer. By the correctness of the nonlinear-layer protocol, the two sharings of each nonlinear-layer input reconstruct to the same value. Hence, the combined random linear combination evaluates to zero. A sketch of the batched check follows below.
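A sketch of the batched check (illustrative names; the MAC key is held entirely by the client, per our threat model): the client samples random coefficients and verifies a single combined MAC relation instead of one relation per value:

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

const uint64_t p = 2147483647ULL; // illustrative prime
uint64_t mul(uint64_t a, uint64_t b) { return (unsigned __int128)a * b % p; }

int main() {
    std::mt19937_64 g(11);
    uint64_t alpha = g() % p; // MAC key, held by the client

    // Opened values u_j, with MAC shares split between the model holder
    // (m0) and the client (m1).
    std::vector<uint64_t> u(8), m0(8), m1(8);
    for (size_t j = 0; j < u.size(); ++j) {
        u[j] = g() % p;
        uint64_t mac = mul(alpha, u[j]);
        m0[j] = g() % p; m1[j] = (mac + p - m0[j]) % p;
    }

    // Client samples random r_j; the model holder returns sum r_j * m0_j;
    // the client checks the combined relation sum r_j * alpha * u_j.
    uint64_t fromHolder = 0, local = 0, expected = 0;
    std::vector<uint64_t> r(u.size());
    for (size_t j = 0; j < u.size(); ++j) {
        r[j] = g() % p;
        fromHolder = (fromHolder + mul(r[j], m0[j])) % p;
        local = (local + mul(r[j], m1[j])) % p;
        expected = (expected + mul(r[j], mul(alpha, u[j]))) % p;
    }
    assert((fromHolder + local) % p == expected); // passes iff no tampering

    // A holder who shifts any MAC share is caught except with probability
    // about 1/p over the random coefficients.
    m0[0] = (m0[0] + 1) % p;
    uint64_t forged = 0;
    for (size_t j = 0; j < u.size(); ++j) forged = (forged + mul(r[j], m0[j])) % p;
    assert((forged + local) % p != expected);
}
```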
Security. We demonstrate that the consistency check protocol aborts with overwhelming probability if M tampered with its input during execution. We provide the following theorem and prove it in Appendix E.
Theorem 4.5.
In the real execution, if M tampers with its input, then the consistency check aborts with all but negligible probability in the statistical security parameter.
5 Performance Evaluation
In this section, we conduct experiments to demonstrate the performance of VerifyML. Since there is no secure inference protocol specifically designed for the model holder-malicious threat model, we choose the state-of-the-art generic MPC framework Overdrive [22] as the baseline. (Although [6] shows better performance than Overdrive, it is difficult to compare against because its code is unavailable. However, we clearly outperform [6] by constructing a more efficient method to generate triples. In addition, [6] requires fitting nonlinear functions such as ReLU with quadratic polynomials to facilitate computation, which is also contrary to the motivation of this paper.) Note that we also consider the client as a semi-honest entity when implementing Overdrive, so that Overdrive can likewise exploit the semi-honest client to avoid redundant verification and zero-knowledge proofs. In this way, we can "purely" discuss the technical advantages of VerifyML over Overdrive, excluding the inherent advantages VerifyML gains from the weaker threat model. Specifically, we analyze the performance of VerifyML in the offline and online phases respectively, where we discuss the superiority of VerifyML over Overdrive in terms of computation and communication cost for the linear and nonlinear layers. Finally, we demonstrate the cost superiority of VerifyML over Overdrive on mainstream models including ResNet-18 and LeNet.
5.1 Implementation details
VerifyML is implemented in C++ and provides 128 bits of computational security and 40 bits of statistical security. The entire system operates over a 44-bit prime field. We utilize the SEAL homomorphic encryption library [37] to perform the linear layers, including generating matrix-vector multiplication and convolution triples, where we set the maximum number of slots allowed in a single ciphertext to 4096. The garbled circuits for the nonlinear layers are constructed with the EMP toolkit [40] (with the OT protocol that resists active adversaries). Zero-knowledge proofs of plaintext knowledge are implemented based on MUSE [26]. Our experiments are carried out in both LAN and WAN settings. The LAN setting is implemented with two workstations in our lab. The client workstation has AMD EPYC 7282 1.4GHz CPUs with 32 threads on 16 cores and 32GB RAM. The server workstation has Intel(R) Xeon(R) E5-2697 v3 2.6GHz CPUs with 28 threads on 14 cores and 64GB RAM. The WAN setting is based on a connection between a local PC and an Amazon AWS server with an average bandwidth of 963 Mbps and a round-trip latency of around 14 ms.
5.2 Performance of offline phase
5.2.1 Cost of generating matrix-vector multiplication triple
TABLE I: Cost of generating matrix-vector multiplication triples (reduction and speedup of VerifyML over Overdrive in parentheses).

| Dimension | Overdrive comm. (MB) | VerifyML comm. (MB, reduction) | Overdrive LAN (s) | Overdrive WAN (s) | VerifyML LAN (s, speedup) | VerifyML WAN (s, speedup) |
| --- | --- | --- | --- | --- | --- | --- |
| – | 27.1 | 2.1 (12.9×) | 2.3 | 17.7 | 0.9 (2.6×) | 12.4 (1.5×) |
| – | 216.4 | 17.6 (12.3×) | 15.3 | 26.2 | 7.6 (2.0×) | 14.1 (1.6×) |
| – | 432.8 | 34.5 (12.5×) | 30.6 | 43.4 | 15.1 (2.0×) | 26.9 (1.6×) |
| – | 865.6 | 68.3 (12.7×) | 60.9 | 72.4 | 29.2 (2.1×) | 40.7 (1.7×) |
| – | 1326.2 | 135.7 (9.8×) | 103.0 | 114.8 | 57.8 (1.8×) | 68.2 (1.6×) |
| – | 2247.4 | 271.9 (8.3×) | 187.1 | 199.1 | 117.3 (1.6×) | 128.4 (1.5×) |
TABLE I compares the overhead of VerifyML and Overdrive for generating matrix-vector multiplication triples at different dimensions. It is clear that VerifyML outperforms Overdrive in both communication and computation. We observe that VerifyML achieves an 8.3-12.9× reduction in communication overhead and at least a 1.5× speedup in computation compared to Overdrive. This stems from Overdrive's disadvantage in constructing triples, i.e., constructing triples for only a single multiplication operation (or the multiplication between a single row of a matrix and a vector). In addition, the generation process requires frequent interaction between the client and the model holder (for zero-knowledge proofs and for preventing breaches by either party). This inevitably incurs substantial computational and communication overhead. Our matrix-vector triples make the communication overhead independent of the number of multiplications and related only to the size of the input, which substantially reduces the amount of data that needs to be exchanged between M and C. In addition, we move the majority of the computation to C, which avoids the need for distributed decryption and frequent zero-knowledge proofs in malicious adversary settings. Moreover, our matrix-vector multiplication does not involve any rotation operation. Together, these optimizations give VerifyML a satisfactory performance profile for generating triples.
5.2.2 Cost of generating convolution triple
TABLE II: Cost of generating convolution triples (speedup of VerifyML over Overdrive in parentheses).

| Input | Kernel | Overdrive comm. (GB) | VerifyML comm. (GB) | Overdrive LAN (s) | Overdrive WAN (s) | VerifyML LAN (s, speedup) | VerifyML WAN (s, speedup) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| – | – | 17.1 | 2.1 | 1476.1 | 1494.6 | 924.7 (1.6×) | 938.4 (1.6×) |
| – | – | 67.8 | 8.2 | 6059.3 | 6059.31 | 3568.8 (1.7×) | 3580.8 (1.7×) |
| – | – | 467.5 | 56.8 | 40753.4 | 40767.1 | 25387.2 (1.6×) | 25401.5 (1.6×) |
| – | – | 83127.8 | 7324.3 | 7245056.2 | 7245068.8 | 4521023.3 (1.6×) | 4521165.6 (1.6×) |
TABLE II compares the performance of VerifyML and Overdrive in generating convolution triples of different dimensions, where the input tensor has spatial size u_w × u_h with c channels and the corresponding kernels have size k_w × k_h. We observe that VerifyML costs much less than Overdrive in both computation and communication. For instance, VerifyML gains a reduction of up to about 11× in communication cost and a speedup of at least 1.6× in computation. This is due to the optimization VerifyML customizes for generating convolution triples. Compared to Overdrive, which constructs authenticated triples for single multiplication operations, VerifyML uses the homomorphic parallel matrix multiplication of [17] as the underlying structure to construct matrix multiplication triples equivalent to convolution triples. Since a single matrix is treated as one computational unit, this makes the communication overhead between the client and the model holder depend only on the size of the matrices, independent of the number of multiplications between them (that is, the communication complexity is reduced from cubic to quadratic). In addition, the optimized parallel matrix multiplication halves the number of homomorphic rotations. This enables VerifyML to show significant superiority in computing convolution triples.
5.3 Performance of online phase
In the online phase, VerifyML alternately executes operations at the linear and nonlinear layers. Below we compare the overhead of VerifyML with that of Overdrive for each type of layer.
5.3.1 Performance of executing linear layers
Input | Kernel | Overdrive Comm. (MB) | VerifyML Comm. (MB, reduction)
| | 46.1 | 0.5 (85.3)
| | 184.5 | 1.4 (128.0)
| | 1271.7 | 15.7 (81.2)
| | 226073.0 | 1459.8 (154.9)
Since VerifyML and Overdrive follow the same computational logic for executing the linear layers in the online phase, i.e., both use pre-generated authenticated triples to compute matrix-vector multiplications and convolutions, they exhibit similar computational overhead. We therefore focus on the difference in communication overhead when executing convolutions. TABLE III depicts the communication overhead of VerifyML and Overdrive for computing convolutions of different dimensions, and VerifyML is clearly superior. This is mainly because Overdrive must open a fresh authenticated Beaver's multiplication triple for each multiplication operation, which makes the communication overhead of executing the entire linear layer proportional to the total number of multiplication operations involved. In contrast, VerifyML's customized matrix-vector multiplication and convolution triples make this cost independent of the number of multiplication operations in the linear layer, substantially reducing the amount of data exchanged during execution.
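For intuition, the sketch below replays this computational logic in plain Python: with a preprocessed matrix-vector triple (A, b, c = A·b), the parties open the masked values X − A and y − b once and then derive their output shares locally. The field size and variable names are our illustrative assumptions, not VerifyML's implementation.

```python
import numpy as np

P = 65537
rng = np.random.default_rng(1)

def share(x):
    r = rng.integers(0, P, size=np.shape(x), dtype=np.int64)
    return r, (x - r) % P

n, m = 4, 3
# Preprocessed triple from the offline phase.
A = rng.integers(0, P, size=(n, m), dtype=np.int64)
b = rng.integers(0, P, size=m, dtype=np.int64)
A0, A1 = share(A); b0, b1 = share(b); c0, c1 = share(A @ b % P)

# The actual (shared) linear-layer inputs.
X = rng.integers(0, P, size=(n, m), dtype=np.int64)
y = rng.integers(0, P, size=m, dtype=np.int64)
X0, X1 = share(X); y0, y1 = share(y)

# Online: the parties exchange and reconstruct the masked differences once.
E = (X0 - A0 + X1 - A1) % P  # opened X - A
f = (y0 - b0 + y1 - b1) % P  # opened y - b

# Each party computes its output share locally; only party 0 adds E @ f.
z0 = (E @ b0 + A0 @ f + c0 + E @ f) % P
z1 = (E @ b1 + A1 @ f + c1) % P
assert np.array_equal((z0 + z1) % P, X @ y % P)
```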
5.3.2 Performance of executing nonlinear layers
Figure 6 compares the cost of Overdrive and VerifyML for evaluating the nonlinear layers. We observe that VerifyML outperforms Overdrive in runtime in both the LAN and WAN settings, completing the same batch of ReLUs in a fraction of Overdrive's time, and it requires markedly less traffic to perform a single ReLU. This is mainly because our optimized protocol substantially reduces the number of multiplication operations that must be evaluated inside the garbled circuit. Moreover, Overdrive needs to verify the correctness of the input from the model holder during this evaluation, which is very expensive, whereas VerifyML achieves the same guarantee with lightweight consistency-verification methods.
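The saving referred to here follows the classic decomposition of ReLU into a single non-linear sign test plus one arithmetic multiplication. The sketch below shows the decomposition itself in plain Python; it is not the 2PC protocol, only the function split that lets the Boolean part alone live inside the garbled circuit.

```python
def relu_via_sign_bit(x: int) -> int:
    b = 1 if x > 0 else 0  # the only genuinely non-linear step (Boolean/GC part)
    return b * x           # one arithmetic multiplication, done outside the circuit

assert [relu_via_sign_bit(v) for v in (-3, 0, 5)] == [0, 0, 5]
```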
5.4 Performance of end-to-end secure inference
LeNet | Overdrive Comm. (MB) | VerifyML Comm. (MB) | Overdrive LAN time (s) | Overdrive WAN time (s) | VerifyML LAN time (s, speedup) | VerifyML WAN time (s, speedup)
Offline | 3427.8 | 209.6 | 235.5 | 246.8 | 92.9 (2.5) | 104.6 (2.4)
Online | 2543.1 | 54.0 | 32.8 | 254.9 | 1.0 (32.6) | 21.9 (11.6)
Total | 5970.9 | 263.6 | 268.3 | 501.7 | 93.9 (2.9) | 126.5 (4.0)
ResNet-18 | Overdrive Comm. (MB) | VerifyML Comm. (MB) | Overdrive LAN time (s) | Overdrive WAN time (s) | VerifyML LAN time (s, speedup) | VerifyML WAN time (s, speedup)
Offline | 2116018.6 | 257257.7 | 238774.2 | 238957.4 | 114003.1 (2.1) | 114978.8 (2.1)
Online | 19359.5 | 459.4 | 177.0 | 1373.7 | 5.5 (32.2) | 117.9 (11.7)
Total | 2135378.1 | 257717.1 | 238951.2 | 240331.1 | 114008.6 (2.1) | 115096.7 (2.1)
We compare the performance of VerifyML and Overdrive on real-world ML models. In our experiments, we choose ResNet-18 and LeNet, trained on the CelebA [28] and C-MNIST [2] datasets, respectively; both datasets are widely used to check how fair a given trained model is. TABLE IV shows the performance of VerifyML and Overdrive in terms of computation and communication overhead. Compared to Overdrive, VerifyML demonstrates an encouraging online runtime boost of 32.6× and 32.2× on LeNet and ResNet-18 in the LAN setting (11.6× and 11.7× in the WAN setting), respectively, and at least an order of magnitude reduction in online communication cost. In the online phase, Overdrive takes 32.8s and 177.0s to compute a single query on LeNet and ResNet-18 in the LAN setting (254.9s and 1373.7s in the WAN setting), whereas VerifyML takes just 1.0s and 5.5s (21.9s and 117.9s) in the respective network settings. Consistent with the previous analysis, this stems from the customized optimization mechanisms we designed for VerifyML.
5.5 Comparison with other works
Compared with DELPHI. We demonstrate that, for the execution of the non-linear layers, the communication overhead of VerifyML is even lower than that of DELPHI [29], the state-of-the-art scheme under the semi-honest threat model. Specifically, for each nonlinear layer, DELPHI needs to compute arithmetic shares of the layer's value inside the garbled circuit and distribute them to the two parties. This requires additional AND gates, and hence additional bits of communication, compared with VerifyML, which only computes each bit of the corresponding value inside the circuit. In our experiments, our method gives substantially less communication than DELPHI for generating these shares, i.e., DELPHI requires several times more traffic than VerifyML to perform a single ReLU.
Compared with MUSE and SIMC. Several works such as MUSE [26] and SIMC [5] have been proposed to address secure ML inference under the client-malicious threat model, which assumes that the server (i.e., the model holder) is semi-honest while the malicious client may arbitrarily violate the protocol to obtain private information. Intuitively, these works seem to translate to our application scenario with appropriate modification. However, we argue that this is non-trivial. In more detail, in the client-malicious model, the client's inputs are encrypted and sent to the semi-honest model holder, which performs all linear operations to speed up the computation. Since the model holder holds the model parameters in plaintext, executing the linear layers only involves homomorphic operations between plaintexts and ciphertexts. This type of computation is compatible with mainstream homomorphic optimization methods including GALA [43] and GAZELLE [19]. In VerifyML, however, the linear-layer operations cannot be performed by the model holder, because it is considered malicious. One possible approach is to encrypt the model data and perform the linear-layer operations through two-party interaction, but this amounts to homomorphic operations between ciphertexts, which is not compatible with the previous optimization strategies. Therefore, instead of simply fine-tuning MUSE [26] and SIMC [5], we must design new parallel homomorphic computation methods to fit this new threat model. On the other hand, the techniques for nonlinear operations in MUSE [26] and SIMC [5] can clearly be transferred to VerifyML. Even so, our method still outperforms SIMC (an upgraded version of MUSE), mainly because we encapsulate only the nonlinear part of ReLU into the garbled circuit, further reducing the number of multiplication operations. Experiments show that our method incurs about one-third of SIMC's computation and communication overhead.
6 Conclusion
In this paper, we proposed VerifyML, the first secure inference framework to check the fairness degree of a given ML model. We designed a series of optimization methods to reduce the overhead of the offline stage, and presented an optimized protocol for evaluating the non-linear layers that substantially speeds up their execution. In the future, we will focus on designing more efficient optimization strategies to further reduce the computation overhead of VerifyML, making secure ML inference suitable for a wider range of practical applications.
References
- [1] Adam Lieberman. How data scientists can create a more inclusive financial services landscape. Finastra, 2022.
- [2] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
- [3] Rachel KE Bellamy, Kuntal Dey, Michael Hind, Samuel C Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, et al. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv preprint arXiv:1810.01943, 2018.
- [4] Sumon Biswas and Hridesh Rajan. Do the machine learning models on a crowd sourced platform exhibit bias? an empirical study on model fairness. In Proceedings of ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering (ESEC/FSE), pages 642–653, 2020.
- [5] Nishanth Chandran, Divya Gupta, Sai Lakshmi Bhavana Obbattu, and Akash Shah. SIMC: ML inference secure against malicious clients at semi-honest cost. Cryptology ePrint Archive, 2021.
- [6] Hao Chen, Miran Kim, Ilya Razenshteyn, Dragos Rotaru, Yongsoo Song, and Sameer Wagh. Maliciously secure matrix multiplication with applications to private deep learning. In International Conference on the Theory and Application of Cryptology and Information Security (ASIACRYPT), pages 31–59. Springer, 2020.
- [7] Alexandra Chouldechova, Diana Benavides-Prado, Oleksandr Fialko, and Rhema Vaithianathan. A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. In Conference on Fairness, Accountability and Transparency, pages 134–148. PMLR, 2018.
- [8] Michele Ciampi, Vipul Goyal, and Rafail Ostrovsky. Threshold garbled circuits and ad hoc secure computation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 64–93. Springer, 2021.
- [9] Ivan Damgård, Valerio Pastro, Nigel Smart, and Sarah Zakarias. Multiparty computation from somewhat homomorphic encryption. In Annual Cryptology Conference (CRYPTO), pages 643–662. Springer, 2012.
- [10] Nico Döttling, Sanjam Garg, Mohammad Hajiabadi, Daniel Masny, and Daniel Wichs. Two-round oblivious transfer from CDH or LPN. Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), 12106:768, 2020.
- [11] Uriel Feige, Amos Fiat, and Adi Shamir. Zero-knowledge proofs of identity. Journal of cryptology, 1(2):77–94, 1988.
- [12] Tore Kasper Frederiksen, Thomas Pelle Jakobsen, Jesper Buus Nielsen, Peter Sebastian Nordholt, and Claudio Orlandi. Minilego: Efficient secure two-party computation from general assumptions. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 537–556. Springer, 2013.
- [13] Alex B Grilo, Huijia Lin, Fang Song, and Vinod Vaikuntanathan. Oblivious transfer is in MiniQCrypt. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 531–561. Springer, 2021.
- [14] Shai Halevi and Victor Shoup. Algorithms in HElib. In Annual Cryptology Conference (CRYPTO), pages 554–571. Springer, 2014.
- [15] Carmit Hazay, Emmanuela Orsini, Peter Scholl, and Eduardo Soria-Vazquez. Concretely efficient large-scale mpc with active security (or, tinykeys for tinyot). In International Conference on the Theory and Application of Cryptology and Information Security (ASIACRYPT), pages 86–117. Springer, 2018.
- [16] Ayanna Howard and Jason Borenstein. The ugly truth about ourselves and our robot creations: the problem of bias and social inequity. Science and engineering ethics, 24(5):1521–1536, 2018.
- [17] Xiaoqian Jiang, Miran Kim, Kristin Lauter, and Yongsoo Song. Secure outsourced matrix computation and application to neural networks. In Proceedings of the ACM SIGSAC conference on computer and communications security (CCS), pages 1209–1222, 2018.
- [18] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, 2016.
- [19] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. GAZELLE: A low latency framework for secure neural network inference. In USENIX Security Symposium (USENIX Security 18), pages 1651–1669, 2018.
- [20] Marcel Keller. MP-SPDZ: A versatile framework for multi-party computation. In Proceedings of ACM SIGSAC conference on computer and communications security (CCS), pages 1575–1590, 2020.
- [21] Marcel Keller, Emmanuela Orsini, and Peter Scholl. Actively secure ot extension with optimal overhead. In Annual Cryptology Conference (CRYPTO), pages 724–741. Springer, 2015.
- [22] Marcel Keller, Valerio Pastro, and Dragos Rotaru. Overdrive: Making SPDZ great again. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 158–189. Springer, 2018.
- [23] Vladimir Kolesnikov, Payman Mohassel, and Mike Rosulek. FleXOR: Flexible garbling for XOR gates that beats free-XOR. In Annual Cryptology Conference (CRYPTO), pages 440–457. Springer, 2014.
- [24] Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, and Ed Chi. Fairness without demographics through adversarially reweighted learning. Advances in neural information processing systems (NeurIPS), 33:728–740, 2020.
- [25] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
- [26] Ryan Lehmkuhl, Pratyush Mishra, Akshayaram Srinivasan, and Raluca Ada Popa. Muse: Secure inference resilient to malicious clients. In USENIX Security Symposium (USENIX Security 21), pages 2201–2218, 2021.
- [27] Yehuda Lindell. How to simulate it–a tutorial on the simulation proof technique. Tutorials on the Foundations of Cryptography, pages 277–346, 2017.
- [28] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE international conference on computer vision, pages 3730–3738, 2015.
- [29] Pratyush Mishra, Ryan Lehmkuhl, Akshayaram Srinivasan, Wenting Zheng, and Raluca Ada Popa. Delphi: A cryptographic inference service for neural networks. In USENIX Security Symposium, pages 2505–2522, 2020.
- [30] Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. In IEEE Symposium on Security and Privacy (S&P), pages 19–38. IEEE, 2017.
- [31] Debarghya Mukherjee, Mikhail Yurochkin, Moulinath Banerjee, and Yuekai Sun. Two simple ways to learn individual fairness metrics from data. In International Conference on Machine Learning (ICML), pages 7097–7107. PMLR, 2020.
- [32] Luca Oneto and Silvia Chiappa. Fairness in machine learning. In Recent Trends in Learning From Data, pages 155–196. Springer, 2020.
- [33] Osonde A Osoba and William Welser IV. An intelligence in our image: The risks of bias and errors in artificial intelligence. Rand Corporation, 2017.
- [34] Flavien Prost, Pranjal Awasthi, Nick Blumm, Aditee Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H Chi, Jilin Chen, and Alex Beutel. Measuring model fairness under noisy covariates: A theoretical perspective. In Proceedings of AAAI/ACM Conference on AI, Ethics, and Society (AIES), pages 873–883, 2021.
- [35] Mike Rosulek and Lawrence Roy. Three halves make a whole? Beating the half-gates lower bound for garbled circuits. In Annual International Cryptology Conference (CRYPTO), pages 94–124. Springer, 2021.
- [36] Pedro Saleiro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T Rodolfa, and Rayid Ghani. Aequitas: A bias and fairness audit toolkit. arXiv preprint arXiv:1811.05577, 2018.
- [37] Microsoft SEAL (release 4.0). https://github.com/Microsoft/SEAL, March 2022. Microsoft Research, Redmond, WA.
- [38] Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, and Joseph Keshet. Fairness in the eyes of the data: Certifying machine-learning models. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), pages 926–935, 2021.
- [39] Nigel P Smart and Frederik Vercauteren. Fully homomorphic SIMD operations. Designs, Codes and Cryptography, 71(1):57–81, 2014.
- [40] Xiao Wang, Alex J Malozemoff, and Jonathan Katz. Emp-toolkit: Efficient multiparty computation toolkit. https://github.com/emp-toolkit, 2016.
- [41] Paul, Weiss, Rifkind, Wharton & Garrison LLP. Breaking new ground, CFPB will pursue discrimination as an “unfair” practice across the range of consumer financial services, 2022.
- [42] Samee Zahur, Mike Rosulek, and David Evans. Two halves make a whole. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 220–250. Springer, 2015.
- [43] Qiao Zhang, Chunsheng Xin, and Hongyi Wu. GALA: Greedy computation for linear algebra in privacy-preserved neural networks. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2021.
Appendix A Threat Model
We formalize the threat model of VerifyML in the simulation paradigm [27]. We define two interactions to capture security: a real interaction, in which the client and the model holder execute the protocol in the presence of an adversary and an environment, and an ideal interaction, in which the parties send their respective inputs to a trusted entity that computes the functionality faithfully. Security requires that for any adversary in the real interaction, there exists a simulator in the ideal interaction such that no environment can distinguish the two interactions. Specifically, let $\mathcal{F}$ be the two-party functionality such that the client and the model holder invoke $\mathcal{F}$ on inputs $x$ and $y$ to obtain $\mathcal{F}_C(x, y)$ and $\mathcal{F}_M(x, y)$, respectively. We say a protocol $\Pi$ securely implements $\mathcal{F}$ if it holds the following properties.
- Correctness: If the client and the model holder are both honest, then from an execution of $\Pi$ on inputs $x$ and $y$, the client obtains $\mathcal{F}_C(x, y)$ and the model holder obtains $\mathcal{F}_M(x, y)$.
- Semi-honest Client Security: For a semi-honest adversary that compromises the client, there exists a simulator $\mathcal{S}_C$ such that for any inputs $x$ and $y$, we have
$$\mathsf{View}^{\Pi}_{C}(x, y) \overset{c}{\approx} \mathcal{S}_C\big(x, \mathcal{F}_C(x, y)\big),$$
where $\mathsf{View}^{\Pi}_{C}(x, y)$ represents the view of the client during the execution of $\Pi$, and $x$ and $y$ are the inputs of the client and the model holder, respectively. $\mathcal{S}_C(x, \mathcal{F}_C(x, y))$ represents the view simulated by $\mathcal{S}_C$ when it is given access to $x$ and $\mathcal{F}_C(x, y)$, and $\overset{c}{\approx}$ indicates computational indistinguishability of the two distributions.
- Malicious Model Holder Security: For a malicious adversary that compromises the model holder, there exists a simulator $\mathcal{S}_M$ such that for any input $y$ from the adversary, we have
$$\big(\mathsf{View}^{\Pi}_{M}(x, y),\, \mathsf{out}_C\big) \overset{c}{\approx} \big(\mathcal{S}_M\big(y, \mathcal{F}_M(x, y)\big),\, \widetilde{\mathsf{out}}_C\big),$$
where $\mathsf{View}^{\Pi}_{M}(x, y)$ denotes the model holder's view during the execution of $\Pi$ with the client's input $x$, and $\mathsf{out}_C$ indicates the client's output in the real protocol execution. Similarly, $\widetilde{\mathsf{out}}_C$ and $\mathcal{S}_M(y, \mathcal{F}_M(x, y))$ represent the client's output and the simulated view in the ideal interaction.
Appendix B Proof of Theorem 1
Proof.
Figure 8: The ideal functionality $\mathcal{F}_{\mathsf{triple}}$ for generating an authenticated matrix-vector multiplication triple.
Input: one party holds its additive shares of a matrix $A$ and a vector $b$, each uniformly chosen; the other party holds the remaining shares of $A$ and $b$, together with a MAC key $\alpha$ uniformly chosen from the underlying field.
Output: the parties obtain their shares of $c$ and the corresponding MACs, where $c = A \cdot b$.
Let $\mathcal{F}_{\mathsf{triple}}$, shown in Figure 8, be the functionality for generating matrix-vector multiplication triples. We first prove security against a semi-honest client and then demonstrate security against a malicious model holder.
Semi-honest client security. The simulator samples random values for the client's output of the functionality. The simulator and the semi-honest client run a secure two-party protocol to generate the public and secret keys for homomorphic encryption. When the simulator accesses the ideal functionality, it provides the sampled values as output. In addition, the simulator sends ciphertexts to the client along with a simulated zero-knowledge proof of the well-formedness of these ciphertexts. We now show the indistinguishability between the real and simulated views via the following hybrid argument.
- $\mathsf{Hyb}_0$: This corresponds to the real execution of the protocol.
- $\mathsf{Hyb}_1$: The simulator runs the two-party key-generation protocol with the semi-honest client to generate the public and secret keys for homomorphic encryption. When the simulator accesses the ideal functionality, it samples random values and sends them to the semi-honest client. This hybrid is computationally indistinguishable from $\mathsf{Hyb}_0$.
- $\mathsf{Hyb}_2$: In this hybrid, instead of sending the real encryptions to the client, the simulator sends encryptions of all-zero plaintexts, and also provides a zero-knowledge (ZK) proof of plaintext knowledge for these ciphertexts. For any two plaintexts, the semantic security of the FHE scheme ensures that an adversary cannot distinguish their ciphertexts, and the zero-knowledge property likewise guarantees the indistinguishability of the proofs. Therefore, this hybrid is indistinguishable from the previous one.
Malicious model holder security. The simulator samples random values for the model holder's output of the functionality. The simulator and the adversary run a secure two-party protocol to generate the public and secret keys for homomorphic encryption. When the simulator accesses the ideal functionality, it provides the sampled values as outputs. Once the adversary sends its ciphertexts, the simulator verifies their validity. If the verification passes, the simulator extracts the underlying plaintexts and the randomness used for generating these ciphertexts. Then, the simulator samples fresh random shares, queries the ideal functionality on the extracted inputs to obtain the outputs, and uses these outputs together with the extracted randomness to construct the four simulated ciphertexts, which it sends to the adversary.
- $\mathsf{Hyb}_0$: This corresponds to the real execution of the protocol.
- $\mathsf{Hyb}_1$: The simulator runs the two-party key-generation protocol with the malicious model holder to generate the public and secret keys for homomorphic encryption. When the simulator accesses the ideal functionality, it samples random values and sends them to the malicious model holder. This hybrid is computationally indistinguishable from $\mathsf{Hyb}_0$.
- $\mathsf{Hyb}_2$: In this hybrid, the simulator checks the validity of the received ciphertexts. If the zero-knowledge proofs are valid, the simulator extracts the underlying plaintexts and the randomness used for generating these ciphertexts. The properties of the zero-knowledge proofs ensure that this hybrid is indistinguishable from the previous one.
- $\mathsf{Hyb}_3$: The simulator exploits the function privacy of the FHE scheme to generate the four simulated ciphertexts. This hybrid is computationally indistinguishable from the previous one by the function privacy of the FHE scheme. Note that the view of the model holder in $\mathsf{Hyb}_3$ is identical to the view generated by the simulator.
∎
Appendix C Conversion between convolution and matrix multiplication
Figure 9 provides an example of converting a given convolution into the corresponding matrix multiplication. As shown in Figure 9, given an input tensor with multiple channels and a set of kernels, the convolution between the input and the kernels (with a given amount of zero-padding and stride) is converted into an equivalent matrix multiplication. Specifically, each sliding window of the (padded) input is unrolled into one row of a matrix whose width equals the number of entries in a single kernel; similarly, the kernels are arranged into the columns of a second matrix. The original convolution operation is then exactly the product of these two matrices, with each row of the result corresponding to one output position and each column to one output channel.
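The sketch below implements this standard im2col conversion in Python for the simplified case of stride 1 and no padding (the function names and the simplifications are ours, not the paper's); it checks the unrolled matrix product against a direct convolution.

```python
import numpy as np

def im2col(x, kh, kw):
    """x: (C, H, W) input; returns (out_h*out_w, C*kh*kw) matrix of windows."""
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    rows = [x[:, i:i + kh, j:j + kw].reshape(-1)
            for i in range(out_h) for j in range(out_w)]
    return np.stack(rows), (out_h, out_w)

def conv_as_matmul(x, k):
    """k: (O, C, kh, kw) kernels; returns the (O, out_h, out_w) convolution."""
    O, C, kh, kw = k.shape
    X, (oh, ow) = im2col(x, kh, kw)  # each row = one sliding window
    K = k.reshape(O, -1).T           # each column = one unrolled kernel
    return (X @ K).T.reshape(O, oh, ow)

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
k = np.ones((3, 2, 2, 2))
out = conv_as_matmul(x, k)

# Verify against a direct convolution.
direct = np.zeros_like(out)
for o in range(3):
    for i in range(3):
        for j in range(3):
            direct[o, i, j] = (x[:, i:i + 2, j:j + 2] * k[o]).sum()
assert np.allclose(out, direct)
```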
Appendix D Proof of Theorem 4
Proof.
Semi-honest client security. The security of the protocol against a semi-honest client is evident from inspecting its execution: the client does not obtain output in the OT and does not receive any information from the model holder in the subsequent steps. We therefore focus on the security analysis against a malicious model holder.
Malicious model holder security. We first define the functionality realized by the protocol, denoted here $\mathcal{F}_{\mathsf{nl}}$, as shown in Figure 10. We use the real view to refer to the view of the real interaction between the client and the adversary controlling the model holder, and we demonstrate its indistinguishability from the simulated view, produced by the interaction between the simulator and the adversary, through standard hybrid arguments. In the following we define three hybrid executions $\mathsf{Hyb}_1$, $\mathsf{Hyb}_2$, and $\mathsf{Hyb}_3$, and prove that the protocol is secure against a malicious model holder by proving indistinguishability among these hybrid executions.
Figure 10: The functionality $\mathcal{F}_{\mathsf{nl}}$ realized by the nonlinear-layer protocol.
Input: one party holds its input shares together with a MAC key $\alpha$ uniformly chosen from the underlying field; the other party holds its own input shares.
Output: the model holder obtains the authenticated output shares for each nonlinear-layer input.
$\mathsf{Hyb}_1$: This hybrid execution is identical to the real execution except in the authentication phase. To be precise, in the authentication phase the simulator uses replacement labels (described below) instead of the labels used in the real execution. Note that in this hybrid the simulator can still access the model holder's input and the corresponding circuit outputs. For each output wire, the label corresponding to the actual output bit is kept as in the real execution, while the other label is set to a random value chosen uniformly, with its first (color) bit flipped. We provide the formal description of $\mathsf{Hyb}_1$ as follows; the indistinguishability between the adversary's view in the real execution and in $\mathsf{Hyb}_1$ is directly derived from the authenticity of the garbled circuit.
1. The simulator receives the model holder's choice bits as the receiver's input to the OT.
2. Garbled Circuit Phase:
- The simulator first computes the garbled labels for each input wire, and then sends the labels selected by the model holder's choice bits to the model holder as the output of the OT. In addition, the simulator sends the garbled circuit and its own garbled inputs to the model holder.
3. Authentication Phase 1:
- The simulator sets the output labels as follows.
- For each output wire, the simulator keeps the label that corresponds to the actual output bit.
- For each output wire, the other label is set to a random value chosen uniformly, with its first (color) bit flipped.
- The simulator computes the authentication messages from these labels and sends them to the model holder. This process is the same as in the real execution, except that it uses the replaced labels.
4. Local Computation Phase: The execution of this phase is indistinguishable from the real execution since no information needs to be exchanged between the two parties.
5. Authentication Phase 2:
- The execution is identical to the real execution.
$\mathsf{Hyb}_2$: We make four changes to $\mathsf{Hyb}_1$ to obtain $\mathsf{Hyb}_2$, and argue that $\mathsf{Hyb}_2$ is indistinguishable from $\mathsf{Hyb}_1$ in the adversary's view. First, by the correctness of the garbled circuit, the labels obtained from evaluation equal the labels of the true outputs. Second, the ciphertexts computed from the "other" set of output labels, which were picked uniformly in $\mathsf{Hyb}_1$, can instead be sampled uniformly at random by the simulator. Third, in the real execution the client sends masked values to the model holder, which then derives its authenticated shares from them; to simulate this, the simulator only needs to uniformly select random values that satisfy the same relations. Finally, since the authenticated shares are part of the outputs of the functionality $\mathcal{F}_{\mathsf{nl}}$, the simulator can obtain them directly from $\mathcal{F}_{\mathsf{nl}}$. In summary, with the above changes, the simulator no longer needs the client's input. We provide the formal description of $\mathsf{Hyb}_2$ as follows.
1. The simulator receives the model holder's choice bits as the receiver's input to the OT.
2. Garbled Circuit Phase: Same as $\mathsf{Hyb}_1$.
3. Authentication Phase 1:
- The simulator evaluates the garbled circuit.
- The simulator learns the authenticated output values by querying the ideal functionality $\mathcal{F}_{\mathsf{nl}}$.
- The simulator uniformly selects random values that satisfy the relations these outputs must obey.
- For every output wire, the simulator computes the ciphertexts tied to the actual output labels; the ciphertexts tied to the "other" labels are sampled uniformly at random.
- The simulator sends the resulting messages to the model holder.
4. Local Computation Phase: The execution of this phase is indistinguishable from $\mathsf{Hyb}_1$ since no information needs to be exchanged between the two parties.
5. Authentication Phase 2:
- The execution is identical to $\mathsf{Hyb}_1$.
$\mathsf{Hyb}_3$: In this hybrid we remove the simulator's dependence on the client's input. The indistinguishability between $\mathsf{Hyb}_2$ and $\mathsf{Hyb}_3$ stems from the security of the garbled circuit. We provide the formal description of $\mathsf{Hyb}_3$ below.
1. The simulator receives the model holder's choice bits as the receiver's input to the OT.
2. Garbled Circuit Phase:
- The simulator samples random labels and sends them to the model holder as the output of the OT. The simulator also sends a simulated garbled circuit and simulated garbled inputs to the model holder.
3. Authentication Phase 1: Same as $\mathsf{Hyb}_2$.
4. Local Computation Phase: The execution of this phase is indistinguishable from $\mathsf{Hyb}_2$ since no information needs to be exchanged between the two parties.
5. Authentication Phase 2: Same as $\mathsf{Hyb}_2$, where the simulator uses the sampled labels and the outputs of $\mathcal{F}_{\mathsf{nl}}$ to process this phase for the model holder.
∎
Appendix E Proof of Theorem 5
Proof.
Assume that the model holder tampers with any of the inputs it holds during the execution. The value checked in the verification step can then be expressed as
$$f(\alpha) = \Delta_1 + \alpha \cdot \Delta_2,$$
where $\Delta_1$ and $\Delta_2$ refer to the increments caused by the model holder's violation of the protocol. The above expression is a degree-1 polynomial in the MAC key $\alpha$. It is clear that $f$ is a non-zero polynomial whenever the model holder introduces errors. Further, a non-zero polynomial of degree 1 has at most one root. Hence, over the uniformly random choice of $\alpha$, the probability that $f(\alpha) = 0$ is at most $1/|\mathbb{F}|$, where $\mathbb{F}$ denotes the underlying field. Therefore, the probability that the client aborts is at least $1 - 1/|\mathbb{F}|$ when the model holder cheats. ∎
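As a quick sanity check of this root-counting argument, the toy script below enumerates every possible MAC key over a small field (the field size is illustrative only) and confirms that a non-zero degree-1 check polynomial vanishes for at most one key, so a uniformly random key catches the cheat with probability at least $1 - 1/|\mathbb{F}|$.

```python
P = 101  # toy field size; VerifyML would use a cryptographically large field

for d1, d2 in [(3, 0), (0, 7), (5, 9)]:  # error increments (Delta_1, Delta_2)
    roots = [a for a in range(P) if (d1 + a * d2) % P == 0]
    assert len(roots) <= 1  # degree-1 nonzero polynomial: at most one root
    print(f"Delta_1={d1}, Delta_2={d2}: undetected for {len(roots)} of {P} keys")
```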