Priority-Aware Private Matching Schemes for Proximity-Based Mobile Social Networks

Ben Niu, Tanran Zhang, Xiaoyan Zhu, Hui Li and Zongqing Lu

B. Niu, X. Zhu and H. Li are with the School of Telecommunications Engineering, Xidian University, 710071, China. E-mail: xd.niuben@gmail.com and {xyzhu, lihui}@mail.xidian.edu.cn. T. Zhang is with GSIS, Tohoku University, Sendai, Japan. E-mail: xubu3@163.com. Z. Lu is with the Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802. E-mail: zongqing@cse.psu.edu.

Abstract

The rapid development of mobile devices and online social networks has drawn increasing attention to Mobile Social Networking (MSN). The explosive growth of mobile-connected and location-aware devices makes Proximity-based Mobile Social Networks (PMSNs) both possible and meaningful. Users can easily discover and make new social interactions with physically proximate mobile users through the WiFi/Bluetooth interfaces embedded in their smartphones. However, users enjoy these conveniences at the cost of growing privacy concerns. To address this problem, we propose a suite of priority-aware private matching schemes that privately compute the similarity with potential friends in the vicinity. Unlike most existing work, our proposed priority-aware matching scheme (P-match) achieves the privacy goal by combining a commutative encryption function with the Tanimoto similarity coefficient, which considers both the number of common attributes between users and the corresponding priorities on each common attribute. Further, based on a newly constructed similarity function that takes the ratio of matched attributes over the whole input set into consideration, we design an enhanced version to deal with potential attacks such as inputting an unlimited attribute set on either the initiator side or the responder side. Finally, our proposed E-match avoids the heavy cryptographic operations and improves the system performance significantly through a novel use of the Bloom filter. The security and communication/computation overhead of our schemes are thoroughly analyzed and evaluated via detailed simulations and implementation.

Index Terms:
Priority, Privacy, Private Matching, PMSNs

1 Introduction

Social networking is one of the fastest-growing activities among mobile users domestically and worldwide. eMarketer [1] estimates that by 2016 the number of smartphone users will reach 192.4 million in the US and 2.28 billion worldwide. With smartphones equipped with WiFi/Bluetooth modules, users can easily communicate with others and exchange information, content and media on shared communities such as Facebook or Foursquare. Among these services, an important class is Proximity-based Mobile Social Networks (PMSNs), which rely heavily on mobile users’ physical proximity. Such applications provide more opportunities to discover and make new social interactions in public places such as airports, bars or other social spots. PMSNs are thus gaining increasing attention in social networking.

Normally, to enjoy these activities, people first need to reveal some information, such as their attributes or personal details, to potential friends nearby. A straightforward way is for an initiator to broadcast her attributes directly to nearby users, and for the responders to decide whether to contact her based on common attributes. Obviously, the user’s privacy is revealed during such a process. Since some of these attributes may be sensitive or private, it is harmful to leak them to everyone nearby, especially to potentially malicious users.

To address this problem, many research solutions [2, 3, 4, 5, 6, 7, 8, 9, Ben13G] have been proposed. Most of these solutions employ third party servers, which become bottlenecks from both the security/privacy and the system performance points of view. Although such third party servers can be set offline, mobile users still need to access them to register identities and obtain the matching results, which brings extra 3G/4G communication cost. Some researchers treat this situation as a Private Set Intersection (PSI) problem [10, 11, 12, 13] and try to achieve private matching without third party servers by employing Secure Multi-party Computation (SMC) [8] and the Paillier Cryptosystem [6, 9]. Unfortunately, although the PSI-based solutions effectively avoid the trusted server, they fail to improve the system performance because of their heavy cryptographic operations.

Moreover, existing schemes do not always produce accurate matching results. For example, [8] and [14] measure the similarity between users by simply counting the number of common attributes, and in [15] the matching decision is made by checking whether the proximity measurement of two profiles is larger than, equal to, or smaller than a pre-defined threshold value. However, in reality, user interests may be associated with different priorities. We illustrate our concerns with the example in Fig. 1, where each two-tuple represents a user’s interest and its corresponding priority. In this scenario, Alice’s father is suffering from cancer and she needs to go to the hospital every day after work. In this situation, the person she most wants to meet is someone who is facing the same situation. Therefore, in the scenario shown in Fig. 1a, the potential friend of Alice is Bob instead of Charles, even though there are four common attributes between Alice and Charles and only one with Bob. In addition, schemes that only count common attributes are vulnerable to an adversary who inputs as many attributes as possible, leading to the exposure of users’ personal information. Although the authors in [8] proposed limiting the number of input attributes (e.g., 200) to avoid this attack, it is hard to define a proper number of input attributes for different users. To achieve fine-grained private matching, Zhang et al. [9] consider the priority on each common attribute and define several privacy levels in their work, but they only pay attention to the difference between users’ priorities on each common attribute and ignore the priority value itself. That is to say, their approach works well for most cases except the one shown in Fig. 1b. Alice has one common attribute with each of Bob and David, and it makes sense that she prefers to make friends with Bob since she pays more attention to the attribute “cancer”. However, the approach in [9] cannot differentiate David and Bob from Alice’s point of view, since Alice and Bob have the common attribute “cancer” at priority 10, and Alice and David have the common attribute “music” at priority 3.

Figure 1: Motivation. (a) Scenario 1. (b) Scenario 2.

From the aforementioned analysis, it is clear that existing work relies on third party servers, employs heavy cryptographic tools, does not fully consider the users’ privacy in terms of the priority assigned to each attribute, or produces inaccurate matching results. In this paper, we propose a set of priority-aware private matching schemes for the widely used PMSNs. The main contributions of this paper are as follows.

• We propose a set of schemes to achieve private matching for different privacy goals. P-match achieves priority-aware private matching by considering both the number of common attributes and the corresponding priorities. In the enhanced version, P-match+, we construct a priority-aware Ochiai similarity coefficient that takes the ratio of matched attributes over the whole input set into account, which effectively prevents several attacks, such as inputting an unlimited attribute set. Finally, to make the private matching process more efficient, we propose E-match, which improves the system performance by avoiding the heavy commutative encryption function.

• We provide theoretical and experimental evidence that our proposed matching schemes are secure and achieve our privacy goals. In addition, they are efficient compared to many existing schemes in terms of computation cost, communication cost and energy consumption.

• We implement the proposed schemes on smartphones. The experimental results indicate the effectiveness and efficiency of our work.

The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 presents the preliminaries. Section 4 describes the designs of our proposed schemes. Sections 5 and 6 show the security analysis and the evaluation results. Finally, we conclude the paper in Section 7.

2 Related Work

A number of applications provide private matching between users in PMSNs. Most of these solutions employ third party servers, which are assumed to be trusted and act as matching centers that serve users. Specifically, each user sends her attributes to the server, and the server replies with the matching result to indicate the potential “friends”. The servers need to know the users’ personal information to perform the matching process, so it is dangerous when the servers are compromised. Social serendipity [2] provided mobile users with more opportunities to make social interactions with potential friends nearby. However, it relied heavily on a trusted server, which keeps all users’ profiles and computes the similarity between users when needed. The authors in [3] mitigated this problem by replacing the trusted server with a service provider, such as Facebook. However, users’ profiles are exchanged in plaintext, which leads to serious privacy leakage. SmokeScreen [16] introduced opaque identifiers into the information exchange phase to protect the user’s real identity. It employed a broker, which also acts as a trusted server, to provide matching results to users. As a result, the broker knew who was interested in resolving whose opaque identifier, which means the broker could infer the relationship between users with high probability. In their follow-on work [4], they avoided this problem by not disclosing the personal information for matching and using the location and time information as a replacement. Unfortunately, such servers are still bottlenecks, since they still need to know this information to perform the matching process. To avoid the third party servers, many solutions based on cryptographic tools have been proposed in recent years. Some researchers formulate this situation as a Private Set Intersection (PSI) or Authorized PSI (APSI) problem [10, 11, 12, 13], which effectively avoids the third party servers. Based on PSI, Li et al. [8] proposed a set of privacy-preserving profile matching schemes, where an initiating user can find the best match with minimal information leakage to others based on the security properties of Secure Multi-party Computation (SMC). However, the expensive computation cost of the heavy cryptographic operations decreases the system performance significantly. In addition, their schemes may fail in practice because they ignore the priorities assigned to each attribute. Zhang et al. [9] presented a set of fine-grained private matching schemes to meet the requirements of mobile users in practice. Under the protection of the Paillier Cryptosystem, their schemes achieved fine-grained private matching by considering both the number of common attributes and the assigned priorities. However, they only paid attention to the differences between the priorities.

Agrawal et al. [17] proposed a privacy-preserving protocol that uses the more lightweight commutative encryption function to realize secret information sharing between users. Keyed hash functions are also employed to protect the sensitive attributes of mobile users. Then, Vaidya et al. [18] extended the secret information sharing phase to the N-party setting, and Veneta et al. [19] implemented this idea to detect friends-of-friends in mobile social networks. However, because of the inherent weaknesses of the scheme in [17], such as unlimited input and lying behavior, a solution based solely on the commutative encryption function cannot provide thorough security and privacy properties. In this paper, we provide more properties by combining the commutative encryption function with other techniques, such as similarity functions.

3 Preliminaries

In this section, we first state the problem and give the adversary models. Then, we describe the design goals and the cryptography tools in this paper.

3.1 Problem Statement

In PMSNs, each user holds a profile of two-dimensional vectors, $U=\{\langle u_{1},v_{1}\rangle,\langle u_{2},v_{2}\rangle,\cdots,\langle u_{m},v_{m}\rangle\}$, where $u_{i}$ represents the user’s attribute and $v_{i}$ is the priority value assigned to this particular attribute by the user, such as $v_{i}=1,\ldots,9$. Normally, a bigger value indicates a higher priority. Given a user and her profile $U$, the problem is how to find the best matched friend for the user in a privacy-preserving way, based on the criterion that the best matched friend should have more common attributes with the user, and in particular the same priorities on those attributes.

3.2 Adversary Models

Since cryptographic tools such as Public Key Infrastructure (PKI) can be easily deployed to protect current communication systems, attacks from outside adversaries, such as eavesdropping on the wireless communication channels or modifying, replaying and injecting captured messages, can be easily prevented. Therefore, in this paper, we assume the initiator is honest-but-curious [10] or even an outright attacker. That is, in the honest-but-curious model the initiator honestly follows the protocols but tries to learn more information than allowed; alternatively, the initiator can be an attacker who illegally inputs attributes and the related priorities to learn more information about nearby users. We also assume that the responder is legal or honest-but-curious, for two reasons: first, the identity of a responder can be easily authorized by another authority, e.g., the hospital office in the example mentioned in Sec. 1; second, we can reverse our protocols to achieve a dual matching. We further assume that neither the initiator nor the responder can modify or obtain the intermediate parameters of the running protocols.

3.3 Design Goals

Our main goal is to thwart the aforementioned threats from either the initiator or the responder side. According to the amount of information disclosed during protocol execution, we define two privacy levels from the initiator Alice’s point of view, which can also be equivalently defined from Bob’s viewpoint.

Definition 1 (Privacy Level I).

When the protocol ends, Bob learns the set of common attributes with Alice, as well as priorities on these common attributes, and Alice learns the similarity value.

In this case, the initiator believes that all the nearby users are legal or honest-but-curious, and they can be authorized by another entity. For instance, Alice may be a new patient in the hospital ward. Her aim is to find a best match.

Definition 2 (Privacy Level II).

When the protocol ends, Bob learns the number of common attributes and the similarity value only when it exceeds the pre-defined threshold, while Alice learns the number of common attributes as well as the similarity value.

This case addresses threats from both sides: illegal input on the initiator side, and curious adjustment of the threshold on the responder side.

3.4 Cryptography Tools

Existing private matching solutions typically rely on Private Set Intersection (PSI) [10, 12, 13] or Private Cardinality of Set Intersection (PCSI) [20], which are built on heavy cryptographic operations such as Secure Multi-party Computation (SMC) [21]. To avoid these heavy operations, our protocols use the commutative encryption function [17], which is computationally lighter and satisfies the condition $E_{k_{1}}(E_{k_{2}}(x))=E_{k_{2}}(E_{k_{1}}(x))$. Thus, a user who has the key $k_{1}$ or $k_{2}$ learns that $x_{1}=x_{2}$ iff $E_{k_{1}}(E_{k_{2}}(x_{1}))=E_{k_{2}}(E_{k_{1}}(x_{2}))$, but cannot learn any other $x_{i}$ of the other user if $x_{i}$ is not a common attribute. Since the priority of every attribute is considered here, the encryption function must be easy to decrypt so that the similarity of two users can be computed. We therefore adopt the power function $f_{k}(x)=x^{k} \bmod p$ as our encryption function for a safe prime $p$, i.e., $p$ and $(p-1)/2$ are both prime numbers. For all integers $k_{1}$, $k_{2}$ and $x\in\mathbb{Z}^{*}_{p}$, there exists an integer $n$ such that

\begin{equation}
\begin{aligned}
f_{k_{1}}(f_{k_{2}}(x)) &= f_{k_{1}}(x^{k_{2}} \bmod p) \\
&= f_{k_{1}}(x^{k_{2}}-np) \\
&= (x^{k_{2}}-np)^{k_{1}} \bmod p \\
&= x^{k_{2}k_{1}} \bmod p,
\end{aligned}
\tag{1}
\end{equation}

the last equality follows from the binomial theorem. Similarly, $f_{k_{2}}(f_{k_{1}}(x))=x^{k_{1}k_{2}} \bmod p$. Therefore $f_{k_{1}}(f_{k_{2}}(x))=f_{k_{2}}(f_{k_{1}}(x))$. To obtain the decryption function, we need a number $k^{\prime}$ corresponding to every $k$. Choose $k^{\prime}$ such that $k^{\prime}k \equiv 1 \pmod{\phi(p)}$, where $\phi(p)$ is the Euler phi-function of $p$ and $\phi(p)=p-1$ since $p$ is a prime; such a $k^{\prime}$ exists exactly when $\gcd(k,\phi(p))=1$. We use the Extended Euclidean Algorithm to obtain $k^{\prime}$ and let $g_{k}(y):=y^{k} \bmod p$ for $y\in\mathbb{Z}^{*}_{p}$. Then

\begin{equation}
\begin{aligned}
g_{k^{\prime}}(f_{k}(x)) &= g_{k^{\prime}}(x^{k} \bmod p) \\
&= x^{k^{\prime}k} \bmod p \\
&= x^{n\phi(p)+1} \bmod p \\
&= x.
\end{aligned}
\tag{2}
\end{equation}

The last equality holds by Euler’s theorem. To guarantee that $x$ and $y$ are both in $\mathbb{Z}^{*}_{p}$, we need a cryptographic hash function $h_{p}(\cdot)$ whose range is the set of quadratic residues modulo $p$.
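The properties in (1) and (2) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the toy safe prime 2039 (with (2039-1)/2 = 1019 also prime), the SHA-256-then-square construction of $h_p$, and the function names are assumptions made only for illustration, and the modulus is far too small for real use.

```python
import hashlib

# Toy safe prime (assumption for illustration): P = 2 * 1019 + 1, both prime.
P = 2039
Q = (P - 1) // 2


def h_p(attribute: str) -> int:
    """Hash an attribute into the quadratic residues modulo P.

    Hypothetical construction: reduce a SHA-256 digest modulo P, then square.
    """
    digest = hashlib.sha256(attribute.encode("utf-8")).digest()
    x = int.from_bytes(digest, "big") % P
    if x == 0:
        x = 1  # keep the value inside Z_p^*
    return pow(x, 2, P)  # a square of a unit is a quadratic residue


def f(k: int, x: int) -> int:
    """Power-function encryption f_k(x) = x^k mod P."""
    return pow(x, k, P)


def inverse_key(k: int) -> int:
    """Decryption exponent k' with k' * k = 1 mod (P - 1).

    Requires gcd(k, P - 1) == 1; pow with exponent -1 needs Python 3.8+.
    """
    return pow(k, -1, P - 1)


if __name__ == "__main__":
    k1, k2 = 7, 11
    x = h_p("cancer")
    # Commutativity, as in equation (1).
    print(f(k1, f(k2, x)) == f(k2, f(k1, x)))
    # Decryption, as in equation (2), applied to a priority value.
    print(f(inverse_key(k1), f(k1, 10)) == 10)
```

Python's built-in three-argument `pow` performs the modular exponentiation, and `pow(k, -1, m)` returns the same inverse that the Extended Euclidean Algorithm would produce.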

4 The Proposed Schemes

In this section, we present a suite of priority-aware private matching schemes. The basic version, P-match, satisfies Privacy Level I. As an improvement to achieve Privacy Level II, we propose the enhanced version P-match+. Finally, the efficient version, E-match, improves the performance significantly by avoiding heavy cryptographic tools such as the commutative encryption function.

In our scenario, there may be several users in a particular area during a particular time period. Specifically, each user holds a set of messages $\{\langle x_{i},a_{i}\rangle,K_{A},k_{A},x_{i}\in X\}$, where $X$ is a set of attributes, $a_{i}$ is the corresponding priority of $x_{i}$, and $K_{A}$ and $k_{A}$ are two secret keys. Moreover, all computations in our schemes are performed modulo $p$. We use the notations shown in Table I.

TABLE I: Notations
$|A|$ : Number of elements in the set $A$
$R$ : Public attribute pool for all users, $n=|R|$, $R=\{r_{i}\}_{i=1}^{n}$
$X$ : Attribute set of Alice, $n_{1}=|X|$, $X=\{x_{i}\}_{i=1}^{n_{1}}$, $x_{i}\in R$
$Y$ : Attribute set of Bob, $n_{2}=|Y|$, $Y=\{y_{i}\}_{i=1}^{n_{2}}$, $y_{i}\in R$
$S$ : $S=X\cap Y$, $q=|S|$
$a_{i}$ ($b_{i}$) : Priority of $x_{i}$ ($y_{i}$), $a_{i},\,b_{i}\in\{1,2,\ldots,10\}$
$V_{A}$ : $V_{A}=\{a_{i}\}_{i=1}^{q}$, where each $a_{i}$ is the priority of $x_{i}\in S$
$V_{B}$ : $V_{B}=\{b_{i}\}_{i=1}^{q}$, where each $b_{i}$ is the priority of $y_{i}\in S$
$\xi^{*}$ : The expected value of the random variable $\xi$

4.1 Basic Version

4.1.1 Introducing the basic similarity function

Let Alice’s attribute set and corresponding priority vector be $X$ and $A$ respectively, and Bob’s attribute set and corresponding priority vector be $Y$ and $B$ respectively. The set of common attributes between Alice and Bob is denoted as $S=X\cap Y$, where $q=|S|$ and $S=\{s_{1},s_{2},\ldots,s_{q}\}$. Then we arrange the corresponding priorities of Alice and Bob on the common attributes into the vectors $V_{A}=(a_{1},a_{2},\ldots,a_{q})$ and $V_{B}=(b_{1},b_{2},\ldots,b_{q})$, respectively. The most widely applied similarity function is cosine similarity:

\begin{equation}
\cos(\theta)=\frac{V_{A}\cdot V_{B}}{\|V_{A}\|\cdot\|V_{B}\|},
\tag{3}
\end{equation}

where $\theta$ is the angle between $V_{A}$ and $V_{B}$. It is often used to measure the angular similarity between two vectors. However, cosine similarity depends only on the angle between the vectors and is therefore largely insensitive to the priority values on a common attribute. That is, cosine similarity can be high both when the priorities on the common attributes are exactly the same and when they are quite different. This is a defect in our setting: we do not accept a high similarity if the priorities differ greatly from each other. Thus, it is not the best choice in our scenario.

Unlike cosine similarity, the Jaccard similarity coefficient, which considers the quotient of the size of the intersection over the size of the union of $X$ and $Y$, better fits our scenario. To simplify the computation we employ its variant form, the Tanimoto similarity coefficient [22]:

\begin{equation}
T(A,\,B)=\frac{V_{A}\cdot V_{B}}{\|V_{A}\|^{2}+\|V_{B}\|^{2}-V_{A}\cdot V_{B}},
\tag{4}
\end{equation}

where $V_{A}$ and $V_{B}$ are the same as in (3). The inner product appears in the numerator, while the denominator of (4) reflects the difference between $V_{A}$ and $V_{B}$, since $\|V_{A}\|^{2}+\|V_{B}\|^{2}-V_{A}\cdot V_{B}=\|V_{A}-V_{B}\|^{2}+V_{A}\cdot V_{B}$, and the norm terms adjust the size of the unit. With this refinement, the Tanimoto similarity coefficient captures the difference between the two priority sets on the common attributes.
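To make the contrast between (3) and (4) concrete, the sketch below evaluates both on a single common attribute held at very different priorities. The function names and the sample priorities are illustrative assumptions, not values from the paper.

```python
import math


def cosine(va, vb):
    """Cosine similarity of two priority vectors, as in equation (3)."""
    dot = sum(a * b for a, b in zip(va, vb))
    norm_a = math.sqrt(sum(a * a for a in va))
    norm_b = math.sqrt(sum(b * b for b in vb))
    return dot / (norm_a * norm_b)


def tanimoto(va, vb):
    """Tanimoto coefficient of two priority vectors, as in equation (4)."""
    dot = sum(a * b for a, b in zip(va, vb))
    return dot / (sum(a * a for a in va) + sum(b * b for b in vb) - dot)


if __name__ == "__main__":
    # One common attribute: Alice rates it 10, Bob rates it 1.
    print(cosine([10], [1]))     # maximal even though the priorities differ
    print(tanimoto([10], [1]))   # 10 / 91, penalised by the difference
    print(tanimoto([10], [10]))  # identical priorities give 1
```

With one common attribute the two vectors are always parallel, so the cosine cannot distinguish the cases, whereas the Tanimoto value falls as the priorities move apart.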

4.1.2 P-match

Based on the commutative encryption function [17] and the Tanimoto similarity coefficient, we design our basic priority-aware private matching scheme, P-match, which considers both the number of common attributes and the corresponding priorities on them. As an initialization, users encrypt their attributes and priorities under their secret keys. Fig. 2 shows the details, and the procedure is as follows:

Figure 2: Basic Private Matching Scheme
(i) When two users are within communication range of each other, the initiator Alice begins the matching process by broadcasting a message $\langle h_{p}(x_{i})^{K_{A}},a_{i}^{k_{A}}\rangle$;

(ii) as one of the responders, Bob replies to Alice with $h_{p}(y_{i})^{K_{B}}$;

(iii) based on the received message, Alice computes $(h_{p}(y_{i})^{K_{B}})^{K_{A}}$ and sends the two-tuple $\langle h_{p}(y_{i})^{K_{B}},(h_{p}(y_{i})^{K_{B}})^{K_{A}}\rangle$ to Bob;

(iv) Bob finds matches and creates a list $L_{1}=\langle(h_{p}(y_{i})^{K_{B}})^{K_{A}},y_{i}\rangle$, and computes a set $F_{1}=\langle(h_{p}(x_{i})^{K_{A}})^{K_{B}}\rangle$. Then he compares the first element of each entry in $L_{1}$ with $F_{1}$ to find the attributes he has in common with Alice; these form the set $S$, i.e., $S=X\cap Y$. Unless $S=\emptyset$, Bob creates another list $L_{2}=\langle(h_{p}(s_{i})^{K_{A}})^{K_{B}},s_{i},b_{i}\rangle,\forall s_{i}\in S$, and sends back $\langle h_{p}(x_{i})^{K_{A}},(a_{i}^{k_{A}})^{k_{B}}\rangle$;

(v) Alice computes her $k_{A}^{\prime}$ with the Extended Euclidean algorithm, decrypts the second part of the received message, and sends out the result $\langle h_{p}(x_{i})^{K_{A}},a_{i}^{k_{B}}\rangle$;

(vi) upon receiving these messages, Bob creates another list $L_{3}=\langle(h_{p}(x_{i})^{K_{A}})^{K_{B}},a_{i}^{k_{B}}\rangle$, then compares the first element of each entry in $L_{3}$ with $(h_{p}(s_{i})^{K_{A}})^{K_{B}}$ in $L_{2}$. When he finds $(h_{p}(x_{i})^{K_{A}})^{K_{B}}\in L_{2}$, he obtains the corresponding $a_{i}$ by checking whether the received $a_{i}^{k_{B}}$ equals $b_{i}^{k_{B}}$; otherwise, he discards the entry. With this information, he creates another list $L_{4}=\langle(h_{p}(s_{i})^{K_{A}})^{K_{B}},a_{i},b_{i}\rangle$. Then, for $q=|S|$, Bob uses Alice’s and his own priorities on each common attribute to build the two vectors $V_{A}=(a_{1},a_{2},\cdots,a_{q})$ and $V_{B}=(b_{1},b_{2},\cdots,b_{q})$, respectively. To measure the similarity, he computes the Tanimoto coefficient $T(V_{A},V_{B})=\frac{V_{A}\cdot V_{B}}{\|V_{A}\|^{2}+\|V_{B}\|^{2}-V_{A}\cdot V_{B}}$ and compares it with the threshold $t$, which is predefined by Bob. If $T(V_{A},V_{B})<t$, the process terminates; otherwise, Bob replies to Alice with $T(V_{A},V_{B})$.
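The message flow of steps (i) to (vi) can be sketched in a single process as follows. This is a hedged illustration rather than the authors' implementation: the helper names, the SHA-256-based $h_p$, the 32-bit toy safe prime found by trial division, and the sample profiles and keys are all assumptions. In step (vi) the sketch lets Bob recover $a_i$ by removing his own layer with $k_B^{\prime}$, which is one possible reading of that step.

```python
import hashlib


def is_prime(n):
    """Trial division; adequate only for the toy modulus used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_safe_prime(start):
    """Smallest p >= start such that p and (p - 1) / 2 are both prime."""
    p = start | 1
    while not (is_prime(p) and is_prime((p - 1) // 2)):
        p += 2
    return p


P = next_safe_prime(2 ** 31)  # toy size, far too small for real use


def h_p(attr):
    """Hypothetical hash into the quadratic residues modulo P."""
    x = int.from_bytes(hashlib.sha256(attr.encode("utf-8")).digest(), "big") % P
    return pow(x or 1, 2, P)


def enc(k, x):
    """Commutative power-function encryption x^k mod P."""
    return pow(x, k, P)


def inv(k):
    """Decryption exponent; requires gcd(k, P - 1) == 1 (Python 3.8+)."""
    return pow(k, -1, P - 1)


def tanimoto(va, vb):
    dot = sum(a * b for a, b in zip(va, vb))
    return dot / (sum(a * a for a in va) + sum(b * b for b in vb) - dot)


def p_match(alice, bob, KA, kA, KB, kB, t):
    """Simulate both parties; returns (S as seen by Bob, score sent to Alice)."""
    # (i) Alice -> Bob: encrypted attributes and encrypted priorities.
    msg1 = [(enc(KA, h_p(x)), enc(kA, a)) for x, a in alice.items()]
    # (ii) Bob -> Alice: his encrypted attributes.
    own = {enc(KB, h_p(y)): y for y in bob}
    # (iii) Alice -> Bob: each value with her layer added on top.
    msg3 = [(c, enc(KA, c)) for c in own]
    # (iv) Bob: L1 maps doubly encrypted values to his attributes.
    L1 = {double: own[c] for c, double in msg3}
    F1 = {enc(KB, c) for c, _ in msg1}
    S = {L1[d] for d in F1 if d in L1}
    if not S:
        return S, None
    msg4 = [(c, enc(kB, ea)) for c, ea in msg1]
    # (v) Alice removes her priority key, leaving a_i^{k_B}.
    msg5 = [(c, enc(inv(kA), e)) for c, e in msg4]
    # (vi) Bob pairs up priorities on common attributes and scores them.
    va, vb = [], []
    for c, e in msg5:
        d = enc(KB, c)
        if d in L1:
            va.append(enc(inv(kB), e))
            vb.append(bob[L1[d]])
    score = tanimoto(va, vb)
    return S, (score if score >= t else None)


if __name__ == "__main__":
    alice = {"cancer": 10, "music": 3, "chess": 5}
    bob = {"cancer": 9, "travel": 4, "chess": 5}
    print(p_match(alice, bob, KA=65537, kA=257, KB=4099, kB=17, t=0.5))
```

The priority keys `kA` and `kB` must be invertible modulo `P - 1`, which is why the sample values are odd; the attribute keys only need to preserve equality, so they are never inverted.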

As the initiator may receive several replies from others, the replies are ranked by their Tanimoto coefficients so that the initiator can choose the best match.

By utilizing P-match, we achieve Privacy Level I. Bob learns the attributes he has in common with Alice, as well as the corresponding priorities, while Alice learns nothing except the similarity value. However, a problem may arise when the scenario changes, i.e., when Alice wants to learn some information herself because Bob may cheat her. Thus, we propose an enhanced version, P-match+.

4.2 Enhanced Version

4.2.1 Constructing the enhanced similarity function

When we only consider the common attributes and their priorities, the Tanimoto similarity coefficient is effective, but it is not sufficient if all the attributes and priorities are taken into account, because there is no efficient way to put all of Alice’s and Bob’s attributes in the same order if $X\neq Y$. Hence we need another function, the Ochiai similarity coefficient [23]:

\begin{equation}
O(A,\,B)=\frac{|A\cap B|}{\sqrt{|A|\cdot|B|}}.
\tag{5}
\end{equation}

This coefficient was first applied in biology and is useful for comparing the faunistic features of two different localities. It is also a ratio, where the numerator is the size of the intersection of the two sets $A$ and $B$, and the denominator is the geometric mean of the sizes of $A$ and $B$. However, since all the priorities, on common and uncommon attributes, are considered here, and we cannot intersect two ordered priority sets, we need a more effective similarity function. For that purpose, we notice that our priority is a kind of weight function. For a finite set $Z=\{z_{1},z_{2},\ldots,z_{\lambda}\}$ where every $z_{i}$ has the weight $w(z_{i})$, $i=1,2,\ldots,\lambda$, let the weight function be $w:Z\rightarrow\mathbb{I}$; then the unweighted sum on $Z$ is defined by $\sum^{\lambda}_{i=1}z_{i}$ while the weighted sum on $Z$ is defined by $\sum^{\lambda}_{i=1}z_{i}w(z_{i})$. If we let $Z=X$, $\mathbb{I}=\{1,\ldots,9\}$ and $w(x_{i})=a_{i}$, then the weighted model becomes our scenario. This implies that we can regard the priority as a counting rule for an attribute; that is, we count an attribute $x_{i}$ exactly $a_{i}$ times.

Therefore, we construct a new priority-aware coefficient $P(A,\,B)$ based on the Ochiai similarity coefficient and the weighted sample. For two attribute sets $X=\{x_{1},x_{2},\ldots,x_{n_{1}}\}$ and $Y=\{y_{1},y_{2},\ldots,y_{n_{2}}\}$ with corresponding priority vectors $A=(a_{1},a_{2},\cdots,a_{n_{1}})$ and $B=(b_{1},b_{2},\cdots,b_{n_{2}})$, let $S=X\cap Y=\{s_{1},\ldots,s_{q}\}$. First we generate two counting sets:

\begin{equation}
\begin{aligned}
X^{\prime} = \{ & x_{1},x_{1}+1,\ldots,x_{1}+a_{1}, \\
& x_{2},x_{2}+1,\ldots,x_{2}+a_{2},\ \ldots \\
& x_{n_{1}},x_{n_{1}}+1,\ldots,x_{n_{1}}+a_{n_{1}}\},
\end{aligned}
\tag{6}
\end{equation}

\begin{equation}
\begin{aligned}
Y^{\prime} = \{ & y_{1},y_{1}+1,\ldots,y_{1}+b_{1}, \\
& y_{2},y_{2}+1,\ldots,y_{2}+b_{2},\ \ldots \\
& y_{n_{2}},y_{n_{2}}+1,\ldots,y_{n_{2}}+b_{n_{2}}\}.
\end{aligned}
\tag{7}
\end{equation}

Then

\begin{equation}
|X^{\prime}|=\sum^{n_{1}}_{i=1}a_{i},\quad |Y^{\prime}|=\sum^{n_{2}}_{i=1}b_{i}.
\tag{8}
\end{equation}

Next we find the intersection of $X^{\prime}$ and $Y^{\prime}$,

\begin{equation}
\begin{aligned}
X^{\prime}\cap Y^{\prime} = \{ & s_{1},s_{1}+1,\ldots,s_{1}+c_{1}, \\
& s_{2},s_{2}+1,\ldots,s_{2}+c_{2},\ \ldots \\
& s_{q},s_{q}+1,\ldots,s_{q}+c_{q}\},
\end{aligned}
\tag{9}
\end{equation}

where every $c_{i}=\min\{a_{i},b_{i}\}$ for $i=1,\ldots,q$. Then

\begin{equation}
|X^{\prime}\cap Y^{\prime}|=\sum^{q}_{i=1}c_{i}=\sum^{q}_{i=1}\min\{a_{i},b_{i}\}.
\tag{10}
\end{equation}

Given (8) and (10), we now apply the Ochiai similarity coefficient to the two counting sets $X^{\prime}$ and $Y^{\prime}$ defined above, and define the result to be our new similarity function:

\begin{equation}
\begin{aligned}
P(A,\,B) &= O(X^{\prime},\,Y^{\prime}) \\
&= \frac{\left|X^{\prime}\cap Y^{\prime}\right|}{\sqrt{|X^{\prime}|\cdot|Y^{\prime}|}} \\
&= \frac{\sum_{i=1}^{q}\min\{a_{i},\,b_{i}\}}{\sqrt{\sum_{i=1}^{n_{1}}a_{i}}\cdot\sqrt{\sum_{i=1}^{n_{2}}b_{i}}}.
\end{aligned}
\tag{11}
\end{equation}

The range of the priority-aware similarity coefficient (11) is from 0 to 1, where 0 corresponds to the case “no common attribute at all” and 1 to the case “same attributes, same priorities”.
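As a small worked instance of (11), consider two hypothetical profiles that are not taken from the paper: Alice holds “cancer” at priority 10 and “music” at priority 3, and Bob holds “cancer” at priority 8 and “sport” at priority 2, so that $S=\{\text{cancer}\}$ and $q=1$. Then

\begin{equation*}
P(A,\,B)=\frac{\min\{10,8\}}{\sqrt{10+3}\cdot\sqrt{8+2}}=\frac{8}{\sqrt{130}}\approx 0.702 .
\end{equation*}

If Bob instead pads his input with 20 further attributes at priority 10, none of which Alice holds, the numerator is unchanged while his priority sum grows to $8+200=208$:

\begin{equation*}
P(A,\,B)=\frac{8}{\sqrt{13}\cdot\sqrt{208}}=\frac{8}{\sqrt{2704}}=\frac{8}{52}\approx 0.154 .
\end{equation*}

The score drops sharply, which illustrates why a party gains little from inputting an oversized attribute set under this coefficient.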

4.2.2 P-match+

To achieve Privacy Level II, we change some procedures from step vi in our basic version.

Input: $\langle h_{p}(x_{i})^{K_{A}},a_{i}^{k_{B}}\rangle$ received from Alice
Output: $n$, $P(A,B)$ to Alice
compute $(h_{p}(x_{i})^{K_{A}})^{K_{B}}$;
send $\langle h_{p}(x_{i})^{K_{A}},(h_{p}(x_{i})^{K_{A}})^{K_{B}}\rangle$ to Alice;
compute $k_{B}^{\prime}$ to decrypt $a_{i}$;
create $L_{3}=\langle(h_{p}(x_{i})^{K_{A}})^{K_{B}},a_{i}\rangle$;
compare $(h_{p}(x_{i})^{K_{A}})^{K_{B}}$ in $L_{3}$ with $(h_{p}(s_{i})^{K_{A}})^{K_{B}}$ in $L_{2}$;
for those $(h_{p}(x_{i})^{K_{A}})^{K_{B}}$ in $L_{2}$, compute $c_{i}=\min\{a_{i},b_{i}\}$;
$P(A,B)=\frac{\sum_{i=1}^{q}c_{i}}{\sqrt{\sum_{i=1}^{|X|}a_{i}}\cdot\sqrt{\sum_{i=1}^{|Y|}b_{i}}}$, where $q=|S|$;
if $P(A,B)<t$ then
  terminate;
else
  reply to Alice with $P(A,B)$;
end if
Algorithm 1: Part of Enhanced Version on Responder Side

Algorithm 1 shows the changes on the responder side in P-match+. When Bob receives $\langle h_{p}(x_{i})^{K_{A}},a_{i}^{k_{B}}\rangle$ from Alice, he computes $(h_{p}(x_{i})^{K_{A}})^{K_{B}}$ and sends it back to Alice together with $h_{p}(x_{i})^{K_{A}}$. He can also decrypt $a_{i}$ by computing $k_{B}^{\prime}$. Then he creates a list $L_{3}=\langle(h_{p}(x_{i})^{K_{A}})^{K_{B}},a_{i}\rangle$ and compares the first element of each entry in $L_{3}$ with $(h_{p}(s_{i})^{K_{A}})^{K_{B}}$ in $L_{2}$. Next, he computes $c_{i}=\min\{a_{i},b_{i}\}$ for those $(h_{p}(x_{i})^{K_{A}})^{K_{B}}$ in $L_{2}$, and the priority-aware Ochiai coefficient can be computed as $P(A,B)=\frac{\sum_{i=1}^{q}c_{i}}{\sqrt{\sum_{i=1}^{|X|}a_{i}}\cdot\sqrt{\sum_{i=1}^{|Y|}b_{i}}}$, where $q=|S|$. If $P(A,B)<t$, the algorithm terminates; otherwise, Bob replies to Alice with $P(A,B)$.
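The scoring part of Algorithm 1 can be sketched as below, assuming Bob has already decrypted every $a_i$ and built $L_2$ and $L_3$. The function name, the representation of the lists as Python containers, and the string tokens standing in for doubly encrypted attribute values are illustrative assumptions.

```python
import math


def responder_score(L3, L2, bob_priorities, t):
    """Score step of the responder side in the enhanced version.

    L3: list of (token, a_i) for all of Alice's attributes, where token
        stands for the doubly encrypted attribute value.
    L2: dict mapping the token of each common attribute to Bob's b_i.
    bob_priorities: all of Bob's priorities, common or not.
    t: Bob's predefined threshold.
    Returns P(A, B) if it reaches the threshold, otherwise None.
    """
    common_min = sum(min(a, L2[token]) for token, a in L3 if token in L2)
    total_a = sum(a for _, a in L3)
    total_b = sum(bob_priorities)
    score = common_min / (math.sqrt(total_a) * math.sqrt(total_b))
    return score if score >= t else None


if __name__ == "__main__":
    L3 = [("t1", 10), ("t2", 3), ("t3", 5)]  # Alice: three attributes
    L2 = {"t1": 9, "t3": 5}                  # two of them are shared
    print(responder_score(L3, L2, [9, 4, 5], t=0.5))
    # Padding Bob's profile with many unrelated attributes lowers the score.
    print(responder_score(L3, L2, [9, 4, 5] + [10] * 50, t=0.5))
```

Because the denominator sums over all input priorities rather than only the common ones, enlarging either profile with unmatched attributes can only reduce the score.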

Input: $\langle h_{p}(x_{i})^{K_{A}},(h_{p}(x_{i})^{K_{A}})^{K_{B}}\rangle$ received from Bob
Output: The number of common attributes with Bob
create $\langle x_{i},(h_{p}(x_{i})^{K_{A}})^{K_{B}}\rangle$;
compare $(h_{p}(x_{i})^{K_{A}})^{K_{B}}$ with $(h_{p}(y_{i})^{K_{B}})^{K_{A}}$;
if $(h_{p}(x_{i})^{K_{A}})^{K_{B}}=(h_{p}(y_{i})^{K_{B}})^{K_{A}}$ then
  $s_{i}=x_{i}$ and $S=S\cup\{s_{i}\}$;
else
  terminate;
end if
output $|S|$;
Algorithm 2: Part of Enhanced Version on Initiator Side

On the initiator side, Alice needs to do some computation to obtain the number of common attributes; the details are shown in Algorithm 2. Based on the message $\langle h_{p}(x_{i})^{K_{A}}, (h_{p}(x_{i})^{K_{A}})^{K_{B}}\rangle$ received from Bob, the algorithm creates a list of $\langle x_{i}, (h_{p}(x_{i})^{K_{A}})^{K_{B}}\rangle$ and obtains the set of common attributes by comparing $(h_{p}(x_{i})^{K_{A}})^{K_{B}}$ in this list with $(h_{p}(y_{i})^{K_{B}})^{K_{A}}$, which was received earlier. It terminates if no match is found, i.e., if no pair satisfies $(h_{p}(x_{i})^{K_{A}})^{K_{B}}=(h_{p}(y_{i})^{K_{B}})^{K_{A}}$; otherwise, it outputs $|S|$ to Alice.
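The comparison in Algorithm 2 relies on the commutativity of the encryption function. A toy sketch of that property is given below, assuming the exponentiation form $f_{k}(x)=x^{k} \bmod p$. The modulus (a 127-bit Mersenne prime), the SHA-256 based `h`, and all helper names are illustrative assumptions, far smaller and simpler than the 1024-bit safe primes a real deployment would need.

```python
import hashlib
from math import gcd

P = 2**127 - 1  # toy prime modulus for illustration only (assumption)

def h(attr):
    """Hash an attribute string into the range [1, P-1] (stand-in for h_p)."""
    digest = hashlib.sha256(attr.encode()).digest()
    return int.from_bytes(digest, "big") % P or 1

def make_key(seed):
    """Derive an exponent coprime to P-1, so that x -> x^k mod P is injective."""
    k = seed | 1
    while gcd(k, P - 1) != 1:
        k += 2
    return k

def enc(value, key):
    """Commutative encryption f_k(x) = x^k mod P."""
    return pow(value, key, P)

def common_count(x_attrs, y_attrs, k_a, k_b):
    """Count attributes whose doubly encrypted hashes coincide."""
    double_x = {enc(enc(h(x), k_a), k_b) for x in x_attrs}  # Alice first, then Bob
    double_y = {enc(enc(h(y), k_b), k_a) for y in y_attrs}  # Bob first, then Alice
    return len(double_x & double_y)
```

Because $(x^{K_{A}})^{K_{B}}=(x^{K_{B}})^{K_{A}} \bmod p$, equal attributes produce equal doubly encrypted values regardless of the order in which the two keys are applied.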

Figure 3: Efficient Private Matching Scheme

4.3 Efficient Version

Both P-match and P-match+ achieve their design goals at the cost of system performance. Because heavy cryptographic operations, such as keyed hash functions and exponentiations, are expensive on current mobile devices, performance drops dramatically as the number of attributes grows. We therefore design a more efficient version, E-match, which employs a Bloom filter [Bloom70] instead of the heavy exponentiation operations in an honest-but-curious environment. A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.

4.3.1 Initialization

Suppose that the public database $R=\{r_{i}\}^{n}_{i=1}$ consists of the attributes of all users. Let $\{h_{i}(\cdot)\}_{i=1}^{l}$ be a family of hash functions with $h_{j}(r_{i})\in[1,\lambda]$ for an attribute $r_{i}\in R$, and let $\mathcal{H}$ be a large public pool of such hash functions $h_{i}$, each indexed by a unique identifier. An empty $\lambda$-bit Bloom filter is an array of $\lambda$ bits, all set to 0. To add an element $r_{i}$ to the $\lambda$-bit Bloom filter, we set the bits at positions $h_{j}(r_{i})$ to 1 for $1\leq j\leq l$. To check whether $r_{i}$ is some user's attribute, we verify whether all the bits at positions $h_{j}(r_{i})$ are 1. If not, $r_{i}$ is not this user's attribute; otherwise, $r_{i}$ is his/her attribute with a probability determined by $n$, $\lambda$ and $l$.
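The insertion and membership test described above can be sketched as a small class. This is an illustrative sketch only: the hash family is simulated by salting SHA-256 with the hash identifier (an assumption, not the pool $\mathcal{H}$ of the scheme), and positions run over $0,\dots,\lambda-1$ rather than $[1,\lambda]$.

```python
import hashlib

class BloomFilter:
    """A lambda-bit Bloom filter driven by a list of hash-function identifiers."""

    def __init__(self, num_bits, hash_ids):
        self.num_bits = num_bits          # lambda
        self.hash_ids = list(hash_ids)    # identifiers of the l hash functions
        self.bits = [0] * num_bits        # empty filter: all bits are 0

    def _position(self, hash_id, element):
        # Simulated h_j(element): SHA-256 salted with the identifier j.
        data = f"{hash_id}:{element}".encode()
        return int.from_bytes(hashlib.sha256(data).digest(), "big") % self.num_bits

    def add(self, element):
        """Set the bit at every position h_j(element) to 1."""
        for hid in self.hash_ids:
            self.bits[self._position(hid, element)] = 1

    def might_contain(self, element):
        """False means definitely absent; True means present up to false positives."""
        return all(self.bits[self._position(hid, element)] == 1
                   for hid in self.hash_ids)

    def zero_bits(self):
        """Number of bits that are still 0."""
        return self.bits.count(0)
```

A filter built as `BloomFilter(400, range(12))` corresponds to $\lambda=400$ and $l=12$; after one `add`, at most 12 of its bits are set.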

We label each $r_{i}\in R$ by the 2-tuples $\{j,r_{i}(j)\}^{\kappa}_{j=1}$, where $r_{i}(j)=r_{i}+j-1$ is the counting function of $r_{i}$. The database $R$ is thereby extended to the set $\{i,\{j,r_{i}(j)\}^{\kappa}_{j=1}\}^{n}_{i=1}$, which can be identified with the indexed set $R^{\prime}=\{\{i,j,r_{i}(j)\}^{\kappa}_{j=1}\}^{n}_{i=1}$. If Alice has an attribute set $X=\{x_{i}\}_{i=1}^{n_{1}}$ with the priority set $\{a_{i}\}^{n_{1}}_{i=1}$, we assign her a personal set $S_{A}=\{\{i,j,x_{i}(j)\}^{a_{i}}_{j=1}\}^{n_{1}}_{i=1}$, where $x_{i}=r_{i^{\prime}}$ for an attribute $r_{i^{\prime}}\in R$, and $a_{i}\in\{1,2,\cdots,\kappa\}$ is the priority of $x_{i}$. Applying the same technique to Bob, we get $S_{B}=\{\{i,j,y_{i}(j)\}^{b_{i}}_{j=1}\}^{n_{2}}_{i=1}$, where $y_{i}$ is in Bob's attribute set $Y$, $|Y|=n_{2}$, and $b_{i}$ is the priority of $y_{i}$. Denote $q_{1}:=|S_{A}|$ and $q_{2}:=|S_{B}|$; then $q_{1}=\sum_{i=1}^{n_{1}}a_{i}=|X^{\prime}|$ and $q_{2}=\sum_{i=1}^{n_{2}}b_{i}=|Y^{\prime}|$, with $X^{\prime}$ given by Equation 4.2.1, and $Y^{\prime}$ given by Equation 4.2.1.
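The construction of a personal set can be sketched as follows. The sketch assumes that attributes are stored as integers in a dictionary keyed by their database index, and that the first component of each tuple is that database index, so that tuples of two users are comparable; both are assumptions made for illustration.

```python
def personal_set(database, priorities, kappa):
    """Expand {database index i: priority a_i} into tuples (i, j, r_i + j - 1).

    `database` maps each index i to the integer attribute value r_i.
    An attribute with priority a contributes exactly a tuples, j = 1..a.
    """
    elements = set()
    for i, a in priorities.items():
        if not 1 <= a <= kappa:
            raise ValueError(f"priority {a} outside [1, {kappa}]")
        for j in range(1, a + 1):
            elements.add((i, j, database[i] + j - 1))  # counting function r_i(j)
    return elements
```

Under these assumptions the size of the set equals the sum of the priorities, and the number of tuples shared by two users equals $\sum\min\{a_{i},b_{i}\}$ over their common attributes, which is the numerator of the priority-aware Ochiai coefficient.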

We pair every $l$ with a random number $l^{\prime}$, $1<l^{\prime}<l$, and publish the 2-tuple $(l,l^{\prime})$ to all users. We can now use a $\lambda$-bit Bloom filter to check how many attributes are in Alice's set $X$ with her priorities and how many are in Bob's set $Y$ with his priorities, so that one of them can learn the similarity level.

4.3.2 E-match

We now present the proposed E-match, shown in Fig. 3, in detail.

  (i) Alice sends a request to Bob.

  (ii) Bob agrees.

  (iii) Alice randomly chooses $\{h_{i}\}_{i=1}^{l}\subset\mathcal{H}$, with indexes denoted by $\mathcal{H}_{A}$. She adds each $\{i,j,x_{i}(j)\}$ in $S_{A}$ to a $\lambda$-bit Bloom filter array, denoted by $BF_{A}$, using $l^{\prime}$ different random hash functions in $\mathcal{H}_{A}$ and $(l-l^{\prime})$ random hash functions out of $\mathcal{H}$ (the computation above is done offline). Alice sends $\mathcal{H}_{A}$ and $BF_{A}$ to Bob.

  (iv) Bob counts the number of 0-bits in $BF_{A}$, denoted by $d_{1}$. He then adds every $\{i,j,y_{i}(j)\}$ in $S_{B}$ to $BF_{A}$ using $\mathcal{H}_{A}$ to get a $\lambda$-bit Bloom filter array, denoted by $BF_{B}$. He counts the number of 0-bits in $BF_{B}$, say $d_{0}$, and computes

    $$P^{*}(A,B)=\frac{\sqrt{l}\,[l q_{2}-\lambda(\ln d_{1}-\ln d_{0})]}{l^{\prime}\sqrt{\lambda q_{2}(\ln\lambda-\ln d_{1})}}. \tag{12}$$

    Bob compares $P^{*}(A,B)$ with his pre-defined threshold $t$ to decide whether to match Alice or not.
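Steps (iii) and (iv) can be simulated with a few lines of code to see how the two zero-bit counts arise. The sketch below is an illustration under simplifying assumptions: hash functions are simulated by salted SHA-256, Alice always uses the first $l^{\prime}$ shared identifiers (the scheme picks them at random), and the names `shared_ids` and `private_ids` are hypothetical.

```python
import hashlib

def position(hash_id, element, num_bits):
    """Simulated hash function: SHA-256 salted with the identifier."""
    data = f"{hash_id}|{element}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % num_bits

def insert(bits, element, hash_ids):
    """Set the bit selected by each hash function to 1."""
    for hid in hash_ids:
        bits[position(hid, element, len(bits))] = 1

def e_match_counts(s_a, s_b, num_bits, shared_ids, l_prime, private_ids):
    """Return (d1, d0): zero bits in BF_A and in BF_B.

    shared_ids  : the l identifiers Alice publishes (H_A)
    l_prime     : how many of them Alice actually uses
    private_ids : the l - l_prime identifiers Alice uses but does not publish
    """
    alice_ids = list(shared_ids[:l_prime]) + list(private_ids)
    bf = [0] * num_bits
    for element in s_a:                 # step (iii): Alice builds BF_A
        insert(bf, element, alice_ids)
    d1 = bf.count(0)
    for element in s_b:                 # step (iv): Bob adds S_B using H_A
        insert(bf, element, shared_ids)
    d0 = bf.count(0)
    return d1, d0
```

When an element of $S_{B}$ is also in $S_{A}$, $l^{\prime}$ of Bob's $l$ positions are already set, so it can clear at most $l-l^{\prime}$ further zero bits; a non-common element can clear up to $l$.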

5 Security Analysis

In this section, we prove in turn that each of our proposed schemes achieves the required privacy level.

5.1 Analysis of the Basic Scheme

Theorem 1.

P-match ensures Privacy Level I if the commutative encryption function is secure.

Proof. For Alice. Bob encrypts the hash values of his attributes using his secret keys $K_{B}$ and $k_{B}$, then sends the messages ($h_{p}(y_{i})^{K_{B}}$ and $\langle h_{p}(x_{i})^{K_{A}},(a_{i}^{k_{A}})^{k_{B}}\rangle$) to Alice. As mentioned in Section 3.4, the commutative encryption function is secure, so it is computationally infeasible for Alice to obtain any of Bob's attributes or the corresponding priorities. When the protocol ends, all Alice obtains is a similarity value, which, in general, indicates only a rough similarity. However, it is impossible to deduce any personal information, such as the number of common attributes or the corresponding priorities, from the similarity value.

For Bob. As a responder, Bob learns more information than Alice. This is reasonable because a responder has weaker motivation to start an attack. In P-match, even if such an attack happens, the commutative encryption function and the cryptographic hash function guarantee that Bob obtains only the common attributes and Alice's corresponding priorities. Since all the encryption functions are injective, $(h_{p}(x_{i})^{K_{A}})^{K_{B}}=(h_{p}(y_{i})^{K_{B}})^{K_{A}}$ if and only if $x_{i}=y_{i}$, and a similar conclusion holds for the corresponding priorities. So Bob learns nothing about Alice except the common attributes and priorities, which he needs to compute the similarity value on his own side. ∎

5.2 Analysis of the Enhanced Version

Theorem 2.

P-match+ ensures Privacy Level II if the commutative encryption function is secure.

Proof. As in the proof of Theorem 1, the commutative encryption function and the keyed hash function provide end users with a secure channel: only a user who holds the secret key can decrypt a message, and only if he has at least one attribute in common with the other user. We focus on the following two possible threats: 1) Alice may illegally input her attributes as well as the priorities; 2) Bob may try to learn extra information from the messages received from Alice by adjusting his threshold.

For case 1), the more attributes $x_{i}$ Alice inputs, and the higher their priorities $a_{i}$, the more of Bob's personal information she could obtain. Note that, for the priority-aware Ochiai coefficient, the denominator of (11) contains the factor $\sqrt{\sum_{i=1}^{|X|}a_{i}}$, which grows quickly if either $|X|$ or any $a_{i}$ becomes large. Meanwhile, the numerator $\sum_{i=1}^{q}\min\{a_{i},b_{i}\}$ can never exceed $\sum_{i=1}^{|Y|}b_{i}$, and the other factor $\sqrt{\sum_{i=1}^{|Y|}b_{i}}$ in the denominator is fixed for the same responder Bob. Hence, if Alice is a malicious user, the priority-aware Ochiai coefficient between her and any responder Bob decreases dramatically, so that Bob terminates the protocol according to his pre-defined threshold, and Alice's attack fails.
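The argument for case 1) can be made concrete with a short bound. It is a sketch that assumes the form of $P(A,B)$ used in Algorithm 1 and strictly positive priorities:

```latex
\[
P(A,B)
=\frac{\sum_{i=1}^{q}\min\{a_{i},b_{i}\}}
      {\sqrt{\sum_{i=1}^{|X|}a_{i}}\cdot\sqrt{\sum_{i=1}^{|Y|}b_{i}}}
\le\frac{\sum_{i=1}^{|Y|}b_{i}}
      {\sqrt{\sum_{i=1}^{|X|}a_{i}}\cdot\sqrt{\sum_{i=1}^{|Y|}b_{i}}}
=\sqrt{\frac{\sum_{i=1}^{|Y|}b_{i}}{\sum_{i=1}^{|X|}a_{i}}}.
\]
```

Consequently, once Alice's total input satisfies $\sum_{i=1}^{|X|}a_{i}>\frac{1}{t^{2}}\sum_{i=1}^{|Y|}b_{i}$, the coefficient is guaranteed to fall below Bob's threshold $t$, whatever attributes she chose.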

For case 2), to obtain extra information from Alice, Bob does not want the protocol to terminate, so he must lower his threshold $t$ to proceed with Algorithm 1. But in P-match+ Alice sends Bob nothing except $h_{p}(x_{i})^{K_{A}}$, $a_{i}^{k_{A}}$ and $a_{i}^{k_{B}}$ in the beginning phase. Since the encryption function $f_{k}(x)=x^{k}$ is pre-image resistant, the information about Alice that is known to Bob does not change at all, regardless of how Bob adjusts his threshold. That means Bob cannot get more information from Alice by lowering the threshold $t$.

Considering the proof of Theorem 1, we obtain the security for the enhanced version P-match+. ∎

5.3 Analysis of the Priorities

Agrawal et al. [17] proved that it is computationally infeasible to map $x_{i}^{k}$ to $x_{i}$ without knowing the secret key $k$ under the Decisional Diffie-Hellman (DDH) hypothesis. Specifically, for fixed values of $i$ and $j$, $\langle x_{i},f_{k}(x_{i}),y_{j},f_{k}(y_{j})\rangle$ is indistinguishable from $\langle x_{i},f_{k}(x_{i}),y_{j},z\rangle$, where $f_{k}(x)=x^{k} \bmod p$. That means Bob cannot map $x_{i}^{k}$ back to $x_{i}$ if he does not know the value of $k$, which is Alice's secret key and known only to Alice. Based on these conclusions, each user's attributes are safe. We now consider the security of the priorities. For the two similarity functions $T(A,B)$ and $P(A,B)$, note that Alice knows the common attributes and Bob's corresponding priorities if $T(A,B)=1$ and $|X|=|X\cap Y|$ in the basic version, and if $P(A,B)=1$ in the enhanced version. Otherwise, when these conditions do not hold, we have the following results.

Theorem 3.

In P-match, if Alice has only one attribute, she can confirm a common attribute and the corresponding priority of Bob iff. there is only one common attribute between them. Otherwise, Alice knows nothing if she has more than one attribute.

Proof. When Alice and Bob have only one common attribute, with priorities $a$ and $b$ respectively, Alice can, based on P-match, easily compute $b$ from $T(A,B)=\frac{a\cdot b}{a^{2}+b^{2}-a\cdot b}$, where $a$ and $T(A,B)$ are known to her. The other direction is trivial. Now assume that Alice has more than one attribute. P-match takes only the common attributes and their priorities into account, and Theorem 1 shows that P-match ensures Privacy Level I if the commutative encryption function is secure. Moreover, the attributes and the corresponding priorities are encrypted under two different secret keys, and the security of the attribute part was proved by Agrawal et al. in [17]. Hence Alice learns nothing. ∎
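The single-attribute case can be worked out explicitly. Rearranging $T=\frac{ab}{a^{2}+b^{2}-ab}$ gives the quadratic $Tb^{2}-a(1+T)b+Ta^{2}=0$ in $b$, which the sketch below solves; the function name is an assumption. Note that the two roots multiply to $a^{2}$, so the equation by itself yields up to two candidates for $b$.

```python
import math

def candidate_priorities(a, t):
    """Solve t = a*b / (a^2 + b^2 - a*b) for b > 0, given a > 0 and 0 < t <= 1."""
    if not 0 < t <= 1:
        raise ValueError("the similarity value must lie in (0, 1]")
    # Discriminant of t*b^2 - a*(1+t)*b + t*a^2 = 0, divided by a^2.
    disc = (1 + t) ** 2 - 4 * t ** 2      # equals (1 - t) * (1 + 3t) >= 0
    root = math.sqrt(max(disc, 0.0))
    low = a * (1 + t - root) / (2 * t)
    high = a * (1 + t + root) / (2 * t)
    return sorted({low, high})
```

For example, with $a=4$ and $T=2/3$ the candidates are approximately 2 and 8, and both values indeed give $T=2/3$; with $T=1$ the only solution is $b=a$.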

Theorem 4.

In P-match+, Alice can confirm a common attribute as well as the corresponding priority of Bob iff. she has only one attribute which is also the only one common attribute. Otherwise, Alice would know at most one common attribute and its corresponding priority of Bob.

Proof. The proof of the first part is similar to that of Theorem 3. In the enhanced version, Privacy Level II is ensured if the commutative encryption function is secure, which means that Alice knows the common attributes. If Alice has at least two attributes and the number of common attributes is $q\geq 2$, then from the expression of $P(A,B)$ Alice learns only the ratio $\frac{\sum_{i=1}^{q}\min\{a_{i},b_{i}\}}{\sqrt{\sum_{i=1}^{m}b_{i}}}$. However, this ratio reveals nothing if Alice does not know the correct priority of every common attribute. ∎

As stated above, there remains a potential attack in which a malicious user inputs only one attribute with its priority to learn another user's secret information. However, there are many practical ways to address this problem, such as setting a rule that limits the minimum number of input attributes. We therefore omit the details here.

5.4 Analysis of the E-match

The security of E-match is based on the probability of false positives for the $\lambda$-bit Bloom filter. The accuracy of E-match is given by the following two results.

Theorem 5.

Via our protocol, Bob can obtain the expected values of the counts $q_{1}=|S_{A}|$ and $q^{\prime}:=|S_{A}\cap S_{B}|$ by

$$q_{1}^{*}=\frac{\lambda(\ln\lambda-\ln d_{1})}{l}, \tag{13}$$

and

$$q^{\prime*}=\frac{lq_{2}+\lambda(\ln d_{0}-\ln d_{1})}{l^{\prime}}, \tag{14}$$

where $d_{1}$ is the number of 0-bits in $BF_{A}$ and $d_{0}$ is the number of 0-bits in $BF_{B}$.

Proof. According to [CHE07], the number of 0-bits in a $\lambda$-bit Bloom filter can be regarded as binomially distributed. When the length $\lambda$ is large enough, this distribution is asymptotically normal. We now suppose that $\lambda$ is large enough. For a fixed bit in $BF_{A}$, the probability that it remains 0 after adding one element $x_{i}(j)$ with $l$ hash functions is $(1-\frac{1}{\lambda})^{l}$, and the probability that it remains 0 after adding all elements $x_{i}(j)$ with $l$ hash functions is

$$\left(1-\frac{1}{\lambda}\right)^{lq_{1}}\approx e^{-\frac{lq_{1}}{\lambda}}.$$

Thus we have $e^{-\frac{lq_{1}}{\lambda}}=\frac{d_{1}}{\lambda}$. Solving this equation leads to $q_{1}^{*}$ given by Equation (13).

For a fixed bit in $BF_{A}$, the probability that it remains 0 after adding the $q^{\prime}$ common elements with $l^{\prime}$ hash functions is $(1-\frac{1}{\lambda})^{l^{\prime}q^{\prime}}$, and the probability that it remains 0 after the other $lq_{1}-l^{\prime}q^{\prime}$ hash insertions is $(1-\frac{1}{\lambda})^{lq_{1}-l^{\prime}q^{\prime}}$. Thus for a bit in $BF_{B}$, the conditional probability that it remains 0 is

$$\left(1-\frac{1}{\lambda}\right)^{lq_{2}+lq_{1}-l^{\prime}q^{\prime}}\approx e^{-\frac{lq_{1}+lq_{2}-l^{\prime}q^{\prime}}{\lambda}}.$$

Then we have

$$e^{-\frac{lq_{1}+lq_{2}-l^{\prime}q^{\prime}}{\lambda}}=\frac{d_{0}}{\lambda}.$$

Solving this equation and combining it with Equation (13) leads to $q^{\prime*}$ in Equation (14). Noting that Equation (11) of the Ochiai coefficient implies

$$P(A,B)=\frac{q^{\prime}}{\sqrt{q_{1}q_{2}}},$$

Equation (12) is obtained from Equation (13) and Equation (14). ∎
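The estimators in Equations (13) and (14) translate directly into code. The sketch below assumes Bob knows $\lambda$, $l$, $l^{\prime}$ and his own $q_{2}$, and plugs the two estimates into $P=q^{\prime}/\sqrt{q_{1}q_{2}}$; the function names are illustrative.

```python
import math

def estimate_counts(d1, d0, lam, l, l_prime, q2):
    """Estimate q1 = |S_A| and q' = |S_A ∩ S_B| from the zero-bit counts.

    d1, d0  : zero bits in BF_A and in BF_B
    lam     : filter length (lambda)
    l       : number of hash functions per element
    l_prime : number of hash functions shared on common elements
    q2      : |S_B|, known to Bob
    """
    q1_star = lam * (math.log(lam) - math.log(d1)) / l                      # Eq. (13)
    q_common_star = (l * q2 + lam * (math.log(d0) - math.log(d1))) / l_prime  # Eq. (14)
    return q1_star, q_common_star

def estimate_similarity(d1, d0, lam, l, l_prime, q2):
    """Estimated Ochiai coefficient q'* / sqrt(q1* * q2)."""
    q1_star, q_common_star = estimate_counts(d1, d0, lam, l, l_prime, q2)
    return q_common_star / math.sqrt(q1_star * q2)
```

As a sanity check, feeding in the expected zero-bit counts $d_{1}=\lambda e^{-lq_{1}/\lambda}$ and $d_{0}=\lambda e^{-(lq_{1}+lq_{2}-l^{\prime}q^{\prime})/\lambda}$ recovers $q_{1}$ and $q^{\prime}$ exactly, up to floating-point error.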

Theorem 6.

In E-match, let $q_{2}=|S_{B}|$; then, with the same notation as in Theorem 5, we have

$$q_{1}^{*}\sim\mathcal{N}\left[q_{1},\frac{\lambda}{l^{2}}\left(e^{\frac{lq_{1}}{\lambda}}-1\right)\right], \tag{15}$$

and

$$q^{\prime*}\sim\mathcal{N}\left[q^{\prime},\frac{\lambda}{(l^{\prime})^{2}}\left(e^{\frac{lq_{1}}{\lambda}}+e^{\zeta}-2-\zeta\right)\right] \tag{16}$$

with $\zeta=\frac{lq_{1}+lq_{2}-l^{\prime}q^{\prime}}{\lambda}$.

Proof. From [CHE07] we know that when $\lambda$ is large enough,

$$d_{1}\sim\mathcal{N}[\mu_{1}(q_{1}),\sigma_{1}^{2}(q_{1})] \tag{17}$$

where

$$\mu_{1}(q_{1})=\lambda\left(1-\frac{1}{\lambda}\right)^{lq_{1}}\approx\lambda e^{-\frac{lq_{1}}{\lambda}},$$

$$\sigma_{1}^{2}(q_{1})=\lambda\left(1-\frac{1}{\lambda}\right)^{lq_{1}}\left[1-\left(1-\frac{1}{\lambda}\right)^{lq_{1}}\right]\approx\lambda e^{-\frac{lq_{1}}{\lambda}}\left(1-e^{-\frac{lq_{1}}{\lambda}}\right).$$

To apply the special version of the central limit theorem stated in Theorem 6 of [Kodialam06], we let $\lambda\rightarrow\infty$ and $lq_{1}\rightarrow\infty$ while $\frac{lq_{1}}{\lambda}$ is fixed. Regarding $q_{1}$ as a variable and noting that $\mu_{1}(q_{1})$ is monotonically decreasing, $\mu_{1}$ has an inverse function, denoted by $g_{1}$. Based on Theorem 6 of [Kodialam06] and Equation (17), we know that

$$g_{1}(d_{1})\sim\mathcal{N}[g_{1}(\mu_{1}(q_{1})),\delta_{1}^{2}(q_{1})], \tag{18}$$

where $\delta_{1}^{2}(q_{1})=\sigma_{1}^{2}(q_{1})(g_{1}^{\prime}(\mu_{1}(q_{1})))^{2}$. Since $g_{1}(d_{1})=q_{1}^{*}$, and

$$g_{1}^{\prime}(\mu_{1}(q_{1}))=\frac{1}{\mu_{1}^{\prime}(q_{1})}=-\frac{1}{l}e^{\frac{lq_{1}}{\lambda}}, \tag{19}$$

we thus obtain the result in Equation (15).

To prove (16), we let

$$q^{\prime*}_{0}=\frac{lq_{1}+lq_{2}-\lambda(\ln\lambda-\ln d_{0})}{l^{\prime}}. \tag{20}$$

Then the same technique as used in [Sun13] results in

$$q^{\prime*}_{0}\sim\mathcal{N}\left[q^{\prime},\frac{\lambda(e^{\zeta}-1-\zeta)}{(l^{\prime})^{2}}\right]. \tag{21}$$

Since the distributions of $q_{1}^{*}$ and $q^{\prime*}_{0}$ are both normal, the distribution of the 2-tuple $(q_{1}^{*},q^{\prime*}_{0})$ is a multivariate normal distribution. Note that $q^{\prime*}=q^{\prime*}_{0}+\frac{l}{l^{\prime}}q_{1}^{*}-\frac{l}{l^{\prime}}q_{1}$; thus from Equations (15) and (21) we have Equation (16). ∎

The estimation accuracy of E-match is given by the following theorem.

Theorem 7.

In E-match, let $q^{*}_{1}$ and $q^{\prime*}$ be defined by Equations (13) and (14), respectively. Then for $\epsilon_{1},\ \epsilon_{2}$ such that

$$\epsilon_{1}q_{1}\geq\frac{\left(\lambda(e^{\frac{lq_{1}}{\lambda}}-1)\right)^{\frac{1}{2}}}{l}$$

and

$$\epsilon_{2}q^{\prime}\geq\frac{\left(\lambda(e^{\frac{lq_{1}}{\lambda}}+e^{\zeta}-2-\zeta)\right)^{\frac{1}{2}}}{l^{\prime}},$$

we have

$$\Pr(|q^{*}_{1}-q_{1}|\leq\epsilon_{1}q_{1})\geq 1-p_{1},\quad \Pr(|q^{\prime*}-q^{\prime}|\leq\epsilon_{2}q^{\prime})\geq 1-p_{2}, \tag{22}$$

where

$$p_{1}\geq\frac{\lambda(e^{\frac{lq_{1}}{\lambda}}-1)}{\epsilon_{1}^{2}q^{2}_{1}l^{2}},\quad p_{2}\geq\frac{\lambda(e^{\frac{lq_{1}}{\lambda}}+e^{\zeta}-2-\zeta)}{\epsilon^{2}_{2}(q^{\prime}l^{\prime})^{2}}.$$

Proof. Combining Equations (15) and (16) with Chebyshev's inequality leads to the result in Equation (22). ∎
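To get a feel for these bounds, the smallest admissible $p_{1}$ and $p_{2}$ can be evaluated numerically. The sketch below simply divides the variances from Equations (15) and (16) by the squared tolerances, as Chebyshev's inequality prescribes; the function name and the clipping at 1 are choices made for this illustration.

```python
import math

def chebyshev_failure_bounds(lam, l, l_prime, q1, q2, q_common, eps1, eps2):
    """Smallest p1, p2 allowed by Theorem 7 (clipped to 1)."""
    # Variance of q1* from Eq. (15).
    var_q1 = lam * (math.exp(l * q1 / lam) - 1) / l ** 2
    # Variance of q'* from Eq. (16).
    zeta = (l * q1 + l * q2 - l_prime * q_common) / lam
    var_qc = lam * (math.exp(l * q1 / lam) + math.exp(zeta) - 2 - zeta) / l_prime ** 2
    p1 = var_q1 / (eps1 * q1) ** 2
    p2 = var_qc / (eps2 * q_common) ** 2
    return min(p1, 1.0), min(p2, 1.0)
```

With the illustrative values $\lambda=400$, $l=12$, $l^{\prime}=11$, $q_{1}=10$, $q_{2}=8$, $q^{\prime}=5$, a call such as `chebyshev_failure_bounds(400, 12, 11, 10, 8, 5, 0.2, 0.5)` bounds the probability that each estimate leaves its tolerance band; widening the tolerances shrinks both bounds quadratically.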

We quantify the uncertainty of the priority-aware attributes by the Shannon entropy, which is a common measure of uncertainty. In E-match, only Alice sends her information to other users, so we only examine Alice's set $S_{A}$, where $S_{A}\subseteq R^{\prime}$. Bob knows only $\kappa\,n$ before the protocol, and he learns $\lambda$, $l$ and $l^{\prime}$ after the protocol. He can compute the expected value of $|S_{A}|$ but does not learn any explicit attribute in $S_{A}$. Considering the sizes of $R^{\prime}$ and $S_{A}$, there are $\frac{(\kappa n)!}{q_{1}!\,(\kappa n-q_{1})!}$ choices of $S_{A}\subseteq R^{\prime}$. Although some of these sets are counted repeatedly, Bob does not know which sets are the same, since he cannot learn any of the priorities $a_{i}$. That means the $\frac{(\kappa n)!}{q_{1}!\,(\kappa n-q_{1})!}$ candidate sets are all equally unknown in Bob's eyes. We replace $q_{1}$ by $q_{1}^{*}$. Thus Alice's attribute uncertainty with respect to Bob can be estimated in bits by

$$\mathbf{E}^{*}=\log_{2}\frac{(\kappa n)!}{q_{1}!\,(\kappa n-q_{1})!}.$$
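The entropy estimate above is the base-2 logarithm of a binomial coefficient, so it can be evaluated exactly with integer arithmetic. In the sketch below the count passed as `q1` is assumed to be an integer (for example, Bob's estimate rounded to the nearest integer, which is an assumption of this illustration).

```python
import math

def candidate_set_entropy_bits(kappa, n, q1):
    """log2 of the number of size-q1 subsets of the kappa*n extended pool."""
    pool = kappa * n
    if not 0 <= q1 <= pool:
        raise ValueError("q1 must lie between 0 and kappa * n")
    return math.log2(math.comb(pool, q1))
```

For a pool of $\kappa n=10$ labelled elements and $q_{1}=2$, there are 45 candidate sets, i.e. about 5.49 bits of uncertainty.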

Moreover we have the following result for the entropy.

Theorem 8.

In E-match, suppose that $BF_{A}$ is the Bloom filter constructed by Alice using $l^{\prime}$ hash functions in $\mathcal{H}_{A}$ based on her priority-aware personal set $S_{A}$. Then after sending $BF_{A}$ and $\mathcal{H}_{A}$ to Bob, her remaining privacy information of $S_{A}$ against Bob is

$$\mathbf{E}=q_{1}\mathbf{E}[i,j], \tag{23}$$

where

$$\mathbf{E}[i,j]=\sum^{\kappa n}_{x=1}{\kappa n\choose x}P^{x}(1-P)^{(\kappa n-x)}\log_{2}x,$$

$$P=\sum^{l}_{i=1}{l\choose i}p^{i}(1-p)^{l-i},$$

$$p=1-e^{-\frac{lq_{1}}{\lambda}}.$$

Proof. We refer to Theorem 2 in [Sun13] on privacy-preserving spatiotemporal matching. The result in Equation (23) is obtained by treating each possible location cell (cID) as a possible interest $r_{i}[j]$ and noting that the size of the whole interest pool $R^{\prime}$ is $\kappa n$. ∎

6 Performance Evaluations

In this section, we first design an experiment to verify the correctness of our proposed protocols. Then we analyze the system complexity and show our experimental results.

6.1 Experiment of Correctness

To verify our work, we design a simple experiment in which the initiator Alice performs matching with several candidates (5 users in our experiment) in the vicinity. For simplicity, we assign 5 attributes to each user, and each user can set the priority of each attribute randomly from 1 to 9. As a result, each user may have many combinations of priorities on these attributes. We take a snapshot of these priorities, shown in Table II, where the notation "-" means no interest at all. We then compute the matching results of several existing schemes.

TABLE II: Experiment setting
Cancer Music Football Tennis Cooking
Alice 8 4 1 3 2
Bob 7 - 2 - -
Charles 1 9 4 2 1
David 9 8 - 6 -
Emmy - 2 9 1 1
Frank 8 3 - - -

The best match for Alice in FindU [8] is Charles, because they have 5 common attributes. The algorithm in [9] considers the difference of priorities on each common attribute; as a result, it is hard to choose the best candidate between Bob and David, since the differences on these attributes are: Alice to Bob, $[1,4,1,3,2]$; and Alice to David, $[1,4,1,3,2]$. In P-match, the similarities with the nearby users are 0.9667, 0.3972, 0.8243, 0.2316, and 0.9870, respectively. We therefore prefer Frank as the best match, because both Alice and Frank are more interested in Movie and Music among their common attributes, even though they have only these two attributes in common. We also compute the similarity values of P-match+; the results are 0.1122, 0.0905, 0.1138, 0.0547 and 0.1314, respectively. The best match is still Frank. Similarly, the matching result in E-match can be computed by the priority-aware Ochiai similarity coefficient with a probability determined by $\lambda$, $l$ and $l^{\prime}$, and the matching results are the same as in P-match+. For instance, if we let $\lambda=400$, $l=12$, $l^{\prime}=11$, E-match chooses Frank as the best candidate for Alice, and computes the exact similarity coefficients with a probability of $0.85$.
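The P-match values above can be reproduced from Table II with a plaintext computation. The sketch assumes that the priority-aware Tanimoto coefficient is evaluated over the common attributes only, i.e. $T=\frac{\sum a_{i}b_{i}}{\sum a_{i}^{2}+\sum b_{i}^{2}-\sum a_{i}b_{i}}$, which generalizes the single-attribute form used in the proof of Theorem 3; the encryption layer is omitted and the names are illustrative.

```python
def tanimoto_priority(a, b):
    """Priority-aware Tanimoto coefficient over the common attributes of a and b."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    sq_a = sum(a[k] ** 2 for k in common)
    sq_b = sum(b[k] ** 2 for k in common)
    return dot / (sq_a + sq_b - dot)

# Priorities from Table II; "-" entries are simply left out.
alice = {"Cancer": 8, "Music": 4, "Football": 1, "Tennis": 3, "Cooking": 2}
candidates = {
    "Bob": {"Cancer": 7, "Football": 2},
    "Charles": {"Cancer": 1, "Music": 9, "Football": 4, "Tennis": 2, "Cooking": 1},
    "David": {"Cancer": 9, "Music": 8, "Tennis": 6},
    "Emmy": {"Music": 2, "Football": 9, "Tennis": 1, "Cooking": 1},
    "Frank": {"Cancer": 8, "Music": 3},
}

scores = {name: round(tanimoto_priority(alice, c), 4) for name, c in candidates.items()}
best = max(scores, key=scores.get)
# scores -> Bob 0.9667, Charles 0.3972, David 0.8243, Emmy 0.2316, Frank 0.987
# best   -> "Frank"
```

For Bob, for example, the common attributes give $\sum a_{i}b_{i}=58$, $\sum a_{i}^{2}=65$ and $\sum b_{i}^{2}=53$, so $T=58/60\approx 0.9667$.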

Remark. The ranking of Bob and David in the experiment above differs between P-match and P-match+. This is because P-match takes the intersection of the two users' attributes as the sample, whereas P-match+ takes the union. That is, the Tanimoto similarity ignores the number of common attributes and only measures the similarity of the priorities on the common attributes, while the Ochiai similarity accounts for both the number of common attributes and the priorities over the union of the attribute sets. In the experiment above, Bob and Alice have 2 common attributes while David and Alice have 3 common attributes, but the differences on the common attributes are the same. So Bob ranks before David in P-match and after David in P-match+.

6.2 Complexity Analysis

To discuss the complexity of our schemes, we analyze the online/offline computation overhead and the communication cost on both the initiator and responder sides. The computation cost is measured by counting the keyed hash functions and exponentiation operations, since these operations are always resource-consuming on mobile devices. $h$ represents a keyed hash function, such as SHA-256 or SHA-512, while $mul_{1}$, $exp_{1}$ and $exp_{2}$ denote 1024-bit multiplication, 1024-bit exponentiation and 2048-bit exponentiation, respectively. The communication cost is evaluated by counting the transmitted and received bits. We compare our work with the algorithms in [20] and [24]: the former considers malicious behavior in private matching, as our work does, and [24] tries to offload the computation overhead of existing secure two-party computation, which is widely used in secure private matching. We assume that there are several mobile users in the vicinity, and each user holds $m$ attributes, where every attribute has a priority value in $[1,\kappa]$. Table III shows the theoretical analysis in detail. Our schemes have lower computation cost, especially in the online parts.

TABLE III: Comparison of Matching Algorithms
Protocols | Party | Offline Comp. | Online Comp. | Comm. trans. (in bits)
[20] | Initiator | $(2m+2m^{2})\,exp_{1}$, $(2m)\,h$ | $(m+m^{2})\,exp_{1}$, $(m)\,h$ | $3m\cdot 1024$
[20] | Responder | $(m+m^{2})\,exp_{1}$, $(2m)\,h$ | $(2m)\,exp_{1}$ | $4m\cdot 1024$
[24] | Initiator | $(2rm)\,exp_{1}$, $(rm)\,exp_{2}$ | $(rm)\,exp_{1}$, $(2rm)\,exp_{2}$ | $rm\cdot 2048$
[24] | Responder | - | $(2rm+1)\,exp_{1}$, $(2rm+1)\,exp_{2}$ | $rm\cdot 2048$
P-match | Initiator | $(2m+1)\,exp_{1}$, $(m)\,h$ | $(2m)\,exp_{1}$ | $4m\cdot 1024$
P-match | Responder | $(2m+1)\,exp_{1}$, $(m)\,h$ | $(3m)\,exp_{1}$ | $2m\cdot 1024$
E-match | Initiator | $(2m)\,h$, 1 poly+ | - | 1024
E-match | Responder | $(rm)\,h$, $((r-1)m)\,mul_{1}$ | $(rm)$ poly- | 32

6.3 Experiment Setup

To study the feasibility of our algorithms, we first evaluate the time taken for computing SHA-256, SHA-512, $exp_{1}$ and $exp_{2}$, and for generating 1024-bit and 2048-bit safe primes. We then implement our proposed algorithms on a ThinkPad laptop (1.82 GHz CPU, 4 GB RAM, Windows 7 32-bit Professional, with Crypto++ as the cryptography library) to simulate the performance; the laptop is used for the offline computation. We also implement our schemes on two Samsung Nexus S smartphones with a 1 GHz Cortex-A8 processor, 512 MB RAM, Android v2.3.6 and Bluetooth v2.1. Each result in our experiments is an average of 1000 runs.

6.4 Experiment Results

Tables IV and V show the mean, maximum, minimum, median and standard deviation of the time consumed by SHA-256, SHA-512, $exp_{1}$, $exp_{2}$ and the generation of 1024-bit and 2048-bit safe primes. The laptop clearly performs better owing to its more powerful computing capability. For example, generating a 1024-bit safe prime takes 156.37 ms on average on the laptop and 582.28 ms on the Nexus S. This is not a concern in our work, since these operations can be moved to the offline computation phase.

We then show evaluation results on the offline/online computation cost, the communication cost and the execution time. Specifically, the offline computation cost covers the operations that can be pre-computed without input from other entities. The online computation cost covers the operations that must be computed in real time. The communication cost is the amount of transmitted data in bits, and the execution time is the total time needed to perform a private matching procedure between users, including both the online computation time and the data transmission time between users.

TABLE IV: Time consumption of different operations on laptop
Operation Mean Max Min Median Std
SHA-256 (μs) 2.13 2.5 2.1 2.1 0.048
SHA-512 (μs) 10.6 14 10.5 10.6 0.21
$exp_{1}$ (μs) 340.6 483 338 339 7.13
$exp_{2}$ (μs) 756.8 986 752 755 12.69
Prime-1024 (ms) 156.37 178 134 156 12.60
Prime-2048 (ms) 1545.27 1663 1413 1546 74.26
TABLE V: Time consumption of different operations on Nexus S (ms)
Operation Mean Max Min Median Std
SHA-256 18.79 20 18 19 0.83
SHA-512 22.17 24 21 22 1.10
$exp_{1}$ 39.17 75 20 39 9.05
$exp_{2}$ 59.94 110 35 58 15.31
Prime-1024 582.28 650 525 582 37.52
Prime-2048 7090.46 7175 6518 6892 219.36

6.4.1 The impact of $m$

Fig. 4 shows the evaluation results ($\kappa=10$) for varying $m$. We first test the offline computation cost as the number of attributes $m$ varies from 20 to 200. Fig. 4a indicates that both P-match and E-match perform better than the existing work [20, 24]. The computation cost in [20] is high because its schemes employ too many $exp_{1}$ operations. E-match outperforms P-match because it uses poly- operations instead of $exp_{1}$ operations. The offline cost can be computed before the regular computations, so this part does not affect the execution time.

Fig. 4b compares the online computation cost of all the protocols on a $\log_{10}$ scale for varying $m$. The execution time is sensitive to this part, so we aim to move as much of the online computation offline as possible. We can clearly see the efficiency of our protocols over the others. Users in P-match only need to perform $2m$ $exp_{1}$ operations on the initiator side and $3m$ $exp_{1}$ operations on the responder side. E-match decreases the computation cost significantly by validating the polynomial with several potential solutions. The online costs of the protocols in [20, 24] are much higher since they use several $exp_{1}$ and $exp_{2}$ operations in their processes.

Fig. 4c shows the communication cost between entities. Not surprisingly, the cost of each protocol increases smoothly with $m$. E-match performs better on communication cost by replacing the complicated exchange phases between entities with a single polynomial. The initiator only needs to transmit a few simple parameters of a specific polynomial to exchange both the attributes and the corresponding priorities. For example, E-match consumes 50.28 Kb of bandwidth when $m=100$. This is easy for Bluetooth v2.1, since the transmission rate reaches approximately 900 Kb/s in our experiments, which means that all the transmissions finish in about 55.87 milliseconds.

Fig. 4d provides the total execution time of all the algorithms. The execution time in our experiments mainly consists of the online computation time and the information transmission time. Compared with the schemes in existing work [20, 24], our P-match, P-match+ and E-match reduce the execution time. Looking at our protocols more closely, an initiator needs more time to complete the computation in P-match and P-match+ in order to obtain the common attributes securely. Nevertheless, all of our proposed protocols finish within about 600 ms in all simulated scenarios. For example, when the number of attributes is $m=200$, the protocols in [24] and [20] need 20.52 and 20.66 seconds to complete the matching phase for each user, while our P-match and P-match+ require 0.33 and 0.42 seconds, respectively. E-match requires 197.86 milliseconds, which is more practical for mobile users.

Figure 4: Impact of the number of attributes $m$ on (a) offline computation cost; (b) online computation cost; (c) communication cost; (d) the execution time, $\kappa=10$

6.4.2 The impact of $\kappa$

In Fig. 5, we show evaluation results ($m=100$) on the impact of varying $\kappa$. Fig. 5a shows how the offline computation cost changes with $\kappa$. Our P-match and E-match outperform the other schemes for all tested values of $\kappa$. [20] and P-match are steady across the values of $\kappa$, whereas [24] and E-match are affected by it. The reason is that [24] needs extra $exp_{2}$s to transform the $\ell_{1}$ distance into the $\ell_{2}$ distance by performing a Johnson-Lindenstrauss embedding, and E-match needs to re-build the polynomial $f(x)$ by computing each possible priority value on the responder side.
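To give some intuition for why a polynomial-based check depends on $\kappa$ but avoids modular exponentiations, the toy sketch below may help. It is not the E-match construction itself, which is defined earlier in the paper, and it offers no privacy as written. It assumes, purely for illustration, that each (attribute, priority) pair is hashed to a root of a polynomial over a prime field, and that the responder tests every candidate priority from 1 to $\kappa$ for each of its attributes. The names `encode`, `build_polynomial`, `evaluate` and `match`, the modulus and the hashing step are all hypothetical.

```python
import hashlib

P = 2**61 - 1  # prime field modulus (an assumption made for this sketch)


def encode(attribute: str, priority: int) -> int:
    """Hash an (attribute, priority) pair to a field element (hypothetical encoding)."""
    digest = hashlib.sha256(f"{attribute}|{priority}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % P


def build_polynomial(profile: dict) -> list:
    """Coefficients (lowest degree first) of f(x) = prod_i (x - r_i) mod P."""
    coeffs = [1]
    for attribute, priority in profile.items():
        root = encode(attribute, priority)
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - root * c) % P   # contribution of -root * c * x^i
            nxt[i + 1] = (nxt[i + 1] + c) % P  # contribution of c * x^(i+1)
        coeffs = nxt
    return coeffs


def evaluate(coeffs: list, x: int) -> int:
    """Evaluate f(x) mod P with Horner's rule (multiplications and additions only)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def match(coeffs: list, attributes: list, kappa: int) -> dict:
    """Try every priority 1..kappa for each attribute; keep those with f(x) = 0."""
    found = {}
    for attribute in attributes:
        for priority in range(1, kappa + 1):
            if evaluate(coeffs, encode(attribute, priority)) == 0:
                found[attribute] = priority
                break
    return found


initiator_profile = {"music": 3, "hiking": 7, "chess": 1}
f = build_polynomial(initiator_profile)
print(match(f, ["hiking", "chess", "golf"], kappa=10))
```

In this sketch the responder performs at most $\kappa$ polynomial evaluations per attribute, so the work grows with $\kappa$ while each evaluation uses only field multiplications and additions.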

Fig. 5b shows the online computation cost of the schemes. The performance of [24] is the worst when $\kappa\geq 8$, because it uses many $exp_{2}$s and fewer $exp_{1}$s. E-match outperforms all other schemes since it does not employ heavy operations such as $exp_{1}$s and $exp_{2}$s.

Fig. 5c shows the general trends of the communication cost of the different schemes. [20] and P-match are stable as $\kappa$ increases: when $m=100$, the communication cost is 0.68 Mb in [20] and 0.59 Mb in P-match. In contrast, [24] is heavily affected by increasing $\kappa$, and its communication cost exceeds 3.91 Mb when $\kappa=10$. The communication cost of our proposed E-match also varies with $\kappa$, but only slightly; for example, it is about 50.28 Kb when $\kappa=10$, which is well within the capacity of Bluetooth v2.1.

In Fig. 5d, the evaluation results clearly show the advantages of our proposed schemes. For instance, when $\kappa=10$, compared with 5.40 seconds in [20] and 9.54 seconds in [24], our P-match and P-match+ need 161.54 and 217.81 milliseconds, respectively. E-match needs only 29.01 milliseconds to achieve the same goal.

Figure 5: Impact of the upper bound of priorities $\kappa$ on (a) offline computation cost; (b) online computation cost; (c) communication cost; (d) the execution time, $m=100$

6.4.3 Implementation Results on SAMSUNG Nexus S Smartphones

To validate the usability of our proposed protocols, we implement them on SAMSUNG Nexus S smartphones and test their performance. Fig. 6 shows selected results on the smartphones, which differ slightly from the simulation results on the laptop; the reason can be found in Tables IV and V. Generally speaking, the main differences lie in the online computation cost and the execution time.

Fig. 6a shows the online computation cost on the Nexus S for varying $m$. The performance of [20] and [24] is quite similar; however, both need several tens of seconds or more for the online computation, which is unacceptable for mobile users. Our proposed P-match and E-match perform better than the others. Specifically, E-match is the best since it only needs to verify the polynomial against several possible results, which is quite simple for modern smartphones.

In Fig. 6b, we show how the execution time of the different schemes changes. Generally, the execution time is closely tied to the online computation time, and it increases with $m$. Not surprisingly, E-match performs much better than the others.

Next, Fig. 6c shows the online computation cost on the Nexus S for varying $\kappa$ when $m$ is set to 100. Similar to the results for the online computation cost on the laptop, the performance of [20] and [24] is unacceptable for mobile users, since they need several minutes to finish the matching phase. Our P-match is not affected by the changing $\kappa$; however, it still needs 15.67 seconds to do the matching. E-match, in contrast, is lightweight and consumes only 103.6 milliseconds to complete the same phase when $\kappa=10$.

In Fig. 6d, varying $\kappa$ has less effect on the execution time on the smartphones across the different schemes. E-match significantly outperforms the others. For instance, when $\kappa=10$, the execution times of [20] and [24] are 584.46 and 757.60 seconds, respectively, and those of our P-match and P-match+ are 17.41 and 23.47 seconds, respectively. E-match needs only 186.48 milliseconds to complete the matching, which is highly efficient.
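The gap between the schemes is easier to appreciate as a ratio. The short sketch below simply divides the execution times reported above for the Nexus S ($m=100$, $\kappa=10$); the dictionary and the helper `speedup` are illustrative names, and no measurement beyond the quoted figures is assumed.

```python
# Execution times on the Nexus S at m = 100, kappa = 10, as reported in the text (seconds).
exec_time_s = {
    "[20]": 584.46,
    "[24]": 757.60,
    "P-match": 17.41,
    "P-match+": 23.47,
    "E-match": 0.18648,  # 186.48 milliseconds
}


def speedup(baseline: str, scheme: str) -> float:
    """How many times faster `scheme` is than `baseline`."""
    return exec_time_s[baseline] / exec_time_s[scheme]


for scheme in ("P-match", "P-match+", "E-match"):
    print(
        f"{scheme}: {speedup('[20]', scheme):.1f}x faster than [20], "
        f"{speedup('[24]', scheme):.1f}x faster than [24]"
    )
```

With these figures, P-match is a few tens of times faster than the schemes in [20, 24], while E-match is faster by more than three orders of magnitude.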

Figure 6: Impact of the number of attributes $m$ on (a) online computation cost; (b) the execution time, $\kappa=10$; impact of the priorities $\kappa$ on (c) online computation cost; (d) the execution time, $m=100$

6.5 Energy Consumption

We also compute and compare the energy consumption of our schemes with that of the others. The most energy-consuming operations on modern smartphones are local computation, display and network transmission [25]. Since our work does not use heavy graphics, we focus on two main factors: the local computation cost and the network transmission cost. We use the energy consumption model [26] $E_{computing}=P_{comp}\cdot T_{comp}+0.3167\,T_{run}$ to estimate the local computation cost, where $P_{comp}$ represents the CPU's power consumption, $T_{comp}$ is the time spent on computation and $T_{run}$ is the total protocol run time. For a smartphone with a 1 GHz CPU, we choose $P_{comp}\approx 0.38$ W [25]. The energy consumption model of the network transmission cost is based on [27]: $E_{network}=n_{t}\cdot E_{t}+n_{r}\cdot E_{r}$, where $n_{t}$ and $n_{r}$ are the amounts of transmitted and received data in bytes, $E_{t}\approx 4.8\,\mu$J is the transmitting energy per byte and $E_{r}\approx 6.7\,\mu$J is the receiving energy per byte. For simplicity, we omit the initial connection establishment energy since it is common to all schemes. Our energy consumption model can then be written as:

\begin{align}
E &= E_{computing} + E_{network} \tag{24} \\
  &= P_{comp} T_{comp} + 0.3167\, T_{run} + n_{t} E_{t} + n_{r} E_{r}. \notag
\end{align}
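As a minimal sketch, Eq. (24) can be evaluated as follows. The constants are the ones quoted in the text; the function name `energy_joules` and the input values in the usage example are hypothetical and are not measurements from our experiments.

```python
P_COMP_W = 0.38       # CPU power consumption for a 1 GHz smartphone (watts)
RUN_COEFF_W = 0.3167  # coefficient applied to the total protocol run time
E_T_J = 4.8e-6        # transmitting energy per byte (joules)
E_R_J = 6.7e-6        # receiving energy per byte (joules)


def energy_joules(t_comp_s: float, t_run_s: float, n_t_bytes: int, n_r_bytes: int) -> float:
    """Total energy E = E_computing + E_network from Eq. (24), in joules."""
    e_computing = P_COMP_W * t_comp_s + RUN_COEFF_W * t_run_s
    e_network = n_t_bytes * E_T_J + n_r_bytes * E_R_J
    return e_computing + e_network


# Hypothetical inputs chosen only to exercise the formula:
# 0.1 s of computation, 0.2 s total run time, 1000 bytes sent, 500 bytes received.
print(f"{energy_joules(0.1, 0.2, 1000, 500):.5f} J")
```

Because the computation terms scale with seconds while the network terms scale with microjoules per byte, the computation terms dominate in this model unless the payload is very large.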

Fig. 7a compares the energy consumption on each side of the protocols. Our protocols clearly consume less energy than the other schemes. For example, when the number of attributes $m=100$ and $\kappa=10$, the initiators in the protocols [20, 24] consume 280.18 J and 178.00 J, respectively, while our P-match consumes only 9.81 J to achieve the same goal. Finally, our efficient version E-match reduces the energy consumption of both computation and network transmission; as a result, it consumes 55.60 mJ and 33.00 mJ on the initiator and the responder sides, respectively. We conclude that our protocols are very practical in terms of energy consumption.

Figure 7: Energy consumption: (a) $\kappa=10$; (b) $m=100$

Fig. 7b shows the impact of $\kappa$ on the energy consumption, with the number of attributes fixed to 100. Since the results are similar to those in Fig. 7a, we only point out that the energy consumption of E-match increases slowly with $\kappa$.

7 Conclusion

For the first time, we propose the Priority-aware Private Matching problem to satisfy the requirements of real social life. Compared with existing work, the matching processes of our P-match schemes consider both the number of common attributes and the corresponding priorities. To avoid possible attacks from both the initiator and the responder, we then construct a priority-aware Ochiai similarity function in our enhanced version. Finally, an efficient version called E-match is proposed to decrease the cost. The security analysis and performance evaluation show the correctness and efficiency of our algorithms. Our future work is to deploy P-match and E-match in a large-scale real mobile environment to test their performance.

Acknowledgment

The preliminary work was accepted by IEEE MASS 2013 [28]. This work was supported by the National Natural Science Foundation of China under Grant 61272457.

References

  • [1] E. Noah. (2011, Nov.) Mobile social networking shows promise, but rich media has higher engagement. [Online]. Available: http://www.emarketer.com/Articles
  • [2] N. Eagle and A. Pentland, “Social serendipity: mobilizing social software,” Pervasive Computing, IEEE, vol. 4, no. 2, pp. 28–34, Jan.-Mar. 2005.
  • [3] L. P. Cox, A. Dalton, and V. Marupadi, “Smokescreen: flexible privacy controls for presence-sharing,” in Proc. of ACM MobiSys 2007, pp. 233–245.
  • [4] J. Manweiler, R. Scudellari, and L. P. Cox, “Smile: encounter-based trust for mobile social services,” in Proc. of ACM CCS 2009, pp. 246–255.
  • [5] A.-K. Pietiläinen, E. Oliver, J. LeBrun, G. Varghese, and C. Diot, “Mobiclique: middleware for mobile social networking,” in ACM workshop on Online social networks 2009, pp. 49–54.
  • [6] W. Dong, V. Dave, L. Qiu, and Y. Zhang, “Secure friend discovery in mobile social networks,” in Proc. of IEEE INFOCOM 2011, pp. 1647–1655.
  • [7] R. Lu, X. Lin, X. Liang, and X. Shen, “A secure handshake scheme with symptoms-matching for mhealthcare social network,” Mobile Networks and Applications, vol. 16, no. 6, pp. 683–694, Dec. 2011.
  • [8] M. Li, N. Cao, S. Yu, and W. Lou, “Findu: Privacy-preserving personal profile matching in mobile social networks,” in Proc. of IEEE INFOCOM 2011, pp. 2435–2443.
  • [9] R. Zhang, Y. Zhang, J. Sun, and G. Yan, “Fine-grained private matching for proximity-based mobile social networking,” in Proc. of IEEE INFOCOM 2012, pp. 1969–1977.
  • [10] M. J. Freedman, K. Nissim, and B. Pinkas, “Efficient Private Matching and Set Intersection,” in Proc. of LNCS EUROCRYPT 2004, pp. 1–19.
  • [11] L. Kissner and D. Song, “Privacy-preserving set operations,” in Proc. of CRYPTO 2005, pp. 241–257.
  • [12] Y. Sang and H. Shen, “Efficient and secure protocols for privacy-preserving set operations,” ACM Trans. Inf. Syst. Secur., vol. 13, no. 1, pp. 9:1–9:35, Nov. 2009.
  • [13] G. Ateniese, E. De Cristofaro, and G. Tsudik, “(if) size matters: size-hiding private set intersection,” in Proc. of ACM PKC 2011.
  • [14] Z. Yang, B. Zhang, J. Dai, A. Champion, D. Xuan, and D. Li, “E-smalltalker: A distributed mobile system for social networking in physical proximity,” in Proc. of IEEE ICDCS 2010, pp. 468–477.
  • [15] X. Liang, X. Li, K. Zhang, R. Lu, X. Lin, and X. Shen, “Fully anonymous profile matching in mobile social networks,” Selected Areas in Communications, IEEE Journal on, to appear.
  • [16] J. Manweiler, R. Scudellari, Z. Cancio, and L. P. Cox, “We saw each other on the subway: secure, anonymous proximity-based missed connections,” in Proc. of ACM HotMobile, 2009.
  • [17] R. Agrawal, A. Evfimievski, and R. Srikant, “Information sharing across private databases,” in Proc. of ACM SIGMOD 2003, pp. 86–97.
  • [18] J. Vaidya and C. Clifton, “Secure set intersection cardinality with application to association rule mining,” J. Comput. Secur., vol. 13, no. 4, pp. 593–622, Jul. 2005.
  • [19] M. von Arb, M. Bader, M. Kuhn, and R. Wattenhofer, “Veneta: Serverless friend-of-friend detection in mobile social networking,” in Proc. of IEEE WIMOB,.
  • [20] E. D. Cristofaro, J. Kim, and G. Tsudik, “Linear-complexity private set intersection protocols secure in malicious model,” in Proc. of LNCS ASIACRYPT 2010, pp. 213–231.
  • [21] A. C. Yao, “Protocols for secure computations,” in Proc. of IEEE SFCS 1982.
  • [22] A. Ochiai, “Zoogeographical studies on the soleoid fishes found in Japan and its neighboring regions. II,” Bull. Jap. Soc. Sci. Fish., vol. 22, no. 9, pp. 526–530, 1957.
  • [23] A. H. Lipkus, “A proof of the triangle inequality for the Tanimoto distance,” J. Math. Chem., vol. 26, pp. 263–265, 1999.
  • [24] S. Rane, W. Sun, and A. Vetro, “Privacy-preserving approximation of l1 distance for multimedia applications,” in Proc. of IEEE ICME 2010, pp. 492–497.
  • [25] R. Mittal, A. Kansal, and R. Chandra, “Empowering developers to estimate app energy consumption,” in Proc. of ACM Mobicom 2012, pp. 317–328.
  • [26] A. Carroll and G. Heiser, “An analysis of power consumption in a smartphone,” in Proc. of ACM USENIXATC 2010, pp. 21–21.
  • [27] A. Rahmati and L. Zhong, “Context-for-wireless: context-sensitive energy-efficient wireless data transfer,” in Proc. of ACM MobiSys 2007, pp. 165–178.
  • [28] B. Niu, X. Zhu, T. Zhang, H. Chi, and H. Li, “P-match: Priority-aware friend discovery for proximity-based mobile social networks,” in Proc. of IEEE MASS 2013.