
Differentially private sliced inverse regression in the federated paradigm

Shuaida He Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, China. Jiarui Zhang Co-first author. Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, China. Xin Chen Corresponding author. Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen 518055, China; Email: chenx8@sustech.edu.cn
Abstract

Sliced inverse regression (SIR), which includes linear discriminant analysis (LDA) as a special case, is a popular and powerful dimension reduction tool. In this article, we extend SIR to address the challenges of decentralized data, prioritizing privacy and communication efficiency. Our approach, named federated sliced inverse regression (FSIR), facilitates collaborative estimation of the sufficient dimension reduction subspace among multiple clients, sharing only local estimates to protect sensitive datasets from exposure. To guard against potential adversarial attacks, FSIR further employs diverse perturbation strategies, including a novel vectorized Gaussian mechanism that guarantees differential privacy at a low cost of statistical accuracy. Additionally, FSIR naturally incorporates a collaborative variable screening step, enabling effective handling of high-dimensional client data. Theoretical properties of FSIR are established for both low-dimensional and high-dimensional settings, supported by extensive numerical experiments and real data analysis.

1 Introduction

In recent years, preserving data ownership and privacy in statistical data analysis has gained significant importance. In numerous applications, records carrying sensitive personal information are sourced from diverse origins, posing challenges in aggregating them into a unified dataset for joint analysis. This obstacle arises partly from well-known transmission restrictions, such as the high communication cost between a central server and local sites, but it is privacy concerns that intensify it most. A typical example highlighting this challenge is the collection of patient-level observations from different clinical sites (Duan et al., 2022; Tong et al., 2022; Liu et al., 2022).

Efficiently extracting valid information from decentralized data while preserving data privacy is therefore critical. Federated learning (McMahan et al., 2017; Li et al., 2020), a widely adopted distributed computing framework, has emerged as a promising approach that emphasizes privacy protection by aggregating partial estimates from different clients to yield a centralized model. However, while this computing paradigm preserves client data ownership, it does not directly guarantee record-level privacy, a crucial concern in scenarios lacking a trusted central server or a secure communication environment. In such cases, adversaries can access statistics calculated by local clients during the uploading procedure (Homer et al., 2008; Calandrino et al., 2011), posing risks of various privacy attacks, including re-identification, reconstruction, and tracing attacks (Bun et al., 2014; Dwork et al., 2015; Kamath et al., 2019), among others.

The tracing attack, also known as the membership inference attack, is particularly insidious among these attacks, as it attempts to identify whether a target individual is a member of a given dataset. While this may appear to be a subtle privacy breach, extensive research has demonstrated that even the presence of a single record in a specific dataset can be highly sensitive information (Dwork et al., 2017). This type of information is closely connected to the concept of differential privacy (Dwork et al., 2006), a rigorous definition of privacy widely adopted in both academia (Dwork and Lei, 2009; Wasserman and Zhou, 2010; Avella-Medina, 2021) and real-world applications (Erlingsson et al., 2014; Apple, 2017; Ding et al., 2017; Drechsler, 2023). To be specific, let \mathcal{X} be the sample space and \mathcal{D}\in\mathcal{X}^{n} be a dataset of n records. A randomized algorithm or mechanism \mathcal{M}:\mathcal{X}^{n}\rightarrow\mathcal{T} guarantees (\varepsilon,\delta)-differential privacy if

P((𝒟)𝒮)eεP((𝒟)𝒮)+δP\left(\mathcal{M}(\mathcal{D})\in\mathcal{S}\right)\leq e^{\varepsilon}P\left(\mathcal{M}\left(\mathcal{D}^{\prime}\right)\in\mathcal{S}\right)+\delta (1)

for all measurable outputs 𝒮𝒯\mathcal{S}\subseteq\mathcal{T} and all datasets 𝒟,𝒟\mathcal{D},\mathcal{D}^{\prime} that differ in a single record. The parameter set (δ,ϵ)(\delta,\epsilon) controls the level of privacy. Intuitively, (𝒟)\mathcal{M}(\mathcal{D}) captures the global characteristics of 𝒟\mathcal{D}, and the goal of privacy is to simultaneously protect every record in 𝒟\mathcal{D} while releasing (𝒟)\mathcal{M}(\mathcal{D}). A common strategy to construct a differentially private estimator is perturbing its non-private counterpart by random noises (Dwork et al., 2014). Along this line, a variety of fundamental data analyses, such as mean estimation (Kamath et al., 2019), covariance estimation (Biswas et al., 2020), linear regression (Talwar et al., 2015), have been revisited in the context of (ϵ,δ\epsilon,\delta)-differential privacy protection. In particular, Cai et al. (2021) proposed a sharp tracing attack to establish minimax lower bounds for differentially private mean estimation and linear regression.

This paper proposes a federated computation framework for estimating the subspace pursued by a class of supervised dimension reduction methods, termed sufficient dimension reduction (SDR, (Cook, 1994)), with the guarantee of differential privacy. Briefly speaking, SDR seeks to replace the original predictors X=(X1,,Xp)TX=(X_{1},\dots,X_{p})^{{\mathrm{\scriptscriptstyle T}}} in a regression problem with a minimal set of their linear combinations without loss of information for predicting the response YY. Mathematically, this can be expressed as

YX|βTX,Y{\perp\!\!\!\perp}X|\beta^{{\mathrm{\scriptscriptstyle T}}}X, (2)

where βp×d(d<p)\beta\in\mathbb{R}^{p\times d}\ (d<p) and {\perp\!\!\!\perp} indicates independence. The subspace spanned by β\beta, denoted by 𝒮Y|Xspan(β)\mathcal{S}_{Y|X}\triangleq\operatorname{span}(\beta), is the parameter of interest and known as the sufficient dimension reduction subspace, or SDR subspace (Cook, 1996, 2009). Here d=dim(𝒮Y|X)d=\dim(\mathcal{S}_{Y|X}) is often called the structure dimension. To illustrate the potential risk of privacy breaches in estimating 𝒮Y|X\mathcal{S}_{Y|X}, a motivating example is provided.

Example 1 (A tracing attack on linear discriminant analysis).

We consider the Body Fat Prediction Dataset (Penrose et al., 1985), which includes a response variable BodyFat and 13 covariates, such as Age, Weight, and Neck circumference. By constructing a binary response Y={\mathrm{1}}(\operatorname{BodyFat}>18) to indicate whether a man is overweight or not, we can perform linear discriminant analysis (LDA). Notice that pr(Y=1)\approx 0.5 for this dataset. To simplify the problem, we transform the covariate X to meet the assumption X\sim N(0,I_{p}), allowing us to obtain the discriminant direction by

β=E[X{1(Y=1)1(Y=0)}].\beta=E[X\{{\mathrm{1}}(Y=1)-{\mathrm{1}}(Y=0)\}].

Given the sensitive information contained in each record, it is desirable to estimate β\beta in a differentially private manner. Unfortunately, directly releasing the sample estimate of β\beta would compromise individual privacy. To demonstrate this, we propose a tracing attack:

𝒜β(z0,β^(𝒟))=x0{1(y0=1)1(y0=0)}β,β^(𝒟),\mathcal{A}_{\beta}(z_{0},\widehat{\beta}(\mathcal{D}))=\langle x_{0}\{{\mathrm{1}}(y_{0}=1)-{\mathrm{1}}(y_{0}=0)\}-\beta,\widehat{\beta}(\mathcal{D})\rangle,

where 𝒟={zi:zi(xi,yi)}i=1n\mathcal{D}=\{z_{i}:z_{i}\triangleq(x_{i},y_{i})\}_{i=1}^{n} denotes the dataset for computing β^(𝒟)\widehat{\beta}(\mathcal{D}), β^(𝒟)i=1nxi{1(yi=1)1(yi=0)}/n\widehat{\beta}(\mathcal{D})\triangleq\sum_{i=1}^{n}x_{i}\{{\mathrm{1}}(y_{i}=1)-{\mathrm{1}}(y_{i}=0)\}/n, and z0(x0,y0)z_{0}\triangleq(x_{0},y_{0}) is a single record to be traced (i.e., to be identified z0𝒟z_{0}\in\mathcal{D} or not).

Clearly, \mathcal{A}_{\beta}(z_{0},\widehat{\beta}(\mathcal{D})) takes a large value if z_{0}\in\mathcal{D} and tends towards zero if z_{0}\notin\mathcal{D}. This observation inspires us to treat \mathcal{A}_{\beta}(\cdot,\widehat{\beta}(\mathcal{D})) as a binary classifier that outputs “in” if \mathcal{A}_{\beta}(z_{0},\widehat{\beta}(\mathcal{D}))>\tau and “out” otherwise, given a proper threshold \tau. To show the effectiveness of this classifier, we conduct the following experiment. First, since \beta is an unknown parameter, we replace it with x^{\prime}\{{\mathrm{1}}(y^{\prime}=1)-{\mathrm{1}}(y^{\prime}=0)\}, where (x^{\prime},y^{\prime}) is a random sample drawn from \mathcal{D}. We then randomly divide the remaining samples in \mathcal{D} into two disjoint parts of equal size, \mathcal{D}_{1} and \mathcal{D}_{2}. As only \mathcal{D}_{1} will be used to calculate \widehat{\beta}, we tag the samples in \mathcal{D}_{1} with the label “in” and those in \mathcal{D}_{2} with the label “out”. We then predict the label of each datum z_{0}\in\mathcal{D}_{1}\cup\mathcal{D}_{2} by computing \mathcal{A}_{\beta}(z_{0},\widehat{\beta}(\mathcal{D}_{1})). To evaluate the results, we plot the receiver operating characteristic (ROC) curve, which avoids specifying a particular threshold \tau. The red solid line in Figure 1 shows the result obtained using the raw \widehat{\beta}(\mathcal{D}_{1}); it encloses a relatively large area, indicating that \mathcal{A}_{\beta} is an effective classifier and is thus sharp in attacking \widehat{\beta}(\mathcal{D}_{1}). In contrast, the blue dashed line, representing the result obtained using a differentially private \widetilde{\beta}(\mathcal{D}_{1}) (see Section 3 for details), suggests that \mathcal{A}_{\beta} fails to accurately identify the label of z_{0} based on \widetilde{\beta}(\mathcal{D}_{1}), which means that \widetilde{\beta}(\mathcal{D}_{1}) preserves the privacy of \mathcal{D}_{1}.
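
To make the experiment concrete, a minimal Python sketch of the attack is given below; it uses synthetic Gaussian covariates in place of the (transformed) Body Fat records, and the function and variable names are ours for illustration only.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p = 400, 13
X = rng.standard_normal((n, p))                  # stand-in for the standardized covariates
y = rng.integers(0, 2, size=n)                   # stand-in for 1(BodyFat > 18)
sgn = np.where(y == 1, 1.0, -1.0)                # 1(y = 1) - 1(y = 0)

idx = rng.permutation(n)
ref, d1, d2 = idx[0], idx[1:n // 2], idx[n // 2:]
beta_proxy = X[ref] * sgn[ref]                   # proxy for the unknown beta
beta_hat = (X[d1] * sgn[d1, None]).mean(axis=0)  # hat{beta}(D_1)

def attack(x0, s0):
    # A_beta(z0, hat{beta}(D_1)) = <x0{1(y0=1) - 1(y0=0)} - beta, hat{beta}(D_1)>
    return float(np.dot(x0 * s0 - beta_proxy, beta_hat))

scores = [attack(X[i], sgn[i]) for i in np.concatenate([d1, d2])]
labels = [1] * len(d1) + [0] * len(d2)           # "in" = 1, "out" = 0
print("tracing attack AUC:", roc_auc_score(labels, scores))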

Figure 1: Tracing attack against the discriminant direction of LDA. The ROC curves depict the tracing results based on a naive estimate (red solid line) and a differentially private counterpart (blue dashed line).

The preceding example further reveals the risk of privacy leakage in releasing an unprotected estimate of the basis that spans the SDR subspace 𝒮Y|X\mathcal{S}_{Y|X}. The risk arises because the discriminant direction obtained from LDA is identical to the basis provided by the celebrated sliced inverse regression (SIR, (Li, 1991)), except for scaling (Chen and Li, 2001). In fact, LDA is a special case of SIR when the response is categorical. As SIR has played a pivotal role in sufficient dimension reduction, we are motivated to extend it for handling decentralized datasets containing sensitive information. Our approach, which we call Federated SIR (FSIR), has the following major contributions.

First, as the name implies, FSIR leverages a federated paradigm to integrate information from diverse clients to estimate the sufficient dimension reduction subspace 𝒮Y|X\mathcal{S}_{Y|X}. In contrast to existing distributed SDR methods that are mainly guided by the divide-and-conquer strategy to tackle massive data without considering privacy leakage during communication (Xu et al., 2022; Chen et al., 2022; Cui et al., 2023), our approach prioritizes both client-level (data ownership) and record-level (differential) privacy. While similar privacy considerations have been explored in the context of principal component analysis (PCA) (Chaudhuri et al., 2013; Jiang et al., 2016; Grammenos et al., 2020; Duchi et al., 2022), an unsupervised counterpart of sliced inverse regression, we are not aware of any literature exploring its extension to sufficient dimension reduction. Our work shows that by combining appropriate perturbation strategies, FSIR can guarantee (ϵ,δ)(\epsilon,\delta)-differential privacy at a low cost of statistical accuracy.

Second, we introduce a novel vectorized Gaussian mechanism designed to preserve the information necessary for accurately identifying \mathcal{S}_{Y|X} during the perturbation process. This mechanism allows flexible noise directions at the cost of a slightly higher variance. Using a specific algorithm, it generates multivariate noise whose eigenspace structure is similar to that of the signal. Experimental results demonstrate the superiority of this new mechanism over the commonly used i.i.d. Gaussian mechanism.

Third, FSIR can handle high-dimensional data efficiently on each client. A large body of literature has contributed to estimating 𝒮Y|X\mathcal{S}_{Y|X} on a unified high-dimensional dataset, including early work on penalized regression-based formulations of SDR methods by Li (2007), the widely-adopted coordinate-independent sparse estimation method (CISE) by Chen et al. (2010), and the general sparse SDR framework SEAS by Zeng et al. (2022), among others. In particular, extensive studies on high-dimensional sliced inverse regression have made notable progress in both methodological development and theoretical understanding (Lin et al., 2019; Tan et al., 2018; Lin et al., 2021; Tan et al., 2020). Despite these advances, an effective strategy for producing a sparse SIR on a decentralized dataset is still lacking. Specifically, when dealing with enormous client devices, many of which have limited computation resources, it is desirable to achieve sparsity with a minimal cost on the client-side while leveraging the benefits of federated computing. To achieve these goals, we propose a simple yet effective collaborative feature screening method that seamlessly integrates into the FSIR framework to yield a sparse estimator of the basis. Theoretical guarantees for this collaborative screening strategy will be discussed.

Moreover, FSIR is designed to be communication-efficient and computationally effective. Compared with multi-round optimization-based approaches (Cui et al., 2023), FSIR achieves communication efficiency through a one-shot aggregation of local estimates from collaborating clients, thereby avoiding the injection of noise at every optimization step (Abadi et al., 2016). In addition, the primary computation cost on each client is attributed to performing singular value decomposition on a small matrix, leading to computational effectiveness. These characteristics make FSIR well-suited for applications on edge devices, such as smartphones, where efficient and privacy-preserving computations are crucial.

The following notation will be used throughout the paper. For a positive integer H, we write [H] as shorthand for \{1,\dots,H\}. For a vector \beta\in\mathbb{R}^{p} and a subset \mathcal{I}\subseteq[p], we use \beta^{\mathcal{I}} to denote the restriction of \beta to the index set \mathcal{I}. Similarly, for \mathcal{I},\mathcal{J}\subseteq[p], A^{\mathcal{I},\mathcal{J}} denotes the |\mathcal{I}|\times|\mathcal{J}| sub-matrix formed by restricting the rows of A to \mathcal{I} and the columns to \mathcal{J}. In addition, for a sub-matrix B=A^{\mathcal{I},\mathcal{J}}, let e(B) be the matrix embedded into \mathbb{R}^{p\times p} by putting 0 on entries outside \mathcal{I}\times\mathcal{J}. We write \operatorname{supp}(\beta):=\{j\in[p]:\beta_{j}\neq 0\}. For two vectors \beta_{1} and \beta_{2} of the same dimension, denote the angle between them by \angle(\beta_{1},\beta_{2}). For x\in\mathbb{R} and R>0, let \Pi_{R}(x) denote the projection of x onto the closed interval [-R,R]. Define a multi-set as an ordered pair (\mathcal{B},m), where \mathcal{B} is the underlying set formed of distinct elements and m:\mathcal{B}\rightarrow\mathbb{Z}^{+} is the function giving the multiplicity of an element in \mathcal{B}. For two multi-sets \mathcal{B}_{1} and \mathcal{B}_{2}, denote their multi-set sum by \mathcal{B}_{1}\biguplus\mathcal{B}_{2}. For a parameter \theta\in\Theta, \widehat{\theta} denotes a raw estimator and \widetilde{\theta} denotes a differentially private estimator.

2 Problem Formulation

2.1 Preliminaries

We revisit the concept of differential privacy before moving on. Intuitively, the privacy level set (\epsilon,\delta) in definition (1) measures how much information \mathcal{M}(\cdot) reveals about any individual record in the dataset \mathcal{D}, where a smaller value of \epsilon or \delta imposes a more stringent privacy constraint. This definition leads to some nice properties of differential privacy; the two most useful ones are listed below, as they provide a convenient way to construct complex differentially private algorithms.

Property 1 (Post-processing property, Dwork et al. (2006)).

If :𝒳nΘ0\mathcal{M}:\mathcal{X}^{n}\rightarrow\Theta_{0} is (ϵ,δ)(\epsilon,\delta)-differentially private and :Θ0Θ1\mathcal{M}^{\prime}:\Theta_{0}\rightarrow\Theta_{1} is any randomized algorithm, then \mathcal{M}^{\prime}\circ\mathcal{M} is (ϵ,δ)(\epsilon,\delta)-differentially private.

Property 2 (Composition property, Dwork et al. (2006)).

Let i\mathcal{M}_{i} be (ϵi,δi)(\epsilon_{i},\delta_{i})-differentially private for i=1,2i=1,2, then 12\mathcal{M}_{1}\circ\mathcal{M}_{2} is (ϵ1+ϵ2,δ1+δ2)(\epsilon_{1}+\epsilon_{2},\delta_{1}+\delta_{2})-differentially private.

A standard approach for developing differentially private algorithms involves introducing random noises to the output of their non-private counterparts, see (Dwork et al., 2014). One such technique, known as the Gaussian mechanism, employs independently and identically distributed (i.i.d.) Gaussian noise. The noise’s scale is governed by the l2l_{2} sensitivity of the algorithm, which quantifies the maximum change in the algorithm’s output resulting from substituting a single record in its input data set. The formal definition is presented below.

Definition 1 (l2l_{2}-sensitivity).

For an algorithm f:𝒟pf:\mathcal{D}\rightarrow\mathbb{R}^{p}, its l2l_{2}-sensitivity is defined as

Δ2(f)=sup𝒟,𝒟f(𝒟)f(𝒟)2,\Delta_{2}(f)=\sup_{\mathcal{D},\mathcal{D}^{\prime}}||f(\mathcal{D})-f(\mathcal{D}^{\prime})||_{2},

where 𝒟,𝒟𝒳n\mathcal{D},\mathcal{D}^{\prime}\in\mathcal{X}^{n} are datasets differing in a single record.

Definition 2 (Gaussian mechanism).

For any vector-valued algorithm f:𝒟pf:\mathcal{D}\rightarrow\mathbb{R}^{p} which satisfies Δ2(f)<\Delta_{2}(f)<\infty, the Gaussian mechanism is defined as

f~(𝒟)f(𝒟)+ξ,\widetilde{f}(\mathcal{D})\triangleq f(\mathcal{D})+\xi,

where \xi\triangleq(\xi_{1},\xi_{2},\dots,\xi_{p}) and the \xi_{j},\ j\in[p], are i.i.d. samples drawn from \mathcal{N}(0,\sigma_{\xi}^{2}).
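
As an illustration, the following Python sketch (a toy example of ours, not code from the paper) calibrates i.i.d. Gaussian noise to the l_{2}-sensitivity of a clipped sample mean; the clipping radius R and the resulting sensitivity 2R/n are assumptions made for this example.

import numpy as np

def gaussian_mechanism(v, sensitivity, eps, delta, rng=None):
    # i.i.d. Gaussian noise calibrated to the l2-sensitivity (Dwork et al., 2014)
    rng = np.random.default_rng() if rng is None else rng
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return v + rng.normal(0.0, sigma, size=v.shape)

# toy usage: privatize the mean of rows clipped to l2-norm at most R
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10))
R = 3.0
X_clip = X * np.minimum(1.0, R / np.linalg.norm(X, axis=1, keepdims=True))
mean_tilde = gaussian_mechanism(X_clip.mean(axis=0), sensitivity=2 * R / X.shape[0],
                                eps=1.0, delta=1e-5, rng=rng)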

Next we briefly review sliced inverse regression. Let Y\in\mathbb{R} be a response variable and X=(X_{1},\dots,X_{p})^{\mathrm{\scriptscriptstyle T}}\in\mathbb{R}^{p} be the associated covariates, with covariance matrix \Sigma={\mathrm{cov}}(X). Assume the SDR subspace \mathcal{S}_{Y|X}=\operatorname{span}(\beta), where \beta\in\mathbb{R}^{p\times d} is the basis matrix and d\ (d<p) is the structure dimension. SIR seeks an unbiased estimate of \mathcal{S}_{Y|X}. Without loss of generality, assume Y is categorical with H fixed classes, denoted as Y\in[H]; when Y is numerical, we simply follow Li (1991) to construct a discrete version \widetilde{Y} of Y by partitioning its range into H non-overlapping slices. This discretization will not lose information for estimating the SDR subspace, i.e., \mathcal{S}_{\widetilde{Y}|X}=\mathcal{S}_{Y|X}, provided H is sufficiently large (Bura and Cook, 2001), and we refer to a value h\in[H] as a slice or a class. In preparation, let p_{h}\triangleq\mathrm{pr}(Y=h) and m^{0}_{h}\triangleq E\{X-E(X)|Y=h\}; the kernel matrix is then given by

Λ0h=1Hphmh0mh0T.\Lambda^{0}\triangleq\sum_{h=1}^{H}p_{h}m^{0}_{h}m^{0{\mathrm{\scriptscriptstyle T}}}_{h}.

Li (1991) has shown that, if X satisfies the so-called linear conditional mean (LCM) condition, i.e., E(X\mid\beta^{\mathrm{\scriptscriptstyle T}}X) is a linear function of \beta^{\mathrm{\scriptscriptstyle T}}X, then \Sigma^{-1}m^{0}_{h}\in\mathcal{S}_{Y\mid X} for h\in[H]. This result implies that \Sigma^{-1}\operatorname{col}(\Lambda^{0})\subseteq\mathcal{S}_{Y|X}. In practice, we further assume the coverage condition, that is, \Sigma^{-1}\operatorname{col}(\Lambda^{0})=\mathcal{S}_{Y|X}, so the first d eigenvectors of \Sigma^{-1}\Lambda^{0} span \mathcal{S}_{Y|X} and the remaining p-d eigenvalues equal zero.

At the sample level, we can compute the plug-in estimators \widehat{\Sigma} and \operatorname{col}(\widehat{V}_{H}^{0}) to estimate \mathcal{S}_{Y|X}, where \widehat{V}_{H}^{0} is the matrix formed by the top d principal eigenvectors of \widehat{\Lambda}^{0}. Lin et al. (2018) have shown that the space spanned by \widehat{\Sigma}^{-1}\widehat{V}_{H}^{0} yields a consistent estimate of \mathcal{S}_{Y|X} if and only if \rho_{n}:=p/n\rightarrow 0 as n\rightarrow\infty, when the slice number H is a fixed integer independent of n and p and the structure dimension d is bounded.
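
For reference, a compact single-machine (non-private) implementation of this plug-in estimator might read as follows; slicing Y by its sample quantiles is one common choice and is an assumption of this sketch.

import numpy as np

def sir(X, y, H=10, d=1):
    # Plug-in SIR: top-d eigenvectors of hat{Lambda}^0, premultiplied by hat{Sigma}^{-1}
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma_hat = Xc.T @ Xc / n
    edges = np.quantile(y, np.linspace(0, 1, H + 1))            # slice boundaries
    labels = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, H - 1)
    Lambda_hat = np.zeros((p, p))
    for h in range(H):
        mask = labels == h
        if not mask.any():
            continue
        m_h = Xc[mask].mean(axis=0)                             # slice mean m_h^0
        Lambda_hat += mask.mean() * np.outer(m_h, m_h)          # p_h * m_h^0 m_h^{0T}
    vals, vecs = np.linalg.eigh(Lambda_hat)
    V_hat = vecs[:, np.argsort(-vals)[:d]]                      # top-d eigenvectors of hat{Lambda}^0
    return np.linalg.solve(Sigma_hat, V_hat)                    # columns span the estimate of S_{Y|X}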

2.2 Computing SIR in a federated paradigm

Consider a dataset 𝒟{𝒟(1),,𝒟(K)}\mathcal{D}\triangleq\{\mathcal{D}^{(1)},\cdots,\mathcal{D}^{(K)}\} distributed across KK clients, where 𝒟(k)={(xi,yi):xip,yi[H]}i=1nk\mathcal{D}^{(k)}=\{(x_{i},y_{i}):x_{i}\in\mathbb{R}^{p},y_{i}\in[H]\}_{i=1}^{n_{k}} is the subset stored on the kkth client with nkn_{k} samples, k[K]k\in[K]. Let N=k=1KnkN=\sum_{k=1}^{K}n_{k} be the total size of 𝒟\mathcal{D}. Generally, we do not require nk>pn_{k}>p for any single client k[K]k\in[K], thus on some or even all clients, data may follow a high-dimensional setting (p>nkp>n_{k}, or pnkp\gg n_{k}). Meanwhile, given KK sufficiently large, it is also reasonable to assume a low-dimensional setting for the whole dataset 𝒟\mathcal{D}, that is, p<Np<N.

To handle this decentralized dataset, we need to modify the classical SIR method slightly. Denoting m_{h}\triangleq E[\{X-E(X)\}{\mathrm{1}}(Y=h)], we immediately have \Sigma^{-1}m_{h}\in\mathcal{S}_{Y\mid X} for h\in[H], since m_{h}=p_{h}m^{0}_{h}. For convenience of expression, define the slice mean matrix M\triangleq(m_{1},\dots,m_{H})\in\mathbb{R}^{p\times H} and the kernel matrix \Lambda\triangleq MM^{{\mathrm{\scriptscriptstyle T}}}, then

Σ1col(Λ)𝒮Y|X.\Sigma^{-1}\operatorname{col}(\Lambda)\subseteq\mathcal{S}_{Y|X}. (3)

Using Λ\Lambda instead of Λ0\Lambda^{0} is beneficial for our federated computation. The first advantage arises in scenarios with imbalanced classes, where a client only has a small portion of observations for certain classes. Estimating Λ0\Lambda^{0} in such cases can be highly unstable, resulting in a biased global estimator. This situation is frequently encountered in streaming data, as discussed by Cai et al. (2020) in developing their online SIR estimator. In our problem, utilizing Λ\Lambda further allows us to eliminate privacy leakage in uploading p^h\widehat{p}_{h}’s, thus avoiding unnecessary complexities in both algorithm design and theoretical analysis. Additionally, as detailed in Section 3, Λ\Lambda has a smaller l2l_{2}-sensitivity, reducing the noise scale required for perturbation strategies.

Data: 𝒟=k=1K𝒟(k)\mathcal{D}=\bigcup_{k=1}^{K}\mathcal{D}^{(k)}
Input: Clients number KK, privacy level sets (ϵx,δx)(\epsilon_{x},\delta_{x}) and (ϵm,δm)(\epsilon_{m},\delta_{m})
.
Client k[K]k\in[K] do in parallel
       1.if p>nkp>n_{k} then
             /*An alternative step to tackle high-dimensional data.*/
             Estimate local active variable set 𝒜kCCMDFilter\mathcal{A}^{k}\leftarrow\operatorname{CCMD-Filter} (See Algorithm 3 in Section 3.3);
             Upload 𝒜(k)\mathcal{A}^{(k)} and pull 𝒜\mathcal{A} from Server;
             X(k)X𝒜(k)X^{(k)}\leftarrow X^{(k)}_{\mathcal{A}};
            
       end if
      2. Upload the private slice mean matrix M~(k)\widetilde{M}^{(k)} (See Algorithm 2 in Section 3.2);
       3. Upload the private covariance matrix Σ~(k)\widetilde{\Sigma}^{(k)} as well as the sample size nkn_{k};
      
end
Server  do
       1. if 𝒜(k)\mathcal{A}^{(k)}\neq\emptyset then
             /*An alternative step to tackle high-dimensional data.*/
             Update 𝒜CCMDFilter\mathcal{A}\leftarrow\operatorname{CCMD-Filter} (See Algorithm 3 in Section 3.3);
            
       end if
      2. Merge Σ~=k=1KnkΣ~(k)/k=1Knk\widetilde{\Sigma}=\sum_{k=1}^{K}n_{k}\widetilde{\Sigma}^{(k)}/\sum_{k=1}^{K}n_{k};
       3. Merge M~=k=1KnkM~(k)/k=1Knk\widetilde{M}=\sum_{k=1}^{K}n_{k}\widetilde{M}^{(k)}/\sum_{k=1}^{K}n_{k} to obtain (U~,S~,V~)SVD(M~)(\widetilde{U},\widetilde{S},\widetilde{V})\leftarrow\operatorname{SVD}(\widetilde{M}) ;
       4. Return the global estimate β~=Σ~1U~\widetilde{\beta}=\widetilde{\Sigma}^{-1}\widetilde{U}.
      
end
Algorithm 1 The FSIR framework.

Algorithm 1 presents a federated framework for computing SIR on the decentralized dataset \mathcal{D}, giving a first glimpse of our FSIR method. In this framework, each client k\in[K] operates independently to estimate the slice mean matrix \widehat{M}^{(k)} and the covariance matrix \widehat{\Sigma}^{(k)} based on its local dataset \mathcal{D}^{(k)}, without needing to access external data. Subsequently, these two matrices are uploaded to the central server after a perturbation operation that ensures differential privacy (see details later). A global estimate of the SDR subspace is obtained on the server in Step 4, achieved through the merging of \widetilde{M}^{(k)} and \widetilde{\Sigma}^{(k)} for all k\in[K], where \widetilde{M}^{(k)} and \widetilde{\Sigma}^{(k)} denote the perturbed matrices.
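
The server-side aggregation (Steps 2-4 of Algorithm 1) amounts to a weighted average followed by one singular value decomposition. A minimal sketch, assuming each client has already uploaded the triple (n_{k}, \widetilde{\Sigma}^{(k)}, \widetilde{M}^{(k)}), is given below; the function name is ours.

import numpy as np

def fsir_server_merge(uploads, d):
    # uploads: list of (n_k, Sigma_tilde_k, M_tilde_k) tuples from the clients
    N = sum(n_k for n_k, _, _ in uploads)
    Sigma_tilde = sum(n_k * S_k for n_k, S_k, _ in uploads) / N      # Step 2
    M_tilde = sum(n_k * M_k for n_k, _, M_k in uploads) / N          # Step 3
    U, _, _ = np.linalg.svd(M_tilde, full_matrices=False)
    return np.linalg.solve(Sigma_tilde, U[:, :d])                    # Step 4: beta_tilde = Sigma_tilde^{-1} U_d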

At first glance, the pooling and averaging operations at Steps 2 and 3 on the server might suggest that FSIR is merely another application of the FedAvg algorithm (McMahan et al., 2017). However, FSIR distinguishes itself by emphasizing one-shot communication and, more importantly, privacy protection. Throughout the procedure, each client only computes and uploads local model estimates to the server, ensuring that sensitive records remain local to their owner. In this way, FSIR ensures that the central server or any other client j can probe the dataset of client k only through its uploaded estimates. Meanwhile, acknowledging the existence of potential malicious participants who may conduct tracing attacks on the shared estimates, FSIR requires both \widetilde{M}^{(k)} and \widetilde{\Sigma}^{(k)} to be differentially private. This requirement ensures that any operation performed on these estimates does not cause additional privacy breaches, thanks to the post-processing property of differential privacy.

An essential step in Algorithm 1 involves the construction of differentially private \widetilde{\Sigma}^{(k)} and \widetilde{M}^{(k)} for each client k. Privacy-preserving estimation of covariance matrices has been extensively studied, as evidenced by a large body of literature (Chaudhuri et al., 2012; Grammenos et al., 2020; Biswas et al., 2020; Wang and Xu, 2021). While it is not the primary focus of this paper, we simply adapt the approach suggested by Chaudhuri et al. (2012) within our framework, corresponding to Step 3 on the client; the privacy of this operation is guaranteed by Lemma 1. In the next section, our attention shifts towards the development of a differentially private slice mean matrix, whose singular subspace plays a crucial role in estimating \mathcal{S}_{Y|X}.

Lemma 1.

(Chaudhuri et al., 2012) Let Xp×nX\in\mathbb{R}^{p\times n} be the centralized design matrix on the client with Xi1\|X_{i}\|\leq 1 for all i[n]i\in[n]. Denote Σ^=1nXXT\widehat{\Sigma}=\frac{1}{n}XX^{{\mathrm{\scriptscriptstyle T}}}. Let Ap×pA\in\mathbb{R}^{p\times p} be a symmetric random matrix whose entries are i.i.d. drawn from N(0,σx2)N(0,\sigma_{x}^{2}), where

σx=p+1nϵx{2log(p2+p22πδx)}1/2+1nϵx1/2.\sigma_{x}=\frac{p+1}{n\epsilon_{x}}\left\{2\log\left(\frac{p^{2}+p}{2\sqrt{2\pi}\delta_{x}}\right)\right\}^{1/2}+\frac{1}{n{\epsilon_{x}}^{1/2}}.

Then Σ~Σ^+A\widetilde{\Sigma}\triangleq\widehat{\Sigma}+A is (ϵx,δx)(\epsilon_{x},\delta_{x})-differentially private.

Notice that the condition \|X_{i}\|\leq 1 is not commonly imposed in the sufficient dimension reduction literature. We address this by devising a strategy to perturb \widehat{\Sigma}/c_{R}^{2}=\frac{1}{n}(X/c_{R})(X/c_{R})^{\top} in accordance with the magnitudes suggested in Lemma 1. Here, c_{R} denotes the maximum norm {\mathrm{max}}_{i}\|X_{i}\|, which is controlled by the truncation level R (see Section 3.1). Notably, this scaling transformation has no impact on the estimation of \mathcal{S}_{Y|X}.
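
For completeness, the sketch below shows one way to implement the covariance perturbation of Lemma 1 combined with the c_{R}-rescaling just described; it is our illustrative reading rather than released code, and the rows of X are taken as (centered) observations.

import numpy as np

def private_covariance(X, eps_x, delta_x, rng=None):
    # Perturb the sample covariance following Lemma 1 (Chaudhuri et al., 2012),
    # after rescaling so that max_i ||X_i|| <= 1; X is n x p.
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    c_R = np.linalg.norm(X, axis=1).max()          # controlled by the truncation level R
    Xs = X / c_R
    Sigma_scaled = Xs.T @ Xs / n                   # hat{Sigma} / c_R^2
    sigma_x = (p + 1) / (n * eps_x) * np.sqrt(
        2 * np.log((p ** 2 + p) / (2 * np.sqrt(2 * np.pi) * delta_x))
    ) + 1.0 / (n * np.sqrt(eps_x))
    A = rng.normal(0.0, sigma_x, size=(p, p))
    A = np.triu(A) + np.triu(A, 1).T               # symmetric noise matrix
    return c_R ** 2 * (Sigma_scaled + A)           # undo the scaling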

3 Estimation with differential privacy

3.1 Private slice mean matrix: a preliminary way

On the client side, a natural approach for constructing a differentially private slice mean matrix is to apply the i.i.d. Gaussian mechanism to the mean vector of each slice; see Proposition 2, which can be easily derived from Theorem A.1 in Dwork et al. (2014).

Proposition 2.

Let \{(X_{i}^{\mathrm{\scriptscriptstyle T}},y_{i})^{{\mathrm{\scriptscriptstyle T}}}\in\mathbb{R}^{p+1},i\in[n]\} be i.i.d. samples on a client. Denote \widehat{m}_{h}\triangleq\frac{1}{n}\sum_{i=1}^{n}X_{i}{\mathrm{1}}(y_{i}=h) and assume its l_{2}-sensitivity \Delta_{2}<\infty. Define \widetilde{m}_{h}\triangleq\widehat{m}_{h}+(\xi_{1},\xi_{2},\dots,\xi_{p})^{{\mathrm{\scriptscriptstyle T}}}\in\mathbb{R}^{p}, where \xi_{j}\sim N(0,\frac{2\Delta_{2}^{2}\log(1.25/\delta)}{\epsilon^{2}}); then \widetilde{M}\triangleq(\widetilde{m}_{1},\dots,\widetilde{m}_{H})\in\mathbb{R}^{p\times H} is (\epsilon,\delta)-differentially private.

Clearly, the noise scale is determined by both the privacy level set (\epsilon,\delta) and the l_{2}-sensitivity \Delta_{2}. Recall that sliced inverse regression usually assumes that X is integrable and has an elliptical distribution to satisfy the LCM condition, which does not guarantee \Delta_{2}<\infty in general. Therefore, we truncate X to restrict it to a finite support. Given the truncation level R, let

m¯h,j=1ni=1nΠR(Xij)1(yi=h),j[p].\overline{m}_{h,j}=\frac{1}{n}\sum_{i=1}^{n}\Pi_{R}(X_{ij}){\mathrm{1}}(y_{i}=h),j\in[p].

Denote \overline{m}_{h}=(\overline{m}_{h,1},\dots,\overline{m}_{h,p})^{{\mathrm{\scriptscriptstyle T}}} and \overline{M}\triangleq(\overline{m}_{1},\dots,\overline{m}_{H})\in\mathbb{R}^{p\times H}. We then utilize \overline{M} rather than \widehat{M} to meet the assumption \Delta_{2}<\infty. Since a sample can only be located in one slice, we easily obtain \Delta_{2}=2R\sqrt{p}/n over a pair of datasets that differ in a single record, where n is the client sample size.

Differential privacy is always guaranteed at the expense of estimation accuracy (Wasserman and Zhou, 2010; Bassily et al., 2014). In real applications, a client could have extremely limited observations, causing the scale of the added noise to be significantly larger than that of the signal. To address this, we suggest setting an upper bound \sigma_{0}^{2} as a tolerated noise scale and calculating the minimal sample size

nϵ,δ,p,R=2R2plog(1.25/δ)σ0ϵn_{\epsilon,\delta,p,R}=\frac{2R\sqrt{2p\log(1.25/\delta)}}{\sigma_{0}\epsilon}

required by each client. Thus, for client k\in[K], we compute and upload \widetilde{M}^{(k)} following Proposition 2 only if n_{k}\geq n_{\epsilon,\delta,p,R}.
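
A client-side sketch that puts this subsection together, forming the truncated slice mean matrix \overline{M} and applying the i.i.d. Gaussian mechanism of Proposition 2 with the minimal-sample-size check, might look as follows; slice labels are assumed to be coded as 0, ..., H-1, and the function name is ours.

import numpy as np

def private_slice_means_iid(X, y, H, eps, delta, R, sigma0=None, rng=None):
    # Truncate X at level R, form bar{M}, and perturb it following Proposition 2.
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    if sigma0 is not None:
        n_min = 2 * R * np.sqrt(2 * p * np.log(1.25 / delta)) / (sigma0 * eps)
        if n < n_min:
            return None                            # too few samples: the client does not upload
    X_trunc = np.clip(X, -R, R)                    # coordinate-wise projection Pi_R
    M_bar = np.stack([(X_trunc * (y == h)[:, None]).sum(axis=0) / n
                      for h in range(H)], axis=1)  # p x H matrix bar{M}
    Delta2 = 2 * R * np.sqrt(p) / n                # l2-sensitivity over neighboring datasets
    sigma = Delta2 * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return M_bar + rng.normal(0.0, sigma, size=(p, H))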

Remark.

The truncation operation is widely adopted in both theoretical studies and practical applications; we refer to Lei (2011) and Cai et al. (2021) for more details on choosing a proper truncation level. As for the selection of the privacy levels, we adopt the convention \epsilon=O(1) and \delta=o(1/n), which is usually considered the most permissive setting that still provides nontrivial privacy protection (Dwork et al., 2014).

3.2 Private slice mean matrix: singular space oriented

The vanilla Gaussian mechanism used in the previous section is intuitive and simple, yet it has a significant limitation: it does not account for the impact of the perturbation on the left singular space of \overline{M}, which is crucial for estimating \mathcal{S}_{Y|X}, as revealed by (3). To mitigate this, we propose a novel perturbation strategy, termed the vectorized Gaussian (VGM) mechanism, to provide (\epsilon,\delta)-differential privacy for \overline{M} without sacrificing too much of the information we need. We start by giving its definition.

Definition 3 (Vectorized Gaussian mechanism).

For any vector-valued algorithm f:𝒟pf:\mathcal{D}\rightarrow\mathbb{R}^{p} which satisfies Δ2(f)<\Delta_{2}(f)<\infty, define the vectorized Gaussian mechanism as

f~(𝒟)f(𝒟)+ξ,\widetilde{f}(\mathcal{D})\triangleq f(\mathcal{D})+\xi,

where ξ𝒩p(0,Σξ)\xi\sim\mathcal{N}_{p}(0,\Sigma_{\xi}).

Obviously, the noise vector \xi is characterized by its covariance matrix \Sigma_{\xi}. In particular, if \Sigma_{\xi}=\{2\Delta_{2}^{2}\log(1.25/\delta)/\epsilon^{2}\}I_{p}, the VGM mechanism coincides with the i.i.d. Gaussian mechanism. Theorem 3 provides a sufficient condition for the VGM mechanism to achieve (\epsilon,\delta)-differential privacy.

Theorem 3.

Let {(XiT,yi)Tp+1,i[n]}\{(X_{i}^{\mathrm{\scriptscriptstyle T}},y_{i})^{{\mathrm{\scriptscriptstyle T}}}\in\mathbb{R}^{p+1},i\in[n]\} be i.i.d. samples on a client. Denote m^h1ni=1nXi1(yi=h)\widehat{m}_{h}\triangleq\frac{1}{n}\sum_{i=1}^{n}X_{i}{\mathrm{1}}(y_{i}=h) and assume its l2l_{2}-sensitivity Δ2<\Delta_{2}<\infty. Define m~hm^h+ξp\widetilde{m}_{h}\triangleq\widehat{m}_{h}+\xi\in\mathbb{R}^{p}, where ξ𝒩p(0,Σξ)\xi\sim\mathcal{N}_{p}(0,\Sigma_{\xi}). Suppose the eigenvalue decomposition Σξ=UξVξUξT\Sigma_{\xi}=U_{\xi}V_{\xi}U_{\xi}^{{\mathrm{\scriptscriptstyle T}}}. Then M~(m~1,,m~H)p×H\widetilde{M}\triangleq(\widetilde{m}_{1},\dots,\widetilde{m}_{H})\in\mathbb{R}^{p\times H} is (ϵ,δ)(\epsilon,\delta)-differentially private, if Σξ\Sigma_{\xi} satisfies

Σξ124log2δ+2ϵ4{(log2δ)2+log2δϵ}1/2Δ22.\|\Sigma_{\xi}^{-1}\|_{2}\leq\frac{4\log{\frac{2}{\delta}}+2\epsilon-4\{(\log{\frac{2}{\delta}})^{2}+\log{\frac{2}{\delta}}\epsilon\}^{1/2}}{\Delta_{2}^{2}}.

By Taylor expansion and some algebraic computations, we can further obtain

4log2δ+2ϵ4{(log2δ)2+log2δϵ}1/2Δ22ϵ22Δ22log2δ.\frac{4\log{\frac{2}{\delta}}+2\epsilon-4\{(\log{\frac{2}{\delta}})^{2}+\log{\frac{2}{\delta}}\epsilon\}^{1/2}}{\Delta_{2}^{2}}\approx\frac{\epsilon^{2}}{2\Delta_{2}^{2}\log{\frac{2}{\delta}}}.
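
To see why, write a=\log\frac{2}{\delta} and expand the square root to second order:

\sqrt{a^{2}+a\epsilon}=a\Big(1+\frac{\epsilon}{a}\Big)^{1/2}=a+\frac{\epsilon}{2}-\frac{\epsilon^{2}}{8a}+O(\epsilon^{3}),

so that

4a+2\epsilon-4\sqrt{a^{2}+a\epsilon}=\frac{\epsilon^{2}}{2a}+O(\epsilon^{3}),

and dividing by \Delta_{2}^{2} gives the displayed approximation.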

For ease of exposition, we assume the client sample size n\geq n_{\epsilon,\delta,p,R} and use

σvgm2n2ϵ28R2plog2δ\sigma^{2}_{vgm}\triangleq\frac{n^{2}\epsilon^{2}}{8R^{2}p\log{\frac{2}{\delta}}} (4)

as an approximate upper bound of Σξ12\|\Sigma_{\xi}^{-1}\|_{2}. We further denote the diagonal elements of VξV_{\xi} by v1,,vpv_{1},\dots,v_{p}, which are arranged in decreasing order.

Theorem 3 presents an inspiring result that allows for flexible design of the eigenvectors U_{\xi} while achieving differential privacy by bounding \|\Sigma_{\xi}^{-1}\|_{2}. Building upon this finding, we proceed by performing singular value decomposition on \overline{M}, yielding \overline{M}=W_{1}SW_{2}^{{\mathrm{\scriptscriptstyle T}}}. We take U_{\xi}=W_{1} and sort the diagonal elements of V_{\xi} in the same order as the diagonal elements of S, while enforcing the constraint v_{p}\geq\sigma^{2}_{vgm}. The key idea here is to construct a p-dimensional noise vector \xi with a specific covariance structure, such that its eigenspace aligns with the left singular subspace of the slice mean matrix \overline{M}. Thus the perturbed matrix \widetilde{M}=\overline{M}+E_{0} preserves the relevant information about \mathcal{S}_{Y|X}, where E_{0} is a noise matrix whose columns are generated from \xi.

Algorithm 2 outlines more details of our strategy. Steps 1 to 3 split the left singular space of \overline{M} into two parts: a d-dimensional subspace spanned by the leading singular vectors and its orthogonal complement. Since the first part will be utilized to construct \mathcal{S}_{Y|X}, we preserve these eigenvectors and maintain their ordering based on the corresponding eigenvalues. Meanwhile, we treat the eigenvectors in the second part as having no significant difference, so they share a common eigenvalue. Following this, Step 4 constructs a matrix V by appropriately arranging diagonal elements, which leads to \Sigma_{\xi}=W_{1}VW_{1}^{{\mathrm{\scriptscriptstyle T}}} and \|\Sigma_{\xi}^{-1}\|_{2}=\sigma^{2}_{vgm}. Finally, Steps 5-6 generate noise vectors and upload the perturbed slice mean matrix.

Data: 𝒟={(Xi,yi)p+1,i[n]}\mathcal{D}=\{(X_{i},y_{i})\in\mathbb{R}^{p+1},i\in[n]\}.
Input: Privacy level set (ϵ,δ)(\epsilon,\delta), truncation level RR.
Output: Perturbed slice mean matrix M~p×H\widetilde{M}\in\mathbb{R}^{p\times H}.
1. Compute σvgm2\sigma^{2}_{vgm} and the truncated slice mean matrix M¯\overline{M};
2. Obtain (W1,S,W2T)SVD(M¯)(W_{1},S,W_{2}^{{\mathrm{\scriptscriptstyle T}}})\leftarrow\operatorname{SVD}(\overline{M}), where Sp×pS\in\mathbb{R}^{p\times p}. Denote the jjth diagonal element of SS by sj,j[p]s_{j},\ j\in[p];
3. Compute the eigengap r_{j}=s_{j}-s_{j+1} for j\in[p-1] and estimate the structure dimension d by ranking the r_{j}'s;
4. Construct a diagonal matrix Vp×pV\in\mathbb{R}^{p\times p} with the jjth diagonal element vjv_{j} equals to σvgm2+rj\sigma^{2}_{vgm}+r_{j} for 1jd1\leq j\leq d and σvgm2\sigma^{2}_{vgm} for d+1jpd+1\leq j\leq p;
5. Generate a matrix Zp×HZ\in\mathbb{R}^{p\times H}, where Zj,hi.i.d.N(0,1)Z_{j,h}\sim_{i.i.d.}N(0,1) for j[p],h[H]j\in[p],\ h\in[H];
6. Obtain E0W1V1/2Zp×HE_{0}\triangleq W_{1}V^{1/2}Z\in\mathbb{R}^{p\times H} and upload M~M¯+E0\widetilde{M}\triangleq\overline{M}+E_{0}.
Algorithm 2 Compute private slice mean matrix via the VGM mechanism.
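
A Python sketch of Algorithm 2 follows. It reflects our illustrative reading: the smallest admissible noise eigenvalue is written directly as 2\Delta_{2}^{2}\log(2/\delta)/\epsilon^{2}, i.e., the reciprocal of \sigma^{2}_{vgm} in (4), so that the bound on \|\Sigma_{\xi}^{-1}\|_{2} in Theorem 3 is respected, and the eigengaps of \overline{M} inflate the leading d noise directions.

import numpy as np

def private_slice_means_vgm(M_bar, eps, delta, R, n, d, rng=None):
    # Perturb bar{M} (p x H) with noise whose eigenspace is aligned with the
    # left singular space of bar{M}, following the steps of Algorithm 2.
    rng = np.random.default_rng() if rng is None else rng
    p, H = M_bar.shape
    W1, s, _ = np.linalg.svd(M_bar, full_matrices=True)       # W1 is p x p
    s_full = np.concatenate([s, np.zeros(p - s.size)])        # pad singular values to length p
    Delta2 = 2 * R * np.sqrt(p) / n                           # l2-sensitivity of the slice means
    v_min = 2 * Delta2 ** 2 * np.log(2 / delta) / eps ** 2    # smallest admissible noise eigenvalue
    gaps = s_full[:-1] - s_full[1:]                           # eigengaps r_j = s_j - s_{j+1}
    v = np.full(p, v_min)
    v[:d] += gaps[:d]                                         # inflate the leading d directions
    Z = rng.standard_normal((p, H))
    E0 = W1 @ (np.sqrt(v)[:, None] * Z)                       # columns of E0 ~ N(0, W1 diag(v) W1^T)
    return M_bar + E0
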
Remark.

Step 3 in Algorithm 2 estimates the structure dimension d by simply ranking the eigengaps of \overline{M}. While this approach circumvents additional computational burden, alternative methods exist for determining the structure dimension. We refer interested readers to (Luo and Li, 2021) for a detailed study.

Remark.

Compared with the i.i.d. Gaussian mechanism, the VGM mechanism preserves the information carried by the left singular subspace of M¯\overline{M} at the cost of a higher variance of noise. Notice we have

Σξ28R2plog2δn2ϵ2\|\Sigma_{\xi}\|_{2}\geq\frac{8R^{2}p\log{\frac{2}{\delta}}}{n^{2}\epsilon^{2}}

and the minimum is reached if all the eigenvalues of Σξ\Sigma_{\xi} are the same. In practice, we can fix the scale of Σξ2\|\Sigma_{\xi}\|_{2} to be CR2plog2δ/(n2ϵ2){CR^{2}p\log{\frac{2}{\delta}}}/({n^{2}\epsilon^{2}}), where the constant C8C\geq 8. To prevent the noise from overpowering the signal, the client sample size should be sufficiently large. Specifically, we require

CR2plog2δn2ϵ2=o(1)\frac{CR^{2}p\log{\frac{2}{\delta}}}{n^{2}\epsilon^{2}}=o(1)

for our theoretical investigation.

3.3 Sparse estimation in the high-dimensional setting

In many scenarios, client data can be high-dimensional (p>np>n) or even ultra-high-dimensional (p=o(exp(nα))p=o(\exp(n^{\alpha}))). Since SIR relies on p/n0p/n\rightarrow 0 as nn\rightarrow\infty to guarantee the consistency of V^H\widehat{V}_{H} (Lin et al., 2018), we introduce a feature screening procedure as an initial step to handle these data. Recall that SIR is developed upon the observation that Σ1{E(X|Y)E(X)}𝒮YX\Sigma^{-1}\{E(X|Y)-E(X)\}\in\mathcal{S}_{Y\mid X}, implying that 𝒮Y|X\mathcal{S}_{Y|X} degenerates if E(X|Y)E(X)=0E(X|Y)-E(X)=0 almost surely. This motivates us to define the active set as

𝒯={j[p]E(Xj|Y)E(Xj)a.s.}\mathcal{T}=\{j\in[p]\mid E(X_{j}|Y)\neq E(X_{j})\ \mathrm{a.s.}\}

and utilize the criterion

ωj,h|E{XjE(Xj)|Y=h}|,h[H]\omega_{j,h}\triangleq|E\{X_{j}-E(X_{j})|Y=h\}|,\ h\in[H]

to identify 𝒯\mathcal{T}. To take advantage of the decentralized data, we compute ωj,h(k)\omega_{j,h}^{(k)} on each client k[K]k\in[K] and vote for screening on the server. The entire procedure is designed to operate in a federated paradigm and we call it the Collaborative Conditional Mean Difference (CCMD) filter. We clarify its implementation details in Algorithm 3. With the inclusion of the CCMD filter, our FSIR framework now successfully handles high-dimensional data and is fully operational.

Data: 𝒟=k=1K𝒟(k)\mathcal{D}=\bigcup_{k=1}^{K}\mathcal{D}^{(k)}.
Input: slice number HH, client number KK, threshold tt.
Output: Estimated active set 𝒯^\widehat{\mathcal{T}}.
Client k[K]k\in[K] do in parallel:
       1. Compute Ω(k)p×H\Omega^{(k)}\in\mathbb{R}^{p\times H} where Ωj,h(k)w^j,h(k)\Omega_{j,h}^{(k)}\triangleq\widehat{w}_{j,h}^{(k)};
       2. For label h[H]h\in[H]:
             Estimate the active set on hhth slice 𝒯^h(k){j[p]Ωj,h(k)>t}\widehat{\mathcal{T}}^{(k)}_{h}\triangleq\{j\in[p]\mid\Omega_{j,h}^{(k)}>t\};
            
       end
      3. Upload the kkth multiset (k)h[H]𝒯^h(k)\mathcal{B}^{(k)}\triangleq\biguplus_{h\in[H]}\widehat{\mathcal{T}}^{(k)}_{h};
      
end
Server  do:
       4. Obtain the global multiset k[K](k)\mathcal{B}\triangleq\biguplus_{k\in[K]}\mathcal{B}^{(k)};
       5. Return the global active set 𝒯^{jmultiplicity(j)>K2}\widehat{\mathcal{T}}\triangleq\{j\in\mathcal{B}\mid\operatorname{multiplicity}(j)>\frac{K}{2}\}.
      
end
Algorithm 3 The Collaborative Conditional Mean Difference (CCMD) filter.
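
A Python sketch of both sides of Algorithm 3 is given below; slice labels are again assumed to be coded as 0, ..., H-1, and the server counts multiplicities in the multiset sum exactly as in Step 5. The function names are ours.

import numpy as np
from collections import Counter

def ccmd_client(X, y, H, t):
    # Client-side Steps 1-3: collect indices j with hat{omega}_{j,h} > t, slice by slice.
    Xc = X - X.mean(axis=0)
    votes = []                                   # the multiset B^{(k)}
    for h in range(H):
        mask = (y == h)
        if not mask.any():
            continue
        omega_h = np.abs(Xc[mask].mean(axis=0))  # hat{omega}_{j,h}, j = 1, ..., p
        votes.extend(np.flatnonzero(omega_h > t).tolist())
    return votes

def ccmd_server(client_votes, K):
    # Server-side Steps 4-5: multiset sum, keep j with multiplicity greater than K/2.
    counts = Counter()
    for votes in client_votes:
        counts.update(votes)
    return sorted(j for j, c in counts.items() if c > K / 2)
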
Remark.

Step 3 in Algorithm 3 does not involve privacy leakage, since this operation only uploads an index set of possible active variables rather than their mean values. For the latter case, Dwork et al. (2018) have developed a differentially private approach, named “Peeling”, to estimate the top-s largest elements in a p-dimensional mean vector.

4 Theoretical analysis

We focus on the homogeneous scenario, where all clients share the same SDR subspace. The heterogeneous case is left for future research due to its additional technical challenges. For convenience, we define the central curve m(Y)E(X|Y)m(Y)\triangleq{E}(X|Y) and assume E(X)=0E(X)=0. Suppose sample sizes across all slices in the decentralized dataset 𝒟\mathcal{D} are equal, denoted by n0=N/Hn_{0}=N/H. The following technical conditions are required for our theoretical studies.

Condition 1.

For any βp×d\beta\in\mathbb{R}^{p\times d}, E(X|βTX)E(X|\beta^{{\mathrm{\scriptscriptstyle T}}}X) is a linear combination of βTX\beta^{{\mathrm{\scriptscriptstyle T}}}X.

Condition 2.

The rank of var{m(Y)}{\mathrm{var}}\{m(Y)\} equals the structure dimension dd.

Condition 3.

XX is sub-Gaussian and there exist positive constants C1C_{1}, C2C_{2} such that

C1λmin(Σ)λmax(Σ)C2,\displaystyle C_{1}\leq\lambda_{\mathrm{min}}(\Sigma)\leq\lambda_{\mathrm{max}}(\Sigma)\leq C_{2},

where λmin(Σ)\lambda_{\mathrm{min}}(\Sigma) and λmax(Σ)\lambda_{\mathrm{max}}(\Sigma) are the minimal and maximal eigenvalues of Σ\Sigma respectively.

Condition 4.

Assume m(Y) has a finite fourth moment and is \nu-sliced stable (see Definition 4).

Definition 4.

For two positive constants γ1<1<γ2\gamma_{1}<1<\gamma_{2}, let 𝒜H(γ1,γ2)\mathcal{A}_{H}(\gamma_{1},\gamma_{2}) be the collection of all the partition =a0<a1<<aH=-\infty=a_{0}<a_{1}<\cdots<a_{H}=\infty of \mathbb{R} satisfying that

γ1Hpr(aiY<ai+1)γ2H.\displaystyle\frac{\gamma_{1}}{H}\leq{\mathrm{pr}}(a_{i}\leq Y<a_{i+1})\leq\frac{\gamma_{2}}{H}.

Then m(Y)m(Y) is called ν\nu-sliced stable with respect to YY for some ν>0\nu>0 if there exist positive constants γi\gamma_{i}, i=1,2,3i=1,2,3 such that for any αp\alpha\in\mathbb{R}^{p} and any partition in 𝒜H(γ1,γ2)\mathcal{A}_{H}(\gamma_{1},\gamma_{2}),

1H|h=0H1var{αTm(Y)|ahYah+1}|γ3Hνvar{αTm(Y)}.\displaystyle\frac{1}{H}|\sum_{h=0}^{H-1}{\mathrm{var}}{\left\{\alpha^{{\mathrm{\scriptscriptstyle T}}}m(Y)|a_{h}\leq Y\leq a_{h+1}\right\}}|\leq\frac{\gamma_{3}}{H^{\nu}}{\mathrm{var}}{\left\{\alpha^{{\mathrm{\scriptscriptstyle T}}}m(Y)\right\}}.
Condition 5.

s=|\mathcal{S}|\ll p, where \mathcal{S}=\{i\mid\beta_{j}(i)\neq 0\ \text{for some }j,1\leq j\leq d\} and |\mathcal{S}| is the number of elements in \mathcal{S}.

Condition 6.

Assume {\mathrm{max}}_{1\leq i\leq p}r_{i} is bounded, where r_{i} is the number of non-zero elements in the i-th row of \Sigma.

Conditions 1 and 2 are essential for the unbiasedness of SIR, as discussed in Section 2.1. Conditions 3-5 were introduced by Lin et al. (2018) to ensure the consistency of SIR. In particular, Condition 4 characterizes the smoothness of the central curve through its \nu-sliced stability.

For the ultra-high-dimensional setting, Condition 5 imposes a sparsity structure on the loadings of \beta. Furthermore, instead of imposing the “approximately bandable” condition on the covariance matrix as utilized in Lin et al. (2018), we only impose the row-sparsity structure in Condition 6, which is a relatively mild assumption.

Now we are ready to state the main theoretical results for FSIR. According to the algorithms in Section 3, we first write

M~=k=1KnkM~(k)/N=k=1KnkM^(k)/N+k=1KnkE0(k)/N\displaystyle\widetilde{M}=\sum_{k=1}^{K}n_{k}\widetilde{M}^{(k)}/N=\sum_{k=1}^{K}n_{k}\widehat{M}^{(k)}/N+\sum_{k=1}^{K}n_{k}E_{0}^{(k)}/N

and Λ^=M~M~T\widehat{{\Lambda}}=\widetilde{M}\widetilde{M}^{{\mathrm{\scriptscriptstyle T}}}, where N=k=1KnkN=\sum_{k=1}^{K}n_{k}. Theorem 4 provides the convergence rate of kernel matrix Λ^\widehat{{\Lambda}}.

Theorem 4.

Under conditions 1-4, if nϵ,δ,p,R2ϵ2/(R4H2ν1p3log2δ){n_{\epsilon,\delta,p,R}^{2}\epsilon^{2}}/({R^{4}H^{2\nu-1}p^{3}\log{\frac{2}{\delta}}})\rightarrow\infty, then we have

Λ^Λp2=Op(1Hν+1+HpN+p1/2N1/2).\displaystyle\|\widehat{\Lambda}-\Lambda_{p}\|_{2}=O_{p}\left(\frac{1}{H^{\nu+1}}+\frac{Hp}{N}+{\frac{p^{1/2}}{N^{1/2}}}\right). (5)

From Section 2.2 we know that

Σ~=1NXXT+1Nk=1Knk(cR(k))2A(k)=1NXXT+k=1K(cR(k))2A,\displaystyle\widetilde{\Sigma}=\frac{1}{N}XX^{{\mathrm{\scriptscriptstyle T}}}+\frac{1}{N}\sum_{k=1}^{K}n_{k}(c^{(k)}_{R})^{2}A^{(k)}=\frac{1}{N}XX^{{\mathrm{\scriptscriptstyle T}}}+\sum_{k=1}^{K}(c^{(k)}_{R})^{2}A,

where A\triangleq n_{k}A^{(k)}/N is a symmetric random Gaussian matrix with A_{i,j}\sim\mathcal{N}(0,\sigma_{x}^{2}) for i\geq j, where

σx=p+1Nϵx{2log(p2+p22πδx)}1/2+1Nϵx1/2.\displaystyle\sigma_{x}=\frac{p+1}{N\epsilon_{x}}\left\{2\log{\left(\frac{p^{2}+p}{2\sqrt{2\pi}\delta_{x}}\right)}\right\}^{1/2}+\frac{1}{N{\epsilon_{x}}^{1/2}}.

Then we have the following Lemma:

Lemma 5.

Under condition 3, we have

Σ~Σ20\displaystyle\|\widetilde{\Sigma}-\Sigma\|_{2}\rightarrow 0

as p/N0p/N\rightarrow 0 and NN\rightarrow\infty. It is also easy to see that

Σ~1Σ120.\displaystyle\|\widetilde{\Sigma}^{-1}-{\Sigma^{-1}}\|_{2}\rightarrow 0.

With Lemma 5, we can show that our FSIR procedure provides a consistent estimate of the sufficient dimension reduction subspace.

Theorem 6.

Under conditions 1-4 and assuming that p/N0p/N\rightarrow 0, we have

Σ~1Λ^Σ1Λp20\displaystyle\|\widetilde{\Sigma}^{-1}\widehat{\Lambda}-\Sigma^{-1}\Lambda_{p}\|_{2}\rightarrow 0

as NN\rightarrow\infty with probability converging to one.

As demonstrated by Theorems 4 and 6, FSIR is capable of obtaining a reliable estimate of the central subspace in low-dimensional scenarios. Additionally, we will illustrate how FSIR can effectively address high-dimensional settings through the implementation of a screening procedure. Prior to that, we outline the key properties of this screening procedure.

Let the sample mean in the hhth slice be x¯h\bar{x}_{h}, and we define the inclusion set and exclusion set below, which depend on a threshold value tt,

𝒯={j|E{x(j)|y}is not a constant},\displaystyle\mathcal{T}=\left\{j|{E}\{x(j)|y\}\quad\text{is not a constant}\right\},
h={j||x¯h(j)|>t},h=1,,H.\displaystyle\mathcal{I}_{h}=\left\{j||\bar{x}_{h}(j)|>t\right\},h=1,...,H.
=hh={j|there exists ah[1,H],s.t.|x¯h(j)|>t},\displaystyle\mathcal{I}=\cup_{h}\mathcal{I}_{h}=\left\{j|\text{there exists a}\quad h\in[1,H],\text{s.t.}\quad|\bar{x}_{h}(j)|>t\right\},
c={j|for allh[1,H],|x¯h(j)|<t}.\displaystyle\mathcal{I}^{c}=\left\{j|\text{for all}\quad h\in[1,H],\quad|\bar{x}_{h}(j)|<t\right\}.

Furthermore, we write the smallest slice sample size as n_{\epsilon,\delta,s,R} in the high-dimensional setting.

Assumption 1.

Signal strength: C>0\exists C>0 and ω>0\omega>0 such that var[E{x(k)|y}]>Csω\mathrm{var}[E\{x(k)|y\}]>Cs^{-\omega} when E{x(k)|y}{E}\{x(k)|y\} is not a constant.

Proposition 7.

Under Conditions 1-6 and Assumption 1, and letting t=Cs^{-{\omega}/{2}} for some constant C>0 such that t<2C_{\mu}s^{-{\omega}/{2}}, we have

i) 𝒯\mathcal{T}\subset\mathcal{I} holds with probability at least

1C1exp{C2nϵ,δ,s,RsωHν2+C3log(H)+C4log(s)}\displaystyle 1-C_{1}\exp\left\{-C_{2}\frac{n_{\epsilon,\delta,s,R}s^{-\omega}}{H\nu^{2}}+C_{3}\log{(H)}+C_{4}\log{(s)}\right\}
C5exp{nϵ,δ,s,Rsω2CHτ2+2sω2τ+C6log(s)+C7log(H)}.\displaystyle-C_{5}\exp\left\{\frac{-n_{\epsilon,\delta,s,R}s^{-\omega}}{2CH\tau^{2}+2s^{-\frac{\omega}{2}}\tau}+C_{6}\log{(s)}+C_{7}\log{(H)}\right\}.

ii) 𝒯cc\mathcal{T}^{c}\subset\mathcal{I}^{c} holds with probability at least

1C1exp{nϵ,δ,s,Rsω2CHτ2+2sω2τ+C2log(H)+C3log(pcs)}.\displaystyle 1-C_{1}\exp\left\{\frac{-n_{\epsilon,\delta,s,R}s^{-\omega}}{2CH\tau^{2}+2s^{-\frac{\omega}{2}}\tau}+C_{2}\log{(H)}+C_{3}\log{(p-cs)}\right\}.

This Proposition has a simple implication. We may choose

H=o(Cnϵ,δ,s,Rsωτ2/logp)H=o({Cn_{\epsilon,\delta,s,R}s^{-\omega}}{\tau^{-2}/\log{p}})

so that

Cnϵ,δ,s,RsωHτ2logp.\frac{Cn_{\epsilon,\delta,s,R}s^{-\omega}}{H\tau^{2}}\gg\log{p}.

With the help of Proposition 7, we establish the properties of the screening procedure in the theorem below.

Theorem 8.

Let \mathcal{T}_{0}=\left\{j\mid\operatorname{multiplicity}(j)>\frac{K}{2}\right\}. We have

i) 𝒯𝒯0\mathcal{T}\subset\mathcal{T}_{0} holds with probability at least

1CsK3/2(K2p1e2)K/2,\displaystyle 1-CsK^{3/2}(\frac{K^{2}p_{1}}{e^{2}})^{K/2},

where

p1=C1exp{C2nϵ.δ,s,RsωHν2+C3log(H)}+C4exp{nϵ,δ,s,Rsω2CHτ2+2sω2τ+C5log(H)}.\displaystyle p_{1}=C_{1}\exp\left\{-C_{2}\frac{n_{\epsilon.\delta,s,R}s^{-\omega}}{H\nu^{2}}+C_{3}\log{(H)}\right\}+C_{4}\exp\left\{\frac{-n_{\epsilon,\delta,s,R}s^{-\omega}}{2CH\tau^{2}+2s^{-\frac{\omega}{2}}\tau}+C_{5}\log{(H)}\right\}.

ii) 𝒯c𝒯0c\mathcal{T}^{c}\subset\mathcal{T}_{0}^{c} holds with probability at least

\displaystyle 1-CsK^{3/2}(\frac{K^{2}p_{2}}{e^{2}})^{K/2},

where

p2=C1exp{nϵ,δ,s,Rsω2CHτ2+2sω2τ+C2log(H)}.\displaystyle p_{2}=C_{1}\exp\left\{\frac{-n_{\epsilon,\delta,s,R}s^{-\omega}}{2CH\tau^{2}+2s^{-\frac{\omega}{2}}\tau}+C_{2}\log{(H)}\right\}.

Theorem 8 implies that we only need

HlogHmin(nϵ,δ,s,Rsων2,nϵ,δ,s,RH1τ2sω).H\log{H}\ll{\mathrm{min}}{({n_{\epsilon,\delta,s,R}}{s^{-\omega}\nu^{-2}},{n_{\epsilon,\delta,s,R}}{H^{-1}\tau^{-2}s^{-\omega}})}.

Under this assumption, \mathcal{T}_{0}=\mathcal{T} holds with probability converging to one, allowing us to draw conclusions for high-dimensional FSIR, which are shown in the following two theorems.

Theorem 9.

Under Conditions 1-6 and Assumption 1, and choosing the same t as in Theorem 8, if H\log{H}\ll{\mathrm{min}}{({n_{\epsilon,\delta,s,R}}{s^{-\omega}\nu^{-2}},{n_{\epsilon,\delta,s,R}}{H^{-1}\tau^{-2}s^{-\omega}})}, we have

e(Λ^𝒯0,𝒯0)Λ20\displaystyle\|e(\widehat{\Lambda}^{\mathcal{T}_{0},\mathcal{T}_{0}})-\Lambda\|_{2}\rightarrow 0

as nn\rightarrow\infty with probability converging to one.

Theorem 10.

Under the same conditions and assumptions as in Theorem 9, we have

e{(Σ^𝒯0,𝒯0)1Λ^𝒯0,𝒯0}e{(Σ𝒯0,𝒯0)1Λ𝒯0,𝒯0}20\displaystyle\|e\left\{(\widehat{\Sigma}^{\mathcal{T}_{0},\mathcal{T}_{0}})^{-1}\widehat{\Lambda}^{\mathcal{T}_{0},\mathcal{T}_{0}}\right\}-e\left\{(\Sigma^{\mathcal{T}_{0},\mathcal{T}_{0}})^{-1}\Lambda^{\mathcal{T}_{0},\mathcal{T}_{0}}\right\}\|_{2}\rightarrow 0

as nn\rightarrow\infty with probability converging to 1.

By the sparsity of \beta, e\left\{(\Sigma^{\mathcal{T},\mathcal{T}})^{-1}\Lambda^{\mathcal{T},\mathcal{T}}\right\} can be regarded as the true central subspace; thus it is reasonable to use e\left\{(\widehat{\Sigma}^{\mathcal{T}_{0},\mathcal{T}_{0}})^{-1}\widehat{\Lambda}^{\mathcal{T}_{0},\mathcal{T}_{0}}\right\} as the estimate of the central subspace when \mathcal{T}_{0}=\mathcal{T}.

5 Simulation studies

5.1 Simulation setting

We consider five models in the simulation studies; see details below. Among them, Model (I) uses logistic regression to generate a binary response Y from a single index \beta_{1}^{{\mathrm{\scriptscriptstyle T}}}X, where \beta_{1}\in\mathbb{R}^{p}. Model (II) is another single-index model emphasizing a heterogeneous covariance structure. For both of them, the structure dimension \operatorname{dim}(\mathcal{S}_{Y|X})=1. Models (III) and (IV) further consider the double-index setting. Without loss of generality, we specify a \beta_{2}\in\mathbb{R}^{p} that is orthogonal to \beta_{1}, so \operatorname{dim}(\mathcal{S}_{Y|X})=2. For each of Models (II) to (IV), assume \epsilon\sim N(0,1) and \epsilon{\perp\!\!\!\perp}X. Model (V) presents an inverse regression model with \epsilon\sim N(0,I_{p}) and \epsilon{\perp\!\!\!\perp}Y. Denote the matrix \Gamma\triangleq(\beta_{1},\beta_{2})\in\mathbb{R}^{p\times 2}; it is semi-orthogonal and \mathcal{S}_{Y|X}=\operatorname{span}(\Gamma); see (Cook and Forzani, 2008).

  • (I) Y=1(1/{1+exp(β1TX)})Y={\mathrm{1}}(1/\{1+\exp(\beta_{1}^{{\mathrm{\scriptscriptstyle T}}}X)\}), where XN(0,Ip)X\sim N(0,I_{p}).

  • (II) Y=10.5+(β1TX+1)2+ϵY=\frac{1}{0.5+(\beta_{1}^{{\mathrm{\scriptscriptstyle T}}}X+1)^{2}}+\epsilon where XN(0,Σ)X\sim N(0,\Sigma) with Σi,j=0.5|ij|\Sigma_{i,j}=0.5^{|i-j|} for 1i,jp1\leq i,j\leq p.

  • (III) Y=β1TX(β2TX)3+1+ϵY=\frac{\beta_{1}^{{\mathrm{\scriptscriptstyle T}}}X}{(\beta_{2}^{{\mathrm{\scriptscriptstyle T}}}X)^{3}+1}+\epsilon where XN(0,Ip)X\sim N(0,I_{p}).

  • (IV) Y=sin(β1TX)exp(β2TX+ϵ)Y=\sin(\beta_{1}^{{\mathrm{\scriptscriptstyle T}}}X)\exp(\beta_{2}^{{\mathrm{\scriptscriptstyle T}}}X+\epsilon) where XN(0,Σ)X\sim N(0,\Sigma) with Σi,j=0.5|ij|\Sigma_{i,j}=0.5^{|i-j|} for 1i,jp1\leq i,j\leq p.

  • (V) X=Γf(Y)+ϵX=\Gamma f(Y)+\epsilon where Γ(β1,β2)p×2\Gamma\triangleq(\beta_{1},\beta_{2})\in\mathbb{R}^{p\times 2}, f(Y)=(Y,Y2)Tf(Y)=(Y,Y^{2})^{{\mathrm{\scriptscriptstyle T}}} and YN(0,1)Y\sim N(0,1).

In the low-dimensional setting, we take a fixed covariate dimension p=10p=10. For model (I) and (II), generate β1,jUnif(0.4,0.8)\beta_{1,j}\sim\operatorname{Unif}(0.4,0.8) for 1jp1\leq j\leq p and set β1\beta_{1} by normalizing (β1,1,,β1,p)T(\beta_{1,1},\dots,\beta_{1,p})^{{\mathrm{\scriptscriptstyle T}}}. For model (III) to (V), specify β1=(1,1,1,1,1,0,,0)T/5\beta_{1}=(1,1,1,1,1,0,\dots,0)^{{\mathrm{\scriptscriptstyle T}}}/\sqrt{5} and β2=(0,0,0,0,0,1,,1)T/5\beta_{2}=(0,0,0,0,0,1,\dots,1)^{{\mathrm{\scriptscriptstyle T}}}/\sqrt{5}. In the high-dimensional setting, specify p=500,1000,2000p=500,1000,2000, separately. Denote the size of active set by ss and let s0s/2s_{0}\triangleq\lceil{s/2}\rceil. When p500p\leq 500, we take s=5s=5, otherwise, s=10s=10. For model (I) and (II), generate β1,jUnif(0.4,0.8)\beta_{1,j}\sim\operatorname{Unif}(0.4,0.8) for 1js1\leq j\leq s and β1,j=0\beta_{1,j}=0 for s<jps<j\leq p. For model (III) to (V), set β1,j=1\beta_{1,j}=1 for 1js01\leq j\leq s_{0} and β1,j=0\beta_{1,j}=0 for s0<jps_{0}<j\leq p; similarly, set β2,j=1\beta_{2,j}=1 for ss0jss-s_{0}\leq j\leq s and β2,j=0\beta_{2,j}=0 otherwise. Finally, β1\beta_{1} and β2\beta_{2} are obtained by normalizing (β1,1,,β1,p)T(\beta_{1,1},\dots,\beta_{1,p})^{{\mathrm{\scriptscriptstyle T}}} and (β2,1,,β2,p)T(\beta_{2,1},\dots,\beta_{2,p})^{{\mathrm{\scriptscriptstyle T}}} respectively.

To evaluate the accuracy of the estimated β^\widehat{\beta}, we use the so-called projection loss

d(β^,β)=β^(β^Tβ^)1β^Tβ(βTβ)1βTF,d(\widehat{\beta},\beta)=||\widehat{\beta}(\widehat{\beta}^{{\mathrm{\scriptscriptstyle T}}}\widehat{\beta})^{-1}\widehat{\beta}^{{\mathrm{\scriptscriptstyle T}}}-\beta(\beta^{{\mathrm{\scriptscriptstyle T}}}\beta)^{-1}\beta^{{\mathrm{\scriptscriptstyle T}}}||_{F},

which measures the distance between the subspace spanned by β^\widehat{\beta} and the target subspace spanned by β\beta in a coordinate-free manner. The angle between β^\widehat{\beta} and β\beta, denoted by (β^,β)\angle(\widehat{\beta},\beta), is also reported as a more intuitive measure.
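In code, both criteria can be computed as follows. This is a minimal sketch; the angle reported here is the largest principal angle between the two subspaces, which is one common way to make the angle criterion precise.

```python
import numpy as np

def projection(B):
    """Orthogonal projection onto span(B): B (B'B)^{-1} B'."""
    return B @ np.linalg.solve(B.T @ B, B.T)

def projection_loss(beta_hat, beta):
    """Frobenius distance d(beta_hat, beta) between the two projection matrices."""
    return np.linalg.norm(projection(beta_hat) - projection(beta), "fro")

def max_principal_angle(beta_hat, beta):
    """Largest principal angle (in radians) between the two column spaces."""
    Q1, _ = np.linalg.qr(beta_hat)
    Q2, _ = np.linalg.qr(beta)
    cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), -1.0, 1.0)
    return float(np.arccos(cosines.min()))
```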

5.2 Results

The first part of our numerical study illustrates the cost of (ϵ,δ)(\epsilon,\delta)-privacy in SIR estimation in the ordinary, non-federated setting, i.e., K=1K=1. We compare two strategies for ensuring the privacy of the slice-mean matrix: the vanilla i.i.d. Gaussian mechanism and the improved vectorized Gaussian mechanism (VGM). We denote SIR combined with the former strategy as SIR-IID and with the latter as SIR-VGM. For ease of comparison, we generate datasets only for Models (I), (III), and (V) separately, varying the sample size nn from 100100 to 5000050000. We set the privacy levels (ϵx,δx)(\epsilon_{x},\delta_{x}) and (ϵm,δm)(\epsilon_{m},\delta_{m}) both to (1,1/n1.1)(1,1/n^{1.1}). In each experiment, we calculate the angle (β^,β)\angle(\widehat{\beta},\beta), where β^\widehat{\beta} is estimated using the original SIR, SIR-IID, and SIR-VGM, respectively. Figure 2 plots the averaged angle obtained from 400400 replications, with error bars covering 90%90\% of the results. The plot clearly shows that SIR-VGM, indicated by the red curve, significantly outperforms SIR-IID, represented by the light blue curve, across all three models. Notably, SIR-IID yields unsatisfactory results for Models (III) and (V), both of which assume a structure dimension d=2d=2. Furthermore, as the sample size increases, the curve of SIR-VGM approaches the lower bound achieved by the original SIR method. This observation suggests that the vectorized Gaussian mechanism introduces only a negligible loss of accuracy when the sample size is sufficiently large.
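For reference, the vanilla baseline corresponds to the classical entrywise Gaussian mechanism (Dwork et al., 2014): every entry of the slice-mean matrix receives independent N(0, σ²) noise with σ calibrated to an ℓ2-sensitivity bound. The sketch below takes that sensitivity as an input supplied by the caller and does not reproduce the calibration used by the vectorized mechanism.

```python
import numpy as np

def gaussian_mechanism_iid(M, sensitivity, eps, delta, rng=None):
    """Entrywise (i.i.d.) Gaussian mechanism for a matrix-valued statistic M.

    sensitivity : assumed l2-sensitivity of M to changing one record.
    Noise scale follows the classical bound sigma = sensitivity *
    sqrt(2 * log(1.25 / delta)) / eps, valid for eps <= 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return M + rng.normal(scale=sigma, size=M.shape)
```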

Figure 2: The cost of (ϵ,δ)(\epsilon,\delta)-privacy in SIR estimation when K=1K=1.

When K>1K>1, the original SIR method cannot be applied. The subsequent study investigates the effectiveness of the federated paradigm (K>1K>1) by reporting the performance of two approaches, namely FSIR-IID and FSIR-VGM, under Model (I) against varying KK. As mentioned earlier, Model (I) involves a binary response, which implies that the estimated β^\widehat{\beta} coincides, up to scaling, with the discriminant direction given by LDA. Consequently, this study also evaluates the performance of two differentially private LDA methods within the federated framework. We take the client sample size n{100,500,1000}n\in\{100,500,1000\} and let the client number KK range from 22 to 200200. The privacy parameters are unchanged, except for ϵx=3\epsilon_{x}=3. This relatively large value of ϵx\epsilon_{x} is chosen to control the scale of the noise added to the covariance matrix when comparing FSIR-IID and FSIR-VGM. In each setting, we calculate (β^,β)\angle(\widehat{\beta},\beta) and repeat the corresponding experiment 400400 times. Figure 3 depicts the averaged angles of the two methods against KK. The results demonstrate that both methods achieve smaller angles as KK increases. Moreover, FSIR-VGM performs better than FSIR-IID across all settings, particularly when nn is large.
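As a sanity check on this claim, the following illustrative (non-private, single-client) sketch confirms numerically that the two-slice SIR direction and the LDA direction Σ⁻¹(μ̂₁ − μ̂₀) span the same line; the logistic sampling scheme below is only a stand-in for Model (I).

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 10, 5000
beta = rng.uniform(0.4, 0.8, size=p)
beta /= np.linalg.norm(beta)

X = rng.standard_normal((n, p))
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))   # illustrative logistic link

# Two-slice SIR: leading eigenvector of Sigma^{-1} Cov(E[X | Y]).
mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
w, U = np.linalg.eigh(Sigma)
S_inv_half = U @ np.diag(1.0 / np.sqrt(w)) @ U.T
M = np.stack([X[Y == h].mean(axis=0) - mu for h in (0, 1)])   # centered slice means
W = np.diag([(Y == 0).mean(), (Y == 1).mean()])               # slice proportions
vals, vecs = np.linalg.eigh(S_inv_half @ (M.T @ W @ M) @ S_inv_half)
b_sir = S_inv_half @ vecs[:, -1]

# LDA (Fisher) discriminant direction.
b_lda = np.linalg.solve(Sigma, X[Y == 1].mean(axis=0) - X[Y == 0].mean(axis=0))

cosine = abs(b_sir @ b_lda) / (np.linalg.norm(b_sir) * np.linalg.norm(b_lda))
print(round(cosine, 4))   # close to 1: the two directions agree up to scale
```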

Figure 3: Averaged angle of FSIR-IID and FSIR-VGM for Model (I), plotted against the client number KK. From left to right, the client sample size nn ranges from 100100 to 10001000.

For a comprehensive assessment of FSIR-IID and FSIR-VGM, we evaluate their performance across all five models using the subspace distance d(β,β^)d(\beta,\widehat{\beta}) defined above. For simplicity, we fix the slice number at H=8H=8 when the response is continuous, and assume that ϵx\epsilon_{x} and ϵm\epsilon_{m} share the same value ϵ{1,2}\epsilon\in\{1,2\}. In the low-dimensional setting, we set p=10p=10 and conduct experiments for combinations of client sample size n{500,1000,2500,5000}n\in\{500,1000,2500,5000\} and client number K{1,10,50,100}K\in\{1,10,50,100\}. Tables 1 and 2 report the averaged d(β,β^)d(\beta,\widehat{\beta}) based on 400400 replications, with standard errors in parentheses. Note that when K=1K=1, i.e., a single client in the federated framework, FSIR-IID (FSIR-VGM) reduces to SIR-IID (SIR-VGM). For high-dimensional cases, we fix n=1000n=1000 for each client and set p=500,1000,2000p=500,1000,2000. The simulation results are summarized in Tables 3 and 4. As the client number KK increases, both FSIR-IID and FSIR-VGM improve, and FSIR-VGM performs significantly better than FSIR-IID in almost all settings.

Table 1: Averaged d(β,β^)d(\beta,\widehat{\beta}) with standard errors in parentheses; p=10,H=8,ϵ=1p=10,\ H=8,\ \epsilon=1.
Model  n  FSIR-IID (K=1, K=10, K=50, K=100)  FSIR-VGM (K=1, K=10, K=50, K=100)
I 500 1.306 (0.13) 1.268 (0.15) 0.650 (0.19) 0.382 (0.10) 1.290 (0.14) 1.298 (0.12) 0.598 (0.17) 0.384 (0.09)
1000 1.260 (0.15) 0.547 (0.13) 0.206 (0.05) 0.141 (0.03) 1.277 (0.16) 0.500 (0.15) 0.190 (0.04) 0.128 (0.03)
2500 1.131 (0.23) 0.327 (0.07) 0.135 (0.03) 0.091 (0.02) 1.092 (0.25) 0.287 (0.08) 0.115 (0.03) 0.078 (0.02)
5000 0.563 (0.13) 0.186 (0.04) 0.079 (0.02) 0.054 (0.01) 0.428 (0.11) 0.140 (0.03) 0.058 (0.01) 0.040 (0.01)
II 500 1.360 (0.07) 1.374 (0.06) 1.292 (0.14) 1.061 (0.22) 1.328 (0.10) 1.275 (0.14) 1.167 (0.23) 0.760 (0.26)
1000 1.370 (0.06) 1.311 (0.11) 0.788 (0.18) 0.565 (0.14) 1.274 (0.16) 1.057 (0.27) 0.369 (0.09) 0.241 (0.07)
2500 1.381 (0.05) 1.107 (0.15) 0.614 (0.15) 0.445 (0.10) 1.263 (0.16) 0.547 (0.18) 0.230 (0.06) 0.163 (0.04)
5000 1.331 (0.10) 0.854 (0.16) 0.438 (0.11) 0.308 (0.08) 0.938 (0.28) 0.281 (0.08) 0.136 (0.03) 0.111 (0.02)
III 500 1.780 (0.13) 1.760 (0.13) 1.364 (0.18) 1.156 (0.23) 1.773 (0.13) 1.716 (0.14) 1.303 (0.18) 0.952 (0.25)
1000 1.750 (0.13) 1.353 (0.18) 0.846 (0.23) 0.561 (0.16) 1.648 (0.20) 0.754 (0.14) 0.280 (0.05) 0.193 (0.04)
2500 1.628 (0.18) 1.142 (0.22) 0.577 (0.18) 0.387 (0.09) 1.365 (0.25) 0.422 (0.08) 0.169 (0.03) 0.113 (0.02)
5000 1.347 (0.17) 0.835 (0.25) 0.355 (0.09) 0.237 (0.06) 0.641 (0.13) 0.215 (0.04) 0.087 (0.02) 0.062 (0.01)
IV 500 1.762 (0.13) 1.704 (0.16) 1.452 (0.25) 1.089 (0.24) 1.715 (0.17) 1.679 (0.17) 1.431 (0.21) 0.957 (0.25)
1000 1.778 (0.13) 1.427 (0.22) 0.577 (0.11) 0.401 (0.07) 1.698 (0.16) 1.241 (0.27) 0.459 (0.08) 0.314 (0.06)
2500 1.724 (0.16) 0.974 (0.18) 0.421 (0.08) 0.302 (0.06) 1.579 (0.22) 0.735 (0.18) 0.287 (0.06) 0.211 (0.04)
5000 1.481 (0.21) 0.613 (0.11) 0.283 (0.05) 0.208 (0.04) 1.174 (0.26) 0.361 (0.07) 0.178 (0.03) 0.148 (0.02)
V 500 1.774 (0.12) 1.722 (0.16) 1.054 (0.24) 0.629 (0.12) 1.702 (0.16) 1.683 (0.19) 1.034 (0.22) 0.580 (0.12)
1000 1.741 (0.15) 0.952 (0.16) 0.367 (0.06) 0.245 (0.05) 1.669 (0.18) 0.817 (0.17) 0.296 (0.05) 0.194 (0.04)
2500 1.630 (0.17) 0.604 (0.11) 0.251 (0.05) 0.173 (0.03) 1.497 (0.23) 0.462 (0.08) 0.181 (0.03) 0.124 (0.02)
5000 1.005 (0.17) 0.364 (0.06) 0.154 (0.03) 0.107 (0.02) 0.739 (0.15) 0.226 (0.04) 0.092 (0.02) 0.063 (0.01)
Table 2: Averaged d(β,β^)d(\beta,\widehat{\beta}) with standard errors in parentheses; p=10,H=8,ϵ=2p=10,\ H=8,\ \epsilon=2.
Model  n  FSIR-IID (K=1, K=10, K=50, K=100)  FSIR-VGM (K=1, K=10, K=50, K=100)
I 500 1.294 (0.13) 0.890 (0.29) 0.292 (0.07) 0.190 (0.05) 1.279 (0.14) 0.859 (0.26) 0.278 (0.07) 0.181 (0.04)
1000 0.939 (0.24) 0.277 (0.06) 0.111 (0.02) 0.077 (0.02) 0.850 (0.26) 0.234 (0.06) 0.094 (0.02) 0.064 (0.01)
2500 0.527 (0.12) 0.190 (0.04) 0.079 (0.02) 0.054 (0.01) 0.407 (0.11) 0.143 (0.04) 0.060 (0.01) 0.041 (0.01)
5000 0.323 (0.08) 0.113 (0.03) 0.047 (0.01) 0.034 (0.01) 0.216 (0.05) 0.076 (0.02) 0.032 (0.01) 0.022 (0.01)
II 500 1.370 (0.07) 1.366 (0.07) 0.926 (0.18) 0.653 (0.16) 1.302 (0.13) 1.204 (0.20) 0.557 (0.16) 0.373 (0.13)
1000 1.381 (0.05) 1.037 (0.18) 0.549 (0.13) 0.401 (0.10) 1.208 (0.20) 0.448 (0.13) 0.205 (0.05) 0.146 (0.04)
2500 1.325 (0.10) 0.869 (0.17) 0.426 (0.11) 0.296 (0.08) 0.950 (0.23) 0.280 (0.07) 0.141 (0.03) 0.116 (0.02)
5000 1.153 (0.15) 0.661 (0.16) 0.316 (0.08) 0.220 (0.06) 0.488 (0.13) 0.174 (0.04) 0.106 (0.02) 0.094 (0.02)
III 500 1.768 (0.13) 1.540 (0.15) 1.065 (0.25) 0.713 (0.20) 1.724 (0.14) 1.296 (0.24) 0.453 (0.09) 0.291 (0.05)
1000 1.558 (0.16) 1.062 (0.24) 0.489 (0.12) 0.323 (0.08) 1.147 (0.23) 0.365 (0.07) 0.150 (0.03) 0.105 (0.02)
2500 1.355 (0.17) 0.823 (0.23) 0.345 (0.09) 0.230 (0.05) 0.676 (0.12) 0.226 (0.04) 0.098 (0.02) 0.067 (0.01)
5000 1.133 (0.21) 0.527 (0.12) 0.218 (0.05) 0.149 (0.03) 0.405 (0.08) 0.135 (0.03) 0.058 (0.01) 0.039 (0.01)
IV 500 1.755 (0.15) 1.626 (0.19) 0.799 (0.18) 0.511 (0.12) 1.727 (0.15) 1.548 (0.20) 0.683 (0.16) 0.440 (0.09)
1000 1.722 (0.14) 0.835 (0.18) 0.356 (0.07) 0.266 (0.05) 1.533 (0.22) 0.567 (0.12) 0.250 (0.05) 0.193 (0.04)
2500 1.467 (0.21) 0.612 (0.12) 0.284 (0.06) 0.205 (0.04) 1.115 (0.25) 0.356 (0.07) 0.176 (0.03) 0.146 (0.02)
5000 1.039 (0.14) 0.425 (0.08) 0.211 (0.04) 0.164 (0.02) 0.521 (0.12) 0.211 (0.04) 0.136 (0.02) 0.125 (0.01)
V 500 1.757 (0.14) 1.370 (0.19) 0.488 (0.10) 0.322 (0.06) 1.717 (0.15) 1.360 (0.25) 0.441 (0.08) 0.283 (0.05)
1000 1.467 (0.20) 0.528 (0.10) 0.210 (0.04) 0.144 (0.03) 1.276 (0.25) 0.373 (0.07) 0.149 (0.03) 0.098 (0.02)
2500 1.008 (0.16) 0.369 (0.07) 0.152 (0.03) 0.103 (0.02) 0.704 (0.15) 0.224 (0.04) 0.091 (0.02) 0.063 (0.01)
5000 0.677 (0.12) 0.238 (0.04) 0.100 (0.02) 0.069 (0.01) 0.348 (0.06) 0.114 (0.02) 0.048 (0.01) 0.034 (0.01)
Table 3: Averaged d(β,β^)d(\beta,\widehat{\beta}) with standard errors in parentheses; n=1000,H=8,ϵ=1n=1000,\ H=8,\ \epsilon=1.
Model  p  FSIR-IID (K=1, K=10, K=50, K=100)  FSIR-VGM (K=1, K=10, K=50, K=100)
I 500 0.523 (0.20) 0.192 (0.07) 0.073 (0.02) 0.049 (0.02) 0.308 (0.12) 0.106 (0.05) 0.051 (0.02) 0.033 (0.01)
1000 1.301 (0.13) 0.914 (0.27) 0.301 (0.08) 0.192 (0.05) 1.281 (0.13) 0.915 (0.27) 0.266 (0.07) 0.185 (0.05)
2000 1.318 (0.12) 0.926 (0.28) 0.298 (0.08) 0.197 (0.05) 1.302 (0.13) 0.894 (0.26) 0.291 (0.07) 0.191 (0.04)
II 500 1.387 (0.04) 0.734 (0.22) 0.355 (0.12) 0.245 (0.08) 1.375 (0.05) 0.389 (0.14) 0.198 (0.06) 0.176 (0.05)
1000 1.400 (0.02) 1.314 (0.11) 0.766 (0.18) 0.482 (0.13) 1.399 (0.02) 1.170 (0.24) 0.552 (0.17) 0.344 (0.10)
2000 1.402 (0.02) 1.294 (0.14) 0.752 (0.17) 0.516 (0.12) 1.399 (0.02) 1.170 (0.22) 0.565 (0.18) 0.354 (0.10)
III 500 1.917 (0.06) 0.516 (0.21) 0.211 (0.08) 0.132 (0.05) 1.908 (0.06) 0.205 (0.07) 0.079 (0.04) 0.049 (0.02)
1000 1.961 (0.03) 1.277 (0.26) 0.565 (0.10) 0.424 (0.08) 1.958 (0.03) 1.122 (0.25) 0.464 (0.07) 0.353 (0.07)
2000 1.962 (0.03) 1.206 (0.21) 0.611 (0.11) 0.501 (0.12) 1.956 (0.03) 1.008 (0.22) 0.529 (0.10) 0.450 (0.12)
IV 500 1.912 (0.06) 0.888 (0.24) 0.407 (0.09) 0.311 (0.06) 1.908 (0.06) 0.692 (0.23) 0.313 (0.07) 0.260 (0.05)
1000 1.961 (0.03) 1.728 (0.14) 1.297 (0.27) 0.801 (0.20) 1.954 (0.03) 1.680 (0.14) 1.228 (0.27) 0.762 (0.19)
2000 1.954 (0.03) 1.731 (0.15) 1.232 (0.26) 0.755 (0.18) 1.950 (0.04) 1.699 (0.17) 1.112 (0.29) 0.702 (0.17)
V 500 1.912 (0.06) 0.340 (0.10) 0.146 (0.04) 0.090 (0.03) 1.902 (0.07) 0.201 (0.06) 0.080 (0.02) 0.054 (0.02)
1000 1.957 (0.03) 1.444 (0.20) 0.510 (0.10) 0.334 (0.06) 1.954 (0.03) 1.333 (0.25) 0.453 (0.08) 0.294 (0.05)
2000 1.958 (0.02) 1.402 (0.22) 0.509 (0.09) 0.340 (0.06) 1.950 (0.03) 1.315 (0.26) 0.470 (0.08) 0.303 (0.05)
Table 4: Averaged d(β,β^)d(\beta,\widehat{\beta}) with standard errors in parentheses; n=1000,H=8,ϵ=2n=1000,\ H=8,\ \epsilon=2.
Model  p  FSIR-IID (K=1, K=10, K=50, K=100)  FSIR-VGM (K=1, K=10, K=50, K=100)
I 500 0.362 (0.13) 0.123 (0.05) 0.047 (0.01) 0.037 (0.01) 0.167 (0.06) 0.069 (0.03) 0.029 (0.01) 0.025 (0.01)
1000 1.228 (0.17) 0.375 (0.09) 0.161 (0.04) 0.108 (0.02) 1.206 (0.20) 0.348 (0.09) 0.141 (0.03) 0.096 (0.02)
2000 1.209 (0.19) 0.386 (0.09) 0.160 (0.04) 0.109 (0.02) 1.204 (0.19) 0.337 (0.09) 0.138 (0.03) 0.094 (0.02)
II 500 1.386 (0.04) 0.511 (0.17) 0.255 (0.08) 0.199 (0.06) 1.370 (0.05) 0.220 (0.07) 0.161 (0.04) 0.156 (0.03)
1000 1.397 (0.03) 1.063 (0.22) 0.474 (0.13) 0.323 (0.09) 1.396 (0.03) 0.760 (0.27) 0.277 (0.08) 0.183 (0.05)
2000 1.401 (0.02) 1.003 (0.21) 0.467 (0.12) 0.328 (0.09) 1.394 (0.03) 0.692 (0.22) 0.270 (0.08) 0.190 (0.05)
III 500 1.913 (0.06) 0.317 (0.10) 0.127 (0.05) 0.087 (0.05) 1.906 (0.06) 0.113 (0.06) 0.045 (0.03) 0.035 (0.04)
1000 1.958 (0.03) 0.776 (0.14) 0.393 (0.09) 0.314 (0.11) 1.956 (0.03) 0.525 (0.08) 0.305 (0.11) 0.265 (0.13)
2000 1.963 (0.03) 0.813 (0.15) 0.489 (0.11) 0.425 (0.14) 1.957 (0.02) 0.582 (0.10) 0.425 (0.14) 0.394 (0.15)
IV 500 1.913 (0.06) 0.556 (0.14) 0.288 (0.06) 0.247 (0.04) 1.908 (0.06) 0.372 (0.09) 0.240 (0.04) 0.226 (0.03)
1000 1.957 (0.03) 1.517 (0.23) 0.633 (0.14) 0.428 (0.09) 1.951 (0.03) 1.431 (0.25) 0.544 (0.12) 0.382 (0.07)
2000 1.954 (0.03) 1.434 (0.23) 0.586 (0.13) 0.409 (0.09) 1.954 (0.03) 1.389 (0.25) 0.505 (0.11) 0.345 (0.08)
V 500 1.903 (0.07) 0.230 (0.07) 0.089 (0.03) 0.062 (0.02) 1.901 (0.07) 0.103 (0.03) 0.041 (0.01) 0.027 (0.01)
1000 1.952 (0.03) 0.697 (0.13) 0.274 (0.05) 0.181 (0.03) 1.953 (0.03) 0.577 (0.12) 0.225 (0.04) 0.150 (0.02)
2000 1.955 (0.03) 0.708 (0.13) 0.275 (0.05) 0.184 (0.03) 1.953 (0.03) 0.578 (0.10) 0.219 (0.04) 0.151 (0.03)

6 Real data analysis

6.1 Human Activity Recognition with Smartphones

To evaluate our method on a classification problem, we examine the human activity recognition dataset (Anguita et al., 2013), which can be downloaded from the UCI website. The data were collected from 3030 volunteers who performed six activities (walking, walking upstairs, walking downstairs, sitting, standing, and lying down) while wearing a smartphone on their waist. Sensor records were captured using the smartphone’s embedded accelerometer and gyroscope at a rate of 5050 Hz, resulting in a dataset with 1029910299 samples and 561561 features. Given the sensitivity of cellphone data in real life, we partition the dataset into 3030 disjoint subsets according to the different subjects (regarded as clients), each containing no more than 410 samples, so that every single client faces a high-dimensional setting.

In this experiment, to illustrate our approach, we study three physical activity levels of a person: active (walking and walking upstairs/downstairs), sedentary (standing and sitting), and lying down. We perform FSIR and its differentially private counterparts on the decentralized dataset. With the estimated sparsity parameter s^=5\hat{s}=5 and structure dimension d^=2\hat{d}=2, FSIR selects 1010 active covariates to construct a 22-dimensional SDR subspace 𝒮^Y|X\widehat{\mathcal{S}}_{Y|X}. Figure 4 depicts 200200 randomly selected samples for each activity on the estimated 𝒮^Y|X\widehat{\mathcal{S}}_{Y|X}. The three activity levels are easily distinguished in each plot, demonstrating the effectiveness of our proposed methods. Notably, both FSIR-IID and FSIR-VGM guarantee (2,1/n1.1)(2,1/n^{1.1})-differential privacy, where nn is the minimum, over all subjects and activities, of the number of samples a single subject contributes to a given activity. We can see that, when an adequate number of samples is available, FSIR-VGM is preferable.
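A minimal preprocessing sketch of this setup is given below. The file and column names (“subject”, “activity”) are assumptions made for illustration and should be adapted to the actual UCI files.

```python
import pandas as pd

# Assumed flat export of the UCI data: one row per sample, a "subject" id column,
# an "activity" label column, and 561 feature columns.
df = pd.read_csv("har_features.csv")

level_map = {
    "WALKING": "active", "WALKING_UPSTAIRS": "active", "WALKING_DOWNSTAIRS": "active",
    "STANDING": "sedentary", "SITTING": "sedentary",
    "LAYING": "lying down",
}
df["level"] = df["activity"].map(level_map)

# One client per subject: 30 disjoint local datasets, never pooled centrally.
clients = {sid: g.drop(columns=["subject"]) for sid, g in df.groupby("subject")}
```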

Figure 4: Scatter plots of the SDR subspace from (a) FSIR, (b) FSIR-IID, and (c) FSIR-VGM on a demo subset of the human activity recognition dataset.

6.2 Airline on-time performance

We apply our method to the airline on-time performance data in the R package “nycflights13”. This dataset comprises information on 336,776336,776 flights that departed from airports in NYC (e.g., EWR, JFK, and LGA) to various destinations in 2013. Our objective is to analyze the Arrival Delay based on seven variables: Month, Day, Departure Delay, Arrival time, Scheduled arrival time, Air time, and Distance. Since the response Arrival Delay is continuous, we take H=6H=6 slices. The whole dataset is divided into disjoint clients according to 1111 selected airlines, each containing n=5000n=5000 samples. We set the privacy level to (ϵ,δ)=(1,1/n1.1)(\epsilon,\delta)=(1,1/n^{1.1}), ensuring a satisfactory level of privacy protection for the clients’ data. Given the estimated structure dimension d^=1\hat{d}=1, Figure 5 plots Arrival Delay (denoted by yy) against the projected feature β^1TX\widehat{\beta}_{1}^{{\mathrm{\scriptscriptstyle T}}}X, based on a randomly selected subset of 10001000 samples. Clearly, the VGM mechanism successfully preserves the pattern observed in FSIR.
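The following sketch illustrates this preprocessing. Column names follow the “nycflights13” package, while the CSV export, the equal-count slicing of the response, and the grouping by carrier are illustrative assumptions rather than the exact pipeline used in the analysis.

```python
import pandas as pd

# Assumes the flights table from the R package "nycflights13" has been exported to CSV.
flights = pd.read_csv("flights.csv").dropna(subset=["arr_delay"])

# Slice the continuous response Arrival Delay into H = 6 equal-count slices.
H = 6
flights["slice"] = pd.qcut(flights["arr_delay"], q=H, labels=False, duplicates="drop")

# Predictors used in the analysis, and one client per airline (carrier).
cols = ["month", "day", "dep_delay", "arr_time", "sched_arr_time", "air_time", "distance"]
clients = {c: g[cols + ["arr_delay", "slice"]].dropna() for c, g in flights.groupby("carrier")}
```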

Figure 5: Arrival Delay plotted against projected features from (a) FSIR, (b) FSIR-IID, and (c) FSIR-VGM on a demo subset.

7 Discussion

We provide a federated framework for performing sufficient dimension reduction on decentralized data, ensuring both data ownership and privacy. As the first investigation into privacy protection for SDR subspace estimation, we present our idea in a relatively simple setting, which implicitly assumes that data on different clients are homogeneous. In practice, however, data might not be identically distributed across clients, and we leave this heterogeneous scenario for future study. Although the SIR estimator itself has some limitations, the strategies and technical tools provided in this work can help many other SDR estimators develop their own federated extensions with a differentially private guarantee.

Supplementary material

References

  • Abadi et al. (2016) Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pages 308–318, 2016.
  • Anguita et al. (2013) Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge Luis Reyes-Ortiz, et al. A public domain dataset for human activity recognition using smartphones. In ESANN, volume 3, page 3, 2013.
  • Apple (2017) D Apple. Learning with privacy at scale. Apple Machine Learning Journal, 1(8), 2017.
  • Avella-Medina (2021) Marco Avella-Medina. Privacy-preserving parametric inference: a case for robust statistics. Journal of the American Statistical Association, 116(534):969–983, 2021.
  • Bassily et al. (2014) Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In 2014 IEEE 55th annual symposium on foundations of computer science, pages 464–473. IEEE, 2014.
  • Biswas et al. (2020) Sourav Biswas, Yihe Dong, Gautam Kamath, and Jonathan Ullman. Coinpress: Practical private mean and covariance estimation. Advances in Neural Information Processing Systems, 33:14475–14485, 2020.
  • Bun et al. (2014) Mark Bun, Jonathan Ullman, and Salil Vadhan. Fingerprinting codes and the price of approximate differential privacy. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 1–10, 2014.
  • Bura and Cook (2001) Efstathia Bura and R Dennis Cook. Extending sliced inverse regression: The weighted chi-squared test. Journal of the American Statistical Association, 96(455):996–1003, 2001.
  • Cai et al. (2021) T Tony Cai, Yichen Wang, and Linjun Zhang. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. The Annals of Statistics, 49(5):2825–2850, 2021.
  • Cai et al. (2020) Zhanrui Cai, Runze Li, and Liping Zhu. Online sufficient dimension reduction through sliced inverse regression. The Journal of Machine Learning Research, 21(1):321–345, 2020.
  • Calandrino et al. (2011) Joseph A Calandrino, Ann Kilzer, Arvind Narayanan, Edward W Felten, and Vitaly Shmatikov. “You might also like:” Privacy risks of collaborative filtering. In 2011 IEEE Symposium on Security and Privacy, pages 231–246. IEEE, 2011.
  • Chaudhuri et al. (2012) Kamalika Chaudhuri, Anand Sarwate, and Kaushik Sinha. Near-optimal differentially private principal components. Advances in neural information processing systems, 25, 2012.
  • Chaudhuri et al. (2013) Kamalika Chaudhuri, Anand D Sarwate, and Kaushik Sinha. A near-optimal algorithm for differentially-private principal components. Journal of Machine Learning Research, 14, 2013.
  • Chen et al. (2022) Canyi Chen, Wangli Xu, and Liping Zhu. Distributed estimation in heterogeneous reduced rank regression: With application to order determination in sufficient dimension reduction. Journal of Multivariate Analysis, 190:104991, 2022.
  • Chen and Li (2001) Chun-Houh Chen and Ker-Chau Li. Generalization of Fisher’s linear discriminant analysis via the approach of sliced inverse regression. Journal of the Korean Statistical Society, 30(2):193–217, 2001.
  • Chen et al. (2010) Xin Chen, Changliang Zou, and R Dennis Cook. Coordinate-independent sparse sufficient dimension reduction and variable selection. The Annals of Statistics, 38(6):3696–3723, 2010.
  • Cook (1994) R Dennis Cook. On the interpretation of regression plots. Journal of the American Statistical Association, 89(425):177–189, 1994.
  • Cook (1996) R Dennis Cook. Graphics for regressions with a binary response. Journal of the American Statistical Association, 91(435):983–992, 1996.
  • Cook (2009) R Dennis Cook. Regression graphics: Ideas for studying regressions through graphics. John Wiley & Sons, 2009.
  • Cook and Forzani (2008) R Dennis Cook and Liliana Forzani. Principal fitted components for dimension reduction in regression. Statistical Science, 23(4):485–501, 2008.
  • Cui et al. (2023) Wenquan Cui, Yue Zhao, Jianjun Xu, and Haoyang Cheng. Federated sufficient dimension reduction through high-dimensional sparse sliced inverse regression. arXiv preprint arXiv:2301.09500, 2023.
  • Ding et al. (2017) Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. Advances in Neural Information Processing Systems, 30, 2017.
  • Drechsler (2023) Jörg Drechsler. Differential privacy for government agencies—are we there yet? Journal of the American Statistical Association, 118(541):761–773, 2023.
  • Duan et al. (2022) Rui Duan, Yang Ning, and Yong Chen. Heterogeneity-aware and communication-efficient distributed statistical inference. Biometrika, 109(1):67–83, 2022.
  • Duchi et al. (2022) John C Duchi, Vitaly Feldman, Lunjia Hu, and Kunal Talwar. Subspace recovery from heterogeneous data with non-isotropic noise. Advances in Neural Information Processing Systems, 35:5854–5866, 2022.
  • Dwork and Lei (2009) Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 371–380, 2009.
  • Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265–284. Springer, 2006.
  • Dwork et al. (2014) Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
  • Dwork et al. (2015) Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. Robust traceability from trace amounts. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 650–669. IEEE, 2015.
  • Dwork et al. (2017) Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. Exposed! a survey of attacks on private data. Annu. Rev. Stat. Appl, 4(1):61–84, 2017.
  • Dwork et al. (2018) Cynthia Dwork, Weijie J Su, and Li Zhang. Differentially private false discovery rate control. arXiv preprint arXiv:1807.04209, 2018.
  • Erlingsson et al. (2014) Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054–1067, 2014.
  • Grammenos et al. (2020) Andreas Grammenos, Rodrigo Mendoza Smith, Jon Crowcroft, and Cecilia Mascolo. Federated principal component analysis. Advances in Neural Information Processing Systems, 33:6453–6464, 2020.
  • Homer et al. (2008) Nils Homer, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav Tembe, Jill Muehling, John V Pearson, Dietrich A Stephan, Stanley F Nelson, and David W Craig. Resolving individuals contributing trace amounts of dna to highly complex mixtures using high-density snp genotyping microarrays. PLoS genetics, 4(8):e1000167, 2008.
  • Jiang et al. (2016) Wuxuan Jiang, Cong Xie, and Zhihua Zhang. Wishart mechanism for differentially private principal components analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016.
  • Kamath et al. (2019) Gautam Kamath, Jerry Li, Vikrant Singhal, and Jonathan Ullman. Privately learning high-dimensional distributions. In Conference on Learning Theory, pages 1853–1902. PMLR, 2019.
  • Lei (2011) Jing Lei. Differentially private m-estimators. Advances in Neural Information Processing Systems, 24, 2011.
  • Li (1991) Ker-Chau Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327, 1991.
  • Li (2007) Lexin Li. Sparse sufficient dimension reduction. Biometrika, 94(3):603–613, 2007.
  • Li et al. (2020) Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. Federated learning: Challenges, methods, and future directions. IEEE signal processing magazine, 37(3):50–60, 2020.
  • Lin et al. (2018) Qian Lin, Zhigen Zhao, and Jun S Liu. On consistency and sparsity for sliced inverse regression in high dimensions. The Annals of Statistics, 46(2):580–610, 2018.
  • Lin et al. (2019) Qian Lin, Zhigen Zhao, and Jun S Liu. Sparse sliced inverse regression via lasso. Journal of the American Statistical Association, 114(528):1726–1739, 2019.
  • Lin et al. (2021) Qian Lin, Xinran Li, Dongming Huang, and Jun S. Liu. On the optimality of sliced inverse regression in high dimensions. The Annals of Statistics, 49(1):1 – 20, 2021. doi: 10.1214/19-AOS1813. URL https://doi.org/10.1214/19-AOS1813.
  • Liu et al. (2022) Xiaokang Liu, Rui Duan, Chongliang Luo, Alexis Ogdie, Jason H Moore, Henry R Kranzler, Jiang Bian, and Yong Chen. Multisite learning of high-dimensional heterogeneous data with applications to opioid use disorder study of 15,000 patients across 5 clinical sites. Scientific reports, 12(1):11073, 2022.
  • Luo and Li (2021) Wei Luo and Bing Li. On order determination by predictor augmentation. Biometrika, 108(3):557–574, 2021.
  • McMahan et al. (2017) Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.
  • Penrose et al. (1985) Keith W Penrose, AG Nelson, and AG Fisher. Generalized body composition prediction equation for men using simple measurement techniques. Medicine & Science in Sports & Exercise, 17(2):189, 1985.
  • Talwar et al. (2015) Kunal Talwar, Abhradeep Guha Thakurta, and Li Zhang. Nearly optimal private lasso. Advances in Neural Information Processing Systems, 28, 2015.
  • Tan et al. (2020) Kai Tan, Lei Shi, and Zhou Yu. Sparse SIR: Optimal rates and adaptive estimation. The Annals of Statistics, 48(1):64 – 85, 2020. doi: 10.1214/18-AOS1791. URL https://doi.org/10.1214/18-AOS1791.
  • Tan et al. (2018) Kean Ming Tan, Zhaoran Wang, Tong Zhang, Han Liu, and R Dennis Cook. A convex formulation for high-dimensional sparse sliced inverse regression. Biometrika, 105(4):769–782, 2018.
  • Tong et al. (2022) Jiayi Tong, Chongliang Luo, Md Nazmul Islam, Natalie E Sheils, John Buresh, Mackenzie Edmondson, Peter A Merkel, Ebbing Lautenbach, Rui Duan, and Yong Chen. Distributed learning for heterogeneous clinical data with application to integrating covid-19 data across 230 sites. NPJ digital medicine, 5(1):76, 2022.
  • Vershynin (2012) R Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing: Theory and Applications, pages 210–268. Cambridge University Press, 2012.
  • Vershynin (2018) R Vershynin. High-dimensional probability: An introduction with applications in data science. Cambridge University Press, 2018.
  • Wang and Xu (2021) Di Wang and Jinhui Xu. Differentially private high dimensional sparse covariance matrix estimation. Theoretical Computer Science, 865:119–130, 2021.
  • Wasserman and Zhou (2010) Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.
  • Weyl (1912) H Weyl. Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen. Mathematische Annalen, 71:441–479, 1912.
  • Xu et al. (2022) Kelin Xu, Liping Zhu, and Jianqing Fan. Distributed sufficient dimension reduction for heterogeneous massive data. Statistica Sinica, 32:2455–2476, 2022.
  • Zeng et al. (2022) Jing Zeng, Qing Mai, and Xin Zhang. Subspace estimation with automatic dimension and variable selection in sufficient dimension reduction. Journal of the American Statistical Association, pages 1–13, 2022.