
PRR: Customized Diffusion Models by Image Space Rank Reduction

David S. Hippocampus
Department of Computer Science
Cranberry-Lemon University
Pittsburgh, PA 15213
hippo@cs.cranberry-lemon.edu

Appendix A Theorem and Proof of PRR

Theorem 1.

For a matrix $W$ whose image space is a $d$-dimensional vector space $S_{d}$, if we have a matrix $Q=\begin{bmatrix}\vec{q}_{1}\ \vec{q}_{2}\ \dots\ \vec{q}_{r}\end{bmatrix}$ with $\vec{q}_{i}\in S_{d}$ and the vectors $\vec{q}_{i}$ mutually orthonormal, then $W-QQ^{T}W$ has a $(d-r)$-dimensional image space.

This is an intuitive result; to avoid any potential confusion, we provide a proof here.

Proof Sketch

The dimension of the image space of a matrix is the same as the dimension of its column space.

For the column vectors of $Q$ and $W$, $\{\vec{q}_{1},\vec{q}_{2},\dots,\vec{q}_{r},\vec{w}_{i}\}\subset S_{d}$, we perform the Gram-Schmidt process to obtain an orthonormal basis $\{\vec{q}_{1},\vec{q}_{2},\dots,\vec{q}_{r},\vec{v}_{1},\vec{v}_{2},\dots,\vec{v}_{d-r}\}$ of the image space of $W$.

$W-QQ^{T}W$ transforms each column vector $\vec{w}_{i}$ of $W$ into $\vec{w}_{i}-\sum_{j=1}^{r}(\vec{q}_{j}\cdot\vec{w}_{i})\vec{q}_{j}$, which is a linear combination of $\{\vec{v}_{1},\vec{v}_{2},\dots,\vec{v}_{d-r}\}$. Hence $W-QQ^{T}W$ has a $(d-r)$-dimensional image space.
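
As a sanity check (not part of the original derivation), the following NumPy sketch builds a random $W$ with a $d$-dimensional image space, takes $r$ orthonormal vectors from that image space as $Q$, and confirms numerically that $W-QQ^{T}W$ has rank $d-r$:

import numpy as np

rng = np.random.default_rng(0)
b, k, d, r = 64, 48, 20, 5                 # illustrative sizes, with r <= d <= min(b, k)

# W with a d-dimensional image space: product of a b x d and a d x k factor.
W = rng.standard_normal((b, d)) @ rng.standard_normal((d, k))

# Orthonormal vectors inside the image space of W (left singular vectors).
U = np.linalg.svd(W, full_matrices=False)[0]
Q = U[:, :r]

W_reduced = W - Q @ Q.T @ W
print(np.linalg.matrix_rank(W), np.linalg.matrix_rank(W_reduced))   # expect 20 and 15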

Detailed Proof

Assume $W$ is a $b\times k$ matrix. Denote $W-QQ^{T}W$ by $[\vec{u}_{1}\ \vec{u}_{2}\ \dots\ \vec{u}_{k}]$ and $W$ by $[\vec{w}_{1}\ \vec{w}_{2}\ \dots\ \vec{w}_{k}]$. In vector form, $W-QQ^{T}W$ is

[u1u2uk]=[w1w2wk][q1q2qr][q1Tq2TqrT][w1w2wk][\vec{u}_{1}\ \vec{u}_{2}...\ \vec{u}_{k}]=[\vec{w}_{1}\ \vec{w}_{2}...\ \vec{w}_{k}]-[\vec{q}_{1}\ \vec{q}_{2}...\ \vec{q}_{r}]\left[\begin{matrix}\vec{q}_{1}^{T}\\ \vec{q}_{2}^{T}\\ ...\\ \vec{q}_{r}^{T}\\ \end{matrix}\right][\vec{w}_{1}\ \vec{w}_{2}...\ \vec{w}_{k}]

=[w1w2wk][q1q2qr][q1Tw1q1Tw2q1Twkq2Tw1q2Tw2q2TwkqrTw1qrTw2qrTwk]=\left[\vec{w}_{1}\ \vec{w}_{2}...\ \vec{w}_{k}\right]-\left[\vec{q}_{1}\ \vec{q}_{2}...\ \vec{q}_{r}\right]\left[\begin{matrix}\vec{q}_{1}^{T}\vec{w}_{1}&\vec{q}_{1}^{T}\vec{w}_{2}&...&\vec{q}_{1}^{T}\vec{w}_{k}\\ \vec{q}_{2}^{T}\vec{w}_{1}&\vec{q}_{2}^{T}\vec{w}_{2}&...&\vec{q}_{2}^{T}\vec{w}_{k}\\ ...&...&...&...\\ \vec{q}_{r}^{T}\vec{w}_{1}&\vec{q}_{r}^{T}\vec{w}_{2}&...&\vec{q}_{r}^{T}\vec{w}_{k}\end{matrix}\right]

=[w1w2wk][i=1rqiqiTw1i=1rqiqiTw2i=1rqiqiTwk]=\left[\vec{w}_{1}\ \vec{w}_{2}...\ \vec{w}_{k}\right]-\left[\begin{matrix}\sum_{i=1}^{r}\vec{q}_{i}\vec{q}_{i}^{T}\vec{w}_{1}&\sum_{i=1}^{r}\vec{q}_{i}\vec{q}_{i}^{T}\vec{w}_{2}&...&\sum_{i=1}^{r}\vec{q}_{i}\vec{q}_{i}^{T}\vec{w}_{k}\end{matrix}\right]

For each column, $\vec{u}_{j}=\vec{w}_{j}-\sum_{i=1}^{r}\vec{q}_{i}\vec{q}_{i}^{T}\vec{w}_{j}$.

The image space of $W$ is a $d$-dimensional vector space $S_{d}$, so $d\leq b$ and $d\leq k$.

That $W-QQ^{T}W$ has a $(d-r)$-dimensional image space means:

$\exists\{\vec{g}_{1},\vec{g}_{2},\dots,\vec{g}_{d-r}\}$ with $\vec{g}_{i}\in S_{d}$, such that $\forall\vec{u}_{i}\in\{\vec{u}_{1},\vec{u}_{2},\dots,\vec{u}_{k}\}$, $\exists\{a_{1},\dots,a_{d-r}\}\subset\mathbb{R}$ with $\vec{u}_{i}=\sum_{j=1}^{d-r}a_{j}\vec{g}_{j}$.

Construct a set $P$ containing all column vectors of the matrices $Q$ and $W$: $P=\{\vec{q}_{1},\vec{q}_{2},\dots,\vec{q}_{r},\vec{w}_{1},\vec{w}_{2},\dots,\vec{w}_{k}\}$. Applying the Gram-Schmidt process to $P$ (and discarding the zero vectors produced by linearly dependent columns), we obtain $d$ orthogonal basis vectors $\{\vec{v}_{1},\vec{v}_{2},\dots,\vec{v}_{d}\}$ of $S_{d}$.

The Gram-Schmidt process proceeds as follows:

Step 1: $\vec{v}_{1}=\vec{q}_{1}$

Step 2: $\vec{v}_{2}=\vec{q}_{2}-\frac{\vec{q}_{1}\cdot\vec{q}_{2}}{\vec{q}_{1}\cdot\vec{q}_{1}}\vec{q}_{1}=\vec{q}_{2}$

Step 3: $\vec{v}_{3}=\vec{q}_{3}-\frac{\vec{q}_{1}\cdot\vec{q}_{3}}{\vec{q}_{1}\cdot\vec{q}_{1}}\vec{q}_{1}-\frac{\vec{q}_{2}\cdot\vec{q}_{3}}{\vec{q}_{2}\cdot\vec{q}_{2}}\vec{q}_{2}=\vec{q}_{3}$

...

Step r: $\vec{v}_{r}=\vec{q}_{r}$

Step r+1: $\vec{v}_{r+1}=\vec{w}_{1}-\sum_{i=1}^{r}\frac{\vec{q}_{i}\cdot\vec{w}_{1}}{\vec{q}_{i}\cdot\vec{q}_{i}}\vec{q}_{i}$

Step r+2: $\vec{v}_{r+2}=\vec{w}_{2}-\sum_{i=1}^{r}\frac{\vec{q}_{i}\cdot\vec{w}_{2}}{\vec{q}_{i}\cdot\vec{q}_{i}}\vec{q}_{i}-\frac{\vec{v}_{r+1}\cdot\vec{w}_{2}}{\vec{v}_{r+1}\cdot\vec{v}_{r+1}}\vec{v}_{r+1}$

Since $S_{d}$ is $d$-dimensional, only $d$ basis vectors survive, and we finally obtain the orthogonal basis $\{\vec{q}_{1},\vec{q}_{2},\dots,\vec{q}_{r},\vec{v}_{r+1},\dots,\vec{v}_{d}\}$.
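
The construction above can be mirrored in a few lines of NumPy (an illustrative sketch, not part of the proof): the columns of $Q$ are processed first, then the columns of $W$, and numerically dependent vectors are dropped, leaving exactly the $d$ orthogonal basis vectors $\{\vec{q}_{1},\dots,\vec{q}_{r},\vec{v}_{r+1},\dots,\vec{v}_{d}\}$.

import numpy as np

def gram_schmidt_basis(Q, W, tol=1e-10):
    # Orthogonalize the columns of Q (already orthonormal) followed by the
    # columns of W, discarding vectors that are numerically dependent.
    basis = []
    for col in np.hstack([Q, W]).T:
        v = col.astype(float).copy()
        for u in basis:
            v -= (u @ v) / (u @ u) * u     # subtract the projection onto u
        if np.linalg.norm(v) > tol:
            basis.append(v)
    return np.stack(basis, axis=1)         # columns: q_1..q_r, v_{r+1}..v_d

Applied to the $Q$ and $W$ from the numerical check after the proof sketch, this returns a matrix with exactly $d$ columns, the first $r$ of which coincide with the columns of $Q$.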

Expanding $\vec{w}_{j}$ in this basis, $\vec{w}_{j}=\sum_{l=1}^{r}a_{l}\vec{q}_{l}+\sum_{l=r+1}^{d}b_{l}\vec{v}_{l}$.

$\vec{u}_{j}=\vec{w}_{j}-\sum_{i=1}^{r}\vec{q}_{i}\vec{q}_{i}^{T}\vec{w}_{j}$

$=\sum_{l=1}^{r}a_{l}\vec{q}_{l}+\sum_{l=r+1}^{d}b_{l}\vec{v}_{l}-\sum_{i=1}^{r}\vec{q}_{i}\vec{q}_{i}^{T}\left(\sum_{l=1}^{r}a_{l}\vec{q}_{l}+\sum_{l=r+1}^{d}b_{l}\vec{v}_{l}\right)$

$=\sum_{l=1}^{r}a_{l}\vec{q}_{l}+\sum_{l=r+1}^{d}b_{l}\vec{v}_{l}-\sum_{l=1}^{r}a_{l}\vec{q}_{l}$ (since $\vec{q}_{i}^{T}\vec{q}_{l}=1$ if $i=l$, $\vec{q}_{i}^{T}\vec{q}_{l}=0$ if $i\neq l$, and $\vec{q}_{i}^{T}\vec{v}_{l}=0$ for $l\geq r+1$)

$=\sum_{l=r+1}^{d}b_{l}\vec{v}_{l}$

Recall that we want to prove: $\exists\{\vec{g}_{1},\vec{g}_{2},\dots,\vec{g}_{d-r}\}$ with $\vec{g}_{i}\in S_{d}$, such that $\forall\vec{u}_{i}\in\{\vec{u}_{1},\vec{u}_{2},\dots,\vec{u}_{k}\}$, $\exists\{a_{1},\dots,a_{d-r}\}\subset\mathbb{R}$ with $\vec{u}_{i}=\sum_{j=1}^{d-r}a_{j}\vec{g}_{j}$.

Taking $\{\vec{g}_{1},\vec{g}_{2},\dots,\vec{g}_{d-r}\}=\{\vec{v}_{r+1},\dots,\vec{v}_{d}\}$ satisfies this condition: each $\vec{g}_{i}\in S_{d}$, and by the computation above every $\vec{u}_{j}$ is a linear combination of these vectors with real coefficients. Moreover, the $\vec{v}_{l}$ are linearly independent and, by the Gram-Schmidt construction, each $\vec{v}_{l}$ with $l\geq r+1$ is itself a linear combination of the columns $\vec{u}_{j}$ (for example, $\vec{v}_{r+1}=\vec{u}_{1}$), so the image space of $W-QQ^{T}W$ is exactly $\mathrm{span}\{\vec{v}_{r+1},\dots,\vec{v}_{d}\}$.

Therefore, $W-QQ^{T}W$ has a $(d-r)$-dimensional image space.

Appendix B Comparisons of Generation Performance for Different Parameter Subsets

ExCA is the parameter subset chosen in our experiments. Here, we provide additional examples generated using other parameter subsets. We experimented with the eight parameter subsets listed below (a selection sketch follows the list).

  • Exclude Cross-Attention (ExCA)

    • This method trains all layers except cross-attention and time embedding layers.

  • Exclude Self-Attention (ExSA)

    • This method trains all layers except self-attention layers.

  • Self-Attention Only (SAO)

    • This method trains only the self-attention layers.

  • Cross-Attention Only (CAO)

    • This method trains only the cross-attention layers.

  • Full Model Training (FMT)

    • This method trains all layers of the model.

  • Strict Cross-Attention (SCA)

    • This method trains only the queries and keys within the cross-attention mechanisms.

  • Exclude Cross-Attention High-Level (ExCA-HL)

    • This method trains all layers except cross-attention layers, with an emphasis on high-level feature representations.

  • Exclude Cross-Attention High-Level Last (ExCA-HL-Last)

    • This method trains all layers except cross-attention layers, focusing on the final stages of the high-level feature space.
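
The subset names above can be mapped to concrete parameter groups by filtering module names. The sketch below is illustrative only and assumes a diffusers-style UNet, where `attn1` denotes self-attention, `attn2` denotes cross-attention, `time_embedding` is the time-embedding module, and `to_q`/`to_k` are the query/key projections; the high-level variants (ExCA-HL, ExCA-HL-Last) are omitted because their block-selection rule is not specified here.

def select_subset(unet, subset="ExCA"):
    # Return the named parameters belonging to the chosen subset.
    selected = {}
    for name, param in unet.named_parameters():
        is_self = "attn1" in name
        is_cross = "attn2" in name
        is_time = "time_embedding" in name
        if subset == "ExCA":
            keep = not (is_cross or is_time)
        elif subset == "ExSA":
            keep = not is_self
        elif subset == "SAO":
            keep = is_self
        elif subset == "CAO":
            keep = is_cross
        elif subset == "FMT":
            keep = True
        elif subset == "SCA":
            keep = is_cross and ("to_q" in name or "to_k" in name)
        else:
            raise ValueError(f"unknown subset: {subset}")
        if keep:
            selected[name] = param
    return selected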

As shown in Table 1, regardless of the subset used, the parameter count of PRR is significantly lower than that of LoRA. In Fig. 1, we use the example of a bear plushie, with $r=16$, to illustrate the effects of different parameter subsets. It can be observed that models with larger parameter counts, such as FMT, CAO, and ExSA, tend to align better with the training images. Conversely, models with smaller parameter counts may not align as well with the target subject but match the text more closely. The model with the largest parameter count, FMT, even produced a 'hybrid' result in the multi-subject example "A girl is holding a small bear [V]". Models like SAO and SCA strike a better balance. ExCA, being a well-performing subset, has numerous examples shown in other sections and is not repeated here.

Subset                                                 | PRR, r=2 | LoRA, r=2 | PRR, r=16 | LoRA, r=16 | PRR, r=128 | LoRA, r=128
Exclude Cross-Attention (ExCA)                         | 1.8 MB   | 4.8 MB    | 13 MB     | 33 MB      | 87 MB      | 190 MB
Exclude Self-Attention (ExSA)                          | 1.9 MB   | 4.8 MB    | 13 MB     | 33 MB      | 87 MB      | 190 MB
Self-Attention Only (SAO)                              | 1.6 MB   | 3.1 MB    | 11 MB     | 21 MB      | 82 MB      | 163 MB
Cross-Attention Only (CAO)                             | 1.6 MB   | 3.5 MB    | 11 MB     | 25 MB      | 82 MB      | 193 MB
Full Model Training (FMT)                              | 3.4 MB   | 8.2 MB    | 23 MB     | 58 MB      | 169 MB     | 382 MB
Strict Cross-Attention (SCA)                           | 1.2 MB   | 2.8 MB    | 7.9 MB    | 20 MB      | 62 MB      | 152 MB
Exclude Cross-Attention High-Level (ExCA-HL)           | 276 KB   | 700 KB    | 1.9 MB    | 5.0 MB     | 14 MB      | 30 MB
Exclude Cross-Attention High-Level Last (ExCA-HL-Last) | 6.9 KB   | 53 KB     | 42 KB     | 403 KB     | 82 KB      | 803 KB
Table 1: Fine-tuned parameter subsets in the UNet, compared with LoRA at ranks $r=2$, $16$, and $128$; entries are the corresponding model sizes.
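
The gap between the PRR and LoRA columns in Table 1 follows from a simple per-layer count (a rough sketch, assuming PRR stores only the $d\times r$ matrix $Q$ of $W-QQ^{T}W$ while LoRA stores both of its low-rank factors):

def prr_params(d, k, r):
    # PRR keeps only Q (d x r); the layer width k does not enter the count.
    return d * r

def lora_params(d, k, r):
    # LoRA keeps two factors: a d x r matrix and an r x k matrix.
    return r * (d + k)

# Example: a square 1280 x 1280 projection layer at rank r = 16.
print(prr_params(1280, 1280, 16), lora_params(1280, 1280, 16))   # 20480 vs 40960

For square layers this is roughly a factor of two, consistent with the ratios in Table 1.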


Figure 1: PRR single-subject generation with different parameter subsets.

Appendix C Proof of PRR Sequential Addition in PRR Combination

Here we prove that in PRR combination, $W_{0}-Q_{m}^{\prime}Q_{m}^{\prime T}W_{0}=W_{1}-Q_{2}Q_{2}^{T}W_{1}=W_{0}-Q_{1}Q_{1}^{T}W_{0}-Q_{2}Q_{2}^{T}W_{0}+Q_{2}Q_{2}^{T}Q_{1}Q_{1}^{T}W_{0}$, where $W_{1}=W_{0}-Q_{1}Q_{1}^{T}W_{0}$.

For two trained PRRs, we have parameters:

$Q_{1}=\begin{bmatrix}\vec{q}_{11}\ \vec{q}_{12}\ \dots\ \vec{q}_{1r_{1}}\end{bmatrix}_{d\times r_{1}}$

$Q_{2}=\begin{bmatrix}\vec{q}_{21}\ \vec{q}_{22}\ \dots\ \vec{q}_{2r_{2}}\end{bmatrix}_{d\times r_{2}}$.

For the standard combination of $Q_{1}$ and $Q_{2}$: $Q_{m}=\begin{bmatrix}\vec{q}_{11}\ \vec{q}_{12}\ \dots\ \vec{q}_{1r_{1}}\ \vec{q}_{21}\ \vec{q}_{22}\ \dots\ \vec{q}_{2r_{2}}\end{bmatrix}_{d\times(r_{1}+r_{2})}$

With the QR decomposition $Q_{m}^{\prime}R_{m}^{\prime}=Q_{m}$, and following the same argument as in Appendix A, we can find an orthonormal basis $\{\vec{q}_{11},\vec{q}_{12},\dots,\vec{q}_{1r_{1}},\vec{p}_{1},\vec{p}_{2},\dots,\vec{p}_{r_{m}-r_{1}}\}$ of $r_{m}$ vectors spanning the column space of $[Q_{1}\ Q_{2}]$.

$W_{m}=W_{0}-Q_{m}^{\prime}Q_{m}^{\prime T}W_{0}$ means that each column vector $\vec{w}_{i}$ of $W_{0}$ is transformed into

$\vec{w}_{i}-\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}-\sum_{l}^{r_{m}-r_{1}}(\vec{p}_{l}\cdot\vec{w}_{i})\vec{p}_{l}$ (1)

For the practical (sequential) combination $W_{m}=W_{1}-Q_{2}Q_{2}^{T}W_{1}=W_{0}-Q_{1}Q_{1}^{T}W_{0}-Q_{2}Q_{2}^{T}W_{0}+Q_{2}Q_{2}^{T}Q_{1}Q_{1}^{T}W_{0}$, $\vec{w}_{i}$ is transformed into

$\vec{w}_{i}-\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}-\sum_{s}^{r_{2}}(\vec{q}_{2s}\cdot\vec{w}_{i})\vec{q}_{2s}+\sum_{s}^{r_{2}}\Big(\big(\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}\big)\cdot\vec{q}_{2s}\Big)\vec{q}_{2s}$ (2)

We have $\vec{q}_{2s}=\sum_{j}^{r_{1}}a_{sj}\vec{q}_{1j}+\sum_{l}^{r_{m}-r_{1}}b_{sl}\vec{p}_{l}$ with $a_{sj}=\vec{q}_{1j}\cdot\vec{q}_{2s}$ and $b_{sl}=\vec{p}_{l}\cdot\vec{q}_{2s}$; then Expression (2) becomes

$\vec{w}_{i}-\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}-\sum_{s}^{r_{2}}\Big(\big(\sum_{j}^{r_{1}}a_{sj}\vec{q}_{1j}+\sum_{l}^{r_{m}-r_{1}}b_{sl}\vec{p}_{l}\big)\cdot\vec{w}_{i}\Big)\vec{q}_{2s}+\sum_{s}^{r_{2}}\Big(\big(\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}\big)\cdot\vec{q}_{2s}\Big)\vec{q}_{2s}$ (3)

$=\vec{w}_{i}-\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}-\sum_{s}^{r_{2}}\sum_{j}^{r_{1}}a_{sj}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{2s}-\sum_{s}^{r_{2}}\sum_{l}^{r_{m}-r_{1}}b_{sl}(\vec{p}_{l}\cdot\vec{w}_{i})\vec{q}_{2s}+\sum_{s}^{r_{2}}\Big(\big(\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}\big)\cdot\vec{q}_{2s}\Big)\vec{q}_{2s}$ (4)

$=\dots-\sum_{s}^{r_{2}}\Big(\sum_{j}^{r_{1}}a_{sj}(\vec{q}_{1j}\cdot\vec{w}_{i})-\big(\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}\big)\cdot\vec{q}_{2s}\Big)\vec{q}_{2s}-\sum_{s}^{r_{2}}\sum_{l}^{r_{m}-r_{1}}b_{sl}(\vec{p}_{l}\cdot\vec{w}_{i})\vec{q}_{2s}$ (5)

Since $a_{sj}=\vec{q}_{1j}\cdot\vec{q}_{2s}$, the bracketed term in Expression (5) vanishes, and substituting $b_{sl}=\vec{p}_{l}\cdot\vec{q}_{2s}$ gives

$\vec{w}_{i}-\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}-\sum_{s}^{r_{2}}\sum_{l}^{r_{m}-r_{1}}(\vec{p}_{l}\cdot\vec{q}_{2s})(\vec{p}_{l}\cdot\vec{w}_{i})\vec{q}_{2s}$ (6)

$=\vec{w}_{i}-\sum_{j}^{r_{1}}(\vec{q}_{1j}\cdot\vec{w}_{i})\vec{q}_{1j}-\sum_{l}^{r_{m}-r_{1}}(\vec{p}_{l}\cdot\vec{w}_{i})\Big(\sum_{s}^{r_{2}}(\vec{p}_{l}\cdot\vec{q}_{2s})\vec{q}_{2s}\Big)$ (7)

The term $\sum_{s}^{r_{2}}(\vec{p}_{l}\cdot\vec{q}_{2s})\vec{q}_{2s}$ represents the total component of $\vec{p}_{l}$ in the column space of $Q_{2}$. In sequential PRR combination the columns of $Q_{2}$ are drawn from the image space of $W_{1}$ (as in Theorem 1), so they are orthogonal to the columns of $Q_{1}$; since $\vec{p}_{l}$ lies in the span of the columns of $Q_{1}$ and $Q_{2}$ and is orthogonal to $\{\vec{q}_{11},\vec{q}_{12},\dots,\vec{q}_{1r_{1}}\}$, it lies in the column space of $Q_{2}$, and therefore $\sum_{s}^{r_{2}}(\vec{p}_{l}\cdot\vec{q}_{2s})\vec{q}_{2s}=\vec{p}_{l}$. Hence Expression (1) and Expression (7) are equal.
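
A small numerical check of this identity (illustrative only; it assumes, as discussed above, that the columns of $Q_{2}$ are drawn from the image space of $W_{1}$):

import numpy as np

rng = np.random.default_rng(1)
b, k, d, r1, r2 = 64, 48, 20, 4, 3

W0 = rng.standard_normal((b, d)) @ rng.standard_normal((d, k))
Q1 = np.linalg.svd(W0, full_matrices=False)[0][:, :r1]   # orthonormal, in the image space of W0

W1 = W0 - Q1 @ Q1.T @ W0
Q2 = np.linalg.svd(W1, full_matrices=False)[0][:, :r2]   # orthonormal, in the image space of W1

Qm, _ = np.linalg.qr(np.hstack([Q1, Q2]))                # Q_m' from the QR decomposition of Q_m
lhs = W0 - Qm @ Qm.T @ W0
rhs = W1 - Q2 @ Q2.T @ W1
print(np.allclose(lhs, rhs))                             # expect True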

Appendix D Comparison of Different Rank Boundaries $\gamma$

Appendix E Different $r$ Values in Image Editing and LoRA Combination

Appendix F More Generation Results

We compare with DreamBooth and Textual Inversion in Fig. 2.


Figure 2: Comparison with DreamBooth and Textual Inversion.

Appendix G Image Editing Mathematical Discussion

The stability of PRR outputs is reflected in the fact that different Gaussian noise inputs tend to yield the same result, represented as:

$h=Wx=W(x+\Delta x)$ (8)

The equation holds when $W\Delta x=\mathbf{0}$, i.e., $\Delta x\in\mathrm{kernel}(W)$.

According to the rank-nullity theorem, for the linear transformation $W:X\to H$, $\mathrm{rank}(W)+\mathrm{nullity}(W)=\dim X$ (the nullity of $W$ is the dimension of the kernel of $W$). In PRR, the reduced rank is $r=\mathrm{rank}(W_{0})-\mathrm{rank}(W_{reduce})=\mathrm{nullity}(W_{reduce})-\mathrm{nullity}(W_{0})$. Since $\mathrm{rank}(W_{0})>\mathrm{rank}(W_{reduce})$, we have $\mathrm{nullity}(W_{reduce})>\mathrm{nullity}(W_{0})$: more directions $\Delta x$ leave the PRR output unchanged. The reduced rank $r$ thus captures the trade-off between faithful reconstruction and editability in image editing [meng2021sdedit]. As $r$ increases, the modifiable features decrease, making the reconstruction more faithful. As $r$ decreases, the modifiable features increase, bringing the diversity of the generated images closer to that of the underlying pre-trained generative model and improving editability. When a large $r$ is selected for training the PRR on a single image, the model generates images that closely resemble the training image even under varied text prompts. This enables direct modification of the text prompt to perform image editing on that single training image.
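
The kernel argument can be illustrated numerically (a sketch with random matrices, not the actual UNet weights): rank reduction enlarges the kernel by exactly $r$, and any direction $\Delta x$ added to the kernel leaves the reduced layer's output unchanged while the original layer still reacts to it.

import numpy as np

rng = np.random.default_rng(2)
b, k, d, r = 32, 24, 12, 4

W0 = rng.standard_normal((b, d)) @ rng.standard_normal((d, k))
Q = np.linalg.svd(W0, full_matrices=False)[0][:, :r]
W_reduce = W0 - Q @ Q.T @ W0

def nullity(M, tol=1e-10):
    s = np.linalg.svd(M, compute_uv=False)
    return M.shape[1] - int(np.sum(s > tol))

print(nullity(W0), nullity(W_reduce))        # expect 12 and 16: the kernel grows by r

# A direction the reduced layer ignores: W0 dx = q_1 lies in span(Q), so W_reduce dx = 0.
dx = np.linalg.pinv(W0) @ Q[:, 0]
x = rng.standard_normal(k)
print(np.allclose(W_reduce @ (x + dx), W_reduce @ x))    # True: same output after reduction
print(np.allclose(W0 @ (x + dx), W0 @ x))                # False: the original layer changes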

Appendix H Survey
