PRR: Customized Diffusion Models by Image Space Rank Reduction
Appendix A Theorem and Proof of PRR
Theorem 1.
For a matrix , the image space of is a d-dimensional vector space . If we have a matrix with , where the vectors are mutually orthonormal, then has a (d-r)-dimensional image space.
This is an intuitive result; to avoid any potential confusion, we provide a proof here.
Proof
The dimension of the image space of a matrix is the same as the dimension of the column space of the matrix.
For all column vectors of and , , we perform the Gram-Schmidt process to obtain an orthonormal basis of the image space of .
transforms each column vector of into , which is a linear combination of . Therefore, has a (d-r)-dimensional image space.
Detailed Proof
Assume is a matrix. Denote as , and denote as . Written in vector form, is
For each column, it is
The image space of is a d-dimensional vector space ; thus we have and .
Saying that has a (d-r)-dimensional image space means:
, s.t. , , , s.t.
Construct a new set containing all column vectors of the matrices and , . Applying the Gram-Schmidt process to , we obtain orthogonal basis vectors
Apply the Gram-Schmidt process to :
Step 1:
Step 2:
Step 3:
…
Step r:
Step (r+1):
Step (r+2):
…
Since we only have basis vectors in total, the process finally yields the orthogonal basis.
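For concreteness, the following is a minimal numpy sketch of the Gram-Schmidt process (with normalization) applied to a generic list of vectors. It is an illustration only; the specific vectors used in the proof are the ones defined by the symbols above.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    """Orthonormalize a list of vectors, dropping linearly dependent ones."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        # Subtract the projections onto the basis vectors found so far.
        for b in basis:
            w = w - (b @ w) * b
        norm = np.linalg.norm(w)
        if norm > tol:  # keep only directions not already spanned
            basis.append(w / norm)
    return np.stack(basis, axis=0)

# Toy usage: three vectors in R^3, one of which is a linear combination of the others.
vs = [np.array([1.0, 0.0, 1.0]),
      np.array([1.0, 1.0, 1.0]),
      np.array([2.0, 1.0, 2.0])]   # sum of the first two -> dropped
basis = gram_schmidt(vs)
print(basis.shape)                   # (2, 3): only two independent directions survive
print(np.round(basis @ basis.T, 6))  # identity matrix: the basis is orthonormal
```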
Assume
As we want to prove , s.t. , , , s.t. ,
so we have , s.t. , , , s.t.
Therefore, has a (d-r)-dimensional image space.
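Theorem 1 can also be sanity-checked numerically. The sketch below assumes, for illustration only, that the reduction matrix has the form of an identity minus the sum of outer products of r orthonormal vectors taken from the image space of the weight matrix; the exact construction used by PRR should be read from the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d, r = 64, 40, 20, 5           # W is m x n with a d-dimensional image space

# Build W whose column (image) space has dimension d.
W = rng.standard_normal((m, d)) @ rng.standard_normal((d, n))

# Take r mutually orthonormal vectors inside the image space of W
# (the leading left singular vectors are one convenient choice).
U, _, _ = np.linalg.svd(W, full_matrices=False)
V = U[:, :r]                          # m x r, orthonormal columns

# Assumed PRR-style reduction matrix: identity minus the removed directions.
P = np.eye(m) - V @ V.T

print(np.linalg.matrix_rank(W))       # d     -> 20
print(np.linalg.matrix_rank(P @ W))   # d - r -> 15
```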
Appendix B Comparisons of the generation performance for different subset parameters
ExCA is the parameter subset chosen in our experiments. Here, we provide additional examples generated using other subsets of parameters. We experimented with eight different parameter subsets, listed below; a code sketch of the corresponding layer selection follows the list.
• Exclude Cross-Attention (ExCA): This method trains all layers except cross-attention and time embedding layers.
• Exclude Self-Attention (ExSA): This method trains all layers except self-attention layers.
• Self-Attention Only (SAO): This method trains only the self-attention layers.
• Cross-Attention Only (CAO): This method trains only the cross-attention layers.
• Full Model Training (FMT): This method trains all layers of the model.
• Strict Cross-Attention (SCA): This method trains only the queries and keys within the cross-attention mechanisms.
• Exclude Cross-Attention High-Level (ExCA-HL): This method trains all layers except cross-attention layers, with an emphasis on high-level feature representations.
• Exclude Cross-Attention High-Level Last (ExCA-HL-Last): This method trains all layers except cross-attention layers, focusing on the final stages of the high-level feature space.
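For illustration, the sketch below shows one way such subsets could be selected by filtering module names in a PyTorch UNet. It assumes diffusers-style naming ("attn1" for self-attention, "attn2" for cross-attention, "time_emb" for the time embedding, "to_q"/"to_k" for query/key projections); the high-level (HL) variants would additionally filter by block depth and are omitted here. This is a hypothetical sketch, not the implementation used in the paper.

```python
def select_subset(unet, subset="ExCA"):
    """Freeze/unfreeze UNet parameters according to a named subset (illustrative)."""
    rules = {
        "ExCA": lambda n: "attn2" not in n and "time_emb" not in n,  # exclude cross-attn + time emb
        "ExSA": lambda n: "attn1" not in n,                           # exclude self-attn
        "SAO":  lambda n: "attn1" in n,                               # self-attn only
        "CAO":  lambda n: "attn2" in n,                               # cross-attn only
        "FMT":  lambda n: True,                                       # full model
        "SCA":  lambda n: "attn2" in n and ("to_q" in n or "to_k" in n),  # cross-attn queries/keys
    }
    keep = rules[subset]
    for name, param in unet.named_parameters():
        param.requires_grad = keep(name)
    # Return the names of the trainable parameters for inspection.
    return [n for n, p in unet.named_parameters() if p.requires_grad]
```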
As shown in Table 1, regardless of the subset used, the parameter count of PRR is significantly lower than that of LoRA. In Fig. 1, we use the example of a bear plushie, with , to illustrate the effects of different parameter subsets. It can be observed that models with larger parameter counts, such as FMT, CAO, and ExSA, tend to align better with the training images. Conversely, models with smaller parameter counts may not align as well with the target subject but match the text more closely. The model with the largest parameter count, FMT, even produced a 'hybrid' result in the multi-subject example "A girl is holding a small bear [V]". Models such as SAO and SCA strike a better balance. ExCA, being a well-performing subset, has numerous examples presented in other sections and is not repeated here.
Table 1: Parameter counts of PRR and LoRA for each parameter subset at =2, =16, and =128.

| Subset | PRR (=2) | LoRA (=2) | PRR (=16) | LoRA (=16) | PRR (=128) | LoRA (=128) |
|---|---|---|---|---|---|---|
| Exclude Cross-Attention (ExCA) | 1.8 MB | 4.8 MB | 13 MB | 33 MB | 87 MB | 190 MB |
| Exclude Self-Attention (ExSA) | 1.9 MB | 4.8 MB | 13 MB | 33 MB | 87 MB | 190 MB |
| Self-Attention Only (SAO) | 1.6M | 3.1M | 11M | 21M | 82M | 163M |
| Cross-Attention Only (CAO) | 1.6M | 3.5M | 11M | 25M | 82M | 193M |
| Full Model Training (FMT) | 3.4M | 8.2M | 23M | 58M | 169M | 382M |
| Strict Cross-Attention (SCA) | 1.2M | 2.8M | 7.9M | 20M | 62M | 152M |
| Exclude Cross-Attention High-Level (ExCA-HL) | 276K | 700K | 1.9M | 5.0M | 14M | 30M |
| Exclude Cross-Attention High-Level Last (ExCA-HL-Last) | 6.9K | 53K | 42K | 403K | 82K | 803K |
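As a rough way to see why PRR stores fewer parameters than LoRA per layer, the following sketch compares per-layer counts under the assumption (illustrative only, not stated in the table) that a PRR layer stores r vectors of the output dimension, of the I minus sum of outer products form, while a LoRA layer stores two rank-r factors of shapes (r, d_in) and (d_out, r).

```python
def lora_params(d_in, d_out, r):
    # Two low-rank factors: (r x d_in) and (d_out x r).
    return r * (d_in + d_out)

def prr_params(d_out, r):
    # Assumed: r removed directions, each a vector of the output dimension.
    return r * d_out

# Example with a hypothetical 1024 x 768 cross-attention projection at r = 16.
print(lora_params(768, 1024, 16))   # 28672
print(prr_params(1024, 16))         # 16384
```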
Appendix C Proof of PRR Sequential Addition in PRR Combination
Here we will prove that in PRR combination, .
For two trained PRRs, we have parameters:
.
For the standard combination of and :
With , as discussed in Appendix A, we can find an orthonormal basis of vectors from the column vectors of and .
means that each column vector of is transformed to
(1)
For the practical combination , is transformed to
(2)
We have and ; then expression 2 becomes
(3)
(4)
(5)
When substituting and , the intermediate terms are eliminated, giving
(6)
(7)
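The cancellation of the intermediate terms can be checked numerically under the same assumed projection form as in Appendix A, with the additional assumption that the direction sets of the two PRRs are mutually orthonormal; the sketch below is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r1, r2 = 32, 4, 3

# Draw r1 + r2 mutually orthonormal directions and split them between the two PRRs.
Q, _ = np.linalg.qr(rng.standard_normal((m, r1 + r2)))
V1, V2 = Q[:, :r1], Q[:, r1:]

P1 = np.eye(m) - V1 @ V1.T                   # first PRR (assumed form)
P2 = np.eye(m) - V2 @ V2.T                   # second PRR (assumed form)
P_sum = np.eye(m) - V1 @ V1.T - V2 @ V2.T    # additive combination

# Because V1.T @ V2 = 0, the cross term vanishes and the two combinations agree.
print(np.allclose(P1 @ P2, P_sum))   # True
print(np.allclose(P2 @ P1, P_sum))   # True: the order does not matter either
```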
Appendix D Comparison of Different Rank Boundaries
Appendix E Different Values in Image Editing and LoRA Combination
Appendix F More generation results
Comparison with DreamBooth and Textual Inversion.
Appendix G Image Editing Mathematical Discussion
The stability of PRR outputs is reflected in the fact that different Gaussian noise inputs tend to yield the same result, represented as:
(8)
The equation holds when , which we denote by .
According to the rank-nullity theorem, for the linear transformation , (the nullity of is the dimension of its kernel). In PRR, with reduced rank , we have , which implies that more of in PRR will not produce different outputs. This reduced rank reflects the trade-off [meng2021sdedit] between faithful reconstruction and editability in image editing. As increases, the modifiable features decrease, making the reconstruction more faithful. As decreases, the modifiable features increase, bringing the diversity of the generated images closer to that of the underlying pre-trained generation model and improving editability. When a large is selected for training the PRR on a single image, the model generates images that closely resemble the training image, even when using various text prompts. This enables direct modification of the text prompt to perform image editing on the single training image.
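This kernel behavior can be illustrated numerically with the same assumed projection form as in the earlier appendices: two noise vectors that differ only along the removed directions are mapped to the same output, and the nullity equals the number of removed directions. The snippet is a sketch under that assumption, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
m, r = 32, 8

# Assumed PRR-style projection that removes r orthonormal directions.
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
P = np.eye(m) - V @ V.T

# Two different "noise" vectors that differ only inside span(V),
# i.e. only along the removed (kernel) directions.
x1 = rng.standard_normal(m)
x2 = x1 + V @ rng.standard_normal(r)

print(np.allclose(P @ x1, P @ x2))    # True: the difference is projected away
print(m - np.linalg.matrix_rank(P))   # r: the nullity equals the number of removed directions
```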
Appendix H Survey