
Unsupervised Machine Learning for Physical Concepts

Ruyu Yang Graduate School of China Academy of Engineering Physics, Beijing 100193, China
Abstract

In recent years, machine learning methods have been used to assist scientists in scientific research. Human scientific theories are built on a series of concepts, so how a machine learns concepts from experimental data is an important first step. We propose a hybrid method to extract interpretable physical concepts through unsupervised machine learning. The method consists of two stages. First, we find the Betti numbers of the experimental data. Second, given the Betti numbers, we use a variational autoencoder network to extract meaningful physical variables. We test our protocol on toy models and show how it works.

I Introduction

Physics is built on many different concepts, such as force, entropy, and the Hamiltonian. Scientists derive meaningful concepts from observations by their ingenuity and then connect them with formulas, which constitutes a theory. As a guiding principle, a theory should be as simple as possible. Since the beginning of the 21st century, machine learning has developed rapidly Liu et al. (2018) and has been widely used in various fields Teichert et al. (2019); Bourilkov (2019); Mishra et al. (2016), including machine-assisted scientific research Vamathevan et al. (2019); Valletta et al. (2017); Schmidt et al. (2019); Carleo et al. (2019). A natural question arises: can machines propose scientific theories by themselves? It is undoubtedly a fundamental and challenging goal. Many works have studied this from different aspects Ha and Jeong (2021); Zhang and Lin (2018); Boullé et al. (2021); Zobeiry and Humfeld (2019); Farina et al. (2020); Wu and Tegmark (2019); Zheng et al. (2018); D’Agnolo and Wulzer (2019). In general, establishing a theory can be divided into two steps: identify the critical variables from observations, and connect them by formulas. A technique known as symbolic regression has been developed for the second step Udrescu and Tegmark (2020); Udrescu et al. (2020). These authors propose a network named AI Feynman to automatically find the formula a system obeys. To increase the success rate of symbolic regression, the critical variables identified in the first step should be as few as possible.

In this work, we focus on the first step of establishing a theory. We suggest using both Topological Data Analysis (TDA) Wasserman (2018); Chazal and Michel (2017); Murugan and Robertson (2019) and the variational autoencoder (VAE) Kingma and Welling (2019, 2013); An and Cho (2015); Khobahi and Soltanalian (2019); Rezende et al. (2014); Higgins et al. (2016); Burgess et al. (2018) to extract meaningful variables from extensive observations. TDA is a recent and fast-growing tool to analyze and exploit the complex topological and geometric structures underlying data. This tool is necessary for our protocol when specific structures such as circles and spheres are present in the experimental data. VAE is a deep learning technique for learning latent representations. It has been widely used as a generative model in many problems Luchnikov et al. (2019); Cerri et al. (2019), and it can also be seen as a manifold learning network for dimensionality reduction and unsupervised learning tasks.

Our protocol has two stages. First, we use TDA to infer relevant topological features of the experimental data. In the simplest case, where the manifold has a low dimension, the essential features for us are the Betti numbers, which are topological invariants. Once we have the topological features, we can design the proper architecture and loss function of the VAE. As we will show later, the latent variables of the VAE need to form a manifold homeomorphic to the manifold composed of observations. After training, the latent variables represent the key variables discovered by the machine. Thanks to the structure of the VAE network, the latent variables and the observations are in one-to-one correspondence: using the trained VAE network, one can calculate the latent variables and the observations from each other. That means a formula derived by symbolic regression connecting the latent variables can predict the experimental phenomena.

We test our protocol on three toy models with different topological features. The first is a classical coupled harmonic oscillator, where the observations constitute a circle embedded in three-dimensional Euclidean space. The second is two balls rotating around the same center, in different planes and with different angular velocities; with one ball as the reference system, the observations are the Cartesian coordinates of the other ball, which constitute a lemniscate curve. The third is a two-level system, and the observations are the expectation values of some physical quantities, i.e., Hermitian matrices. These observations constitute a sphere.

This paper is organized as follows. In section II, we describe prior work and the problem we want to solve in more detail. In section III, we introduce the architecture of the neural network and argue why the manifold of latent variables should be homeomorphic to that of the observations. In section IV, we show the performance of this protocol on three toy models, comparing the observations and the latent variables to show the relation between them.

II Related work and our goal

Data-driven scientific discovery is not a new idea; it dates back to the revolutionary work of Johannes Kepler and Sir Isaac Newton Roscher et al. (2020). Unlike in the 17th century, we now have higher quality data and better computational tools. Much research has addressed how machines can help people make scientific discoveries in different contexts Carleo et al. (2019); Cichos et al. (2020); Karpatne et al. (2018); Wetzel (2017); Rodriguez-Nieva and Scheurer (2019); Wu and Tegmark (2019). In the early days, attention focused on symbolic regression Kim et al. (2020); Udrescu and Tegmark (2020); Udrescu et al. (2020); Lample and Charton (2019). Another challenging direction is to let the machine design experiments. In Melnikov et al. (2018); Krenn et al. (2016); Fösel et al. (2018); Bukov et al. (2018), the authors designed automated search techniques and reinforcement-learning-based networks to generate new experimental setups. In condensed matter physics, machine learning has been used to characterize phase transitions Van Nieuwenburg et al. (2017); Uvarov et al. (2020); Carrasquilla and Melko (2017); Wang (2016); Ch’Ng et al. (2017); Yu and Deng (2021).

In recent years, several works have let the machine search for key physical concepts, using different networks to extract key physical parameters from experimental data Iten et al. (2020); Zheng et al. (2018); Lu et al. (2020). In Zheng et al. (2018), the authors propose an unsupervised method to extract physical parameters using interaction networks Battaglia et al. (2016). Another helpful tool is the VAE network, which has been widely used for similar goals Lu et al. (2020); Iten et al. (2020).

The VAE network is a powerful tool: one can minimize the number of extracted variables by making the variables independent of each other Higgins et al. (2016); Burgess et al. (2018). To do this, one chooses a factorized prior distribution $P(z)$ over the latent variables $z$. However, this method fails when the manifold of observations has nontrivial topological features. This paper aims to solve this problem.

Here we describe this goal in more detail. Suppose we have an experimental system $\mathcal{S}$ in which some physical variables change as the experiment progresses. We use $\mathcal{P}$ to denote the set consisting of the values of these physical variables, and $\mathcal{P}^{(k)}$ to represent their values at time $k$. From the system $\mathcal{S}$ we derive an experimental data set $\mathcal{E}$. Every data point is a vector $\mathcal{E}^{(k)}\in\mathcal{E}$, where $\mathcal{E}^{(k)}$ denotes the data point belonging to time $k$. From the perspective of physics, any change in the experimental data must be attributed to a change in the key physical variables. Therefore there is a function $f:\mathcal{P}\rightarrow\mathcal{E}$ such that $f(\mathcal{P}^{(k)})=\mathcal{E}^{(k)}$ for any $k$. In this paper we aim to find functions $\tilde{f}$ and $\tilde{f}^{-1}$ such that $\tilde{\mathcal{P}}^{(k)}=\tilde{f}^{-1}(\mathcal{E}^{(k)})$, where the $\tilde{\mathcal{P}}^{(k)}$ constitute a set $\tilde{\mathcal{P}}$. We call $\tilde{\mathcal{P}}^{(k)}$ the effective physical variables. We remark that $\tilde{\mathcal{P}}^{(k)}$ need not equal $\mathcal{P}^{(k)}$, but their dimensions should be the same. Effective physical variables are enough to describe the experimental system: one can redefine the physical quantities and obtain a theory that looks very different but whose predictions are completely consistent with existing theories.

The problem arises when we try to use a neural network to find proper functions $\tilde{f}$ and $\tilde{f}^{-1}$. All functions a neural network can simulate are continuous, so $\tilde{f}$ and $\tilde{f}^{-1}$ must be continuous. In a real physical system, $f$ and $f^{-1}$ do not have to be. Therefore, in some cases, we can never find a $\tilde{\mathcal{P}}^{(k)}$ that has the same dimension as $\mathcal{P}^{(k)}$. For example, suppose we have a ball rotating around a center. The observable data are the locations of the ball, denoted by Cartesian coordinates $\mathcal{E}^{(k)}=(x^{(k)},y^{(k)})$, which form a circle, and the simplest physical variable is the angle $\mathcal{P}^{(k)}=\theta^{(k)}\in[0,2\pi)$. In this case, $f:\theta\rightarrow(x,y)$ is continuous while $f^{-1}$ is not, so it cannot be approximated by a neural network; this is because $f$ is a periodic function of $\theta$. More generally, this problem arises whenever the manifold composed of $\mathcal{E}$ has holes of any dimension. We call the corresponding physical variables topological physical variables (TPVs). Returning to the last example, the Betti numbers of the ball's locations are $(1,1,0)$, where the second entry $1$ means there is one TPV. For this reason, we suggest first using TDA to identify the topological features. Knowing the number of TPVs, we can design proper latent variables and the corresponding loss function $\mathcal{L}$. For instance, when the data form a circle, we need two latent variables, named $x$ and $y$, and we add the topological term $|x^{2}+y^{2}-1|$ to $\mathcal{L}$ to restrict them, as shown in the sketch below.
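As an illustration, the Betti numbers of a sampled point cloud can be estimated with standard persistent homology software. The following is a minimal sketch using the ripser package; the persistence threshold separating real features from noise is an assumption that depends on the data.

```python
import numpy as np
from ripser import ripser

# Sample points on a circle embedded in 3-d Euclidean space (toy data).
theta = np.random.uniform(0, 2 * np.pi, 500)
cloud = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
cloud += 0.01 * np.random.randn(*cloud.shape)  # small measurement noise

# Persistence diagrams up to homology dimension 2.
diagrams = ripser(cloud, maxdim=2)['dgms']

# Count features whose persistence (death - birth) exceeds a threshold;
# the value 0.5 is an assumed cutoff tuned to the noise level.
betti = [int(np.sum((dgm[:, 1] - dgm[:, 0]) > 0.5)) for dgm in diagrams]
print(betti)  # expected [1, 1, 0] for a circle
```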

In at least two cases, the manifold will have holes. One is the data set $\mathcal{E}$ of a conservative system, e.g., the classical harmonic oscillator. The other stems from the limits of the physical quantity itself, such as the single-qubit state, which forms a sphere. In addition to these two categories, the choice of observations and reference frames can also affect the topological properties.

III Architecture of the neural network

Figure 1: The observations of a system form a closed curve in three-dimensional space. We can encode it as a two-dimensional circle. However, we do not have to encode it as a circle; an ellipse or any other homeomorphic curve is allowed.

As shown in figure 2, the network consists of two parts: the left is called the encoder, and the right is called the decoder. The encoder encodes high-dimensional data into a low-dimensional space, while the decoder recovers the data, i.e., the input, from the low-dimensional representation. The latent variables are this low-dimensional data, and we hope they contain all the information needed to recover the input. We therefore require the output $\hat{x}$ to be as close as possible to the input $x$, so the network can be trained without supervision.

In general, the encoder and decoder can simulate arbitrary continuous functions $f$ and $f^{-1}$. This imposes a restriction on the latent variables:

Theorem 1.

The manifold composed of the latent variables must be homeomorphic to that of the input $x$.

This limitation means that the Betti numbers must stay the same. Suppose the inputs $x$ form a circle, as shown in Fig. 1; the easiest quantity to describe these data is the angle. According to Theorem 1, the network can never find such a quantity, because the angle would form a line segment, which is not homeomorphic to a circle. On the other hand, the VAE network cannot even find the Cartesian coordinates $\{x_{1},x_{2}\}$, because $x_{1}$ and $x_{2}$ are not independent. To handle this case, we suggest analyzing the Betti numbers as prior knowledge of the input $x$ and providing them to the neural network.

Figure 2: There is a bottleneck in the network that forces a compressed knowledge representation. One need not set the decoder as the inverse of the encoder, and the encoder and decoder networks do not have to be the same. In general, the dimension of the output $\hat{x}$ should be the same as that of the input $x$, and the number of latent variables is usually smaller than the input dimension.

The general loss function is written as Higgins et al. (2016):

\mathcal{L}=-\alpha\mathbb{E}_{q_{\phi}(\mathbf{z}\mid\mathbf{x})}\left[\log p_{\theta}(\mathbf{x}\mid\mathbf{z})\right]+\beta D_{KL}\left(q_{\phi}(\mathbf{z}\mid\mathbf{x})\|p(\mathbf{z})\right) \qquad (1)

Here $\theta,\phi$ are the parameters of the decoder and encoder networks, respectively. $p(z)$ is the prior distribution over the latent variables $z$. $q_{\phi}(z|x)$ and $p_{\theta}(x|z)$ denote the conditional probabilities, and $D_{KL}$ is the KL divergence of two distributions:

D_{KL}\left(q_{\phi}(\mathbf{z}\mid\mathbf{x})\|p(\mathbf{z})\right)=\mathbb{E}_{q_{\phi}(\mathbf{z}\mid\mathbf{x})}\left(\log q_{\phi}(\mathbf{z}\mid\mathbf{x})-\log p(\mathbf{z})\right) \qquad (2)

To calculate the loss function for a given output $\hat{x}$, one needs to set up a prior $p(z)$ and parameterize the distributions $p_{\theta}$ and $q_{\phi}$. In the traditional VAE, the distributions $p(z)$, $p_{\theta}$, and $q_{\phi}$ are usually taken to be multidimensional Gaussian distributions. The mean of $p(z)$ is zero and its variance is $1$. The means and variances of $q_{\phi}$ and $p_{\theta}$ are the outputs of the encoder and decoder networks, respectively.

In our case, we choose the prior $p(\vec{z})$ as $p(\vec{z})=p(\vec{z_{g}})\times p(\vec{z_{t}})$, where $\vec{z_{g}}$ denotes the general physical variables (GPVs) and $\vec{z_{t}}$ denotes the TPVs. As in the traditional VAE, a convenient way to minimize the number of GPVs is to choose $p(\vec{z_{g}})$ as the Gaussian distribution $p(\vec{z_{g}})=N(0,I)$. According to the topological features, we choose a proper topological term $\mathcal{T}$ and set $p(\vec{z_{t}})=Ae^{-\mathcal{T}}$, where $A$ is a normalization constant. The variance can be absorbed into the hyperparameters of the loss function.

For the conditional distributions we choose $q_{\phi}(\mathbf{z}\mid\mathbf{x})=N(\vec{z},I)$, with constant variance and mean determined by the encoder network. $p_{\theta}(\mathbf{x}\mid\mathbf{z})$ has the same form, $p_{\theta}(\mathbf{x}\mid\mathbf{z})=N(\hat{x},I)$, with mean given by the decoder output. Under these assumptions, the KL divergence can be written as

D_{KL}\left(q_{\phi}(\mathbf{z}\mid\mathbf{x})\|p(\mathbf{z})\right)=-\mathbb{E}_{q_{\phi}(\mathbf{z}\mid\mathbf{x})}\left(\log p(\mathbf{z})\right)+\mathrm{constant} \qquad (3)

Thus the loss function can be written in a simpler form:

\mathcal{L}=\alpha\|x-\hat{x}\|_{2}+\beta\mathcal{T}-\gamma\log(P(\vec{z_{g}})) \qquad (4)

where $\alpha,\beta,\gamma$ are hyperparameters and $\mathcal{T}$ is the topological term depending on the Betti numbers. For example, when the Betti numbers are $\{1,0,1\}$, $\mathcal{T}$ can be written as $\mathcal{T}=|z_{1}^{2}+z_{2}^{2}+z_{3}^{2}-1|$. When the Betti numbers are $\{1,2,0\}$, $\mathcal{T}$ can be written as $\mathcal{T}=|(z_{1}^{2}+z_{2}^{2})^{2}-2(z_{1}^{2}-z_{2}^{2})|$, which describes a lemniscate. Here $\vec{z}$ denotes the latent variables and $P(\vec{z_{g}})$ is the distribution of $\vec{z_{g}}$. We choose some latent variables as TPVs and the others as GPVs. In general, the Betti numbers tell us how many TPVs we need, but we usually do not know how many GPVs are required. One solution is to set up as many GPVs as possible; the redundant GPVs will converge to zero. A sketch of this loss follows.
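As an illustration, a minimal PyTorch sketch of the loss in Eq. (4) for the circular case (Betti numbers $(1,1,0)$) might look as follows; the function name and batch conventions are our assumptions, not part of the protocol.

```python
import torch

def topo_vae_loss(x, x_hat, z_t, z_g, alpha=1.0, beta=1.0, gamma=100.0):
    """Loss of Eq. (4) with the circle constraint T = |z1^2 + z2^2 - 1|."""
    recon = torch.norm(x - x_hat, p=2, dim=-1).mean()      # alpha * ||x - x_hat||_2
    topo = torch.abs((z_t ** 2).sum(dim=-1) - 1.0).mean()  # beta * T for Betti numbers (1,1,0)
    prior = 0.5 * (z_g ** 2).sum(dim=-1).mean()            # -log N(z_g; 0, I) up to a constant
    return alpha * recon + beta * topo + gamma * prior
```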

IV Numerical Simulation

Figure 3: Panels: (a) latents of the oscillator; (b) comparison between latent and $x_{1}$; (c) comparison between latent and $v$; (d) latents of the ball's orbit; (e) comparison between latent and $x$; (f) comparison between latent and $y$; (g) latents of the observations of quantum states; (h) comparison between the azimuths. Panels 3(a) to 3(c) belong to the first numerical simulation, 3(d) to 3(f) to the second, and 3(g) and 3(h) to the third. $z_{1}$ and $z_{2}$ denote the TPVs and $z_{g}$ denotes the GPV. Panels 3(a) and 3(d) show that the GPV is always near zero, which means that no GPV is needed in these models. For the quantum state, we compare the polar coordinates of a quantum state with the azimuths of the effective Bloch sphere.

We test this neural network by numerical simulation, considering three toy models whose manifolds are different.

We use PyTorch to implement the neural networks. Our code can be found here. We use the same structure in the encoder and decoder networks: each has two hidden layers with 20 neurons. In the encoder network we choose Tanh as the activation function, while in the decoder network we choose ReLU. In each numerical simulation, we first generate a data set of size $1000\times m$, where $m$ is the dimension of the observations. We then use TDA tools to analyze the Betti numbers of the manifold constituted by the observations and set up the latent neurons accordingly. We use the Adam optimizer with a learning rate of $0.0001$. At each training step, we randomly select 100 samples from the data set as a batch, compute the average loss, and backpropagate it. All training can be done on a desktop in less than 6 hours. A sketch of the network is given below.
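The following is a minimal sketch of the described architecture (two hidden layers of 20 neurons each, Tanh activations in the encoder, ReLU in the decoder); the class names and the latent dimension shown are our assumptions.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 20), nn.Tanh(),
            nn.Linear(20, 20), nn.Tanh(),
            nn.Linear(20, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, latent_dim, obs_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 20), nn.ReLU(),
            nn.Linear(20, 20), nn.ReLU(),
            nn.Linear(20, obs_dim),
        )

    def forward(self, z):
        return self.net(z)

# Example: oscillator data (m = 3) with two TPVs and one GPV.
encoder, decoder = Encoder(3, 3), Decoder(3, 3)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
```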

IV.1 Classical harmonic oscillator

Figure 4: The conservative system consists of two springs connected to a ball. The spring constants $k_{1}$, $k_{2}$ and the mass of the ball remain unchanged during the experiment. The total energy comprises the potential energy of the two springs and the kinetic energy of the ball.

One type of system of vital importance is one with a conserved quantity. In classical mechanics, we study such systems through their phase space, constituted by all possible states. Generalized coordinates and generalized momenta are usually not independent of each other, yet knowing only one of them does not uniquely determine the other. This is caused by the topological nature of the phase space. In some cases, the experimenter may only be able to make observations and cannot influence the observed object. The observations may then constitute a compact manifold, and the traditional VAE network cannot accurately reduce the dimension of the parameter space.

The most common conservative system is the harmonic oscillator. We first consider a ball connected to two springs, as shown in Fig. 4. We can write the energy as

E=\frac{1}{2}mv^{2}+\frac{1}{2}k_{1}\left(x-\frac{1}{2}\right)^{2}+\frac{1}{2}k_{2}\left(x+\frac{1}{2}\right)^{2} \qquad (5)

Here we set $m=1$ and $k_{1}=k_{2}=1$; the units are unimportant. In the loss function (4) we set $[\alpha,\beta,\gamma]=[1,1,100]$. In this system, the underlying changing physical variables are $\mathcal{P}^{(k)}=(x_{1},v)$ or $(x_{2},v)$. The observations we choose are $\mathcal{E}^{(k)}=\{x_{1},x_{2},v\}$, where $x_{1}$ and $x_{2}$ denote the distances from the ball to the bottoms of the two springs, and $v$ denotes the speed of the ball, taken positive to the right and negative to the left. We generate the observations $\mathcal{E}^{(k)}$ by randomly sampling from the evolution of the system, as sketched below. According to classical mechanics, the manifold should be a circle embedded in 3-d Euclidean space, and the programme shows that the Betti numbers are $(1,1,0)$. This means we need two latent variables $(z_{1},z_{2})$, and the topological term in the loss function (4) is $\mathcal{T}=|z_{1}^{2}+z_{2}^{2}-1|$. Besides, we set up one general physical variable, whose prior $P(z_{g})$ is a Gaussian distribution.
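A minimal sketch of this sampling step, assuming a fixed total energy (amplitude) and an assumed anchor geometry in which $x_{1}$ and $x_{2}$ are affine functions of the displacement $x$:

```python
import numpy as np

# With m = k1 = k2 = 1, the equilibrium is at x = 0 and omega = sqrt(2).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2 * np.pi, 1000)   # random sampling times
amp = 0.3                               # assumed amplitude (fixed total energy)

x = amp * np.cos(np.sqrt(2.0) * t)      # displacement of the ball
v = -amp * np.sqrt(2.0) * np.sin(np.sqrt(2.0) * t)

# Distances to the two spring anchors (assumed geometry: anchors one unit apart).
x1 = 0.5 - x
x2 = 0.5 + x

data = np.stack([x1, x2, v], axis=1)    # a closed curve (topological circle) in R^3
```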

After training is finished, we use the encoder network to calculate the latent variables corresponding to the observations in $\mathcal{E}$. As shown in Fig. 3(a), the latent variables constitute a circle again, and the GPV $z_{g}$ is always zero for different observations. This means that the effective physical variables are $\tilde{\mathcal{P}}^{(k)}=(z_{1},z_{2})$.

We compare the new physical variables $(z_{1},z_{2})$ with the observations $(x_{1},v)$, as shown in Fig. 3(b) and 3(c). Given the effective physical variables $\tilde{P}$, we can uniquely determine $P$, and the mapping is continuous. They are in one-to-one correspondence, so $\tilde{P}$ can be used to build a theory.

IV.2 Orbit

Figure 5: Two small balls rotate around the same fixed point. Each ball moves at a constant speed on a circle, and the two circles lie in two intersecting planes. One ball has twice the angular speed of the other.

In classical physics and special relativity, the inertial reference system plays an important role. The physical laws in an inertial reference frame are usually simpler than those in a non-inertial reference frame. In fact, there was no concept of an inertial reference frame at the beginning. Due to the existence of gravity, there is no truly perfect inertial reference frame. When observing some simple motions in a non-inertial frame, the observations may constitute some complex manifolds.

Consider a scenario where two balls, labeled $1$ and $2$, rotate around a fixed point, as shown in Fig. 5. Both balls undergo constant-speed circular motion, and we assume the distance between each ball and the fixed point is $1$. Establishing a Cartesian coordinate system with the fixed point as the origin, ball-1 starts at $(1,0,0)$ and ball-2 starts at $(0,0,1)$. They have the same radius of rotation, but they are on different planes: ball-1 moves in the $z=0$ plane and ball-2 moves in the $y=0$ plane. Ball-1 has angular speed $\omega_{1}=1$ and ball-2 has angular speed $\omega_{2}=2$; the units are unimportant. The observations $\mathcal{E}^{(k)}=(x,y,z)$ are the three-dimensional coordinates of ball-1 measured with ball-2 as the reference system, as sketched below.
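A minimal sketch of this trajectory sampling, under the assumption that the frame of ball-2 is obtained by translation only (the text does not specify whether the frame co-rotates):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2 * np.pi, 1000)

# ball-1: z = 0 plane, omega = 1, starting at (1, 0, 0).
ball1 = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
# ball-2: y = 0 plane, omega = 2, starting at (0, 0, 1).
ball2 = np.stack([np.sin(2 * t), np.zeros_like(t), np.cos(2 * t)], axis=1)

# Relative coordinates of ball-1 in the (translated) frame of ball-2.
obs = ball1 - ball2
```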

We generate the observations $\mathcal{E}^{(k)}$ by random sampling from the trajectory. With TDA we find that the Betti numbers of $\mathcal{E}$ are $(1,2,0)$, so the topological term is $|(z_{1}^{2}+z_{2}^{2})^{2}-0.01\times(z_{1}^{2}-z_{2}^{2})|$, and again we set up one GPV. The hyperparameters of the loss function (4) are $[\alpha,\beta,\gamma]=[1,100,100]$.

After training, the latent variables corresponding to the observations $\mathcal{E}$ are plotted in Fig. 3(d). The GPV is always zero for different observations, which means the effective physical variables are $\tilde{P}=(z_{1},z_{2})$, as in the first example. Figs. 3(e) and 3(f) show the comparison between the effective physical variables and the observations. The results show that $\tilde{P}$ and the observations are still in one-to-one correspondence, so $\tilde{P}$ is an effective representation.

IV.3 Quantum state

Both of the previously introduced situations come from the limitations of experimental conditions. If we could modify the conserved quantities of conservative systems through experiments, or find some approximately inertial reference frame, such as the earth or distant stars, then the problem could be turned into one that a traditional VAE network can solve. Unlike these, the topological properties of some observations derive from the laws of physics themselves. If such a physical quantity can only take partial values in a certain experimental system, the ordinary VAE network may still work. But to establish a physical theory, one needs a comprehensive understanding of the key parameters, and then a VAE network based on topological properties is required.

In quantum mechanics, a quantum state is described by a wave function, i.e., a vector in Hilbert space. Here we consider a two-level system, which according to quantum mechanics can be described as $|\psi\rangle=a|0\rangle+b|1\rangle$, where $|a|^{2}+|b|^{2}=1$. The machine does not know how to describe this system, but it will learn some efficient variables. Suppose we can obtain five observations in experiments, i.e., $\mathcal{E}^{(k)}=\{\mathcal{O}^{1},\mathcal{O}^{2},\dots,\mathcal{O}^{5}\}$. In a real experiment, $\mathcal{E}$ is determined by the experimental setup. In our numerical simulation, we calculate the observations as $\mathcal{O}^{i}=\langle\psi|O^{i}|\psi\rangle$, where each $O^{i}$ is a Pauli matrix or a combination of Pauli matrices:

\mathcal{O}_{1}=\left[\begin{array}{cc}0&1\\ 1&0\end{array}\right],\quad\mathcal{O}_{2}=\left[\begin{array}{cc}0&i\\ -i&0\end{array}\right],\quad\mathcal{O}_{3}=\left[\begin{array}{cc}1&0\\ 0&-1\end{array}\right],\quad\mathcal{O}_{4}=\left[\begin{array}{cc}0&1+i\\ 1-i&0\end{array}\right],\quad\mathcal{O}_{5}=\left[\begin{array}{cc}1&i\\ -i&-1\end{array}\right] \qquad (6)
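A minimal sketch of this data-generation step; drawing random normalized states is our assumption about how the experiment is sampled.

```python
import numpy as np

O1 = np.array([[0, 1], [1, 0]], dtype=complex)
O2 = np.array([[0, 1j], [-1j, 0]], dtype=complex)
O3 = np.array([[1, 0], [0, -1]], dtype=complex)
O4 = O1 + O2            # [[0, 1+i], [1-i, 0]]
O5 = O3 + O2            # [[1, i], [-i, -1]]
ops = [O1, O2, O3, O4, O5]

rng = np.random.default_rng(0)

def random_state():
    """A random two-level state |psi> = a|0> + b|1> with |a|^2 + |b|^2 = 1."""
    psi = rng.normal(size=2) + 1j * rng.normal(size=2)
    return psi / np.linalg.norm(psi)

# Each row holds the five expectation values <psi|O_i|psi>
# (all real, since every O_i is Hermitian).
data = np.array([[np.real(np.conj(psi) @ Oi @ psi) for Oi in ops]
                 for psi in (random_state() for _ in range(1000))])
```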

We need many different states, so the coefficients $a$ and $b$ of the wave function are the physical variables that change as the experiment progresses. In this model, $\mathcal{P}$ consists of the wave functions we use, and $\mathcal{P}^{(k)}=\{a^{(k)},b^{(k)}\}$. The first step is to characterize the topological features of $\mathcal{E}$. By TDA, we find that the Betti numbers of the data set are $[1,0,1]$, which means that the manifold of $\mathcal{E}$ is homeomorphic to a sphere. This can be understood because we can use points on the Bloch sphere to represent the states of a two-level system. We therefore know that the number of TPVs is $3$. However, we do not know how many GPVs there are. Unlike the TPVs, we can assume the GPVs are independent of each other; in general, we do this by assuming $P(\vec{z})$ in (4) is an independent Gaussian distribution. In this case, we introduce only one GPV.

As shown in Fig. 3(g), after training the TPVs form a sphere, while the GPV stabilizes near 0 (not drawn in the figure). That means the five observations can be reduced to three variables continuously. We call these three variables the equivalent density matrix, denoted by $\tilde{\rho}$. Only two of them are independent, and we can construct two independent angles $(\theta,\phi)$ by transforming the Cartesian coordinates to polar coordinates. In Fig. 3(h) we compare the angles $(\theta_{0},\phi_{0})$ of the true state with the angles $(\theta_{1},\phi_{1})$ corresponding to the equivalent density matrix. We can see that these two representations are in one-to-one correspondence. Unlike before, the mapping here is discontinuous.

V Conclusion

In this work, we discuss a defect of the traditional VAE network and propose a simple solution. With the improved method, we can extract the minimum number of effective physical variables from experimental data by classifying the latent variables. We test our approach on three models representing three different situations that the traditional VAE cannot handle: some come from experimental restrictions, and some come from physical laws. We think the latter situation is more essential. However, in more complex cases, the Betti numbers are not the only useful topological features; two manifolds may have the same Betti numbers but fail to be homeomorphic. In such cases, more topological features are needed to design reasonable restrictions on the latent variables. When the manifold dimension is higher, it may be hard for TDA to calculate the Betti numbers. One efficient approach is to first reduce the dimensions with a traditional autoencoder and then calculate the Betti numbers of the latent variables. Another important question is how to ensure that the relation between the meaningful variables and the observations is simple. We leave this for future work.

Appendix A Persistent homology and Betti numbers

Persistent homology is a useful tool for analyzing topological data. The first step is to generate a simplicial complex from the observations. The Vietoris–Rips complex is a way of forming a topological space from distances in a set of points. This method uses a generalization of an $\epsilon$-neighborhood graph, and the final complex is

\mathcal{V}_{\epsilon}=\{\sigma\,|\,r(u,v)<\epsilon,\ (u,v)\in\sigma\} \qquad (7)

Here $\sigma$ denotes a simplex, $u,v$ are two data points, and $r$ is the Euclidean metric. A $k$-simplex $\sigma$ is expressed as $\sigma=[p_{1},p_{2},\dots,p_{k+1}]$, where the $p$'s denote points in the space. Given a set of $k$-simplices $\sigma_{i}$, a $k$-chain is defined by

c=\Sigma_{i}c_{i}\sigma_{i} \qquad (8)

Here the coefficients $c_{i}$ take values in some field. For the simplicial complex $\mathcal{V}$, we denote the set of all $k$-chains by $C_{k}(\mathcal{V})$. We can introduce an addition between two $k$-chains: for $c=\Sigma_{i}c_{i}\sigma_{i}$ and $d=\Sigma_{i}d_{i}\sigma_{i}$, the sum is $c+d=\Sigma_{i}(c_{i}+d_{i})\sigma_{i}$. With this addition, the set of all $k$-chains $C_{k}(\mathcal{V})$ forms an abelian group $(C_{k}(\mathcal{V}),+)$.

For different $k$, a natural group homomorphism is the boundary operator

\partial_{k}:C_{k}(\mathcal{V})\rightarrow C_{k-1}(\mathcal{V}) \qquad (9)

The boundary operator is linear, and it is defined on a simplex by

\partial_{k}(\sigma_{k})=\Sigma_{i=0}^{k}(-1)^{i}\langle p_{0},\dots,p_{i-1},p_{i+1},\dots,p_{k}\rangle \qquad (10)

One important subgroup of $C_{k}(\mathcal{V})$ is the kernel of the boundary operator, $Z_{k}$, the group of $k$-cycles. There is also an important subgroup of $Z_{k}$, denoted by $B_{k}$, which is the image $B_{k}(\mathcal{V})=\mathrm{Im}(\partial_{k+1}C_{k+1}(\mathcal{V}))$.

Both $Z_{k}$ and $B_{k}$ are normal, so we can define the quotient group $H_{k}=Z_{k}/B_{k}$, called the $k$-th homology group. The rank of $H_{k}$ is called the $k$-th Betti number.
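As a concrete illustration of these definitions, the following is a minimal sketch that computes Betti numbers of a small simplicial complex over the field $\mathbb{Z}/2$ (where the signs in Eq. (10) vanish); practical TDA software instead uses optimized persistence algorithms.

```python
import numpy as np
from itertools import combinations

def boundary_matrix(k_simplices, faces):
    """Matrix of the boundary operator over Z/2 (signs vanish mod 2)."""
    index = {f: i for i, f in enumerate(faces)}
    D = np.zeros((len(faces), len(k_simplices)), dtype=np.int64)
    for j, s in enumerate(k_simplices):
        for f in combinations(s, len(s) - 1):  # drop one vertex at a time
            D[index[f], j] = 1
    return D

def rank_mod2(M):
    """Rank via Gaussian elimination over Z/2."""
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        pivot = next((r for r in range(rank, M.shape[0]) if M[r, col]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]
        for r in range(M.shape[0]):
            if r != rank and M[r, col]:
                M[r] = (M[r] + M[rank]) % 2
        rank += 1
    return rank

# A hollow triangle: three vertices and three edges, no 2-simplex.
vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]

r1 = rank_mod2(boundary_matrix(edges, vertices))
b0 = len(vertices) - r1      # dim C_0 - rank d_1 (d_0 = 0)
b1 = (len(edges) - r1) - 0   # dim ker d_1 - rank d_2 (no 2-simplices)
print(b0, b1)                # 1 1: one connected component, one loop
```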

Appendix B Comparison with other work

As we point out in the paper, the topological features of the latent space influence the reconstruction of the output. In the field of machine learning, this phenomenon is called manifold mismatch de Haan and Falorsi (2018); Davidson et al. (2018); Rey et al. (2019); Gong and Cheng (2019), and it leads to poor representations.

Let us consider what happens when manifold mismatch occurs. Recall the first example and suppose we have a latent structure $S^{1}$, i.e., a circle. In a normal VAE, the prior distribution $p(z)$ limits the latent structure to $R^{n}$, where $n$ is the number of latent variables; this prior makes the hidden variables independent of each other. As before, we set $n=3$. One parameter can describe this system, e.g., the angle $\theta$, so we obtain a network that maps the coordinates $(x,y)$ to a parameter $\theta$ and then maps $\theta$ back to the coordinates $(x,y)$. In practice, our data set is finite, and the final network will only be effective on an arc of the circle instead of the whole circle. This problem becomes worse when the manifold dimension is higher.

In earlier works, some methods have been developed for capturing the topological features of a data set. However, to our understanding, these methods are not suitable for our goal because they usually derive a set of entangled latent variables.

References