
Shape-conditioned 3D Molecule Generation via Equivariant Diffusion Models

Ziqi Chen1, Bo Peng1, Srinivasan Parthasarathy1,2, Xia Ning1,2,3 (contact author)
Abstract

Ligand-based drug design aims to identify novel drug candidates with shapes similar to those of known active molecules. In this paper, we formulated an in silico shape-conditioned molecule generation problem: generating 3D molecule structures conditioned on the shape of a given molecule. To address this problem, we developed a translation- and rotation-equivariant shape-guided generative model, ShapeMol. ShapeMol consists of an equivariant shape encoder that maps molecular surface shapes into latent embeddings, and an equivariant diffusion model that generates 3D molecules based on these embeddings. Experimental results show that ShapeMol can generate novel, diverse, drug-like molecules that retain 3D molecular shapes similar to the given shape condition. These results demonstrate the potential of ShapeMol in designing drug candidates of desired 3D shapes that bind to protein target pockets.

Introduction

Generating novel drug candidates is a critical step in drug discovery to identify possible therapeutic solutions. Conventionally, this process is driven by the knowledge and experience of medicinal chemists, and is resource- and time-consuming. Recently, computational approaches to molecule generation have been developed to accelerate this conventional paradigm. Existing molecular generative models largely focus on generating either molecule SMILES strings or molecular graphs (Gómez-Bombarelli et al. 2018; Jin, Barzilay, and Jaakkola 2018; Chen et al. 2021), with a recent shift towards 3D molecular structures. Several models (Luo et al. 2021; Peng et al. 2022; Guan et al. 2023) have been designed to generate 3D molecules conditioned on protein targets, aiming to facilitate structure-based drug design (SBDD) (Batool, Ahmad, and Choi 2019), given that molecules exist in 3D space and the efficacy of drug molecules depends on their 3D structures fitting into protein pockets. However, SBDD relies on the availability of high-quality 3D structures of protein binding pockets, which are lacking for many targets (Zheng et al. 2013).

Different from SBDD, ligand-based drug design (LBDD) (Acharya et al. 2011) utilizes ligands known to interact with a protein target, and does not require knowledge of protein structures. In LBDD, shape-based virtual screening tools such as ROCS (Hawkins, Skillman, and Nicholls 2006) have been widely used to identify molecules with similar shapes to known ligands by enumerating molecules in chemical libraries. However, virtual screening tools cannot probe novel chemical space. Therefore, generative methods that produce novel molecules with desired 3D shapes are highly needed.

In this paper, we present a novel generative model for 3D molecule generation conditioned on given 3D shapes. Our method, denoted as ShapeMol, employs an equivariant shape embedding module to map 3D molecule surface shapes into shape latent embeddings. It then uses a conditional diffusion generative model to generate molecules conditioned on the shape latent embeddings, by iteratively denoising atom positions and atom features (e.g., atom type and aromaticity). During molecule generation, ShapeMol can utilize additional shape guidance that pushes predicted atoms far from the condition shape toward that shape. ShapeMol with shape guidance is denoted as ShapeMol+g. The major contributions of this paper are as follows:

  • To the best of our knowledge, ShapeMol is the first diffusion-based method for 3D molecule generation conditioned on 3D molecule shapes.

  • ShapeMol leverages a new equivariant shape embedding module to learn 3D surface shape embeddings from point clouds sampled over molecule surfaces.

  • ShapeMol uses a novel conditional diffusion model to generate 3D molecule structures. The diffusion model is equivariant to the translation and rotation of molecule shapes. A new weighting scheme over diffusion steps is developed to ensure accurate molecule shape prediction.

  • ShapeMol utilizes new shape guidance to direct the generated molecules to better fit the shape condition.

  • ShapeMol+g achieves the highest average 3D shape similarity between the generated molecules and condition molecules, compared to the state-of-the-art baseline.

For reproducibility, detailed parameters for all the experiments, code, and data are reported in Supplementary Section LABEL:supp:experiments:parameters.

Related Work

Molecule Generation

A variety of deep generative models have been developed to generate molecules using various molecule representations, including SMILES string representations (Gómez-Bombarelli et al. 2018) and 2D molecular graph representations (Jin, Barzilay, and Jaakkola 2018; Chen et al. 2021). Recent efforts have been dedicated to the generation of 3D molecules. These 3D molecule generative models can be divided into two categories: autoregressive and non-autoregressive models. Autoregressive models generate 3D molecules by sequentially adding atoms into the 3D space (Luo et al. 2021; Peng et al. 2022). While these models ensure the validity and connectivity of generated molecules, any errors made in early sequential predictions can accumulate in subsequent predictions. Non-autoregressive models generate 3D molecules using flow-based methods (Garcia Satorras et al. 2021) or diffusion methods (Guan et al. 2023). In these models, all atoms are generated or adjusted together. For example, Hoogeboom et al. (2022) developed an equivariant diffusion model, in which an equivariant network jointly predicts both the positions and features of all atoms.

Shape-Conditioned Molecule Generation

Following the idea of ligand-based drug design (LBDD) (Acharya et al. 2011), previous work has focused on generating molecules whose 3D shapes are similar to those of molecules with known efficacy, based on the observation that structurally similar molecules tend to have similar properties. Papadopoulos et al. (2021) developed a reinforcement learning method to generate SMILES strings of molecules that are similar to known antagonists of DRD2 receptors in 3D shapes and pharmacophores. Imrie et al. (2021) generated 2D molecular graphs conditioned on 3D pharmacophores using a graph-based autoencoder. However, there is limited work on generating 3D molecule structures conditioned on 3D shapes. Adams and Coley (2023) developed a shape-conditioned generative framework, SQUID, for 3D molecule generation. SQUID learns a variational autoencoder to generate fragments conditioned on given 3D shapes, and decodes molecules by sequentially attaching fragments with fixed bond lengths and angles. While LBDD plays a vital role in drug discovery, the problem of generating 3D molecule structures conditioned on 3D shapes remains under-addressed.

Definitions and Notations

Problem Definition

Following Adams and Coley (2023), we focus on 3D molecule generation conditioned on the shape of a given molecule (e.g., a ligand). Specifically, we aim to generate a new molecule $\mathtt{M}_y$, conditioned on the 3D shape of a given molecule $\mathtt{M}_x$, such that 1) $\mathtt{M}_y$ is similar to $\mathtt{M}_x$ in their 3D shapes, measured by $\mathsf{Sim}_{\mathtt{s}}(\mathtt{s}_x, \mathtt{s}_y)$, where $\mathtt{s}$ is the 3D shape of $\mathtt{M}$; and 2) $\mathtt{M}_y$ is dissimilar to $\mathtt{M}_x$ in their 2D molecular graph structures, measured by $\mathsf{Sim}_{\mathtt{g}}(\mathtt{M}_x, \mathtt{M}_y)$. This conditional 3D shape generation problem is motivated by the fact that in ligand-based drug design, it is desired to find chemically diverse and novel molecules that share similar shapes and similar activities with known active ligands (Ripphausen, Nisius, and Bajorath 2011). Such chemically diverse and novel molecules could expand the search space for drug candidates and potentially enhance the development of effective drugs.

Representations and Notations

We represent a molecule $\mathtt{M}$ as a set of atoms $\mathtt{M} = \{a_1, a_2, \cdots, a_{|\mathtt{M}|} \,|\, a_i = (\mathbf{x}_i, \mathbf{v}_i)\}$, where $|\mathtt{M}|$ is the number of atoms in $\mathtt{M}$; $a_i$ is the $i$-th atom in $\mathtt{M}$; $\mathbf{x}_i \in \mathbb{R}^3$ represents the 3D coordinates of $a_i$; and $\mathbf{v}_i \in \mathbb{R}^K$ is $a_i$'s one-hot atom feature vector indicating the atom type and its aromaticity. Following Guan et al. (2023), bonds between atoms can be uniquely determined by the atom types and the atomic distances among atoms. We represent the 3D surface shape $\mathtt{s}$ of a molecule $\mathtt{M}$ as a point cloud constructed by sampling points over the molecular surface. Details about the construction of point clouds from molecule surfaces are available in Supplementary Section LABEL:supp:point_clouds. We denote the point cloud as $\mathcal{P} = \{z_1, z_2, \cdots, z_{|\mathcal{P}|} \,|\, z_j = (\mathbf{z}_j)\}$, where $|\mathcal{P}|$ is the number of points in $\mathcal{P}$; $z_j$ is the $j$-th point; and $\mathbf{z}_j \in \mathbb{R}^3$ represents the 3D coordinates of $z_j$. We denote the latent embedding of $\mathcal{P}$ as $\mathbf{H}^{\mathtt{s}} \in \mathbb{R}^{d_p \times 3}$, where $d_p$ is the dimension of the latent embedding.
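
To make the notation concrete, the following Python sketch instantiates these representations as dense tensors; the container choices, the value of $K$, and the sizes are illustrative assumptions, not the authors' code.

```python
import torch

# Illustrative containers for the notation above; K and the sizes are
# assumptions made for this sketch only.
K = 13                                        # number of atom-feature classes
num_atoms, num_points = 20, 512

x = torch.randn(num_atoms, 3)                 # atom coordinates x_i in R^3
v = torch.eye(K)[torch.randint(0, K, (num_atoms,))]  # one-hot features v_i in R^K

z = torch.randn(num_points, 3)                # surface point cloud P = {z_j}
z = z - z.mean(dim=0, keepdim=True)           # center P at zero (removes translation)
```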

Method

๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits consists of an equivariant shape embedding module ๐–ฒ๐–ค\mathop{\mathsf{SE}}\limits that maps 3D molecular surface shapes to latent embeddings, and an equivariant diffusion model ๐–ฃ๐–จ๐–ฅ๐–ฅ\mathop{\mathsf{DIFF}}\limits that generates 3D molecules conditioned on these embeddings. Figureย 1 presents the overall architecture of ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits.

Figure 1: Model architecture of ShapeMol

Equivariant Shape Embedding (SE)

๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits pre-trains a shape embedding module ๐–ฒ๐–ค\mathop{\mathsf{SE}}\limits to generate surface shape embeddings ๐‡๐šœ\mathop{\mathbf{H}^{\mathtt{s}}}\limits. ๐–ฒ๐–ค\mathop{\mathsf{SE}}\limits uses an encoder ๐–ฒ๐–คโ€‹-โ€‹๐–พ๐—‡๐–ผ\mathop{\mathsf{SE}\text{-}\mathsf{enc}}\limits to map ๐’ซ\mathop{\mathcal{P}}\limits to the equivariant latent embedding ๐‡๐šœ\mathop{\mathbf{H}^{\mathtt{s}}}\limits. ๐–ฒ๐–ค\mathop{\mathsf{SE}}\limits employs a decoder ๐–ฒ๐–คโ€‹-โ€‹๐–ฝ๐–พ๐–ผ\mathop{\mathsf{SE}\text{-}\mathsf{dec}}\limits to optimize ๐‡๐šœ\mathop{\mathbf{H}^{\mathtt{s}}}\limits by recovering the signed distancesย (Park etย al. 2019) of sampled query points in 3D space to the molecule surface based on ๐‡๐šœ\mathop{\mathbf{H}^{\mathtt{s}}}\limits. ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits uses ๐‡๐šœ\mbox{$\mathop{\mathbf{H}}\limits$}^{\mathtt{s}} to guide the diffusion process later.

Shape Encoder (SE-enc)

๐–ฒ๐–คโ€‹-โ€‹๐–พ๐—‡๐–ผ\mathop{\mathsf{SE}\text{-}\mathsf{enc}}\limits generates equivariant shape embeddings ๐‡๐šœ\mbox{$\mathop{\mathbf{H}}\limits$}^{\mathtt{s}} from the 3D surface shape ๐’ซ\mathop{\mathcal{P}}\limits of molecules, such that ๐‡๐šœ\mbox{$\mathop{\mathbf{H}}\limits$}^{\mathtt{s}} is equivariant to both translation and rotation of ๐’ซ\mathop{\mathcal{P}}\limits. That is, any translation and rotation applied to ๐’ซ\mathop{\mathcal{P}}\limits is reflected in ๐‡๐šœ\mbox{$\mathop{\mathbf{H}}\limits$}^{\mathtt{s}} accordingly. To ensure translation equivariance, ๐–ฒ๐–คโ€‹-โ€‹๐–พ๐—‡๐–ผ\mathop{\mathsf{SE}\text{-}\mathsf{enc}}\limits shifts the center of each ๐’ซ\mathop{\mathcal{P}}\limits to zero to eliminate all translations. To ensure rotation equivariance, ๐–ฒ๐–คโ€‹-โ€‹๐–พ๐—‡๐–ผ\mathop{\mathsf{SE}\text{-}\mathsf{enc}}\limits leverages Vector Neurons (VNs)ย (Deng etย al. 2021) and Dynamic Graph Convolutional Neural Networks (DGCNNs)ย (Wang etย al. 2019) as follows:

\{\mathbf{H}^{\mathtt{p}}_1, \mathbf{H}^{\mathtt{p}}_2, \cdots, \mathbf{H}^{\mathtt{p}}_{|\mathcal{P}|}\} = \text{VN-DGCNN}(\{\mathbf{z}_1, \mathbf{z}_2, \cdots, \mathbf{z}_{|\mathcal{P}|}\}),
\mathbf{H}^{\mathtt{s}} = \sum\nolimits_j \mathbf{H}^{\mathtt{p}}_j / |\mathcal{P}|,

where $\text{VN-DGCNN}(\cdot)$ is a VN-based DGCNN network that generates an equivariant embedding $\mathbf{H}^{\mathtt{p}}_j \in \mathbb{R}^{d_p \times 3}$ for each point $z_j$ in $\mathcal{P}$; and $\mathbf{H}^{\mathtt{s}} \in \mathbb{R}^{d_p \times 3}$ is the embedding of $\mathcal{P}$ generated via mean-pooling over the embeddings of all the points. Note that $\text{VN-DGCNN}(\cdot)$ generates a matrix as the embedding of each point (i.e., $\mathbf{H}^{\mathtt{p}}_j$) to guarantee the equivariance.
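
The key property of this encoder is that rotating the point cloud rotates the embedding. The toy sketch below replaces the VN-DGCNN backbone with a single weight-shared linear map over the channel axis (a minimal Vector Neuron layer), purely to verify this property; it is not the authors' network.

```python
import torch

d_p = 8
W = torch.randn(d_p, 1)  # channel-mixing weights; coordinates are never mixed

def toy_vn_encoder(z):
    """z: (N, 3) centered point cloud -> (d_p, 3) shape embedding H_s."""
    H_p = W @ z[:, None, :]   # (N, d_p, 3): per-point equivariant embeddings
    return H_p.mean(dim=0)    # mean-pool over points, as in the equation above

z = torch.randn(256, 3)
z = z - z.mean(dim=0)

# Rotation equivariance: encoding a rotated cloud = rotating the encoding.
R, _ = torch.linalg.qr(torch.randn(3, 3))  # a random orthogonal matrix
assert torch.allclose(toy_vn_encoder(z @ R.T),
                      toy_vn_encoder(z) @ R.T, atol=1e-5)
```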

Shape Decoder (SE-dec)

To optimize ๐‡๐šœ\mbox{$\mathop{\mathbf{H}}\limits$}^{\mathtt{s}}, ๐–ฒ๐–ค\mathop{\mathsf{SE}}\limits learns a decoder ๐–ฒ๐–คโ€‹-โ€‹๐–ฝ๐–พ๐–ผ\mathop{\mathsf{SE}\text{-}\mathsf{dec}}\limits to predict the signed distance of a query point zqz_{q} sampled from 3D space using Multilayer Perceptrons (MLPs) as follows:

\tilde{o}_q = \text{MLP}(\text{concat}(\langle \mathbf{z}_q, \mathbf{H}^{\mathtt{s}} \rangle, \|\mathbf{z}_q\|^2, \text{VN-In}(\mathbf{H}^{\mathtt{s}}))),   (1)

where $\tilde{o}_q$ is the predicted signed distance of $z_q$, with positive and negative values indicating that $z_q$ is inside or outside the surface shape, respectively; $\langle \cdot, \cdot \rangle$ is the dot-product operator; $\|\mathbf{z}_q\|^2$ is the squared Euclidean norm of the coordinates of $z_q$; and $\text{VN-In}(\cdot)$ is an invariant VN network (Deng et al. 2021) that converts the equivariant shape embedding $\mathbf{H}^{\mathtt{s}} \in \mathbb{R}^{d_p \times 3}$ into an invariant shape embedding. Thus, SE-dec predicts the signed distance between the query point and the 3D surface by jointly considering the position of the query point ($\|\mathbf{z}_q\|^2$), the molecular surface shape ($\text{VN-In}(\cdot)$) and the interaction between the point and the surface ($\langle \cdot, \cdot \rangle$). The predicted signed distance $\tilde{o}_q$ is used to calculate the loss for the optimization of $\mathbf{H}^{\mathtt{s}}$ (discussed below). As shown in the literature (Deng et al. 2021), $\tilde{o}_q$ remains invariant to the rotation of the 3D molecule surface shapes (i.e., $\mathcal{P}$). We present the sampling process of $z_q$ in Supplementary Section LABEL:supp:training:shapeemb.
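
A hedged sketch of Eq. 1 follows. The rotation-invariant inputs are the channel-wise dot products $\langle \mathbf{z}_q, \mathbf{H}^{\mathtt{s}} \rangle$, the squared norm $\|\mathbf{z}_q\|^2$, and an invariant summary of $\mathbf{H}^{\mathtt{s}}$; we stand in for VN-In with the Gram matrix $\mathbf{H}^{\mathtt{s}} {\mathbf{H}^{\mathtt{s}}}^\top$ (a standard rotation invariant), whereas the paper uses the invariant VN network of Deng et al. (2021).

```python
import torch
import torch.nn as nn

d_p = 8
mlp = nn.Sequential(nn.Linear(d_p + 1 + d_p * d_p, 64),
                    nn.ReLU(),
                    nn.Linear(64, 1))

def predict_signed_distance(z_q, H_s):
    """z_q: (3,) query point; H_s: (d_p, 3) shape embedding -> scalar o_q."""
    dot = H_s @ z_q                                # <z_q, H_s>: (d_p,), invariant
    sq_norm = z_q.pow(2).sum(dim=0, keepdim=True)  # ||z_q||^2: (1,), invariant
    gram = (H_s @ H_s.T).flatten()                 # stand-in for VN-In(H_s)
    return mlp(torch.cat([dot, sq_norm, gram]))

o_q = predict_signed_distance(torch.randn(3), torch.randn(d_p, 3))
```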

๐–ฒ๐–ค\mathop{\mathsf{SE}}\limits Pre-training

๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits pre-trains ๐–ฒ๐–ค\mathop{\mathsf{SE}}\limits by minimizing the squared-errors loss between the predicted and the ground-truth signed distances of query points as follows:

โ„’๐šœ=โˆ‘zqโˆˆ๐’ตโ€–oqโˆ’o~qโ€–2,\mathcal{L}^{\mathtt{s}}=\sum\nolimits_{z_{q}\in\mathcal{Z}}\|o_{q}-\tilde{o}_{q}\|^{2}, (2)

where ๐’ต\mathcal{Z} is the set of sampled query points and oqo_{q} is the ground-truth signed distance of query point zqz_{q}. By pretraining ๐–ฒ๐–ค\mathop{\mathsf{SE}}\limits, ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits learns ๐‡๐šœ\mathop{\mathbf{H}^{\mathtt{s}}}\limits that will be used as the condition in the following 3D molecule generation.

Shape-Conditioned Molecule Generation

In ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits, a shape-conditioned molecule diffusion model, referred to as ๐–ฃ๐–จ๐–ฅ๐–ฅ\mathop{\mathsf{DIFF}}\limits, is used to generate a 3D molecule structure (i.e., atom coordinates and features) conditioned on a given 3D surface shape that is represented by the shape latent embedding ๐‡๐šœ\mathop{\mathbf{H}^{\mathtt{s}}}\limits (Eq.ย Shape Encoder (๐–ฒ๐–คโ€‹-โ€‹๐–พ๐—‡๐–ผ\mathop{\mathsf{SE}\text{-}\mathsf{enc}}\limits)). Following the denoising diffusion probabilistic modelsย (Ho, Jain, and Abbeel 2020), ๐–ฃ๐–จ๐–ฅ๐–ฅ\mathop{\mathsf{DIFF}}\limits includes a forward diffusion process based on a Markov chain, denoted as ๐–ฃ๐–จ๐–ฅ๐–ฅโ€‹-โ€‹๐–ฟ๐—ˆ๐—‹๐—๐–บ๐—‹๐–ฝ\mathop{\mathsf{DIFF}\text{-}\mathsf{forward}}\limits, which gradually adds noises step by step to the atom positions and features {(๐ฑi,๐ฏi)}\{(\mbox{$\mathop{\mathbf{x}}\limits$}_{i},\mbox{$\mathop{\mathbf{v}}$}_{i})\} in the training molecules. The noisy atom positions and features at step tt are represented as {(๐ฑi,t,๐ฏi,t)}\{(\mbox{$\mathop{\mathbf{x}}\limits$}_{i,t},\mbox{$\mathop{\mathbf{v}}$}_{i,t})\} (t=1,โ‹ฏ,Tt=1,\cdots,T), and the molecules without any noise are represented as {(๐ฑi,0,๐ฏi,0)}\{(\mbox{$\mathop{\mathbf{x}}\limits$}_{i,0},\mbox{$\mathop{\mathbf{v}}$}_{i,0})\}. At the final step TT, {(๐ฑi,T,๐ฏi,T)}\{(\mbox{$\mathop{\mathbf{x}}\limits$}_{i,T},\mbox{$\mathop{\mathbf{v}}$}_{i,T})\} are completely unstructured and resemble a simple distribution like a Normal distribution ๐’ฉโ€‹(๐ŸŽ,๐ˆ)\mathcal{N}(\mathbf{0},\mathbf{I}) or a uniform categorical distribution ๐’žโ€‹(๐Ÿ/K)\mathcal{C}(\mathbf{1}/K), in which ๐ˆ\mathbf{I} and ๐Ÿ\mathbf{1} denotes the identity matrix and identity vector, respectively.

During training, DIFF learns to reverse the forward diffusion process via another Markov chain, referred to as the backward generative process and denoted as DIFF-backward, to remove the noise in the noisy molecules. During inference, DIFF first samples noisy atom positions and features at step $T$ from the simple distributions and then generates a 3D molecule structure by removing the noise step by step until $t$ reaches 1.

Forward Diffusion Process (DIFF-forward)

Following previous work (Guan et al. 2023), at step $t \in [1, T]$, a small Gaussian noise and a small categorical noise are added to the continuous atom positions and discrete atom features $\{(\mathbf{x}_{i,t-1}, \mathbf{v}_{i,t-1})\}$, respectively. When no ambiguity arises, we omit the subscript $i$ and use $(\mathbf{x}_{t-1}, \mathbf{v}_{t-1})$ for brevity. The noise levels of the Gaussian and categorical noises are determined by two predefined variance schedules $(\beta_t^{\mathtt{x}}, \beta_t^{\mathtt{v}}) \in (0, 1)$, where $\beta_t^{\mathtt{x}}$ and $\beta_t^{\mathtt{v}}$ are selected to be sufficiently small to ensure the smoothness of DIFF-forward. The details of the variance schedules are available in Supplementary Section LABEL:supp:forward:variance. Formally, for atom positions, the probability of sampling $\mathbf{x}_t$ given $\mathbf{x}_{t-1}$, denoted as $q(\mathbf{x}_t|\mathbf{x}_{t-1})$, is defined as follows,

qโ€‹(๐ฑt|๐ฑtโˆ’1)=๐’ฉโ€‹(๐ฑt|1โˆ’ฮฒt๐šกโ€‹๐ฑtโˆ’1,ฮฒt๐šกโ€‹๐ˆ),q(\mbox{$\mathop{\mathbf{x}}\limits$}_{t}|\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1})=\mathcal{N}(\mbox{$\mathop{\mathbf{x}}\limits$}_{t}|\sqrt{1-\beta^{\mathtt{x}}_{t}}\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1},\beta^{\mathtt{x}}_{t}\mathbf{I}), (3)

where ๐’ฉโ€‹(โ‹…)\mathcal{N}(\cdot) is a Gaussian distribution of ๐ฑt\mbox{$\mathop{\mathbf{x}}\limits$}_{t} with mean 1โˆ’ฮฒt๐šกโ€‹๐ฑtโˆ’1\sqrt{1-\beta_{t}^{\mathtt{x}}}\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1} and covariance ฮฒt๐šกโ€‹๐ˆ\beta_{t}^{\mathtt{x}}\mathbf{I}. Following Hoogeboom et al.ย (2021), for atom features, the probability of ๐ฏt\mbox{$\mathop{\mathbf{v}}$}_{t} across KK classes given ๐ฏtโˆ’1\mbox{$\mathop{\mathbf{v}}$}_{t-1} is defined as follows,

qโ€‹(๐ฏt|๐ฏtโˆ’1)=๐’žโ€‹(๐ฏt|(1โˆ’ฮฒt๐šŸ)โ€‹๐ฏtโˆ’1+ฮฒt๐šŸโ€‹๐Ÿ/K),q(\mbox{$\mathop{\mathbf{v}}$}_{t}|\mbox{$\mathop{\mathbf{v}}$}_{t-1})=\mathcal{C}(\mbox{$\mathop{\mathbf{v}}$}_{t}|(1-\beta^{\mathtt{v}}_{t})\mbox{$\mathop{\mathbf{v}}$}_{t-1}+\beta^{\mathtt{v}}_{t}\mathbf{1}/K), (4)

where ๐’ž\mathcal{C} is a categorical distribution of ๐ฏt\mbox{$\mathop{\mathbf{v}}$}_{t} derived by noising ๐ฏtโˆ’1\mbox{$\mathop{\mathbf{v}}$}_{t-1} with a uniform noise ฮฒt๐šŸโ€‹๐Ÿ/K\beta^{\mathtt{v}}_{t}\mathbf{1}/K across KK classes.

Since the above distributions form Markov chains, the probability of any $\mathbf{x}_t$ or $\mathbf{v}_t$ can be derived directly from $\mathbf{x}_0$ or $\mathbf{v}_0$:

qโ€‹(๐ฑt|๐ฑ0)\displaystyle q(\mbox{$\mathop{\mathbf{x}}\limits$}_{t}|\mbox{$\mathop{\mathbf{x}}\limits$}_{0}) =๐’ฉโ€‹(๐ฑt|ฮฑยฏt๐šกโ€‹๐ฑ0,(1โˆ’ฮฑยฏt๐šก)โ€‹๐ˆ),\displaystyle=\mathcal{N}(\mbox{$\mathop{\mathbf{x}}\limits$}_{t}|\sqrt{\mbox{$\mathop{\bar{\alpha}}\limits$}^{\mathtt{x}}_{t}}\mbox{$\mathop{\mathbf{x}}\limits$}_{0},(1-\mbox{$\mathop{\bar{\alpha}}\limits$}^{\mathtt{x}}_{t})\mathbf{I}), (5)
qโ€‹(๐ฏt|๐ฏ0)\displaystyle q(\mbox{$\mathop{\mathbf{v}}$}_{t}|\mbox{$\mathop{\mathbf{v}}$}_{0}) =๐’žโ€‹(๐ฏt|ฮฑยฏt๐šŸ๐ฏ0+(1โˆ’ฮฑยฏt๐šŸ)โ€‹๐Ÿ/K),\displaystyle=\mathcal{C}(\mbox{$\mathop{\mathbf{v}}$}_{t}|\mbox{$\mathop{\bar{\alpha}}\limits$}^{\mathtt{v}}_{t}\mbox{$\mathop{\mathbf{v}}$}_{0}+(1-\mbox{$\mathop{\bar{\alpha}}\limits$}^{\mathtt{v}}_{t})\mathbf{1}/K), (6)
whereย ฮฑยฏt๐šž\displaystyle\text{where }\mbox{$\mathop{\bar{\alpha}}\limits$}^{\mathtt{u}}_{t} =โˆฯ„=1tฮฑฯ„๐šž,ฮฑฯ„๐šž=1โˆ’ฮฒฯ„๐šž,๐šž=๐šกโ€‹ย orย โ€‹๐šŸ.\displaystyle=\prod\nolimits_{\tau=1}^{t}\alpha^{\mathtt{u}}_{\tau},\ \alpha^{\mathtt{u}}_{\tau}=1-\beta^{\mathtt{u}}_{\tau},\ {\mathtt{u}}={\mathtt{x}}\text{ or }{\mathtt{v}}.\;\;\; (7)

Note that $\bar{\alpha}^{\mathtt{u}}_t$ ($\mathtt{u} = \mathtt{x}$ or $\mathtt{v}$) decreases monotonically from 1 to 0 over $t = [1, T]$. As $t \rightarrow 1$, $\bar{\alpha}^{\mathtt{x}}_t$ and $\bar{\alpha}^{\mathtt{v}}_t$ are close to 1, so $\mathbf{x}_t$ and $\mathbf{v}_t$ approximate $\mathbf{x}_0$ and $\mathbf{v}_0$. Conversely, as $t \rightarrow T$, $\bar{\alpha}^{\mathtt{x}}_t$ and $\bar{\alpha}^{\mathtt{v}}_t$ are close to 0, so $q(\mathbf{x}_T|\mathbf{x}_0)$ resembles $\mathcal{N}(\mathbf{0}, \mathbf{I})$ and $q(\mathbf{v}_T|\mathbf{v}_0)$ resembles $\mathcal{C}(\mathbf{1}/K)$.
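
A short sketch of the closed-form marginals (Eqs. 5 and 6): sample a noisy $\mathbf{x}_t$ directly from $\mathbf{x}_0$, and form the noisy class probabilities for $\mathbf{v}_t$. The linear $\beta$ schedule (shared here between positions and features) is an illustrative placeholder; the paper's actual schedules are in its supplement.

```python
import torch

T, K = 1000, 13
beta = torch.linspace(1e-4, 2e-2, T)           # placeholder variance schedule
alpha_bar = torch.cumprod(1.0 - beta, dim=0)   # \bar{alpha}_t (Eq. 7)

def q_sample_x(x0, t):
    """Eq. 5: x_t ~ N(sqrt(a_bar_t) x0, (1 - a_bar_t) I)."""
    a = alpha_bar[t]
    return a.sqrt() * x0 + (1.0 - a).sqrt() * torch.randn_like(x0)

def q_probs_v(v0, t):
    """Eq. 6: class probabilities of v_t given one-hot v0."""
    a = alpha_bar[t]
    return a * v0 + (1.0 - a) / K

x0 = torch.randn(20, 3)                        # 20 atoms
v0 = torch.eye(K)[torch.randint(0, K, (20,))]
x_t = q_sample_x(x0, t=500)
v_t = torch.distributions.OneHotCategorical(probs=q_probs_v(v0, 500)).sample()
```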

Using Bayes' theorem, the ground-truth Normal posterior of atom positions $p(\mathbf{x}_{t-1}|\mathbf{x}_t, \mathbf{x}_0)$ can be calculated in closed form (Ho, Jain, and Abbeel 2020) as below,

pโ€‹(๐ฑtโˆ’1|๐ฑt,๐ฑ0)=๐’ฉโ€‹(๐ฑtโˆ’1|ฮผโ€‹(๐ฑt,๐ฑ0),ฮฒ~t๐šกโ€‹๐ˆ),\displaystyle p(\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1}|\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\mbox{$\mathop{\mathbf{x}}\limits$}_{0})=\mathcal{N}(\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1}|\mu(\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\mbox{$\mathop{\mathbf{x}}\limits$}_{0}),\tilde{\beta}^{\mathtt{x}}_{t}\mathbf{I}), (8)
ฮผโ€‹(๐ฑt,๐ฑ0)=ฮฑยฏtโˆ’1๐šกโ€‹ฮฒt๐šก1โˆ’ฮฑยฏt๐šกโ€‹๐ฑ0+ฮฑt๐šกโ€‹(1โˆ’ฮฑยฏtโˆ’1๐šก)1โˆ’ฮฑยฏt๐šกโ€‹๐ฑt,ฮฒ~t๐šก=1โˆ’ฮฑยฏtโˆ’1๐šก1โˆ’ฮฑยฏt๐šกโ€‹ฮฒt๐šก.\displaystyle\!\!\!\!\!\!\!\!\!\!\!\mu(\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\mbox{$\mathop{\mathbf{x}}\limits$}_{0})\!=\!\frac{\sqrt{\bar{\alpha}^{\mathtt{x}}_{t-1}}\beta^{\mathtt{x}}_{t}}{1-\bar{\alpha}^{\mathtt{x}}_{t}}\mbox{$\mathop{\mathbf{x}}\limits$}_{0}\!+\!\frac{\sqrt{\alpha^{\mathtt{x}}_{t}}(1-\bar{\alpha}^{\mathtt{x}}_{t-1})}{1-\bar{\alpha}^{\mathtt{x}}_{t}}\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\tilde{\beta}^{\mathtt{x}}_{t}\!=\!\frac{1-\bar{\alpha}^{\mathtt{x}}_{t-1}}{1-\bar{\alpha}^{\mathtt{x}}_{t}}\beta^{\mathtt{x}}_{t}.\;\;\; (9)

Similarly, the ground-truth categorical posterior of atom features $p(\mathbf{v}_{t-1}|\mathbf{v}_t, \mathbf{v}_0)$ can be calculated (Hoogeboom et al. 2021) as below,

pโ€‹(๐ฏtโˆ’1|๐ฏt,๐ฏ0)=๐’žโ€‹(๐ฏtโˆ’1|๐œโ€‹(๐ฏt,๐ฏ0)),\displaystyle p(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0})=\mathcal{C}(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mathbf{c}(\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0})), (10)
๐œโ€‹(๐ฏt,๐ฏ0)=๐œ~/โˆ‘k=1Kc~k,\displaystyle\mathbf{c}(\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0})=\tilde{\mathbf{c}}/{\sum_{k=1}^{K}\tilde{c}_{k}}, (11)
๐œ~=[ฮฑt๐šŸโ€‹๐ฏt+1โˆ’ฮฑt๐šŸK]โŠ™[ฮฑยฏtโˆ’1๐šŸโ€‹๐ฏ0+1โˆ’ฮฑยฏtโˆ’1๐šŸK],\displaystyle\tilde{\mathbf{c}}=[\alpha^{\mathtt{v}}_{t}\mbox{$\mathop{\mathbf{v}}$}_{t}+\frac{1-\alpha^{\mathtt{v}}_{t}}{K}]\odot[\bar{\alpha}^{\mathtt{v}}_{t-1}\mbox{$\mathop{\mathbf{v}}$}_{0}+\frac{1-\bar{\alpha}^{\mathtt{v}}_{t-1}}{K}], (12)

where $\tilde{c}_k$ denotes the likelihood of the $k$-th class among the $K$ classes in $\tilde{\mathbf{c}}$; $\odot$ denotes the element-wise product; and $\tilde{\mathbf{c}}$ is calculated from $\mathbf{v}_t$ and $\mathbf{v}_0$ and normalized so as to represent probabilities. The proof of the above equations is available in Supplementary Section LABEL:supp:forward:proof.
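
The categorical posterior of Eqs. 10-12 can be computed directly from the schedule; a self-contained sketch under the same placeholder schedule as above:

```python
import torch

T, K = 1000, 13
beta = torch.linspace(1e-4, 2e-2, T)
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)

def categorical_posterior(v_t, v0, t):
    """c(v_t, v0) of Eqs. 11-12; v_t, v0: (N, K) one-hot (or soft) vectors."""
    c_tilde = ((alpha[t] * v_t + (1.0 - alpha[t]) / K)
               * (alpha_bar[t - 1] * v0 + (1.0 - alpha_bar[t - 1]) / K))
    return c_tilde / c_tilde.sum(dim=-1, keepdim=True)   # normalize (Eq. 11)

v0 = torch.eye(K)[torch.randint(0, K, (20,))]
v_t = torch.eye(K)[torch.randint(0, K, (20,))]
probs = categorical_posterior(v_t, v0, t=500)
```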

Backward Generative Process (DIFF-backward)

๐–ฃ๐–จ๐–ฅ๐–ฅ\mathop{\mathsf{DIFF}}\limits learns to reverse ๐–ฃ๐–จ๐–ฅ๐–ฅโ€‹-โ€‹๐–ฟ๐—ˆ๐—‹๐—๐–บ๐—‹๐–ฝ\mathop{\mathsf{DIFF}\text{-}\mathsf{forward}}\limits by denoising from (๐ฑt,๐ฏt)(\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{t}) to (๐ฑtโˆ’1,๐ฏtโˆ’1)(\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1},\mbox{$\mathop{\mathbf{v}}$}_{t-1}) at tโˆˆ[1,T]t\in[1,T], conditioned on the shape latent embedding ๐‡๐šœ\mathop{\mathbf{H}^{\mathtt{s}}}\limits. Specifically, the probabilities of (๐ฑtโˆ’1,๐ฏtโˆ’1)(\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1},\mbox{$\mathop{\mathbf{v}}$}_{t-1}) denoised from (๐ฑt,๐ฏt)(\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{t}) are estimated by the approximates of the ground-truth posteriors pโ€‹(๐ฑtโˆ’1|๐ฑt,๐ฑ0)p(\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1}|\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\mbox{$\mathop{\mathbf{x}}\limits$}_{0}) (Eq.ย 8) and pโ€‹(๐ฏtโˆ’1|๐ฏt,๐ฏ0)p(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0}) (Eq.ย 10). Given that (๐ฑ0,๐ฏ0)(\mbox{$\mathop{\mathbf{x}}\limits$}_{0},\mbox{$\mathop{\mathbf{v}}$}_{0}) is unknown in the generative process, a predictor f๐šฏโ€‹(๐ฑt,๐ฏt,๐‡๐šœ)f_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{H}^{\mathtt{s}}}\limits$}) is employed to predict at tt the atom position and feature (๐ฑ0,๐ฏ0)(\mbox{$\mathop{\mathbf{x}}\limits$}_{0},\mbox{$\mathop{\mathbf{v}}$}_{0}) as below,

(\tilde{\mathbf{x}}_{0,t}, \tilde{\mathbf{v}}_{0,t}) = f_{\boldsymbol{\Theta}}(\mathbf{x}_t, \mathbf{v}_t, \mathbf{H}^{\mathtt{s}}),   (13)

where ๐ฑ~0,t\tilde{\mbox{$\mathop{\mathbf{x}}\limits$}}_{0,t} and ๐ฏ~0,t\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t} are the predictions of ๐ฑ0\mbox{$\mathop{\mathbf{x}}\limits$}_{0} and ๐ฏ0\mbox{$\mathop{\mathbf{v}}$}_{0} at tt; ๐šฏ{\boldsymbol{\Theta}} is the learnable parameter. Following Ho et al.ย (2020), with ๐ฑ~0,t\tilde{\mbox{$\mathop{\mathbf{x}}\limits$}}_{0,t}, the probability of ๐ฑtโˆ’1\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1} denoised from ๐ฑt\mbox{$\mathop{\mathbf{x}}\limits$}_{t}, denoted as pโ€‹(๐ฑtโˆ’1|๐ฑt)p(\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1}|\mbox{$\mathop{\mathbf{x}}\limits$}_{t}), can be estimated by the approximated posterior p๐šฏ((๐ฑtโˆ’1|๐ฑt,๐ฑ~0,t)p_{\boldsymbol{\Theta}}((\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1}|\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\tilde{\mbox{$\mathop{\mathbf{x}}\limits$}}_{0,t}) as below,

pโ€‹(๐ฑtโˆ’1|๐ฑt)\displaystyle p(\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1}|\mbox{$\mathop{\mathbf{x}}\limits$}_{t}) โ‰ˆp๐šฏโ€‹(๐ฑtโˆ’1|๐ฑt,๐ฑ~0,t)\displaystyle\approx p_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1}|\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\tilde{\mbox{$\mathop{\mathbf{x}}\limits$}}_{0,t}) (14)
=๐’ฉโ€‹(๐ฑtโˆ’1|ฮผ๐šฏโ€‹(๐ฑt,๐ฑ~0,t),ฮฒ~t๐šกโ€‹๐ˆ),\displaystyle=\mathcal{N}(\mbox{$\mathop{\mathbf{x}}\limits$}_{t-1}|\mu_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\tilde{\mbox{$\mathop{\mathbf{x}}\limits$}}_{0,t}),\tilde{\beta}^{\mathtt{x}}_{t}\mathbf{I}),

where $\mu_{\boldsymbol{\Theta}}(\mathbf{x}_t, \tilde{\mathbf{x}}_{0,t})$ is an estimate of $\mu(\mathbf{x}_t, \mathbf{x}_0)$ obtained by replacing $\mathbf{x}_0$ with its estimate $\tilde{\mathbf{x}}_{0,t}$ in Eq. 8. Similarly, with $\tilde{\mathbf{v}}_{0,t}$, the probability of $\mathbf{v}_{t-1}$ denoised from $\mathbf{v}_t$, denoted as $p(\mathbf{v}_{t-1}|\mathbf{v}_t)$, can be estimated by the approximated posterior $p_{\boldsymbol{\Theta}}(\mathbf{v}_{t-1}|\mathbf{v}_t, \tilde{\mathbf{v}}_{0,t})$ as below,

pโ€‹(๐ฏtโˆ’1|๐ฏt)โ‰ˆp๐šฏโ€‹(๐ฏtโˆ’1|๐ฏt,๐ฏ~0,t)=๐’žโ€‹(๐ฏtโˆ’1|๐œ๐šฏโ€‹(๐ฏt,๐ฏ~0,t)),\displaystyle\!\!\!\!\!\!\!\!p(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mbox{$\mathop{\mathbf{v}}$}_{t})\!\!\approx\!\!p_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mbox{$\mathop{\mathbf{v}}$}_{t},\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t})\!\!=\!\!\mathcal{C}(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mathbf{c}_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{v}}$}_{t},\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t})), (15)

where ๐œ๐šฏโ€‹(๐ฏt,๐ฏ~0,t)\mathbf{c}_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{v}}$}_{t},\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t}) is an estimate of ๐œโ€‹(๐ฏt,๐ฏ0)\mathbf{c}(\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0}) by replacing ๐ฏ0\mbox{$\mathop{\mathbf{v}}$}_{0} with its estimate ๐ฏ~0,t\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t} in Eq.ย 10.

Equivariant Shape-Conditioned Molecule Predictor

In ๐–ฃ๐–จ๐–ฅ๐–ฅโ€‹-โ€‹๐–ป๐–บ๐–ผ๐—„๐—๐–บ๐—‹๐–ฝ\mathop{\mathsf{DIFF}\text{-}\mathsf{backward}}\limits, the predictor f๐šฏโ€‹(๐ฑt,๐ฏt,๐‡๐šœ)f_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{H}^{\mathtt{s}}}\limits$}) (Eq.ย 13) predicts the atom positions and features (๐ฑ~0,t,๐ฏ~0,t)(\tilde{\mbox{$\mathop{\mathbf{x}}\limits$}}_{0,t},\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t}) given the noisy data (๐ฑt,๐ฏt)(\mbox{$\mathop{\mathbf{x}}\limits$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{t}) conditioned on ๐‡๐šœ\mathop{\mathbf{H}^{\mathtt{s}}}\limits. For brevity, in this subsection, we eliminate the subscript tt in the notations when no ambiguity arises. f๐šฏโ€‹(โ‹…)f_{\boldsymbol{\Theta}}(\cdot) leverages two multi-layer graph neural networks: (1) an equivariant graph neural network, denoted as ๐–ค๐–ฐโ€‹-โ€‹๐–ฆ๐–ญ๐–ญ\mathop{\mathsf{EQ}\text{-}\mathsf{GNN}}\limits, that equivariantly predicts atom positions that change under transformations, and (2) an invariant graph neural network, denoted as ๐–จ๐–ญ๐–ตโ€‹-โ€‹๐–ฆ๐–ญ๐–ญ\mathop{\mathsf{INV}\text{-}\mathsf{GNN}}\limits, that invariantly predicts atom features that remain unchanged under transformations. Following the previous workย (Guan etย al. 2023; Hoogeboom etย al. 2022), the translation equivariance of atom position prediction is achieved by shifting a fixed point (e.g., the center of point clouds ๐’ซ\mathop{\mathcal{P}}\limits) to zero, and therefore only rotation equivariance needs to be considered.

Atom Coordinate Prediction

In ๐–ค๐–ฐโ€‹-โ€‹๐–ฆ๐–ญ๐–ญ\mathop{\mathsf{EQ}\text{-}\mathsf{GNN}}\limits, the atom position ๐ฑil+1โˆˆโ„3\mbox{$\mathop{\mathbf{x}}\limits$}_{i}^{l+1}\in\mathbb{R}^{3} of aia_{i} at the (ll+1)-th layer is calculated in an equivariant way as below,

ฮ”โ€‹๐ฑil+1=โˆ‘jโˆˆNโ€‹(ai),iโ‰ j(๐ฑilโˆ’๐ฑjl)โ€‹MHA๐šกโ€‹(diโ€‹jl,๐กil+1,๐กjl+1,VN-Inโ€‹(๐‡๐šœ)),\Delta\mbox{$\mathop{\mathbf{x}}\limits$}_{i}^{l+1}\!=\sum_{\mathclap{j\in N(a_{i}),i\neq j}}(\mbox{$\mathop{\mathbf{x}}\limits$}_{i}^{l}-\mbox{$\mathop{\mathbf{x}}\limits$}_{j}^{l})\text{MHA}^{\mathtt{x}}(d_{ij}^{l},\mbox{$\mathop{\mathbf{h}}\limits$}_{i}^{l+1},\mbox{$\mathop{\mathbf{h}}\limits$}_{j}^{l+1},\text{VN-In}(\mbox{$\mathop{\mathbf{H}^{\mathtt{s}}}\limits$})),
๐ฑil+1=๐ฑil+Meanโ€‹(ฮ”โ€‹๐ฑil+1)+VN-Linโ€‹(๐ฑil,ฮ”โ€‹๐ฑil+1,๐‡๐šœ),\mbox{$\mathop{\mathbf{x}}\limits$}_{i}^{l+1}\!=\mbox{$\mathop{\mathbf{x}}\limits$}_{i}^{l}\!+\!\text{Mean}(\Delta{\mbox{$\mathop{\mathbf{x}}\limits$}_{i}^{l+1}})\!+\!\text{VN-Lin}(\mbox{$\mathop{\mathbf{x}}\limits$}_{i}^{l},\Delta\mbox{$\mathop{\mathbf{x}}\limits$}_{i}^{l+1},\mbox{$\mathop{\mathbf{H}^{\mathtt{s}}}\limits$}), (16)

where $N(a_i)$ is the set of $N$ nearest neighbors of $a_i$ based on atomic distances; $\Delta\mathbf{x}_i^{l+1} \in \mathbb{R}^{n_h \times 3}$ aggregates the neighborhood information of $a_i$; $\text{MHA}^{\mathtt{x}}(\cdot)$ denotes the multi-head attention layer in EQ-GNN with $n_h$ heads; $d_{ij}^l$ is the distance between the $i$-th and $j$-th atom positions $\mathbf{x}_i^l$ and $\mathbf{x}_j^l$ at the $l$-th layer; $\text{Mean}(\Delta\mathbf{x}_i^{l+1})$ converts $\Delta\mathbf{x}_i^{l+1}$ into a 3D vector via mean pooling to adjust the atom position; and $\text{VN-Lin}(\cdot) \in \mathbb{R}^3$ denotes the equivariant VN-based linear layer (Deng et al. 2021). $\text{VN-Lin}(\cdot)$ adjusts the atom positions to fit the shape condition represented by $\mathbf{H}^{\mathtt{s}}$, considering the current atom positions $\mathbf{x}_i^l$ and the neighborhood information $\Delta\mathbf{x}_i^{l+1}$. The learned atom position $\mathbf{x}_i^L$ at the last layer $L$ of EQ-GNN is used as the prediction of $\tilde{\mathbf{x}}_{i,0}$, that is,

\tilde{\mathbf{x}}_{i,0} = \mathbf{x}_i^L.   (17)
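
The reason Eq. 16 is rotation-equivariant is that it only ever adds weighted neighbor difference vectors $(\mathbf{x}_i^l - \mathbf{x}_j^l)$, with weights computed from invariant quantities (distances, invariant embeddings). The toy sketch below replaces MHA and VN-Lin with a simple distance-based softmax weight to verify this property; it is not the paper's network.

```python
import torch

def toy_position_update(x):
    """x: (N, 3) -> (N, 3); equivariant update from weighted neighbor diffs."""
    diff = x[:, None, :] - x[None, :, :]          # (N, N, 3) difference vectors
    dist = diff.norm(dim=-1)                      # invariant pairwise distances
    w = torch.softmax(-dist, dim=-1)              # invariant attention-like weights
    return x + (w[..., None] * diff).sum(dim=1)   # equivariant position update

x = torch.randn(20, 3)
R, _ = torch.linalg.qr(torch.randn(3, 3))         # random orthogonal matrix
assert torch.allclose(toy_position_update(x @ R.T),
                      toy_position_update(x) @ R.T, atol=1e-5)
```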
Atom Feature Prediction

In ๐–จ๐–ญ๐–ตโ€‹-โ€‹๐–ฆ๐–ญ๐–ญ\mathop{\mathsf{INV}\text{-}\mathsf{GNN}}\limits, inspired by the previous workย (Guan etย al. 2023) and VN-Layerย (Deng etย al. 2021), the atom feature embedding ๐กil+1โˆˆโ„dh\mbox{$\mathop{\mathbf{h}}\limits$}_{i}^{l+1}\in\mathbb{R}^{d_{h}} of the ii-th atom aia_{i} at the (ll+1)-th layer of ๐–จ๐–ญ๐–ตโ€‹-โ€‹๐–ฆ๐–ญ๐–ญ\mathop{\mathsf{INV}\text{-}\mathsf{GNN}}\limits is updated in an invariant way as follows,

\mathbf{h}_i^{l+1} = \mathbf{h}_i^l + \sum\nolimits_{j \in N(a_i), i \neq j} \text{MHA}^{\mathtt{h}}(d_{ij}^l, \mathbf{h}_i^l, \mathbf{h}_j^l, \text{VN-In}(\mathbf{H}^{\mathtt{s}})), \quad \mathbf{h}_i^0 = \mathbf{v}_i,   (18)

where $\text{MHA}^{\mathtt{h}}(\cdot) \in \mathbb{R}^{d_h}$ denotes the multi-head attention layer in INV-GNN. The learned atom feature embedding $\mathbf{h}_i^L$ at the last layer $L$ encodes the neighborhood information of $a_i$ and the conditioned molecular shape, and is used to predict the atom features as follows:

\tilde{\mathbf{v}}_{i,0} = \text{softmax}(\text{MLP}(\mathbf{h}_i^L)).   (19)

The proofs of the equivariance in Eq. 16 and the invariance in Eq. 18 are available in Supplementary Sections LABEL:supp:backward:equivariance and LABEL:supp:backward:invariance.

Model Training

๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits optimizes ๐–ฃ๐–จ๐–ฅ๐–ฅ\mathop{\mathsf{DIFF}}\limits by minimizing the squared errors between the predicted positions (๐ฑ~0,t\tilde{\mbox{$\mathop{\mathbf{x}}\limits$}}_{0,t}) and the ground-truth positions (๐ฑ0\mbox{$\mathop{\mathbf{x}}\limits$}_{0}) of atoms in molecules. Given a particular step tt, the error is calculated as follows:

โ„’t๐šกโ€‹(๐™ผ)=wt๐šกโ€‹โˆ‘โˆ€aโˆˆ๐™ผโ€–๐ฑ~0,tโˆ’๐ฑ0โ€–2,\displaystyle\mathcal{L}^{\mathtt{x}}_{t}({\mbox{$\mathop{\mathtt{M}}\limits$}})=w_{t}^{\mathtt{x}}\sum\nolimits_{\forall a\in{{\mbox{$\mathop{\mathtt{M}}\limits$}}}}\|\tilde{\mbox{$\mathop{\mathbf{x}}\limits$}}_{0,t}-\mbox{$\mathop{\mathbf{x}}\limits$}_{0}\|^{2}, (20)
whereโ€‹wt๐šก=minโก(ฮปt,ฮด),ฮปt=ฮฑยฏt๐šก/(1โˆ’ฮฑยฏt๐šก),\displaystyle\text{where}\ w_{t}^{\mathtt{x}}=\min(\lambda_{t},\delta),\ \lambda_{t}={\bar{\alpha}^{\mathtt{x}}_{t}}/({1-\bar{\alpha}^{\mathtt{x}}_{t}}),

where $w_t^{\mathtt{x}}$ is the weight at step $t$, calculated by clipping the signal-to-noise ratio $\lambda_t > 0$ with a threshold $\delta > 0$. Note that because $\bar{\alpha}_t^{\mathtt{x}}$ decreases monotonically as $t$ increases from 1 to $T$ (Eq. 7), $\lambda_t$ decreases monotonically as well; $w_t^{\mathtt{x}}$ therefore stays clipped at $\delta$ for small $t$ (where $\lambda_t > \delta$) and decreases monotonically with $t$ thereafter. Thus, $w_t^{\mathtt{x}}$ imposes lower weights on the loss when the noise level in $\mathbf{x}_t$ is higher (i.e., at later/larger steps $t$). This encourages the model to focus on accurately recovering molecule structures when there is sufficient signal in the data, rather than being confused by heavy noise.
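
A small sketch of this weighting scheme, with an illustrative schedule and $\delta$:

```python
import torch

T = 1000
beta = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - beta, dim=0)

delta = 1.0                            # clipping threshold (illustrative)
snr = alpha_bar / (1.0 - alpha_bar)    # lambda_t, the signal-to-noise ratio
w = torch.clamp(snr, max=delta)        # w_t^x = min(lambda_t, delta), Eq. 20

# w equals delta at small t (high SNR) and decays toward 0 as t -> T.
print(w[0].item(), w[T // 2].item(), w[-1].item())
```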

๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits also minimizes the KL divergenceย (Kullback and Leibler 1951) between the ground-truth posterior pโ€‹(๐ฏtโˆ’1|๐ฏt,๐ฏ0)p(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0}) (Eq.ย 10) and its approximate pฮธโ€‹(๐ฏtโˆ’1|๐ฏt,๐ฏ~0,t)p_{\theta}(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mbox{$\mathop{\mathbf{v}}$}_{t},\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t}) (Eq.ย 15) for discrete atom features to optimize ๐–ฃ๐–จ๐–ฅ๐–ฅ\mathop{\mathsf{DIFF}}\limits, following the literatureย (Hoogeboom etย al. 2021). Particularly, the KL divergence at tt for a given molecule is calculated as follows:

โ„’t๐šŸ(๐™ผ)=โˆ‘โˆ€aโˆˆ๐™ผKL(p(๐ฏtโˆ’1|๐ฏt,๐ฏ0)|p๐šฏ(๐ฏtโˆ’1|๐ฏt,๐ฏ~0,t)),\mathcal{L}^{\mathtt{v}}_{t}({\mbox{$\mathop{\mathtt{M}}\limits$}})=\sum\nolimits_{\forall a\in{\mbox{$\mathop{\mathtt{M}}\limits$}}}\text{KL}(p(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0})|p_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{v}}$}_{t-1}|\mbox{$\mathop{\mathbf{v}}$}_{t},\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t})),
=โˆ‘โˆ€aโˆˆ๐™ผKLโ€‹(๐œโ€‹(๐ฏt,๐ฏ0)|๐œ๐šฏโ€‹(๐ฏt,๐ฏ~0,t)),\!\!\!\!\!\!\!\!\!\!\!\!=\sum\nolimits_{\forall a\in{\mbox{$\mathop{\mathtt{M}}\limits$}}}\text{KL}(\mathbf{c}(\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0})|\mathbf{c}_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{v}}$}_{t},\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t})), (21)

where ๐œโ€‹(๐ฏt,๐ฏ0)\mathbf{c}(\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0}) is a categorical distribution of ๐ฏtโˆ’1\mbox{$\mathop{\mathbf{v}}$}_{t-1} (Eq.ย 11); ๐œ๐šฏโ€‹(๐ฏt,๐ฏ~0,t)\mathbf{c}_{\boldsymbol{\Theta}}(\mbox{$\mathop{\mathbf{v}}$}_{t},\tilde{\mbox{$\mathop{\mathbf{v}}$}}_{0,t}) is an estimate of ๐œโ€‹(๐ฏt,๐ฏ0)\mathbf{c}(\mbox{$\mathop{\mathbf{v}}$}_{t},\mbox{$\mathop{\mathbf{v}}$}_{0}) (Eq.ย 15). The overall ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits loss function is defined as follows:

\mathcal{L} = \sum\nolimits_{\forall \mathtt{M} \in \mathcal{M}} \sum\nolimits_{\forall t \in \mathcal{T}} (\mathcal{L}^{\mathtt{x}}_t(\mathtt{M}) + \xi \mathcal{L}^{\mathtt{v}}_t(\mathtt{M})),   (22)

where $\mathcal{M}$ is the set of all training molecules; $\mathcal{T}$ is the set of sampled timesteps; and $\xi > 0$ is a hyper-parameter balancing $\mathcal{L}^{\mathtt{x}}_t(\mathtt{M})$ and $\mathcal{L}^{\mathtt{v}}_t(\mathtt{M})$. During training, step $t$ is uniformly sampled from $\{1, 2, \cdots, 1000\}$. The derivation of the loss functions is available in Supplementary Section LABEL:supp:training:loss.

Molecule Generation

During inference, ShapeMol generates novel molecules by gradually denoising $(\mathbf{x}_T, \mathbf{v}_T)$ to $(\mathbf{x}_0, \mathbf{v}_0)$ using the equivariant shape-conditioned molecule predictor. Specifically, ShapeMol samples $\mathbf{x}_T$ and $\mathbf{v}_T$ from $\mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\mathcal{C}(\mathbf{1}/K)$, respectively. After that, ShapeMol samples $\mathbf{x}_{t-1}$ from $\mathbf{x}_t$ using $p_{\boldsymbol{\Theta}}(\mathbf{x}_{t-1}|\mathbf{x}_t, \tilde{\mathbf{x}}_{0,t})$ (Eq. 14), and $\mathbf{v}_{t-1}$ from $\mathbf{v}_t$ using $p_{\boldsymbol{\Theta}}(\mathbf{v}_{t-1}|\mathbf{v}_t, \tilde{\mathbf{v}}_{0,t})$ (Eq. 15), until $t$ reaches 1.
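
Putting the pieces together, the sampling loop below combines the backward steps sketched earlier; `predictor` is an untrained stub standing in for $f_{\boldsymbol{\Theta}}$ so the loop runs end to end, and the schedule remains the illustrative placeholder.

```python
import torch

T, K, N = 1000, 13, 20
beta = torch.linspace(1e-4, 2e-2, T)
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)

def predictor(x_t, v_t, H_s):
    """Stub for f_Theta (Eq. 13): returns (x0_hat, v0_hat)."""
    return x_t, torch.softmax(v_t, dim=-1)

H_s = torch.randn(8, 3)                          # given shape embedding
x = torch.randn(N, 3)                            # x_T ~ N(0, I)
v = torch.eye(K)[torch.randint(0, K, (N,))]      # v_T ~ C(1/K)

for t in range(T - 1, 0, -1):                    # denoise until t reaches 1
    x0_hat, v0_hat = predictor(x, v, H_s)
    a_bar_t, a_bar_prev = alpha_bar[t], alpha_bar[t - 1]
    # positions: sample from the Normal posterior (Eqs. 9, 14)
    mu = (a_bar_prev.sqrt() * beta[t] * x0_hat
          + alpha[t].sqrt() * (1.0 - a_bar_prev) * x) / (1.0 - a_bar_t)
    var = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta[t]
    x = mu + var.sqrt() * torch.randn_like(x)
    # features: sample from the categorical posterior (Eqs. 11-12, 15)
    c = ((alpha[t] * v + (1.0 - alpha[t]) / K)
         * (a_bar_prev * v0_hat + (1.0 - a_bar_prev) / K))
    c = c / c.sum(dim=-1, keepdim=True)
    v = torch.distributions.OneHotCategorical(probs=c).sample()
```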

๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits with Shape Guidance

During molecule generation, ShapeMol can also utilize additional shape guidance that pushes the predicted atoms toward the shape of the given molecule $\mathtt{M}_x$. Following Adams and Coley (2023), the shape used for guidance is defined as a set of points $\mathcal{Q}$ sampled according to the atom positions in $\mathtt{M}_x$: for each atom $a_i$ in $\mathtt{M}_x$, 20 points are randomly sampled into $\mathcal{Q}$ from a Gaussian distribution centered at $\mathbf{x}_i$ with variance $\phi$. Given the predicted atom position $\tilde{\mathbf{x}}_{0,t}$ at step $t$, ShapeMol applies the shape guidance by adjusting the predicted positions toward $\mathtt{M}_x$ as follows:

\mathbf{x}_{0,t}^* = (1 - \sigma)\, \tilde{\mathbf{x}}_{0,t} + \sigma \sum\nolimits_{\mathbf{z} \in n(\tilde{\mathbf{x}}_{0,t}; \mathcal{Q})} \mathbf{z}/n, \quad \text{when } \sum\nolimits_{\mathbf{z} \in n(\tilde{\mathbf{x}}_{0,t}; \mathcal{Q})} d(\tilde{\mathbf{x}}_{0,t}, \mathbf{z})/n > \gamma,   (23)

where $\sigma > 0$ is the weight used to balance the prediction $\tilde{\mathbf{x}}_{0,t}$ and the adjustment; $d(\tilde{\mathbf{x}}_{0,t}, \mathbf{z})$ is the Euclidean distance between $\tilde{\mathbf{x}}_{0,t}$ and $\mathbf{z}$; $n(\tilde{\mathbf{x}}_{0,t}; \mathcal{Q})$ is the set of $n$ nearest neighbors of $\tilde{\mathbf{x}}_{0,t}$ in $\mathcal{Q}$ based on $d(\cdot)$; and $\gamma > 0$ is a distance threshold. With this adjustment, predicted atom positions are pushed toward those of $\mathtt{M}_x$ if they are sufficiently far away. Note that the shape guidance is applied exclusively for steps

t = T, T-1, \cdots, S, \text{ where } S > 1,   (24)

and not for all the steps; thus, it only adjusts predicted atom positions when the noise level is high and the prediction needs more guidance. ShapeMol with the shape guidance is referred to as ShapeMol+g.
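
A hedged sketch of the guidance step of Eq. 23 follows; the values of $\sigma$, $n$, $\gamma$ and $\phi$ are illustrative, not the tuned ones.

```python
import torch

def shape_guidance(x0_hat, Q, sigma=0.5, n=5, gamma=1.0):
    """Pull atoms far from the guidance cloud Q toward it (Eq. 23).
    x0_hat: (N, 3) predicted atom positions; Q: (M, 3) guidance points."""
    d = torch.cdist(x0_hat, Q)                       # (N, M) pairwise distances
    d_near, idx = d.topk(n, dim=1, largest=False)    # n nearest points per atom
    neighbor_mean = Q[idx].mean(dim=1)               # (N, 3) local shape anchor
    far = d_near.mean(dim=1) > gamma                 # atoms far from the shape
    pulled = (1.0 - sigma) * x0_hat + sigma * neighbor_mean
    return torch.where(far[:, None], pulled, x0_hat)

# Q: 20 Gaussian samples (std phi) around each atom of the condition molecule.
phi, atoms_x = 0.3, torch.randn(25, 3)
Q = atoms_x.repeat_interleave(20, dim=0) + phi * torch.randn(25 * 20, 3)
x_star = shape_guidance(torch.randn(25, 3), Q)
```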

Experiments

Data

Following ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limitsย (Adams and Coley 2023), we used molecules in the MOSES datasetย (Polykovskiy etย al. 2020), with their 3D conformers calculated by RDKitย (Landrum etย al. 2023). We used the same training and test split as in ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits. Please note that ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits further modifies the generated conformers into artificial ones, by adjusting acyclic bond distances to their empirical means and fixing acyclic bond angles using heuristic rules. Unlike ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits, we did not make any additional adjustments to the calculated 3D conformers, as ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits is designed with sufficient flexibility to accept any 3D conformers as input and generate 3D molecules without restrictions on fixed bond lengths or angles. Limited by the predefined fragment library, ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits also removes molecules with fragments not present in its fragment library. In contrast, we kept all the molecules, as ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits is not based on fragments. Our final training dataset contains 1,593,653 molecules, out of which a random set of 1,000 molecules was selected for validation. Both the ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…โ€‹-โ€‹๐–พ๐—‡๐–ผ\mathop{\mathsf{ShapeMol}\text{-}\mathsf{enc}}\limits and ๐–ฃ๐–จ๐–ฅ๐–ฅ\mathop{\mathsf{DIFF}}\limits models are trained using this training set. 1,000 test molecules (i.e., conditions) as used in ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits are used to test ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits.

Baselines

We compared ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits and ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…โ€‹+โ€‹๐—€\mathop{\mathsf{ShapeMol}\text{+}\mathsf{g}}\limits with the state-of-the-art baseline ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits and a virtual screening method over the training dataset, denoted as ๐–ต๐–ฒ\mathop{\mathsf{VS}}\limits. As far as we know, ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits is the only generative baseline that generates 3D molecules conditioned on molecule shapes. ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits consists of a fragment-based generative model based on variational autoencoder that sequentially decodes fragments from molecule latent embeddings and shape embeddings, and a rotatable bond scoring framework that adjusts the angles of rotatable bonds between fragments to maximize the 3D shape similarity with the condition molecule. ๐–ต๐–ฒ\mathop{\mathsf{VS}}\limits aims to sift through the training set to identify molecules with high shape similarities with the condition molecule. For ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits, we assessed two interpolation levels, ฮป=0.3\lambda=0.3 and 1.01.0 (prior), following the original ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits paperย (Adams and Coley 2023). For ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits, ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits and ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…โ€‹+โ€‹๐—€\mathop{\mathsf{ShapeMol}\text{+}\mathsf{g}}\limits, we generated 50 molecules for each testing molecule (i.e., condition) as the candidates for evaluation. For ๐–ต๐–ฒ\mathop{\mathsf{VS}}\limits, we randomly sampled 500 training molecules for each testing molecule, and considered the top-50 molecules with the highest shape similarities as candidates for evaluation.

Evaluation Metrics

We use the shape similarity $\mathsf{Sim}_{\mathtt{s}}(\mathtt{s}_x, \mathtt{s}_y)$ and the molecular graph similarity $\mathsf{Sim}_{\mathtt{g}}(\mathtt{M}_x, \mathtt{M}_y)$ to measure the generated molecules $\mathtt{M}_y$ with respect to the condition $\mathtt{M}_x$. Higher $\mathsf{Sim}_{\mathtt{s}}$ together with lower $\mathsf{Sim}_{\mathtt{g}}$ indicates better model performance. We also measure the diversity (div) of the generated molecules, calculated as 1 minus the average pairwise $\mathsf{Sim}_{\mathtt{g}}$ among all generated molecules; higher div indicates better performance. Details about the evaluation metrics are available in Supplementary Section LABEL:supp:experiments:metrics.
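
For concreteness, the sketch below computes the diversity as defined above, assuming $\mathsf{Sim}_{\mathtt{g}}$ is a Tanimoto similarity over Morgan fingerprints, a common choice; the paper's exact definition of $\mathsf{Sim}_{\mathtt{g}}$ is given in its supplement.

```python
from itertools import combinations
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def diversity(smiles_list):
    """div = 1 - mean pairwise graph similarity over generated molecules."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]
    sims = [DataStructs.TanimotoSimilarity(a, b) for a, b in combinations(fps, 2)]
    return 1.0 - sum(sims) / len(sims)

print(diversity(["CCO", "c1ccccc1", "CC(=O)O"]))   # toy set of generated SMILES
```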

Performance Comparison

Table 1: Overall Comparison on Shape-Conditioned Molecule Generation

method           #c%    #u%    QED    avgSim_s (std)   avgSim_g (std)   maxSim_s (std)   maxSim_g (std)   div (std)
VS               100.0  100.0  0.795  0.729 (0.039)    0.226 (0.038)    0.807 (0.042)    0.241 (0.087)    0.759 (0.015)
SQUID (λ=0.3)    100.0   94.2  0.766  0.717 (0.083)    0.349 (0.088)    0.904 (0.070)    0.549 (0.243)    0.677 (0.065)
SQUID (λ=1.0)    100.0   95.0  0.760  0.670 (0.069)    0.235 (0.045)    0.842 (0.061)    0.271 (0.096)    0.744 (0.046)
ShapeMol          98.8  100.0  0.748  0.689 (0.044)    0.239 (0.049)    0.803 (0.042)    0.243 (0.068)    0.712 (0.055)
ShapeMol+g        98.7  100.0  0.749  0.746 (0.036)    0.241 (0.050)    0.852 (0.034)    0.247 (0.068)    0.703 (0.053)

Columns: "#c%": the percentage of connected molecules; "#u%": the percentage of unique molecules; "QED": the average drug-likeness of generated molecules; "avgSim_s"/"avgSim_g": the average shape/graph similarity between the condition molecules and the generated molecules; "maxSim_s": the maximum shape similarity between the condition molecules and the generated molecules; "maxSim_g": the graph similarity between the condition molecules and the molecules with the maximum shape similarity; "std": the standard deviation; "div": the diversity among the generated molecules.

Overall Comparison

Table 1 presents the overall comparison on shape-conditioned molecule generation among VS, SQUID, ShapeMol and ShapeMol+g. As shown in Table 1, ShapeMol+g achieves the highest average shape similarity of 0.746±0.036, a 2.3% improvement over the best baseline VS (0.729±0.039), although at the cost of a slightly higher graph similarity (0.241±0.050 for ShapeMol+g vs 0.226±0.038 for VS). This indicates that ShapeMol+g can generate molecules that align more closely with the shape conditions than molecules drawn from the dataset. Furthermore, ShapeMol+g achieves the second-best maximum shape similarity maxSim_s of 0.852±0.034 among all the methods. While it underperforms the best baseline on this metric (0.904±0.070 for SQUID with λ=0.3), ShapeMol+g achieves a substantially lower maximum graph similarity maxSim_g of 0.247±0.068 compared with that baseline (0.549±0.243). This highlights the ability of ShapeMol+g to generate novel molecules that resemble the shape conditions. ShapeMol+g also achieves the lowest standard deviations on both the average and maximum shape similarities (0.036 and 0.034, respectively) among all the methods, further demonstrating its ability to consistently generate molecules with high shape similarities.

๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…โ€‹+โ€‹๐—€\mathop{\mathsf{ShapeMol}\text{+}\mathsf{g}}\limits performs substantially better than ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits on 3D shape similarity metrics (e.g., 0.746ยฑ\pm0.036 vs 0.689ยฑ\pm0.044 on ๐–บ๐—๐—€๐–ฒ๐—‚๐—†๐šœ\mathop{\mathsf{avgSim}_{\mathtt{s}}}\limits). The superior performance of ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…โ€‹+โ€‹๐—€\mathop{\mathsf{ShapeMol}\text{+}\mathsf{g}}\limits highlights the importance of shape guidance in the generative process. Although ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…\mathop{\mathsf{ShapeMol}}\limits underperforms ๐–ฒ๐—๐–บ๐—‰๐–พ๐–ฌ๐—ˆ๐—…โ€‹+โ€‹๐—€\mathop{\mathsf{ShapeMol}\text{+}\mathsf{g}}\limits, it still outperforms ๐–ฒ๐–ฐ๐–ด๐–จ๐–ฃ\mathop{\mathsf{SQUID}}\limits with ฮป\lambda=1.0 in terms of the ๐–บ๐—๐—€๐–ฒ๐—‚๐—†๐šœ\mathop{\mathsf{avgSim}_{\mathtt{s}}}\limitsย (i.e., 0.689ยฑ\pm0.044 vs 0.670ยฑ\pm0.069).

In terms of the quality of the generated molecules, 98.7% of the molecules from ShapeMol+g and 98.8% from ShapeMol are connected, and every connected molecule is unique. SQUID guarantees 100% connectivity by sequentially attaching fragments, but only 94.2% and 95.0% of its connected molecules are unique for λ=0.3 and λ=1.0, respectively. In terms of drug-likeness (QED), both ShapeMol+g and ShapeMol achieve QED values (e.g., 0.749 for ShapeMol+g) close to those of SQUID (e.g., 0.760 for SQUID with λ=0.3). All the generative methods produce slightly lower QED values than real molecules (0.795 for VS). In terms of diversity, ShapeMol+g and ShapeMol achieve higher diversity (e.g., 0.703±0.053 for ShapeMol+g) than SQUID with λ=0.3 (0.677±0.065), though slightly lower than SQUID with λ=1.0 and VS. Overall, ShapeMol and ShapeMol+g generate connected, unique and diverse molecules with good drug-likeness scores.
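The connectivity, uniqueness and QED statistics above follow standard definitions and can be reproduced with RDKit roughly as sketched below; this is one common instantiation, not the paper's exact evaluation code.

from rdkit import Chem
from rdkit.Chem import QED

def quality_metrics(mols):
    # "#c%": a molecule is connected if its graph forms a single fragment.
    connected = [m for m in mols if len(Chem.GetMolFrags(m)) == 1]
    # "#u%": uniqueness among connected molecules via canonical SMILES.
    unique = {Chem.MolToSmiles(m) for m in connected}
    return {
        "connected_pct": 100.0 * len(connected) / len(mols),
        "unique_pct": 100.0 * len(unique) / len(connected),
        "avg_qed": sum(QED.qed(m) for m in connected) / len(connected),
    }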

Note that, unlike SQUID, which restricts itself to fixed bond lengths and angles and thus cannot reproduce the distorted bonding geometries found in real molecules, both ShapeMol and ShapeMol+g generate molecules free of such limitations. Given its superior performance in shape-conditioned molecule generation, ShapeMol+g could serve as a promising tool for ligand-based drug design.

Comparison of Diffusion Weighting Schemes

Table 2: Comparison of Diffusion Weighting Schemes

method       weights   #c%    #u%     QED     JS divergence (bond)   JS divergence (C-C)
ShapeMol     w_t^x     98.8   100.0   0.748   0.095                  0.321
ShapeMol     uniform   89.4   100.0   0.660   0.115                  0.393
ShapeMol+g   w_t^x     98.7   100.0   0.749   0.093                  0.317
ShapeMol+g   uniform   90.1   100.0   0.671   0.112                  0.384

Columns: "weights": the weighting scheme; "JS divergence (bond)/(C-C)": the Jensen-Shannon (JS) divergence between the bond length distributions of real and generated molecules, computed over all bond types ("bond") or over carbon-carbon single bonds ("C-C"). All other columns are identical to those in Table 1.

While previous work (Peng et al. 2023; Guan et al. 2023) applied uniform weights across diffusion steps, ShapeMol uses step-dependent weights (i.e., w_t^x in Eq. 20). We conducted an ablation study to demonstrate the effectiveness of this new weighting scheme. Specifically, we trained two DIFF modules, one with the step-dependent weights w_t^x (with δ=10 in Eq. 20) and one with uniform weights, while fixing all the other hyper-parameters in ShapeMol and ShapeMol+g. Table 2 presents the comparison.

The results in Table 2 show that the step-dependent weights substantially improve the quality of the generated molecules. Specifically, ShapeMol with step-dependent weights achieves higher molecular connectivity and drug-likeness than with uniform weights (98.8% vs 89.4% for connectivity; 0.748 vs 0.660 for QED). ShapeMol with step-dependent weights also produces molecules whose bond length distributions are closer to those of real molecules (i.e., lower JS divergence); for example, the JS divergence of bond lengths between real and generated molecules decreases from 0.115 to 0.095. The same trend holds for ShapeMol+g, for which the step-dependent weights also improve the quality of the generated molecules. Since w_t^x increases as the noise level in the data decreases (see the discussion in "Model Training"), the results in Table 2 demonstrate that the new weighting scheme encourages generated molecules to resemble real ones more closely when the noise level is small.
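Eq. 20 is not reproduced in this section, so the sketch below shows one schedule consistent with the description: a signal-to-noise-ratio weight clipped at δ, which grows as the noise level decreases and is capped (here at δ=10) for training stability. The exact functional form of w_t^x is an assumption; see the paper's "Model Training" section for the definition.

import numpy as np

def step_weights(alpha_bar, delta=10.0):
    # One plausible instantiation of step-dependent weights w_t^x:
    # SNR(t) under a DDPM noise schedule, clipped at delta, so that
    # low-noise (small t) steps receive larger but bounded weights.
    snr = alpha_bar / (1.0 - alpha_bar)
    return np.minimum(snr, delta)

# Example with a linear beta schedule over T = 1000 diffusion steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
w = step_weights(alpha_bar)  # w[t] shrinks as t (and the noise) grows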

Parameter Study

Table 3: Parameter Study in Shape Guidance

γ     S     QED     JS bond   avgSim_s   avgSim_g   maxSim_s   maxSim_g
-     -     0.748   0.094     0.689      0.239      0.803      0.243
0.2   50    0.630   0.110     0.794      0.236      0.890      0.244
0.2   100   0.666   0.105     0.786      0.238      0.883      0.245
0.2   300   0.749   0.093     0.746      0.241      0.852      0.247
0.4   50    0.678   0.106     0.779      0.240      0.875      0.245
0.4   100   0.700   0.103     0.772      0.241      0.870      0.247
0.4   300   0.752   0.093     0.738      0.242      0.845      0.247
0.6   50    0.706   0.103     0.763      0.242      0.861      0.246
0.6   100   0.720   0.100     0.758      0.242      0.857      0.247
0.6   300   0.753   0.093     0.731      0.242      0.838      0.247

Columns: "γ"/"S": the distance threshold/step threshold in shape guidance; "JS bond": the JS divergence between the bond length distributions of real and generated molecules over all bond types. All other columns are identical to those in Table 1. The first row ("-") corresponds to ShapeMol without shape guidance.

We conducted a parameter study to evaluate the impact of the distance threshold γ (Eq. 23) and the step threshold S (Eq. 24) in the shape guidance. Specifically, using the same trained DIFF module, we sampled molecules with different values of γ and S; Table 3 presents the results. As shown in Table 3, the average shape similarity avgSim_s and the maximum shape similarity maxSim_s consistently decrease as γ and S increase. For example, when S=50, avgSim_s and maxSim_s decrease from 0.794 to 0.763 and from 0.890 to 0.861, respectively, as γ increases from 0.2 to 0.6. Similarly, when γ=0.2, avgSim_s and maxSim_s decrease from 0.794 to 0.746 and from 0.890 to 0.852, respectively, as S increases from 50 to 300. As presented in "ShapeMol with Shape Guidance", smaller γ and S correspond to stronger shape guidance in ShapeMol+g. These results demonstrate that stronger shape guidance in ShapeMol+g effectively induces higher shape similarities between the given molecule and the generated molecules.

Table 3 also shows that shape guidance introduces a trade-off between the quality of the generated molecules (QED) and their shape similarities (avgSim_s and maxSim_s) to the given molecule. For example, when γ=0.2, QED increases from 0.630 to 0.749 while avgSim_s decreases from 0.794 to 0.746 as S increases from 50 to 300. These results illustrate how γ and S steer molecule generation toward given shapes.
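To make the roles of γ and S concrete, below is a hedged sketch of the shape guidance step. Eqs. 23-24 are not reproduced here; the sketch assumes the condition shape is available as a surface point cloud, assumes guidance is applied only at diffusion steps t ≥ S, and pulls atoms lying farther than γ from the shape toward their nearest surface point, so that smaller γ and S yield stronger guidance, consistent with the trends in Table 3.

import numpy as np

def shape_guidance(pred_pos, shape_points, gamma, t, S):
    # Apply guidance only at steps t >= S (an assumption on Eq. 24).
    if t < S:
        return pred_pos
    # Distance from every predicted atom to every condition-shape point.
    d = np.linalg.norm(pred_pos[:, None, :] - shape_points[None, :, :], axis=-1)
    nearest = shape_points[d.argmin(axis=1)]
    dist = d.min(axis=1)
    # Pull only atoms farther than gamma from the shape, stopping at
    # distance gamma from the nearest point (an assumption on Eq. 23).
    mask = dist > gamma
    guided = pred_pos.copy()
    guided[mask] += (nearest[mask] - guided[mask]) * (1.0 - gamma / dist[mask, None])
    return guided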

Case Study

Figure 3 presents generated molecules from the three methods given the same condition molecule. As shown in Figure 3, the molecule generated by ShapeMol has a higher shape similarity (0.835) with the condition molecule than those from the baseline methods (0.759 for VS and 0.749 for SQUID). In particular, the molecule from ShapeMol has the surface shape (shown as the blue shade in Figure 3(d)) most similar to that of the condition molecule. All three molecules have low graph similarities with the condition molecule and higher QED scores than the condition molecule. This example demonstrates the ability of ShapeMol to generate novel molecules that are more similar in 3D shape to the condition molecule than those from the baseline methods.

Figure 3: Generated 3D Molecules from Different Methods. (a) condition molecule M_x, QED = 0.462; (b) M_y from VS: Sim_s = 0.759, Sim_g = 0.168, QED = 0.907; (c) M_y from SQUID: Sim_s = 0.749, Sim_g = 0.243, QED = 0.779; (d) M_y from ShapeMol: Sim_s = 0.835, Sim_g = 0.242, QED = 0.818. Molecule 3D shapes are shown as shaded surfaces; generated molecules are superposed with the condition molecule; the molecular graphs of the generated molecules are also shown.

Discussions and Conclusions

In this paper, we develop a novel generative model ShapeMol, which generates 3D molecules conditioned on the 3D shapes of given molecules. ShapeMol uses a pre-trained equivariant shape encoder to produce equivariant embeddings of the 3D shapes of given molecules. Conditioned on these embeddings, ShapeMol learns an equivariant diffusion model to generate novel molecules. To improve the shape similarity between the given molecule and the generated ones, we develop ShapeMol+g, which incorporates shape guidance to push the generated atom positions toward the shape of the given molecule. We compare ShapeMol and ShapeMol+g against state-of-the-art baseline methods. Our experimental results demonstrate that ShapeMol and ShapeMol+g generate molecules with higher shape similarities and competitive quality compared to the baseline methods. In future work, we will explore generating 3D molecules jointly conditioned on shapes and electrostatics, given that molecular electrostatics can also determine binding activities.

Acknowledgements

This project was made possible, in part, by support from the National Science Foundation grant no. IIS-2133650 (X.N.) and The Ohio State University President's Research Excellence program (X.N.). Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agency.

References

• Acharya et al. (2011) Acharya, C.; Coop, A.; Polli, J. E.; and MacKerell, A. D. 2011. Recent Advances in Ligand-Based Drug Design: Relevance and Utility of the Conformationally Sampled Pharmacophore Approach. Current Computer Aided-Drug Design, 7(1): 10–22.
• Adams and Coley (2023) Adams, K.; and Coley, C. W. 2023. Equivariant Shape-Conditioned Generation of 3D Molecules for Ligand-Based Drug Design. In The Eleventh International Conference on Learning Representations.
• Batool, Ahmad, and Choi (2019) Batool, M.; Ahmad, B.; and Choi, S. 2019. A Structure-Based Drug Discovery Paradigm. International Journal of Molecular Sciences, 20(11): 2783.
• Chen et al. (2022) Chen, Y.; Fernando, B.; Bilen, H.; Nießner, M.; and Gavves, E. 2022. 3D Equivariant Graph Implicit Functions. In Lecture Notes in Computer Science, 485–502. Springer Nature Switzerland.
• Chen et al. (2021) Chen, Z.; Min, M. R.; Parthasarathy, S.; and Ning, X. 2021. A deep generative model for molecule optimization via one fragment modification. Nature Machine Intelligence, 3(12): 1040–1049.
• Deng et al. (2021) Deng, C.; Litany, O.; Duan, Y.; Poulenard, A.; Tagliasacchi, A.; and Guibas, L. J. 2021. Vector Neurons: A General Framework for SO(3)-Equivariant Networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 12200–12209.
• Garcia Satorras et al. (2021) Garcia Satorras, V.; Hoogeboom, E.; Fuchs, F.; Posner, I.; and Welling, M. 2021. E(n) Equivariant Normalizing Flows. In Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems, volume 34, 4181–4192. Curran Associates, Inc.
• Gómez-Bombarelli et al. (2018) Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; and Aspuru-Guzik, A. 2018. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Central Science, 4(2): 268–276.
• Guan et al. (2023) Guan, J.; Qian, W. W.; Peng, X.; Su, Y.; Peng, J.; and Ma, J. 2023. 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction. In The Eleventh International Conference on Learning Representations.
• Hawkins, Skillman, and Nicholls (2006) Hawkins, P. C. D.; Skillman, A. G.; and Nicholls, A. 2006. Comparison of Shape-Matching and Docking as Virtual Screening Tools. Journal of Medicinal Chemistry, 50(1): 74–82.
• Ho, Jain, and Abbeel (2020) Ho, J.; Jain, A.; and Abbeel, P. 2020. Denoising Diffusion Probabilistic Models. In Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.; and Lin, H., eds., Advances in Neural Information Processing Systems, volume 33, 6840–6851. Curran Associates, Inc.
• Hoogeboom et al. (2021) Hoogeboom, E.; Nielsen, D.; Jaini, P.; Forré, P.; and Welling, M. 2021. Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions. In Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems, volume 34, 12454–12465. Curran Associates, Inc.
• Hoogeboom et al. (2022) Hoogeboom, E.; Satorras, V. G.; Vignac, C.; and Welling, M. 2022. Equivariant Diffusion for Molecule Generation in 3D. In Chaudhuri, K.; Jegelka, S.; Song, L.; Szepesvari, C.; Niu, G.; and Sabato, S., eds., Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, 8867–8887. PMLR.
• Imrie et al. (2021) Imrie, F.; Hadfield, T. E.; Bradley, A. R.; and Deane, C. M. 2021. Deep generative design with 3D pharmacophoric constraints. Chemical Science, 12(43): 14577–14589.
• Jin, Barzilay, and Jaakkola (2018) Jin, W.; Barzilay, R.; and Jaakkola, T. 2018. Junction Tree Variational Autoencoder for Molecular Graph Generation. In Dy, J.; and Krause, A., eds., Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, 2323–2332. PMLR.
• Kingma and Ba (2015) Kingma, D. P.; and Ba, J. 2015. Adam: A Method for Stochastic Optimization. In Bengio, Y.; and LeCun, Y., eds., 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 2015.
• Kong et al. (2021) Kong, Z.; Ping, W.; Huang, J.; Zhao, K.; and Catanzaro, B. 2021. DiffWave: A Versatile Diffusion Model for Audio Synthesis. In International Conference on Learning Representations.
• Kullback and Leibler (1951) Kullback, S.; and Leibler, R. A. 1951. On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1): 79–86.
• Landrum et al. (2023) Landrum, G.; Tosco, P.; Kelley, B.; Ric; Cosgrove, D.; Sriniker; Gedeck; Vianello, R.; NadineSchneider; Kawashima, E.; N, D.; Jones, G.; Dalke, A.; Cole, B.; Swain, M.; Turk, S.; AlexanderSavelyev; Vaucher, A.; Wójcikowski, M.; Ichiru Take; Probst, D.; Ujihara, K.; Scalfani, V. F.; Godin, G.; Lehtivarjo, J.; Pahl, A.; Walker, R.; Francois Berenger; Jasondbiggs; and Strets123. 2023. rdkit/rdkit: 2023_03_2 (Q1 2023) Release.
• Luo et al. (2021) Luo, S.; Guan, J.; Ma, J.; and Peng, J. 2021. A 3D Generative Model for Structure-Based Drug Design. In Beygelzimer, A.; Dauphin, Y.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems.
• Nichol and Dhariwal (2021) Nichol, A. Q.; and Dhariwal, P. 2021. Improved Denoising Diffusion Probabilistic Models. In Meila, M.; and Zhang, T., eds., Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, 8162–8171. PMLR.
• Papadopoulos et al. (2021) Papadopoulos, K.; Giblin, K. A.; Janet, J. P.; Patronov, A.; and Engkvist, O. 2021. De novo design with deep generative models based on 3D similarity scoring. Bioorganic & Medicinal Chemistry, 44: 116308.
• Park et al. (2019) Park, J. J.; Florence, P.; Straub, J.; Newcombe, R.; and Lovegrove, S. 2019. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
• Peng et al. (2023) Peng, X.; Guan, J.; Liu, Q.; and Ma, J. 2023. MolDiff: Addressing the Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation. In Krause, A.; Brunskill, E.; Cho, K.; Engelhardt, B.; Sabato, S.; and Scarlett, J., eds., Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, 27611–27629. PMLR.
• Peng et al. (2022) Peng, X.; Luo, S.; Guan, J.; Xie, Q.; Peng, J.; and Ma, J. 2022. Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets. In Chaudhuri, K.; Jegelka, S.; Song, L.; Szepesvari, C.; Niu, G.; and Sabato, S., eds., Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, 17644–17655. PMLR.
• Polykovskiy et al. (2020) Polykovskiy, D.; Zhebrak, A.; Sanchez-Lengeling, B.; Golovanov, S.; Tatanov, O.; Belyaev, S.; Kurbanov, R.; Artamonov, A.; Aladinskiy, V.; Veselov, M.; Kadurin, A.; Johansson, S.; Chen, H.; Nikolenko, S.; Aspuru-Guzik, A.; and Zhavoronkov, A. 2020. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. Frontiers in Pharmacology, 11.
• Ravi et al. (2020) Ravi, N.; Reizenstein, J.; Novotny, D.; Gordon, T.; Lo, W.-Y.; Johnson, J.; and Gkioxari, G. 2020. Accelerating 3D Deep Learning with PyTorch3D. arXiv:2007.08501.
• Ripphausen, Nisius, and Bajorath (2011) Ripphausen, P.; Nisius, B.; and Bajorath, J. 2011. State-of-the-art in ligand-based virtual screening. Drug Discovery Today, 16(9-10): 372–376.
• Vainio, Puranen, and Johnson (2009) Vainio, M. J.; Puranen, J. S.; and Johnson, M. S. 2009. ShaEP: Molecular Overlay Based on Shape and Electrostatic Potential. Journal of Chemical Information and Modeling, 49(2): 492–502.
• Wang et al. (2019) Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S. E.; Bronstein, M. M.; and Solomon, J. M. 2019. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph., 38(5).
• Wójcikowski, Zielenkiewicz, and Siedlecki (2015) Wójcikowski, M.; Zielenkiewicz, P.; and Siedlecki, P. 2015. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. Journal of Cheminformatics, 7(1).
• Zheng et al. (2013) Zheng, H.; Hou, J.; Zimmerman, M. D.; Wlodawer, A.; and Minor, W. 2013. The future of crystallography in drug discovery. Expert Opinion on Drug Discovery, 9(2): 125–137.
