# publications

(*) denotes equal contribution

## 2024

- AISTATSMulti-Resolution Active Learning of Fourier Neural OperatorsShibo Li, Xin Yu, Wei Xing, Mike Kirby, Akil Narayan, and Shandian Zhe
*In The 27th International Conference on Artificial Intelligence and Statistics (AISTATS)*, 2024Fourier Neural Operator (FNO) is a popular operator learning framework. It not only achieves the state-of-the-art performance in many tasks, but also is efficient in training and prediction. However, collecting training data for the FNO can be a costly bottleneck in practice, because it often demands expensive physical simulations. To overcome this problem, we propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible while optimizing the learning efficiency. Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm. To conduct active learning, we maximize a utility-cost ratio as the acquisition function to acquire new examples and resolutions at each step. We use moment matching and the matrix determinant lemma to enable tractable, efficient utility computation. Furthermore, we develop a cost annealing framework to avoid over-penalizing high-resolution queries at the early stage. The over-penalization is severe when the cost difference is significant between the resolutions, which renders active learning often stuck at low-resolution queries and inferior performance. Our method overcomes this problem and applies to general multi-fidelity active learning and optimization problems. We have shown the advantage of our method in several benchmark operator learning tasks.

- ICLRFunctional Bayesian Tucker Decomposition for Continuous-indexed Tensor DataShikai Fang, Xin Yu, Zheng Wang, Shibo Li, Mike Kirby, and Shandian Zhe
*In The Twelfth International Conference on Learning Representations*, 2024Tucker decomposition is a powerful tensor model to handle multi-aspect data. It demonstrates the low-rank property by decomposing the grid-structured data as interactions between a core tensor and a set of object representations (factors). A fundamental assumption of such decomposition is that there are finite objects in each aspect or mode, corresponding to discrete indexes of data entries. However, real-world data is often not naturally posed in this setting. For example, geographic data is represented as continuous indexes of latitude and longitude coordinates, and cannot fit tensor models directly. To generalize Tucker decomposition to such scenarios, we propose Functional Bayesian Tucker Decomposition (FunBaT). We treat the continuous-indexed data as the interaction between the Tucker core and a group of latent functions. We use Gaussian processes (GP) as functional priors to model the latent functions. Then, we convert each GP into a state-space prior by constructing an equivalent stochastic differential equation (SDE) to reduce computational cost. An efficient inference algorithm is developed for scalable posterior approximation based on advanced message-passing techniques. The advantage of our method is shown in both synthetic data and several real-world applications.

- ICLRSolving High Frequency and Multi-Scale PDEs with Gaussian Processes
*In The Twelfth International Conference on Learning Representations*, 2024Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with a prominent example, physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To flexibly capture the dominant frequencies, we model the power spectrum of the PDE solution with a student t mixture or Gaussian mixture. We apply the inverse Fourier transform to obtain the covariance function (by Wiener-Khinchin theorem). The covariance derived from the Gaussian mixture spectrum corresponds to the known spectral mixture kernel. Next, we estimate the mixture weights in the log domain, which we show is equivalent to placing a Jeffreys prior. It automatically induces sparsity, prunes excessive frequencies, and adjusts the remaining toward the ground truth. Third, to enable efficient and scalable computation on massive collocation points, which are critical to capture high frequencies, we place the collocation points on a grid, and multiply our covariance function at each input dimension. We use the GP conditional mean to predict the solution and its derivatives so as to fit the boundary condition and the equation itself. As a result, we can derive a Kronecker product structure in the covariance matrix. We use Kronecker product properties and multilinear algebra to promote computational efficiency and scalability, without low-rank approximations. We show the advantage of our method in systematic experiments.

## 2023

- NeurIPSDynamic Tensor Decomposition via Neural Diffusion-Reaction ProcessesZheng Wang, Shikai Fang, Shibo Li, and Shandian Zhe
*In the Thirty-seventh Annual Conference on Neural Information Processing Systems*, 2023Tensor decomposition is an important tool for multiway data analysis. In practice, the data is often sparse yet associated with rich temporal information. Existing methods, however, often under-use the time information and ignore the structural knowledge within the sparsely observed tensor entries. To overcome these limitations and to better capture the underlying temporal structure, we propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE). We develop a neural diffusion-reaction process to estimate dynamic embeddings for the entities in each tensor mode. Specifically, based on the observed tensor entries, we build a multi-partite graph to encode the correlation between the entities. We construct a graph diffusion process to co-evolve the embedding trajectories of the correlated entities and use a neural network to construct a reaction process for each individual entity. In this way, our model can capture both the commonalities and personalities during the evolution of the embeddings for different entities. We then use a neural network to model the entry value as a nonlinear function of the embedding trajectories. For model estimation, we combine ODE solvers to develop a stochastic mini-batch learning algorithm. We propose a stratified sampling method to balance the cost of processing each mini-batch so as to improve the overall efficiency. We show the advantage of our approach in both simulation study and real-world applications.

- NeurIPSStreaming Factor Trajectory Learning for Temporal Tensor DecompositionShikai Fang, Xin Yu, Shibo Li, Zheng Wang, Robert Kirby, and Shandian Zhe
*In the Thirty-seventh Annual Conference on Neural Information Processing Systems*, 2023Practical tensor data is often along with time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects’ representation. More important, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To address these issues, we propose Streaming Factor Trajectory Learning for temporal tensor decomposition. We use Gaussian processes (GPs) to model the trajectory of factors so as to flexibly estimate their temporal evolution. To address the computational challenges in handling streaming data, we convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE). We develop an efficient online filtering algorithm to estimate a decoupled running posterior of the involved factor states upon receiving new data. The decoupled estimation enables us to conduct standard Rauch-Tung-Striebel smoothing to compute the full posterior of all the trajectories in parallel, without the need for revisiting any previous data. We have shown the advantage of SFTL in both synthetic tasks and real-world applications.

- ICMLMeta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural NetworksShibo Li
^{*}, Michael Penwarden^{*}, Yiming Xu, Conor Tillinghast, Akil Narayan, Mike Kirby, and Shandian Zhe*In Proceedings of the 40th International Conference on Machine Learning*, 2023Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, apply different PINNs to solve the problem in each subdomain, and stitch the subdomains at the interface. Thereby, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of multi-domain PINNs is sensitive to the choice of the interface conditions. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit (MAB) models. The first one applies to the entire training course, and online updates a Gaussian process (GP) reward that given the PDE parameters and interface conditions predicts the performance. We prove a sub-linear regret bound for both UCB and Thompson sampling, which in theory guarantees the effectiveness of our MAB. The second one partitions the training into two stages, one is the stochastic phase and the other deterministic phase; we update a GP reward for each phase to enable different condition selections at the two stages to further bolster the flexibility and performance. We have shown the advantage of METALIC on four bench-mark PDE families.

- SynS & ML WorkshopInfinite-Fidelity Surrogate Learning via High-order Gaussian ProcessesShibo Li, Li Shi, and Shandian Zhe
*In The 1st Synergy of Scientific and Machine Learning Modelling Workshop @ ICML*, 2023Multi-fidelity learning is popular in computational physics. While the fidelity is often up to the choice of mesh spacing and hence is continuous in nature, most methods only model finite, discrete fidelities. The recent work (Li et al., 2022) proposes the first continuous-fidelity surrogate model, named infinite-fidelity coregionalization (IFC), which uses a neural Ordinary Differential Equation (ODE) to capture the rich information within the infinite, continuous fidelity space. While showing state-of-the-art predictive performance, IFC is computationally expensive in training and is difficult for uncertainty quantification. To overcome these limitations, we propose Infinite-Fidelity High-Order Gaussian Process (IF-HOGP), based on the recent GP high-dimensional output regression model HOGP. By tensorizing the output and using a product kernel at each mode, HOGP can highly efficiently estimate the mapping from the PDE parameters to the high-dimensional solution output, without the need for any low-rank approximation. We made a simple extension by injecting the continuous fidelity variable into the input, and applying a neural network transformation before feeding the input into the kernel. On three benchmark PDEs, IF-HOGP achieves prediction accuracy better than or close to IFC, yet gains 380x speed-up and 87.5% memory reduction. Meanwhile, uncertainty calibration for IF-HOGP is straightforward.

- AISTATSMeta-Learning with Adjoint MethodsShibo Li, Zheng Wang, Akil Narayan, Robert Kirby, and Shandian Zhe
*In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics*, 2023Model Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks. Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t. the initialization of a long training trajectory for the sampled tasks, because the computation graph can rapidly explode and the computational cost is very expensive. To address this problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the inner optimization as the evolution of an Ordinary Differential Equation (ODE). To efficiently compute the gradient of the validation loss w.r.t. the initialization, we use the adjoint method to construct a companion, backward ODE. To obtain the gradient w.r.t. the initialization, we only need to run the standard ODE solver twice — one is forward in time that evolves a long trajectory of gradient flow for the sampled task; the other is backward and solves the adjoint ODE. We need not create or expand any intermediate computational graphs, adopt aggressive approximations, or impose proximal regularizers in the training loss. Our approach is cheap, accurate, and adaptable to different trajectory lengths. We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.

## 2022

- NeurIPSInfinite-Fidelity Coregionalization for Physical SimulationShibo Li, Zheng Wang, Robert Kirby, and Shandian Zhe
*In The Thirty-sixth Annual Conference on Neural Information Processing Systems*, 2022Multi-fidelity modeling and learning is important in physical simulation related applications. It can leverage both low-fidelity and high-fidelity examples for training so as to reduce the cost of data generation yet still achieving good performance. While existing approaches only model finite, discrete fidelities, in practice, the feasible fidelity choice is often infinite, which can correspond to a continuous mesh spacing or finite element length. In this paper, we propose Infinite Fidelity Coregionalization (IFC). Given the data, our method can extract and exploit rich information within infinite, continuous fidelities to bolster the prediction accuracy. Our model can interpolate and/or extrapolate the predictions to novel fidelities that are not covered by the training data. Specifically, we introduce a low-dimensional latent output as a continuous function of the fidelity and input, and multiple it with a basis matrix to predict high-dimensional solution outputs. We model the latent output as a neural Ordinary Differential Equation (ODE) to capture the complex relationships within and integrate information throughout the continuous fidelities. We then use Gaussian processes or another ODE to estimate the fidelity-varying bases. For efficient inference, we reorganize the bases as a tensor, and use a tensor-Gaussian variational posterior approximation to develop a scalable inference algorithm for massive outputs. We show the advantage of our method in several benchmark tasks in computational physics.

- NeurIPSBatch Multi-Fidelity Active Learning with Budget ConstraintsShibo Li
^{*}, Jeff Phillips^{*}, Xin Yu, Robert Kirby, and Shandian Zhe*In The Thirty-sixth Annual Conference on Neural Information Processing Systems*, 2022Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e.g., by running numerical solvers. The recent work (Li et al., 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at different fidelities to reduce the cost while improving the learning performance. However, this method only queries at one pair of fidelity and input at a time, and hence has a risk of bringing in strongly correlated examples to reduce the learning efficiency. In this paper, we propose Batch Multi-Fidelity Active Learning with Budget Constraints (BMFAL-BC), which can promote the diversity of training examples to improve the benefit-cost ratio, while respecting a given budget constraint for batch queries. Hence, our method can be more practically useful. Specifically, we propose a novel batch acquisition function that measures the mutual information between a batch of multi-fidelity queries and the target function, so as to penalize highly correlated queries and encourages diversity. The optimization of the batch acquisition function is challenging in that it involves a combinatorial search over many fidelities while subject to the budget constraint. To address this challenge, we develop a weighted greedy algorithm that can sequentially identify each (fidelity, input) pair, while achieving a near -approximation of the optimum. We show the advantage of our method in several computational physics and engineering applications.

- ICMLDecomposing Temporal High-Order Interactions via Latent ODEsShibo Li, Robert Kirby, and Shandian Zhe
*In Proceedings of the 39th International Conference on Machine Learning*, 2022High-order interactions between multiple objects are common in real-world applications. Although tensor decomposition is a popular framework for high-order interaction analysis and prediction, most methods cannot well exploit the valuable timestamp information in data. The existent methods either discard the timestamps or convert them into discrete steps or use over-simplistic decomposition models. As a result, these methods might not be capable enough of capturing complex, fine-grained temporal dynamics or making accurate predictions for long-term interaction results. To overcome these limitations, we propose a novel Temporal High-order Interaction decompoSition model based on Ordinary Differential Equations (THIS-ODE). We model the time-varying interaction result with a latent ODE. To capture the complex temporal dynamics, we use a neural network (NN) to learn the time derivative of the ODE state. We use the representation of the interaction objects to model the initial value of the ODE and to constitute a part of the NN input to compute the state. In this way, the temporal relationships of the participant objects can be estimated and encoded into their representations. For tractable and scalable inference, we use forward sensitivity analysis to efficiently compute the gradient of ODE state, based on which we use integral transform to develop a stochastic mini-batch learning algorithm. We demonstrate the advantage of our approach in simulation and four real-world applications.

- ICMLNonparametric Embeddings of Sparse High-Order Interaction EventsZheng Wang, Yiming Xu, Conor Tillinghast, Shibo Li, Akil Narayan, and Shandian Zhe
*In Proceedings of the 39th International Conference on Machine Learning*, 2022High-order interaction events are common in real-world applications. Learning embeddings that encode the complex relationships of the participants from these events is of great importance in knowledge mining and predictive tasks. Despite the success of existing approaches, e.g. Poisson tensor factorization, they ignore the sparse structure underlying the data, namely the occurred interactions are far less than the possible interactions among all the participants. In this paper, we propose Nonparametric Embeddings of Sparse High-order interaction events (NESH). We hybridize a sparse hypergraph (tensor) process and a matrix Gaussian process to capture both the asymptotic structural sparsity within the interactions and nonlinear temporal relationships between the participants. We prove strong asymptotic bounds (including both a lower and an upper bound ) of the sparse ratio, which reveals the asymptotic properties of the sampled structure. We use batch-normalization, stick-breaking construction and sparse variational GP approximations to develop an efficient, scalable model inference algorithm. We demonstrate the advantage of our approach in several real-world applications.

- AISTATSDeep Multi-Fidelity Active Learning of High-Dimensional OutputsShibo Li, Zheng Wang, Robert Kirby, and Shandian Zhe
*In The 25th International Conference on Artificial Intelligence and Statistics*, 2022Many applications, such as in physical simulation and engineering design, demand we estimate functions with high-dimensional outputs. To reduce the expensive cost of generating training examples, we usually choose several fidelities to enable a cost/quality trade-off. In this paper, we consider the active learning task to automatically identify the fidelities and training inputs to query new examples so as to achieve the best learning benefit-cost ratio. To this end, we propose DMFAL, a Deep Multi-Fidelity Active Learning approach. We first develop a deep neural network-based multi-fidelity model for high-dimensional outputs, which can flexibly capture strong complex correlations across the outputs and fidelities to enhance the learning of the target function. We then propose a mutual information based acquisition function that extends the predictive entropy principle. To overcome the computational challenges caused by large output dimensions, we use the multi-variate delta method and moment-matching to estimate the output posterior, and Weinstein-Aronszajn identity to calculate and optimize the acquisition function. We show the advantage of our method in several applications of computational physics and engineering design.

## 2021

- NeurIPSBatch Multi-Fidelity Bayesian Optimization with Deep Auto-Regressive NetworksShibo Li, Robert Kirby, and Shandian Zhe
*In Thirty-fifth Annual Conference on Neural Information Processing Systems*, 2021Bayesian optimization (BO) is a powerful approach for optimizing black-box, expensive-to-evaluate functions. To enable a flexible trade-off between the cost and accuracy, many applications allow the function to be evaluated at different fidelities. In order to reduce the optimization cost while maximizing the benefit-cost ratio, in this paper we propose Batch Multi-fidelity Bayesian Optimization with Deep Auto-Regressive Networks (BMBO-DARN). We use a set of Bayesian neural networks to construct a fully auto-regressive model, which is expressive enough to capture strong yet complex relationships across all the fidelities, so as to improve the surrogate learning and optimization performance. Furthermore, to enhance the quality and diversity of queries, we develop a simple yet efficient batch querying method, without any combinatorial search over the fidelities. We propose a batch acquisition function based on Max-value Entropy Search (MES) principle, which penalizes highly correlated queries and encourages diversity. We use posterior samples and moment matching to fulfill efficient computation of the acquisition function, and conduct alternating optimization over every fidelity-input pair, which guarantees an improvement at each step. We demonstrate the advantage of our approach on four real-world hyperparameter optimization applications.

## 2020

- NeurIPSMulti-fidelity Bayesian optimization via deep neural networksShibo Li, Wei Xing, Robert Kirby, and Shandian Zhe
*In Thirty-fourth Annual Conference on Neural Information Processing Systems*, 2020Bayesian optimization (BO) is a popular framework to optimize black-box functions. In many applications, the objective function can be evaluated at multiple fidelities to enable a trade-off between the cost and accuracy. To reduce the optimization cost, many multi-fidelity BO methods have been proposed. Despite their success, these methods either ignore or over-simplify the strong, complex correlations across the fidelities, and hence can be inefficient in estimating the objective function. To address this issue, we propose Deep Neural Network Multi-Fidelity Bayesian Optimization (DNN-MFBO) that can flexibly capture all kinds of complicated relationships between the fidelities to improve the objective function estimation and hence the optimization performance. We use sequential, fidelity-wise Gauss-Hermite quadrature and moment-matching to fulfill a mutual information-based acquisition function, which is computationally tractable and efficient. We show the advantages of our method in both synthetic benchmark datasets and real-world applications in engineering design.

- IJCAIScalable Gaussian Process Regression NetworksShibo Li, Wei Xing, Robert M. Kirby, and Shandian Zhe
*In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence*, 2020Gaussian process regression networks (GPRN) are powerful Bayesian models for multi-output regression, but their inference is intractable. To address this issue, existing methods use a fully factorized structure (or a mixture of such structures) over all the outputs and latent functions for posterior approximation, which, however, can miss the strong posterior dependencies among the latent variables and hurt the inference quality. In addition, the updates of the variational parameters are inefficient and can be prohibitively expensive for a large number of outputs. To overcome these limitations, we propose a scalable variational inference algorithm for GPRN, which not only captures the abundant posterior dependencies but also is much more efficient for massive outputs. We tensorize the output space and introduce tensor/matrix-normal variational posteriors to capture the posterior correlations and to reduce the parameters. We jointly optimize all the parameters and exploit the inherent Kronecker product structure in the variational model evidence lower bound to accelerate the computation. We demonstrate the advantages of our method in several real-world applications.