publications | Shibo Li

2026

Preprint

Dynamic Bayesian Optimization Framework for Instruction Tuning in Partial Differential Equation Discovery

Junqi Qu, Yan Zhang, Shangqian Gao, and Shibo Li

In arXiv, 2026

@inproceedings{qu2025dynamicbayesianoptimizationframework,
  title = {Dynamic Bayesian Optimization Framework for Instruction Tuning in Partial Differential Equation Discovery},
  author = {Qu, Junqi and Zhang, Yan and Gao, Shangqian and Li, Shibo},
  booktitle = {arXiv},
  year = {2026},
  eprint = {2601.00088},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
}

ICML

Regime-Adaptive Bayesian Optimization via Dirichlet Process Mixtures of Gaussian Processes

Yan Zhang, Xuefeng Liu, Sipeng Chen, Sascha Ranftl, Chong Liu, and Shibo Li

In Forty-Third International Conference on Machine Learning, 2026

@inproceedings{zhang2026regimeadaptivebayesianoptimizationdirichlet,
  title = {Regime-Adaptive Bayesian Optimization via Dirichlet Process Mixtures of Gaussian Processes},
  author = {Zhang, Yan and Liu, Xuefeng and Chen, Sipeng and Ranftl, Sascha and Liu, Chong and Li, Shibo},
  booktitle = {Forty-Third International Conference on Machine Learning},
  year = {2026},
  eprint = {2601.20043},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
}

IJCNN/WCCI
COMPOL: A Unified Neural Operator Framework for Scalable Multi-Physics Simulations

Junqi Qu, Tao Wang, Yushun Dong, Hewei Tang, and Shibo Li

In 2026 IEEE World Congress on Computational Intelligence (WCCI), 2026

Physical simulations are essential tools across critical fields such as mechanical and aerospace engineering, chemistry, and meteorology. While neural operators, particularly the Fourier Neural Operator (FNO), have shown promise in predicting simulation results with impressive performance and efficiency, they face limitations in real-world scenarios involving coupled multi-physics outputs. Current neural operator methods either overlook the correlations between multiple physical processes or employ simplistic architectures that inadequately capture these relationships. To overcome these challenges, we introduce a novel coupled multi-physics neural operator learning (COMPOL) framework that extends the capabilities of Fourier operator layers to model interactions among multiple physical processes. Our approach implements feature aggregation through recurrent and attention mechanisms, enabling comprehensive modeling of coupled interactions. At the core of our method is a system for aggregating latent features from the multiple physical processes. These aggregated features serve as enriched information sources for the neural operator layers, allowing our framework to capture complex physical relationships accurately. We evaluated our coupled multi-physics neural operator across diverse physical simulation tasks, including biological systems, fluid mechanics, and multiphase flow in porous media. Our proposed model demonstrates a two- to three-fold improvement in predictive performance compared to existing approaches.
@inproceedings{li2025multiphysicssimulationscoupledfourier, title = {COMPOL: A Unified Neural Operator Framework for Scalable Multi-Physics Simulations}, author = {Qu, Junqi and Wang, Tao and Dong, Yushun and Tang, Hewei and Li, Shibo}, booktitle = {2026 IEEE World Congress on Computational Intelligence (WCCI)}, eprint = {2501.17296}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, year = {2026}, }
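COMPOL builds on Fourier operator layers, which filter features in frequency space. As a minimal sketch of that building block only, the 1D spectral convolution below keeps the lowest Fourier modes, scales each by a weight and discards the rest. It assumes real per-mode weights, a single channel and a naive DFT (FNO implementations use an FFT and complex weights over channels); COMPOL's recurrent/attention feature aggregation is not shown.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_conv_1d(u, weights):
    """Keep the lowest len(weights) frequencies of u, scale each by a real weight, drop the rest.

    weights[k] is applied to frequency k and to its mirror n - k, which keeps the
    spectrum conjugate-symmetric, so a real input gives a real output.
    Assumption: real-valued weights (a simplification of FNO's complex weights).
    """
    n, modes = len(u), len(weights)
    assert modes <= n // 2 + 1, "cannot keep more modes than the signal resolves"
    U = dft(u)
    out = [0j] * n
    for k in range(modes):
        out[k] = weights[k] * U[k]
        out[(n - k) % n] = weights[k] * U[(n - k) % n]  # mirrored negative frequency
    return [z.real for z in idft(out)]
```

Because the weights are attached to Fourier modes rather than grid points, the same weights can in principle be applied to inputs sampled at different resolutions.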

2025

Preprint
Beyond Heuristics: Globally Optimal Configuration of Implicit Neural Representations

Sipeng Chen, Yan Zhang, and Shibo Li

In arXiv, 2025

Implicit Neural Representations (INRs) have emerged as a transformative paradigm in signal processing and computer vision, excelling in tasks from image reconstruction to 3D shape modeling. Yet their effectiveness is fundamentally limited by the absence of principled strategies for optimal configuration, spanning activation selection, initialization scales, layer-wise adaptation, and their intricate interdependencies. These choices dictate performance, stability, and generalization, but current practice relies on ad-hoc heuristics, brute-force grid searches, or task-specific tuning, often leading to inconsistent results across modalities. This work introduces OptiINR, the first unified framework that formulates INR configuration as a rigorous optimization problem. Leveraging Bayesian optimization, OptiINR efficiently explores the joint space of discrete activation families, such as sinusoidal (SIREN), wavelet-based (WIRE), and variable-periodic (FINER), together with their associated continuous initialization parameters. This systematic approach replaces fragmented manual tuning with a coherent, data-driven optimization process. By delivering globally optimal configurations, OptiINR establishes a principled foundation for INR design, consistently maximizing performance across diverse signal processing applications.
@inproceedings{chen2025heuristicsgloballyoptimalconfiguration, title = {Beyond Heuristics: Globally Optimal Configuration of Implicit Neural Representations}, author = {Chen, Sipeng and Zhang, Yan and Li, Shibo}, booktitle = {arXiv}, eprint = {2509.23139}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, year = {2025}, }
SIGKDD
ATOM: A Framework of Detecting Query-Based Model Extraction Attacks for Graph Neural Networks

Zhan Cheng, Bolin Shen, Tianming Sha, Yuan Gao, Shibo Li, and Yushun Dong

In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2025

Graph Neural Networks (GNNs) have gained traction in Graph-based Machine Learning as a Service (GMLaaS) platforms, yet they remain vulnerable to graph-based model extraction attacks (MEAs), where adversaries reconstruct surrogate models by querying the victim model. Existing defense mechanisms, such as watermarking and fingerprinting, suffer from poor real-time performance, susceptibility to evasion, or reliance on post-attack verification, making them inadequate for handling the dynamic characteristics of graph-based MEA variants. To address these limitations, we propose ATOM, a novel real-time MEA detection framework tailored for GNNs. ATOM integrates sequential modeling and reinforcement learning to dynamically detect evolving attack patterns, while leveraging k-core embedding to capture structural properties and enhance detection precision. Furthermore, we provide theoretical analysis to characterize query behaviors and optimize detection strategies. Extensive experiments on multiple real-world datasets demonstrate that ATOM outperforms existing approaches in detection performance and remains stable across different time steps, thereby offering a more effective defense mechanism for GMLaaS environments. Our source code is available at https://github.com/LabRAI/ATOM.
@inproceedings{10.1145/3711896.3736848, title = {ATOM: A Framework of Detecting Query-Based Model Extraction Attacks for Graph Neural Networks}, author = {Cheng, Zhan and Shen, Bolin and Sha, Tianming and Gao, Yuan and Li, Shibo and Dong, Yushun}, booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining}, year = {2025}, }
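ATOM leverages k-core embedding to capture graph structure. How ATOM turns cores into embeddings is not described here; as an assumption-labeled sketch of the underlying structural quantity, the core number of every node can be computed by repeatedly peeling the node of smallest remaining degree.

```python
from collections import defaultdict


def core_numbers(edges):
    """Core number of every node in an undirected graph, by iterative min-degree peeling.

    A node's core number is the largest k such that it belongs to a subgraph in which
    every node has degree >= k. (Sketch only: O(n^2) because of the linear min scan.)
    """
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=lambda x: degree[x])  # peel a minimum-degree node
        k = max(k, degree[v])  # core numbers never decrease along the peeling order
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core
```

For example, in a triangle with one pendant node attached, the triangle's nodes have core number 2 and the pendant node has core number 1.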

2024

AISTATS
Multi-Resolution Active Learning of Fourier Neural Operators

Shibo Li, Xin Yu, Wei Xing, Mike Kirby, Akil Narayan, and Shandian Zhe

In The 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

Fourier Neural Operator (FNO) is a popular operator learning framework. It not only achieves state-of-the-art performance in many tasks, but is also efficient in training and prediction. However, collecting training data for the FNO can be a costly bottleneck in practice, because it often demands expensive physical simulations. To overcome this problem, we propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible while optimizing the learning efficiency. Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm. To conduct active learning, we maximize a utility-cost ratio as the acquisition function to acquire new examples and resolutions at each step. We use moment matching and the matrix determinant lemma to enable tractable, efficient utility computation. Furthermore, we develop a cost annealing framework to avoid over-penalizing high-resolution queries at the early stage. This over-penalization is severe when the costs of different resolutions differ significantly: active learning then often gets stuck at low-resolution queries, resulting in inferior performance. Our method overcomes this problem and applies to general multi-fidelity active learning and optimization problems. We show the advantage of our method in several benchmark operator learning tasks.
@inproceedings{li2023multiresolution, title = {Multi-Resolution Active Learning of Fourier Neural Operators}, author = {Li, Shibo and Yu, Xin and Xing, Wei and Kirby, Mike and Narayan, Akil and Zhe, Shandian}, booktitle = {The 27th International Conference on Artificial Intelligence and Statistics (AISTATS)}, year = {2024}, }
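The matrix determinant lemma that MRA-FNO uses for tractable utility computation states that det(A + u v^T) = (1 + v^T A^{-1} u) det(A): after a rank-one change, a known determinant can be updated with a single linear solve instead of a new factorization. The exact-arithmetic check below illustrates the identity only; the matrices are arbitrary, and the paper's utility function is not reproduced.

```python
from fractions import Fraction


def det(m):
    """Determinant via Gaussian elimination in exact Fraction arithmetic."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def solve(m, b):
    """Solve m x = b exactly by Gauss-Jordan elimination (m assumed nonsingular)."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(m, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


def det_rank_one_update(A, det_A, u, v):
    """det(A + u v^T) from a known det(A) using one solve with A."""
    x = solve(A, u)  # x = A^{-1} u
    return (1 + sum(vi * xi for vi, xi in zip(v, x))) * det_A
```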
ICLR
Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor Data

Shikai Fang, Xin Yu, Zheng Wang, Shibo Li, Mike Kirby, and Shandian Zhe

In The Twelfth International Conference on Learning Representations, 2024

Tucker decomposition is a powerful tensor model for handling multi-aspect data. It exploits the low-rank property by decomposing the grid-structured data as interactions between a core tensor and a set of object representations (factors). A fundamental assumption of such decomposition is that there are finitely many objects in each aspect or mode, corresponding to discrete indexes of data entries. However, real-world data is often not naturally posed in this setting. For example, geographic data is represented as continuous indexes of latitude and longitude coordinates and cannot fit tensor models directly. To generalize Tucker decomposition to such scenarios, we propose Functional Bayesian Tucker Decomposition (FunBaT). We treat the continuous-indexed data as the interaction between the Tucker core and a group of latent functions. We use Gaussian processes (GP) as functional priors to model the latent functions. Then, we convert each GP into a state-space prior by constructing an equivalent stochastic differential equation (SDE) to reduce computational cost. An efficient inference algorithm is developed for scalable posterior approximation based on advanced message-passing techniques. The advantage of our method is shown in both synthetic data and several real-world applications.
@inproceedings{fang2023functional, title = {Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor Data}, author = {Fang, Shikai and Yu, Xin and Wang, Zheng and Li, Shibo and Kirby, Mike and Zhe, Shandian}, booktitle = {The Twelfth International Conference on Learning Representations}, year = {2024}, }
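To make "interaction between the Tucker core and a group of latent functions" concrete: for a continuous-indexed entry, each mode's coordinate is mapped by a latent function to a vector, and the vectors are contracted with the core. In this minimal sketch the latent functions are hypothetical fixed stand-ins for the GP-distributed functions that FunBaT actually infers; no inference is shown.

```python
import math


def functional_tucker(core, factor_fns, coords):
    """Evaluate a Tucker model whose factors are functions of continuous indexes.

    value = sum over r_1..r_K of core[r_1]...[r_K] * prod_k factor_fns[k](coords[k])[r_k]
    """
    latent = [fn(x) for fn, x in zip(factor_fns, coords)]  # one latent vector per mode

    def contract(sub_core, mode):
        if mode == len(latent):
            return sub_core  # reached a scalar core entry
        return sum(contract(sub_core[r], mode + 1) * latent[mode][r]
                   for r in range(len(latent[mode])))

    return contract(core, 0)


# Hypothetical latent functions for a (latitude, longitude) tensor with ranks (2, 2).
def lat_fn(lat):
    return [1.0, math.sin(lat)]


def lon_fn(lon):
    return [math.cos(lon), lon]
```

Any real-valued (latitude, longitude) pair can be evaluated this way, which is what lets the model handle indexes that do not sit on a finite grid.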
ICLR
Solving High Frequency and Multi-Scale PDEs with Gaussian Processes

In The Twelfth International Conference on Learning Representations, 2024

Machine learning based solvers have garnered much attention in physical simulation and scientific computing, a prominent example being physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To flexibly capture the dominant frequencies, we model the power spectrum of the PDE solution with a Student t mixture or Gaussian mixture. We apply the inverse Fourier transform to obtain the covariance function (by the Wiener-Khinchin theorem). The covariance derived from the Gaussian mixture spectrum corresponds to the known spectral mixture kernel. Next, we estimate the mixture weights in the log domain, which we show is equivalent to placing a Jeffreys prior. It automatically induces sparsity, prunes excessive frequencies, and adjusts the remaining ones toward the ground truth. Finally, to enable efficient and scalable computation on massive collocation points, which are critical to capture high frequencies, we place the collocation points on a grid and take the product of our covariance function across input dimensions. We use the GP conditional mean to predict the solution and its derivatives so as to fit the boundary condition and the equation itself. As a result, we can derive a Kronecker product structure in the covariance matrix. We use Kronecker product properties and multilinear algebra to promote computational efficiency and scalability, without low-rank approximations. We show the advantage of our method in systematic experiments.
@inproceedings{fang2023functionam, title = {Solving High Frequency and Multi-Scale PDEs with Gaussian Processes}, author = {}, booktitle = {The Twelfth International Conference on Learning Representations}, year = {2024}, }
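The Kronecker structure mentioned above arises because, on a grid, a product covariance across input dimensions equals the Kronecker product of per-dimension covariance matrices. A minimal two-dimensional sketch of the resulting saving: with the vector reshaped row-major into V, (A ⊗ B) v = vec(A V B^T), so the full matrix is never formed. The matrices here are arbitrary stand-ins, not kernels from the paper; the same idea extends to more dimensions one mode at a time.

```python
def kron(A, B):
    """Dense Kronecker product of square A (m x m) and B (n x n); used only for checking."""
    m, n = len(A), len(B)
    return [[A[i][k] * B[j][l] for k in range(m) for l in range(n)]
            for i in range(m) for j in range(n)]


def kron_matvec(A, B, v):
    """Compute (A kron B) v without forming the mn x mn matrix.

    Reshape v row-major into V (m x n); then (A kron B) v = vec(A V B^T),
    costing O(mn(m + n)) instead of O(m^2 n^2).
    """
    m, n = len(A), len(B)
    V = [v[i * n:(i + 1) * n] for i in range(m)]
    VBt = [[sum(V[k][l] * B[j][l] for l in range(n)) for j in range(n)] for k in range(m)]
    R = [[sum(A[i][k] * VBt[k][j] for k in range(m)) for j in range(n)] for i in range(m)]
    return [x for row in R for x in row]
```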

2023

NeurIPS
Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes

Zheng Wang, Shikai Fang, Shibo Li, and Shandian Zhe

In the Thirty-seventh Annual Conference on Neural Information Processing Systems, 2023

Tensor decomposition is an important tool for multiway data analysis. In practice, the data is often sparse yet associated with rich temporal information. Existing methods, however, often under-use the time information and ignore the structural knowledge within the sparsely observed tensor entries. To overcome these limitations and to better capture the underlying temporal structure, we propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE). We develop a neural diffusion-reaction process to estimate dynamic embeddings for the entities in each tensor mode. Specifically, based on the observed tensor entries, we build a multi-partite graph to encode the correlation between the entities. We construct a graph diffusion process to co-evolve the embedding trajectories of the correlated entities and use a neural network to construct a reaction process for each individual entity. In this way, our model can capture both the commonalities and personalities during the evolution of the embeddings for different entities. We then use a neural network to model the entry value as a nonlinear function of the embedding trajectories. For model estimation, we build a stochastic mini-batch learning algorithm on top of ODE solvers. We propose a stratified sampling method to balance the cost of processing each mini-batch so as to improve the overall efficiency. We show the advantage of our approach in both a simulation study and real-world applications.
@inproceedings{wang2024dynamic, title = {Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes}, author = {Wang, Zheng and Fang, Shikai and Li, Shibo and Zhe, Shandian}, booktitle = {the Thirty-seventh Annual Conference on Neural Information Processing Systems}, year = {2023}, }
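A minimal sketch of the diffusion-reaction idea, assuming a plain graph Laplacian for the diffusion term and a forward-Euler integrator. In DEMOTE the reaction term is a neural network and trajectories come from an ODE solver, so the `reaction` argument below is a hypothetical placeholder. The graph connects the entities that co-occur in an observed entry, giving one block of nodes per tensor mode.

```python
def multipartite_laplacian(entries, sizes):
    """Graph Laplacian L = D - W of the multi-partite graph built from observed entries.

    Each observed entry (i_1, ..., i_K) links its participating entities pairwise;
    node ids are offset so that mode k's entities occupy their own block.
    """
    offsets = [sum(sizes[:k]) for k in range(len(sizes))]
    n = sum(sizes)
    W = [[0.0] * n for _ in range(n)]
    for entry in entries:
        nodes = [offsets[k] + idx for k, idx in enumerate(entry)]
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                W[nodes[a]][nodes[b]] += 1.0
                W[nodes[b]][nodes[a]] += 1.0
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def diffusion_reaction_step(U, L, reaction, dt):
    """One forward-Euler step of dU/dt = -L U + reaction(U), reaction applied per entity row.

    `reaction` is a hypothetical stand-in for DEMOTE's neural reaction process.
    """
    n, d = len(U), len(U[0])
    LU = [[sum(L[i][k] * U[k][c] for k in range(n)) for c in range(d)] for i in range(n)]
    R = [reaction(row) for row in U]
    return [[U[i][c] + dt * (-LU[i][c] + R[i][c]) for c in range(d)] for i in range(n)]
```

With the reaction switched off, diffusion only redistributes embedding mass among connected entities (each column sum is preserved), and identical embeddings stay identical; the reaction term is what lets individual entities drift apart.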
NeurIPS
Streaming Factor Trajectory Learning for Temporal Tensor Decomposition

Shikai Fang, Xin Yu, Shibo Li, Zheng Wang, Robert Kirby, and Shandian Zhe

In the Thirty-seventh Annual Conference on Neural Information Processing Systems, 2023

Practical tensor data is often accompanied by time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects’ representation. More importantly, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To address these issues, we propose Streaming Factor Trajectory Learning (SFTL) for temporal tensor decomposition. We use Gaussian processes (GPs) to model the trajectory of factors so as to flexibly estimate their temporal evolution. To address the computational challenges in handling streaming data, we convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE). We develop an efficient online filtering algorithm to estimate a decoupled running posterior of the involved factor states upon receiving new data. The decoupled estimation enables us to conduct standard Rauch-Tung-Striebel smoothing to compute the full posterior of all the trajectories in parallel, without the need for revisiting any previous data. We show the advantage of SFTL in both synthetic tasks and real-world applications.
@inproceedings{fang2024streaming, title = {Streaming Factor Trajectory Learning for Temporal Tensor Decomposition}, author = {Fang, Shikai and Yu, Xin and Li, Shibo and Wang, Zheng and Kirby, Robert and Zhe, Shandian}, booktitle = {the Thirty-seventh Annual Conference on Neural Information Processing Systems}, year = {2023}, }
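The GP-to-SDE conversion plus filtering and Rauch-Tung-Striebel smoothing described above can be sketched in the simplest scalar case. This assumes a single trajectory, a Matérn-1/2 (exponential) kernel and Gaussian observation noise; the decoupled multi-factor posterior of SFTL is not reproduced. The O(n) filter/smoother result is checked against the O(n^3) GP posterior mean.

```python
import math


def ou_kalman_rts(times, ys, lam, var, noise):
    """GP posterior mean at sorted `times` for kernel var * exp(-lam * |t - t'|), in O(n).

    This Matern-1/2 kernel is the covariance of the stationary SDE
    dx = -lam * x dt + sqrt(2 * lam * var) dW, whose exact discretization is
    x_k = a_k * x_{k-1} + w_k with a_k = exp(-lam * dt_k), Var(w_k) = var * (1 - a_k^2).
    """
    n = len(times)
    m_f, P_f, m_p, P_p, A = ([0.0] * n for _ in range(5))
    m, P = 0.0, var  # stationary prior at the first time
    for k in range(n):
        if k > 0:  # predict step
            A[k] = math.exp(-lam * (times[k] - times[k - 1]))
            m, P = A[k] * m, A[k] ** 2 * P + var * (1.0 - A[k] ** 2)
        m_p[k], P_p[k] = m, P
        gain = P / (P + noise)  # update with y_k = x_k + N(0, noise)
        m, P = m + gain * (ys[k] - m), (1.0 - gain) * P
        m_f[k], P_f[k] = m, P
    m_s = m_f[:]  # Rauch-Tung-Striebel backward pass (means only)
    for k in range(n - 2, -1, -1):
        G = P_f[k] * A[k + 1] / P_p[k + 1]
        m_s[k] = m_f[k] + G * (m_s[k + 1] - m_p[k + 1])
    return m_s


def gp_posterior_mean(times, ys, lam, var, noise):
    """Reference O(n^3) GP posterior mean K (K + noise I)^{-1} y at the training times."""
    n = len(times)
    K = [[var * math.exp(-lam * abs(a - b)) for b in times] for a in times]
    # Gaussian elimination on the augmented SPD system (no pivoting needed).
    M = [[K[i][j] + (noise if i == j else 0.0) for j in range(n)] + [ys[i]] for i in range(n)]
    for c in range(n):
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    alpha = [0.0] * n
    for i in range(n - 1, -1, -1):
        alpha[i] = (M[i][n] - sum(M[i][j] * alpha[j] for j in range(i + 1, n))) / M[i][i]
    return [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
```

Because the forward pass only touches the newest observation, new data can be absorbed online, and the backward pass recovers the same posterior mean as the batch GP.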
ICML
Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks

Shibo Li*, Michael Penwarden*, Yiming Xu, Conor Tillinghast, Akil Narayan, Mike Kirby, and Shandian Zhe

In Proceedings of the 40th International Conference on Machine Learning, 2023

Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, apply different PINNs to solve the problem in each subdomain, and stitch the subdomains at the interface. Thereby, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of multi-domain PINNs is sensitive to the choice of the interface conditions. While quite a few conditions have been proposed, there is no guidance on how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-armed bandit (MAB) models. The first one applies to the entire training course and updates online a Gaussian process (GP) reward model that, given the PDE parameters and interface conditions, predicts the performance. We prove a sub-linear regret bound for both UCB and Thompson sampling, which in theory guarantees the effectiveness of our MAB. The second one partitions the training into two stages, a stochastic phase and a deterministic phase; we update a GP reward for each phase to enable different condition selections at the two stages to further bolster the flexibility and performance. We show the advantage of METALIC on four benchmark PDE families.
@inproceedings{pmlr-v202-li23w, title = {Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks}, author = {Li, Shibo and Penwarden, Michael and Xu, Yiming and Tillinghast, Conor and Narayan, Akil and Kirby, Mike and Zhe, Shandian}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, year = {2023}, }
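One selection step of the first bandit (a GP reward model queried with UCB) can be sketched as follows. Everything specific here is an assumption for illustration: the product kernel (an RBF over the PDE parameters times a fixed cross-arm correlation `rho`), the noise level, the fixed `beta`, and the arm names; METALIC's actual kernel, exploration schedule and two-stage variant are not reproduced.

```python
import math


def kernel(ctx1, arm1, ctx2, arm2, length=1.0, rho=0.3):
    """Hypothetical product kernel: RBF over PDE parameters times arm similarity."""
    sq = sum((a - b) ** 2 for a, b in zip(ctx1, ctx2))
    return math.exp(-sq / (2 * length ** 2)) * (1.0 if arm1 == arm2 else rho)


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def gp_ucb_choice(history, context, arms, beta=2.0, noise=1e-2):
    """Return the arm maximizing mu + sqrt(beta * var) under a GP reward model.

    history: list of (context, arm, reward) triples observed so far.
    """
    K = [[kernel(c1, a1, c2, a2) + (noise if i == j else 0.0)
          for j, (c2, a2, _) in enumerate(history)]
         for i, (c1, a1, _) in enumerate(history)]
    alpha = solve(K, [r for _, _, r in history])
    best, best_score = None, -math.inf
    for arm in arms:
        k = [kernel(context, arm, c, a) for c, a, _ in history]
        mu = sum(ki * ai for ki, ai in zip(k, alpha))  # posterior mean reward
        v = solve(K, k)
        var = max(kernel(context, arm, context, arm) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
        score = mu + math.sqrt(beta * var)  # upper confidence bound
        if score > best_score:
            best, best_score = arm, score
    return best
```

With `beta = 0` the rule is purely greedy on the predicted reward; a larger `beta` favors interface conditions whose reward is still uncertain for the given PDE parameters.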
SynS & ML Workshop
Infinite-Fidelity Surrogate Learning via High-order Gaussian Processes

Shibo Li, Li Shi, and Shandian Zhe

In The 1st Synergy of Scientific and Machine Learning Modelling Workshop @ ICML, 2023

Multi-fidelity learning is popular in computational physics. While the fidelity is often determined by the choice of mesh spacing and hence is continuous in nature, most methods only model finite, discrete fidelities. The recent work (Li et al., 2022) proposes the first continuous-fidelity surrogate model, named infinite-fidelity coregionalization (IFC), which uses a neural Ordinary Differential Equation (ODE) to capture the rich information within the infinite, continuous fidelity space. While showing state-of-the-art predictive performance, IFC is computationally expensive to train and makes uncertainty quantification difficult. To overcome these limitations, we propose Infinite-Fidelity High-Order Gaussian Process (IF-HOGP), based on the recent GP high-dimensional output regression model HOGP. By tensorizing the output and using a product kernel at each mode, HOGP can efficiently estimate the mapping from the PDE parameters to the high-dimensional solution output, without the need for any low-rank approximation. We make a simple extension by injecting the continuous fidelity variable into the input and applying a neural network transformation before feeding the input into the kernel. On three benchmark PDEs, IF-HOGP achieves prediction accuracy better than or close to IFC, yet gains a 380x speed-up and an 87.5% memory reduction. Meanwhile, uncertainty calibration for IF-HOGP is straightforward.
@inproceedings{li2023infinitefidelity, title = {Infinite-Fidelity Surrogate Learning via High-order Gaussian Processes}, author = {Li, Shibo and Shi, Li and Zhe, Shandian}, booktitle = {The 1st Synergy of Scientific and Machine Learning Modelling Workshop @ ICML}, year = {2023}, }
AISTATS
Meta-Learning with Adjoint Methods

Shibo Li, Zheng Wang, Akil Narayan, Robert Kirby, and Shandian Zhe

In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, 2023

Model Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks. Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t. the initialization of a long training trajectory for the sampled tasks, because the computation graph can rapidly explode and the computational cost becomes very high. To address this problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the inner optimization as the evolution of an Ordinary Differential Equation (ODE). To efficiently compute the gradient of the validation loss w.r.t. the initialization, we use the adjoint method to construct a companion, backward ODE. To obtain the gradient w.r.t. the initialization, we only need to run the standard ODE solver twice: once forward in time, evolving a long trajectory of gradient flow for the sampled task, and once backward, solving the adjoint ODE. We need not create or expand any intermediate computational graphs, adopt aggressive approximations, or impose proximal regularizers in the training loss. Our approach is cheap, accurate, and adaptable to different trajectory lengths. We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.
@inproceedings{pmlr-v206-li23c, title = {Meta-Learning with Adjoint Methods}, author = {Li, Shibo and Wang, Zheng and Narayan, Akil and Kirby, Robert and Zhe, Shandian}, booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics}, year = {2023}, }
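The forward/backward structure described above can be seen in a scalar toy problem. This minimal sketch assumes a hypothetical log-cosh training loss (so the inner gradient flow is d theta/dt = -tanh(theta - C_TRAIN)), a quadratic validation loss and an RK4 integrator; it is not the paper's meta-learning setup. The adjoint lambda(t) = dL_val/d theta(t) obeys d lambda/dt = -lambda * df/d theta and is integrated backward together with theta, so no computation graph of the trajectory is stored. The result is checked against a finite difference.

```python
import math

C_TRAIN = 1.0  # hypothetical minimizer of the training loss
D_VAL = -0.5   # hypothetical target of the validation loss


def f(theta):
    """Inner gradient flow of L_train(theta) = log cosh(theta - C_TRAIN)."""
    return -math.tanh(theta - C_TRAIN)


def df(theta):
    """Derivative of f with respect to theta."""
    return -1.0 / math.cosh(theta - C_TRAIN) ** 2


def rk4(deriv, state, h, steps):
    """Classic fourth-order Runge-Kutta; a negative h integrates backward in time."""
    for _ in range(steps):
        k1 = deriv(state)
        k2 = deriv([s + h / 2 * k for s, k in zip(state, k1)])
        k3 = deriv([s + h / 2 * k for s, k in zip(state, k2)])
        k4 = deriv([s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


def forward(theta0, T=2.0, steps=200):
    """Solve the inner gradient flow from theta(0) = theta0 up to time T."""
    return rk4(lambda s: [f(s[0])], [theta0], T / steps, steps)[0]


def val_loss(theta):
    return 0.5 * (theta - D_VAL) ** 2


def adjoint_grad(theta0, T=2.0, steps=200):
    """d val_loss(theta(T)) / d theta0 from one forward and one backward ODE solve."""
    theta_T = forward(theta0, T, steps)
    lam_T = theta_T - D_VAL  # d val_loss / d theta at time T
    # Backward pass on [theta, lambda]: d theta/dt = f, d lambda/dt = -lambda * df(theta)
    back = rk4(lambda s: [f(s[0]), -s[1] * df(s[0])], [theta_T, lam_T], -T / steps, steps)
    return back[1]  # lambda(0)
```

Memory stays constant in the trajectory length because theta is re-integrated backward instead of being stored at every step.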

2022

NeurIPS
Infinite-Fidelity Coregionalization for Physical Simulation

Shibo Li, Zheng Wang, Robert Kirby, and Shandian Zhe

In The Thirty-sixth Annual Conference on Neural Information Processing Systems, 2022

Multi-fidelity modeling and learning is important in physical-simulation-related applications. It can leverage both low-fidelity and high-fidelity examples for training so as to reduce the cost of data generation while still achieving good performance. While existing approaches only model finite, discrete fidelities, in practice, the feasible fidelity choice is often infinite, which can correspond to a continuous mesh spacing or finite element length. In this paper, we propose Infinite Fidelity Coregionalization (IFC). Given the data, our method can extract and exploit rich information within infinite, continuous fidelities to bolster the prediction accuracy. Our model can interpolate and/or extrapolate the predictions to novel fidelities that are not covered by the training data. Specifically, we introduce a low-dimensional latent output as a continuous function of the fidelity and input, and multiply it by a basis matrix to predict high-dimensional solution outputs. We model the latent output as a neural Ordinary Differential Equation (ODE) to capture the complex relationships within and integrate information throughout the continuous fidelities. We then use Gaussian processes or another ODE to estimate the fidelity-varying bases. For efficient inference, we reorganize the bases as a tensor, and use a tensor-Gaussian variational posterior approximation to develop a scalable inference algorithm for massive outputs. We show the advantage of our method in several benchmark tasks in computational physics.
@inproceedings{li2022infinitefidelity, title = {Infinite-Fidelity Coregionalization for Physical Simulation}, author = {Li, Shibo and Wang, Zheng and Kirby, Robert and Zhe, Shandian}, booktitle = {The Thirty-sixth Annual Conference on Neural Information Processing Systems}, year = {2022}, }
NeurIPS
Batch Multi-Fidelity Active Learning with Budget Constraints

Shibo Li*, Jeff Phillips*, Xin Yu, Robert Kirby, and Shandian Zhe

In The Thirty-sixth Annual Conference on Neural Information Processing Systems, 2022

Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e.g., by running numerical solvers. The recent work (Li et al., 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at different fidelities to reduce the cost while improving the learning performance. However, this method only queries one pair of fidelity and input at a time, and hence risks bringing in strongly correlated examples that reduce the learning efficiency. In this paper, we propose Batch Multi-Fidelity Active Learning with Budget Constraints (BMFAL-BC), which can promote the diversity of training examples to improve the benefit-cost ratio, while respecting a given budget constraint for batch queries. Hence, our method can be more practically useful. Specifically, we propose a novel batch acquisition function that measures the mutual information between a batch of multi-fidelity queries and the target function, so as to penalize highly correlated queries and encourage diversity. The optimization of the batch acquisition function is challenging in that it involves a combinatorial search over many fidelities while subject to the budget constraint. To address this challenge, we develop a weighted greedy algorithm that can sequentially identify each (fidelity, input) pair, while achieving a near -approximation of the optimum. We show the advantage of our method in several computational physics and engineering applications.
@inproceedings{li2022batch, title = {Batch Multi-Fidelity Active Learning with Budget Constraints}, author = {Li, Shibo and Phillips, Jeff and Yu, Xin and Kirby, Robert and Zhe, Shandian}, booktitle = {The Thirty-sixth Annual Conference on Neural Information Processing Systems}, year = {2022}, }
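The shape of a cost-weighted greedy loop under a budget can be sketched as below: at each step, pick the affordable (fidelity, input) pair with the largest marginal gain per unit cost. The utility here is a hypothetical set-coverage stand-in, not the paper's mutual-information acquisition, and this bare loop is not claimed to carry the paper's approximation guarantee.

```python
def weighted_greedy(candidates, cost, gain, budget):
    """Select (fidelity, input) pairs one at a time by marginal gain per unit cost.

    candidates: hashable (fidelity, input) pairs; cost(c): query cost;
    gain(selected, c): marginal utility of adding c to the current selection.
    Stops when no affordable candidate adds positive utility.
    """
    selected, spent = [], 0.0
    remaining = list(candidates)
    while True:
        affordable = [c for c in remaining if spent + cost(c) <= budget]
        scored = [(gain(selected, c) / cost(c), c) for c in affordable]
        scored = [(s, c) for s, c in scored if s > 0]
        if not scored:
            return selected, spent
        _, best = max(scored, key=lambda sc: sc[0])
        selected.append(best)
        spent += cost(best)
        remaining.remove(best)


# Hypothetical toy utility: each query "covers" some regions; a query's marginal
# gain is the number of regions not yet covered by the current batch.
coverage = {("low", 0): {0}, ("low", 1): {0, 1}, ("high", 0): {0, 1, 2}, ("high", 2): {2, 3, 4}}
costs = {"low": 1.0, "high": 3.0}


def coverage_gain(selected, c):
    covered = set().union(*(coverage[s] for s in selected))
    return len(coverage[c] - covered)


chosen, spent = weighted_greedy(list(coverage), lambda c: costs[c[0]], coverage_gain, budget=5.0)
```

Because the gain is computed relative to what is already selected, a query that overlaps heavily with the current batch scores low, which is the diversity effect the batch acquisition is designed to produce.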
ICML
Decomposing Temporal High-Order Interactions via Latent ODEs

Shibo Li, Robert Kirby, and Shandian Zhe

In Proceedings of the 39th International Conference on Machine Learning, 2022

High-order interactions between multiple objects are common in real-world applications. Although tensor decomposition is a popular framework for high-order interaction analysis and prediction, most methods cannot fully exploit the valuable timestamp information in data. Existing methods either discard the timestamps, convert them into discrete steps, or use over-simplistic decomposition models. As a result, these methods may not be capable of capturing complex, fine-grained temporal dynamics or making accurate predictions for long-term interaction results. To overcome these limitations, we propose a novel Temporal High-order Interaction decompoSition model based on Ordinary Differential Equations (THIS-ODE). We model the time-varying interaction result with a latent ODE. To capture the complex temporal dynamics, we use a neural network (NN) to learn the time derivative of the ODE state. We use the representation of the interaction objects to model the initial value of the ODE and to constitute a part of the NN input to compute the state. In this way, the temporal relationships of the participant objects can be estimated and encoded into their representations. For tractable and scalable inference, we use forward sensitivity analysis to efficiently compute the gradient of ODE state, based on which we use integral transform to develop a stochastic mini-batch learning algorithm. We demonstrate the advantage of our approach in simulation and four real-world applications.
@inproceedings{pmlr-v162-li22i, title = {Decomposing Temporal High-Order Interactions via Latent {ODE}s}, author = {Li, Shibo and Kirby, Robert and Zhe, Shandian}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, year = {2022}, }
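Forward sensitivity analysis, which THIS-ODE uses to compute gradients of the ODE state, can be shown in miniature. The sketch assumes a hypothetical one-parameter linear ODE dh/dt = -p h rather than THIS-ODE's neural dynamics: the sensitivity s = dh/dp satisfies its own ODE, ds/dt = (df/dh) s + df/dp = -p s - h with s(0) = 0, which is integrated alongside the state and can be checked against the closed form dh/dp = -t h0 exp(-p t).

```python
import math


def forward_sensitivity(h0, p, T, steps=1000):
    """Integrate dh/dt = -p h jointly with its sensitivity s = dh/dp using RK4.

    Sensitivity equation: ds/dt = (df/dh) s + df/dp = -p s - h, with s(0) = 0.
    """
    dt = T / steps

    def deriv(h, s):
        return -p * h, -p * s - h

    h, s = h0, 0.0
    for _ in range(steps):
        k1 = deriv(h, s)
        k2 = deriv(h + dt / 2 * k1[0], s + dt / 2 * k1[1])
        k3 = deriv(h + dt / 2 * k2[0], s + dt / 2 * k2[1])
        k4 = deriv(h + dt * k3[0], s + dt * k3[1])
        h += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        s += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return h, s
```

Unlike the adjoint method, the forward approach needs one extra sensitivity state per parameter, which is affordable when the parameters feeding each ODE are few.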
ICML
Nonparametric Embeddings of Sparse High-Order Interaction Events

Zheng Wang, Yiming Xu, Conor Tillinghast, Shibo Li, Akil Narayan, and Shandian Zhe

In Proceedings of the 39th International Conference on Machine Learning, 2022

High-order interaction events are common in real-world applications. Learning embeddings that encode the complex relationships of the participants from these events is of great importance in knowledge mining and predictive tasks. Despite the success of existing approaches, e.g., Poisson tensor factorization, they ignore the sparse structure underlying the data, namely that the interactions that actually occur are far fewer than the possible interactions among all the participants. In this paper, we propose Nonparametric Embeddings of Sparse High-order interaction events (NESH). We hybridize a sparse hypergraph (tensor) process and a matrix Gaussian process to capture both the asymptotic structural sparsity within the interactions and nonlinear temporal relationships between the participants. We prove strong asymptotic bounds (including both a lower and an upper bound) of the sparse ratio, which reveal the asymptotic properties of the sampled structure. We use batch-normalization, stick-breaking construction and sparse variational GP approximations to develop an efficient, scalable model inference algorithm. We demonstrate the advantage of our approach in several real-world applications.
@inproceedings{pmlr-v162-wang22ah, title = {Nonparametric Embeddings of Sparse High-Order Interaction Events}, author = {Wang, Zheng and Xu, Yiming and Tillinghast, Conor and Li, Shibo and Narayan, Akil and Zhe, Shandian}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, year = {2022}, }
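The stick-breaking construction mentioned in the abstract turns a sequence of break fractions v_k in (0, 1) into weights w_k = v_k * prod_{j<k} (1 - v_j). The generic construction is sketched below with Beta(1, alpha) breaks as an assumption; how NESH couples it with the hypergraph process and the sparse GP is not reproduced.

```python
import random


def stick_breaking(fractions):
    """Map break fractions v_k in (0, 1) to weights w_k = v_k * prod_{j<k} (1 - v_j)."""
    weights, remaining = [], 1.0
    for v in fractions:
        weights.append(v * remaining)  # break off a fraction v of what is left
        remaining *= 1.0 - v
    return weights, remaining  # `remaining` is the mass left on the unbroken stick


def sample_gem(alpha, k, rng=random):
    """Truncated draw of k stick-breaking weights with v_k ~ Beta(1, alpha)."""
    return stick_breaking([rng.betavariate(1.0, alpha) for _ in range(k)])


weights, leftover = stick_breaking([0.5, 0.5, 0.5])  # weights == [0.5, 0.25, 0.125], leftover == 0.125
```

The weights plus the leftover mass always sum to one, so truncating after k breaks gives a valid (sub-)probability vector whose tail mass is explicit.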
AISTATS
Deep Multi-Fidelity Active Learning of High-Dimensional Outputs

Shibo Li, Zheng Wang, Robert Kirby, and Shandian Zhe

In The 25th International Conference on Artificial Intelligence and Statistics, 2022

Many applications, such as physical simulation and engineering design, require estimating functions with high-dimensional outputs. To reduce the expensive cost of generating training examples, we usually choose several fidelities to enable a cost/quality trade-off. In this paper, we consider the active learning task to automatically identify the fidelities and training inputs to query new examples so as to achieve the best learning benefit-cost ratio. To this end, we propose DMFAL, a Deep Multi-Fidelity Active Learning approach. We first develop a deep neural network-based multi-fidelity model for high-dimensional outputs, which can flexibly capture strong, complex correlations across the outputs and fidelities to enhance the learning of the target function. We then propose a mutual information based acquisition function that extends the predictive entropy principle. To overcome the computational challenges caused by large output dimensions, we use the multi-variate delta method and moment-matching to estimate the output posterior, and the Weinstein-Aronszajn identity to calculate and optimize the acquisition function. We show the advantage of our method in several applications of computational physics and engineering design.
@inproceedings{pmlr-v151-li22b,
  title = {Deep Multi-Fidelity Active Learning of High-Dimensional Outputs},
  author = {Li, Shibo and Wang, Zheng and Kirby, Robert and Zhe, Shandian},
  booktitle = {The 25th International Conference on Artificial Intelligence and Statistics},
  year = {2022},
}
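The Weinstein-Aronszajn identity cited in the DMFAL abstract states that det(I_m + AB) = det(I_n + BA) for A of size m×n and B of size n×m, so a determinant over a large output dimension m can be obtained from a small n×n matrix. The pure-Python check below is only a numerical illustration under assumed sizes (m = 50, n = 3) and random entries; it is not the paper's acquisition-function code, and the helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add_identity(a):
    """Return I + a for a square matrix a."""
    return [[a[i][j] + (1.0 if i == j else 0.0) for j in range(len(a))] for i in range(len(a))]


def det(a):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign of the determinant
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


rng = random.Random(0)
m, n = 50, 3  # large "output" dimension m, small inner dimension n (illustrative)
A = [[rng.gauss(0.0, 0.3) for _ in range(n)] for _ in range(m)]
B = [[rng.gauss(0.0, 0.3) for _ in range(m)] for _ in range(n)]

big = det(add_identity(matmul(A, B)))    # 50 x 50 determinant
small = det(add_identity(matmul(B, A)))  # 3 x 3 determinant, same value
print(big, small)
```

The left-hand side costs O(m³) to factorize, while the right-hand side costs O(n³) after forming BA, which is the saving that matters when m is a high output dimension.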

2021

NeurIPS
Batch Multi-Fidelity Bayesian Optimization with Deep Auto-Regressive Networks

Shibo Li, Robert Kirby, and Shandian Zhe

In Thirty-fifth Annual Conference on Neural Information Processing Systems, 2021

Bayesian optimization (BO) is a powerful approach for optimizing black-box, expensive-to-evaluate functions. To enable a flexible trade-off between cost and accuracy, many applications allow the function to be evaluated at different fidelities. To reduce the optimization cost while maximizing the benefit-cost ratio, in this paper we propose Batch Multi-fidelity Bayesian Optimization with Deep Auto-Regressive Networks (BMBO-DARN). We use a set of Bayesian neural networks to construct a fully auto-regressive model that is expressive enough to capture strong yet complex relationships across all the fidelities, so as to improve surrogate learning and optimization performance. Furthermore, to enhance the quality and diversity of queries, we develop a simple yet efficient batch querying method that requires no combinatorial search over the fidelities. We propose a batch acquisition function based on the Max-value Entropy Search (MES) principle, which penalizes highly correlated queries and encourages diversity. We use posterior samples and moment matching to compute the acquisition function efficiently, and conduct alternating optimization over every fidelity-input pair, which guarantees an improvement at each step. We demonstrate the advantage of our approach on four real-world hyperparameter optimization applications.
@inproceedings{li2021batch,
  title = {Batch Multi-Fidelity Bayesian Optimization with Deep Auto-Regressive Networks},
  author = {Li, Shibo and Kirby, Robert and Zhe, Shandian},
  booktitle = {Thirty-fifth Annual Conference on Neural Information Processing Systems},
  year = {2021},
}
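For reference, the single-point MES acquisition that the BMBO-DARN batch acquisition builds on has a closed form when the predictive distribution at an input is Gaussian with mean μ and standard deviation σ: with γ = (y* − μ)/σ for each sampled maximum y*, α = (1/K) Σ [γ ψ(γ) / (2 Ψ(γ)) − log Ψ(γ)], where ψ and Ψ are the standard normal pdf and cdf. The sketch implements only this single-fidelity, single-point form for maximization; the batch, multi-fidelity version with its correlation penalty is not reproduced, and the sampled maxima `y_star` are hypothetical inputs.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mes_acquisition(y_star_samples: list[float], mean: float, std: float) -> float:
    """Single-point MES for a Gaussian predictive N(mean, std^2) (maximization).

    Averages, over sampled maxima y*, the entropy reduction obtained by
    truncating the predictive distribution from above at y*.
    """
    total = 0.0
    for y_star in y_star_samples:
        gamma = (y_star - mean) / std
        cdf = max(normal_cdf(gamma), 1e-300)  # guard against log(0) for very negative gamma
        total += gamma * normal_pdf(gamma) / (2.0 * cdf) - math.log(cdf)
    return total / len(y_star_samples)


y_star = [2.1, 2.4, 2.9]  # hypothetical samples of the maximum value
print(mes_acquisition(y_star, mean=2.0, std=0.5))  # mean close to the sampled maxima
print(mes_acquisition(y_star, mean=0.0, std=0.5))  # mean far below them: near zero
```

Each term is the entropy reduction of a truncated Gaussian, so it is non-negative and shrinks as the predictive mean falls further below the sampled maxima.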

2020

NeurIPS
Multi-fidelity Bayesian optimization via deep neural networks

Shibo Li, Wei Xing, Robert Kirby, and Shandian Zhe

In Thirty-fourth Annual Conference on Neural Information Processing Systems, 2020

Bayesian optimization (BO) is a popular framework for optimizing black-box functions. In many applications, the objective function can be evaluated at multiple fidelities to enable a trade-off between cost and accuracy. To reduce the optimization cost, many multi-fidelity BO methods have been proposed. Despite their success, these methods either ignore or oversimplify the strong, complex correlations across the fidelities, and hence can be inefficient in estimating the objective function. To address this issue, we propose Deep Neural Network Multi-Fidelity Bayesian Optimization (DNN-MFBO), which can flexibly capture all kinds of complicated relationships between the fidelities to improve the objective function estimation and hence the optimization performance. We use sequential, fidelity-wise Gauss-Hermite quadrature and moment matching to compute a mutual-information-based acquisition function, which is computationally tractable and efficient. We show the advantages of our method on both synthetic benchmark datasets and real-world applications in engineering design.
@inproceedings{li2020multi,
  title = {Multi-fidelity Bayesian optimization via deep neural networks},
  author = {Li, Shibo and Xing, Wei and Kirby, Robert and Zhe, Shandian},
  booktitle = {Thirty-fourth Annual Conference on Neural Information Processing Systems},
  year = {2020},
}
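The pairing of Gauss-Hermite quadrature with moment matching named in the DNN-MFBO abstract can be illustrated on a single step: if the output at one fidelity is approximated as Gaussian, N(μ, σ²), and the next fidelity is a function g of it, quadrature gives the mean and variance of g(f), which then define the Gaussian handed on to the next fidelity. This is a minimal sketch that assumes a deterministic, hypothetical `g` in place of the paper's neural network, and uses the three-point rule (nodes 0 and ±√(3/2), weights 2√π/3 and √π/6), which is exact for polynomials up to degree five.

```python
import math

# Three-point Gauss-Hermite rule for integrals against exp(-x^2).
GH_NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
GH_WEIGHTS = [math.sqrt(math.pi) / 6.0, 2.0 * math.sqrt(math.pi) / 3.0, math.sqrt(math.pi) / 6.0]


def gaussian_expectation(func, mean, std):
    """Approximate E[func(f)] for f ~ N(mean, std^2) via f = mean + sqrt(2) * std * x."""
    total = sum(w * func(mean + math.sqrt(2.0) * std * x) for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


def moment_match(g, mean, std):
    """Mean and std of the Gaussian matching the first two moments of g(f)."""
    m1 = gaussian_expectation(g, mean, std)
    m2 = gaussian_expectation(lambda f: g(f) ** 2, mean, std)
    return m1, math.sqrt(max(m2 - m1 * m1, 0.0))


# Hypothetical link between fidelities: g(f) = f^2, with f ~ N(1, 2^2).
next_mean, next_std = moment_match(lambda f: f * f, mean=1.0, std=2.0)
print(next_mean, next_std ** 2)  # exact moments for this g: mean 5, variance 48
```

Applying `moment_match` fidelity by fidelity keeps every intermediate distribution Gaussian, which is what makes a sequential computation tractable; a non-polynomial `g` would call for more quadrature nodes.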
IJCAI
Scalable Gaussian Process Regression Networks

Shibo Li, Wei Xing, Robert M. Kirby, and Shandian Zhe

In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Gaussian process regression networks (GPRN) are powerful Bayesian models for multi-output regression, but their inference is intractable. To address this issue, existing methods use a fully factorized structure (or a mixture of such structures) over all the outputs and latent functions for posterior approximation, which, however, can miss the strong posterior dependencies among the latent variables and hurt the inference quality. In addition, the updates of the variational parameters are inefficient and can be prohibitively expensive for a large number of outputs. To overcome these limitations, we propose a scalable variational inference algorithm for GPRN, which not only captures the abundant posterior dependencies but is also much more efficient for massive outputs. We tensorize the output space and introduce tensor/matrix-normal variational posteriors to capture the posterior correlations and to reduce the number of parameters. We jointly optimize all the parameters and exploit the inherent Kronecker product structure in the variational model evidence lower bound to accelerate the computation. We demonstrate the advantages of our method in several real-world applications.
@inproceedings{ijcai2020-340,
  title = {Scalable Gaussian Process Regression Networks},
  author = {Li, Shibo and Xing, Wei and Kirby, Robert M. and Zhe, Shandian},
  booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence},
  year = {2020},
}
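The speed-up from Kronecker product structure mentioned in the GPRN abstract rests on a standard identity: for A of size p×p and B of size q×q, (A ⊗ B) vec(X) = vec(B X Aᵀ), where vec stacks the columns of the q×p matrix X. Forming A ⊗ B explicitly needs O(p²q²) memory, whereas the right-hand side touches only the two small factors. The pure-Python check below uses arbitrary illustrative sizes and entries; it demonstrates the identity only and is not the paper's evidence-lower-bound computation.

```python
import random


def kron(a, b):
    """Explicit Kronecker product of two matrices stored as lists of rows."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def vec(x):
    """Stack the columns of x into a single list (column-major order)."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]


def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


rng = random.Random(0)
p, q = 4, 3  # illustrative sizes
A = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(p)]
B = [[rng.uniform(-1, 1) for _ in range(q)] for _ in range(q)]
X = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(q)]  # q x p

naive = matvec(kron(A, B), vec(X))              # builds a (pq) x (pq) matrix
fast = vec(matmul(matmul(B, X), transpose(A)))  # uses only the p x p and q x q factors
print(max(abs(u - v) for u, v in zip(naive, fast)))
```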

CIKM

Analysis of multivariate scoring functions for automatic unbiased learning to rank

Tao Yang, Shikai Fang, Shibo Li, Yulan Wang, and Qingyao Ai

In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020

@inproceedings{yang2020analysis,
  title = {Analysis of multivariate scoring functions for automatic unbiased learning to rank},
  author = {Yang, Tao and Fang, Shikai and Li, Shibo and Wang, Yulan and Ai, Qingyao},
  booktitle = {Proceedings of the 29th ACM International Conference on Information \& Knowledge Management},
  year = {2020},
}