SAIBench: Benchmarking AI for Science


Motivation and Goal

The “AI wave” is propagating into scientific research communities, as researchers gain interest in leveraging state-of-the-art AI methods to tackle difficult tasks.

Benchmarking AI-for-Science methods concerns how to evaluate an AI method applied to a scientific task against well-defined metrics.

The main goal of SAIBench is to build an inclusive, interconnecting environment for all relevant research efforts, including problem definitions, AI methods, training algorithms, software and hardware environments, metric definitions, and ranking definitions, and to deliver benchmarking results efficiently with the given computational resources.


Training and testing AI models is tied to datasets, yet the definition of a scientific task may not directly expose one. SAIBench is therefore designed to support data extraction from a wide range of scientific tasks.

Also related to data is how researchers from different communities see “performance” differently. Fields that generate zettabytes of data (high energy physics, climate research, …) prefer throughput; fields that have already developed very precise (and expensive) non-AI methods (molecular dynamics, …) prefer cost/error ratio; fields where data is scarce (medical imaging, genomics, astronomy, …) prefer sample efficiency.
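To make the contrast between these community-preferred metrics concrete, here is a minimal Python sketch. The function names and formulas are illustrative assumptions for exposition, not SAIBench APIs:

```python
# Illustrative definitions of the three community-preferred metrics.
# All names and formulas here are assumptions, not SAIBench's actual schema.

def throughput(samples_processed: int, wall_time_s: float) -> float:
    """Samples per second: favored by data-rich fields (HEP, climate)."""
    return samples_processed / wall_time_s

def cost_error_ratio(compute_cost: float, error: float) -> float:
    """Hypothetical cost paid per unit of residual error, relative to an
    exact but expensive non-AI baseline (e.g., molecular dynamics)."""
    return compute_cost / error

def sample_efficiency(achieved_quality: float, n_training_samples: int) -> float:
    """Achieved quality per training sample: favored where data is scarce."""
    return achieved_quality / n_training_samples

# The same model run looks very different under each lens.
print(throughput(1_000_000, 250.0))    # 4000.0 samples/s
print(cost_error_ratio(120.0, 0.05))   # 2400.0 cost units per error unit
print(sample_efficiency(0.92, 500))    # 0.00184 quality per sample
```

A single benchmark report can compute all three, letting each community rank results by the metric it cares about.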


We aim to unify these efforts and enable low-friction onboarding of new disciplines by following the FAIR principles: Findability, Accessibility, Interoperability, and Reusability. Unlike the traditional view of AI4Science as individual, vertical scientific fields, we provide reusable building blocks to describe scientific research problems, define scientific tasks, and construct generic AI components.

Each module in the system is self-descriptive, so that the benchmarking planner can perform inner-join-like operations to mix and match modules and automatically construct benchmarking scenarios.
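One way such an inner-join-like matching step could work is sketched below, assuming each module self-describes with `provides`/`requires` tags. The field names, module names, and matching rule are hypothetical assumptions, not SAIBench's actual schema:

```python
# Hypothetical self-descriptive modules: each declares the data it provides
# and the data it requires. All names here are illustrative assumptions.
modules = [
    {"name": "md-task",    "kind": "task",   "provides": {"coords"}, "requires": set()},
    {"name": "gnn-model",  "kind": "model",  "provides": {"energy"}, "requires": {"coords"}},
    {"name": "cnn-model",  "kind": "model",  "provides": {"labels"}, "requires": {"images"}},
    {"name": "mae-metric", "kind": "metric", "provides": set(),      "requires": {"energy"}},
]

def plan(tasks, models, metrics):
    """Inner-join-like matching: keep only (task, model, metric) triples
    whose data requirements are satisfied along the chain."""
    scenarios = []
    for t in tasks:
        for mo in models:
            if not mo["requires"] <= t["provides"]:
                continue  # the model's inputs are not produced by this task
            for me in metrics:
                if me["requires"] <= (t["provides"] | mo["provides"]):
                    scenarios.append((t["name"], mo["name"], me["name"]))
    return scenarios

tasks   = [m for m in modules if m["kind"] == "task"]
models  = [m for m in modules if m["kind"] == "model"]
metrics = [m for m in modules if m["kind"] == "metric"]
print(plan(tasks, models, metrics))  # [('md-task', 'gnn-model', 'mae-metric')]
```

Note how `cnn-model` is pruned automatically: its `images` requirement is not provided by the molecular dynamics task, so no scenario containing it is generated.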


Yatao Li, Jianfeng Zhan
SAIBench: Benchmarking AI for Science
BenchCouncil Transactions on Benchmarks, Standards and Evaluations
Volume 2, Issue 2
ISSN 2772-4859