ALCF Launches First Large-Scale AI Inference Service for Open Science

Secure, scalable inference on ALCF systems gives researchers direct access to large language models and foundation models for reproducible, high-throughput scientific workflows.

The inference service is powered by ALCF systems including Sophia (left) and Metis (right).

In 2025, the ALCF deployed a first-of-its-kind service to provide secure, scalable AI inference capabilities for open scientific research. The ALCF Inference Service delivers cloud-like access to large language models (LLMs), foundation models, and other AI-driven inference workloads directly on its high-performance computing (HPC) and AI systems.

Inference is the process of using trained AI models to analyze data, identify patterns, and make predictions. Chatbots like ChatGPT use inference to answer questions and generate responses in real time. In research, the same capability can help scientists guide experiments, make sense of complex data, and perform other analytical tasks more efficiently.

The ALCF Inference Service was born out of a 2025 paper that presented a framework for secure, distributed AI inference across HPC systems, giving researchers the ability to run parallel inference workloads on diverse models without relying on commercial cloud infrastructure. By hosting AI models directly on ALCF systems, it provides researchers with greater control over how models are used and how results are produced. Unlike vendor-managed frontier AI models, which can change without notice, the ALCF service offers a more stable and transparent environment for scientific research.

Designed specifically for scientific applications running on HPC systems, and agnostic to the underlying clusters, the ALCF Inference Service bridges traditional HPC environments with the growing demand for production-scale inference. The platform supports both interactive, low-latency model serving and high-throughput batch processing to meet diverse research requirements. The service is OpenAI API-compatible, enabling straightforward integration with existing scientific software and AI-enabled applications.
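In practice, that compatibility means a standard OpenAI client can point at the service instead of a commercial endpoint. The sketch below illustrates the pattern; the gateway URL and model identifier are placeholders assumed for illustration, and authentication uses a token obtained through the facility's identity system rather than an OpenAI account key.

```python
from openai import OpenAI

# Placeholders throughout: the base URL and model name are illustrative
# assumptions, not the service's actual values, and the token would be
# issued via Globus Auth rather than by OpenAI.
access_token = "GLOBUS_AUTH_TOKEN"

client = OpenAI(
    base_url="https://inference-api.alcf.example/v1",  # hypothetical gateway URL
    api_key=access_token,
)

response = client.chat.completions.create(
    model="example-llm",  # hypothetical hosted model identifier
    messages=[
        {"role": "user", "content": "Summarize the key findings in this abstract: ..."}
    ],
)
print(response.choices[0].message.content)
```

Because the interface matches the OpenAI API, existing tools and libraries built against that API can typically be redirected to the service by changing only the base URL and credentials.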

The service runs on dedicated ALCF systems, including Metis, a SambaNova platform designed for high-throughput, low-latency AI inference workloads. Image: Argonne National Laboratory

The service is being used by a growing and diverse set of researchers. In addition to a substantial base of Argonne and ALCF users, it is actively supporting users across the DOE national laboratory ecosystem, enabling seamless access for researchers from several labs using their home institution credentials. These include Brookhaven National Laboratory, Fermi National Accelerator Laboratory, Los Alamos National Laboratory, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Oak Ridge National Laboratory, Sandia National Laboratories, and Thomas Jefferson National Accelerator Facility. This expanding cross-lab adoption underscores the service’s role in enabling integrated, multi-institutional research workflows.

Scientific Impact

Among its users, the ALCF Inference Service is supporting teams working on DOE’s Genesis Mission, a national AI initiative to build the world’s most powerful scientific platform to accelerate discovery science, strengthen national security, and drive energy innovation. It will also be a key tool for the American Science Cloud (AmSC), the Genesis Mission’s integrated platform connecting DOE supercomputers, experimental facilities, and data resources.

Beyond the Genesis Mission, the ALCF's inference capabilities enable scientists to tackle complex challenges across many fields. In fusion energy research, for example, AI models can analyze streams of experimental data in real time and predict plasma disruptions before they occur. This capability enables safer and more efficient control of fusion reactions. In high energy physics and astronomy, inference helps scientists sift through massive volumes of collider and telescope data to identify rare events and new phenomena more quickly.

In chemistry and materials science, inference supports molecular design, automated simulation workflows, and rapid screening of candidate materials. For example, ChemGraph, an AI-driven framework for automating molecular simulation workflows, leverages the ALCF Inference Service for LLM-based reasoning and tool-calling tasks. This enables interactive, multi-step computational workflows that integrate simulation codes with AI models, allowing researchers to explore candidate molecules more rapidly and coordinate large-scale calculations as unified processes.
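To make the tool-calling idea concrete, the sketch below shows the general pattern of exposing a simulation code to a model as a structured tool through an OpenAI-compatible endpoint. It is a minimal illustration of the technique, not ChemGraph's actual interface; the endpoint, model, function name, and schema are all hypothetical.

```python
from openai import OpenAI

# Placeholders: the gateway URL, token, model, and tool schema are all
# illustrative; ChemGraph's real tool set and prompts differ.
client = OpenAI(
    base_url="https://inference-api.alcf.example/v1",
    api_key="GLOBUS_AUTH_TOKEN",
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_geometry_optimization",  # hypothetical simulation wrapper
        "description": "Optimize a molecular geometry with a quantum chemistry code.",
        "parameters": {
            "type": "object",
            "properties": {
                "smiles": {"type": "string", "description": "Molecule as a SMILES string"},
                "method": {"type": "string", "description": "Level of theory, e.g. B3LYP"},
            },
            "required": ["smiles"],
        },
    },
}]

response = client.chat.completions.create(
    model="example-llm",
    messages=[{"role": "user", "content": "Optimize the geometry of caffeine."}],
    tools=tools,
)

# If the model requested the tool, hand the structured arguments to the
# simulation backend; its result would be returned in a follow-up message.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

In this pattern, the model reasons about which calculation to run and emits a structured call, while the actual simulation executes in conventional scientific software, which is what lets multi-step workflows be coordinated as unified processes.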

Argonne's Thang Pham (front) and Murat Keçeli are part of the team behind the ChemGraph framework. ChemGraph was developed with support from the ALCF Inference Service, which provides the LLM inference capabilities it uses as part of its workflow. Image: Argonne National Laboratory

During its initial deployment, dozens of users processed millions of inference tasks, generating billions of tokens across diverse scientific applications. Use cases included development of an HPC support chatbot and large-scale benchmarking of domain-specific generative pre-trained transformer (GPT) models against general-purpose LLMs.

In one evaluation spanning more than 50,000 inference requests across 15 models, researchers reduced evaluation time by approximately 40 percent compared to manual deployment approaches while maintaining consistent performance, demonstrating the service’s ability to scale effectively across architectures and workloads.

Architecture for Secure, Scalable Inference

The service runs on dedicated ALCF systems, including Sophia, an NVIDIA DGX A100 cluster, and Metis, a SambaNova Systems platform designed for high-throughput, low-latency AI inference workloads. In the near future, the service will also run on the facility’s new NVIDIA-based systems, Tara and Minerva. These machines work alongside the facility’s other powerful computing resources, such as the Aurora exascale supercomputer and the ALCF AI Testbed, giving researchers a broad set of capabilities for simulation, data-intensive, and AI-driven science.

The ALCF Inference Service employs a layered architecture that enables dynamic scaling and secure access to compute resources. An Inference Gateway API manages and routes user requests, preventing backend overload while supporting high volumes of concurrent inference workloads.

As part of the ALCF Developer Sessions webinar series, ALCF's Benoit Côté demonstrates how to integrate the Inference Service within scientific applications.

The system leverages Globus Compute as an intermediate communication layer between the Gateway API and HPC resources, enabling distributed inference tasks across clusters. Integration with Globus Auth provides secure, federated identity management aligned with institutional policies, while seamless interfacing with job schedulers allows efficient execution on compute nodes.
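A minimal sketch of that relay pattern, using the Globus Compute SDK, might look like the following. The endpoint UUID and backend module are placeholders, not the service's internal code.

```python
from globus_compute_sdk import Executor

def run_inference(prompt: str) -> str:
    # Executed on the remote HPC endpoint; the import lives inside the
    # function because Globus Compute serializes the function and ships
    # it to that remote environment.
    from my_inference_backend import generate  # hypothetical backend module
    return generate(prompt)

# Hypothetical endpoint UUID; a real one identifies a Globus Compute
# endpoint registered on an ALCF cluster, with Globus Auth handling identity.
ENDPOINT_ID = "00000000-0000-0000-0000-000000000000"

with Executor(endpoint_id=ENDPOINT_ID) as ex:
    future = ex.submit(run_inference, "List likely plasma disruption precursors.")
    print(future.result())
```

Because work is dispatched through this intermediary rather than over direct connections to compute nodes, the same front end can span multiple clusters and schedulers without exposing them to outside traffic.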

This architecture separates user interaction from backend execution, enabling flexible deployment across HPC environments while maintaining performance and policy compliance.

The ALCF Inference Service is a central component of the facility’s broader Service-Enabled Science program, which brings together HPC and AI resources, integrated workflow tools, AI model training capabilities, and large-scale data sharing and analysis to provide scientists with a complete suite of tools and services that supports the full research lifecycle.

AskALCF: AI-Powered HPC User Support


AskALCF is an AI-assisted tool that helps ALCF users quickly access information.

In 2025, the ALCF launched AskALCF, a retrieval-augmented AI assistant designed to help researchers navigate the facility’s HPC and AI resources. Built on the ALCF Inference Service, the system demonstrates how facility-hosted AI inference capabilities can be applied beyond scientific modeling to improve user support and operational efficiency.

The assistant integrates curated documentation from ALCF and related HPC ecosystems into a unified knowledge base. By combining semantic retrieval with large language models, AskALCF delivers responses grounded in verified facility materials rather than relying solely on general-purpose model knowledge. This approach improves accuracy and ensures alignment with system-specific policies and software environments.
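The sketch below illustrates that retrieval-augmented pattern in generic form; it is not AskALCF's implementation, and the embedding function is a stand-in for an embedding model that a real deployment would host on the inference service.

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: deterministic pseudo-random vectors used purely
    # to make the example self-contained and runnable.
    seed = int(hashlib.sha256(text.encode()).hexdigest()[:8], 16)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(384)

# Curated documentation chunks (illustrative content, not actual ALCF docs).
docs = [
    "Jobs on Sophia are submitted through the PBS scheduler.",
    "Metis serves low-latency inference workloads on SambaNova hardware.",
]
doc_vecs = [embed(d) for d in docs]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documentation chunks by cosine similarity to the query embedding.
    q = embed(query)
    sims = [float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vecs]
    ranked = sorted(range(len(docs)), key=lambda i: sims[i], reverse=True)
    return [docs[i] for i in ranked[:k]]

# Ground the model's answer in retrieved facility material.
question = "How do I submit a job on Sophia?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this documentation:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to an LLM on the inference service
```

The key design point is that the model answers from passages retrieved out of the verified knowledge base rather than from whatever its training data happened to contain, which is what keeps responses aligned with facility-specific policies.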

Technical features include structured document parsing for multimodal content, semantic enrichment of non-text elements, and a dual-context generation strategy that preserves high-fidelity materials such as code blocks. Evaluation results show that this retrieval-augmented approach significantly improves correctness and reliability compared to direct large language model queries.

AskALCF serves as a model application of the ALCF Inference Service, illustrating how scalable, secure AI services deployed within a leadership computing facility can enhance both scientific workflows and the overall user experience.