In 2025 the ALCF added four new AI machines to its array of computing resources, with the deployment of Janus, Minerva, the Cerebras CS-3, and Metis expanding the capabilities of the systems and services available to users.
Janus
Janus, built in collaboration with Hewlett Packard Enterprise (HPE), is equipped with 64 NVIDIA H100 GPUs. The system is designed to support the development of the next-generation AI workforce, providing users with the capabilities needed to excel in AI-driven research and applications.
Minerva
Developed as part of a collaboration with World Wide Technology and NVIDIA, Minerva is architected to accelerate AI inference, the process of using trained AI models to make predictions, identify patterns, and generate insights from new data. This state-of-the-art system harnesses 64 NVIDIA B200 GPUs to bring the open-science community a new level of inference that will drive discovery with advanced AI capabilities.
Minerva helps support the ALCF Inference Service, offering researchers cloud-style access to a vast range of LLMs, foundation models, and other AI-driven inference workloads that can be integrated directly into scientific applications. The service automates the selection of AI models to provide each project with the model best suited to its needs.
By providing straightforward, secure access to these models, the service aims to accelerate research in areas such as drug discovery, fusion energy science, materials design, and nuclear reactor development; the ALCF’s inference capabilities are already helping scientists tackle complex challenges across many fields.
New Testbed Systems: Cerebras CS-3 and Metis
The ALCF also added two new state-of-the-art systems to its AI Testbed, expanding the array of powerful AI accelerators available to the global scientific research community.
Launched in 2022, the ALCF AI Testbed is a growing collection of some of the world’s most advanced AI accelerators designed to enable researchers to explore deep learning and machine learning workloads to advance AI for science. The systems have also helped the facility gain a better understanding of how novel AI technologies can be integrated with traditional supercomputing systems powered by CPUs and GPUs.
The new Cerebras CS-3 replaced the previous-generation Cerebras CS-2, while a SambaNova SN40L, known as Metis, accompanies the existing SambaNova DataScale SN30. These systems join AI Testbed hardware from Graphcore and Groq.
The Cerebras CS-3 system is designed to train and fine-tune large-scale deep learning models. The ALCF has deployed four CS-3 wafer-scale engines, configured to support models of up to approximately 200 billion parameters.
The CS-3 software platform’s unique design, featuring an integrated machine learning framework, efficiently accelerates AI workloads to meet the demands of tasks that require low latency. It supports most well-known LLMs, vision models, and diffusion models. Beyond its support for AI workloads, the Cerebras system provides users with an open-source software development kit for developing low-level kernels suited to traditional HPC simulations.
Powered by Reconfigurable Dataflow Unit (RDU) processors, the SambaNova Metis hardware and software are codesigned to optimize AI inference workloads. The ALCF deployment includes two SambaRacks, each containing 16 SN40L RDU systems to deliver high throughput and low latency.
The deployment of Metis further enhances ALCF’s AI inference capabilities. Models running on the Metis cluster are accessible through OpenAI-compatible API endpoints, with each endpoint capable of hosting multiple independently accessible models.
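Because the endpoints follow the OpenAI API convention, any client that speaks that protocol can target them simply by changing its base URL. The sketch below illustrates the shape of a chat-completions request body; the base URL and model name are placeholders for illustration, not actual ALCF identifiers.

```python
import json

# Placeholder endpoint; the real ALCF Inference Service URL will differ.
BASE_URL = "https://inference.example.alcf.anl.gov/v1"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible /chat/completions request body.

    Any OpenAI-style client (e.g. the `openai` Python package pointed at
    a custom base URL) would send a payload with this structure.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


# Hypothetical model name, for illustration only.
body = build_chat_request("example-llm", "Summarize this abstract: ...")
print(json.dumps(body, indent=2))
```

In practice, a researcher would POST this body to `BASE_URL + "/chat/completions"` with an authorization token; keeping to the OpenAI wire format is what lets existing tooling integrate Metis-hosted models into scientific applications without modification.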