    Nvidia and IBM introduce Big Accelerator Memory, a direct response to Microsoft’s DirectStorage API

    Although Microsoft’s DirectStorage application programming interface (API) promises more efficient GPU-to-SSD data transfers for games in a Windows environment, Nvidia and its partners have developed a way for GPUs to work with SSDs directly, without the need for a proprietary API.

    Big Accelerator Memory (BaM) is a technology that promises to be useful for a variety of computing tasks, but it should be especially effective for emerging workloads that involve enormous datasets. In essence, as GPUs become more programmable and take on more general-purpose work, they need direct access to large pools of storage.

    Interoperability between GPUs and SSDs needs to improve for several reasons. First, orchestrating NVMe calls and data transfers places a significant load on the CPU, which is wasteful in terms of overall performance and efficiency. Second, the overhead of CPU-GPU synchronisation and/or I/O traffic amplification severely restricts the effective storage bandwidth available to applications with large datasets. (I/O amplification simply means moving more data than the computation needs: if a GPU thread wants only 128 bytes but the I/O path transfers a whole 4 KB block, just 1/32 of the raw SSD bandwidth carries useful data.)

    “The goal of Big Accelerator Memory is to extend GPU memory capacity and enhance the effective storage access bandwidth while providing high-level abstractions for the GPU threads to easily make on-demand, fine-grain access to massive data structures in the extended memory hierarchy,” a description of the concept by Nvidia, IBM, and Cornell University cited by The Register reads.

    BaM essentially allows an Nvidia GPU to retrieve data directly from system memory and storage without involving the CPU, making GPUs more self-sufficient than they are today. Compute GPUs will continue to use a software-managed cache in local memory, but data is moved over the PCIe interface using RDMA and a custom Linux kernel driver that lets SSDs read and write GPU memory directly when needed.

    Image credit: The Register
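
    To make that flow concrete, here is a minimal CUDA sketch of the access pattern described above: GPU threads index into an SSD-backed array through a software-managed cache held in GPU memory. The bam_array type, its direct-mapped cache, and fetch_from_ssd are hypothetical names invented for illustration; they are not the published BaM API.

        #include <cstdint>

        // Hypothetical SSD-backed array with a software-managed cache in GPU
        // memory. Each cache slot holds one storage block (e.g. a 4 KB page).
        template <typename T>
        struct bam_array {
            T*        cache;        // cache slots resident in GPU memory
            uint64_t* tags;         // which storage block each slot holds
            uint64_t  n_slots;      // number of cache slots
            uint64_t  block_elems;  // elements per storage block

            __device__ T fetch_from_ssd(uint64_t block, uint64_t offset) const {
                // Miss path: a GPU thread would enqueue a read command for the
                // SSD and poll for completion (sketched in the next listing).
                return T{};  // stubbed so this listing stands alone
            }

            __device__ T read(uint64_t idx) const {
                uint64_t block = idx / block_elems;
                uint64_t slot  = block % n_slots;   // direct-mapped placement
                if (tags[slot] == block)            // hit: serve from GPU memory
                    return cache[slot * block_elems + idx % block_elems];
                return fetch_from_ssd(block, idx % block_elems);  // miss
            }
        };

        // Example kernel: a gather over a dataset far larger than GPU DRAM,
        // with no CPU involvement on the data path.
        template <typename T>
        __global__ void gather(bam_array<T> data, const uint64_t* keys,
                               T* out, uint64_t n) {
            uint64_t i = blockIdx.x * (uint64_t)blockDim.x + threadIdx.x;
            if (i < n) out[i] = data.read(keys[i]);
        }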

    If the required data is not available locally, the GPU threads queue commands to the SSDs themselves. And because BaM performs no virtual memory address translation, serialisation events such as TLB misses do not occur. Nvidia and its partners intend to make the driver open source so that others can benefit from their BaM approach.
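
    The miss path can be sketched in the same spirit: on a cache miss, a GPU thread reserves a slot in a submission queue with a single atomic, writes a read command, and makes it visible to the SSD over PCIe. The nvme_cmd and submission_queue layouts below are simplified illustrations, not the real NVMe structures or BaM’s driver interface.

        #include <cstdint>

        struct nvme_cmd {            // simplified read command
            uint64_t lba;            // logical block address on the SSD
            uint64_t dst_gpu_addr;   // GPU address the SSD writes into via DMA
            uint32_t n_blocks;       // transfer length in blocks
        };

        struct submission_queue {    // ring buffer the SSD consumes over PCIe
            nvme_cmd* entries;
            uint32_t* tail;          // producer index shared by all GPU threads
            uint32_t  capacity;      // slot count, assumed a power of two
        };

        __device__ void submit_read(submission_queue& sq, uint64_t lba,
                                    uint64_t dst, uint32_t n_blocks) {
            // One atomic reserves a slot, so thousands of threads can share a
            // queue without CPU coordination or address translation.
            uint32_t slot = atomicAdd(sq.tail, 1u) % sq.capacity;
            sq.entries[slot] = nvme_cmd{lba, dst, n_blocks};
            __threadfence_system();  // publish the command across PCIe
            // A real driver would now ring the SSD's doorbell register and
            // later poll a completion queue for the result.
        }

        // Example: each thread that missed the cache requests its own block.
        __global__ void request_blocks(submission_queue sq, const uint64_t* lbas,
                                       const uint64_t* dsts, uint32_t n) {
            uint32_t i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) submit_read(sq, lbas[i], dsts[i], 1);
        }

    Keeping the producer index as a plain atomic counter lets fine-grain requests from many threads fill the queue without serialising on the CPU, which is the property the designers lean on to keep even consumer-grade SSDs fully utilised.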

    “BaM mitigates the I/O traffic amplification by enabling the GPU threads to read or write small amounts of data on-demand, as determined by the computing,” Nvidia’s document reads. “We show that the BaM infrastructure software running on GPUs can identify and communicate the fine-grain accesses at a sufficiently high rate to fully utilize the underlying storage devices, even with consumer-grade SSDs, a BaM system can support application performance that is competitive against a much more expensive DRAM-only solution, and the reduction in I/O amplification can yield significant performance benefit.”

    Nvidia’s BaM is, to a significant extent, a mechanism that gives GPUs a large pool of storage they can use independently of the CPU, making compute accelerators far more autonomous than they are now.

    As observant readers will recall, AMD attempted to marry GPUs and solid-state storage several years ago with its Radeon Pro SSG graphics card. While putting storage on the graphics card helped the system work with massive datasets, the Radeon Pro SSG was designed purely for graphics workloads, not for complex compute applications. With BaM, Nvidia, IBM, and their partners are going a step further.
