San Francisco, CA — MLCommons has announced results for its MLPerf Storage v2.0 benchmark suite, designed to measure the performance of storage systems for machine learning workloads in an architecture-neutral, representative, and reproducible manner. According to MLCommons, the results show that storage system performance continues to improve rapidly, with tested systems serving roughly twice as many accelerators as in the v1.0 benchmark round.
To view the results, visit the Storage benchmark results page.
The v2.0 benchmark adds new tests that replicate checkpointing for AI training systems. The benchmark results provide information for stakeholders who need to configure the frequency of checkpoints to optimize for high performance, particularly at scale.
As AI training systems have continued to scale up to billions and even trillions of parameters, and the largest clusters of processors have reached 100 thousand accelerators or more, system failures have become a prominent technical challenge. Because data centers tend to run accelerators at near-maximum utilization for their entire lifecycle, both the accelerators themselves and the supporting hardware (power supplies, memory, cooling systems, etc.) are heavily stressed, shortening their expected lifetime. This is a chronic issue, especially in large clusters: if the mean time to failure for an accelerator is 50,000 hours, then a 100,000-accelerator cluster running for extended periods at full utilization will likely experience a failure every half hour. A cluster with one million accelerators would expect to see a failure every three minutes. Worse, because AI training usually involves massively parallel computation in which all the accelerators move in lockstep through the same iteration of training, a failure of a single processor can grind an entire cluster to a halt.
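To make the arithmetic behind those figures concrete, here is the rough calculation, under the simplifying assumption that failures are independent so the cluster-level mean time to failure (MTTF) scales inversely with the number of accelerators N:

$$
\text{MTTF}_{\text{cluster}} \approx \frac{\text{MTTF}_{\text{accelerator}}}{N}
= \frac{50{,}000\ \text{h}}{100{,}000} = 0.5\ \text{h} \;(\text{30 minutes}),
\qquad
\frac{50{,}000\ \text{h}}{1{,}000{,}000} = 0.05\ \text{h} \;(\text{3 minutes}).
$$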
It is now widely accepted that saving checkpoints of intermediate training results at regular intervals is essential to keep AI training systems running at high performance. The AI training community has developed mathematical models that can optimize cluster performance and utilization by trading off the overhead of regular checkpoints against the expected frequency and cost of failure recovery (rolling back the computation, restoring the most recent checkpoint, restarting the training from that point, and redoing the lost work). These models, however, require accurate data on the scale and performance of the storage systems used to implement the checkpointing system.
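One widely cited example of such a model (a classic first-order approximation, not a formula specific to MLPerf Storage) is the Young/Daly estimate of the optimal checkpoint interval, which balances the time spent writing checkpoints against the expected amount of work lost per failure:

$$
\tau_{\text{opt}} \approx \sqrt{2\,\delta\,M},
$$

where $\delta$ is the time to write one checkpoint to storage and $M$ is the cluster's mean time to failure. Faster checkpoint storage (smaller $\delta$) allows more frequent checkpoints and less re-done work after each failure, which is why accurate storage performance data feeds directly into these models.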
The MLPerf Storage v2.0 checkpoint benchmark tests provide exactly that data, and the results from this round suggest that stakeholders procuring AI training systems need to carefully consider the performance of the storage systems they buy, to ensure that they can store and retrieve a cluster's checkpoints without slowing the system down to an unacceptable level. For a deeper understanding of the issues around storage systems and checkpointing, as well as the design of the checkpointing benchmarks, we encourage you to read this post from Wes Vaske, a member of the MLPerf Storage working group.
“At the scale of computation being applied to training large AI models, regular component failures are simply a fact of life,” said Curtis Anderson, MLPerf Storage working group co-chair. “Checkpointing is now a standard practice in these systems to mitigate failures, and we’re proud to be providing critical benchmark data on storage systems to allow stakeholders to optimize their training performance. This initial round of checkpoint benchmark results shows us that current storage systems offer a wide range of performance characteristics, and not all systems are well-matched to every checkpointing scenario. It also highlights the critical role of software frameworks such as PyTorch and TensorFlow in coordinating training, checkpointing, and failure recovery, as well as some opportunities for enhancing those frameworks to further improve overall system performance.”
Continuing from the v1.0 benchmark suite, the v2.0 suite measures storage performance across a diverse set of ML training scenarios. It emulates the storage demands of multiple scenarios and system configurations covering a range of accelerators, models, and workloads. By simulating the accelerators’ “think time,” the benchmark can generate accurate storage access patterns without needing to run the actual training, making it more accessible to all. The benchmark focuses the test on a given storage system’s ability to keep pace, because it requires the simulated accelerators to maintain a required level of utilization.
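As an illustration of the think-time idea, the sketch below shows an emulated accelerator that alternates storage reads with a fixed sleep standing in for compute, passing only if it stays above a target utilization. This is a minimal, self-contained example under assumed parameters, not the actual MLPerf Storage harness or its configuration.

```python
import os
import tempfile
import time

# Assumed, illustrative parameters (not MLPerf Storage settings).
THINK_TIME_S = 0.05                # simulated per-batch accelerator compute time
BATCH_BYTES = 16 * 1024 * 1024     # bytes storage must deliver per batch
NUM_BATCHES = 50
TARGET_UTILIZATION = 0.90          # emulated accelerator must stay >= 90% busy

# Create a throwaway dataset file so the example is self-contained.
dataset = os.path.join(tempfile.gettempdir(), "storage_sketch.bin")
with open(dataset, "wb") as f:
    f.write(os.urandom(BATCH_BYTES))

busy = 0.0
start = time.perf_counter()
for _ in range(NUM_BATCHES):
    with open(dataset, "rb") as f:     # storage I/O under test
        f.read(BATCH_BYTES)
    time.sleep(THINK_TIME_S)           # stand-in for accelerator compute
    busy += THINK_TIME_S
elapsed = time.perf_counter() - start

utilization = busy / elapsed
print(f"emulated accelerator utilization: {utilization:.1%}")
print("keeps pace" if utilization >= TARGET_UTILIZATION else "storage-bound")
```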
The v2.0 results show that submitted storage systems have significantly increased the number of accelerators they can concurrently support, roughly twice as many as the systems in the v1.0 benchmark.
“Everything is scaling up: models, parameters, training datasets, clusters, and accelerators. It’s no surprise to see that storage system providers are innovating to support ever larger-scale systems,” said Oana Balmau, MLPerf Storage working group co-chair.
The v2.0 submissions also included a much more diverse set of technical approaches to delivering high-performance storage for AI training, including:
6 local storage solutions;
2 solutions using in-storage accelerators;
13 software-defined solutions;
12 block systems;
16 on-prem shared storage solutions;
2 object stores.
“Necessity continues to be the mother of invention: faced with the need to deliver storage solutions that are both high-performance and at unprecedented scale, the technical community has stepped up once again and is innovating at a furious pace,” said Balmau.
The MLPerf Storage benchmark was created through a collaborative engineering process by 35 leading storage solution providers and academic research groups over three years. The open-source, peer-reviewed benchmark suite offers a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry. It also provides critical technical information for customers who are procuring and tuning AI training systems.
The v2.0 benchmark results, from a broad set of technology providers, reflect the industry’s recognition of the importance of high-performance storage solutions. MLPerf Storage v2.0 includes more than 200 performance results from 26 submitting organizations: Alluxio, Argonne National Lab, DDN, ExponTech, FarmGPU, H3C, Hammerspace, HPE, JNIST/Huawei, Juicedata, Kingston, KIOXIA, Lightbits Labs, MangoBoost, Micron, Nutanix, Oracle, Quanta Computer, Samsung, Sandisk, Simplyblock, TTA, UBIX, IBM, WDC, and YanRong. The submitters represent seven different countries, demonstrating the value of the MLPerf Storage benchmark to the worldwide community of stakeholders.
“The MLPerf Storage benchmark has set new records for an MLPerf benchmark, both in the number of participating organizations and in the total number of submissions,” said David Kanter, Head of MLPerf at MLCommons. “The AI community clearly sees the importance of our work in publishing accurate, reliable, unbiased performance data on storage systems, and it has stepped up globally to be a part of it. I’d especially like to welcome first-time submitters Alluxio, ExponTech, FarmGPU, H3C, Kingston, KIOXIA, Oracle, Quanta Computer, Samsung, Sandisk, TTA, UBIX, IBM, and WDC.”
“This level of participation is a game-changer for benchmarking: it enables us to openly publish more accurate and more representative data on real-world systems,” Kanter continued. “That, in turn, gives the stakeholders on the front lines the information and tools they need to succeed at their jobs. The checkpoint benchmark results are an excellent case in point: now that we can measure checkpoint performance, we can think about optimizing it.”