Cerebras trains a 20-billion-parameter model on a single chip

Cerebras has trained an artificial intelligence model with a record 20 billion parameters on a single device, without having to scale the workload across numerous accelerators. The achievement matters for machine learning because it reduces the infrastructure and software complexity that models of this size previously required.

The Wafer Scale Engine-2 is etched onto a single 7 nm wafer, the equivalent of hundreds of premium chips on the market, and features 2.6 trillion 7 nm transistors. The Wafer Scale Engine-2 also incorporates 850,000 cores and 40 GB of integrated cache, with a power consumption of 15 kW. Tom’s Hardware notes that “a single CS-2 system is akin to a supercomputer all on its own.”

Fitting a 20-billion-parameter NLP model on a single chip lets Cerebras cut the overhead of training across thousands of GPUs, along with the associated hardware and scaling requirements. It also removes the technical difficulty of partitioning a model across those GPUs. The company states this is “one of the most painful aspects of NLP workloads, […] taking months to complete.” Partitioning is a bespoke problem, unique to each neural network being processed, the specifications of each GPU, and the network that ties all the components together, and researchers must solve it before training can begin. The resulting solution is also tied to that one setup and cannot be reused on other systems.

Recently, we have seen systems that perform exceptionally well with fewer parameters. One such system is Chinchilla, which with 70 billion parameters consistently outperforms the larger GPT-3 and Gopher. Even so, Cerebras’ accomplishment is significant because researchers will be able to build and train progressively more elaborate models on the Wafer Scale Engine-2 where other hardware cannot.
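To put the 20-billion-parameter figure next to the 40 GB of on-chip memory, here is a rough back-of-the-envelope sketch. The bytes-per-parameter values (2 for 16-bit, 4 for 32-bit), the approximate 6-billion-parameter size used for GPT-J, and the helper names are illustrative assumptions, not figures or code published by Cerebras. The sketch counts the weights alone and ignores activations, gradients, and optimizer state.

```python
# Back-of-the-envelope sketch. Bytes per parameter are assumptions,
# not numbers published by Cerebras.
ON_CHIP_MEMORY_GB = 40  # on-chip memory of the Wafer Scale Engine-2, as quoted above


def weight_footprint_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the storage needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def fits_on_chip(num_params: int, bytes_per_param: int) -> bool:
    """True if the weights alone would fit in the on-chip memory."""
    return weight_footprint_gb(num_params, bytes_per_param) <= ON_CHIP_MEMORY_GB


if __name__ == "__main__":
    models = [("~6B model", 6_000_000_000), ("20B model", 20_000_000_000)]
    precisions = [("16-bit", 2), ("32-bit", 4)]
    for label, params in models:
        for precision, nbytes in precisions:
            gb = weight_footprint_gb(params, nbytes)
            verdict = "fits" if fits_on_chip(params, nbytes) else "does not fit"
            print(f"{label:10s} {precision}: {gb:5.1f} GB of weights, {verdict}")
```

Under these assumptions, the weights of a 20-billion-parameter model take up all 40 GB at 16-bit precision and twice that at 32-bit, before any training state is counted. That is why a model of this size needs its parameters stored somewhere other than the chip itself.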
The large number of workable parameters is made possible by the company’s Weight Streaming technology, which allows researchers to “decouple compute and memory footprints, allowing for memory to be scaled towards whatever the amount is needed to store the rapidly-increasing number of parameters in AI workloads.” In turn, the time needed to set up training drops from months to minutes, requiring only a few standard commands and allowing users to switch seamlessly between models such as GPT-J and GPT-Neo.

News Source: Tom’s Hardware
