Scale By the Bay

Friday, October 29, 2021

Introducing Spark Cyclone: Accelerating Spark with the hidden supercomputer device plugin in Hadoop
Eduardo Gonzalez
Xpress AI, Founder and CEO

The Spark Cyclone project is an open-source plug-in for Apache Spark that accelerates Spark SQL using the NEC SX-Aurora TSUBASA accelerator, which is supported in Hadoop 3.3. Dubbed the “Vector Engine,” the card has lots of onboard RAM (48 GB), lots of memory bandwidth (1.5 TB/s), and lots of computing power (6 TFLOPS). This talk covers why the Vector Engine is uniquely suited for analytics (compared to the alternatives), how we execute Catalyst queries on the Vector Engine, and how its performance compares to Spark RAPIDS running on a V100.