DeveloperWeek 2021

PRO WORKSHOP: Large Graph Neural Network Learning with Kubernetes Spark

- PST
DeveloperWeek PRO Stage A
Join on Hopin

Jintao Zhang
Square, Software Engineer, Machine Learning

Dr. Jintao Zhang received his PhD in machine learning from the University of Kansas in August 2012. Since then he has worked at various companies in data science, machine learning, and engineering roles, with extensive experience developing Spark-based platforms and delivering end-to-end machine learning solutions.


Graph neural network (GNN) learning on very large graphs has recently gained great popularity, because critical business insights are hidden in huge knowledge graphs with billions of edges, such as social networks and sales transactions. Graph node embedding (e.g. Node2Vec) and inductive graph representation learning (e.g. GraphSAGE) have been widely used for fraud detection, cross-sell recommendation, and similar tasks.
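To make the Node2Vec idea concrete, here is a minimal, single-machine sketch of the second-order biased random walk Node2Vec uses to turn a graph into node sequences, which are later fed to a skip-gram model to learn embeddings. It is not the Fugue library's implementation; the adjacency-list graph, the function name `node2vec_walk`, and the parameter values are hypothetical. The return parameter `p` and in-out parameter `q` reweight each step depending on how far the candidate node is from the previous node.

```python
import random
from typing import Dict, List

# Unweighted graph as an adjacency list (hypothetical representation).
Graph = Dict[int, List[int]]


def node2vec_walk(graph: Graph, start: int, length: int,
                  p: float, q: float, rng: random.Random) -> List[int]:
    """Generate one Node2Vec-style biased random walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = graph.get(cur, [])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        prev_neighbors = set(graph.get(prev, []))
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # distance 0 from prev: step back
            elif nxt in prev_neighbors:
                weights.append(1.0)      # distance 1 from prev: stay local (BFS-like)
            else:
                weights.append(1.0 / q)  # distance 2 from prev: move outward (DFS-like)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


if __name__ == "__main__":
    g = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
    print(node2vec_walk(g, start=0, length=8, p=1.0, q=0.5, rng=random.Random(42)))
```

A large `p` discourages immediately revisiting the previous node, while a small `q` pushes the walk outward. On graphs with billions of edges, the expensive part is doing this for every node many times, which is where a distributed engine becomes necessary.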

The main technical challenges are scalability and cost-effectiveness. We have developed a highly scalable and reliable Python library for graph neural networks, based on Spark and PyTorch, under the Fugue project (https://github.com/fugue-project). Benchmark tests show that it can handle graphs with billions of edges and hundreds of millions of nodes within a few hours. With the help of Fugue, the library easily supports Kubernetes Spark, delivering a highly cost-effective solution in a flexible, uniform framework.
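The abstract does not describe the library's internals, so the following is only a hypothetical illustration of the kind of step that scales well on Spark: expanding each random walk into skip-gram (center, context) training pairs. Each walk is processed independently with no shared state, so the work can be split across any number of partitions. The function name `walks_to_pairs`, the `walk`/`src`/`dst` column names, and the `window` parameter are assumptions for this sketch.

```python
from typing import Any, Dict, Iterable


def walks_to_pairs(rows: Iterable[Dict[str, Any]],
                   window: int = 2) -> Iterable[Dict[str, Any]]:
    """Yield (center, context) pairs for every node in every walk.

    Hypothetical per-partition step: each input row holds one walk
    (a list of node ids) under the key "walk".
    """
    for row in rows:
        walk = row["walk"]
        for i, center in enumerate(walk):
            # Context is every node within `window` positions of the center.
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield {"src": center, "dst": walk[j]}


if __name__ == "__main__":
    for pair in walks_to_pairs([{"walk": [1, 2, 3]}], window=1):
        print(pair)
```

Because the function uses only plain Python types, Fugue's `transform` should be able to run it on a local engine for testing and on Spark (including Kubernetes Spark) at scale without code changes. The exact schema and parameter arguments should be checked against the Fugue documentation.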