Wednesday
Room 4
13:40 - 14:40
(UTC+02)
Talk (60 min)
Graph the planet: Wrangling GPU graph dataframes with GFQL
We have all been there: a new data dump that we need to understand. Maybe graphs can help? And do we really need a database project to find out?
The explosion of LLMs and event data is making graphs more attractive, as we need to answer basic questions such as what the unique entities are, what they are doing, and how they relate. The basic ergonomics and architecture of graph computing are shifting, so the right solution is less clear.
This talk goes into the design and early usage patterns of GFQL, the first open source dataframe-native graph query language, built for massive scale and seamless Python integration and representative of these changing ideas. We’ll show how GFQL harnesses GPU acceleration to achieve up to 42X speedups on real-world graphs, and how its underlying components won third place in the Graph 500 on their first submission. We’ll dive into how we have been using it on projects spanning billion-dollar lawsuits, cybersecurity incident response, clickstream analytics, and mining the Bluesky firehose in real time. We’ll also see how GFQL avoids the overhead of external graph databases, with no new infrastructure to manage, so we can work directly from our Jupyter notebooks, Python dashboards, and web apps after a simple pip install. Finally, we’ll cover how to combine GFQL with PyData, Arrow, and GPU libraries for end-to-end graph analytics, from data loading to interactive visualization and AI.
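To make "dataframe-native" concrete, here is a minimal sketch of a graph traversal expressed as dataframe joins. It uses plain pandas rather than GFQL itself; the toy edge list, the column names (`src`, `dst`, `node`), and the `hop` helper are all hypothetical and chosen only for illustration. The point is that a one-hop expansion is just a merge against an edge table, so no separate graph database is needed.

```python
import pandas as pd

# Hypothetical toy edge list; column names are assumptions for this sketch.
edges = pd.DataFrame({
    "src": ["alice", "alice", "bob", "carol"],
    "dst": ["bob", "carol", "dave", "dave"],
})


def hop(frontier: pd.DataFrame, edges: pd.DataFrame) -> pd.DataFrame:
    """Return the nodes one forward hop from the frontier, as a dataframe.

    Illustrative helper, not part of GFQL: the traversal is an inner join
    of the frontier's `node` column against the edges' `src` column.
    """
    reached = frontier.merge(edges, left_on="node", right_on="src")
    return (
        reached[["dst"]]
        .rename(columns={"dst": "node"})
        .drop_duplicates()
        .reset_index(drop=True)
    )


start = pd.DataFrame({"node": ["alice"]})
one_hop = hop(start, edges)    # neighbors of alice
two_hop = hop(one_hop, edges)  # neighbors of those neighbors
```

Because every step takes a dataframe and returns a dataframe, the results can be filtered, joined, or plotted with the usual PyData tools without leaving the notebook.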