Heterogeneous pipeline processing with Kubernetes and Google Cloud Pub/Sub

At Spacemaker, we are building a product for real estate developers and architects, leveraging AI methodologies and massive computing power to maximize the potential of any building site. At the core of our platform's AI engine, we have CPU-intensive optimization and search algorithms, memory-demanding simulations, and GPU-optimized machine learning and computer graphics techniques. None of these components, however, are equal in terms of resource requirements: depending on the input size, some components complete within a couple of milliseconds on a single CPU, while others have run times of hours even with several hundred CPUs. Finally, one of our core values at Spacemaker is the complete autonomy of every product team. This autonomy lets the teams make their own choices not only regarding programming languages and libraries but also in their general approach to the problems at hand.

The challenges outlined above require a highly flexible pipeline. We have, therefore, replaced our old batch-oriented pipeline with an asynchronous, message-based pipeline built on Kubernetes and Google Cloud Pub/Sub. At the core of this pipeline sits a central message broker that dispatches units of work onto a set of queues; the broker takes care of task dependencies and ensures that pipeline executions run to completion. The dispatched tasks are processed by a set of workers deployed to a shared, auto-scaled Kubernetes cluster, which offers elasticity and scalability for the workers and their respective resource requirements. By structuring the pipeline in this way, the teams are free to develop individual components using whichever programming languages and tools they see fit, and each component can have its own resource requirements.
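The dependency-aware dispatch described above can be sketched roughly as follows. This is a hypothetical, in-memory stand-in, not Spacemaker's actual broker: the `run_pipeline` function and its structure are illustrative assumptions, and in production the queue would be a Google Cloud Pub/Sub topic with workers subscribing to it rather than a local deque.

```python
from collections import deque

def run_pipeline(tasks, deps):
    """Dispatch tasks whose dependencies are satisfied until the pipeline completes.

    tasks: dict mapping task name -> callable (the unit of work)
    deps:  dict mapping task name -> set of prerequisite task names
    """
    completed = set()
    # Tasks with no prerequisites are dispatched immediately.
    queue = deque(t for t in tasks if not deps.get(t))
    while queue:
        task = queue.popleft()
        tasks[task]()          # a worker processes the unit of work
        completed.add(task)
        # The broker dispatches any task whose dependencies are now met.
        for t in tasks:
            if t not in completed and t not in queue and deps.get(t, set()) <= completed:
                queue.append(t)
    return completed
```

In the real system, "dispatch" means publishing a message to a queue and "process" happens asynchronously in a worker pod; the sketch only illustrates the dependency bookkeeping the broker performs.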

In this presentation, I will start by presenting our old, batch-oriented pipeline, discussing why it needed replacement and the research leading up to the design of our new pipeline. I will then elaborate on the details of our auto-scaling, including how we handle burst workloads by adding new resources to our Kubernetes cluster based on the queue lengths for the workers. Next, I will dig into the details of our message broker, its API, and the code used in our pipeline workers. Finally, I will wrap up by highlighting some key performance results and my thoughts on the potential of modularizing a pipeline to enable more general use cases.
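Queue-length-based scaling of the kind mentioned above can be sketched as a simple backlog-proportional replica calculation. The function name and all numbers here are illustrative assumptions, not Spacemaker's actual configuration; in practice the result would feed a Kubernetes scaling mechanism such as a custom-metrics autoscaler.

```python
import math

def desired_replicas(queue_length, tasks_per_worker=10,
                     min_replicas=1, max_replicas=100):
    """Scale worker count proportionally to queue backlog, clamped to limits.

    tasks_per_worker is the target backlog each worker should absorb;
    the replica count is kept within [min_replicas, max_replicas].
    """
    wanted = math.ceil(queue_length / tasks_per_worker)
    return max(min_replicas, min(wanted, max_replicas))
```

For example, with these illustrative defaults a backlog of 250 queued tasks would target 25 worker replicas, while an empty queue keeps the minimum of one replica warm.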