2023 Sessions On-Demand

Minions to The Rescue—Tackling Complex Operations in Apache Pinot

Haitao Zhang

Session Speaker
Software Engineer, StarTree
Haitao Zhang is a software engineer at StarTree, working on improving Apache Pinot. Previously, he was on Apple's AIML data infrastructure team and Uber’s Streaming Data Team, working on developing services, libraries and tools for Apache Kafka.

Xiaobing Li

Session Speaker
Software Engineer, StarTree
Xiaobing Li is a software engineer at StarTree, exploring ways to brew the best Pinot on the cloud. He was previously at AWS and Uber, with hands-on experience in Kafka, ClickHouse, ES, and Spark.

Apache Pinot is a real-time distributed OLAP datastore that powers a variety of analytics use cases, which usually require executing high-throughput queries with low latency. To ensure data completeness, result correctness, and system performance, Pinot needs to execute background operational tasks, such as data compaction, GDPR data purging, and reindexing after schema evolution. However, these operations can be computationally intensive and can easily degrade query performance if they run on the same component that executes queries.

Pinot leverages Minion, a Pinot-native component built on Apache Helix’s task framework, to execute these computationally intensive operational tasks, offloading the work from the query execution component so that query performance is not sacrificed. The Minion component is designed to be easily extensible and pluggable: beyond addressing the issue above, Minion is also used to build common data ingestion and backfilling pipelines, saving operators the time of building custom, ad hoc ones.
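To make the "extensible and pluggable" idea concrete, here is a minimal sketch of the general pattern: a registry maps each task type to an executor, a scheduler-side generator splits work into per-segment subtasks, and a dedicated worker (playing the Minion role) runs them away from the query path. Every name here (`TaskConfig`, `register`, `generate_tasks`, `run_on_worker`, and the `purge`/`compact` task types) is hypothetical. This is an illustration under those assumptions and does not reproduce Pinot's or Helix's actual APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical task description: a task type plus free-form configs.
@dataclass
class TaskConfig:
    task_type: str
    configs: Dict[str, str] = field(default_factory=dict)


# Hypothetical registry mapping task type -> executor function.
# A new operation is "plugged in" by registering another executor.
EXECUTORS: Dict[str, Callable[[TaskConfig], str]] = {}


def register(task_type: str):
    def decorator(fn: Callable[[TaskConfig], str]) -> Callable[[TaskConfig], str]:
        EXECUTORS[task_type] = fn
        return fn
    return decorator


@register("purge")
def purge(task: TaskConfig) -> str:
    # Placeholder for rewriting a segment without the purged records.
    return f"purged segment {task.configs['segment']}"


@register("compact")
def compact(task: TaskConfig) -> str:
    # Placeholder for merging small segments into larger ones.
    return f"compacted segment {task.configs['segment']}"


def generate_tasks(task_type: str, segments: List[str]) -> List[TaskConfig]:
    # Scheduler-side generator: one subtask per segment.
    return [TaskConfig(task_type, {"segment": s}) for s in segments]


def run_on_worker(tasks: List[TaskConfig]) -> List[str]:
    # Runs on a dedicated worker, not on the query-serving nodes,
    # by dispatching each subtask to the executor for its type.
    return [EXECUTORS[t.task_type](t) for t in tasks]


if __name__ == "__main__":
    print(run_on_worker(generate_tasks("purge", ["seg_0", "seg_1"])))
```

Separating generation from execution is what lets the heavy work move to its own pool of workers: adding a new operation only requires registering one more executor, while the query-serving path stays untouched.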

In this talk, we will deep dive into the Minion component and demonstrate how we leverage it for some typical operational tasks. We will also discuss the challenges we faced while operating Minion at scale and how we greatly reduced operational overhead by improving observability and introducing auto-scaling mechanisms.

To summarize: on the one hand, Minion takes on most of the operational burden in Pinot, helping real-time analytics run smoothly; on the other hand, Minion gives operators the flexibility to perform complex operations that were previously hard (or even impossible) to perform, enabling a more delightful analytics product experience.