2023 Sessions On-Demand

Gently Down the Stream with Apache Pinot

Navina Ramesh

Session Speaker
Software Engineer, StarTree
Navina Ramesh has spent the last 7+ years working on streaming infrastructure such as Apache Samza and Apache Kafka at LinkedIn. She is a committer and PMC member in the Apache Samza project. She is currently a software engineer at StarTree, where she brings her experience powering large-scale data systems to the Apache Pinot project.

Streaming systems like Apache Kafka, Amazon Kinesis, and Google Pub/Sub have become the de facto standard for capturing real-time events and change data capture (CDC) events, which are then ingested into an OLAP system like Apache Pinot to derive real-time insights from the data.

However, most of the time the data in the stream needs to be transformed before it enters an OLAP system in order to be useful for a user-facing application. Such pre-processing is typically done by stream processing pipelines running in systems like Apache Flink, Apache Samza, or Kafka Streams. While this approach works, it adds operational overhead that is expensive and tedious to maintain.

In this talk, we will explore some powerful real-time ingestion features in Apache Pinot that almost eliminate the need for stream processing pipelines. From ingestion operations like filtering and column transformations to handling CDC data from Debezium-supported sources, Apache Pinot reduces the effort needed to build a user-facing analytical application.
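
As a rough illustration of the filtering and column transformations mentioned above, the sketch below builds the `ingestionConfig` fragment of a Pinot table config in Python. It is not taken from the talk: the helper `build_ingestion_config` and the column names `event_type`, `event_time_ms`, and `hours_since_epoch` are hypothetical, and the fragment would still need to be merged into a complete table config. In Pinot, records for which the filter function evaluates to true are dropped at ingestion time.

```python
import json


def build_ingestion_config(filter_function, transforms):
    """Return an ingestionConfig fragment for a Pinot table config.

    filter_function: expression string; matching records are filtered out.
    transforms: mapping of destination column name -> transform expression.
    """
    return {
        "ingestionConfig": {
            "filterConfig": {"filterFunction": filter_function},
            "transformConfigs": [
                {"columnName": column, "transformFunction": function}
                for column, function in transforms.items()
            ],
        }
    }


# Hypothetical columns: drop heartbeat events and derive an hourly bucket
# from a millisecond timestamp while the stream is being ingested.
config = build_ingestion_config(
    filter_function="Groovy({event_type == 'heartbeat'}, event_type)",
    transforms={"hours_since_epoch": "toEpochHours(event_time_ms)"},
)

print(json.dumps(config, indent=2))
```

Expressing these steps in the table config means the filtering and derivation happen inside Pinot's ingestion path, which is the kind of work that would otherwise need a separate stream processing job.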