2023 Sessions On-Demand

Backfill Upsert Table Via Flink/Pinot Connector

Yupeng Fu

Session Speaker
Principal Engineer, Uber
Yupeng is a Principal Engineer at Uber, where he leads Uber's Real-time Platform and Infrastructure, including multiple mission-critical services powered by open-source technologies such as Kafka, Flink, and Pinot. Before Uber, he was a founding member of Alluxio, Inc. and a PMC member of the Alluxio open source project. Prior to Alluxio, Yupeng worked at Palantir, building data analytics platforms.

Bootstrapping or backfilling an upsert table with long retention in Pinot (e.g. to correct data) has been a challenge, because an upsert table must be a real-time table. In most organizations, however, streams (e.g. Kafka) have a limited retention period, so older historical data is no longer available to replay from the stream.

To address this challenge, we developed a Flink/Pinot connector that generates upsert segments directly from batch data sources (e.g. Hive). This solves the problem of backfilling historical data without depending on Kafka.