2023 Sessions On-Demand

Get Your Data into Apache Pinot Faster with StarTree’s Data Manager

Tim Santos

Session Speaker
Software Engineer, StarTree
Tim is a software engineer on the data ingestion team at StarTree. Previously, he worked at LinkedIn, where he built several of the company's analytics products on top of Pinot, such as Talent Insights.

Seunghyun Lee

Session Speaker
Software Engineer, StarTree
Seunghyun is a software engineer at StarTree. Before joining StarTree, he worked on LinkedIn's Pinot team for more than five years, and he is a PMC member of the Apache Pinot project. He has contributed several critical features to the project, including replica-group support and segment merge/rollup. Outside of work, he enjoys playing golf and sipping Pinot.

Apache Pinot supports data ingestion from a wide variety of sources: streaming systems (e.g., Kafka, Pulsar, Kinesis), batch storage (e.g., S3, GCS, HDFS), and data warehouses (e.g., Snowflake, BigQuery). Configuring the ingestion properties for each of these sources in the Pinot table config can be tedious. Users must also specify additional settings such as data partitioning, column indexes, retention, and quotas, which makes onboarding Pinot tables cumbersome for beginners.
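To give a sense of how many settings are involved, here is a rough sketch of a Pinot realtime table config for a single Kafka source, written as a Python dict and serialized to JSON. The table, topic, broker, column names, and numeric values are all hypothetical. The keys follow the Apache Pinot table config format as commonly documented, but their placement varies across Pinot versions (newer releases, for example, can carry stream settings under `ingestionConfig`), so treat this as an illustration and check the reference for your version.

```python
import json

# Hypothetical example: table "events", Kafka topic "events",
# broker "localhost:9092", columns "ts", "country", "userId".
table_config = {
    "tableName": "events",
    "tableType": "REALTIME",
    "segmentsConfig": {
        "schemaName": "events",
        "timeColumnName": "ts",
        "replicasPerPartition": "1",
        # Retention: how long segments are kept before being purged.
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "7",
    },
    "tenants": {},
    "tableIndexConfig": {
        "loadMode": "MMAP",
        # Column indexes: an inverted index on a filter column.
        "invertedIndexColumns": ["country"],
        # Data partitioning: should match how the Kafka topic is partitioned.
        "segmentPartitionConfig": {
            "columnPartitionMap": {
                "userId": {"functionName": "Murmur", "numPartitions": 4}
            }
        },
        # Source-specific ingestion settings (Kafka in this sketch).
        "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "events",
            "stream.kafka.broker.list": "localhost:9092",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.consumer.factory.class.name":
                "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name":
                "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "realtime.segment.flush.threshold.time": "24h",
        },
    },
    # Let brokers prune segments using the partition metadata above.
    "routing": {"segmentPrunerTypes": ["partition"]},
    # Quota: cap the query rate for this table.
    "quota": {"maxQueriesPerSecond": "100"},
    "metadata": {},
}

print(json.dumps(table_config, indent=2))
```

Every source type (Pulsar, Kinesis, S3, Snowflake, and so on) needs its own variant of the `streamConfigs` or batch ingestion block, on top of the partitioning, index, retention, and quota settings shown here.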

StarTree’s Data Manager is a no-code, self-service tool that helps users of all skill levels get started with Pinot quickly. It was recently revamped to provide a step-by-step experience for connecting to a data source and starting ingestion. Data Manager also uses data sampling and data preview to help ensure that your data model and Pinot indexes are configured correctly for ingestion. With Data Manager, StarTree users can start querying data in Pinot faster than before.
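The abstract does not describe how Data Manager samples data, so the following is only a hypothetical, simplified illustration of the general idea: inspect a small sample of records, guess a Pinot schema from the observed value types, and print it so the user can preview and correct it before ingestion. The helper names `pinot_type` and `infer_schema`, the sample records, the "numeric means metric" heuristic, and the assumption that the time column holds epoch milliseconds are all my own. This is not StarTree's implementation. The field-spec keys follow Pinot's schema format (dimension, metric, and dateTime field specs).

```python
import json


def pinot_type(value):
    """Map a sampled Python value to a Pinot data type name (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    return "STRING"  # strings, and columns with no non-null sample


def infer_schema(schema_name, records, time_column):
    """Guess a Pinot schema from sampled records (hypothetical heuristic)."""
    schema = {
        "schemaName": schema_name,
        "dimensionFieldSpecs": [],
        "metricFieldSpecs": [],
        "dateTimeFieldSpecs": [],
    }
    columns = sorted({key for record in records for key in record})
    for column in columns:
        # Use the first non-null sampled value to decide the type.
        sample = next(
            (r[column] for r in records if r.get(column) is not None), None
        )
        data_type = pinot_type(sample)
        if column == time_column:
            # Assumption: the time column holds epoch milliseconds.
            schema["dateTimeFieldSpecs"].append({
                "name": column,
                "dataType": "LONG",
                "format": "1:MILLISECONDS:EPOCH",
                "granularity": "1:MILLISECONDS",
            })
        elif data_type in ("LONG", "DOUBLE"):
            # Heuristic: numeric columns become metrics. A preview step lets
            # the user fix mistakes such as numeric IDs.
            schema["metricFieldSpecs"].append({"name": column, "dataType": data_type})
        else:
            schema["dimensionFieldSpecs"].append({"name": column, "dataType": data_type})
    return schema


# Hypothetical sample, e.g. the first few messages read from a topic.
sample_records = [
    {"ts": 1700000000000, "country": "US", "userId": "u1", "clicks": 3},
    {"ts": 1700000001000, "country": "DE", "userId": "u2", "clicks": 5},
]

schema = infer_schema("events", sample_records, time_column="ts")
print(json.dumps(schema, indent=2))  # the "preview" shown to the user
```

A type-based guess like this is cheap but easily wrong (a numeric `userId` would be classified as a metric), which is why pairing sampling with a human-readable preview is useful.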