2023 Sessions On-Demand


Streaming Aggregation of Cloud Scale Telemetry

Shay Lin

Session Speaker
Software Engineer, Confluent
Shay Lin is experienced in event-driven microservices and stream processing frameworks. Shay works on Confluent’s data team, focusing on real-time analytics and data infrastructure. Before Confluent, she built systems at scale for payment risk and fraud detection.

How does one serve telemetry metrics at cloud scale? At Confluent, raw telemetry flows in at 5 million metrics per second. Storage is expensive, often with extended retention to meet compliance requirements, and the computational cost of aggregation can also skyrocket in a pull model. In that model, metrics consumers such as data science and billing queried metrics from the OLAP data stores on demand, which created inconsistencies over time. This session will showcase how we switched to a push model for telemetry analytics and tackled these challenges with Kafka Streams and Apache Druid.

You will walk away with an understanding of:
– Architecture choices for real-time aggregation
– Time semantics and handling of out-of-order events
– Partitioning and autoscaling story of the streaming platform
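
To give a flavor of the second topic, here is a minimal sketch of event-time aggregation that tolerates out-of-order events. It is not the implementation presented in the session: the class name, the metric name `bytes_in`, the window size, and the grace period are all hypothetical, and the logic is plain Python that only loosely resembles a windowed aggregation with a grace period in a stream processor such as Kafka Streams.

```python
from collections import defaultdict


class TumblingWindowAggregator:
    """Sums values per (key, window) by event time rather than arrival time.

    Hypothetical illustration only: a window stays open for `grace_ms` after
    its end so that late (out-of-order) events can still be counted.
    """

    def __init__(self, window_ms, grace_ms):
        self.window_ms = window_ms
        self.grace_ms = grace_ms
        self.stream_time = 0                    # highest event timestamp seen
        self.open_windows = defaultdict(float)  # (key, window_start) -> sum
        self.dropped = 0                        # events that arrived too late

    def add(self, key, timestamp_ms, value):
        """Ingest one event and return the windows it caused to close."""
        self.stream_time = max(self.stream_time, timestamp_ms)
        window_start = timestamp_ms - (timestamp_ms % self.window_ms)
        window_end = window_start + self.window_ms
        if window_end + self.grace_ms <= self.stream_time:
            # The window already closed; the event is past the grace period.
            self.dropped += 1
        else:
            self.open_windows[(key, window_start)] += value
        return self._close_expired()

    def _close_expired(self):
        """Emit and forget every window whose grace period has elapsed."""
        closed = []
        for key, start in sorted(self.open_windows, key=lambda k: (k[1], k[0])):
            if start + self.window_ms + self.grace_ms <= self.stream_time:
                closed.append((key, start, self.open_windows.pop((key, start))))
        return closed


# One-minute windows with a ten-second grace period (assumed values).
agg = TumblingWindowAggregator(window_ms=60_000, grace_ms=10_000)
events = [
    ("bytes_in", 5_000, 10.0),
    ("bytes_in", 61_000, 4.0),
    ("bytes_in", 58_000, 2.5),   # out of order, but within grace: counted
    ("bytes_in", 125_000, 1.0),  # advances stream time, closes window 0
    ("bytes_in", 30_000, 7.0),   # too late for window 0: dropped
]
results = [agg.add(*event) for event in events]
```

In this sketch the event stamped 58,000 ms arrives after the one stamped 61,000 ms yet still lands in the first window, while the event stamped 30,000 ms arrives after that window has closed and is only counted as dropped. In a push model, each closed window would be emitted downstream once instead of being recomputed by every consumer query.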