Dec 6–8, 2022, San Francisco


Watermarks: Universal Tool for Any Streaming and Event-Processing Applications

Track: Intermediate/Advanced Spring

In distributed environments, competing data sources produce events at varying rates. Those events flow through different processing steps, incur different delays, and arrive at different consumers at different times.

How would you efficiently reason about the completeness of such input data over time? When building temporal aggregations over streaming data, how can you reliably determine if the aggregation is complete and ready to be sent downstream? How can you enforce and retain a temporal order throughout the pipeline?
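One common answer to the aggregation question: treat a window covering event times [start, end) as complete once the watermark reaches end. The Java sketch below illustrates this idea. It is not the session's implementation. The class `TumblingWindowCounter` and its methods are hypothetical, and the sketch assumes that a watermark W means "no more events with a timestamp below W are expected." An event whose window has already been emitted is reported as late and dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical example: counts events per fixed-size event-time window and emits a
 * window only once the watermark has passed its end. Assumption: a watermark W means
 * "no more events with timestamp < W are expected".
 */
class TumblingWindowCounter {
    private final long windowSizeMillis;
    // Window start -> events counted so far; TreeMap keeps windows ordered by start time.
    private final TreeMap<Long, Long> openWindows = new TreeMap<>();
    private long watermark = Long.MIN_VALUE;

    TumblingWindowCounter(long windowSizeMillis) {
        this.windowSizeMillis = windowSizeMillis;
    }

    /** Counts the event; returns false (and drops it) if its window was already emitted. */
    boolean onEvent(long eventTimeMillis) {
        long start = Math.floorDiv(eventTimeMillis, windowSizeMillis) * windowSizeMillis;
        if (start + windowSizeMillis <= watermark) {
            return false; // late data: an earlier watermark already closed this window
        }
        openWindows.merge(start, 1L, Long::sum);
        return true;
    }

    /** Advances the watermark and returns "start=count" for each newly complete window, oldest first. */
    List<String> onWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark); // a watermark never moves backwards
        List<String> completed = new ArrayList<>();
        while (!openWindows.isEmpty() && openWindows.firstKey() + windowSizeMillis <= watermark) {
            Map.Entry<Long, Long> window = openWindows.pollFirstEntry();
            completed.add(window.getKey() + "=" + window.getValue());
        }
        return completed;
    }
}
```

With 10 ms windows, events at 1, 5, and 12 ms, and a watermark of 10 ms, only the window starting at 0 is emitted (`0=2`). A subsequent event at 3 ms is rejected as late.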

Watermarks are a practical approach to reasoning about the temporal completeness of infinite data streams. The approach is used internally by Apache Flink, Apache Beam, and Google Dataflow, and it applies to any stream-processing application!
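To make "reasoning about completeness" concrete, the sketch below generates watermarks using one widely used heuristic, bounded out-of-orderness. The watermark trails the largest event time seen so far by a fixed allowance. The class name `BoundedOutOfOrdernessWatermark` is a hypothetical illustration and not an API from Spring Cloud Stream or Flink. The sketch assumes that a watermark W means "events with a timestamp below W are no longer expected."

```java
/**
 * Hypothetical heuristic watermark generator ("bounded out-of-orderness"): assumes no
 * event arrives more than maxOutOfOrdernessMillis behind the largest event time seen.
 */
class BoundedOutOfOrdernessWatermark {
    private final long maxOutOfOrdernessMillis;
    private long maxEventTime = Long.MIN_VALUE;

    BoundedOutOfOrdernessWatermark(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    /** Tracks the largest event time observed; out-of-order events never lower it. */
    void onEvent(long eventTimeMillis) {
        maxEventTime = Math.max(maxEventTime, eventTimeMillis);
    }

    /** Watermark W: events with timestamp < W are no longer expected. */
    long currentWatermark() {
        // Before the first event, report the minimum value instead of underflowing.
        return maxEventTime == Long.MIN_VALUE ? Long.MIN_VALUE : maxEventTime - maxOutOfOrdernessMillis;
    }

    /** An event is late if it falls behind the current watermark. */
    boolean isLate(long eventTimeMillis) {
        return eventTimeMillis < currentWatermark();
    }
}
```

For example, with a 5,000 ms allowance, events at 10,000, 12,000, and then 9,000 ms leave the watermark at 7,000 ms. An event stamped 8,000 ms is still on time, but one stamped 6,000 ms is late. A larger allowance produces fewer late events, at the cost of waiting longer before windows can be closed.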

You’ll learn how to implement watermarks in your Spring Cloud Stream and Spring Cloud Function applications, how to generate and propagate watermarks across pipelines, why the event-time and processing-time domains matter, and the trade-offs involved in handling late data and idle data sources.
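To propagate watermarks through a step with several inputs, engines such as Flink commonly use a simple rule: the step's output watermark is the minimum of its input watermarks, because the step is only as far along as its slowest input. The hypothetical `WatermarkCombiner` below sketches that rule and adds one assumed idleness policy. A source that has sent nothing for `idleTimeoutMillis` of processing (wall-clock) time is excluded from the minimum. Event time drives the watermark values, and processing time drives idleness detection. Sources are registered implicitly on their first watermark, which is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical combiner: output watermark = minimum over non-idle sources.
 * A source counts as idle after idleTimeoutMillis of processing time without updates.
 */
class WatermarkCombiner {
    private static final class SourceState {
        long watermark = Long.MIN_VALUE;
        long lastSeenProcessingTime;
    }

    private final long idleTimeoutMillis;
    private final Map<String, SourceState> sources = new HashMap<>();
    private long emitted = Long.MIN_VALUE;

    WatermarkCombiner(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Records an event-time watermark from sourceId, observed at the given processing time. */
    void onWatermark(String sourceId, long watermark, long processingTimeMillis) {
        SourceState s = sources.computeIfAbsent(sourceId, id -> new SourceState());
        s.watermark = Math.max(s.watermark, watermark); // per-source watermarks only advance
        s.lastSeenProcessingTime = processingTimeMillis;
    }

    /** Output watermark at the given processing time; it never decreases. */
    long currentWatermark(long processingTimeMillis) {
        long min = Long.MAX_VALUE;
        for (SourceState s : sources.values()) {
            boolean idle = processingTimeMillis - s.lastSeenProcessingTime >= idleTimeoutMillis;
            if (!idle) {
                min = Math.min(min, s.watermark);
            }
        }
        // If every source is idle, hold the last emitted watermark instead of jumping ahead.
        if (min != Long.MAX_VALUE) {
            emitted = Math.max(emitted, min);
        }
        return emitted;
    }
}
```

Excluding idle sources keeps one silent partition from stalling every downstream window. The trade-off is that a source that resumes with older timestamps may land behind the already-advanced output watermark, so its events are treated as late data.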