Compacted topics grow over time and often live on high-performance, low-latency, and relatively expensive storage. Reducing duplicated data therefore plays a critical role in controlling the size of compacted topics: with less data on the topics, the Kafka cluster consumes less disk space, which in turn leads to lower operating cost.
In this use-case-driven talk, we demonstrate how our team at UnitedHealth Group leveraged existing transformers to extract data from the message metadata in the topic, and how we developed our own custom transformers to minimize the amount of duplicated data in each message.
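As an illustration of the existing-transformer approach, Kafka Connect's built-in Single Message Transforms can both capture message metadata and drop redundant fields. This is a hedged sketch, not the configuration from the talk: the connector name, field names, and the particular SMTs chosen here are assumptions.

```json
{
  "name": "claims-source",
  "config": {
    "transforms": "addMeta,dropRedundant",

    "transforms.addMeta.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addMeta.topic.field": "source_topic",
    "transforms.addMeta.timestamp.field": "event_ts",

    "transforms.dropRedundant.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.dropRedundant.exclude": "repeated_static_payload"
  }
}
```

Here `InsertField$Value` copies the record's topic and timestamp into named fields of the value, and `ReplaceField$Value` with `exclude` strips a field that would otherwise be duplicated across every message.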
3. Our Mission
To help people live healthier lives and to help
make the health system work better for everyone.
4. Optum by numbers
o Optum serves 127 million individual consumers
o Optum serves 9 out of 10 U.S. hospitals
o Optum serves ~9 out of 10 Fortune 100 employers
o Optum serves more than 90,000 physicians, practices and other health care facilities as
well as non-profit associations and organizations
o Optum serves a network of over 67,000 pharmacies
6. The Journey Begins
Data ingestion from multiple sources (Kafka Connect)
Data enrichment (KTable joins & Kafka Streams API)
Aggregation and metrics calculation (Kafka Streams API)
Sinking data to a database (Kafka Connect)
Near real-time APIs to serve the data
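The custom-transformer idea behind reducing duplicated data can be sketched as follows. This is a standalone simplification (not the actual UnitedHealth Group code): it keeps the last value seen per key and drops a record whose value is unchanged, so fewer redundant writes reach the compacted topic. A real implementation would wrap this logic in Kafka Connect's `Transformation` interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Simplified dedup filter: forward a record only when its value differs
 * from the last value seen for the same key. With a compacted topic,
 * dropping unchanged records upstream means less data written and less
 * work for the log cleaner.
 */
public class DedupFilter {
    // Last value observed for each key (an in-memory cache; a production
    // transformer would bound or persist this state).
    private final Map<String, String> lastSeen = new HashMap<>();

    /** Returns true if the record should be forwarded, false if it duplicates the previous value. */
    public boolean shouldForward(String key, String value) {
        String previous = lastSeen.put(key, value);
        return !Objects.equals(previous, value);
    }
}
```

For example, two consecutive identical updates for the same key result in only one forwarded record; a changed value or a new key is always forwarded.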
7. Cost of Operating a Kafka Cluster
Broker Instance Fee | Broker Storage Fee | Data Transfer Fee
8. Cost of Operating a Kafka Cluster
Reducing Broker Storage Cost by Minimizing Data Redundancy in Topics
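For context, compaction is enabled per topic through standard Kafka topic configs; a minimal sketch of creating a compacted topic (the topic name, partition counts, and tuning values are assumptions, not settings from the talk):

```shell
kafka-topics.sh --create --topic member-profile \
  --bootstrap-server localhost:9092 \
  --partitions 6 --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.5
```

With `cleanup.policy=compact`, the broker retains at least the latest value per key, so every redundant write that is avoided upstream translates directly into less disk consumed before and during cleaning.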