
Real-time Analytics in Financial

Use cases, architecture, and challenges of real-time analytics in the financial industry.


  1. Real-time Analytics in Financial: Use Case, Architecture and Challenges
     蒋 逸峰 (しょう いつほう / Yifeng Jiang), Solutions Engineer, Hortonworks, @uprush
     October 26, 2016
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. The YAP Map by Google M
  3. http://www.wondermondo.com
  4. Today’s Money & Financial Service [figure labels: money, Financial Service, 0110010100]
  5. Every Financial Service is a Big Data Service
     • Financial services are BIG: too big to fail
     • Every financial service is eventually a big data service
       – Number of transactions
       – Number of jobs
       – Third-party data
  6. How Big is Big Data in Financial?
     • Millions to billions of transactions per day
       – Hundreds to tens of thousands of transactions per second
     • Big Data in banking, payment, security, etc.
  7. Big Data Use Case in Financial http://www.forbes.com/sites/bernardmarr/2016/09/09/big-data-in-banking-how-citibank-delivers-real-business-benefits-with-their-data-first-approach/#7759859f75ed
  8. Why Real-time Analytics in Financial?
     Can you detect fraud among millions to billions of transactions per day in real time?
     “The costs resulting from these anomalies is far easier to correct if spotted quickly – or even before it happens – through predictive modeling.”
  9. Recent news that caught my attention http://gendai.ismedia.jp/articles/-/48832 http://mainichi.jp/articles/20161012/k00/00e/040/243000c
  10. https://roboteer-tokyo.com/archives/4415
  11. Real-time Analytics in Financial: Use Case, Architecture & Challenges
  12. A Simple Use Case: Real-time Surveillance
      Detect abnormal transactions on a stock exchange
      • Trigger an alert if
        – A customer's buy / sell amount exceeds 500M JPY within 3 minutes
      • 300K transactions per second
      • Abnormal transactions must be detected within 10 s
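The alert condition on this slide can be sketched as a per-customer sliding-window sum. This is a minimal single-process illustration, not the Storm topology from the talk; the class name, method names, and the assumption that trades arrive in timestamp order are all hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3 * 60        # 3-minute window, as stated on the slide
THRESHOLD_JPY = 500_000_000    # 500M JPY, as stated on the slide


class AmountSurveillance:
    """Tracks each customer's traded amount over a sliding time window.

    Assumption: trades for a customer arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds=WINDOW_SECONDS, threshold=THRESHOLD_JPY):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._trades = defaultdict(deque)  # customer_id -> deque of (timestamp, amount)
        self._totals = defaultdict(int)    # customer_id -> sum of amounts in the window

    def on_trade(self, customer_id, timestamp, amount):
        """Record one trade; return True if the windowed total exceeds the threshold."""
        trades = self._trades[customer_id]
        trades.append((timestamp, amount))
        self._totals[customer_id] += amount
        # Evict trades that have fallen out of the window ending at this trade.
        while trades and trades[0][0] <= timestamp - self.window_seconds:
            _, old_amount = trades.popleft()
            self._totals[customer_id] -= old_amount
        return self._totals[customer_id] > self.threshold
```

Keeping a running total alongside the deque means each trade costs amortized constant time, which matters at the transaction rates quoted on the slide.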
  13. Real-time Surveillance Architecture v1
      • Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); Master data, raw & aggregated trade (HBase+Phoenix); Surveillance Rule Engine; Surveillance Alerts
      • Connections: master data look up; insert trade (raw & aggregated)
      • Open question: how?
  14. Real-time Data Ingestion
      • From the transaction database
        – Change Data Capture (CDC)
        – Not practical for most financial systems
      • From the gateway system
        – Receive data from the gateway system
        – Send data to Kafka (as a Kafka producer)
  15. Real-time Surveillance Architecture v1
      • Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); Master data, raw & aggregated trade (HBase+Phoenix); Surveillance Rule Engine; Surveillance Alerts
      • Connections: via CDC or gateway; master data look up; insert trade (raw & aggregated)
      • Open question: overhead?
  16. Real-time Data Lookup
      • Low-latency data store
        – Master data
        – NoSQL database: HBase (+Phoenix), Redis
      • Use a local cache
        – LRU cache in Storm bolts
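A local LRU cache in front of the master data store might look like the sketch below. It is written in Python for brevity rather than as a Storm bolt, and the `loader` callback standing in for an HBase or Redis lookup is an assumption made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Read-through LRU cache: on a miss, load the value and evict the oldest entry."""

    def __init__(self, capacity, loader):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.loader = loader            # hypothetical stand-in for a master data lookup
        self._entries = OrderedDict()   # ordered from least to most recently used
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self.loader(key)               # remote lookup happens only on a miss
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

A cache like this trades freshness for latency: if master data changes, entries need a time-to-live or an explicit invalidation path, which this sketch leaves out.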
  17. Real-time Surveillance Architecture v2 (with cache)
      • Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm) with local master data cache; Aggregator (Storm); Master data, raw & aggregated trade (HBase+Phoenix); Surveillance Rule Engine; Surveillance Alerts
      • Connections: via CDC or gateway; master data look up; insert trade (raw & aggregated)
      • Open question: exactly-once?
  18. Exactly-Once?
      Message delivery semantics
      • At-most-once: may lose data, but no duplication
      • At-least-once: no data loss, but may duplicate
      • Exactly-once: no data loss, no duplication
  19. Exactly-Once!
      NO real exactly-once message delivery in a distributed system
      • There is no such thing as exactly-once delivery
      • Exactly-once is an end-to-end requirement
      But… people like exactly-once, especially in financial service systems
  20. Effectively-once
      Exactly-once semantics (a better phrase is “effectively-once”) with at-least-once delivery + idempotent operations
      • Kafka & Storm guarantee at-least-once
      • De-duplicate by making the operations in your application idempotent
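To illustrate why idempotent operations turn at-least-once delivery into effectively-once results, the sketch below replays a duplicated message against two hypothetical handlers: a keyed upsert (idempotent) and a running counter (not idempotent). The function and field names are assumptions for illustration only.

```python
def apply_trade_idempotent(store, trade):
    """Keyed upsert: applying the same trade twice leaves the store unchanged."""
    store[trade["trade_id"]] = trade["amount"]


def apply_trade_counter(totals, trade):
    """Running sum: applying the same trade twice double-counts it."""
    totals["amount"] = totals.get("amount", 0) + trade["amount"]


# At-least-once delivery: trade t1 is redelivered after a simulated retry.
deliveries = [
    {"trade_id": "t1", "amount": 100},
    {"trade_id": "t2", "amount": 200},
    {"trade_id": "t1", "amount": 100},
]

store, totals = {}, {}
for message in deliveries:
    apply_trade_idempotent(store, message)
    apply_trade_counter(totals, message)

idempotent_total = sum(store.values())  # duplicates collapse onto the same key
counter_total = totals["amount"]        # duplicates inflate the total
```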
  21. De-duplication in window computation
      Most window computations can be made idempotent
      • Examples: aggregation, counting, etc.
      • De-duplicate messages in the window
        – Using a local in-memory state store, e.g. a Set class
      Diagram: Trading Events in Kafka, Aggregator (Storm), IDRegistry (local in-memory), Aggregated Trade Data
        1. Pull data in 3m window
        2. Lookup trade_id
        3. Count de-duplicated events
        4. Insert trade_id
        5. Output aggregated data
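The five steps in the diagram can be sketched as a single function over the events of one window. A plain Python `set` plays the role of the local in-memory IDRegistry; the event field names (`trade_id`, `amount`) are assumptions.

```python
def aggregate_window(events):
    """Aggregate one window of trade events, ignoring redelivered duplicates."""
    seen_ids = set()  # local in-memory IDRegistry, scoped to this window
    count = 0
    total_amount = 0
    for event in events:              # 1. events pulled for the window
        trade_id = event["trade_id"]
        if trade_id in seen_ids:      # 2. look up trade_id
            continue                  #    duplicate: skip it
        count += 1                    # 3. count de-duplicated events
        total_amount += event["amount"]
        seen_ids.add(trade_id)        # 4. insert trade_id
    return {"count": count, "total_amount": total_amount}  # 5. output aggregated data
```

Because the registry lives only as long as the window, memory use is bounded by the number of distinct trades per window rather than by the whole stream.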
  22. De-duplication in non-idempotent computation
      • Exactly-once in non-idempotent computation
        – Example: joining continuous data streams
        – Global state store required: HBase, Redis
        – Batching can help reduce the number of ID lookups
      • Exactly-once is expensive; avoid it whenever possible
      Diagram: Click Logs in Kafka, Joiner (Storm), IDRegistry (external NoSQL), Query Logs, Joined Click Logs
        1. Pull data continuously
        2. Lookup click_id
        3. Lookup query
        4. Insert click_id
        5. Output joined click
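The join flow with a global IDRegistry and batched lookups can be sketched as follows. `InMemoryIdRegistry` is a hypothetical stand-in for an external store such as HBase or Redis, and its two methods are invented for this example; it counts round trips only to show that batching costs two calls per batch instead of two per click.

```python
class InMemoryIdRegistry:
    """Hypothetical stand-in for an external ID store; counts simulated round trips."""

    def __init__(self):
        self._ids = set()
        self.round_trips = 0

    def filter_unseen(self, ids):
        """Return the subset of ids not yet registered (one batched lookup)."""
        self.round_trips += 1
        return {i for i in ids if i not in self._ids}

    def add_many(self, ids):
        """Register a batch of ids (one batched write)."""
        self.round_trips += 1
        self._ids.update(ids)


def join_clicks(click_batch, queries, registry):
    """Join a batch of clicks with their queries, emitting each click_id at most once."""
    unseen = registry.filter_unseen(c["click_id"] for c in click_batch)  # 2. lookup click_id
    joined, emitted = [], set()
    for click in click_batch:
        click_id = click["click_id"]
        if click_id not in unseen or click_id in emitted:
            continue                                  # already joined earlier or in this batch
        query = queries.get(click["query_id"])        # 3. lookup query
        if query is None:
            continue                                  # no matching query yet; not registered
        joined.append({"click_id": click_id, "query": query})
        emitted.add(click_id)
    registry.add_many(emitted)                        # 4. insert click_id
    return joined                                     # 5. output joined clicks
```

Note that the write in step 4 and the output in step 5 are not atomic in this sketch; a failure between them would need additional handling in a real pipeline.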
  23. Real-time Surveillance Architecture v3 (effectively-once)
      • Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm) with local master data cache; Aggregator (Storm) with IDRegistry (local in-memory); Master data, raw & aggregated trade (HBase+Phoenix); Surveillance Rule Engine; Surveillance Alerts
      • Connections: via CDC or gateway; master data look up; IDRegistry look up / insert, de-duplicate in window; insert trade (raw & aggregated)
      • Open question: order?
  24. Late Messages http://www.slideshare.net/HadoopSummit/apache-beam-a-unified-model-for-batch-and-stream-processing-data
  25. Handling Late Messages
      • Expect late messages
        – A streaming application needs to handle out-of-order events, e.g. by emitting late messages to a special Kafka topic
      • Use source-generated timestamps
      • Storm’s late message support in window computation (BaseWindowedBolt)
        – withTimestampField(String fieldName)
        – withLag(Duration duration)
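The idea of tolerating a bounded lag and diverting anything later can be sketched in a few lines. This is a simplified model of the concept, not Storm's implementation: the `LateEventRouter` class, the `ts` field, and the in-memory `late` list (standing in for a special Kafka topic) are all assumptions.

```python
class LateEventRouter:
    """Routes events by source timestamp: within the allowed lag, or late."""

    def __init__(self, lag_seconds):
        self.lag_seconds = lag_seconds
        self.max_timestamp = None  # highest source timestamp seen so far
        self.on_time = []          # events passed on to window computation
        self.late = []             # stand-in for a dedicated late-message topic

    def watermark(self):
        """Oldest timestamp still accepted, or None before the first event."""
        if self.max_timestamp is None:
            return None
        return self.max_timestamp - self.lag_seconds

    def process(self, event):
        ts = event["ts"]           # source-generated timestamp (assumed field name)
        wm = self.watermark()
        if wm is not None and ts < wm:
            self.late.append(event)
            return "late"
        if self.max_timestamp is None or ts > self.max_timestamp:
            self.max_timestamp = ts
        self.on_time.append(event)
        return "on_time"
```

A larger lag accepts more out-of-order events at the cost of delaying window results, so the value is a trade-off against the detection deadline.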
  26. Can I trust the data? Duplications! Out-of-order, late messages! Data loss?
  27. Monitor Data Processing Pipeline Quality
      Approaches to monitoring data pipeline quality
      • Audit completeness
      • Output duplicated and late messages to logs for auditing
      • Define a service level objective (SLO) for data quality and monitor it
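One way to audit completeness is to compare the IDs seen at the source with the IDs written at the sink. The sketch below assumes both sides can export a list of event IDs for the audited period; the function name and report fields are hypothetical.

```python
def audit_completeness(source_ids, sink_ids):
    """Compare event IDs at the pipeline's input and output for one audit period."""
    source = set(source_ids)
    sink_list = list(sink_ids)
    sink = set(sink_list)
    return {
        "missing": sorted(source - sink),           # events lost in the pipeline
        "unexpected": sorted(sink - source),        # events with no matching source record
        "duplicates": len(sink_list) - len(sink),   # extra copies written to the sink
    }
```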
  28. Define Data Processing Pipeline SLO
      Design a practical SLO for the pipeline
      • Process 99.9999% of events within a few seconds
      • and 100% of events within a few hours
      • At-most-once semantics at any point in time
      • Near exactly-once semantics in near real-time
      • And exactly-once semantics eventually
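A latency SLO of this shape can be checked from recorded per-event processing latencies. The sketch below is an assumption about how such a check might be written; the slide gives only "a few seconds" and "a few hours", so the concrete thresholds are left as parameters.

```python
def slo_report(latencies_seconds, fast_seconds, fast_target, slow_seconds):
    """Check a two-tier latency SLO over a list of per-event latencies.

    fast_target is the required fraction of events within fast_seconds
    (for example 0.999999); every event must finish within slow_seconds.
    """
    total = len(latencies_seconds)
    if total == 0:
        raise ValueError("no latencies to evaluate")
    fast = sum(1 for latency in latencies_seconds if latency <= fast_seconds)
    slow = sum(1 for latency in latencies_seconds if latency <= slow_seconds)
    fast_ratio = fast / total
    return {
        "fast_ratio": fast_ratio,
        "fast_ok": fast_ratio >= fast_target,
        "all_ok": slow == total,
    }
```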
  29. The Rule Engine & The Architecture
      Hundreds of rules
      • A stock's trading price jumps up / down by > k% and total amount > m% in K minutes
      • Single ATM cash withdrawal > k% and number of ATMs > m in K minutes
      Many of these rules fit into this simple architecture!
      [figure: Rule Engine marking items ✓ / ✗ / ?]
      Open question: rule base only?
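Rules of this kind can be expressed as named predicates evaluated against the aggregated data for a window. The sketch below is illustrative only: the aggregate field names and the threshold values chosen for k and m are assumptions, and the slide does not describe how the actual rule engine is built.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    name: str
    predicate: Callable[[Dict[str, float]], bool]


def evaluate(rules: List[Rule], aggregate: Dict[str, float]) -> List[str]:
    """Return the names of all rules that fire for one windowed aggregate."""
    return [rule.name for rule in rules if rule.predicate(aggregate)]


K_PCT = 5.0   # hypothetical price-jump threshold, in percent
M_PCT = 20.0  # hypothetical amount threshold, in percent

rules = [
    Rule(
        "price_jump_with_volume",
        lambda agg: abs(agg["price_change_pct"]) > K_PCT and agg["amount_change_pct"] > M_PCT,
    ),
]

alerts = evaluate(rules, {"price_change_pct": -7.5, "amount_change_pct": 35.0})
```

Keeping rules as data rather than hard-coded branches makes it easier to manage hundreds of them, since adding a rule does not change the evaluation loop.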
  30. Architecture with Predictive Analytics
      Real-time Surveillance Architecture with Predicate Engine
      • Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm) with local master data cache; Aggregator (Storm) with IDRegistry (local in-memory); Master data, raw & aggregated trade (HBase+Phoenix); Surveillance Rule Engine; Surveillance Predicate Engine (Storm); Financial Data Lake; Train Machine Learning Model (Spark); Surveillance Alerts
      • Connections: via CDC or gateway; master data look up; IDRegistry look up / insert, de-duplicate in window; insert trade (raw & aggregated); load ML model
  31. Lifecycle of Big Data Adoption in Financial Service Industry
      1. Data Pooling and Processing: connect data and create structure by merging, conditioning streams and archived data
      2. Business Intelligence: data mining and visualization software that reveals trends and useful information
      3. Predictive Analytics: automated analytics integrated into workflow that unlock data value and improve profitability
      Hadoop-enabled Big Data Platform
      Customers typically “Start Small, Think Big”
  32. THANK YOU
