Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset and Druid

The Central Bank of the Republic of Turkey is primarily responsible for steering the monetary and exchange rate policies in Turkey.

One of the major core functions of the Bank is market operations. In this context, analyzing and interpreting real-time tick data related to money market instruments has become not only a requirement but also a challenge.

For this use case, an API provided by one of the financial data vendors has been used to gather real-time tick data and data routing has been orchestrated by Apache NiFi.

Gathered data is being transferred to Kafka topics and then handed off to Druid for real-time indexing tasks.

Indicators such as effective cost, bid-ask spread, price impact measures, and return reversal are calculated using Apache Storm and finally visualized by means of Apache Superset in order to provide decision-makers with a new set of tools.


  1. 1. Developing High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid. CBRT Big Data Team: Emre Tokel, Kerem Başol, M. Yağmur Şahin; Zekeriya Besiroglu / Komtas Bilgi Yonetimi. 21 March 2019, Barcelona.
  2. 2. Agenda: WHO WE ARE (CBRT & Our Team) • PROJECT DETAILS (Before, Test Cluster, Phase 1-2-3, Prod Migration) • HIGH FREQUENCY INDICATORS (Importance & Goals) • CURRENT ARCHITECTURE (Apache Kafka, Spark, Druid & Superset) • WORK IN PROGRESS (Further analyses) • FUTURE PLANS
  3. 3. Who We Are
  4. 4. Our Solutions Data Management • Data Governance Solutions • Next Generation Analytics • 360 Engagement • Data Security Analytics • Data Warehouse Solutions • Customer Journey Analytics • Advanced Marketing Analytics Solutions • Industry-specific analytic use cases • Online Customer Data Platform • IoT Analytics • Analytic Lab Solution Big Data & AI • Big Data & AI Advisory Services • Big Data & AI Accelerators • Data Lake Foundation • EDW Optimization / Offloading • Big Data Ingestion and Governance • AI Implementation – Chatbot • AI Implementation – Image Recognition Security Analytics • Security Analytic Advisory Services • Integrated Law Enforcement Solutions • Cyber Security Solutions • Fraud Analytics Solutions • Governance, Risk & Compliance Solutions
  5. 5. • 20+ IT, 18+ DB&DWH • 7+ Big Data • Lead Architect, Big Data / Analytics @KOMTAS • Instructor & Consultant • ITU, MEF, Şehir Uni. Big Data Instructor • Certified R programmer • Certified Hadoop Administrator
  6. 6. Our Organization  The Central Bank of the Republic of Turkey is primarily responsible for steering the monetary and exchange rate policies in Turkey. o Price stability o Financial stability o Exchange rate regime o The privilege of printing and issuing banknotes o Payment systems
  7. 7. M. Yağmur Şahin (Big Data Engineer) • Emre Tokel (Big Data Engineer) • Kerem Başol (Big Data Team Leader)
  8. 8. High Frequency Indicators
  9. 9. Importance and Goals  To observe foreign exchange markets in real-time o Are there any patterns at specific time intervals during the day? o Is there anything to observe before/after local working hours throughout the whole day? o What does the difference between bid/ask prices tell us?  To be able to detect risks and take necessary policy measures in a timely manner o Developing liquidity and risk indicators based on real-time tick data o Visualizing observations for decision makers in real-time o Finally, discovering possible intraday seasonality  Wouldn’t it be great to be able to correlate with news flow as well?
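One of the goals above is reading the difference between bid and ask prices. A minimal Python sketch of two standard spread measures (quoted and relative spread); the tick field names are illustrative assumptions, not taken from the TREP feed:

```python
# Sketch: quoted and relative bid-ask spread from a single tick.
# Field names ("bid", "ask") and the RIC are illustrative only.

def quoted_spread(bid: float, ask: float) -> float:
    """Absolute difference between ask and bid prices."""
    return ask - bid

def relative_spread(bid: float, ask: float) -> float:
    """Spread normalized by the mid price; comparable across instruments."""
    mid = (bid + ask) / 2.0
    return (ask - bid) / mid

tick = {"bid": 5.4320, "ask": 5.4340}
print(round(quoted_spread(tick["bid"], tick["ask"]), 4))  # 0.002
```

The relative spread is the more useful intraday indicator here, since absolute spreads are not comparable between instruments quoted at different price levels.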
  10. 10. Project Details
  11. 11. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid: Test Cluster → Phase 1 → Phase 2 → Phase 3 → Prod migration → Next phases
  12. 12. Test Cluster  Our first studies on big data started on very humble servers o 5 servers with 32 GB RAM each o 3 TB storage  HDP 2.6.0.3 installed o Not the latest version back then  Technical difficulties o Performance problems o Apache Druid indexing o Apache Superset maturity
  13. 13. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid: Test Cluster → Phase 1 → Phase 2 → Phase 3 → Prod migration → Next phases
  14. 14. TREP API → Apache Kafka → Apache NiFi → MongoDB → Apache Zeppelin & Power BI
  15. 15. Thomson Reuters Enterprise Platform (TREP)  Thomson Reuters provides its subscribers with an enterprise platform through which they can collect market data as it is generated  Each financial instrument on TREP has a unique code called a RIC  The event queue implemented by the platform can be consumed with the provided Java SDK  We developed a Java application that consumes this event queue to collect tick data for the required RICs
  16. 16. TREP API → Apache Kafka → Apache NiFi → MongoDB → Apache Zeppelin & Power BI
  17. 17. Apache Kafka  The data flow is very fast and quite dense o We published the messages containing tick data collected by our Java application to a message queue o Twofold analysis: Batch and real-time  We decided to use Apache Kafka residing on our test big data cluster  We created a topic for each RIC on Apache Kafka and published data to related topics
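The topic-per-RIC scheme above can be sketched in plain Python. The topic naming convention and payload fields are assumptions for illustration; a real producer would hand these bytes to a Kafka client (the talk used a Java application):

```python
import json

# Sketch of a topic-per-RIC publishing scheme. Topic names and
# payload fields are hypothetical, not the team's actual convention.

def topic_for_ric(ric: str) -> str:
    # One Kafka topic per instrument code (RIC), e.g. "ticks.TRY="
    return f"ticks.{ric}"

def to_message(ric: str, bid: float, ask: float, ts_ms: int) -> bytes:
    """Serialize one tick as UTF-8 JSON, ready for a Kafka producer."""
    payload = {"ric": ric, "bid": bid, "ask": ask, "timestamp": ts_ms}
    return json.dumps(payload).encode("utf-8")

msg = to_message("TRY=", 5.4320, 5.4340, 1553155200000)
print(topic_for_ric("TRY="))  # ticks.TRY=
```

Keying messages by RIC (or using one topic per RIC, as here) keeps each instrument's ticks ordered, which matters for the windowed calculations downstream.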
  18. 18. TREP API → Apache Kafka → Apache NiFi → MongoDB → Apache Zeppelin & Power BI
  19. 19. Apache NiFi  In order to manage the flow, we decided to use Apache NiFi  We used the ConsumeKafka processor to consume messages from Kafka topics  The NiFi flow was designed to persist the data to MongoDB
  20. 20. Our NiFi Flow
  21. 21. TREP API → Apache Kafka → Apache NiFi → MongoDB → Apache Zeppelin & Power BI
  22. 22. MongoDB  We had prepared data in JSON format with our Java application  Since we have MongoDB installed on our enterprise systems, we decided to persist this data to MongoDB  Although MongoDB is not a part of HDP, it seemed like a good choice for our researchers to use this data in their analyses
  23. 23. TREP API → Apache Kafka → Apache NiFi → MongoDB → Apache Zeppelin & Power BI
  24. 24. Apache Zeppelin  We provided our researchers with access to Apache Zeppelin and connection to MongoDB via Python  By doing so, we offered an alternative to the tools on local computers and provided a unified interface for financial analysis
  25. 25. Business Intelligence on Client Side  Our users had to download daily tick-data manually from their Thomson Reuters Terminals and work in Excel  Users were then able to access tick-data using Power BI o We also provided our users with a news timeline along with the tick-data
  26. 26. We needed more!  We had to visualize the data in real-time o Analysis on persisted data using MongoDB, Power BI and Apache Zeppelin was not enough
  27. 27. TREP API → Apache Kafka → Apache NiFi → MongoDB → Apache Zeppelin & Power BI
  28. 28. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid: Test Cluster → Phase 1 → Phase 2 → Phase 3 → Prod migration → Next phases
  29. 29. TREP API → Apache Kafka → Apache Druid → Apache Superset
  30. 30. Apache Druid  We needed a database which was able to: o Answer ad-hoc queries (slice/dice) for a limited window efficiently o Store historic data and seamlessly integrate current and historic data o Provide native integration with possible real-time visualization frameworks (preferably from Apache stack) o Provide native integration with Apache Kafka  Apache Druid addressed all the aforementioned requirements  Indexing task was achieved using Tranquility
  31. 31. TREP API → Apache Kafka → Apache Druid → Apache Superset
  32. 32. Apache Superset  Apache Superset was the obvious alternative for real-time visualization since tick-data was stored on Apache Druid o Native integration with Apache Druid o Freely available on Hortonworks service stack  We prepared real-time dashboards including: o Transaction Count o Bid / Ask Prices o Contributor Distribution o Bid - Ask Spread
  33. 33. We needed more, again!  Reliability issues with Druid  Performance issues  Enterprise integration requirements
  34. 34. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid: Test Cluster → Phase 1 → Phase 2 → Phase 3 → Prod migration → Next phases
  35. 35. Architecture: Data Sources (Internet Data, Enterprise Content, Social Media/Media, Micro Level Data, Commercial Data Vendors) → Ingestion → Big Data Platform → Data Science • Governance
  36. 36. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid: Test Cluster → Phase 1 → Phase 2 → Phase 3 → Prod migration → Next phases
  37. 37. TREP API → Apache Kafka → Apache Hive + Druid Integration → Apache Superset; Apache Spark consumes and re-publishes Kafka topics for windowed calculations
  38. 38. Apache Hive + Druid Integration  After setting up our production environment (using HDP 3.0.1.0) and starting to feed data, we realized that the data were scattered and we were missing the option to co-utilize these different data sources  We then realized that Apache Hive was already providing a Kafka & Druid indexing service in the form of simple table creation, along with a facility for querying Druid from Hive
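The table-creation integration described above can be sketched as Hive DDL. This is not taken from the talk: the storage-handler class and table properties follow the HDP 3 Hive/Druid/Kafka integration as documented, while the broker address, topic name, and columns are placeholders to be checked against your release.

```sql
-- Sketch (assumptions, not verbatim from the talk): a Hive table that
-- asks Druid to continuously index a Kafka topic of tick data.
CREATE EXTERNAL TABLE tick_data_druid (
  `__time` TIMESTAMP,   -- Druid's required time column
  `ric`    STRING,
  `bid`    DOUBLE,
  `ask`    DOUBLE
)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "kafka.bootstrap.servers"   = "broker1:9092",   -- placeholder
  "kafka.topic"               = "ticks",          -- placeholder
  "druid.kafka.ingestion"     = "START",
  "druid.segment.granularity" = "HOUR"
);
```

Once the table exists, the same data is queryable from Hive and visible to Superset through Druid, which is what resolves the "scattered data" problem the slide describes.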
  39. 39. TREP API → Apache Kafka → Apache Hive + Druid Integration → Apache Superset; Apache Spark consumes and re-publishes Kafka topics for windowed calculations
  40. 40. Apache Spark  Due to additional calculation requirements of our users, we decided to utilize Apache Spark  With Apache Spark 2.4, we used the Spark Streaming and Spark SQL contexts together in the same application  In our Spark application o Every 5 seconds, a 30-second window is created o On each window, outlier boundaries are calculated o Outlier data points are detected
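The windowing logic above (a 30-second window sliding every 5 seconds, with outlier boundaries per window) can be sketched in pure Python. The boundary rule used here, mean plus or minus three standard deviations, is an assumption; the slide does not say which rule the Spark job applies:

```python
import statistics

# Sketch of sliding-window outlier detection: 30-second windows,
# advanced every 5 seconds. The 3-sigma rule is an assumption.

def windows(ticks, window_s=30, slide_s=5):
    """ticks: list of (epoch_seconds, price), assumed sorted by time.
    Yields the prices falling in each sliding window."""
    if not ticks:
        return
    t = ticks[0][0]
    end = ticks[-1][0]
    while t <= end:
        yield [p for (ts, p) in ticks if t - window_s < ts <= t]
        t += slide_s

def outliers(prices):
    """Points outside mean +/- 3 * population standard deviation."""
    if len(prices) < 2:
        return []
    mu = statistics.mean(prices)
    sigma = statistics.pstdev(prices)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    return [p for p in prices if p < lo or p > hi]
```

In the real pipeline this per-window computation runs inside Spark Streaming, and the flagged points are what feed the outlier dashboard.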
  41. 41. Current Architecture
  42. 42. Current Architecture & Progress So Far: the Java application consumes the TREP event queue and publishes to a real-time Kafka topic; the Spark application consumes that topic and publishes to a windowed Kafka topic; the two topics feed the real-time and windowed Druid datasources, which back the tick-data and outlier Superset dashboards
  43. 43. TREP Data Flow
  44. 44. Windowed Spark Streaming
  45. 45. Tick-Data Dashboard
  46. 46. Outlier Dashboard
  47. 47. Work in Progress
  48. 48. Implementing…  Moving average calculation (20-day window)  Volatility Indicator  Average True Range Indicator (moving average) o [ max(t) - min(t) ] o [ max(t) - close(t-1) ] o [ min(t) - close(t-1) ]
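The Average True Range listed above takes, for each day t, the largest of the three spans and then averages it over a window. A minimal sketch, using absolute values as in the standard true-range definition and a simple moving average (an assumption; the slide does not specify Wilder smoothing):

```python
# Sketch of true range and ATR. bars: list of (high, low, close) per day.

def true_range(high, low, prev_close):
    """Largest of: today's range, and each extreme vs. yesterday's close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))

def atr(bars, period=20):
    """Simple moving average of the true range over the last `period` bars."""
    trs = [true_range(h, l, bars[i - 1][2])
           for i, (h, l, _c) in enumerate(bars) if i > 0]
    window = trs[-period:]
    return sum(window) / len(window)
```

The 20-day moving-average and volatility indicators from the same slide reduce to the same pattern: a per-bar statistic followed by a trailing-window average.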
  49. 49. Future Plans
  50. 50. To-Do List  Matching data subscription  Bringing historical tick data into real-time analysis  Possible use of machine learning for intraday indicators
  51. 51. Thank you! Q & A
