
Complex Analytics with NoSQL Data Store in Real Time

CTO & Founder at Cloudify (Cloudify.co)
21 Aug 2014


Complex Analytics with NoSQL Data Store in Real Time

  1. Complex Analytics with NoSQL Data Store in Real Time Nested Queries, Projection, Transactions and more Nati Shalom @natishalom slideshare.net/giganati
  2. What are we here to discuss? • Making Sense of the Exploding Data World • What That World Could Look Like if Disk Is No Longer the Bottleneck • Live Demo
  3. Making Sense of The Exploding Data World
  4. Capacity and Performance Drive New Data Management Technologies [Chart: data volume (GB, TB, PB) against data velocity (Yr, Mo, Day, Hr, Min, Sec, ms, μs). Regions: Data Mining, Machine Learning, Business Intelligence, Data Warehouse, High-Throughput OLTP, Operational Intelligence, Exploratory Analytics, OLTP, Streaming]
  5. Let’s Look at Tradeoffs of Some Selected Solutions
  6. SQL Queries • Query: SQL • Semantics: CRUD, Aggregation, Projection, Partial update • Performance: 100s/sec • Consistency: Transactional • Scaling: Mostly Scale-Up • Availability: Disk based
  7. NoSQL • Query: Proprietary but rich • Semantics: CRUD, Limited Aggregation (Map/Reduce), No Projection, No Partial update • Performance: 1000s/sec • Consistency: Eventual • Scaling: Mostly Scale-Out • Availability: Based on replication
  8. IMDG • Query: Proprietary but rich • Semantics: CRUD, Aggregation API + Map/Reduce, Projection (GigaSpaces), Partial update (GigaSpaces) • Performance: 100k/sec • Consistency: Transactional • Scaling: Mostly Scale-Out • Availability: Replication
  9. Key/Value • Query: Key, Value • Semantics: Mostly Read, No Aggregation, No Projection, No Partial update • Performance: Millions/sec • Consistency: Atomic • Scaling: Mostly Scale-Out • Availability: Limited (varies quite substantially between implementations)
  10. Stream Processing (Storm) • Semantics: Event-driven data processing • Used for continuous updates • No need for a costly “SELECT FOR UPDATE” • Performance: tens of millions of updates/sec [Diagram labels: Spouts, Bolt]
  11. Common Assumption: Disk Is the Bottleneck [Chart, 2000 to 2020, log-scale performance axis; labels: 100X, 10,000X; HDD Latency (Seek & Rotate) = Little Improvement. Source: GigaOM Research]
  12. Capacity and Performance Drive New Data Management Technologies (Source: IDC, 2013) [Chart labels: Big Data (Hadoop), NoSQL, In Memory, Stream Processing, RDBMS]
  13. There’s No One Size Fits All
  14. A Typical App Looks Like This.. Front End Analytics RT STORM Batch The Data Flow Complexity
  15. What if Disk Was no Longer the Bottleneck? FLASH Closes the CPU to Storage Gap
  16. Our Application Could Look Like This.. [Diagram: Front End; High Speed Data Store (Using Flash/NVM) exposing Key/Value, SQL, Document, Graph, Map/Reduce, Transactional; StreamBase] Disk Becomes the New Tape • Common Data Store Serving Multiple Semantics/APIs
  17. We're not there yet .. But..
  18. We can use High Speed Data Bus for Integrating All of our Data Sources Front End Analytics RT STORM Batch High Speed Data Bus (Built-In Caching) RT Transactional Data Access Direct Access RT Streaming Hadoop Synch MySQL Synch Mongo Synch
  19. High Speed Data Bus (Zoom In)
  20. Designed for Transactional and Analytics Scenarios.. Homeland Security Real Time Search Social eCommerce User Tracking & Engagement Financial Services
  21. Many APIs, Same Data: Key/Value, SQL, Document, Graph, Map/Reduce, Transactional
  22. Let’s take a closer look..
  23. Nested Queries & Projections
  24. Aggregations.
  25. Fast Update … Remains with strong consistency!
  26. Transactions support
  27. The Performance of RAM at a Cost/Capacity Closer to Disk • Provides 2x to 3.6x Better TPS/$ • 1:50 More Capacity • 242k Read/Sec [Chart 1: ZetaScale-GigaSpaces (FDF-GigaSpaces) on SSDs vs. Stock GigaSpaces in DRAM, for "No Read / 100% Write" and "100% Read / No Write" workloads; bar values shown: 62, 121, 17, 56. Chart 2: Capacity, XAP (20) vs. XAP Extend (1000), ZetaScale™ XAP MemoryXtend] Test setup: 1KB object size and uniform distribution; 2 sockets, 2.8GHz CPU, 24 cores in total; CentOS 5.8; 2 FusionIO SLC PCIe cards in RAID; YCSB measurements performed by SanDisk. Assumptions: 1TB Flash = $2K; 1TB RAM = $20K
  28. Data is Moving to Cloud Source: Managing Storage: Trends, Challenges, and Options (2013-2014). (EMC, 2013)
  29. Orchestration Needs to Be Integrated into the Database Solution to Make It Cloud Ready
  30. Demo References (click on the relevant box to get the demo): Many APIs, Same Data • Data Bus (Integration with Storm) • Built-In Orchestration
  31. Summary
  32. Nati Shalom • Check out the slides at http://www.slideshare.net/giganati
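
Slides 6 to 9 compare stores by whether they support projection, partial update, and aggregation, and slides 23 to 25 name nested queries, projections, aggregations, and fast updates as features. The slides do not include the code behind those titles, so the following is only a minimal sketch of what the terms mean, assuming a plain Python dictionary as a stand-in store. It is not the GigaSpaces API or any other product's API; `store`, `get_path`, `query`, `partial_update`, and `sum_by` are hypothetical names invented for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory "store": key -> nested document (illustration only).
store = {
    "u1": {"name": "Ann", "address": {"city": "Paris", "zip": "75001"}, "spend": 120.0},
    "u2": {"name": "Bob", "address": {"city": "Berlin", "zip": "10115"}, "spend": 80.0},
    "u3": {"name": "Cy", "address": {"city": "Paris", "zip": "75002"}, "spend": 50.0},
}


def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into a nested document."""
    for part in path.split("."):
        doc = doc[part]
    return doc


def query(store, path, value, fields):
    """Nested query plus projection: match on a nested path, return only `fields`."""
    return [
        {field: get_path(doc, field) for field in fields}
        for doc in store.values()
        if get_path(doc, path) == value
    ]


def partial_update(store, key, path, value):
    """Partial update: change one nested field without rewriting the whole document."""
    *parents, leaf = path.split(".")
    doc = store[key]
    for part in parents:
        doc = doc[part]
    doc[leaf] = value


def sum_by(store, group_path, value_path):
    """Aggregation: sum `value_path` over documents grouped by `group_path`."""
    totals = defaultdict(float)
    for doc in store.values():
        totals[get_path(doc, group_path)] += get_path(doc, value_path)
    return dict(totals)


if __name__ == "__main__":
    print(query(store, "address.city", "Paris", ["name", "spend"]))
    partial_update(store, "u2", "address.city", "Paris")
    print(sum_by(store, "address.city", "spend"))
```

In this sketch the projection returns only the requested fields rather than whole documents, and the partial update touches a single nested field, which is the capability the "No Projection" and "No Partial update" entries on the NoSQL and Key/Value slides say is missing.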

Editor's notes

  1. Some of the emerging NewSQL and NoSQL disk-based databases may be able to handle the more demanding data volume and variety, but disk-based databases have always been I/O bound, so keeping up with the new velocity demands of data is much harder. Disks have always limited database velocity, or throughput. The closer to real time that transaction throughput or analytics must be, the harder it is for disk-based approaches to keep up.
  2. Storm constructs a processing graph that feeds data from an input source through processing nodes. The processing graph is called a "topology". The input data sources are called "spouts", and the processing nodes are called "bolts". The data model consists of tuples. Tuples flow from the spouts to the bolts, which execute user code.
  3. http://www.zdnet.com/storage-in-2014-an-overview-7000024712/
  4. http://blogs.technet.com/b/dataplatforminsider/archive/2013/05/01/leveraging-flash-across-the-microsoft-sql-server-stack.aspx
  5. http://www.zdnet.com/storage-in-2014-an-overview-7000024712/
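
Note 2 above describes the Storm data flow: spouts emit tuples, and bolts run user code on them. The following is a minimal sketch of that flow, assuming plain Python generators in place of Storm itself. It does not use the Storm API and has none of Storm's distribution or fault tolerance; `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical names chosen for illustration.

```python
from collections import Counter


def sentence_spout():
    """Spout: an input source that emits tuples (here, one-field tuples)."""
    for line in ["disk is slow", "flash is fast", "ram is fast"]:
        yield (line,)


def split_bolt(tuples):
    """Bolt: user code that turns each sentence tuple into word tuples."""
    for (sentence,) in tuples:
        for word in sentence.split():
            yield (word,)


def count_bolt(tuples):
    """Bolt: keeps a running count per word, updated as each tuple arrives."""
    counts = Counter()
    for (word,) in tuples:
        counts[word] += 1
    return counts


def run_topology():
    """Topology: wire the spout into the two bolts, in order."""
    return count_bolt(split_bolt(sentence_spout()))


if __name__ == "__main__":
    print(run_topology().most_common(3))
```

The counting bolt updates its state in place as tuples arrive, which is the sense in which slide 10 says stream processing avoids a costly “SELECT FOR UPDATE” round trip to a database.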