Securely explore your data 
PERFORMANCE MODELS 
FOR APACHE ACCUMULO: 
THE HEAVY TAIL OF A SHARED-NOTHING 
ARCHITECTURE 
Chris McCubbin 
Director of Data Science 
Sqrrl Data, Inc.
I’M NOT ADAM FUCHS 
• But perhaps I’m still an interesting guy 
• MS in CS from UMBC in Network Security and 
Quantum Computing 
• 8 years at JHU/APL working on UxV Swarms 
• 4 years at JHU/APL and TexelTek creating Big 
Data Applications for the NSA 
• Co-founder and Director of Data Science at Sqrrl 
©2014 Sqrrl Data, Inc 2
SO, YOUR DISTRIBUTED 
APPLICATION IS SLOW 
• Today’s distributed applications run on tens or 
hundreds of library components 
• Many versions, so advice from the internet may be ineffective or,
worse, flat-out wrong
• Hundreds of settings 
• Some, shall we say, could be better documented 
• Shared-nothing architectures are usually “shared-little” 
architectures with tricky interactions 
• Profiling is hard and time-consuming 
• What do we do? 
TODAY’S TALK 
1. Quick intro to performance optimization 
2. Tricks and techniques for modeling distributed applications
to target performance improvements
3. A deep dive into improving bulk load application 
performance 
The Apache Accumulo™ sorted, distributed key/value store is a secure, robust, 
scalable, high performance data storage and retrieval system. 
• Many applications in real-time storage and analysis of “big data”: 
• Spatio-temporal indexing in non-relational distributed databases - Fox et al 
2013 IEEE International Congress on Big Data 
• Big Data Dimensional Analysis - Gadepally et al IEEE HPEC 2014 
• Leading its peers in performance and scalability: 
• Achieving 100,000,000 database inserts per second using Accumulo and 
D4M - Kepner et al IEEE HPEC 2014 
• An NSA Big Graph experiment (Technical Report NSA-RD-2013-056002v1) 
• Benchmarking Apache Accumulo BigData Distributed Table Store Using Its 
Continuous Test Suite - Sen et al 2013 IEEE International Congress on Big 
Data 
For more papers and presentations, see http://accumulo.apache.org/papers.html 
SCALING UP: DIVIDE & CONQUER 
• Collections of KV pairs form Tables 
• Tables are partitioned into Tablets 
• Metadata tablets hold info about 
other tablets, forming a 3-level 
hierarchy 
• A Tablet is a unit of work for a 
Tablet Server 
[Figure: three tables (Adam's Table, Encyclopedia, Foo) split into data tablets by row range. Adam's Table: -∞ : thing and thing : ∞. Encyclopedia: -∞ : Ocelot, Ocelot : Yak, and Yak : ∞. Foo: a single tablet, -∞ to ∞. The root tablet (-∞ to ∞), found at a well-known location in ZooKeeper, points to Metadata Tablet 1 (-∞ to "Encyclopedia:Ocelot") and Metadata Tablet 2 ("Encyclopedia:Ocelot" to ∞), which in turn point to the data tablets.]
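The three-level lookup in the diagram can be modeled with sorted maps. This is a minimal Java sketch under simplifying assumptions: each level maps a tablet's inclusive end row to the tablet's name, metadata rows use the slide's "Table:endRow" form (real Accumulo encodes metadata rows differently), and the class, method, and tablet names are hypothetical. A client finds the root tablet through ZooKeeper, asks it which metadata tablet covers the key, then asks that metadata tablet which data tablet covers it.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model of a three-level tablet lookup:
// root tablet -> metadata tablet -> data tablet.
final class TabletLookup {
    // Hypothetical stand-in for +infinity: '\uffff' sorts after the ASCII rows used here.
    static final String INF = "\uffff";

    // Each level maps a tablet's inclusive end row to the tablet's name, so the
    // tablet holding a key is the first entry whose end row is >= the key.
    static String covering(TreeMap<String, String> level, String key) {
        Map.Entry<String, String> e = level.ceilingEntry(key);
        if (e == null) throw new IllegalStateException("no tablet covers " + key);
        return e.getValue();
    }

    // Root tablet: two metadata tablets split at "Encyclopedia:Ocelot".
    static final TreeMap<String, String> ROOT = new TreeMap<>(Map.of(
            "Encyclopedia:Ocelot", "metadata-1",
            INF, "metadata-2"));

    // Metadata tablets: one entry per data tablet, keyed "Table:endRow".
    static final Map<String, TreeMap<String, String>> METADATA = Map.of(
            "metadata-1", new TreeMap<>(Map.of(
                    "Adam's Table:thing", "Adam's Table (-inf, thing]",
                    "Adam's Table:" + INF, "Adam's Table (thing, +inf)",
                    "Encyclopedia:Ocelot", "Encyclopedia (-inf, Ocelot]")),
            "metadata-2", new TreeMap<>(Map.of(
                    "Encyclopedia:Yak", "Encyclopedia (Ocelot, Yak]",
                    "Encyclopedia:" + INF, "Encyclopedia (Yak, +inf)",
                    "Foo:" + INF, "Foo (-inf, +inf)")));

    // Two hops: root tablet -> metadata tablet -> data tablet.
    static String locate(String table, String row) {
        String key = table + ":" + row;
        String metadataTablet = covering(ROOT, key);
        return covering(METADATA.get(metadataTablet), key);
    }

    public static void main(String[] args) {
        System.out.println(locate("Encyclopedia", "Penguin")); // Encyclopedia (Ocelot, Yak]
        System.out.println(locate("Adam's Table", "zebra"));   // Adam's Table (thing, +inf)
    }
}
```

Because every level is a sorted map keyed by end row, a lookup is a "ceiling" search at each hop, which is why the hierarchy stays shallow even for very large tables.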
PERFORMANCE ANALYSIS CYCLE 
[Figure: an iterative cycle. Start: Create Model, then a loop through Simulate & Experiment, Analyze, Refine Model, and Modify Code. Outputs: better code + models.]
MAKING A MODEL 
• Identify points where metrics can be collected with low impact
• Add some if needed
• Create parallel state machine models with
components driven by these metrics
• Estimate running times and bottlenecks from
a priori information and/or measured
statistics
• Focus testing on validating the initial
model and the (estimated) pain points
• Apply Amdahl’s Law 
• Rinse, repeat 
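Amdahl's law bounds what optimizing one component can buy: if a fraction p of the run time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The Java sketch below applies it to the hypothetical bulk ingest phase times used later in this deck (46 s map, 168 s reduce, 17 s register), assuming reduce drops to the roughly 40 seconds the model suggests it should take. The class name is illustrative.

```java
// Amdahl's law: if a fraction p of the run time is sped up by a factor s,
// the overall speedup is 1 / ((1 - p) + p / s).
final class Amdahl {
    static double speedup(double p, double s) {
        if (p < 0 || p > 1 || s <= 0) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and s > 0");
        }
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        // Hypothetical phase times (seconds) from the bulk ingest model.
        double map = 46, reduce = 168, register = 17;
        double total = map + reduce + register; // 231 s
        double p = reduce / total;              // share of time spent in reduce
        double s = reduce / 40.0;               // reduce shrinks from 168 s to 40 s
        System.out.printf("overall speedup: %.2f%n", speedup(p, s)); // 2.24
        // Even an infinitely fast reduce cannot beat 1 / (1 - p).
        System.out.printf("limit as s grows: %.2f%n", 1.0 / (1.0 - p)); // 3.67
    }
}
```

The second number is the useful one when deciding where to spend effort: once reduce is fixed, map becomes the dominant term and the next round of modeling should target it.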
BULK INGEST OVERVIEW 
• Accumulo supports two mechanisms to bring 
data in: streaming ingest and bulk ingest. 
• Bulk Ingest 
• Goal: maximize throughput, with no constraint on
latency.
• Create a set of Accumulo RFiles, then register those
files with Accumulo.
• RFiles are groups of sorted key-value pairs with 
some indexing information 
• MapReduce has a built-in key sorting phase: a good 
fit to produce RFiles 
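Bulk import tends to be most efficient when each file falls within a single tablet, so a bulk ingest job commonly partitions map output by the table's split points; each reducer then writes sorted data for one tablet range. The sketch below is a standalone Java illustration of that idea, similar in spirit to the RangePartitioner in Accumulo's MapReduce library but not its code; the class name and split points are hypothetical.

```java
import java.util.Arrays;

// Routes each row to the tablet whose range contains it. Split points are the
// tablets' inclusive end rows; the last tablet runs to +infinity.
final class TabletPartitioner {
    private final String[] splits; // sorted ascending

    TabletPartitioner(String... splits) {
        this.splits = splits.clone();
        Arrays.sort(this.splits);
    }

    // Index of the first split >= row, or splits.length for the final tablet.
    int partition(String row) {
        int i = Arrays.binarySearch(splits, row);
        return i >= 0 ? i : -i - 1; // binarySearch returns -(insertion point) - 1 on a miss
    }

    public static void main(String[] args) {
        TabletPartitioner p = new TabletPartitioner("Ocelot", "Yak");
        for (String row : new String[] {"Aardvark", "Ocelot", "Penguin", "Zebra"}) {
            System.out.println(row + " -> tablet " + p.partition(row));
        }
    }
}
```

With one reducer per partition, MapReduce's built-in sort delivers each reducer's rows in order, which is exactly what writing an RFile requires.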
BULK INGEST MODEL 
[Figure: timeline of three sequential phases: Map, Reduce, Register.]
BULK INGEST MODEL 
Hypothetical resource usage by phase:
• Map: 100% CPU, 20% disk, 0% network; 46 seconds
• Reduce: 40% CPU, 100% disk, 20% network; 168 seconds
• Register: 10% CPU, 20% disk, 40% network; 17 seconds
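The per-phase numbers above can be treated as a tiny resource model. This Java sketch (Java 16+ for records; class and method names are hypothetical) computes the time-weighted average utilization of each resource over the whole job and flags which resource, if any, is saturated in each phase, using only the slide's hypothetical figures.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical resource-usage model for the three bulk ingest phases.
final class ResourceModel {
    // Utilizations are fractions of capacity; duration is in seconds.
    record Phase(String name, double cpu, double disk, double net, double seconds) {}

    static final List<Phase> PHASES = List.of(
            new Phase("map", 1.00, 0.20, 0.00, 46),
            new Phase("reduce", 0.40, 1.00, 0.20, 168),
            new Phase("register", 0.10, 0.20, 0.40, 17));

    // Time-weighted average utilization of one resource over the whole job.
    static double average(ToDoubleFunction<Phase> use) {
        double busy = 0, total = 0;
        for (Phase ph : PHASES) {
            busy += use.applyAsDouble(ph) * ph.seconds();
            total += ph.seconds();
        }
        return busy / total;
    }

    // A phase is resource-bound if some resource is saturated.
    static String bottleneck(Phase ph) {
        if (ph.cpu() >= 1.0) return "cpu";
        if (ph.disk() >= 1.0) return "disk";
        if (ph.net() >= 1.0) return "network";
        return "none (suspect synchronization or serialization)";
    }

    public static void main(String[] args) {
        for (Phase ph : PHASES) System.out.println(ph.name() + ": " + bottleneck(ph));
        System.out.printf("avg cpu %.2f, disk %.2f, net %.2f%n",
                average(Phase::cpu), average(Phase::disk), average(Phase::net));
    }
}
```

Over the whole job, CPU averages about 50% and disk about 78%, while registration saturates nothing. Those are the same observations that motivate the questions on the next slide.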
INSIGHT 
• Spare disk here, spare CPU there – can we even out resource consumption? 
• Why did reduce take 168 seconds? It should be more like 40 seconds. 
• No clear bottleneck during registration – is there a synchronization or 
serialization problem? 
LOOKING DEEPER: 
REFINED BULK INGEST MODEL 
[Figure: refined model timeline. Map thread: Setup, Map, Sort, Spill, Merge, Serve. Reduce thread: Shuffle, Sort, Reduce, Output. Legend: parallel latch.]
BULK INGEST MODEL PREDICTIONS 
• We can constrain parts of the model by physical 
throughput limitations 
• Disk -> memory (~100 MB/s, the average sequential read rate of a 7200 rpm disk)
• Input reader
• Memory -> disk (~100 MB/s)
• Spill, OutputWriter
• Disk -> disk (~50 MB/s)
• Merge
• Network (Gigabit Ethernet = 125 MB/s)
• Shuffle
• And/or algorithmic limitations 
• Sort, (Our) Map, (Our) Reduce, SerDe 
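The throughput limits above give lower bounds on how long each data-moving step can take: bytes moved divided by the bandwidth of the limiting resource. This Java sketch uses the slide's bandwidth figures. The 400 MB volume per step and all names are hypothetical. It reports two bounds: the sum of step times if the steps run one after another, and the slowest step if they overlap perfectly (an optimistic assumption).

```java
// Lower-bound estimates of step durations from physical throughput limits.
final class ThroughputBound {
    // Bandwidths in MB/s, as assumed on the slide.
    static final double DISK_READ = 100;
    static final double DISK_WRITE = 100;
    static final double DISK_TO_DISK = 50;
    static final double GIGABIT_NET = 125;

    // Minimum time to move `megabytes` through a resource with the given bandwidth.
    static double seconds(double megabytes, double mbPerSec) {
        return megabytes / mbPerSec;
    }

    // Bound when steps run back to back: their times add up.
    static double serialBound(double... stepSeconds) {
        double sum = 0;
        for (double s : stepSeconds) sum += s;
        return sum;
    }

    // Bound when steps overlap perfectly: the slowest step dominates.
    static double overlappedBound(double... stepSeconds) {
        double worst = 0;
        for (double s : stepSeconds) worst = Math.max(worst, s);
        return worst;
    }

    public static void main(String[] args) {
        double mb = 400; // hypothetical volume per step
        double read = seconds(mb, DISK_READ);      // input reader
        double spill = seconds(mb, DISK_WRITE);    // spill
        double merge = seconds(mb, DISK_TO_DISK);  // merge
        double shuffle = seconds(mb, GIGABIT_NET); // shuffle
        System.out.printf("serial bound %.1f s, overlapped bound %.1f s%n",
                serialBound(read, spill, merge, shuffle),
                overlappedBound(read, spill, merge, shuffle));
    }
}
```

When a measured phase runs far above both bounds, as the 168-second reduce did against an expected 40 seconds, the gap points at algorithmic costs such as sort and SerDe rather than hardware.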
PERFORMANCE GOAL MODEL 
Performance goals obtained through: 
• Simulation of individual components 
• Prediction of available resources at runtime 
INSTRUMENTATION 
Job configuration (description: baseline):
• application version 1.3.3, application sha 8d17baf8
• yarn.nodemanager.resource.memory-mb 43008
• yarn.scheduler.minimum-allocation-mb 2048
• yarn.scheduler.maximum-allocation-mb 43008
• yarn.app.mapreduce.am.resource.mb 2048
• yarn.app.mapreduce.am.command-opts -Xmx1536m
• mapreduce.map.memory.mb 2048
• mapreduce.map.java.opts -Xmx1638m
• mapreduce.reduce.memory.mb 2048
• mapreduce.reduce.java.opts -Xmx1638m
• mapreduce.task.io.sort.mb 100
• mapreduce.map.sort.spill.percent 0.8
• mapreduce.task.io.sort.factor 10
• mapreduce.reduce.shuffle.parallelcopies 5
• mapreduce.job.reduce.slowstart.completedmaps 1
• mapreduce.map.output.compress FALSE
• mapred.map.output.compression.codec n/a
System data:
• node num 1, map num containers 20, red num containers 20
• cores physical 12, cores logical 24
• disk num 8, disk bandwidth 100 (MB/s)
• replication 1, monitoring TRUE
Input/output (bytes unless noted):
• input type arcsight, input block size 32 (MB), input block count 20, input total 672054649
• output map 9313303723
• output map:combine input records 243419324, output map:combine records out 209318830
• output map:combine 7301374577
• output map:spill 7325671992
• output final 573802787
Time (seconds):
• map:setup avg 8, map:map avg 12, map:sort avg 12, map:spill avg 12, map:spill count 7, map:merge avg 46, map total 290
• red:shuffle avg 6, red:merge avg 38, red:reduce avg 68, red:total avg 112, red:reducer count 20
• job:total 396
Ratios:
• input explosion factor 13.877904
• compression intermediate 1.003327786
• load combiner output 0.783972562
• total ratio 0.786581455
Constants:
• avg schema entry size (bytes) 59
• effective MB/sec 1.618488025
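Several derived entries in the instrumentation table can be recomputed from the raw counters, which is a quick sanity check on the instrumentation itself. In this Java sketch the formulas are our reading of the table, not documented on the slide, though they reproduce the reported values. The explosion factor divides map output by the nominal input (32 MiB x 20 blocks). The ratios compare byte counts at successive map-side stages. Effective MB/sec divides the input total by the 396-second job time, in MiB. Class and method names are hypothetical.

```java
// Recomputes the RATIOS and CONSTANTS entries from the raw byte counters.
final class IngestRatios {
    static final double MIB = 1024.0 * 1024.0;

    // Map output bytes relative to the nominal input size (block size x block count).
    static double explosionFactor(long mapOutputBytes, long blockSizeMiB, long blockCount) {
        return mapOutputBytes / (blockSizeMiB * blockCount * MIB);
    }

    // Job throughput measured against the raw input, in MiB per second.
    static double effectiveMiBPerSec(long inputBytes, double jobSeconds) {
        return inputBytes / jobSeconds / MIB;
    }

    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        long inputTotal = 672_054_649L, mapOut = 9_313_303_723L;
        long combineOut = 7_301_374_577L, spillOut = 7_325_671_992L;
        System.out.printf("input explosion factor   %.6f%n", explosionFactor(mapOut, 32, 20));
        System.out.printf("load combiner output     %.9f%n", ratio(combineOut, mapOut));
        System.out.printf("compression intermediate %.9f%n", ratio(spillOut, combineOut));
        System.out.printf("total ratio              %.9f%n", ratio(spillOut, mapOut));
        System.out.printf("effective MB/sec         %.9f%n", effectiveMiBPerSec(inputTotal, 396));
    }
}
```

An explosion factor near 14 means every input byte becomes about 14 bytes of intermediate data before the combiner, which is why the later slides focus on intermediate inflation.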
PERFORMANCE MEASUREMENT 
Baseline (naive implementation) 
[Figure: measured baseline timeline. Map thread: Setup, Map, Sort, Spill, Merge, Serve. Reduce thread: Shuffle, Sort, Reduce, Output.]
PATH TO IMPROVEMENT 
1. Profiling revealed that much of the time was spent serializing and
deserializing Key objects
2. With proper configuration, MapReduce supports
comparing keys in serialized form
3. Rewriting Key's serialization led to an order-preserving
encoding that is easy to compare in serialized form
4. Configure MapReduce to use native code to compare
Keys
5. Tweak map input size and spill memory to minimize
the number of spills
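The core of steps 2 and 3 is an encoding whose raw bytes sort in the same order as the values they represent, so the sort can compare serialized keys byte by byte without deserializing them. The Java sketch below shows the idea for a single signed long (big-endian with the sign bit flipped). It is an illustration, not the encoding used in the talk. A composite key such as Accumulo's, which orders by row, column family, qualifier, and visibility and sorts timestamps newest first, would also need order-preserving field delimiters and an inverted timestamp. In Hadoop, a comparator of this kind is typically supplied as a RawComparator, for example via Job.setSortComparatorClass.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Order-preserving encoding: encoded byte arrays compare (unsigned,
// lexicographically) in the same order as the original values.
final class OrderPreserving {
    // Big-endian with the sign bit flipped, so negative longs sort before positive ones.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two serialized keys; no deserialization.
    static int compareSerialized(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        long[] values = {-5, 0, 42, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long x : values) {
            for (long y : values) {
                int expected = Integer.signum(Long.compare(x, y));
                int actual = Integer.signum(compareSerialized(encodeLong(x), encodeLong(y)));
                if (expected != actual) throw new AssertionError(x + " vs " + y);
            }
        }
        System.out.println("byte order matches numeric order");
    }
}
```

Once byte order equals key order, the comparison is a plain memory compare, which is also what makes it practical to hand off to native code as in step 4.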
PERFORMANCE MEASUREMENT 
Optimized sorting 
• Improvements: 
• Time for map-side merge went down 
• Sort performance drastically improved in both 
map and reduce phases 
• 300% faster 
PERFORMANCE MEASUREMENT 
Optimized sorting 
[Figure: measured timeline after optimized sorting. Map thread: Setup, Map, Sort, Spill, Merge, Serve. Reduce thread: Shuffle, Sort, Reduce, Output.]
Insights: 
• Map is slower than expected 
• Output is disk-bound; maybe we can move more processing to reduce
• “Reverse Amdahl’s law”
• The intermediate data inflation ratio (map output/input) is very high
PATH TO IMPROVEMENT 
1. Profiling revealed much time spent copying data 
2. Evaluation of data passed from map to reduce 
revealed inefficiencies: 
• A constant timestamp cost 8 bytes per key
• Repeated column names could be encoded or
compressed
• Some Key/Value pairs didn't need to be created
until reduce
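The savings from these changes are easy to estimate per record. This Java sketch compares a naive intermediate record, which carries the full column name and an 8-byte timestamp, with a compact one that replaces the column name with a one-byte dictionary index and drops the constant timestamp so the reducer can re-attach it. The column dictionary, field layout, and names are hypothetical, and DataOutputStream.writeUTF is used only as a simple length-prefixed string encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Compares naive and compact encodings of one intermediate map-output record.
final class CompactIntermediate {
    // Hypothetical column dictionary shared by mappers and reducers.
    static final List<String> COLUMNS = List.of("src_ip", "dst_ip", "proto", "bytes");

    interface FieldWriter {
        void write(DataOutputStream out) throws IOException;
    }

    // Serializes fields; writeUTF adds a 2-byte length prefix to each string.
    static byte[] serialize(FieldWriter w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Naive record: full column name plus a timestamp that is the same for every record.
    static byte[] naive(String row, String column, long timestamp, String value) {
        return serialize(out -> {
            out.writeUTF(row);
            out.writeUTF(column);
            out.writeLong(timestamp);
            out.writeUTF(value);
        });
    }

    // Compact record: one-byte column index, no timestamp (re-attached in reduce).
    static byte[] compact(String row, String column, String value) {
        int id = COLUMNS.indexOf(column);
        if (id < 0) throw new IllegalArgumentException("unknown column: " + column);
        return serialize(out -> {
            out.writeUTF(row);
            out.writeByte(id);
            out.writeUTF(value);
        });
    }

    public static void main(String[] args) {
        int before = naive("row1", "src_ip", 1_400_000_000_000L, "10.0.0.1").length;
        int after = compact("row1", "src_ip", "10.0.0.1").length;
        System.out.println(before + " bytes -> " + after + " bytes"); // 32 bytes -> 17 bytes
    }
}
```

For the sample record this shrinks 32 bytes to 17. Across hundreds of millions of intermediate records, savings of this kind reduce spill, merge, and shuffle volume alike.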
PERFORMANCE MEASUREMENT 
Optimized map code 
• Improvement: 
• Big speedup in map function 
• Twice as fast 
• Reduced intermediate inflation sped up all 
steps between map and reduce 
DO TRY THIS AT HOME 
Hints for Accumulo Application Optimization 
With these steps, we achieved a 6x speedup:
• Perform comparisons on serialized objects
• With MapReduce, calculate how many merge
steps are needed
• Avoid premature data inflation 
• Leverage compression to shift bottlenecks 
• Always consider how fast your code should run 
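The hint about merge steps can be turned into a quick calculation. This Java sketch estimates map-side spills as map output divided by the usable sort buffer (mapreduce.task.io.sort.mb x mapreduce.map.sort.spill.percent). It also counts merge passes when at most mapreduce.task.io.sort.factor segments are merged at once. It is a rough model. It ignores the sort buffer's per-record accounting space and Hadoop's exact merge scheduling, and it assumes one map task per input block when applied to the baseline table's numbers. Class and method names are hypothetical.

```java
// Rough planner for map-side spills and merge passes.
final class MergePlanner {
    // Map output bytes divided by the usable sort buffer, rounded up.
    // Ignores per-record accounting overhead, so real spill counts can be higher.
    static long estimateSpills(long mapOutputBytes, int sortMb, double spillPercent) {
        long buffer = (long) (sortMb * 1024L * 1024L * spillPercent);
        return Math.max(1, (mapOutputBytes + buffer - 1) / buffer);
    }

    // Merge passes when at most `factor` segments are merged at once;
    // each pass divides the segment count by the factor, rounding up.
    static int mergePasses(long segments, int factor) {
        if (factor < 2) throw new IllegalArgumentException("factor must be >= 2");
        int passes = 0;
        while (segments > 1) {
            segments = (segments + factor - 1) / factor;
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        long perTask = 9_313_303_723L / 20; // baseline map output / 20 map tasks (assumed)
        long spills = estimateSpills(perTask, 100, 0.8);
        System.out.println("estimated spills per map task: " + spills);          // 6
        System.out.println("merge passes at factor 10: " + mergePasses(spills, 10)); // 1
        System.out.println("merge passes for 7 spills: " + mergePasses(7, 10));      // 1
    }
}
```

With the baseline settings the estimate is 6 spills per map task (the table reports 7) and a single merge pass either way. Once the spill count exceeds io.sort.factor, raising the factor or the sort buffer removes a full extra pass over the spilled data.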
SOME CURRENT ACCUMULO 
PERFORMANCE PROJECTS 
• Optimize metadata operations 
• Batch to improve throughput (ACCUMULO-2175, 
ACCUMULO-2889) 
• Remove from critical path where possible 
• Optimize write-ahead log performance 
• Maximize throughput 
• Reduce flushes 
• Parallelize WALs (ACCUMULO-1083) 
• Avoid downtime by pre-allocating 
Securely explore your data 
SQRRL IS HIRING! 
QUESTIONS? 
Chris McCubbin 
Director of Data Science 
Sqrrl Data, Inc.

Building a Next-Generation Security Operations Center (SOC)
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)
 
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar Users
 
Threat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivityThreat Hunting for Command and Control Activity
Threat Hunting for Command and Control Activity
 
Modernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingModernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led Training
 
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
 
The Art and Science of Alert Triage
The Art and Science of Alert TriageThe Art and Science of Alert Triage
The Art and Science of Alert Triage
 
Reducing Mean Time to Know
Reducing Mean Time to KnowReducing Mean Time to Know
Reducing Mean Time to Know
 
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use Case
 
The Linked Data Advantage
The Linked Data AdvantageThe Linked Data Advantage
The Linked Data Advantage
 
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, Analyze
 
Sqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber Hunting
 
Benchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreBenchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value Store
 
Scalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelScalable Graph Clustering with Pregel
Scalable Graph Clustering with Pregel
 
April 2015 Webinar: Cyber Hunting with Sqrrl
April 2015 Webinar: Cyber Hunting with SqrrlApril 2015 Webinar: Cyber Hunting with Sqrrl
April 2015 Webinar: Cyber Hunting with Sqrrl
 

Dernier

WhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
WhatsApp Chat: 📞 8617697112 Independent Call Girls in DarjeelingWhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
WhatsApp Chat: 📞 8617697112 Independent Call Girls in DarjeelingNitya salvi
 
sample sample sample sample sample sample
sample sample sample sample sample samplesample sample sample sample sample sample
sample sample sample sample sample sampleCasey Keith
 
Night 7k to 12k Daman Call Girls 👉👉 8617697112⭐⭐ 100% Genuine Escort Service ...
Night 7k to 12k Daman Call Girls 👉👉 8617697112⭐⭐ 100% Genuine Escort Service ...Night 7k to 12k Daman Call Girls 👉👉 8617697112⭐⭐ 100% Genuine Escort Service ...
Night 7k to 12k Daman Call Girls 👉👉 8617697112⭐⭐ 100% Genuine Escort Service ...Nitya salvi
 
Ooty call girls 📞 8617697112 At Low Cost Cash Payment Booking
Ooty call girls 📞 8617697112 At Low Cost Cash Payment BookingOoty call girls 📞 8617697112 At Low Cost Cash Payment Booking
Ooty call girls 📞 8617697112 At Low Cost Cash Payment BookingNitya salvi
 
Darjeeling Call Girls 8250077686 Service Offer VIP Hot Model
Darjeeling Call Girls 8250077686 Service Offer VIP Hot ModelDarjeeling Call Girls 8250077686 Service Offer VIP Hot Model
Darjeeling Call Girls 8250077686 Service Offer VIP Hot ModelDeiva Sain Call Girl
 
❤Personal Contact Number Mcleodganj Call Girls 8617697112💦✅.
❤Personal Contact Number Mcleodganj Call Girls 8617697112💦✅.❤Personal Contact Number Mcleodganj Call Girls 8617697112💦✅.
❤Personal Contact Number Mcleodganj Call Girls 8617697112💦✅.Nitya salvi
 
Genuine 8250077686 Hot and Beautiful 💕 Amaravati Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Amaravati Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Amaravati Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Amaravati Escorts call GirlsDeiva Sain Call Girl
 
Ooty Call Girls 8250077686 Service Offer VIP Hot Model
Ooty Call Girls 8250077686 Service Offer VIP Hot ModelOoty Call Girls 8250077686 Service Offer VIP Hot Model
Ooty Call Girls 8250077686 Service Offer VIP Hot ModelDeiva Sain Call Girl
 
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls AgencyHire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls AgencyNitya salvi
 
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRLTamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRLNitya salvi
 
Kolkata Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
Kolkata Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service AvailableKolkata Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
Kolkata Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service AvailableNitya salvi
 
Top places to visit, top tourist destinations
Top places to visit, top tourist destinationsTop places to visit, top tourist destinations
Top places to visit, top tourist destinationsswarajdm34
 
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot ModelBhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot ModelDeiva Sain Call Girl
 
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room packageWhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room packageNitya salvi
 
Genuine 9332606886 Hot and Beautiful 💕 Pune Escorts call Girls
Genuine 9332606886 Hot and Beautiful 💕 Pune Escorts call GirlsGenuine 9332606886 Hot and Beautiful 💕 Pune Escorts call Girls
Genuine 9332606886 Hot and Beautiful 💕 Pune Escorts call GirlsDeiva Sain Call Girl
 
Sample sample sample sample sample sample
Sample sample sample sample sample sampleSample sample sample sample sample sample
Sample sample sample sample sample sampleCasey Keith
 
Hire 8617697112 Call Girls Udhampur For an Amazing Night
Hire 8617697112 Call Girls Udhampur For an Amazing NightHire 8617697112 Call Girls Udhampur For an Amazing Night
Hire 8617697112 Call Girls Udhampur For an Amazing NightNitya salvi
 
IATA GEOGRAPHY AREAS in the world, HM111
IATA GEOGRAPHY AREAS in the world, HM111IATA GEOGRAPHY AREAS in the world, HM111
IATA GEOGRAPHY AREAS in the world, HM1112022472524
 
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot ModelPapi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot ModelDeiva Sain Call Girl
 
Andheri East Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Andheri East Call Girls 🥰 8617370543 Service Offer VIP Hot ModelAndheri East Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Andheri East Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Dernier (20)

WhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
WhatsApp Chat: 📞 8617697112 Independent Call Girls in DarjeelingWhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
WhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
 
sample sample sample sample sample sample
sample sample sample sample sample samplesample sample sample sample sample sample
sample sample sample sample sample sample
 
Night 7k to 12k Daman Call Girls 👉👉 8617697112⭐⭐ 100% Genuine Escort Service ...
Night 7k to 12k Daman Call Girls 👉👉 8617697112⭐⭐ 100% Genuine Escort Service ...Night 7k to 12k Daman Call Girls 👉👉 8617697112⭐⭐ 100% Genuine Escort Service ...
Night 7k to 12k Daman Call Girls 👉👉 8617697112⭐⭐ 100% Genuine Escort Service ...
 
Ooty call girls 📞 8617697112 At Low Cost Cash Payment Booking
Ooty call girls 📞 8617697112 At Low Cost Cash Payment BookingOoty call girls 📞 8617697112 At Low Cost Cash Payment Booking
Ooty call girls 📞 8617697112 At Low Cost Cash Payment Booking
 
Darjeeling Call Girls 8250077686 Service Offer VIP Hot Model
Darjeeling Call Girls 8250077686 Service Offer VIP Hot ModelDarjeeling Call Girls 8250077686 Service Offer VIP Hot Model
Darjeeling Call Girls 8250077686 Service Offer VIP Hot Model
 
❤Personal Contact Number Mcleodganj Call Girls 8617697112💦✅.
❤Personal Contact Number Mcleodganj Call Girls 8617697112💦✅.❤Personal Contact Number Mcleodganj Call Girls 8617697112💦✅.
❤Personal Contact Number Mcleodganj Call Girls 8617697112💦✅.
 
Genuine 8250077686 Hot and Beautiful 💕 Amaravati Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Amaravati Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Amaravati Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Amaravati Escorts call Girls
 
Ooty Call Girls 8250077686 Service Offer VIP Hot Model
Ooty Call Girls 8250077686 Service Offer VIP Hot ModelOoty Call Girls 8250077686 Service Offer VIP Hot Model
Ooty Call Girls 8250077686 Service Offer VIP Hot Model
 
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls AgencyHire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
 
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRLTamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
 
Kolkata Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
Kolkata Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service AvailableKolkata Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
Kolkata Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
 
Top places to visit, top tourist destinations
Top places to visit, top tourist destinationsTop places to visit, top tourist destinations
Top places to visit, top tourist destinations
 
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot ModelBhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
 
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room packageWhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
 
Genuine 9332606886 Hot and Beautiful 💕 Pune Escorts call Girls
Genuine 9332606886 Hot and Beautiful 💕 Pune Escorts call GirlsGenuine 9332606886 Hot and Beautiful 💕 Pune Escorts call Girls
Genuine 9332606886 Hot and Beautiful 💕 Pune Escorts call Girls
 
Sample sample sample sample sample sample
Sample sample sample sample sample sampleSample sample sample sample sample sample
Sample sample sample sample sample sample
 
Hire 8617697112 Call Girls Udhampur For an Amazing Night
Hire 8617697112 Call Girls Udhampur For an Amazing NightHire 8617697112 Call Girls Udhampur For an Amazing Night
Hire 8617697112 Call Girls Udhampur For an Amazing Night
 
IATA GEOGRAPHY AREAS in the world, HM111
IATA GEOGRAPHY AREAS in the world, HM111IATA GEOGRAPHY AREAS in the world, HM111
IATA GEOGRAPHY AREAS in the world, HM111
 
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot ModelPapi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
 
Andheri East Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Andheri East Call Girls 🥰 8617370543 Service Offer VIP Hot ModelAndheri East Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Andheri East Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Performance Models for Apache Accumulo

  • 1. Securely explore your data PERFORMANCE MODELS FOR APACHE ACCUMULO: THE HEAVY TAIL OF A SHARED-NOTHING ARCHITECTURE Chris McCubbin Director of Data Science Sqrrl Data, Inc.
  • 2. I’M NOT ADAM FUCHS • But perhaps I’m still an interesting guy • MS in CS from UMBC in Network Security and Quantum Computing • 8 years at JHU/APL working on UxV Swarms • 4 years at JHU/APL and TexelTek creating Big Data Applications for the NSA • Co-founder and Director of Data Science at Sqrrl ©2014 Sqrrl Data, Inc 2
  • 3. SO, YOUR DISTRIBUTED APPLICATION IS SLOW • Today’s distributed applications are built on tens or hundreds of library components • Many versions exist, so internet advice can be ineffective or, worse, flat-out wrong • Hundreds of settings • Some, shall we say, could be better documented • Shared-nothing architectures are usually “shared-little” architectures with tricky interactions • Profiling is hard and time-consuming • What do we do?
  • 4. TODAY’S TALK 1. Quick intro to performance optimization 2. Tricks and techniques for modeling distributed applications to target performance improvements 3. A deep dive into improving bulk load application performance
  • 5. The Apache Accumulo™ sorted, distributed key/value store is a secure, robust, scalable, high-performance data storage and retrieval system. • Many applications in real-time storage and analysis of “big data”: • Spatio-temporal indexing in non-relational distributed databases - Fox et al 2013 IEEE International Congress on Big Data • Big Data Dimensional Analysis - Gadepally et al IEEE HPEC 2014 • Leading its peers in performance and scalability: • Achieving 100,000,000 database inserts per second using Accumulo and D4M - Kepner et al IEEE HPEC 2014 • An NSA Big Graph experiment (Technical Report NSA-RD-2013-056002v1) • Benchmarking Apache Accumulo BigData Distributed Table Store Using Its Continuous Test Suite - Sen et al 2013 IEEE International Congress on Big Data For more papers and presentations, see http://accumulo.apache.org/papers.html
  • 6. SCALING UP: DIVIDE & CONQUER • Collections of KV pairs form Tables • Tables are partitioned into Tablets • Metadata tablets hold info about other tablets, forming a 3-level hierarchy • A Tablet is a unit of work for a Tablet Server • Diagram: a well-known location (ZooKeeper) points to the Root Tablet (-∞ to ∞); the Root Tablet indexes Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞); the metadata tablets index the data tablets of three tables: Adam’s Table (-∞ : thing, thing : ∞), Encyclopedia (-∞ : Ocelot, Ocelot : Yak, Yak : ∞), and Foo (-∞ to ∞)
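Finding the tablet that holds a row amounts to a search over sorted split points at each level of this hierarchy. A minimal sketch, assuming Accumulo's convention that a tablet covers rows after its previous end row up to and including its own end row. The split points and the "table:row" metadata keys mirror the diagram's labels, not Accumulo's real metadata table layout, and the single root tablet (-∞ to ∞) is left implicit.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

# A tablet extent (prev_end_row, end_row]: prev end row exclusive, end row inclusive.
# None stands for -inf (as prev_end_row) or +inf (as end_row).
Extent = Tuple[Optional[str], Optional[str]]


def extents(splits: List[str]) -> List[Extent]:
    """Turn sorted split points into the extents of the tablets they delimit."""
    bounds: List[Optional[str]] = [None, *splits, None]
    return [(bounds[i], bounds[i + 1]) for i in range(len(splits) + 1)]


def find_tablet(splits: List[str], row: str) -> int:
    """Index of the tablet holding `row`: the first tablet whose end row is >= row."""
    return bisect_left(splits, row)


# Hypothetical split points copied from the slide's diagram. The "table:row"
# metadata key follows the diagram, not Accumulo's actual metadata row format.
METADATA_SPLITS = ["Encyclopedia:Ocelot"]
TABLE_SPLITS: Dict[str, List[str]] = {
    "Adam's Table": ["thing"],
    "Encyclopedia": ["Ocelot", "Yak"],
    "Foo": [],
}


def locate(table: str, row: str) -> Tuple[int, Extent]:
    """Root -> metadata -> data: pick the metadata tablet, then the data tablet."""
    meta_index = find_tablet(METADATA_SPLITS, f"{table}:{row}")
    splits = TABLE_SPLITS[table]
    return meta_index, extents(splits)[find_tablet(splits, row)]


# Metadata tablet index 1 is "Metadata Tablet 2" on the slide.
print(locate("Encyclopedia", "Owl"))  # (1, ('Ocelot', 'Yak'))
```

Because end rows are inclusive, a row equal to a split point ("Ocelot") lands in the tablet that ends at that split, which is why `bisect_left` rather than `bisect_right` is the matching search.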
  • 7. PERFORMANCE ANALYSIS CYCLE • Diagram: a cycle that starts at Create Model and iterates through Simulate & Experiment, Analyze, Refine Model, and Modify Code • Outputs: better code + models
  • 8. MAKING A MODEL • Identify low-impact points at which to collect metrics • Add some if needed • Create parallel state machine models with components driven by these metrics • Estimate running times and bottlenecks from a priori information and/or apply measured statistics • Focus testing on validating the initial model and the (estimated) pain points • Apply Amdahl’s Law • Rinse, repeat
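Applying Amdahl's Law tells you how much a local speedup can buy overall: if a fraction p of the runtime is made s times faster, the overall speedup is 1 / ((1 - p) + p/s). The sketch below evaluates this for the hypothetical bulk ingest phase times used in the deck (Map 46 s, Reduce 168 s, Register 17 s). The function name and structure are ours, not from the talk.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the runtime is made `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical phase times in seconds, from the bulk ingest model slides.
phases = {"map": 46, "reduce": 168, "register": 17}
total = sum(phases.values())
reduce_share = phases["reduce"] / total

# Reduce cut from 168 s to the ~40 s the INSIGHT slide expects.
print(f"{amdahl_speedup(reduce_share, 168 / 40):.2f}")  # 2.24
# Upper bound: Reduce made infinitely fast; Map and Register still remain.
print(f"{amdahl_speedup(reduce_share, float('inf')):.2f}")  # 3.67
```

Even a perfect Reduce leaves the job less than 4x faster, which is why the model points at several phases rather than one.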
  • 9. BULK INGEST OVERVIEW • Accumulo supports two mechanisms for bringing data in: streaming ingest and bulk ingest. • Bulk Ingest • Goal: maximize throughput without constraining latency. • Create a set of Accumulo RFiles, then register those files with Accumulo. • RFiles are groups of sorted key-value pairs with some indexing information • MapReduce has a built-in key sorting phase: a good fit for producing RFiles
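The MapReduce fit comes down to two steps: route each key-value pair to the tablet it will live in, then sort each group in Accumulo's key order. A minimal in-memory sketch of that idea, assuming the common practice of one sorted output per tablet. Real jobs do the routing with a MapReduce partitioner and write Accumulo's RFile format; neither is modeled here, and `prepare_bulk_files` is a hypothetical name.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Key(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


Pair = Tuple[Key, bytes]


def accumulo_order(key: Key) -> tuple:
    # Accumulo sorts keys by row, family, qualifier and visibility ascending,
    # then by timestamp descending (newest version first).
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def prepare_bulk_files(pairs: Iterable[Pair], splits: List[str]) -> Dict[int, List[Pair]]:
    """Partition pairs by destination tablet, then sort each partition.

    Each returned list stands in for one sorted file to register with a tablet.
    Tablet i covers (splits[i-1], splits[i]], so bisect_left finds the first
    split point >= row.
    """
    by_tablet: Dict[int, List[Pair]] = defaultdict(list)
    for key, value in pairs:  # "map" + partition
        by_tablet[bisect_left(splits, key.row)].append((key, value))
    # "shuffle/sort": every partition is sorted independently.
    return {
        tablet: sorted(group, key=lambda kv: accumulo_order(kv[0]))
        for tablet, group in sorted(by_tablet.items())
    }


pairs = [
    (Key("yak", "attr", "color", "", 5), b"brown"),
    (Key("ocelot", "attr", "color", "", 1), b"spotted"),
    (Key("ocelot", "attr", "color", "", 9), b"tawny"),
    (Key("aardvark", "attr", "legs", "", 3), b"4"),
]
files = prepare_bulk_files(pairs, splits=["m", "t"])
for tablet, group in files.items():
    print(tablet, [(k.row, k.timestamp) for k, _ in group])
```

Sorting per partition, rather than globally, mirrors what each reducer does, and is what lets many reducers produce files in parallel.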
  • 10. BULK INGEST MODEL • Diagram: timeline of three phases: Map, Reduce, Register
  • 11. BULK INGEST MODEL • Hypothetical resource usage over time: • Map: 100% CPU, 20% disk, 0% network, 46 seconds • Reduce: 40% CPU, 100% disk, 20% network, 168 seconds • Register: 10% CPU, 20% disk, 40% network, 17 seconds
  • 12. INSIGHT • Spare disk here, spare CPU there: can we even out resource consumption? • Why did reduce take 168 seconds? It should be more like 40 seconds. • No clear bottleneck during registration: is there a synchronization or serialization problem? • Chart: the hypothetical Map / Reduce / Register resource usage from slide 11, repeated
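These questions come from reading the chart phase by phase: which resource is saturated, how much headroom the others leave, and what share of the runtime each phase takes. A minimal sketch of that reading, assuming the hypothetical utilization numbers from the chart and an arbitrary 95% saturation threshold of our choosing. A phase with no saturated resource (here, Register) is the one to suspect of waiting on synchronization or serialization.

```python
from typing import Dict, Tuple

Usage = Dict[str, float]

# Hypothetical per-phase utilization (0..1) and duration in seconds, from the chart.
PHASES: Dict[str, Tuple[Usage, float]] = {
    "map": ({"cpu": 1.00, "disk": 0.20, "network": 0.00}, 46.0),
    "reduce": ({"cpu": 0.40, "disk": 1.00, "network": 0.20}, 168.0),
    "register": ({"cpu": 0.10, "disk": 0.20, "network": 0.40}, 17.0),
}


def diagnose(phases: Dict[str, Tuple[Usage, float]], saturated_at: float = 0.95) -> Dict[str, dict]:
    total = sum(seconds for _, seconds in phases.values())
    report = {}
    for name, (usage, seconds) in phases.items():
        busiest = max(usage, key=usage.get)
        report[name] = {
            "share": seconds / total,
            # Resource-bound if some resource is (nearly) saturated; otherwise
            # the phase is probably waiting on something (locks, round trips).
            "bottleneck": busiest if usage[busiest] >= saturated_at else None,
            # Spare capacity that a rebalanced pipeline could use.
            "headroom": {r: round(1.0 - u, 2) for r, u in usage.items()},
        }
    return report


for phase, info in diagnose(PHASES).items():
    print(phase, f"{info['share']:.0%}", info["bottleneck"], info["headroom"])
```

The headroom column is the "spare disk here, spare CPU there" observation in numeric form: Map leaves disk idle while Reduce leaves CPU idle.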
  • 13. LOOKING DEEPER: REFINED BULK INGEST MODEL • Diagram: timeline of a map thread (Map Setup, Map, Sort, Spill, Merge, Serve) and a reduce thread (Shuffle, Sort, Reduce, Output); legend: Parallel, Latch
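A much simplified version of such a model can already predict job time from per-stage durations: each task runs its stages back to back, tasks run in waves limited by the container count, and a latch holds the reduce phase until every map is done. The latch reflects `mapreduce.job.reduce.slowstart.completedmaps 1` on the INSTRUMENTATION slide, which schedules reducers only after all maps complete. The stage durations below are invented for illustration, and the function names are hypothetical.

```python
import math
from typing import Dict

Stages = Dict[str, float]


def phase_seconds(stages: Stages, tasks: int, containers: int) -> float:
    """Makespan of one phase: each task runs its stages back to back,
    and tasks run in waves of at most `containers` at a time."""
    waves = math.ceil(tasks / containers)
    return waves * sum(stages.values())


def job_seconds(map_stages: Stages, map_tasks: int, map_containers: int,
                reduce_stages: Stages, reducers: int, reduce_containers: int) -> float:
    # Latch: reducers start only after the last map finishes (slowstart = 1),
    # so the two phase makespans add instead of overlapping.
    return (phase_seconds(map_stages, map_tasks, map_containers)
            + phase_seconds(reduce_stages, reducers, reduce_containers))


# Hypothetical per-task stage durations in seconds, not measurements from the talk.
MAP_STAGES = {"setup": 5, "map": 20, "sort": 10, "spill": 10, "merge": 15}
REDUCE_STAGES = {"shuffle": 10, "sort": 10, "reduce": 30, "output": 20}

print(job_seconds(MAP_STAGES, 40, 20, REDUCE_STAGES, 20, 20))  # 190 (two map waves)
print(job_seconds(MAP_STAGES, 40, 40, REDUCE_STAGES, 20, 20))  # 130 (one map wave)
```

A real model would also let stages within a task overlap (the "Parallel" legend entry) and let tasks contend for shared disks and network; this sketch only captures the sequential stages, the waves, and the latch.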
  • 14. BULK INGEST MODEL PREDICTIONS • We can constrain parts of the model by physical throughput limitations: • Disk -> memory (100 MB/s, the average sequential read rate of a 7200 rpm disk) • Input reader • Memory -> disk (100 MB/s) • Spill, OutputWriter • Disk -> disk (50 MB/s) • Merge • Network (Gigabit = 125 MB/s) • Shuffle • And/or by algorithmic limitations: • Sort, (Our) Map, (Our) Reduce, SerDe
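Each physical limit turns into a lower bound on stage time: bytes moved divided by throughput. A sketch, assuming 1 MB = 10^6 bytes and perfect scaling across independent devices (both assumptions are ours). The example feeds in the map:spill byte count and the 8 disks reported on the INSTRUMENTATION slide; the results are floors to compare measured stage times against, not predictions of them.

```python
# Throughput limits from the slide, in MB/s (assuming 1 MB = 10**6 bytes).
LIMITS_MB_S = {
    "disk->memory": 100.0,  # input reader
    "memory->disk": 100.0,  # spill, output writer
    "disk->disk": 50.0,     # merge: reads and writes the same disks
    "network": 125.0,       # shuffle over gigabit Ethernet
}

# Which physical path each stage is limited by, per the slide.
STAGE_PATH = {
    "input_reader": "disk->memory",
    "spill": "memory->disk",
    "output_writer": "memory->disk",
    "merge": "disk->disk",
    "shuffle": "network",
}


def lower_bound_seconds(stage: str, nbytes: float, devices: int = 1) -> float:
    """Fastest possible time for `stage` to move `nbytes`,
    assuming perfect scaling across `devices` identical devices."""
    rate_bytes_s = LIMITS_MB_S[STAGE_PATH[stage]] * 1e6 * devices
    return nbytes / rate_bytes_s


spilled = 7_325_671_992  # "output map:spill" bytes on the INSTRUMENTATION slide
for stage in ("spill", "merge"):
    print(stage, round(lower_bound_seconds(stage, spilled, devices=8), 1))
```

When a measured stage runs far above its floor, the gap points at algorithmic limits (Sort, Map, Reduce, SerDe) rather than hardware.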
  • 15. PERFORMANCE GOAL MODEL Performance goals obtained through: • Simulation of individual components • Prediction of available resources at runtime
  • 16. INSTRUMENTATION
    System data: application version 1.3.3; application sha 8d17baf8; description baseline; node num 1; map num containers 20; red num containers 20; cores physical 12; cores logical 24; disk num 8; disk bandwidth 100; replication 1; monitoring TRUE
    Input/output: input type arcsight; input block size 32; input block count 20; input total 672054649; output map 9313303723; output map:combine input records 243419324; output map:combine records out 209318830; output map:spill 7325671992; output final 573802787; output map:combine 7301374577
    Configuration: yarn.nodemanager.resource.memory-mb 43008; yarn.scheduler.minimum-allocation-mb 2048; yarn.scheduler.maximum-allocation-mb 43008; yarn.app.mapreduce.am.resource.mb 2048; yarn.app.mapreduce.am.command-opts -Xmx1536m; mapreduce.map.memory.mb 2048; mapreduce.map.java.opts -Xmx1638m; mapreduce.reduce.memory.mb 2048; mapreduce.reduce.java.opts -Xmx1638m; mapreduce.task.io.sort.mb 100; mapreduce.map.sort.spill.percent 0.8; mapreduce.task.io.sort.factor 10; mapreduce.reduce.shuffle.parallelcopies 5; mapreduce.job.reduce.slowstart.completedmaps 1; mapreduce.map.output.compress FALSE; mapred.map.output.compression.codec n/a
    Time: map:setup avg 8; map:map avg 12; map:sort avg 12; map:spill avg 12; map:spill count 7; map:merge avg 46; map total 290; red:shuffle avg 6; red:merge avg 38; red:reduce avg 68; red:total avg 112; red:reducer count 20; job:total 396
    Ratios: input explosion factor 13.877904; compression intermediate 1.003327786; load combiner output 0.783972562; total ratio 0.786581455
    Constants: avg schema entry size (bytes) 59; effective MB/sec 1.618488025
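The RATIOS and CONSTANTS values appear to be derived from the raw counters. The formulas below are our reconstruction, not taken from the talk, but they reproduce the slide's values to within 1e-6: the input explosion factor divides map output by the nominal input size (32 × 20 MiB), total ratio is map:spill over map output, and effective MB/sec divides input total by job:total in MiB (which suggests job:total is in seconds).

```python
MIB = 2 ** 20

# Raw counters copied from the INSTRUMENTATION slide (baseline run).
COUNTERS = {
    "input_block_size_mb": 32,
    "input_block_count": 20,
    "input_total": 672_054_649,
    "output_map": 9_313_303_723,
    "output_map_combine": 7_301_374_577,
    "output_map_spill": 7_325_671_992,
    "job_total": 396,
}


def derived_ratios(c: dict) -> dict:
    """Reconstructed formulas for the slide's RATIOS and CONSTANTS entries."""
    nominal_input = c["input_block_size_mb"] * c["input_block_count"] * MIB
    combiner = c["output_map_combine"] / c["output_map"]        # combiner shrinkage
    compression = c["output_map_spill"] / c["output_map_combine"]  # spill vs combined
    return {
        "input explosion factor": c["output_map"] / nominal_input,
        "load combiner output": combiner,
        "compression intermediate": compression,
        "total ratio": combiner * compression,  # = output_map_spill / output_map
        "effective MB/sec": c["input_total"] / c["job_total"] / MIB,
    }


for name, value in derived_ratios(COUNTERS).items():
    print(f"{name}: {value:.9f}")
```

With map output inflating the input almost 14-fold and intermediate compression off, the later slides' focus on "intermediate data inflation" follows directly from these numbers.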
  • 17. PERFORMANCE MEASUREMENT Baseline (naive implementation) • Chart: measured timeline of the map thread (Map Setup, Map, Sort, Spill, Merge, Serve) and reduce thread (Shuffle, Sort, Reduce, Output) stages
  • 18. PATH TO IMPROVEMENT 1. Profiling revealed that much time was spent serializing/deserializing Key 2. With proper configuration, MapReduce supports comparison of keys in serialized form 3. Rewriting Key’s serialization led to an order-preserving encoding that is easy to compare in serialized form 4. Configure MapReduce to use native code to compare Keys 5. Tweak map input size and spill memory to get as few spills as possible
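Item 3 is the crux: if a Key's bytes sort the same way as the Key objects, MapReduce can compare keys without deserializing them (in Hadoop, that is the job of a `RawComparator`). A minimal sketch of one order-preserving encoding, assuming Accumulo's field order (row, family, qualifier, visibility ascending, timestamp descending) and non-negative timestamps. It uses a standard escaping technique and is not Accumulo's own Key serialization, nor necessarily the encoding used in the talk.

```python
import random
from typing import NamedTuple

MAX_TS = 2 ** 63 - 1


class Key(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int  # assumed to lie in [0, 2**63 - 1]


def object_order(key: Key) -> tuple:
    # Ordering on deserialized objects: fields ascending, newest timestamp first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def _encode_field(text: str) -> bytes:
    # UTF-8 preserves code point order. Escape 0x00 as 0x00 0xFF and end the
    # field with 0x00 0x00, so a field sorts before any longer field it prefixes
    # and fields never bleed into each other.
    return text.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00\x00"


def encode(key: Key) -> bytes:
    fields = b"".join(
        _encode_field(f) for f in (key.row, key.family, key.qualifier, key.visibility)
    )
    # Complementing the timestamp makes newer (larger) timestamps compare smaller.
    return fields + (MAX_TS - key.timestamp).to_bytes(8, "big")


random.seed(7)
alphabet = ["", "a", "ab", "a\x00", "b", "\x00", "é"]
keys = [
    Key(*(random.choice(alphabet) for _ in range(4)), random.randint(0, MAX_TS))
    for _ in range(1000)
]
# Sorting by raw bytes gives the same order as sorting the objects.
assert sorted(keys, key=encode) == sorted(keys, key=object_order)
```

The payoff is that a byte-wise comparison (the kind native code can do quickly) is enough to sort keys, with no object allocation per comparison.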
  • 19. PERFORMANCE MEASUREMENT Optimized sorting • Improvements: • Time for map-side merge went down • Sort performance drastically improved in both map and reduce phases • 300% faster
  • 20. PERFORMANCE MEASUREMENT Optimized sorting • Chart: measured timeline of the map thread and reduce thread stages • Insights: • Map is slower than expected • Output is disk-bound; maybe we can move more processing to Reduce • “Reverse Amdahl’s law” • Intermediate data inflation ratio (output/input for map) is very high
  • 21. PATH TO IMPROVEMENT 1. Profiling revealed that much time was spent copying data 2. Evaluating the data passed from map to reduce revealed inefficiencies: • A constant timestamp cost 8 bytes per key • Repeated column names could be encoded or compressed • Some Key/Value pairs didn’t need to be created until reduce
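The three inefficiencies suggest a leaner map-output record. A sketch under assumed conditions: a hypothetical three-column schema known to both map and reduce, a timestamp that is constant for the whole job, and a simple length-prefixed layout of our own. The verbose record carries the column name and an 8-byte timestamp; the compact one carries a 1-byte column id and no timestamp, and reduce restores both, which is also the point where the full Key/Value pair gets created.

```python
import struct
from typing import Tuple

# Hypothetical column names shared by map and reduce, so map can ship a 1-byte id.
COLUMNS = ["src_ip", "dst_ip", "bytes_out"]
COLUMN_ID = {name: i for i, name in enumerate(COLUMNS)}


def verbose_record(row: str, column: str, value: str, timestamp: int) -> bytes:
    """Length-prefixed row, column name and value, plus an 8-byte timestamp."""
    r, c, v = row.encode(), column.encode(), value.encode()
    return struct.pack(f">H{len(r)}sH{len(c)}sH{len(v)}sq",
                       len(r), r, len(c), c, len(v), v, timestamp)


def compact_record(row: str, column: str, value: str) -> bytes:
    """Same content with the column name replaced by its id and the timestamp dropped."""
    r, v = row.encode(), value.encode()
    return struct.pack(f">H{len(r)}sBH{len(v)}s", len(r), r, COLUMN_ID[column], len(v), v)


def expand_in_reduce(record: bytes, timestamp: int) -> Tuple[str, str, str, int]:
    """Reduce-side decoding: restore the column name and apply the job-wide timestamp."""
    (rlen,) = struct.unpack_from(">H", record, 0)
    row = record[2:2 + rlen].decode()
    col_id, vlen = struct.unpack_from(">BH", record, 2 + rlen)
    start = 2 + rlen + 3  # skip the 1-byte column id and 2-byte value length
    value = record[start:start + vlen].decode()
    return row, COLUMNS[col_id], value, timestamp


ts = 1_400_000_000_000
big = verbose_record("10.0.0.7", "src_ip", "192.168.1.20", ts)
small = compact_record("10.0.0.7", "src_ip", "192.168.1.20")
print(len(big), len(small))  # 40 25
print(expand_in_reduce(small, ts))
```

The saving per record is the column name's length plus 9 bytes (8 for the timestamp, 2 for the name's length prefix, less 1 for the id), and it applies to every byte that is sorted, spilled, merged and shuffled between map and reduce.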
  • 22. PERFORMANCE MEASUREMENT Optimized map code • Improvement: • Big speedup in map function • Twice as fast • Reduced intermediate inflation sped up all steps between map and reduce
  • 23. DO TRY THIS AT HOME Hints for Accumulo Application Optimization With these steps, we achieved a 6X speedup: • Perform comparisons on serialized objects • With Map/Reduce, calculate how many merge steps are needed • Avoid premature data inflation • Leverage compression to shift bottlenecks • Always consider how fast your code should run
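Calculating merge steps needs two numbers: how many spill files each map task writes, and how many of them one merge pass can combine. A rough sketch, assuming the spill threshold is `mapreduce.task.io.sort.mb` (in MiB) times `mapreduce.map.sort.spill.percent`, and that each pass merges up to `mapreduce.task.io.sort.factor` segments; the defaults are the values on the INSTRUMENTATION slide. Hadoop's merger schedules merges more cleverly and its sort buffer also holds per-record accounting data, so treat the outputs as estimates.

```python
import math


def estimate_spills(map_output_bytes: int, sort_mb: int = 100, spill_percent: float = 0.8) -> int:
    """Rough spill count for one map task: output divided by the buffer fill
    level that triggers a spill. Ignores per-record accounting space, so
    real counts can be higher."""
    threshold = sort_mb * 2 ** 20 * spill_percent
    return max(1, math.ceil(map_output_bytes / threshold))


def merge_passes(segments: int, factor: int = 10) -> int:
    """Merge rounds needed if each round combines at most `factor` segments
    (a simplified model of how often spilled data is rewritten)."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    passes = 0
    while segments > 1:
        segments = math.ceil(segments / factor)
        passes += 1
    return passes


print(merge_passes(7))    # 7 spills, as on the slide, merge in a single pass
print(merge_passes(150))  # many small spills force extra rounds
print(estimate_spills(200 * 2 ** 20))  # 200 MiB of map output per task
```

Every extra round rewrites the spilled data through the 50 MB/s disk-to-disk path, which is why tuning map input size and sort memory to stay within one pass pays off.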
  • 24. SOME CURRENT ACCUMULO PERFORMANCE PROJECTS • Optimize metadata operations • Batch to improve throughput (ACCUMULO-2175, ACCUMULO-2889) • Remove from critical path where possible • Optimize write-ahead log performance • Maximize throughput • Reduce flushes • Parallelize WALs (ACCUMULO-1083) • Avoid downtime by pre-allocating
  • 25. Securely explore your data SQRRL IS HIRING! QUESTIONS? Chris McCubbin Director of Data Science Sqrrl Data, Inc.