Introducing Accumulo Collections:
A Practical Accumulo Interface
By Jonathan Wolff
jwolff@isentropy.com
Founder, Isentropy LLC
https://isentropy.com
Code and Documentation on Github
https://github.com/isentropy/accumulo-collections/wiki
Accumulo Needs A Practical API
● Accumulo is great under the hood, but needs a practical
interface for real-world NoSQL applications.
● Could companies use Accumulo in place of MySQL?
● Accumulo needs a layer to:
1) Handle java Object serialization locally and on tablet servers
2) Handle foreign keys/joins.
3) Abstract iterators, so that it's easy to do server-side
computations.
4) Provide a useful library of filters, transformations, aggregates.
What is Accumulo Collections?
● Accumulo Collections is a new, alternative NoSQL framework that
uses Accumulo as a backend. It abstracts powerful Accumulo
functionality in a concise java API.
● Since Accumulo is already a sorted map, java SortedMap is a
natural choice for an interface. It's already familiar to java
developers. Devs who know nothing about Accumulo can use it to
build giant, responsive NoSQL applications.
● But Accumulo Collections is more than a SortedMap
implementation...
● Many features are implemented on the tablet servers by iterators,
and wrapped in java methods. You don't need to understand
Accumulo iterators to use them.
AccumuloSortedMap wraps an
Accumulo table
● AccumuloSortedMap is a java SortedMap implementation that is backed by
an Accumulo table. It handles object serialization and foreign keys, and
abstracts powerful iterator functionality.
● Method calls derive new maps that contain transformations and aggregates.
Derived maps modify the underlying Scanner. This abstracts the concept of
iterators. Derived map methods run on-the-fly and can be chained:
// similar to SQL: WHERE timestamp BETWEEN t0 AND t1 AND rand() > .5
AccumuloSortedMap derivedMap = map.timeFilter(t0,t1).sample(0.5);
// statistical aggregate (mean, sd, n, etc) of values from key range [100,200)
StatisticalSummary stats = map.subMap(100, 200).valueStats();
Each of the above methods stacks an iterator on the underlying map. The
iterators make use of SerDes to operate directly on java Objects.
Just like a standard java
SortedMap, but…
● AccumuloSortedMap returns a copy of the map value.
You must put() to save modifications.
● To use sorted map features, the SerDe used must
serialize bytes in the same sort order as java Objects.
The default FixedPointSerde is suitable for most
common key types (strings, primitives, byte[], etc).
More about SerDes later…
● Supports sizes greater than Integer.MAX_VALUE. See
sizeAsLong().
● Can be set to read-only. Derived map methods, which
stack scan iterators, always return read-only maps.
Use Accumulo as a SortedMap
AccumuloSortedMapFactory factory = new AccumuloSortedMapFactory(conn, "factory_name");
AccumuloSortedMap<Long,String> map = factory.makeMap("mapname");
for (long i = 0; i < 1000; i++) {
    map.put(i, "value" + i);
}
map.get(123); // equals "value123"
map.keySet().iterator().next(); // equals 0
AccumuloSortedMap submap = map.subMap(100, 150);
submap.size(); // equals 50
submap.firstKey(); // equals 100
submap.keyStats().getSum(); // equals 6225.0
for (Entry<Long,String> e : submap.entrySet()) {
    // iterate
}
// These commands throw exceptions. Both maps are read-only.
map.setReadOnly(true).put(1000, "nogood");
submap.put(1000, "nogood");
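Since AccumuloSortedMap implements java.util.SortedMap, the values quoted in the comments above follow from the standard SortedMap contract. As a rough local check, the sketch below repeats the same calls on an in-memory TreeMap. No Accumulo is involved, and the class name and the keySum() helper (standing in for keyStats().getSum()) are hypothetical.

```java
import java.util.SortedMap;
import java.util.TreeMap;

final class SortedMapContractSketch {
    // Builds the same 1000-entry map as the slide, but in memory.
    static SortedMap<Long, String> build() {
        SortedMap<Long, String> map = new TreeMap<>();
        for (long i = 0; i < 1000; i++) {
            map.put(i, "value" + i);
        }
        return map;
    }

    // Sum of the keys; a local stand-in for keyStats().getSum().
    static double keySum(SortedMap<Long, String> m) {
        double sum = 0;
        for (long k : m.keySet()) {
            sum += k;
        }
        return sum;
    }

    public static void main(String[] args) {
        SortedMap<Long, String> map = build();
        // subMap is half-open: keys 100 (inclusive) to 150 (exclusive).
        SortedMap<Long, String> submap = map.subMap(100L, 150L);
        System.out.println(submap.size());     // 50
        System.out.println(submap.firstKey()); // 100
        System.out.println(keySum(submap));    // 6225.0
    }
}
```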
Timestamp Features
AccumuloSortedMap makes use of Accumulo's timestamp features
and AgeOffFilter. Each map entry has an insert timestamp:
long insertTimestamp = map.getTimestamp(key);
Can filter map by timestamp. Implemented on tablet servers.
AccumuloSortedMap timeFiltered = map.timeFilter(fromTs, toTs);
Can set an entry TTL in ms. Implemented on tablet servers. Timed-out
entries are wiped during compaction:
map.setTimeOutMs(5000);
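The timestamp rules can be pictured as two small predicates. This is only a sketch of the idea, not the library's code: the class and method names are hypothetical, the expiry rule (an entry times out once its age exceeds the TTL) is an assumption, and so is the choice of an inclusive lower and exclusive upper bound for the time window.

```java
final class AgeOffSketch {
    // Assumed rule: an entry expires once its age exceeds the TTL.
    static boolean isLive(long insertTimestampMs, long nowMs, long ttlMs) {
        return nowMs - insertTimestampMs <= ttlMs;
    }

    // Assumed window: fromTs inclusive, toTs exclusive.
    static boolean inWindow(long timestampMs, long fromTs, long toTs) {
        return timestampMs >= fromTs && timestampMs < toTs;
    }

    public static void main(String[] args) {
        long now = 10_000L;
        System.out.println(isLive(6_000L, now, 5_000L));      // true: age is 4000 ms
        System.out.println(isLive(4_000L, now, 5_000L));      // false: age is 6000 ms
        System.out.println(inWindow(7_000L, 5_000L, 8_000L)); // true
    }
}
```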
Filter Entries by Regex
A bundled iterator filters entries on tablet servers by
comparing key.toString() and value.toString() to regexes. To
keep only the keys that match "a(b|c)":
map.put("ac", "1");
map.put("ax", "2");
map.put("ab", "3");
// keeps only the "ac" and "ab" entries:
AccumuloSortedMap filtered = map.regexKeyFilter("a(b|c)");
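The key test itself can be tried locally with java.util.regex. The sketch below assumes the filter requires the whole key string to match the pattern; whether the bundled iterator uses full or partial matching is not stated in the slides. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RegexKeyFilterSketch {
    // Keeps the keys whose string form fully matches the regex (assumed semantics).
    static List<String> filterKeys(List<String> keys, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (pattern.matcher(key).matches()) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("ac", "ax", "ab");
        System.out.println(filterKeys(keys, "a(b|c)")); // [ac, ab]
    }
}
```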
Sampling and Partitioning Features
● AccumuloSortedMap supports sampling and partitioning on the tablet
servers using the supplied SamplingFilter (Accumulo iterator).
● You can derive a map that is a random sample:
AccumuloSortedMap sampleSubmap = map.sample(0.5);
● Or you can define a Sampler which will “freeze” a fixed subsample:
Sampler s = new Sampler("my_sample_seed", 0.0, 0.1, fromTs, toTs);
AccumuloSortedMap frozenSample = map.sample(s);
● When you supply a sample_seed, you define an ordering of the
keys by hash(sample_seed + key bytes). The same hash range
within that ordering will produce the same sample. The fractions
indicate the hash range.
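The seeded sampling idea can be sketched as follows: hash the seed together with the key bytes, map the hash to a fraction in [0, 1), and keep the key if the fraction falls in the requested range. The slides do not say which hash function SamplingFilter uses, so CRC32 is used here purely as a convenient stand-in, and all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class HashRangeSamplerSketch {
    // Maps (seed, key bytes) to a fraction in [0, 1). CRC32 is an assumption.
    static double hashFraction(String seed, byte[] keyBytes) {
        CRC32 crc = new CRC32();
        crc.update(seed.getBytes(StandardCharsets.UTF_8));
        crc.update(keyBytes);
        return crc.getValue() / 4294967296.0; // divide by 2^32
    }

    // True when the key's hash fraction lies in [from, to).
    static boolean inSample(String seed, byte[] keyBytes, double from, double to) {
        double fraction = hashFraction(seed, keyBytes);
        return fraction >= from && fraction < to;
    }

    public static void main(String[] args) {
        int kept = 0;
        for (int i = 0; i < 1000; i++) {
            byte[] key = ("key-" + i).getBytes(StandardCharsets.UTF_8);
            if (inSample("my_sample_seed", key, 0.0, 0.1)) {
                kept++;
            }
        }
        // Roughly a tenth of the keys, and the same keys on every run.
        System.out.println(kept);
    }
}
```

Because the fraction depends only on the seed and the key, disjoint ranges such as [0.0, 0.5) and [0.5, 1.0) split the keys into non-overlapping partitions, which is one way to read the "partitioning" feature named in the slide title.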
Map Aggregates Computed on
Tablet Servers
● Aggregate functions are implemented using iterators
that calculate aggregate quantities over the entire
tablet server. The results are then combined locally.
● Similar to MapReduce with # mappers = # tservers
and # reducers = 1.
● Examples of built-in aggregate methods : size(),
checksum(), keyStats(), valueStats()
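The combine step can be illustrated with a mergeable summary: each server reports a count, a sum, and a sum of squares, and the client adds them up. This is a minimal sketch of the pattern, not the library's implementation; the PartialStats class is hypothetical, and running sums of squares are used only for brevity (they can lose precision on large or widely spread values).

```java
final class PartialStats {
    final long n;
    final double sum;
    final double sumSq;

    PartialStats(long n, double sum, double sumSq) {
        this.n = n;
        this.sum = sum;
        this.sumSq = sumSq;
    }

    // What one tablet server might compute over its own entries.
    static PartialStats of(double... values) {
        double s = 0;
        double sq = 0;
        for (double v : values) {
            s += v;
            sq += v * v;
        }
        return new PartialStats(values.length, s, sq);
    }

    // The local "reducer": combines partial results without seeing raw values.
    PartialStats merge(PartialStats other) {
        return new PartialStats(n + other.n, sum + other.sum, sumSq + other.sumSq);
    }

    double mean() {
        return sum / n;
    }

    // Population variance derived from the running sums.
    double variance() {
        double m = mean();
        return sumSq / n - m * m;
    }

    public static void main(String[] args) {
        PartialStats server1 = of(2, 3, 4);
        PartialStats server2 = of(22);
        PartialStats total = server1.merge(server2);
        System.out.println(total.n);      // 4
        System.out.println(total.sum);    // 31.0
        System.out.println(total.mean()); // 7.75
    }
}
```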
Efficient One-to-Many Mapping
● AccumuloSortedMap can be configured to allow multiple
values per key.
● Works by changing the VersioningIterator settings.
● SortedMap functions still work and see only the latest value.
● Extra methods give iterators over multiple values:
– Iterator<V> getAll(Object key)
– Iterator<Entry<K,V>> multiEntryIterator()
● All values for a given key will be stored on the same tablet
server. This enables server-side per-row aggregates. Like
SQL GROUP BY.
One-to-Many Example
map.setMaxValuesPerKey(-1); // unlimited
map.put(1, 2);
map.put(1, 3);
map.put(1, 4);
map.put(2, 22);
AccumuloSortedMap<Number, StatisticalSummary> row_stats = map.rowStats();
StatisticalSummary row1 = row_stats.get(1);
row1.getMean(); // = 3.0
row1.getMax(); // = 4.0
// count multiple values
map.sizeAsLong(true); // = 4
// sum all values, looking at 1 value per key: 4 + 22
map.valueStats().getSum(); // = 26.0
// sum all values, looking at multiple values per key: 2 + 3 + 4 + 22
map.valueStats(true).getSum(); // = 31.0
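The difference between the "latest value" view and the "all values" view can be reproduced in memory. The sketch below keeps a list of values per key and treats the last one written as the latest. It is a hypothetical stand-in for illustration only and says nothing about how the VersioningIterator stores versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class MultiValueMapSketch {
    private final TreeMap<Integer, List<Integer>> rows = new TreeMap<>();

    void put(int key, int value) {
        rows.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Sums either every stored value or only the latest value per key.
    double sum(boolean countMultiple) {
        double total = 0;
        for (List<Integer> values : rows.values()) {
            if (countMultiple) {
                for (int v : values) {
                    total += v;
                }
            } else {
                total += values.get(values.size() - 1);
            }
        }
        return total;
    }

    // Per-key aggregate, in the spirit of SQL GROUP BY.
    double rowMean(int key) {
        List<Integer> values = rows.get(key);
        double total = 0;
        for (int v : values) {
            total += v;
        }
        return total / values.size();
    }

    // Same data as the slide: key 1 has values 2, 3, 4 and key 2 has 22.
    static MultiValueMapSketch example() {
        MultiValueMapSketch m = new MultiValueMapSketch();
        m.put(1, 2);
        m.put(1, 3);
        m.put(1, 4);
        m.put(2, 22);
        return m;
    }

    public static void main(String[] args) {
        MultiValueMapSketch m = example();
        System.out.println(m.sum(false));  // 26.0
        System.out.println(m.sum(true));   // 31.0
        System.out.println(m.rowMean(1));  // 3.0
    }
}
```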
Writing Custom Transformations and
Aggregates
● Accumulo Collections provides useful abstract iterators
that operate on deserialized java Objects.
– Iterators are passed the SerDe classnames so that they
can read the deserialized Objects.
● You can extend these iterators to implement your own
transformations and aggregates. The API is very simple:
abstract Object transformValue(Object k, Object v);
abstract boolean allow(Object k, Object v);
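To show how the two methods fit together, here is a local sketch that applies them to an in-memory map. Only the two abstract signatures come from the slide. The EntryTransformSketch base class, its apply() driver, the TripleEvenKeys subclass, and the choice to call allow() before transformValue() are assumptions made for illustration; the real iterators run on tablet servers and receive objects deserialized by the configured SerDe.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical local stand-in for the abstract iterators described above.
abstract class EntryTransformSketch {
    abstract Object transformValue(Object k, Object v);

    abstract boolean allow(Object k, Object v);

    // Applies allow() first, then transformValue(); this ordering is an assumption.
    final SortedMap<Long, Object> apply(SortedMap<Long, ?> input) {
        SortedMap<Long, Object> out = new TreeMap<>();
        for (Map.Entry<Long, ?> e : input.entrySet()) {
            if (allow(e.getKey(), e.getValue())) {
                out.put(e.getKey(), transformValue(e.getKey(), e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(TripleEvenKeys.demo()); // {2=60, 4=15}
    }
}

// Keeps entries with even keys and triples each value.
final class TripleEvenKeys extends EntryTransformSketch {
    @Override
    Object transformValue(Object k, Object v) {
        return 3 * ((Number) v).longValue();
    }

    @Override
    boolean allow(Object k, Object v) {
        return ((Number) k).longValue() % 2 == 0;
    }

    static SortedMap<Long, Object> demo() {
        SortedMap<Long, Long> in = new TreeMap<>();
        in.put(1L, 10L);
        in.put(2L, 20L);
        in.put(4L, 5L);
        return new TripleEvenKeys().apply(in);
    }
}
```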
Example: Custom JavaScript
Transformation
As an example of custom transformations, consider
ScriptTransformingIterator in the “experimental” package. You can pass
JavaScript code, which is interpreted on the tablet servers. The key and
value bind to the JavaScript variables “k” and “v”. For example:
Allow only entries with even keys:
AccumuloSortedMap evens = map.jsFilter("k % 2 == 0");
Map of key → 3*value:
AccumuloSortedMap tripled = map.jsTransform(" 3*v ");
These examples work on keys and values that are java Numbers. Other
JavaScript functions also work on Strings, java Maps, etc.
Foreign Keys
Accumulo Collections provides a serializable ForeignKey Object which is
like a symbolic link that points to a map plus a key. There is no integrity
checking of the link:
map1.put("key1", "value1");
ForeignKey fk_to_key1 = map1.makeForeignKey("key1");
map2.put("key2", fk_to_key1);
// both equal "value1"
fk_to_key1.resolve(conn);
map2.get("key2").resolve(conn);
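The "symbolic link" behaviour, including the lack of integrity checking, can be mimicked with plain Java maps. In this sketch a Registry object plays the role of the connection, and a Link holds only a map name and a key. Both classes are hypothetical and unrelated to the library's ForeignKey implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class ForeignKeySketch {
    // Stand-in for a connection: resolves map names to in-memory maps.
    static final class Registry {
        private final Map<String, Map<String, Object>> maps = new HashMap<>();

        Map<String, Object> map(String name) {
            return maps.computeIfAbsent(name, n -> new HashMap<>());
        }
    }

    // A link is just (map name, key); nothing checks that the target exists.
    static final class Link {
        final String mapName;
        final String key;

        Link(String mapName, String key) {
            this.mapName = mapName;
            this.key = key;
        }

        Object resolve(Registry registry) {
            return registry.map(mapName).get(key);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.map("map1").put("key1", "value1");
        Link fk = new Link("map1", "key1");
        registry.map("map2").put("key2", fk);

        System.out.println(fk.resolve(registry)); // value1
        Link stored = (Link) registry.map("map2").get("key2");
        System.out.println(stored.resolve(registry)); // value1

        // Deleting the target leaves a dangling link.
        registry.map("map1").remove("key1");
        System.out.println(fk.resolve(registry)); // null
    }
}
```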
Using AccumuloSortedMapFactory
● The map factory is the preferred way to construct
AccumuloSortedMaps. The factory is itself a map
of (map name→ map metadata) with default
settings. The factory:
– acts as a namespace, mapping map names to real
Accumulo table names.
– Configures SerDes.
– Configures other metadata like
max_values_per_key.
Factory Example
AccumuloSortedMapFactory factory;
AccumuloSortedMap map;
factory = new AccumuloSortedMapFactory(conn, "factory_table");
// 10 values per key default for all maps
factory.addDefaultProperty(MAP_PROPERTY_VALUES_PER_KEY, "10");
// 5000ms timeout in map "mymap"
factory.addMapSpecificProperty("mymap", MAP_PROPERTY_TTL, "5000");
map = factory.makeMap("mymap");
More about SerDes
● Accumulo uses BytesWritable.compareTo() to
compare keys on the tablet servers.
– No way to set alternate comparator (?)
● Keys must be serialized in such a way that byte
sort order is the same as java sort order.
● FixedPointSerde, the default SerDe, writes
Numbers in fixed point unsigned format so that
numerical comparison works. Other Objects are
java serialized.
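One common way to make byte order agree with numeric order for a long is to write it big-endian with the sign bit flipped. The sketch below demonstrates that general technique. It is not claimed to be the format FixedPointSerde actually writes, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class OrderPreservingLongSketch {
    // Big-endian bytes with the sign bit flipped, so that comparing the bytes
    // as unsigned values gives the same order as comparing the longs.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Lexicographic comparison treating each byte as unsigned,
    // which is how a byte-sorted store orders its keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // A negative number sorts before a positive one after encoding.
        System.out.println(compareUnsigned(encode(-5L), encode(3L)) < 0); // true
        System.out.println(decode(encode(-5L)));                          // -5
    }
}
```

Without the sign-bit flip, the two's-complement bytes of a negative long start with 0xff and would sort after every positive value.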
Bulk Import, Saving Derived Maps
● The putAll and importAll methods in AccumuloSortedMap batch
writes to Accumulo, unlike put(). You can save a derived map using
putAll:
map.putAll(someOtherMap);
● importAll() is like putAll(), but takes an Iterator as an argument. This
can be used to import entries from other sources, like input streams
and files.
map.importAll(new TsvInputStreamIterator("importfile.tsv"));
● Aside from batching, putAll() and importAll() do not do anything
special on the tablet servers. The import data all passes through the
local machine to Accumulo. The optional KeyValueTransformer runs
locally.
Benchmarks
● I benchmarked Accumulo Collections against raw
Accumulo reads/writes on a toy Accumulo cluster
running in Docker. It has all the moving parts of a real
cluster, but runs on one machine.
● All tests so far indicate that Accumulo Collections
adds very little overhead (~10%) to normal
Accumulo operation.
● I would appreciate it if someone sends me
benchmarks from a proper cluster!
Benchmark Data
[Figure: bar chart "Raw Accumulo vs Accumulo Collections", median time in ms over 10000 operations, for read, batched write, and unbatched write. Series: raw, Acc Collections. Horizontal axis: median time (ms), 0 to 18.]
Performance Tips
● Batched writes are much faster. Use putAll() and
importAll() in place of put() when possible.
– Write your changes locally to a memory-based
Map, then store in bulk with putAll().
● Iterating over a range is much faster than lots of
individual get() calls.
– If you need to do lots of get() calls over a small
submap, you can cache a map locally in memory
with the localCopy() method.
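The first tip, buffering writes in a local map and storing them in bulk, might look like the sketch below. It works against any java.util.Map, so a TreeMap stands in for the Accumulo-backed map here. The WriteBufferSketch class and its flush threshold are hypothetical and not part of Accumulo Collections.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class WriteBufferSketch<K, V> {
    private final Map<K, V> target;
    private final Map<K, V> pending = new HashMap<>();
    private final int flushThreshold;
    private int flushCount = 0;

    WriteBufferSketch(Map<K, V> target, int flushThreshold) {
        this.target = target;
        this.flushThreshold = flushThreshold;
    }

    // Buffers the write locally and flushes once enough entries accumulate.
    void put(K key, V value) {
        pending.put(key, value);
        if (pending.size() >= flushThreshold) {
            flush();
        }
    }

    // One bulk putAll() replaces many individual put() calls on the target.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        target.putAll(pending);
        pending.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        Map<Integer, String> target = new TreeMap<>();
        WriteBufferSketch<Integer, String> buffer = new WriteBufferSketch<>(target, 100);
        for (int i = 0; i < 250; i++) {
            buffer.put(i, "value" + i);
        }
        buffer.flush(); // push the remaining 50 entries
        System.out.println(buffer.flushCount()); // 3
        System.out.println(target.size());       // 250
    }
}
```

Reads that must see buffered entries would need to consult the pending map first, and a crash before flush() loses the unflushed writes, so this pattern suits batch loads better than interactive updates.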
Contact Info
● I'm available for hire. You can email me at
jwolff@isentropy.com. My consulting company,
Isentropy, is online at https://isentropy.com .
● Accumulo Collections is available on Github at
https://github.com/isentropy/accumulo-collections
● Constructive questions and comments welcome.

Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo Interface

  • 1. Introducing Accumulo Collections: A Practical Accumulo Interface By Jonathan Wolff jwolff@isentropy.com Founder, Isentropy LLC https://isentropy.com Code and Documentation on Github https://github.com/isentropy/accumulo-collections/wiki
  • 2. Accumulo Needs A Practical API ● Accumulo is great under the hood, but needs a practical interface for real-world NoSQL applications. ● Could companies use Accumulo in place of MySQL?? ● Accumulo needs a layer to: 1) Handle java Object serialization locally and on tablet servers 2) Handle foreign keys/joins. 3) Abstract iterators, so that it's easy to do server-side computations. 4) Provide a useful library of filters, transformations, aggregates.
  • 3. What is Accumulo Collections? ● Accumulo Collections is a new, alternative NoSQL framework that uses Accumulo as a backend. It abstracts powerful Accumulo functionality in a concise java API. ● Since Accumulo is already a sorted map, java SortedMap is a natural choice for an interface. It's already familiar to java developers. Devs who know nothing about Accumulo can use it to build giant, responsive NoSQL applications. ● But Accumulo Collections is more than a SortedMap implementation... ● Many features are implemented on the tablet servers by iterators, and wrapped in java methods. You don't need to understand Accumulo iterators to use them.
  • 4. AccumuloSortedMap wraps an Accumulo table ● AccumuloSortedMap is a java SortedMap implementation that is backed by an Accumulo table. It handles object serialization and foreign keys, and abstracts powerful iterator functionality. ● Method calls derive new maps that contain transformations and aggregates. Derived maps modify the underlying Scanner. This abstracts the concept of iterators. Derived map methods run on-the-fly and can be chained: // similar to SQL: WHERE timestamp BETWEEN t0 AND t1 AND rand() > .5 AccumuloSortedMap derivedMap = map.timeFilter(t0,t1).sample(0.5); // statistical aggregate (mean, sd, n, etc) of values from key range [100,200) StatisticalSummary stats = map.submap(100, 200).valueStats(); Each of the above methods stacks an iterator on the underlying map. The iterators make use of SerDes to operate directly on java Objects.
  • 5. Just like a standard java SortedMap, but… ● AccumuloSortedMap returns a copy of the map value. You must put() to save modifications. ● To use sorted map features, the SerDe used must serialize bytes in same sort order as java Objects. The default FixedPointSerde is suitable for most common keys types (strings, primitives, byte[], etc). More about SerDes later… ● Supports sizes greater than MAX_INT. See sizeAsLong(). ● Can be set to read-only. Derived map methods, which stack scan iterators, always return read-only maps.
  • 6. Use Accumulo as a SortedMap AccumuloSortedMapFactory factory = new AccumuloSortedMapFactory(conn,"factory_name"); AccumuloSortedMap<Long,String> map = factory.makeMap("mapname"); for(long i=0; i<1000; i++){ map.put(i, "value"+i); }; map.get(123); // equals “value123” map.keySet().iterator().next(); // equals 0 AccumuloSortedMap submap = map.subMap(100, 150); submap.size(); // equals 50 submap.firstKey(); // equals 100 submap.keyStats().getSum(); // equals 6225.0 for(Entry<Long,String> e : submap.entrySet()){ // iterate }; // these commands throws Exceptions. Both Maps are read-only. map.setReadOnly(true).put(1000,”nogood”); submap.put(1000,”nogood”);
  • 7. Timestamp Features AccumuloSortedMap makes use of Accumulo's timestamp features and AgeOffFilter. Each map entry has an insert timestamp: long insertTimestamp = map.getTimestamp(key); You can filter a map by timestamp. This is implemented on the tablet servers: AccumuloSortedMap timeFiltered = map.timeFilter(fromTs, toTs); You can set an entry TTL in ms. This is also implemented on the tablet servers. Timed-out entries are wiped during compaction: map.setTimeOutMs(5000);
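The two timestamp rules above can be sketched locally. This is a minimal illustration, not the library's implementation: it assumes the time filter is inclusive at both ends (like SQL BETWEEN) and that an entry stays live while its age is at most the TTL. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class TimestampSketch {
    // Input maps key -> insert timestamp in ms; values are omitted for brevity.
    // Assumption: both bounds are inclusive.
    static SortedMap<String, Long> timeFilter(SortedMap<String, Long> timestamps,
                                              long fromTs, long toTs) {
        SortedMap<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : timestamps.entrySet()) {
            long ts = e.getValue();
            if (ts >= fromTs && ts <= toTs) {
                out.put(e.getKey(), ts);
            }
        }
        return out;
    }

    // Assumed age-off rule: an entry is live while (now - insertTs) <= ttlMs.
    static boolean isLive(long insertTs, long now, long ttlMs) {
        return now - insertTs <= ttlMs;
    }
}
```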
  • 8. Filter Entries by Regex A bundled iterator filters entries on tablet servers by comparing key.toString() and value.toString() to regexes. To keep only the keys that match "a(b|c)": map.put("ac","1"); map.put("ax","2"); map.put("ab","3"); // contains only the 1st and 3rd entries put above ("ac" and "ab"): AccumuloSortedMap filtered = map.regexKeyFilter("a(b|c)");
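As a local illustration of the key filter, the sketch below applies `java.util.regex` to the keys of an in-memory TreeMap. It assumes the whole key must match the pattern; whether the bundled iterator uses a full match or a partial find is not stated on the slide. `filterKeys` is a hypothetical helper.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class RegexKeyFilterSketch {
    // Keeps the entries whose key fully matches the regex (assumed semantics).
    static SortedMap<String, String> filterKeys(SortedMap<String, String> in, String regex) {
        Pattern pattern = Pattern.compile(regex);
        SortedMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : in.entrySet()) {
            if (pattern.matcher(e.getKey()).matches()) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, String> map = new TreeMap<>();
        map.put("ac", "1");
        map.put("ax", "2");
        map.put("ab", "3");
        System.out.println(filterKeys(map, "a(b|c)")); // {ab=3, ac=1}
    }
}
```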
  • 9. Sampling and Partitioning Features ● AccumuloSortedMap supports sampling and partitioning on the tablet servers using the supplied SamplingFilter (an Accumulo iterator). ● You can derive a map that is a random sample: AccumuloSortedMap sampleSubmap = map.sample(0.5); ● Or you can define a Sampler which will "freeze" a fixed subsample: Sampler s = new Sampler("my_sample_seed",0.0,0.1,fromTs, toTs); AccumuloSortedMap frozenSample = map.sample(s); ● When you supply a sample_seed, you define an ordering of the keys by hash(sample_seed + key bytes). The same hash range within that ordering will produce the same sample. The fractions indicate the hash range.
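The "frozen" sample idea can be sketched as follows. This is an assumption-laden illustration: the slide does not say which hash function SamplingFilter uses, so SHA-256 is swapped in here, and `hashFraction` and `inSample` are hypothetical names. The point is only that a fixed seed maps every key to a stable fraction in [0, 1), so a fixed range selects the same keys every time.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SeededSampleSketch {
    // Maps (seed, key) to a deterministic fraction in [0, 1).
    static double hashFraction(String seed, String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(seed.getBytes(StandardCharsets.UTF_8));
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            long bits = 0;
            for (int i = 0; i < 8; i++) {
                bits = (bits << 8) | (digest[i] & 0xFF);
            }
            // Keep the top 53 bits so the quotient is an exact double below 1.0.
            return (bits >>> 11) / (double) (1L << 53);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is available on every standard JVM
        }
    }

    // True when the key's hash fraction falls in [from, to).
    static boolean inSample(String seed, String key, double from, double to) {
        double f = hashFraction(seed, key);
        return f >= from && f < to;
    }
}
```

With the same seed, the ranges [0.0, 0.1) and [0.1, 1.0) split the keys into two disjoint partitions, which is one way to build a fixed holdout set.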
  • 10. Map Aggregates Computed on Tablet Servers ● Aggregate functions are implemented using iterators that calculate aggregate quantities over each entire tablet server. The per-server results are then combined locally. ● Similar to MapReduce with # mappers = # tservers and # reducers = 1. ● Examples of built-in aggregate methods: size(), checksum(), keyStats(), valueStats()
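The map-then-combine pattern described above can be sketched with the JDK's `DoubleSummaryStatistics`, whose `combine` method merges partial summaries. This is a local illustration under the assumption that each server returns one small summary; the class and method names are hypothetical and do not come from the library.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

final class ServerSideAggregateSketch {
    // "Map" step: each tablet server summarizes only the values it holds.
    static DoubleSummaryStatistics summarize(double[] localValues) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (double v : localValues) {
            stats.accept(v);
        }
        return stats;
    }

    // "Reduce" step: the client merges one small summary per server.
    static DoubleSummaryStatistics combine(List<DoubleSummaryStatistics> partials) {
        DoubleSummaryStatistics total = new DoubleSummaryStatistics();
        for (DoubleSummaryStatistics partial : partials) {
            total.combine(partial);
        }
        return total;
    }
}
```

Only the summaries cross the network, so the client-side cost grows with the number of servers rather than the number of entries.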
  • 11. Efficient One-to-Many Mapping ● AccumuloSortedMap can be configured to allow multiple values per key. ● Works by changing the VersioningIterator settings. ● SortedMap functions still work and see only the latest value. ● Extra methods give iterators over multiple values: – Iterator<V> getAll(Object key) – Iterator<Entry<K,V>> multiEntryIterator() ● All values for a given key will be stored on the same tablet server. This enables server-side per-row aggregates. Like SQL GROUP BY.
  • 12. One-to-Many Example map.setMaxValuesPerKey(-1); // unlimited map.put(1, 2); map.put(1, 3); map.put(1, 4); map.put(2, 22); AccumuloSortedMap<Number, StatisticalSummary> row_stats = map.rowStats(); StatisticalSummary row1 = row_stats.get(1); row1.getMean(); // = 3.0 row1.getMax(); // = 4.0 // count multiple values map.sizeAsLong(true); // = 4 // sum all values, looking at 1 value per key: 4+22 map.valueStats().getSum(); // = 26.0 // sum all values, looking at multiple values per key: 2+3+4+22 map.valueStats(true).getSum(); // = 31.0
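The numbers in the one-to-many example can be reproduced with a small in-memory model. The sketch below is a hypothetical stand-in, not the library's implementation: it keeps every value per key in insertion order and treats the last one written as the "latest" value that plain SortedMap methods see.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class MultiValueSketch {
    private final SortedMap<Integer, List<Integer>> rows = new TreeMap<>();

    void put(int key, int value) {
        rows.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Plain SortedMap view: only the most recently written value is visible.
    Integer get(int key) {
        List<Integer> values = rows.get(key);
        return values == null ? null : values.get(values.size() - 1);
    }

    // Counts keys, or every stored value when countAllValues is true.
    long size(boolean countAllValues) {
        if (!countAllValues) {
            return rows.size();
        }
        long n = 0;
        for (List<Integer> values : rows.values()) {
            n += values.size();
        }
        return n;
    }

    // Sums the latest value per key, or every stored value when allValues is true.
    double valueSum(boolean allValues) {
        double sum = 0;
        for (List<Integer> values : rows.values()) {
            if (allValues) {
                for (int v : values) {
                    sum += v;
                }
            } else {
                sum += values.get(values.size() - 1);
            }
        }
        return sum;
    }

    // Per-key mean, similar in spirit to a GROUP BY aggregate.
    double rowMean(int key) {
        List<Integer> values = rows.get(key);
        if (values == null) {
            return Double.NaN;
        }
        double sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.size();
    }
}
```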
  • 13. Writing Custom Transformations and Aggregates ● Accumulo Collections provides useful abstract iterators that operate on deserialized Java Objects. – Iterators are passed the SerDe class names so that they can read the deserialized Objects. ● You can extend these iterators to implement your own transformations and aggregates. The API is very simple: abstract Object transformValue(Object k, Object v); abstract boolean allow(Object k, Object v);
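To show how the two abstract methods fit together, here is a local sketch. `EntryTransformSketch` is a hypothetical stand-in for the library's abstract iterators (whose real class names are not given on the slide), and `apply` runs in memory rather than on a tablet server. Only the two method signatures are taken from the slide.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical local stand-in for the library's abstract iterators.
abstract class EntryTransformSketch {
    abstract boolean allow(Object k, Object v);

    abstract Object transformValue(Object k, Object v);

    // Filters with allow(), then rewrites each surviving value with transformValue().
    final SortedMap<Object, Object> apply(SortedMap<?, ?> in) {
        SortedMap<Object, Object> out = new TreeMap<>();
        for (Map.Entry<?, ?> e : in.entrySet()) {
            if (allow(e.getKey(), e.getValue())) {
                out.put(e.getKey(), transformValue(e.getKey(), e.getValue()));
            }
        }
        return out;
    }
}

// Example subclass: keep even keys and triple their values.
final class TripleEvenKeys extends EntryTransformSketch {
    @Override
    boolean allow(Object k, Object v) {
        return ((Number) k).longValue() % 2 == 0;
    }

    @Override
    Object transformValue(Object k, Object v) {
        return 3 * ((Number) v).longValue();
    }
}
```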
  • 14. Example: Custom JavaScript Transformation As an example of custom transformations, consider ScriptTransformingIterator in the "experimental" package. You can pass JavaScript code, which is interpreted on the tablet servers. The key and value bind to the JavaScript variables "k" and "v". For example: Allow only entries with even keys: AccumuloSortedMap evens = map.jsFilter("k % 2 == 0"); Map of key → 3*value: AccumuloSortedMap tripled = map.jsTransform(" 3*v "); These examples work on keys and values that are Java Numbers. Other JavaScript functions also work on Strings, Java Maps, etc.
  • 15. Foreign Keys Accumulo Collections provides a serializable ForeignKey Object, which is like a symbolic link that points to a map plus a key. There is no integrity checking of the link: map1.put("key1", "value1"); ForeignKey fk_to_key1 = map1.makeForeignKey("key1"); map2.put("key2", fk_to_key1); // both equal "value1" fk_to_key1.resolve(conn); map2.get("key2").resolve(conn);
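The symbolic-link behavior, including the lack of integrity checking, can be sketched in memory. `LinkSketch` is a hypothetical stand-in for ForeignKey, and a plain map of named maps stands in for the Accumulo connection. Returning null for a dangling link is an assumption made for this sketch, since the slide does not say what the library does in that case.

```java
import java.util.Map;

// Hypothetical stand-in for ForeignKey: a (map name, key) pair resolved on demand.
final class LinkSketch {
    final String mapName;
    final String key;

    LinkSketch(String mapName, String key) {
        this.mapName = mapName;
        this.key = key;
    }

    // Looks up the target at resolve time. Nothing keeps the target alive,
    // so a deleted target simply resolves to null in this sketch.
    Object resolve(Map<String, Map<String, Object>> tables) {
        Map<String, Object> target = tables.get(mapName);
        return target == null ? null : target.get(key);
    }
}
```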
  • 16. Using AccumuloSortedMapFactory ● The map factory is the preferred way to construct AccumuloSortedMaps. The factory is itself a map of (map name→ map metadata) with default settings. The factory: – acts as a namespace, mapping map names to real Accumulo table names. – Configures SerDes. – Configures other metadata like max_values_per_key.
  • 17. Factory Example AccumuloSortedMapFactory factory; AccumuloSortedMap map; factory = new AccumuloSortedMapFactory(conn,"factory_table"); // 10 values per key default for all maps factory.addDefaultProperty(MAP_PROPERTY_VALUES_PER_KEY, "10"); // 5000ms timeout in map "mymap" factory.addMapSpecificProperty("mymap", MAP_PROPERTY_TTL, "5000"); map = factory.makeMap("mymap");
  • 18. More about SerDes ● Accumulo uses BytesWritable.compareTo() to compare keys on the tablet servers. – No way to set an alternate comparator (?) ● Keys must be serialized in such a way that the byte sort order is the same as the Java sort order. ● FixedPointSerde, the default SerDe, writes Numbers in fixed-point unsigned format so that numerical comparison works. Other Objects are Java serialized.
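The sort-order requirement is easiest to see with longs. The sketch below shows one common order-preserving encoding: flip the sign bit and write big-endian. It is not claimed to be FixedPointSerde's actual byte format, and the class name is hypothetical. `Arrays.compareUnsigned` (Java 9+) models an unsigned lexicographic byte comparison.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class OrderPreservingLongCodec {
    // Flipping the sign bit makes negative values sort before positive ones
    // when the big-endian bytes are compared as unsigned.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Plain two's-complement big-endian bytes, shown only for contrast.
    static byte[] naiveEncode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Unsigned lexicographic comparison, as a byte-sorted store would apply.
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

With the naive encoding, -5 sorts after 3 because its leading byte is 0xFF; with the sign bit flipped, byte order and numeric order agree.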
  • 19. Bulk Import, Saving Derived Maps ● The putAll and importAll methods in AccumuloSortedMap batch writes to Accumulo, unlike put(). You can save a derived map using putAll: map.putAll(someOtherMap); ● importAll() is like putAll(), but takes an Iterator as an argument. This can be used to import entries from other sources, like input streams and files. map.importAll(new TsvInputStreamIterator("importfile.tsv")); ● Aside from batching, putAll() and importAll() do not do anything special on the tablet servers. The import data all passes through the local machine to Accumulo. The optional KeyValueTransformer runs locally.
  • 20. Benchmarks ● I benchmarked Accumulo Collections against raw Accumulo reads and writes on a toy Accumulo cluster running in Docker. It has all the moving parts of a real cluster, but runs on one machine. ● All tests so far indicate that Accumulo Collections adds very little overhead (~10%) to normal Accumulo operation. ● I would appreciate it if someone sends me benchmarks from a proper cluster!
  • 21. Benchmark Data [Bar chart: "Raw Accumulo vs Accumulo Collections", median time in ms over 10000 operations. Categories: read, write batched, write unbatched. Series: raw Accumulo, Accumulo Collections. Axis: median time (ms), 0 to 18.]
  • 22. Performance Tips ● Batched writes are much faster. Use putAll() and importAll() in place of put() when possible. – Write your changes locally to a memory-based Map, then store in bulk with putAll(). ● Iterating over a range is much faster than lots of individual get() calls. – If you need to do lots of get() calls over a small submap, you can cache a map locally in memory with the localCopy() method.
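The first tip, buffering writes locally and storing them in bulk, can be sketched with a toy cost model. `CountingStore` is a hypothetical stand-in for a remote table that charges one round trip per call. Real batched writers flush in chunks, so the counts here only illustrate the direction of the effect, not its size.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class WriteBufferSketch {
    // Toy stand-in for a remote table that counts client round trips.
    static final class CountingStore {
        final Map<Long, String> data = new TreeMap<>();
        int roundTrips = 0;

        void put(Long k, String v) {
            data.put(k, v);
            roundTrips++;
        }

        void putAll(Map<Long, String> batch) {
            data.putAll(batch);
            roundTrips++;
        }
    }

    // One call per entry.
    static int writeUnbatched(CountingStore store, int n) {
        for (long i = 0; i < n; i++) {
            store.put(i, "value" + i);
        }
        return store.roundTrips;
    }

    // Collect changes in a local in-memory map, then store them in one call.
    static int writeBatched(CountingStore store, int n) {
        Map<Long, String> buffer = new HashMap<>();
        for (long i = 0; i < n; i++) {
            buffer.put(i, "value" + i);
        }
        store.putAll(buffer);
        return store.roundTrips;
    }
}
```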
  • 23. Contact Info ● I'm available for hire. You can email me at jwolff@isentropy.com. My consulting company, Isentropy, is online at https://isentropy.com . ● Accumulo Collections is available on Github at https://github.com/isentropy/accumulo-collections ● Constructive questions and comments welcome.