Apache Accumulo 1.8.0
Overview
Josh Elser
Apache Accumulo Meetup Group
2016/06/27
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Accumulo 1.8.0
▪ First release candidate in the works
▪ A “minor” release, but significantly more work required than a “patch” release
– ContinuousIngest and verification
– RandomWalk
▪ Long time coming...
Semantic Versioning
▪ Defines a set of rules for software projects to adhere to across different versions
▪ Gives a clear understanding of compatibility
▪ Rules are defined in terms of a “public API”
– Defined by the project adopting SemVer
▪ Major
– Incompatible changes, deprecations removed
▪ Minor
– Backwards-compatible features added
▪ Patch
– Backwards-compatible bug-fixes only (no features)
http://semver.org - major.minor.patch
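As a rough illustration of the rules above, the following Python sketch classifies the change between two major.minor.patch versions and applies the SemVer promise to a client application. The function names are hypothetical, and the sketch handles only the three-part scheme shown here (no pre-release or build metadata).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def change_kind(old: str, new: str) -> str:
    """Name the highest-order component that differs between two versions."""
    o, n = parse_version(old), parse_version(new)
    if o[0] != n[0]:
        return "major"
    if o[1] != n[1]:
        return "minor"
    if o[2] != n[2]:
        return "patch"
    return "same"


def code_should_still_work(built_against: str, running_on: str) -> bool:
    """True when the SemVer rules promise that code written against
    `built_against` keeps working on `running_on`: same major version,
    and the runtime is not older than the version the code targeted."""
    o, n = parse_version(built_against), parse_version(running_on)
    return o[0] == n[0] and n >= o
```

Under these assumptions, `code_should_still_work("1.7.1", "1.8.0")` is true, while `code_should_still_work("1.8.0", "1.7.0")` is false because the newer application may rely on features added in the minor release. The promise covers only whatever the project declares as its public API.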
Apache Accumulo and Semantic Versioning
▪ Apache Accumulo defines a public API
– Made up of Java classes, defined by packages
– The goal is to describe how user code should function across releases
– Recursively, all public types in the following packages (excluding impl, thrift, or crypto subpackages)
• org.apache.accumulo.core.{client,data,security}
• org.apache.accumulo.minicluster
▪ Other compatibility concerns too
– RPC classes
– Persistent data (RFiles and ZooKeeper)
▪ Not comprehensive!
– Not all user-facing code is included in the public API yet
• Monitoring UIs and data
• Start/stop scripts
• The Accumulo Shell
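The package rule above can be sketched as a simple name check. This is an illustration only: the helper `in_public_api` is hypothetical, it looks at package names alone (it cannot tell whether a type is actually declared public), and the class names in the usage note are just examples.

```python
PUBLIC_ROOTS = (
    "org.apache.accumulo.core.client",
    "org.apache.accumulo.core.data",
    "org.apache.accumulo.core.security",
    "org.apache.accumulo.minicluster",
)
EXCLUDED_SEGMENTS = {"impl", "thrift", "crypto"}


def in_public_api(class_name: str) -> bool:
    """Return True if a fully qualified class name sits under one of the
    public API packages and no package segment is excluded."""
    package, _, _simple_name = class_name.rpartition(".")
    under_root = any(
        package == root or package.startswith(root + ".")
        for root in PUBLIC_ROOTS
    )
    segments = package.split(".")
    return under_root and not any(seg in EXCLUDED_SEGMENTS for seg in segments)
```

For example, `in_public_api("org.apache.accumulo.core.client.BatchWriter")` returns true, while a name under `org.apache.accumulo.core.client.impl` or `org.apache.accumulo.core.data.thrift` returns false.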
Apache Accumulo and Semantic Versioning
▪ Is it guaranteed that your 1.7.1 application will work against 1.8.0?
▪ What about a 1.6.5 application?
▪ Are you guaranteed to be able to roll back an upgrade from 1.8.0 to 1.7.1?
▪ Is it guaranteed that your 1.8.0 application will work against 1.7.0?
POP QUIZ!
Notable changes currently staged for Apache Accumulo 1.8.0
System Administrator Changes
▪ [ACCUMULO-925] - Launch scripts should use a PIDfile
– New script: start-daemon.sh
– Encapsulates only the things that need to happen on the machine starting a process
• No SSH’ing
– Support for PID files to track processes
– Rotates .out and .err files on start
• Critical for diagnosing JVM-level issues after the fact
Performance!
▪ [ACCUMULO-3423] - Speed up write-ahead log (WAL) roll-overs
– Changes how references to WALs are stored by Accumulo
– Reduces the number of writes when switching to a new WAL
– Uses ZooKeeper to track the state, copies into tablet row before recovery starts
– 10-30% faster than the previous implementation (while exacerbating the problem)
▪ [ACCUMULO-1124] - Optimize index size in RFile
– RFiles have “data” and “index” blocks; the index maps a RowID to the data block containing that RowID
– Large RowIDs bloat the index (e.g. inverted URLs)
– With a bloated index, fewer index blocks can be cached
– Related work: [ACCUMULO-4164] and [ACCUMULO-4314]
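The slides do not describe how the index is made smaller. One general technique for this class of problem, shown here purely as an assumption and not as Accumulo's actual implementation, is to store a short separator that still sorts between the last key of one data block and the first key of the next. The function name and the sample row IDs are hypothetical.

```python
def shortest_separator(last_in_block: bytes, first_in_next: bytes) -> bytes:
    """Return a short key s with last_in_block <= s < first_in_next.

    Falls back to last_in_block unchanged when no shorter key is found.
    """
    limit = min(len(last_in_block), len(first_in_next))
    i = 0
    # Skip the prefix shared by both keys.
    while i < limit and last_in_block[i] == first_in_next[i]:
        i += 1
    if i < limit:
        # Bump the first differing byte if that still sorts below the next key.
        bumped = last_in_block[i] + 1
        if bumped < first_in_next[i]:
            return last_in_block[:i] + bytes([bumped])
    return last_in_block
```

With inverted-URL row IDs such as `com.example.www/archive/...` followed by `com.example.www/news/...`, the separator shrinks to `com.example.www/b`, which is far shorter than either full row ID yet still directs a lookup to the correct block.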
New Features
▪ [ACCUMULO-3913] - Add per table sampling
– Helpful in running analytics over some percentage of the total data
– Can automatically create samples during compaction or on the fly using Iterators
– Configurable hashing to ensure consistency across “index” and “data” tables
• No dangling index records or unreachable data records
– Consider snapshotting a sample of a table: after compaction, it is just a “normal” table
▪ [ACCUMULO-4187] - Rate limiting of major compactions
– Compactions can strain system resources: hardware, JVM and HDFS
– Normally, it is desirable to process compactions as fast as possible
– Can negatively affect low-latency workloads
– Configure a limit in bytes per second that a TabletServer should process during compaction
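For the per-table sampling item above (ACCUMULO-3913), the idea of consistent hashing across an “index” and a “data” table can be sketched as follows. This is a minimal illustration under stated assumptions: the table layouts, the document IDs, the choice of SHA-256, and the helper `in_sample` are all hypothetical and do not reflect Accumulo's sampler classes.

```python
import hashlib


def in_sample(doc_id: str, modulus: int = 4) -> bool:
    """Keep a document when the hash of its ID is divisible by `modulus`
    (roughly one document in `modulus`)."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % modulus == 0


# Hypothetical tables: the index maps term -> doc ID, the data table maps
# doc ID -> content. Every document has exactly one index entry here.
doc_ids = [f"doc{i:03d}" for i in range(40)]
index_entries = [
    ("even" if i % 2 == 0 else "odd", doc) for i, doc in enumerate(doc_ids)
]
data_entries = {doc: f"content of {doc}" for doc in doc_ids}

# Both tables apply the same hash to the same field (the doc ID), so the
# two samples always agree on which documents are in or out.
sampled_index = [(term, doc) for term, doc in index_entries if in_sample(doc)]
sampled_data = {doc: text for doc, text in data_entries.items() if in_sample(doc)}
```

Because the decision depends only on the document ID, every sampled index record points at a sampled data record and every sampled data record remains reachable from the sampled index. Hashing the index table on the term instead would break that property.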
New Features (pt. 2)
▪ [ACCUMULO-3948] - Enable A/B testing of scan iterators on a table
– Classpath context is a definition of JARs which the TabletServer should dynamically load
– Configuration allows a context to be specified when using a [Batch]Scanner
– Multiple implementations of the same SKVIterator classes can co-exist
– Useful in testing new implementations of iterators on real data before switching production
▪ [ACCUMULO-626] - Create an iterator fuzz tester
– Writing SKVIterators is notoriously difficult
– Many common pitfalls and gotchas, often not appearing until “real” use
– A testing framework codifies these edge cases and can automatically test iterators
• Similar to “security fuzzing”
– Users must provide data sets and the expected outcome from using their SKVIterator
– A supplement to unit testing and MiniAccumuloCluster, not a replacement
– Test cases implicitly encourage good design of SKVIterators
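To make the iterator-testing idea above concrete, here is a toy harness in Python. It assumes one well-known pitfall: an iterator can be torn down in the middle of a scan and rebuilt to resume just after the last key it returned, so any in-memory state is lost. The classes and the `check` function are hypothetical stand-ins, not the Accumulo test framework or the SKVIterator interface.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Entry = Tuple[str, str]  # (key, value); input data is sorted by key


class EveryOtherIterator:
    """Buggy example: counts entries in memory, so its output depends on
    where the scan last (re)started."""

    def __init__(self, source: List[Entry]):
        self.source = source

    def scan(self, after_key: str = "") -> Iterator[Entry]:
        count = 0
        for key, value in self.source:
            if key > after_key:
                if count % 2 == 0:
                    yield key, value
                count += 1


class VowelFilterIterator:
    """Stateless example: the decision depends only on the entry itself."""

    def __init__(self, source: List[Entry]):
        self.source = source

    def scan(self, after_key: str = "") -> Iterator[Entry]:
        for key, value in self.source:
            if key > after_key and key in "aeiou":
                yield key, value


def reseek_scan(make_iter: Callable, data: List[Entry]) -> List[Entry]:
    """Rebuild the iterator after every returned entry and resume just
    after the last key, mimicking a teardown in the middle of a scan."""
    out: List[Entry] = []
    last = ""
    while True:
        entry = next(make_iter(data).scan(after_key=last), None)
        if entry is None:
            return out
        out.append(entry)
        last = entry[0]


def check(make_iter: Callable, data: List[Entry],
          expected: List[Entry]) -> Dict[str, bool]:
    """Run the user-supplied data set through each scenario and compare
    the result with the user-supplied expected outcome."""
    return {
        "full_scan": list(make_iter(data).scan()) == expected,
        "reseek": reseek_scan(make_iter, data) == expected,
    }
```

With keys `a` through `f`, the stateful iterator passes the plain scan but fails the re-seek scenario, while the stateless filter passes both. That is the sense in which such test cases push iterator authors toward stateless designs.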
New APIs
▪ [ACCUMULO-2883] - Add API to fetch current locations of Tablets
– Long-standing feature request (order of years)
– Extremely useful for distributed execution engines for locality-aware computation
• Apache Hive, Presto, Apache Drill, Apache Spark, etc.
– Smart placement can reduce client <--> Accumulo network traffic
• Locality with Accumulo Tablets also implies locality with HDFS data (over time)
▪ [ACCUMULO-4165] - Create a user-level API for RFile
– Example of a “glaring” hole in the public API
– Until now, the only stable way to create an RFile was via MapReduce
– Provides a supported API for reading and writing RFiles
– Simplifies implementation and use of RFile access internally too
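The locality-aware scheduling that the tablet location API enables can be sketched as follows. The tablet names, host names, and the shape of the input mapping are hypothetical; this is not the Accumulo API, only an illustration of what an execution engine might do with the locations once it has them.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_host(tablet_locations: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a tablet -> host mapping into host -> sorted list of tablets,
    so each worker can be scheduled next to the tablets it reads."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for tablet, host in tablet_locations.items():
        by_host[host].append(tablet)
    return {host: sorted(tablets) for host, tablets in by_host.items()}


# Hypothetical lookup result for one table split into four tablets.
locations = {
    "tablet-a": "tserver1.example.com",
    "tablet-b": "tserver2.example.com",
    "tablet-c": "tserver1.example.com",
    "tablet-d": "tserver3.example.com",
}
assignments = group_by_host(locations)
```

An engine would then start one task per host and hand it the tablets in `assignments[host]`, keeping reads on the local machine where possible.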
Changes to be wary of
▪ [ACCUMULO-3409] - Move default ports out of ephemeral range
– Traditional ephemeral range on Linux: [32768, 61000]
– Transient connections can prevent processes from starting
– Monitor HTTP port moves from 50095 to 9995
▪ [ACCUMULO-4077] - Upgrade to Apache Thrift 0.9.3
– Thrift is used by Accumulo for RPCs
– Serialized messages are compatible (with caveats) across releases, but Java classes are not
– A massive pain for downstream integrations
– If you require a different version of Thrift and want to use Accumulo 1.8.0, either:
• Shade and relocate your version of Thrift in your application
• Upgrade to Apache Thrift 0.9.3
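For the port change above (ACCUMULO-3409), a quick check shows why the old Monitor port was at risk. The helper name is hypothetical, and the bounds are the traditional Linux range quoted on the slide.

```python
EPHEMERAL_LOW = 32768   # traditional Linux lower bound, from the slide
EPHEMERAL_HIGH = 61000  # traditional Linux upper bound, from the slide


def in_ephemeral_range(port: int,
                       low: int = EPHEMERAL_LOW,
                       high: int = EPHEMERAL_HIGH) -> bool:
    """Return True if the kernel may hand out `port` to an outgoing
    connection, which could block a server from binding to it."""
    return low <= port <= high


old_monitor_port = 50095
new_monitor_port = 9995
```

Here `in_ephemeral_range(old_monitor_port)` is true and `in_ephemeral_range(new_monitor_port)` is false. On Linux the live range can be read from `/proc/sys/net/ipv4/ip_local_port_range`, since administrators may have changed it from the traditional default.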
Thank You
Email: elserj@apache.org
Twitter: @josh_elser
Mailing list: dev@accumulo.apache.org
