Apache Accumulo 1.8.0 Overview
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Accumulo 1.8.0
First release candidate in the works
A “minor” release, but significantly more work required than a “patch” release
– ContinuousIngest and verification
– RandomWalk
Long time coming...
Semantic Versioning
Defines a set of rules for software projects to adhere to across different versions.
Clear understanding of compatibility
Rules are defined in terms of a “public API”
– Defined by the project adopting SemVer
Major
– Incompatible changes, deprecations removed
Minor
– Backwards-compatible features added
Patch
– Backwards-compatible bug-fixes only (no features)
http://semver.org - major.minor.patch
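The three rules above can be sketched as a small compatibility check. This is an illustration only, not part of Accumulo: the class name `SemVer` and the method `guaranteedCompatible` are hypothetical, and the check assumes plain `major.minor.patch` strings and a major version of at least 1. It covers only the public API guarantee.

```java
/** Minimal sketch of the SemVer API-compatibility rule (hypothetical helper, not an Accumulo class). */
final class SemVer {
    final int major, minor, patch;

    SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    /** Parses a plain "major.minor.patch" string (no pre-release or build tags). */
    static SemVer parse(String s) {
        String[] p = s.split("\\.");
        if (p.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + s);
        }
        return new SemVer(Integer.parseInt(p[0]), Integer.parseInt(p[1]), Integer.parseInt(p[2]));
    }

    /**
     * True when code written against the public API of builtAgainst is covered
     * by the SemVer guarantee when run against runtime: the major version is
     * the same and the runtime's minor version is not older. Patch releases do
     * not change the API, so the patch number is ignored here.
     * Assumption: the 0.y.z "anything may change" case is not handled.
     */
    static boolean guaranteedCompatible(String builtAgainst, String runtime) {
        SemVer b = parse(builtAgainst);
        SemVer r = parse(runtime);
        return b.major == r.major && r.minor >= b.minor;
    }
}
```

Under this reading, `SemVer.guaranteedCompatible("1.7.1", "1.8.0")` returns true, while `SemVer.guaranteedCompatible("1.8.0", "1.7.0")` returns false, because a newer minor release may have added API that the older one lacks.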
Apache Accumulo and Semantic Versioning
Apache Accumulo defines a public API
– Made up of Java classes, defined by packages
– The goal is to describe how user code should function across releases
– Recursively, all public types in the following packages (excluding any sub-package named impl, thrift, or crypto)
• org.apache.accumulo.core.{client,data,security}
• org.apache.accumulo.minicluster
Other concerns for compatibility too
– RPC classes
– Persistent data (RFiles and ZooKeeper)
Not comprehensive!
– Not all user facing code is yet included in the public API
• Monitoring UIs and data
• Start/stop scripts
• The Accumulo Shell
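The package rule from this slide can be expressed as a name check. The sketch below is a hypothetical helper, not something Accumulo ships: it looks only at the fully qualified class name (it cannot tell whether the type itself is declared public) and it assumes top-level class names, so nested classes written with dots would be misread as packages.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: does a fully qualified class name fall under the public API definition? */
final class PublicApiCheck {
    // Package roots listed in the API definition on the slide.
    private static final List<String> ROOTS = Arrays.asList(
        "org.apache.accumulo.core.client",
        "org.apache.accumulo.core.data",
        "org.apache.accumulo.core.security",
        "org.apache.accumulo.minicluster");

    // Package names that are excluded wherever they appear below a root.
    private static final List<String> EXCLUDED = Arrays.asList("impl", "thrift", "crypto");

    static boolean isPublicApi(String className) {
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            return false; // no package at all, so it cannot be under a root
        }
        String pkg = className.substring(0, lastDot);
        for (String root : ROOTS) {
            if (pkg.equals(root) || pkg.startsWith(root + ".")) {
                // Inspect only the package segments below the root.
                String rest = pkg.substring(root.length());
                for (String segment : rest.split("\\.")) {
                    if (EXCLUDED.contains(segment)) {
                        return false;
                    }
                }
                return true;
            }
        }
        return false;
    }
}
```

For example, `org.apache.accumulo.core.client.Connector` passes the check, while anything under `org.apache.accumulo.core.client.impl` does not.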
Apache Accumulo and Semantic Versioning
Is it guaranteed that your application from 1.7.1 will work against 1.8.0?
What about a 1.6.5 application?
Are you guaranteed to be able to roll back an upgrade from 1.8.0 to 1.7.1?
Is it guaranteed that your 1.8.0 application will work against 1.7.0?
POP QUIZ!
Notable changes currently staged for Apache Accumulo 1.8.0
System Administrator Changes
[ACCUMULO-925] - Launch scripts should use a PIDfile
– New script: start-daemon.sh
– Encapsulates only the things that need to happen on the machine starting a process
• No SSH’ing
– Support for PID files to track processes
– Rotating .out and .err files on start
• Critical for diagnosing JVM-level issues after the fact
Performance!
[ACCUMULO-3423] - Speed up write-ahead log (WAL) roll-overs
– Changes how references to WALs are stored by Accumulo
– Reduces the number of writes when switching to a new WAL
– Uses ZooKeeper to track the state, copies into tablet row before recovery starts
– 10-30% faster than the previous implementation (measured while exacerbating the problem)
[ACCUMULO-1124] - Optimize index size in RFile
– RFiles have “data” and “index” blocks; index from RowID to data block containing that RowID
– Large RowIDs bloat the index (e.g. inverted URL)
– Fewer index blocks can be cached
– Related work: [ACCUMULO-4164] and [ACCUMULO-4314]
New Features
[ACCUMULO-3913] - Add per table sampling
– Helpful in running analytics over some percentage of the total data
– Can automatically create samples during compaction or on the fly using Iterators
– Configurable hashing to ensure consistency across “index” and “data” tables
• No dangling index records or unreachable data records
– Consider snapshotting a sample of a table: after compaction, it is just a “normal” table
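The consistency point can be illustrated with a hash-modulus sampler. This is a sketch of the general idea only, not Accumulo's sampler API: the class `HashSampler` is hypothetical, and CRC32 stands in for whatever hash function is actually configured. The key property is that the keep-or-drop decision depends only on the bytes being hashed.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Sketch of hash-modulus sampling; CRC32 is a stand-in for the configurable hash. */
final class HashSampler {
    private final int modulus;

    HashSampler(int modulus) {
        if (modulus <= 0) {
            throw new IllegalArgumentException("modulus must be positive");
        }
        this.modulus = modulus;
    }

    /** Keeps a key when its hash is divisible by the modulus; the decision depends only on the key. */
    boolean accept(String sampleKey) {
        CRC32 crc = new CRC32();
        crc.update(sampleKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue() % modulus == 0;
    }
}
```

If both the “index” table and the “data” table hash the same field (say, a document ID, which is an assumed layout here), then an index entry is in the sample exactly when the document it points to is in the sample, which is what rules out dangling or unreachable records.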
[ACCUMULO-4187] - Rate limiting of major compactions
– Compactions can strain system resources: hardware, JVM and HDFS
– Normally, desirable to process compactions as fast as possible
– Can negatively affect low-latency workloads
– Configure a limit, in bytes per second, on what a TabletServer should process during compaction
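A bytes-per-second limit of this kind can be sketched as a simple throttle. The class below is hypothetical and is not how Accumulo implements the feature: it only shows the arithmetic of working out how long a writer should pause so that its average throughput stays at or below the configured rate. The clock is passed in as a parameter to keep the logic easy to test.

```java
/** Sketch of a bytes-per-second throttle (hypothetical, not Accumulo's implementation). */
final class ByteRateLimiter {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    private final long bytesPerSecond;
    private final long startNanos;
    private long bytesSoFar;

    ByteRateLimiter(long bytesPerSecond, long startNanos) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.startNanos = startNanos;
    }

    /**
     * Records that the given number of bytes were processed and returns how many
     * nanoseconds the caller should pause so the average rate stays within the limit.
     */
    long pauseNanosAfter(long bytes, long nowNanos) {
        bytesSoFar += bytes;
        // Split into whole seconds plus a remainder so the multiplication by
        // NANOS_PER_SECOND does not overflow for multi-gigabyte totals.
        long wholeSeconds = bytesSoFar / bytesPerSecond;
        long remainderBytes = bytesSoFar % bytesPerSecond;
        long earliestNanos = startNanos
            + wholeSeconds * NANOS_PER_SECOND
            + remainderBytes * NANOS_PER_SECOND / bytesPerSecond;
        return Math.max(0L, earliestNanos - nowNanos);
    }
}
```

A caller would create the limiter with `System.nanoTime()` as the start, then after writing each block call `pauseNanosAfter(blockSize, System.nanoTime())` and sleep for the returned duration. With no pause ever required, the limiter has no effect, which matches the default preference for compacting as fast as possible.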
New Features (pt.2)
[ACCUMULO-3948] - Enable A/B testing of scan iterators on a table
– Classpath context is a definition of JARs which the TabletServer should dynamically load
– Configuration allows a context to be specified when using a [Batch]Scanner
– Multiple implementations of the same SKVIterator classes can co-exist
– Useful in testing new implementations of iterators on real data before switching production
[ACCUMULO-626] - Create an iterator fuzz tester
– Writing SKVIterators is notoriously difficult
– Many common pitfalls and gotchas, often not appearing until “real” use
– A testing framework codifies these edge cases and can automatically test iterators
• Similar to “security fuzzing”
– Users must provide data sets and the expected outcome from using their SKVIterator
– A supplement to unit testing and MiniAccumuloCluster, not a replacement
– Test cases implicitly encourage good design of SKVIterators
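One well-known edge case of this kind is that a server-side iterator can be discarded mid-scan and a fresh instance re-seeked just past the last key returned, so any state kept between calls is lost. The sketch below shows how a harness might check for that. Everything here is a hypothetical simplification: `SimpleIter` is a toy stand-in for the real SKVIterator interface, keys are plain strings, and this is not the API of the framework added by ACCUMULO-626.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.function.Supplier;

/** Hypothetical, much-simplified stand-in for a sorted key iterator. */
interface SimpleIter {
    void seek(String start, boolean inclusive); // position at the first key >= (or >) start
    boolean hasTop();
    String top();
    void next();
}

/** Trivial iterator under test: walks a sorted set of keys. */
final class SetIter implements SimpleIter {
    private final NavigableSet<String> data;
    private String top;

    SetIter(NavigableSet<String> data) {
        this.data = data;
    }

    public void seek(String start, boolean inclusive) {
        top = inclusive ? data.ceiling(start) : data.higher(start);
    }

    public boolean hasTop() {
        return top != null;
    }

    public String top() {
        return top;
    }

    public void next() {
        top = data.higher(top);
    }
}

final class TeardownHarness {
    /** Reads everything in a single pass with one iterator instance. */
    static List<String> readAll(Supplier<SimpleIter> factory) {
        SimpleIter it = factory.get();
        it.seek("", true);
        List<String> out = new ArrayList<>();
        while (it.hasTop()) {
            out.add(it.top());
            it.next();
        }
        return out;
    }

    /** Reads the same data but throws the iterator away after every key and re-seeks. */
    static List<String> readWithTeardown(Supplier<SimpleIter> factory) {
        List<String> out = new ArrayList<>();
        SimpleIter it = factory.get();
        it.seek("", true);
        while (it.hasTop()) {
            String last = it.top();
            out.add(last);
            it = factory.get();   // simulate the iterator being torn down
            it.seek(last, false); // resume just after the last key returned
        }
        return out;
    }
}
```

A test would compare both results against the expected outcome supplied by the user. A correct iterator produces the same list either way, whereas one that depends on state carried across calls to `next()` typically diverges in the teardown run.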
New APIs
[ACCUMULO-2883] - Add API to fetch current locations of Tablets
– Long-standing feature request (order of years)
– Extremely useful for distributed execution engines for locality aware computation
• Apache Hive, Presto, Apache Drill, Apache Spark, etc.
– Smart placement can reduce client <--> Accumulo network traffic
• Locality with Accumulo Tablets also implies locality with HDFS data (over time)
[ACCUMULO-4165] - Create a user level API for RFile
– Example of a “glaring” hole in the public API
– Previously, the only stable way to create an RFile was via MapReduce
– Provides a supported API for reading and writing RFiles
– Simplifies implementation and use of RFile access internally too
Changes to be wary of
[ACCUMULO-3409] - Move default ports out of ephemeral range
– Traditional ephemeral range on Linux: [32768, 61000]
– Transient connections can prevent processes from starting
– Monitor HTTP port moves from 50095 to 9995
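The reasoning behind the port move can be checked with a one-line range test. The helper below is hypothetical and hard-codes the traditional range quoted above; the range actually in effect on a given Linux host is configurable and can be read from `/proc/sys/net/ipv4/ip_local_port_range`.

```java
/** Sketch: is a port inside the traditional Linux ephemeral range quoted on the slide? */
final class EphemeralPorts {
    static final int LOW = 32768;
    static final int HIGH = 61000;

    static boolean inEphemeralRange(int port) {
        return port >= LOW && port <= HIGH;
    }
}
```

The old Monitor port 50095 falls inside the range, so the kernel may hand it to an outgoing connection before the Monitor starts, while the new port 9995 falls outside it.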
[ACCUMULO-4077] - Upgrade to Apache Thrift 0.9.3
– Thrift is used by Accumulo for RPCs
– Serialized messages are compatible (with caveats) across releases, but Java classes are not
– A massive pain for downstream integrations
– If you require a different version of Thrift and want to use Accumulo 1.8.0
• Shade+Relocate your version of Thrift in your application
• Upgrade to Apache Thrift 0.9.3
Thank You
Email: elserj@apache.org
Twitter: @josh_elser
Mailing list: dev@accumulo.apache.org