Talk Abstract
An overview of the new client API planned for the release of Accumulo 2.0. Problems with the old API are described, along with lessons learned. The benefits of the new API are explained, and code snippets are provided to demonstrate the overall design and contrast it with the old API, in order to assist users interested in transitioning.
Speaker
Christopher Tubbs
Computer Systems Researcher, National Security Agency
Christopher is a researcher with the National Security Agency. He holds a Bachelor's degree in both Physics and Computer Science from Eastern Michigan University. He is an open source enthusiast and advocate for open source development, as well as data privacy and security. He has been contributing to the Accumulo project since 2009, prior to its release to the Apache Software Foundation in 2011. He is currently a committer and PMC member on the project, and an ASF member, as well as the Accumulo package maintainer for the Fedora project.
3. Version Philosophy
Old
1.x.y
x: major*
y: minor/bugfixes*
* habit of removing deprecated code arbitrarily
New (1.6.2+)
x.y.z
x: major
y: minor
z: patch (bugfix)
Semantic Versioning 2.0
(http://semver.org/)
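As a rough illustration of the x.y.z scheme above, the following sketch reports which component changed between two version strings. The class `SemVer` and its methods are hypothetical and not part of Accumulo; pre-release suffixes such as `-alpha-1` are deliberately not handled.

```java
final class SemVer {
  final int major;
  final int minor;
  final int patch;

  SemVer(int major, int minor, int patch) {
    this.major = major;
    this.minor = minor;
    this.patch = patch;
  }

  // Parses a plain "x.y.z" string; any other shape is rejected.
  static SemVer parse(String s) {
    String[] parts = s.split("\\.", -1);
    if (parts.length != 3) {
      throw new IllegalArgumentException("expected x.y.z but got: " + s);
    }
    return new SemVer(Integer.parseInt(parts[0]),
        Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
  }

  // Names the most significant component that differs between two versions.
  static String changeType(SemVer from, SemVer to) {
    if (from.major != to.major) return "major";
    if (from.minor != to.minor) return "minor";
    if (from.patch != to.patch) return "patch";
    return "none";
  }

  public static void main(String[] args) {
    System.out.println(changeType(parse("1.6.2"), parse("1.6.3")));
    System.out.println(changeType(parse("1.6.2"), parse("1.7.0")));
    System.out.println(changeType(parse("1.7.0"), parse("2.0.0")));
  }
}
```

Under Semantic Versioning, only a "major" result should signal that public API may have been removed or changed incompatibly.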
4. Background: 1.x API
● Focus (or lack thereof)
○ Function > Usability
○ Limited forethought for integration
● Current API
○ a gradual evolution
○ biggest redesign in 2009
■ Instance / Connector
■ Permissions / Authenticator
○ lots of feature additions, deprecations, removals, but few fundamental design changes since
5. Background: 1.x API (cont.)
Public API
● public and protected
○ in org.apache.accumulo.core.client
■ everything but impl packages
○ in org.apache.accumulo.core.data
■ Key, Mutation, Value, Range
■ Condition and ConditionalMutation (1.6+)
○ in org.apache.accumulo.minicluster
■ everything but impl packages
6. Background: 1.x API (cont.)
Public API (1.7)
● public and protected
○ org.apache.accumulo.core.client
○ org.apache.accumulo.core.data
○ org.apache.accumulo.core.security
○ org.apache.accumulo.minicluster
● all but
○ *impl*, *thrift*, *crypto*
7. Lessons Learned
1. Confusing entry point
Instance i = new ZooKeeperInstance(…);
Connector c;
// c = new Connector(i, user, pass);
c = i.getConnector(user, pass);
2. Too many overloaded methods
BatchWriterConfig bwConf = new BatchWriterConfig();
BatchWriter bw = c.createBatchWriter(table, bwConf);
12. What About Exceptions?
Current Problems:
... throws TableNotFoundException,
AccumuloSecurityException,
AccumuloException;
With Java 7, this gets a little better:
catch (AccumuloSecurityException |
AccumuloException e) { … }
13. Exception Hierarchy
Better:
public class TableNotFound
extends AccumuloException {}
--------------------------------------------
try (
AccumuloClient client =
Accumulo.client().build()) {
/* do work with the client */
} catch (AccumuloException e) {
…
} catch (YourCodeException e) {…}
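To make the benefit of a single exception hierarchy concrete, here is a minimal, self-contained sketch. The classes below are local stand-ins that only mirror the names on the slide; they are not the real Accumulo classes, and `scan` is a hypothetical operation invented for illustration.

```java
final class ExceptionHierarchyDemo {
  // Stand-in for the base exception; NOT the real Accumulo class.
  static class AccumuloException extends Exception {
    AccumuloException(String message) {
      super(message);
    }
  }

  // Mirrors the slide: a specific failure is a subtype of the base exception.
  static class TableNotFound extends AccumuloException {
    TableNotFound(String table) {
      super("table not found: " + table);
    }
  }

  // Hypothetical operation that only needs to declare the base type.
  static void scan(String table) throws AccumuloException {
    if (!"known".equals(table)) {
      throw new TableNotFound(table);
    }
  }

  // Callers can handle the specific subtype first, or just the base type.
  static String describe(String table) {
    try {
      scan(table);
      return "ok";
    } catch (TableNotFound e) {
      return "missing";
    } catch (AccumuloException e) {
      return "failed";
    }
  }

  public static void main(String[] args) {
    System.out.println(describe("known"));
    System.out.println(describe("other"));
  }
}
```

Because every specific failure extends one base type, a method signature needs a single `throws` clause, and callers who do not care about the distinction can drop the first `catch` entirely.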
14. Leaking
Current Problems:
● Leaking non-public (implementation) classes
○ apilyzer-maven-plugin
○ Problem: requires users to instantiate, assign, or pass non-public classes in normal use
● Exposing too much implementation
○ MapReduce classes
○ Problem: makes it difficult to extend or evolve internals without affecting users
15. Dependency Exposure
Current Problems:
● Dependencies on unstable third-party classes
○ Guava “@Beta”-annotated classes
○ Hadoop “@LimitedPrivate”-annotated classes
● Dependencies with lots of transitive deps
○ Hadoop “Text”,
○ “Writable” for serialization
● RPC serialization library in public API
○ Thrift
16. Parameter Problems
Current Problems:
● Exposing implementation-specific classes
○ log4j “Level”
○ prevents using log4j2, slf4j, and logback
● “stringly”-typed object parameters
○ table
○ tableName
○ tableId
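One common remedy for "stringly"-typed parameters is to wrap each kind of string in its own small type, so the compiler can tell a table name from a table ID. The sketch below assumes hypothetical `TableName` and `TableId` wrappers and a hypothetical `lookup` method; none of these are taken from the Accumulo API.

```java
import java.util.Objects;

final class TypedNamesDemo {
  // Hypothetical wrapper for a user-facing table name.
  static final class TableName {
    private final String value;

    private TableName(String value) {
      this.value = Objects.requireNonNull(value);
    }

    static TableName of(String value) {
      return new TableName(value);
    }

    @Override
    public String toString() {
      return value;
    }
  }

  // Hypothetical wrapper for an internal table ID.
  static final class TableId {
    private final String value;

    private TableId(String value) {
      this.value = Objects.requireNonNull(value);
    }

    static TableId of(String value) {
      return new TableId(value);
    }

    @Override
    public String toString() {
      return value;
    }
  }

  // With distinct types, the two overloads can no longer be confused.
  static String lookup(TableName name) {
    return "by-name:" + name;
  }

  static String lookup(TableId id) {
    return "by-id:" + id;
  }

  public static void main(String[] args) {
    System.out.println(lookup(TableName.of("trades")));
    System.out.println(lookup(TableId.of("2b")));
  }
}
```

Passing a raw `String` to `lookup` no longer compiles, which turns a class of runtime mix-ups into compile-time errors.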
17. Encoding Problems
Current Problems:
● Failure to specify internal encoding
● serialize/deserialize mismatch
● UTF-8 or user-specified?
● Overloaded methods again
● Unexpected characters (Authorizations)
● The Accumulo shell (jline)
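The serialize/deserialize mismatch listed above can be reproduced with the standard library alone. This sketch uses a hypothetical `roundTrip` helper and an arbitrary sample string; it only illustrates the general failure mode and does not show Accumulo's internal behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
  // Encodes with one charset and decodes with another.
  static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
    byte[] bytes = text.getBytes(encodeWith);
    return new String(bytes, decodeWith);
  }

  public static void main(String[] args) {
    // Sample label containing a non-ASCII character (e with acute accent).
    String label = "caf\u00e9";

    // Matching charsets: the text survives the round trip.
    String same = roundTrip(label, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
    System.out.println(label.equals(same));

    // Mismatched charsets: the two UTF-8 bytes of the accented character
    // are decoded as two separate ISO-8859-1 characters.
    String mixed = roundTrip(label, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
    System.out.println(label.equals(mixed));
  }
}
```

Stating one encoding explicitly at every `getBytes` and `new String` call, rather than relying on the platform default, avoids this kind of silent corruption.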
19. API-only Artifact
● accumulo-api.jar (new!)
○ org.apache.accumulo.api
○ no dependencies on other accumulo jars
■ use Java’s ServiceLoader to bind to impl
○ minimal dependencies on stable libraries
■ commons
■ guava
● Not in accumulo-core.jar
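The ServiceLoader binding mentioned above might look roughly like the following. The `ClientFactory` interface and `findImplementation` method are hypothetical names chosen for this sketch; only `java.util.ServiceLoader` itself is a real API. An implementation jar would register its provider in a `META-INF/services` file named after the interface.

```java
import java.util.Optional;
import java.util.ServiceLoader;

final class ServiceLoaderDemo {
  // Hypothetical API-side interface; an implementation jar would provide it.
  public interface ClientFactory {
    String name();
  }

  // Returns the first provider found on the classpath, if any.
  static Optional<ClientFactory> findImplementation() {
    for (ClientFactory factory : ServiceLoader.load(ClientFactory.class)) {
      return Optional.of(factory);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Optional<ClientFactory> impl = findImplementation();
    if (impl.isPresent()) {
      System.out.println("bound to " + impl.get().name());
    } else {
      // Expected when no implementation jar is on the classpath.
      System.out.println("no implementation found");
    }
  }
}
```

The API jar never names an implementation class, so the implementation can change, or be swapped for another, without recompiling user code.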
20. 2.0.0 API Statement
Public API (new!)
● public and protected
○ org.apache.accumulo.api
Alternatively:
● public and protected
○ accumulo-api.jar
21. Goals: A Summary
● Improved API stability
● Compatibility (semver)
○ Easy to check
○ Easy to track changes
● Helps users manage dependencies
● Separate API from implementation
● Possible ability to swap out implementation (mock replacement? in-process impl?)
● Intuitive “front-door”
● Fluent usage
● Resource management
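The last three goals (a single "front door", fluent usage, and resource management) can be sketched together. Everything here is hypothetical: `DemoClient`, `Builder`, and the `forInstance`/`as` methods are invented for illustration and are not the Accumulo 2.0 API.

```java
final class FluentClientDemo {
  // Hypothetical client that releases its resources when closed.
  static final class DemoClient implements AutoCloseable {
    final String instance;
    final String user;
    private boolean closed;

    private DemoClient(String instance, String user) {
      this.instance = instance;
      this.user = user;
    }

    boolean isClosed() {
      return closed;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Hypothetical fluent builder; each setter returns the builder itself.
  static final class Builder {
    private String instance = "default";
    private String user = "anonymous";

    Builder forInstance(String name) {
      this.instance = name;
      return this;
    }

    Builder as(String userName) {
      this.user = userName;
      return this;
    }

    DemoClient build() {
      return new DemoClient(instance, user);
    }
  }

  // The single entry point from which everything else is reached.
  static Builder client() {
    return new Builder();
  }

  public static void main(String[] args) {
    DemoClient seen;
    try (DemoClient c = client().forInstance("test").as("root").build()) {
      seen = c;
      System.out.println(c.user + "@" + c.instance);
    }
    // try-with-resources has already called close() at this point.
    System.out.println("closed: " + seen.isClosed());
  }
}
```

A builder replaces a family of overloaded constructors with named steps, and implementing `AutoCloseable` lets try-with-resources handle cleanup.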
22. Release plan
Steps
● Finish implementation
● Initial reviews
● 2.0.0-alpha-1
○ Developer preview released to get feedback
● 2.0.0-beta-1 ?
○ Possibly another developer preview after stabilizing API changes
● 2.0.0 final release (Summer?)