3 apache-avro

Apache Avro

Zafar Gilani
Muhammad Adnan Khan
Hui Shang

Outline
• Overview
• Comparison
• Specification
• SASL profile and usage
• References

Overview
• A data serialization system.
• An RPC framework.
• For: data storage and communication.
• Purpose:
– Provide rich data structures.
– A compact and fast binary data format.
– Simple integration with dynamic languages.

Overview
• Avro uses JSON as its interface description language (IDL):
– To specify data types.
– To specify protocols.
• Review: JavaScript Object Notation (JSON) is a lightweight, text-based standard for data interchange.

Why the need for Avro?
• Primarily used in Hadoop, where it provides a standard:
1. Serialization format for persistent data.
2. Wire format for communication:
– among Hadoop nodes.
– from client programs to Hadoop services.

Overview
• Avro relies on schemas.
– Schema stored with data.
– Each datum written with no per-value overheads.
• Thus serialization is fast and the serialized data is small.
• Avro in RPC:
– Schema exchange during client-server handshake.
– Correspondence between same-named, missing, and extra fields can be easily resolved.
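
The "no per-value overheads" point can be illustrated with a minimal sketch of a binary encoding for a one-field record. It assumes the rules from the Avro specification (longs as zig-zag variable-length integers, strings as a length followed by UTF-8 bytes). The function names are hypothetical and are not the Avro library's API.

```python
import json


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag variable-length integer."""
    # Zig-zag maps small magnitudes (positive or negative) to small codes.
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its byte length (a long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_greeting(record: dict) -> bytes:
    """Hypothetical encoder for a record with a single string field, "message".

    The schema already says which fields exist and in what order, so only
    the value is written: no field name and no type tag.
    """
    return encode_string(record["message"])


greeting = {"message": "hello"}
binary = encode_greeting(greeting)
as_json = json.dumps(greeting).encode("utf-8")
print(len(binary), len(as_json))  # prints: 6 20
```

The binary form carries one length byte plus the five bytes of "hello", while the JSON text repeats the field name and punctuation in every datum.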

APIs
• APIs are available for:
– Java
– C
– C++
– C#
– Python
– Ruby

Comparison with other systems
• Avro vs. Protobuf and Thrift.
• A quick note about Thrift:
– Initially developed at Facebook by a former Google intern.
– Similar in design to Google’s Protocol Buffers (protobuf).

Comparison with other systems
                 Avro                 Google protobuf              Thrift

Implementation   Hmm..                Cleaner                      Hmm..

Error handling   Complex              Simple                       OK

Extensibility    Hmm..                Richer                       OK

Compatibility    Java, C, C++, C#,    Those and many more, such    About the same
                 Python and Ruby      as Adobe ActionScript and    as protobuf
                                      Microsoft Silverlight

Specification
• A schema is represented in JSON by one of:
– A JSON string, naming a defined type.
– A JSON object of the form:
• {"type": "typeName" ...attributes...}
– A JSON array, representing a union of types.
• Primitive types: null, boolean, int, long, float, double, bytes, string
– Example: {"type": "string"}
• Complex types: records, enums, arrays, maps, unions, fixed
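
The three schema forms above can be told apart with a few lines of code. The following is a rough sketch using only Python's standard `json` module. The `describe` helper is hypothetical and is not part of any Avro library, and it does no validation beyond looking at the JSON shape.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_OR_CONTAINER = {"record", "enum", "array", "map", "fixed"}


def describe(schema) -> str:
    """Return a short description of a parsed Avro schema (illustrative only)."""
    if isinstance(schema, str):
        # JSON string: a primitive name or a reference to a defined type.
        return schema if schema in PRIMITIVES else f"reference to {schema}"
    if isinstance(schema, list):
        # JSON array: a union of the listed schemas.
        return "union[" + ", ".join(describe(s) for s in schema) + "]"
    if isinstance(schema, dict):
        # JSON object: {"type": "typeName", ...attributes...}
        type_name = schema["type"]
        if isinstance(type_name, str) and type_name in NAMED_OR_CONTAINER:
            name = schema.get("name")
            return f"{type_name} {name}" if name else type_name
        return describe(type_name)
    raise TypeError(f"not a schema: {schema!r}")


print(describe(json.loads('"string"')))
print(describe(json.loads('{"type": "string"}')))
print(describe(json.loads('["null", "string"]')))
print(describe(json.loads(
    '{"type": "record", "name": "Greeting",'
    ' "fields": [{"name": "message", "type": "string"}]}'
)))
```

Note that `"string"` and `{"type": "string"}` describe the same primitive type, and the array form is how an optional value (null or string) would typically be written.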

Specification, example protocol
{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",

  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],

  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
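
Because the protocol is plain JSON, it can be inspected with any JSON parser. The sketch below uses only Python's standard library to list the messages declared in the example protocol. The `summarize` helper and the output format are assumptions made for illustration and are not an Avro API.

```python
import json

PROTOCOL_JSON = """
{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",
  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],
  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
"""


def summarize(protocol: dict) -> list:
    """Return one signature-like line per message in the protocol."""
    lines = []
    for name, message in protocol["messages"].items():
        params = ", ".join(f'{p["name"]}: {p["type"]}' for p in message["request"])
        errors = ", ".join(message.get("errors", [])) or "none"
        lines.append(f'{name}({params}) -> {message["response"]}, errors: {errors}')
    return lines


protocol = json.loads(PROTOCOL_JSON)
full_name = protocol["namespace"] + "." + protocol["protocol"]
print(full_name)
for line in summarize(protocol):
    print(line)
```

Read this way, the protocol declares a single call, `hello`, that takes a `Greeting`, returns a `Greeting`, and may fail with a `Curse` error.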

SASL profile
• SASL: Simple Authentication and Security Layer.
• Provides a framework for:
– Authentication.
– Security of network protocols.

SASL usage
• Negotiation over connection-oriented Avro RPC; each message carries a status code:
– 0: START. Used in a client's initial message.
– 1: CONTINUE. Used while negotiation is ongoing.
– 2: FAIL. Terminates negotiation unsuccessfully.
– 3: COMPLETE. Terminates negotiation successfully.
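
The four status codes can be modelled as a small enumeration. The sketch below is an illustration only: the `SaslStatus` and `is_well_formed` names are hypothetical, and the ordering rule is an assumption read off the descriptions above (START first, CONTINUE in between, FAIL or COMPLETE last). It does not implement any SASL mechanism or wire framing.

```python
from enum import IntEnum


class SaslStatus(IntEnum):
    START = 0      # client's initial message
    CONTINUE = 1   # negotiation is ongoing
    FAIL = 2       # negotiation ended unsuccessfully
    COMPLETE = 3   # negotiation ended successfully


def is_well_formed(statuses) -> bool:
    """Check that a sequence of status codes looks like one full negotiation.

    Assumed rule: it begins with START, ends with FAIL or COMPLETE,
    and every message in between is CONTINUE.
    """
    seq = [SaslStatus(s) for s in statuses]
    if len(seq) < 2 or seq[0] is not SaslStatus.START:
        return False
    if seq[-1] not in (SaslStatus.FAIL, SaslStatus.COMPLETE):
        return False
    return all(s is SaslStatus.CONTINUE for s in seq[1:-1])


print(is_well_formed([0, 1, 1, 3]))  # START, CONTINUE, CONTINUE, COMPLETE
print(is_well_formed([1, 3]))        # does not begin with START
```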

References
1. Apache Avro,
http://avro.apache.org/docs/current/
2. Google protocol buffers vs Apache Avro,
http://www.sammur.com/?p=36
3. Avro vs Thrift,
http://tech.puredanger.com/2011/05/27/serialization-comparison/
4. SASL,
http://avro.apache.org/docs/current/sasl.html
