This talk covers the challenges we tackled while building our new service-oriented system. It summarizes what we realized would be bad ideas, which approaches to data consistency work better, how we used Apache Avro, and what supporting infrastructure we created to help us achieve the goal of a consistent yet flexible system.
Amihay Zer-Kavod is a Senior Software Architect at LivePerson.
1. Apache Avro in LivePerson
Collecting and saving data is easy
keeping it consistent is tough
DevCon Tlv, June 2014
Amihay Zer-Kavod, Software Architect
2. Who am I?
Amihay Zer-Kavod
Software Architect
Been in software since 1989
4. Communication & Meaning
● Consistent but decoupled communication between services, such as:
o Monitoring, Interaction
o Predictive, Sentiment
o Reporting & Analysis
o History
event
evento
事件
घटना
حدث
ארוע
событие
● Consistent meaning over time
o BigData Store (Hadoop)
o Reporting
5. What can’t we use?
Don’t use Direct APIs!
They are completely wrong for this problem, since:
• They produce too much coupling between services
• APIs are synchronous by nature
• They add irrelevant complexity to the called service
6. So what is needed?
The Message is the API!
● A unified event model (schema) for all reported events
● Management tools for the unified schema
● Tools for sending events over the wire
● Tools for reading/writing events in big data
● Backward and forward compatibility
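To make the contrast with direct APIs concrete, here is a minimal sketch of "the message is the API" using only Python's standard library. The in-process `queue.Queue` stands in for a real message bus, and the event fields and the function names `report_event` and `drain` are hypothetical, not taken from LivePerson's system.

```python
import json
import queue

# Stand-in for a message bus (assumption); a real system would use a broker.
bus = queue.Queue()

def report_event(event_type, body):
    """Producer: publish an event and return without calling any consumer."""
    message = {"header": {"eventType": event_type, "schemaVersion": 1},
               "body": body}
    bus.put(json.dumps(message))

def drain(handlers):
    """Consumer: read queued events and dispatch on the header only."""
    handled = []
    while not bus.empty():
        event = json.loads(bus.get())
        handler = handlers.get(event["header"]["eventType"])
        if handler is not None:  # events nobody subscribed to are skipped
            handled.append(handler(event["body"]))
    return handled

report_event("chat_started", {"visitorId": "v-1"})
report_event("page_view", {"visitorId": "v-1", "url": "/pricing"})

result = drain({"chat_started": lambda body: "chat for " + body["visitorId"]})
print(result)
```

The producer never learns who consumes its events, so adding a new consumer (reporting, history, and so on) requires no change to the reporting service.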
7. The Event model
From generic to specific structure with:
• Common header - data common to all events
• Logical Entities - a common header for all logical entities (such as Visitor)
• Dynamic Specific headers
• Specific Event body
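The layering above can be sketched with plain Python dataclasses. This is only an illustration of the generic-to-specific idea: every class and field name here (`CommonHeader`, `event_type`, `specific_headers`, and so on) is an assumption made for the example, not the real schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class CommonHeader:
    # Data shared by every event (hypothetical fields).
    event_id: str
    timestamp_ms: int
    event_type: str

@dataclass
class Participant:
    # A logical entity taking part in the event, e.g. a Visitor.
    kind: str
    participant_id: str

@dataclass
class Event:
    header: CommonHeader
    participants: List[Participant] = field(default_factory=list)
    specific_headers: Dict[str, Any] = field(default_factory=dict)
    body: Dict[str, Any] = field(default_factory=dict)

def participant_ids(event, kind):
    """Generic consumer logic: needs the common parts only, never the body."""
    return [p.participant_id for p in event.participants if p.kind == kind]

event = Event(
    header=CommonHeader("e-1", 1402000000000, "chat_started"),
    participants=[Participant("Visitor", "v-1"), Participant("Agent", "a-7")],
    body={"channel": "web"},
)
print(participant_ids(event, "Visitor"))
```

A consumer such as `participant_ids` keeps working for any event type, because it depends only on the generic layers.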
8. Apache Avro to the rescue
● Avro - a schema-based serialization/deserialization framework
● Avro IDL - schema definition language
● Avro file - Hadoop integration
● Avro schema resolution
● Apache Avro was created by Doug Cutting
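As a rough illustration of what schema resolution means for records, the sketch below mimics the basic rule in pure Python: fields the reader does not know are dropped, and fields the writer did not send are filled from the reader's defaults. It does not use the Avro library, and the `PageView` schemas and `resolve_record` function are hypothetical.

```python
# Schemas written as Avro-style JSON structures (hypothetical example).
writer_schema = {"type": "record", "name": "PageView", "fields": [
    {"name": "url", "type": "string"},
    {"name": "legacyFlag", "type": "boolean", "default": False},
]}
reader_schema = {"type": "record", "name": "PageView", "fields": [
    {"name": "url", "type": "string"},
    {"name": "referrer", "type": ["null", "string"], "default": None},
]}

def resolve_record(datum, writer, reader):
    """Project a datum written with `writer` onto the `reader` schema."""
    writer_names = {f["name"] for f in writer["fields"]}
    resolved = {}
    for f in reader["fields"]:
        name = f["name"]
        if name in writer_names:
            resolved[name] = datum[name]      # present in both schemas
        elif "default" in f:
            resolved[name] = f["default"]     # missing in writer: use default
        else:
            raise ValueError("no value or default for field " + name)
    return resolved                           # writer-only fields are ignored

written = {"url": "/pricing", "legacyFlag": True}
resolved = resolve_record(written, writer_schema, reader_schema)
print(resolved)  # {'url': '/pricing', 'referrer': None}
```

Real Avro performs this while decoding binary data and also handles nested types, unions, and aliases; the sketch covers only flat records.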
10. Avro IDL - LivePerson Event
/** Base for all LivePerson Events */
@namespace("com.liveperson.global")
record LPEvent {
    /** Common Header of the event */
    CommonHeader header = null;
    /** Logical entity details participating in this event - Visitor, Agent, etc... */
    array<Participant> participants = null;
    /** Holding specific platform info as node name (machine) cluster Id etc... */
    PlatformHeader platformSpecificHeader = null;
    /** Auditing Header, Optional - adds data for auditing of the events flow in the platform */
    union {null, AuditingHeader } auditingHeader = null;
    /** The event body */
    EventBody eventBody = null;
}
11. Backward & Forward Compatibility
Avro schema evolution
● Avro supports resolution between two schemas (the writer's and the reader's)
● You need to follow a set of rules:
● Every field must have a default value
● A field can be added (make sure to give it a default value)
● Field types cannot be changed (add a new field instead)
● Enum symbols can be added but never removed
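Rules like these lend themselves to an automated check before a new schema version is published. The following is a minimal sketch of such a check in pure Python, under the assumption that schemas are flat records given as Avro-style JSON structures; `check_evolution` and the example schemas are hypothetical and are not part of Avro or of the tooling described in the talk.

```python
def _is_enum(t):
    return isinstance(t, dict) and t.get("type") == "enum"

def check_evolution(old, new):
    """Return the list of rule violations when evolving `old` into `new`."""
    problems = []
    old_fields = {f["name"]: f for f in old["fields"]}
    for f in new["fields"]:
        name = f["name"]
        if "default" not in f:
            problems.append("field '%s' has no default value" % name)
        if name not in old_fields:
            continue  # newly added field: only the default rule applies
        old_type, new_type = old_fields[name]["type"], f["type"]
        if _is_enum(old_type) and _is_enum(new_type):
            removed = [s for s in old_type["symbols"]
                       if s not in new_type["symbols"]]
            if removed:
                problems.append("enum field '%s' removed symbols %s"
                                % (name, removed))
        elif old_type != new_type:
            # Stricter than Avro itself, which allows promotions like int -> long.
            problems.append("field '%s' changed type" % name)
    return problems

def channel(symbols):
    return {"type": "enum", "name": "Channel", "symbols": symbols}

v1 = {"fields": [
    {"name": "channel", "type": channel(["WEB", "MOBILE"]), "default": "WEB"},
    {"name": "count", "type": "int", "default": 0},
]}
good_v2 = {"fields": [
    {"name": "channel", "type": channel(["WEB", "MOBILE", "SMS"]), "default": "WEB"},
    {"name": "count", "type": "int", "default": 0},
    {"name": "note", "type": ["null", "string"], "default": None},
]}
bad_v2 = {"fields": [
    {"name": "channel", "type": channel(["WEB"]), "default": "WEB"},
    {"name": "count", "type": "long", "default": 0},
    {"name": "note", "type": "string"},
]}

print(check_evolution(v1, good_v2))  # []
for problem in check_evolution(v1, bad_v2):
    print(problem)
```

Running a check of this kind in the build would reject `bad_v2` for all three reasons: a removed enum symbol, a changed field type, and a new field without a default.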
13. How well does it work?
● Cyber Monday 2013 (one day)
o More than 320,000 events per second
o 7 Storm topologies consuming the events within seconds of real time
o 2TB of data saved to Hadoop
● 2014 preparation:
o Double the number of events per second, to ~640,000
14. So how did we do it?
1. Use an event-driven system; don’t use direct APIs
2. Create a unified schema for all events
3. Use Avro to implement the schema
4. Add some supporting infrastructure