Generic presentation about Big Data architecture and components. This presentation was delivered by David Pilato and Tugdual Grall during JUG Summer Camp 2015 in La Rochelle, France.
26. Data Processing
• Transform the data
• Enrich the data
• Examples:
– Store data in multiple formats
– Aggregate data
– Build recommendations
– …
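The transform, enrich, and aggregate steps above can be sketched in plain Python. This is only a minimal illustration: the event fields, the `COUNTRY_NAMES` lookup table, and the function names are all assumptions made up for this example, not something taken from the talk.

```python
from collections import defaultdict

# Hypothetical raw events; the field names are assumptions for illustration.
raw_events = [
    {"user": "alice", "country": "fr", "amount": "12.50"},
    {"user": "bob", "country": "us", "amount": "7.00"},
    {"user": "carol", "country": "fr", "amount": "3.25"},
]

# Hypothetical reference data used for enrichment.
COUNTRY_NAMES = {"fr": "France", "us": "United States"}


def transform(event):
    """Normalize types: parse the amount string into a float."""
    return {**event, "amount": float(event["amount"])}


def enrich(event):
    """Add a field derived from the reference data."""
    name = COUNTRY_NAMES.get(event["country"], "Unknown")
    return {**event, "country_name": name}


def aggregate(events):
    """Sum amounts per country name."""
    totals = defaultdict(float)
    for event in events:
        totals[event["country_name"]] += event["amount"]
    return dict(totals)


processed = [enrich(transform(e)) for e in raw_events]
totals_by_country = aggregate(processed)
```

Keeping each step as a small pure function makes it easy to reorder or reuse the steps, which is the same idea the processing engines on the following slides apply at scale.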
27. MapReduce Processing Model
• Define mappers
• Shuffling is automatic
• Define reducers
• For complex work, chain jobs together
– Use a higher-level language or DSL that does this for you
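The model above can be sketched as a single-process word count in plain Python. This is a teaching sketch, not Hadoop's API: `run_job`, `shuffle`, and the mapper and reducer names are hypothetical helpers, and the shuffle is written out by hand here only to show what a real framework does automatically.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key; a real framework performs this step for you."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records, map_fn, reduce_fn):
    """Run one map -> shuffle -> reduce pass over the records."""
    mapped = chain.from_iterable(map_fn(r) for r in records)
    return [reduce_fn(k, vs) for k, vs in shuffle(mapped).items()]


# Job 1: count words.
lines = ["big data is big", "data is everywhere"]
word_counts = dict(run_job(lines, mapper, reducer))


# Job 2, chained on the output of job 1: group words by their count.
def count_mapper(pair):
    word, count = pair
    return [(count, word)]


def words_reducer(count, words):
    return count, sorted(words)


words_by_count = dict(run_job(word_counts.items(), count_mapper, words_reducer))
```

The second job shows why chaining gets tedious: every extra step needs its own mapper and reducer, which is the boilerplate a higher-level language or DSL generates for you.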
28. Apache Spark: Fast Big Data
– Rich APIs in Java, Scala, Python
– Interactive shell
• Fast to Run
– General execution graphs
– In-memory storage
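The two "Fast to Run" ideas can be illustrated with a toy in plain Python. This is not Spark and does not use Spark's API: `LazyDataset` is a hypothetical class written for this example. It only borrows the method names `map`, `filter`, `cache`, and `collect` from Spark's RDD API to show that transformations build a graph first and run later, and that a cached intermediate result is kept in memory for reuse.

```python
class LazyDataset:
    """Toy dataset that records transformations and runs them on demand."""

    def __init__(self, source, parent=None, op=None):
        self._source = source  # input rows, only set on the root node
        self._parent = parent  # upstream node in the execution graph
        self._op = op          # function applied to the parent's rows
        self._keep = False     # whether to keep results in memory
        self._cached = None

    def map(self, fn):
        """Record a per-row transformation; nothing runs yet."""
        return LazyDataset(None, self, lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        """Record a row filter; nothing runs yet."""
        return LazyDataset(None, self, lambda rows: [r for r in rows if pred(r)])

    def cache(self):
        """Mark this node so its result is stored after the first run."""
        self._keep = True
        return self

    def collect(self):
        """Walk the graph back to the source and compute the rows."""
        if self._cached is not None:
            return self._cached
        if self._parent is None:
            rows = list(self._source)
        else:
            rows = self._op(self._parent.collect())
        if self._keep:
            self._cached = rows
        return rows


calls = []  # records every time the mapped function actually runs


def square(x):
    calls.append(x)
    return x * x


squares = LazyDataset(range(5)).map(square).cache()
evens = squares.filter(lambda x: x % 2 == 0)
calls_before_collect = len(calls)  # still 0: only the graph exists so far

even_squares = evens.collect()     # runs the whole graph once
all_squares = squares.collect()    # served from the in-memory cache
calls_after_collect = len(calls)   # square ran once per input row, not twice
```

A real engine distributes this graph across a cluster and optimizes it before running; the toy only shows the deferred-execution and caching behavior.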
40. Conclusion
• If possible, use streams: Kafka, Logstash
• Advanced data processing and machine learning: Spark
• Expose your data using SQL for your “BI folks”: Drill
• Aggregation and full-text search: Elasticsearch
• Data visualisation: Kibana