2. About me
• Big Data analytics / Machine Learning
• 4+ years of experience with the Hadoop ecosystem
• 2 years of experience with Spark
http://bigdataresearch.io/
• Co-founder of Big Data Research Group
• Provides open source solutions around Big Data analytics
http://atigeo.com/
3. Agenda
• jaws-spark-sql-rest (Jaws) intro
• Main features
• Architecture
• Scaling
• Resource manager
• Working with Tachyon
• Working with Parquet files
• Configure the Spark SQL context
• Demo
5. Jaws
• Highly scalable and resilient data warehouse explorer
• RESTful alternative to Spark SQL JDBC, and more …
• Support for Spark 0.9.1/Shark through Spark 1.5
• Support for Hive/MapReduce
https://github.com/atigeo/jaws-spark-sql-rest
6. Main features
• Submit queries concurrently and asynchronously
• Provides persisted logs, query history, results with paging
• Pluggable persistent layer (Cassandra/HDFS)
• Supports load balancing with query cancelation
• Provides a metadata browser
• In-memory Parquet warehouse with Tachyon
• Configuration file to fine-tune the Spark context
• Pluggable UI
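The first two features (asynchronous submission and paged results) can be illustrated with a minimal, self-contained sketch. This is not the Jaws API: `QueryService`, its method names, the status strings, and `fake_engine` are all hypothetical stand-ins, and a thread pool replaces the real Spark-backed execution.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class QueryService:
    """Hypothetical in-memory stand-in for an asynchronous query service."""

    def __init__(self, run_query, max_workers=4):
        self._run_query = run_query  # callable: query text -> list of rows
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}           # query id -> Future

    def submit(self, query):
        """Start the query in the background and return its id immediately."""
        query_id = str(uuid.uuid4())
        self._futures[query_id] = self._pool.submit(self._run_query, query)
        return query_id

    def status(self, query_id):
        """Report the state of a query (status names are illustrative)."""
        future = self._futures[query_id]
        if future.cancelled():
            return "CANCELLED"
        if not future.done():
            return "IN_PROGRESS"
        return "FAILED" if future.exception() else "DONE"

    def results(self, query_id, offset=0, limit=100):
        """Return one page of rows; blocks until the query has finished."""
        rows = self._futures[query_id].result()
        return rows[offset:offset + limit]

    def shutdown(self):
        """Wait for all running queries, then release the worker threads."""
        self._pool.shutdown(wait=True)


def fake_engine(query):
    # Stand-in for a real SQL engine: returns ten numbered rows.
    return [(i, query) for i in range(10)]


service = QueryService(fake_engine)
ids = [service.submit(q) for q in ("select 1", "select 2")]  # run concurrently
page = service.results(ids[0], offset=4, limit=3)            # rows 4, 5 and 6
service.shutdown()
```

The caller gets an id back right away and polls or pages later, which is the usual shape of a REST-style asynchronous query interface.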
11. Results persistence
• Queries with limited number of results:
‣ Cassandra
‣ HDFS
• Queries with unlimited number of results:
‣ HDFS
‣ Tachyon
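The mapping on this slide can be written down as a small lookup. A minimal sketch, assuming the backend is picked by a string setting; the function and constant names are hypothetical and do not come from the Jaws code base.

```python
# Backends allowed for each kind of query, as listed on the slide.
LIMITED_BACKENDS = {"cassandra", "hdfs"}
UNLIMITED_BACKENDS = {"hdfs", "tachyon"}


def allowed_backends(limited):
    """Return the set of result stores valid for this kind of query."""
    return LIMITED_BACKENDS if limited else UNLIMITED_BACKENDS


def choose_backend(limited, preferred):
    """Validate a preferred backend name (hypothetical config value)."""
    backend = preferred.lower()
    if backend not in allowed_backends(limited):
        kind = "limited" if limited else "unlimited"
        raise ValueError(f"{backend!r} cannot store {kind} results")
    return backend


limited_store = choose_backend(True, "Cassandra")
unlimited_store = choose_backend(False, "tachyon")
```

Asking for Cassandra with an unlimited query raises `ValueError` in this sketch, since the slide lists only HDFS and Tachyon for that case.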
12. Working with Tachyon
• Persists unlimited results in Tachyon
• Registers tables over Parquet files from Tachyon
Tachyon benefits:
★ in-memory storage system
★ shares data between applications at memory speed
13. Working with Parquet files
• Register tables on top of parquet files
Parquet
★ columnar format
★ nested data structures
★ supports schema evolution
★ efficient compression
• Files stored on HDFS or Tachyon
• Table metadata stored in Cassandra (a feature from before Spark 1.3)
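For reference, registering a table over a Parquet file in plain Spark SQL (1.4 or later) looks roughly like the sketch below. It shows the standard PySpark `DataFrameReader.parquet` and `DataFrame.registerTempTable` calls, not how Jaws does it internally; the helper name and the path in the usage comment are hypothetical.

```python
def register_parquet_table(sql_context, path, table_name):
    """Load a Parquet file and expose it to SQL under table_name.

    sql_context is expected to be a pyspark.sql.SQLContext (Spark 1.4+).
    path may point at any file system Spark is configured for.
    """
    df = sql_context.read.parquet(path)  # schema is read from the Parquet footer
    df.registerTempTable(table_name)     # table is visible to this context only
    return df


# Usage with a live context (hypothetical path):
#   df = register_parquet_table(sqlContext, "hdfs:///warehouse/users.parquet", "users")
#   sqlContext.sql("SELECT COUNT(*) FROM users").show()
```

A temporary table lives only as long as its context, which is one reason to keep table metadata in an external store.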