This document discusses the SQL-on-Hadoop tools Shark and SparkSQL. Shark runs on top of Apache Spark and is tightly coupled to Hive: it reuses Hive's query statements and metadata. It runs queries faster than Hive because it processes data in memory. SparkSQL is a newer tool that does not depend on Hive and is built around a new abstraction, the SchemaRDD. The document recommends columnar file formats such as Parquet for better query performance and lower disk usage, and it includes a hands-on demonstration that compares file formats and query execution times in Hive, Impala and Shark.
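
The recommendation of columnar formats can be illustrated with a small sketch. The code below is a toy model, not Parquet's actual on-disk format: it stores the same hypothetical table (`user_id`, `country`, `amount`, all names invented for illustration) once row by row and once column by column, then counts how many bytes a query over a single column has to scan. Compression and encoding, which columnar formats typically also apply, are left out.

```python
import struct

# Hypothetical toy table: (user_id, country, amount). Not data from the demonstration.
rows = [(i, i % 5, i * 1.5) for i in range(1000)]

# Little-endian, no padding: int32, int32, float64 -> 16 bytes per row.
ROW_FMT = "<iid"


def row_layout(table):
    """Store the table row by row: all fields of a row are adjacent."""
    return b"".join(struct.pack(ROW_FMT, *r) for r in table)


def column_layout(table):
    """Store the table column by column: all values of a field are adjacent."""
    ids, countries, amounts = zip(*table)
    n = len(table)
    return {
        "user_id": struct.pack(f"<{n}i", *ids),
        "country": struct.pack(f"<{n}i", *countries),
        "amount": struct.pack(f"<{n}d", *amounts),
    }


def sum_amount_row(buf):
    """SUM(amount) over the row layout; every row must be read in full."""
    size = struct.calcsize(ROW_FMT)
    total = 0.0
    for offset in range(0, len(buf), size):
        _, _, amount = struct.unpack_from(ROW_FMT, buf, offset)
        total += amount
    return total, len(buf)  # (result, bytes scanned)


def sum_amount_column(cols):
    """SUM(amount) over the column layout; only the 'amount' column is read."""
    buf = cols["amount"]
    n = len(buf) // struct.calcsize("<d")
    return sum(struct.unpack(f"<{n}d", buf)), len(buf)  # (result, bytes scanned)


if __name__ == "__main__":
    print("row layout:   ", sum_amount_row(row_layout(rows)))
    print("column layout:", sum_amount_column(column_layout(rows)))
```

Both functions return the same sum, but in this toy setup the columnar version scans half as many bytes because it skips the two columns the query never touches. The gap grows as a table gains more columns that a given query does not use.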