Jim Scott, CHUG co-founder and Director of Enterprise Strategy and Architecture for MapR, presents "Using Apache Drill". This presentation was given on August 13th, 2014 at the Nokia office in Chicago, IL.
Jim has held positions running Operations, Engineering, Architecture and QA teams. He has worked in the Consumer Packaged Goods, Digital Advertising, Digital Mapping, Chemical and Pharmaceutical industries. His work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop.
Apache Drill brings the power of standard ANSI SQL:2003 to your desktop and your clusters. It is like AWK for Hadoop. Drill supports querying schemaless systems like HBase, Cassandra and MongoDB. Use the standard JDBC and ODBC APIs to access Drill from your custom applications. Leveraging an efficient columnar storage format, an optimistic execution engine and a cache-conscious memory layout, Apache Drill is blazing fast. Coordination, query planning, optimization, scheduling, and execution are all distributed throughout the nodes in a system to maximize parallelization. This presentation contains live demonstrations.
The video can be found here: http://vimeo.com/chug/using-apache-drill
Modeled after Dremel based on the white paper from Google
With additional flexibility required to support a broader range of data formats and data sources
The design goal is to scale to 10,000+ servers and to be able to process petabytes of data and trillions of records in seconds
Hortonworks has used code from Drill in Tez
These are not people who can only create an Abstract Syntax Tree. They have worked on Oracle, DB2, ParAccel, Teradata, SQL Server, Vertica
You don’t use a QWERTY-like keyboard
Do you really want to use another SQL-like syntax?
Contributors
Facebook, Visa, Mesosphere, many universities, etc., and even Oracle
So many tools and applications. Great performance
One technology – standard across multiple databases
As applications evolve, schemas change rapidly
Why do we tolerate applications that only support the parts and pieces they choose for SQL?
A Drillbit is simply a Drill process running on any particular node in the cluster
Have I mentioned JDBC and ODBC drivers? This means you can reach Drill from any tool or application that speaks these standard database interfaces.
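To make the JDBC point concrete, here is a minimal sketch of calling Drill from a custom Java application. It uses only the standard `java.sql` API. The class name `DrillJdbcSketch`, the helper `zkUrl`, and the command-line arguments are made up for illustration. It assumes the Drill JDBC driver jar is on the classpath and that the connection URL takes the `jdbc:drill:zk=<quorum>` form, so check the documentation for your Drill version before relying on either.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class DrillJdbcSketch {

    // Assumption: Drill is reached through ZooKeeper with a URL of this form.
    static String zkUrl(String zkQuorum) {
        return "jdbc:drill:zk=" + zkQuorum;
    }

    // Plain JDBC: run a query and print every row as label=value pairs.
    static void printQuery(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            ResultSetMetaData meta = rs.getMetaData();
            int cols = meta.getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) {
                        row.append('\t');
                    }
                    row.append(meta.getColumnLabel(i)).append('=').append(rs.getString(i));
                }
                System.out.println(row);
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Without arguments, only print usage so the sketch runs with no cluster.
        if (args.length < 2) {
            System.out.println("usage: DrillJdbcSketch <zk-quorum> <sql>");
            return;
        }
        // Requires the Drill JDBC driver on the classpath (assumption, see above).
        try (Connection conn = DriverManager.getConnection(zkUrl(args[0]))) {
            printQuery(conn, args[1]);
        }
    }
}
```

Nothing in `printQuery` is specific to Drill, which is the point of the slide: the same code works against any database that ships a JDBC driver, and only the connection URL changes.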