Most SQL engines (traditional or SQL-on-Hadoop) view tables as spreadsheet-like structures with rows and columns: every record has the same structure, and there is no support for nested data or repeating fields. Drill instead views tables conceptually as collections of JSON documents (JSON with additional types). Each record can have its own structure, making Drill effectively schema-free. No other SQL engine takes this approach.
If you consider the four data models in the 2x2 (flat vs. complex structure, fixed vs. no schema), all of them can be represented by the complex, no-schema model (JSON), because it is the most flexible. The flat, fixed-schema model, by contrast, cannot represent any of the others. Therefore, with any SQL engine except Drill, the data has to be transformed before it is available to queries.
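A small sketch of the point above, using hypothetical records (one per quadrant of the 2x2): every record serializes naturally as JSON, but only the flat, fixed-schema record fits a spreadsheet-like table.

```python
import json

# Four illustrative records, one per quadrant of the 2x2
# (flat vs. complex structure) x (fixed vs. no schema).
# Field names are hypothetical.
flat_fixed = {"id": 1, "name": "alice", "age": 30}         # classic row
flat_no_schema = {"id": 2, "nickname": "bob"}              # different columns per record
complex_fixed = {"id": 3, "address": {"city": "Paris"}}    # nested but regular
complex_no_schema = {"id": 4, "tags": ["a", "b"],          # nested, repeating,
                     "extra": {"anything": True}}          # per-record shape

records = [flat_fixed, flat_no_schema, complex_fixed, complex_no_schema]

# JSON can hold every record as-is...
as_json = [json.dumps(r) for r in records]

# ...but a flat, fixed schema cannot: the records disagree on columns
# and contain nested/repeating fields.
columns = set(flat_fixed)
fits_flat_table = all(
    set(r) == columns
    and all(not isinstance(v, (dict, list)) for v in r.values())
    for r in records
)
print(fits_flat_table)  # False: transformation would be required first
```

This is why, for every engine except Drill, an ETL/flattening step has to run before the data is queryable.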
Drill (and Hadoop) do not replace the data warehouse. Data exploration is a separate use case, and a gap that is not well filled by existing data-analytics technologies.
MapR partners closely with industry leaders such as Teradata; our systems are tightly integrated and provide a better overall enterprise architecture for organizations looking for a best-of-breed approach to big data analytics.
Distributed query engine
Any drillbit can accept the request
The drillbit that receives the request acts as the driver (Foreman) for that query
Drill is fault tolerant
Only SQL-on-Hadoop engine with no central servers
Question to think about, for a given workload:
Very short queries are going to be impacted by failures
With checkpointing, 50% of the time can go to checkpoints; you are paying that penalty for the benefit
What portion of jobs completes in a few hours, excluding time spent in the queue?
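A back-of-envelope sketch of the trade-off above. The 50% checkpointing overhead comes from the note; the failure probability and job lengths are hypothetical, and a failure is simplified to one full restart.

```python
# Hypothetical numbers, except the 50% checkpointing overhead noted above.

def expected_cost_no_checkpoint(t, p_fail):
    # On failure, the whole job restarts once (simplification).
    return t + p_fail * t

def expected_cost_with_checkpoint(t, overhead=0.5):
    # Checkpointing adds a constant overhead; a failure only loses the
    # small slice of work since the last checkpoint (ignored here).
    return t * (1 + overhead)

p = 0.05                      # assume a 5% chance a given job sees a failure
for t in (0.1, 10.0):         # a short query vs. a long job, in hours
    print(t,
          expected_cost_no_checkpoint(t, p),
          expected_cost_with_checkpoint(t))
```

Under these assumptions, the expected restart penalty (p_fail * t) stays far below the 50% checkpointing tax; checkpointing only pays off when jobs are long and failures are likely, which is the question the note poses about real workloads.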
Interpreted expression tree
Custom code for every single query, for every operator
--
Traditional databases know data types ahead of time, so they can generate execution binaries and ship them to all the nodes.
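The contrast above can be sketched in miniature (illustrative only, not Drill's actual code generator): interpreting an expression tree walks the tree for every row, while generating custom code compiles the tree once and then runs straight code per row.

```python
# 1) Interpreted expression tree: walk the tree for every row.
def interpret(node, row):
    op = node[0]
    if op == "col":
        return row[node[1]]
    if op == "lit":
        return node[1]
    if op == "+":
        return interpret(node[1], row) + interpret(node[2], row)
    if op == "*":
        return interpret(node[1], row) * interpret(node[2], row)
    raise ValueError(f"unknown op: {op}")

# 2) Code generation: compile the tree once into a function,
# then evaluate rows with no per-row tree-walking overhead.
def generate(node):
    def emit(n):
        op = n[0]
        if op == "col":
            return f"row[{n[1]!r}]"
        if op == "lit":
            return repr(n[1])
        return f"({emit(n[1])} {op} {emit(n[2])})"
    return eval(f"lambda row: {emit(node)}")

# a + b * 2, on a sample row
expr = ("+", ("col", "a"), ("*", ("col", "b"), ("lit", 2)))
row = {"a": 1, "b": 3}
print(interpret(expr, row), generate(expr)(row))  # 7 7
```

The second strategy mirrors what a runtime code generator does: since Drill discovers types at query time rather than ahead of time, it generates the custom code per query instead of shipping a prebuilt execution binary.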