Notes:
This being Apache there are 50 ways to assemble this, I’m going to cover one
There are a lot of parts in the picture, I won’t be able to cover them all
For several of these I want to look at what’s there now, and what communities are working on to improve this experience
What tools/processes have you tried to attach to Hadoop and have been unable to do so? Why?
Hive version = Hive 2.1
Presto version = Presto 0.163
Additional details:
Queries are 20 interactive queries taken from TPC-DS using 1 TB of data from TPC-DS schema.
Queries run at random using a jmeter test harness
Extend reach of Hadoop APIs
Gateway for Hadoop’s REST API
Different REST APIs have varying levels of AuthN, AuthZ, SSL, SSO Capability
Enterprise authentication
Apply Enterprise capabilities to All REST APIs – IdM Integration, SSO, OAuth, SAML
Avoid exposing Cluster port, hostnames to all users
Show – clearly identify customer metadata. Change
Add customer classification example – Aetna – make the use case story have continuity. Use DX procedures to diagnosis
** bring meta from external systems into hadoop – keep it together
Maps well to Yahoo case
Notes:
This being Apache there are 50 ways to assemble this, I’m going to cover one
There are a lot of parts in the picture, I won’t be able to cover them all
For several of these I want to look at what’s there now, and what communities are working on to improve this experience