Data scientists can do two things better than data analysts: ask great questions and answer them faster than other people would think possible.
Everyone gets a Ferrari!
Oh no! Everyone has a Ferrari! Induced demand: as you increase the supply of something, the demand for it increases as well.
We need the equivalent of public transit infrastructure for analytic queries: it has a low marginal cost for asking one more question, goes to the places most people need to go, and takes load off the roadways.
The spell correction example serves as a model for what that public transit infrastructure should look like.
Can we create a data model that makes this kind of powerful analysis available to people who only know SQL?
Even better, can a common data model let us move models seamlessly from the offline, analytical world to the online, operational world? That seems within reach, because the supernova data model is essentially the HBase/Cassandra/Mongo/etc. data model.
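To make the offline/online point concrete, here is a minimal Python sketch. It assumes that "supernova" means one nested record per entity, with the entity's child rows stored inside it, which is also how a key-value or document store would hold the data. The names `Event`, `UserRecord`, `total_spend`, and the in-memory `store` are hypothetical illustrations and are not taken from the exhibit project.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    kind: str      # e.g. "purchase" or "view" (hypothetical event types)
    amount: float


@dataclass
class UserRecord:
    # One record per entity; child rows are nested instead of joined in.
    user_id: str
    events: List[Event] = field(default_factory=list)


def total_spend(record: UserRecord) -> float:
    """A feature computed from a single nested record, with no joins."""
    return sum(e.amount for e in record.events if e.kind == "purchase")


# Stand-in for a keyed store (assumption: a plain dict keyed by user_id).
store: Dict[str, UserRecord] = {
    "u1": UserRecord("u1", [Event("purchase", 20.0), Event("view", 0.0),
                            Event("purchase", 5.0)]),
    "u2": UserRecord("u2", [Event("view", 0.0)]),
}

# Offline, analytical use: scan every record and compute the feature in bulk.
offline_features = {uid: total_spend(rec) for uid, rec in store.items()}

# Online, operational use: fetch one record by key and reuse the same function.
online_feature = total_spend(store["u1"])
```

Because the feature function only ever sees one self-contained record, the same code can run in a batch scan or behind a single key lookup, which is the portability the question above is asking about.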
http://github.com/jwills/exhibit
No tool can make you a data scientist, because it’s the ability to push beyond the limits of your tools that makes you a data scientist.