Warren Buffett often describes companies as castles, with a competitive moat protecting the business. Products or companies that figure out how to build and leverage differentiated data assets will be best positioned to win their respective markets. This talk describes the properties of a good data moat, why it matters, and how to go about building one within your organization.
9. Big Data: Reality
• Science, theory, and reason are not being replaced
• Big Data is different: for some problems, big data produces
better results than we find with smaller samples
• Data storage and logging are increasingly cheap, so err on the
side of collecting data to process later if you think it may be
valuable
• Large, differentiated data assets are the foundation for
defensible products and better decisions
25. What do data scientists actually do?
source: data from
http://www.linkedin.com/skills
26. Two species of data scientist*
Type I: Traditional BI
• Question-driven
• Interactive
• Ad-hoc, post-hoc
• Fixed data
• Focus on speed and
flexibility
• Output is embedded into a
report, dashboard, or in-database
scoring engine
Type II: Data Products
• Metric-driven
• Automated
• Systematic
• Fluid data
• Focus on transparency and
reliability
• Output is a production
system that makes
customer-facing decisions
*Slide adapted from Josh Wills “From the Lab to the Factory”
28. Data Product pre-history: Data Aggregators
• 1972: Vinod Gupta forms American Business
Information, Inc., a database initially built via
manual data entry of Yellow Pages
information
• 1973: LEXIS full text legal search launches
publicly
• 1986: Bloomberg reaches 5,000 terminal
subscribers
• 1994: Jerry Yang & David Filo compile and
maintain a hand-curated set of categorized
links on the World Wide Web known as the
Yahoo! Directory
29. The Rise of Algorithmic Data Products
• Google: Web Search, PageRank, AdWords
• Netflix: Movie Recommendations
• Pandora: Music Recommendations
• eBay: Product Search, Fraud Detection, Advertising
• Amazon: Similar Items, Book Recommendations
• LinkedIn: People You May Know, Who Viewed My Profile
31. Data Product investment and ROI
• Skill Extraction and Standardization Pipeline
• Skill Pages
• Skills Section on member profiles
• Suggested Skills Algorithm and email > 20M members
• Skill Endorsements > 60M members, 3B+ Edges
• Big product wins: engagement, recall, relevance
• SkillRank & Reputation Algorithm R&D
• LinkedIn is now the definitive source for information
on skills & expertise
*Statistics as of 2013
32. How leaders can drive data growth
• Accountability: Who defines the data vision &
roadmap in your organization? Who is accountable for
building and expanding your moat?
• Invest in data infrastructure, training, logging, & tools
for rapid iteration. Build a data lake.
• Invest in exploration and innovation, including
user-facing data product and algorithm development
• Define a framework for trading off data quality and
quantity metrics
• Ask “How does this increase our data moat?” when
evaluating any new project, and incentivize projects that do
Scientists make measurements: http://seanjtaylor.com/post/41463778912/real-scientists-make-their-own-data
Creating new information, observations, alpha
Some data scientists go to great lengths to avoid collecting data or touching the user interface, even when a small change there could eliminate a great deal of wasted time
Requires authority or support from leadership to make product changes
Works best if data scientists are involved in design decisions from the start
- CERN supercollider - collect something nobody else has collected
Vision/Roadmap: what data doesn’t exist that would make your product better, aligned with company mission.
Google Street View photos => self-driving car
Facebook / LinkedIn story – emergence of new role
--- "Built to Last" Be a clock builder - an architect - not a time teller
--- Another analogy: are you a sports reporter, repeating the details of the game in a dashboard, or are you crunching that data to select the best new talent?