Data Science @ UW
“It’s a great time to be a data geek.”
-- Roger Barga, Microsoft Research
“The greatest minds of my generation are trying
to figure out how to make people click on ads”
-- Jeff Hammerbacher, co-founder, Cloudera
The Fourth Paradigm
1. Empirical + experimental
2. Theoretical
3. Computational
4. Data-Intensive
Jim Gray
7/21/2014 Bill Howe, UW 3
“All across our campus, the process of discovery will increasingly rely on
researchers’ ability to extract knowledge from vast amounts of data… In order
to remain at the forefront, UW must be a leader in advancing these
techniques and technologies, and in making [them] accessible to researchers
in the broadest imaginable range of fields.”
2005-2008
In other words:
• Data-driven discovery will be ubiquitous
• UW must be a leader in inventing the
capabilities
• UW must be a leader in translational
activities – in putting these capabilities to
work
• It’s about intellectual infrastructure (human capital) and software
infrastructure (shared tools and services – digital capital)
A 5-year, US$37.8 million cross-institutional
collaboration to create a data science environment
2014
$9.3 million from the Washington Research Foundation to
amplify the Moore/Sloan effort
• 6 X 5-year Faculty lines in Data Science
• 6 X startup packages
• 15 X 3 yr postdoctoral fellows
• Funds to remodel and furnish a WRF Data Science Studio
• Also $7.1 million to closely-related Institute for
Neuroengineering, $8.0 million to Institute for Protein
Design, $6.7 million to Clean Energy Institute
Data Science Kickoff Session:
137 posters from 30+ departments and units
PIs on Moore/Sloan effort
+ eScience Institute
Steering Committee
+ UW participants in
February 7 Data Science
poster session
Broad collaborations
Establish a virtuous cycle
• 6 working groups, each with 3-6 faculty from each institution
Key Activity: Promote interdisciplinary careers
• Interdisciplinary graduate students
– New, interdisciplinary “Data Science” Ph.D. tracks and program
• Interdisciplinary postdocs (“Data Science Fellows”)
– Dual-mentored postdocs with interests in both methods and a domain
science
• Interdisciplinary research scientists (“Data Scientists”)
– Work across disciplines to solve people’s data science challenges
• Interdisciplinary faculty
– Supported with special hiring and funding initiatives
• “Senior Research Fellows”
– Short-term and long-term visitors
• A diverse faculty steering committee
UW Data Science Education Efforts
Audiences
• Students: CS/Informatics majors and non-majors, each spanning undergrads and grads
• Non-students: professionals and researchers
Programs
• UWEO Data Science Certificate
• MOOC Intro to Data Science
• IGERT: Big Data PhD Track
• New CS Courses
• Bootcamps and workshops
• Intro to Data Programming
• Data Science Masters (planned)
• Incubator: hands-on training
[Figure labels: Educational transformation; Big Data access and management; Big Data modeling; Big Data analytics; Collaborative Big Data science]
Key Activity: Foster Interdisciplinary Education
• Ultimate goal: A new PhD program
– Initial goal: A new certificate based on Big Data tracks in all departments
– Education highlights: data science courses, co-advising, and internships
• End-to-End Research Agenda
– Big Data mgmt, analytics, modeling, & collaboration
• Cyberinfrastructure Development
– Big Data analysis service
• Additional data science educational activities
– Coursera MOOCs
• Introduction to Data Science (Bill Howe)
• Computational Methods of Data Analysis (Nathan Kutz)
• High Performance Scientific Computing (Randy LeVeque)
– Traditional courses
• Many! Example: Biochemistry for Computer Scientists (Joe Hellerstein)
• We try to list relevant courses on the eScience Institute website
– UW Educational Outreach
• 3-course Certificate in Data Science
• 3-course Certificate in Cloud Data Management & Analytics
• 3-course Certificate in Cloud Application Development on Amazon Web Services
• 3-course Certificate in Data Visualization
– Workshops and bootcamps
• Software Carpentry (Winter & Spring 2013; Winter, Spring, & Summer 2014)
• Cosmology and Machine Learning (Autumn 2014)
Key Activity: “Re-establish the watercooler”
• An open shared R&D space where researchers from across the campus will come to collaborate
• A resident data science team
– Permanent staff of ~5 Data Scientists – applied research and development
– ~15-20 Data Science Fellows (research scientists, visitors, postdocs, students)
– Entrepreneurial mentorship
• Modes of engagement
– Drop-in open workspace
– Studio “Office Hours”
– Incubation Program
– Plus seminars, sponsored lunches, workshops, bootcamps, joint proposals …
Key Activity: Create scalable impact through a
Data Science Incubation Program
• Scale and concentrate our efforts
– Move from “accidental” encounters to engineered partnerships
– Identify emerging opportunities around campus
– Provide a shared environment where researchers can learn from an in-house
team, external mentors, and each other
• A startup environment!
– “Seed grant” program
• Lightweight – 1-page proposals
– Significant potential for technology spinout – new markets for existing
technology and new technology for existing markets
Key Activity: Democratize Access to Big Data and Big
Data Infrastructure
• SQLShare: Database-as-a-Service for scientists and engineers
• Myria: Easy, Scalable Analytics-as-a-Service
Open Data sharing platforms
• Database-as-a-service for open data analytics
• Interoperable with external tools and languages
• Local or cloud deployments
• Interoperable with existing database platforms
• Built-in data integration, profiling, analytics
Google Fusion Tables
Entrepreneurship
1) “Data once guarded for assumed but untested reasons is now
open, and we're seeing benefits.”
-- Nigel Shadbolt, Open Data Institute
2) Need to help “non-specialists within an organization use data
that had been the realm of programmers and DB admins”
-- Benjamin Romano, Xconomy
“Businesses are now using data the way scientists always have”
-- Jeff Hammerbacher, Cloudera
Halperin, Howe, et al. SSDBM 2013
Scalable Analytics as a Service
Kenya Health Information System Data
Grégoire Lurton
June 12, 2014
Abie Flaxman, Dan Halperin, Grégoire Lurton
In the beginning
“Much of the material remains unprocessed,
or, if processed, unanalyzed, or, if analyzed,
not read, or, if read, not used or acted upon”
Objectives
• Design generalizable method to process HIS-like data
• Make important dataset available for analysis
• Explore actionable data analysis of HIS data
Why do we care?
Metadata Trace - saving
Reports of year n saved in
January of year n+1
Years were not recorded for
the first year of use…
REDPy
Repeating Earthquake Detector (Python)
An eScience Incubator Project
Project Lead: Alicia Hotovec-Ellis
Data Scientist: Jake Vanderplas
John Vidale, Alicia Hotovec-Ellis, Jake Vanderplas
What is a
“repeating” earthquake?
[Figure: numbered events, Event # 1 through 7]
Why do we study
repeating earthquakes?
The problem(s)…
[Figure axes: Time (minutes); Time (HH:MM:SS)]
Clustering for
[Figure: two panels plotting Event # against Event #: “Ordered in time” and “Ordered with OPTICS”]
I talked with Alicia a bit yesterday, and she showed me that her earthquake-repeater-
searching implementation is more general, and more powerful than I had thought, and
closer to trial by others (and I have a particular use in mind in the ongoing iMUSH
experiment on Mount St Helens)<snip>
So I'm encouraging her to continue to work on it a day per week or so for the
foreseeable future, assuming you have the facilities to continue the incubation.
The project outlives the incubator……
Publications in the works on both the software and
the science – from three months of half-time work
Using Twitter data to identify geographic
clustering of anti-vaccination sentiments
Ben Brooks
June 12, 2014
Benjamin Brooks, Andrew Whitaker, Abie Flaxman
Initial approach
• Sentiment regarding vaccination can be discerned
from Twitter.
• Can we find city- or county-level pockets of anti-vaccination sentiment?
• Do these locales correlate with outbreak and
vaccination rate data (beyond H1N1)?
Training data issues
• Training data from PSU study labeled
tweets as positive, negative, neutral, or
irrelevant.
• Many tweet categorizations seemed
suspect.
• Produced new training dataset; switched
approach to negative tweets vs. all others.
• Of tweets we labeled as negative, PSU
training data agreed with 36%.
• Sample non-negative tweets in
training dataset from PSU study:
• “RT @Lyn_Sue Lyn_Sue18 Reasons Why u
Should NOT Vaccinate Your Children
Against The Flu This Season”
• “1882 -3 O RT @alexHroz Citizens From
All Walks Intend To Refuse Swine Flu
"Vaccine,”
• “Eighteen Reasons Why You Should NOT
Vaccinate Your Children Against The Flu
This Season by Bill Sard”
• “Swine Flu Vaccine not necessary and not
healthy:”
Background: Previous work
• “For our sentiment classification, we used an ensemble method combining the Naive Bayes and the
Maximum Entropy classifiers…The accuracy of this ensemble classifier was 84.29%.”
Other sentiment approaches
• Precision: Of all tweets labeled negative by the algorithm, what percentage are “true negatives”?
• Recall: Of all “true negative” tweets, what percentage are labeled negative by the algorithm?

Approach                      Precision   Recall
Vaccine-specific keywords     19%         59%
Modified general sentiment    25%         41%
Naïve Bayes                   79%         19%
Logistic regression           70%         28%
Labeled data from PSU study   41%         36%
• Data labeled by human beings does not perform dramatically better than other classifiers!
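The precision and recall definitions used for the negative-tweet class can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name `precision_recall` and the toy labels are hypothetical, and "negative" stands for the anti-vaccination class that the slides compare against all other tweets.

```python
def precision_recall(predicted, actual, target="negative"):
    """Return (precision, recall) for the class named by `target`."""
    pairs = list(zip(predicted, actual))
    # Tweets the algorithm labeled negative that really are negative.
    hits = sum(1 for p, a in pairs if p == target and a == target)
    # Tweets the algorithm labeled negative that are not.
    false_alarms = sum(1 for p, a in pairs if p == target and a != target)
    # Truly negative tweets the algorithm missed.
    misses = sum(1 for p, a in pairs if p != target and a == target)
    precision = hits / (hits + false_alarms) if hits + false_alarms else 0.0
    recall = hits / (hits + misses) if hits + misses else 0.0
    return precision, recall


# Hypothetical toy labels, not data from the study.
actual = ["negative", "negative", "negative", "other",
          "other", "other", "negative", "other"]
predicted = ["negative", "other", "other", "other",
             "negative", "other", "negative", "other"]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On the toy labels the classifier finds two of the four negative tweets and raises one false alarm, so precision is 2/3 and recall is 1/2. The same trade-off appears in the table: the keyword approach has high recall but low precision, while Naïve Bayes shows the reverse.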
Scalable Analytics over Call Record Data in Developing Nations
Project Lead
Ian Kelley
Information School
University of Washington
E-mail: ikelley@uw.edu
eScience Data Incubator - 12 June 2014
Andrew Whitaker, Ian Kelley, Josh Blumenstock
Map migration patterns of workers during labor
market shortages (Rwanda)
Measure and categorize mobility patterns
Determine people’s geographic center of gravity
Discover the effects of violent events on internal
population mobility (Afghanistan)
Track activity patterns over time; identify changes
Map connected areas of country
Average position during a time period (e.g., day, week)
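One way to read "average position during a time period" is as a per-subscriber, per-day mean of the tower coordinates seen in the call records. The sketch below assumes exactly that. The record layout `(subscriber_id, iso_timestamp, lat, lon)`, the function name `center_of_gravity`, and the sample coordinates are all hypothetical, and a plain arithmetic mean of latitude and longitude is only a reasonable approximation over small regions.

```python
from collections import defaultdict
from datetime import date, datetime


def center_of_gravity(records):
    """Average tower position per (subscriber, day).

    `records` is an iterable of (subscriber_id, iso_timestamp, lat, lon)
    tuples; this layout is an assumption made for illustration.
    """
    # (subscriber, day) -> [sum of lat, sum of lon, number of records]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for subscriber, timestamp, lat, lon in records:
        day = datetime.fromisoformat(timestamp).date()
        acc = sums[(subscriber, day)]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {key: (lat_sum / n, lon_sum / n)
            for key, (lat_sum, lon_sum, n) in sums.items()}


# Hypothetical call records, not data from the project.
records = [
    ("a", "2014-06-12T08:00:00", -1.95, 30.06),
    ("a", "2014-06-12T18:00:00", -1.97, 30.10),
    ("a", "2014-06-13T09:00:00", -2.60, 29.74),
]

centers = center_of_gravity(records)
print(centers[("a", date(2014, 6, 12))])
```

Grouping by week instead of day only requires changing the key, for example to `datetime.fromisoformat(timestamp).isocalendar()[:2]`. Comparing successive centers is one simple way to flag the mobility changes listed above.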
Towards An Urban Science Incubation Cohort
OneBusAway:
Transit Traveler Information Systems
Foreclosure Rates and
changes in poverty
concentration
PNW Seismic Network
Early Warning System
Ocean Observatories Initiative
Education CRPE
Seattle the tech and innovation hub
• “most innovative state” (Bloomberg 12/13)
• “smartest city” (Fast Company, 11/13)
• only US city on “ten best Internet cities” (UBM’s Future
Cities blog, 8/13)
• ranked 2nd for women entrepreneurs (geekwire, 2/13)
• ranked 4th as global startup hub, > NYC (geekwire, 11/12)
• “the top tech city” (geekwire, 6/12)
• …and so on
eScience Institute + Urban Science
• Better public engagement than in physical and earth sciences
• Leverages our core interest in open data and open science
• Acute need relative to traditionally data-intensive fields
– relative newcomers in DS techniques and technologies
– We prefer collaborations with smaller labs and individuals as opposed to
“Big Science” projects
• Seattle offers a unique testbed as an urbanizing region
– Brookings “metro”: Interconnected urban, suburban, rural, environment
– Engaged, active communities
– Strong local interest in open data, open government
– Global hub for technology and innovation (next slide)
• Connections with King County Executive’s office, State CIO’s
office, Seattle CTO’s office, local gov data companies (Socrata)
Data Science @ UW
We are at the dawn of
a revolutionary new era of discovery and learning

Data Science and Urban Science @ UW

  • 2. “It’s a great time to be a data geek.” -- Roger Barga, Microsoft Research “The greatest minds of my generation are trying to figure out how to make people click on ads” -- Jeff Hammerbacher, co-founder, Cloudera
  • 3. The Fourth Paradigm 1. Empirical + experimental 2. Theoretical 3. Computational 4. Data-Intensive Jim Gray 7/21/2014 Bill Howe, UW 3
  • 4. “All across our campus, the process of discovery will increasingly rely on researchers’ ability to extract knowledge from vast amounts of data… In order to remain at the forefront, UW must be a leader in advancing these techniques and technologies, and in making [them] accessible to researchers in the broadest imaginable range of fields.” 2005-2008 In other words: • Data-driven discovery will be ubiquitous • UW must be a leader in inventing the capabilities • UW must be a leader in translational activities – in putting these capabilities to work • It’s about intellectual infrastructure (human capital) and software infrastructure (shared tools and services – digital capital)
  • 5. A 5-year, US$37.8 million cross-institutional collaboration to create a data science environment (2014)
  • 6. $9.3 million from Washington Research Foundation to amplify the Moore/Sloan effort • 6 x 5-year faculty lines in Data Science • 6 x startup packages • 15 x 3-year postdoctoral fellows • Funds to remodel and furnish a WRF Data Science Studio • Also $7.1 million to the closely related Institute for Neuroengineering, $8.0 million to the Institute for Protein Design, $6.7 million to the Clean Energy Institute
  • 7. Data Science Kickoff Session: 137 posters from 30+ departments and units
  • 8. PIs on Moore/Sloan effort + eScience Institute Steering Committee + UW participants in February 7 Data Science poster session Broad collaborations
  • 9. Establish a virtuous cycle • 6 working groups, each with • 3-6 faculty from each institution
  • 10. Key Activity: Promote interdisciplinary careers • Interdisciplinary graduate students – New, interdisciplinary “Data Science” Ph.D. tracks and program • Interdisciplinary postdocs (“Data Science Fellows”) – Dual-mentored postdocs with interests in both methods and a domain science • Interdisciplinary research scientists (“Data Scientists”) • Work across disciplines to solve people’s data science challenges • Interdisciplinary faculty – Supported with special hiring and funding initiatives • “Senior Research Fellows” – Short-term and long-term visitors • A diverse faculty steering committee
  • 11. UW Data Science Education Efforts. Audiences: Students (CS/Informatics and Non-Major; undergrads and grads) and Non-Students (professionals, researchers). Offerings: UWEO Data Science Certificate; MOOC Intro to Data Science; IGERT: Big Data PhD Track; New CS Courses; Bootcamps and workshops; Intro to Data Programming; Data Science Masters (planned); Incubator: hands-on training
  • 12. Educational transformation: Big Data access and management, Big Data modeling, Big Data analytics, Collaborative Big Data science. Key Activity: Foster Interdisciplinary Education • Ultimate goal: A new PhD program – Initial goal: A new certificate based on Big Data tracks in all departments – Education highlights: data science courses, co-advising, and internships • End-to-End Research Agenda – Big Data mgmt, analytics, modeling, & collaboration • Cyberinfrastructure Development – Big Data analysis service
  • 13. • Additional data science educational activities – Coursera MOOCs • Introduction to Data Science (Bill Howe) • Computational Methods of Data Analysis (Nathan Kutz) • High Performance Scientific Computing (Randy LeVeque) – Traditional courses • Many! Example: Biochemistry for Computer Scientists (Joe Hellerstein) • We try to list relevant courses on the eScience Institute website – UW Educational Outreach • 3-course Certificate in Data Science • 3-course Certificate in Cloud Data Management & Analytics • 3-course Certificate in Cloud Application Development on Amazon Web Services • 3-course Certificate in Data Visualization – Workshops and bootcamps • Software Carpentry (Winter & Spring 2013; Winter, Spring, & Summer 2014) • Cosmology and Machine Learning (Autumn 2014)
  • 14. • An open shared R&D space where researchers from across the campus will come to collaborate • A resident data science team – Permanent staff of ~5 Data Scientists – applied research and development – ~15-20 Data Science Fellows (research scientists, visitors, postdocs, students) – Entrepreneurial mentorship • Modes of engagement – Drop-in open workspace – Studio “Office Hours” – Incubation Program – Plus seminars, sponsored lunches, workshops, bootcamps, joint proposals … Key Activity: “Re-establish the watercooler”
  • 15. Key Activity: Create scalable impact through a Data Science Incubation Program • Scale and concentrate our efforts – Move from “accidental” encounters to engineered partnerships – Identify emerging opportunities around campus – Provide a shared environment where researchers can learn from an in-house team, external mentors, and each other • A startup environment! – “Seed grant” program • Lightweight – 1-page proposals – Significant potential for technology spinout – new markets for existing technology and new technology for existing markets
  • 16. Key Activity: Democratize Access to Big Data and Big Data Infrastructure • SQLShare: Database-as-a-Service for scientists and engineers • Myria: Easy, Scalable Analytics-as-a-Service
  • 17. Open Data sharing platforms • Database-as-a-service for open data analytics • Interoperable with external tools and languages • Local or cloud deployments • Interoperable with existing database platforms • Built-in data integration, profiling, analytics Google Fusion Tables Entrepreneurship 1) “Data once guarded for assumed but untested reasons is now open, and we're seeing benefits.” -- Nigel Shadbolt, Open Data Institute 2) Need to help “non-specialists within an organization use data that had been the realm of programmers and DB admins” -- Benjamin Romano, Xconomy “Businesses are now using data the way scientists always have” -- Jeff Hammerbacher, Cloudera
  • 18. Halperin, Howe, et al. SSDBM 2013
  • 20. 20
  • 21. Kenya Health Information System Data Grégoire Lurton June 12, 2014 Abie Flaxman, Dan Halperin, Grégoire Lurton
  • 24. “Much of the material remains unprocessed, or, if processed, unanalyzed, or, if analyzed, not read, or, if read, not used or acted upon” Objectives • Design generalizable method to process HIS-like data • Make important dataset available for analysis • Explore actionable data analysis of HIS data Why do we care?
  • 25. Metadata Trace - saving Reports of year n saved in January of year n+1 Years were not recorded for the first year of use…
  • 26.
  • 27.
  • 28. REDPy Repeating Earthquake Detector (Python) An eScience Incubator Project Project Lead: Alicia Hotovec-Ellis Data Scientist: Jake Vanderplas John Vidale Alicia Hotovec-Ellis Jake Vanderplas
  • 29. What is a “repeating” earthquake? [Figure: events numbered 1 to 7 along an axis labeled EVENT#]
  • 30. Why do we study repeating earthquakes?
  • 32. Clustering for [Figure: two panels, “Ordered in time” and “Ordered with OPTICS”, each with axes Event # vs. Event #]
  • 33. I talked with Alicia a bit yesterday, and she showed me that her earthquake-repeater-searching implementation is more general, and more powerful than I had thought, and closer to trial by others (and I have a particular use in mind in the ongoing iMUSH experiment on Mount St Helens)<snip> So I'm encouraging her to continue to work on it a day per week or so for the foreseeable future, assuming you have the facilities to continue the incubation. The project outlives the incubator… Publications in the works on both the software and the science – from three months of half-time work
  • 34. Using Twitter data to identify geographic clustering of anti-vaccination sentiments Ben Brooks June 12, 2014 Benjamin Brooks Andrew Whitaker Abie Flaxman
  • 35. Initial approach • Sentiment regarding vaccination can be discerned from Twitter. • Can we find city- or county-level pockets of anti-vaccination sentiment? • Do these locales correlate with outbreak and vaccination rate data (beyond H1N1)?
  • 36. Training data issues • Training data from PSU study labeled tweets as positive, negative, neutral, or irrelevant. • Many tweet categorizations seemed suspect. • Produced new training dataset; switched approach to negative tweets vs. all others. • Of tweets we labeled as negative, PSU training data agreed with 36%. • Sample non-negative tweets in training dataset from PSU study: • “RT @Lyn_Sue Lyn_Sue18 Reasons Why u Should NOT Vaccinate Your Children Against The Flu This Season” • “1882 -3 O RT @alexHroz Citizens From All Walks Intend To Refuse Swine Flu "Vaccine,” • “Eighteen Reasons Why You Should NOT Vaccinate Your Children Against The Flu This Season by Bill Sard” • “Swine Flu Vaccine not necessary and not healthy:”
  • 37. Background: Previous work • “For our sentiment classification, we used an ensemble method combining the Naive Bayes and the Maximum Entropy classifiers…The accuracy of this ensemble classifier was 84.29%.”
  • 38. Other sentiment approaches • Precision: Of all tweets labeled negative by the algorithm, what percentage are “true negatives”? • Recall: Of all “true negative” tweets, what percentage are labeled negative by the algorithm?
        Approach                      Precision   Recall
        Vaccine-specific keywords     19%         59%
        Modified general sentiment    25%         41%
        Naïve Bayes                   79%         19%
        Logistic regression           70%         28%
        Labeled data from PSU study   41%         36%
  • 39. Other sentiment approaches • Data labeled by human beings does not perform dramatically better than other classifiers!
        Approach                      Precision   Recall
        Vaccine-specific keywords     19%         59%
        Modified general sentiment    25%         41%
        Naïve Bayes                   79%         19%
        Logistic regression           70%         28%
        Labeled data from PSU study   41%         36%
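The precision and recall definitions on the two preceding slides can be made concrete with a short Python sketch. The function name `precision_recall` and the five example labels are hypothetical, chosen only for illustration; they are not data or code from the project.

```python
def precision_recall(predicted, actual, target="negative"):
    """Return (precision, recall) for a single target label."""
    # Tweets the algorithm labeled as target that really are target.
    true_pos = sum(1 for p, a in zip(predicted, actual)
                   if p == target and a == target)
    # Everything the algorithm labeled as target.
    predicted_pos = sum(1 for p in predicted if p == target)
    # Everything that really is target.
    actual_pos = sum(1 for a in actual if a == target)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical labels for five tweets (illustration only).
actual = ["negative", "negative", "other", "other", "negative"]
predicted = ["negative", "other", "other", "negative", "other"]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the algorithm labels two tweets negative and one of them is correct, while it finds only one of the three truly negative tweets. That mirrors the trade-off in the table, where a method can score well on one measure and poorly on the other.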
  • 40. Scalable Analytics over Call Record Data in Developing Nations Project Lead Ian Kelley Information School University of Washington E-mail: ikelley@uw.edu eScience Data Incubator - 12 June 2014 Andrew Whitaker, Ian Kelley, Josh Blumenstock
  • 41. Map migration patterns of workers during labor market shortages (Rwanda) • Measure and categorize mobility patterns • Determine people’s geographic center of gravity • Discover the effects of violent events on internal population mobility (Afghanistan) • Track activity patterns over time; identify changes • Map connected areas of country
  • 42. Average position during a time period (e.g., day, week)
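One way to read "average position during a time period" is as a per-user, per-day mean of the coordinates attached to each call record. The sketch below assumes that reading; the slides do not describe the project's actual method. The record layout `(user, timestamp, lat, lon)`, the function name `average_position`, and the sample coordinates are all hypothetical. A plain arithmetic mean of latitude and longitude is only a reasonable approximation over small areas that do not cross the antimeridian.

```python
from collections import defaultdict
from datetime import date, datetime


def average_position(records):
    """Mean (lat, lon) per (user, day) from (user, timestamp, lat, lon) tuples."""
    # Running totals per key: [sum of lat, sum of lon, record count].
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for user, ts, lat, lon in records:
        acc = sums[(user, ts.date())]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {key: (s[0] / s[2], s[1] / s[2]) for key, s in sums.items()}


# Hypothetical call records (illustration only).
calls = [
    ("a", datetime(2014, 6, 12, 8, 0), -1.95, 30.06),
    ("a", datetime(2014, 6, 12, 18, 0), -1.97, 30.10),
    ("a", datetime(2014, 6, 13, 9, 0), -2.00, 30.00),
]

centers = average_position(calls)
print(centers[("a", date(2014, 6, 12))])
```

Grouping by week instead of day would only change the key, for example by using `ts.isocalendar()[:2]` in place of `ts.date()`.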
  • 43. eScience Data Incubator - 12 June 2014
  • 44. Towards An Urban Science Incubation Cohort • OneBusAway: Transit Traveler Information Systems • Foreclosure Rates and changes in poverty concentration • PNW Seismic Network Early Warning System • Ocean Observatories Initiative • Education CRPE
  • 45. Seattle: the tech and innovation hub • “most innovative state” (Bloomberg 12/13) • “smartest city” (Fast Company, 11/13) • only US city on “ten best Internet cities” (UBM’s Future Cities blog, 8/13) • ranked 2nd for women entrepreneurs (geekwire, 2/13) • ranked 4th as global startup hub, > NYC (geekwire, 11/12) • “the top tech city” (geekwire, 6/12) • …and so on
  • 46. eScience Institute + Urban Science • Better public engagement than in physical and earth sciences • Leverages our core interest in open data and open science • Acute need relative to traditionally data-intensive fields – relative newcomers in DS techniques and technologies – We prefer collaborations with smaller labs and individuals as opposed to “Big Science” projects • Seattle offers a unique testbed as an urbanizing region – Brookings “metro”: Interconnected urban, suburban, rural, environment – Engaged, active communities – Strong local interest in open data, open government – Global hub for technology and innovation (next slide) • Connections with King County Executive’s office, State CIO’s office, Seattle CTO’s office, local gov data companies (Socrata)
  • 47. Data Science @ UW We are at the dawn of a revolutionary new era of discovery and learning

Editor's notes

  1. 3
  2. Institutional change rather than specific research projects
  3. Institutional change rather than specific research projects
  4. What is the studio? It’s an open research space where anyone on campus can come to collaborate with a data science team that consists of several permanent staff with expertise in databases, machine learning, visualization, software engineering, reproducibility, cluster and cloud computing – these are new “research and development” career paths in applied data science, attracting those with significant software backgrounds interested in applying their expertise to science problems. The Studio will also house a number of data science fellows – partially funded research scientists, visiting scientists, postdocs, and students (including IGERT students as Magda discussed). The Studio will be a delivery vector for a number of activities – the seminar series, the lunches, workshops and bootcamps. But you can engage directly with the Studio in a number of ways: the space will be designed to support drop-in collaboration, we will hold scheduled office hours, and a flagship program that I’m really excited about is our data science incubator, which I’ll describe in a moment.
  5. These data science collaborations can spin out tools like SQLShare, but we need to make these technology-oriented collaborations more common. The next generation of this is an incubation program to scale and concentrate our collaborations. We want to move from “accidental” encounters to engineered partnerships -- identify promising new opportunities and new partners around campus and invest our time with them. We need a shared environment where researchers can learn not only from our team, but also external mentors and most importantly **each other** – we routinely find shared solutions across very different fields. John Wilkerson in political science is using sequence alignment algorithms from biology for text analytics to trace the flow of ideas through legislation – he’ll have a student participating in our incubation program this spring. And we intend this to be a true startup environment, with significant potential for technology spinout. We can help find new markets for existing technology as well as finding opportunities for new technologies.
  6. Let me give you a brief example of a project a little further upstream that the incubation program can provide access to. This work is in a space of open data sharing platforms, along with Socrata here in Seattle, products from Google and Microsoft, and a number of other companies. Two observations motivate the products in this space: First, there’s a movement toward open data that has researchers, government agencies, and even companies exposing their data assets online for use by others for reasons of transparency, efficiency, accountability. Even for commercial data, there are marketplaces emerging to facilitate the buying and selling of data. All of these use cases need new technology. So that’s one reason. Second, if you’re going to use someone else’s data, you need it to be as accessible as possible. In particular, you need to help data analysts use the data that “had previously been the realm of programmers and DB administrators” – here I’m quoting Benjamin Romano from an Xconomy article about Socrata. SQLShare is an open data system, but emphasizes rich data manipulation rather than just fetch and retrieval, interoperability with external tools and existing databases, local or cloud deployments, and built-in services for data integration, profiling, and visualization. Ginger mentioned this system in her talk – we have maintained a production deployment here on campus for three years focusing on science users. Our observation is that science use cases are a predictor for commercial use cases – businesses are beginning to use data the same way scientists always have – they collect it aggressively, torture it with analytics, use it to make predictions about the world. So we think if we can handle these difficult science use cases that we will also be addressing a significant commercial problem.
  7. GENERALISATION POSSIBLE – NOT A ONE OFF ISSUE
  8. DESCRIBE THE GRAPH AND THE AXES CLEARLY
  9. Cluster in space and often in time
  10. Everywhere! But small, often missed by routine network detection… Eruptions!! What can they tell us?… What’s the state of the science?
  11. Great – we have a classifier that is accurate. Let’s extend this work to see to what extent opinions manifest themselves in actions of public health importance. A lot of discussion in media on return of vaccine-preventable disease outbreaks. Fear of autism, etc.
  12. H1N1 vaccination rates recorded in January 2010 (older than 6 months) vs. average sentiment score of users in regions (black) and states (gray) Impressed with accuracy of 84% in a 4-class problem. Not a surprising result that state-level information might not be that strongly correlated. Wanted to dig deeper into the geographic features of users.
  13. MENTION INABILITY TO REPRODUCE ACCURATE CLASSIFIER
  14. Dow Constantine’s office, King County Executive Fred Jarrett, Chief of Staff for King County Tom Stritikus, Dean of the UW College of Education Thaisa Way, Landscape Architecture Bill Glenn, Socrata local company behind data.seattle.gov