Open Government Data & MongoDB
Luigi Montanez
luigi@sunlightfoundation.com
Question? @LuigiMontanez
Open Data + Open Source = Open Government



MongoDB enables open data

Opening Up Data
✴   Gather data from disparate sources
     ✴   Data dumps (SQL, Fixed-width columns)
     ✴   Web scraping
     ✴   Text/PDF parsing
✴   Serving RESTful JSON APIs
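
As a rough sketch of the first step, here is one way a fixed-width record from a data dump might be turned into a JSON-ready object. The column layout, field names, and sample line are made up for illustration; they do not come from any real data set.

```javascript
// Hypothetical column layout: [field name, start offset, end offset).
const LAYOUT = [
  ['last_name', 0, 10],
  ['first_name', 10, 20],
  ['state', 20, 22],
  ['district', 22, 24],
];

// Slice each column out of the line and trim the padding.
function parseFixedWidth(line, layout) {
  const record = {};
  for (const [name, start, end] of layout) {
    record[name] = line.slice(start, end).trim();
  }
  return record;
}

// Made-up sample line, padded so the columns match LAYOUT.
const line =
  'Lee'.padEnd(10) + 'Barbara'.padEnd(10) + 'CA' + '9'.padEnd(2);
const record = parseFixedWidth(line, LAYOUT);

// The resulting object serializes straight to JSON for an API response.
console.log(JSON.stringify(record));
```

Once the record is a plain object, the same shape can be stored and served without a separate mapping layer.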



JSON
✴   Tree structure, not tabular
✴   Still relational
✴   JSON for data, XML for documents
✴   Closely resembles native data structures
✴   No manual parsing needed
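
A small illustration of the last two points: parsing a JSON string yields ordinary objects and arrays, with no hand-written parser. The sample string is a shortened, illustrative variant of the kind of record shown later in this deck.

```javascript
// A JSON string as it might arrive from an API (shortened sample data).
const text =
  '{"name": "Barack Obama", "type": "politician", ' +
  '"top_industries": ["Lawyers/Lobbyists", "Misc. Business"]}';

// One call turns it into native objects and arrays.
const entity = JSON.parse(text);

// Walk the tree with plain property and index access.
console.log(entity.name);
console.log(entity.top_industries[0]);
console.log(entity.top_industries.length);

// Going the other way is just as direct.
const roundTrip = JSON.parse(JSON.stringify(entity));
```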



Three Projects
✴   Poligraft
✴   Real Time Congress API
✴   Open State Project




App design drives schema design

{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com"
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": "................."
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [...]
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [
      {
      "name": "Barack Obama",
      "type": "politician"
      },
        ...
  ]
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [
      {
      "name": "Barack Obama",
      "type": "politician",
      "breakdown": {"indiv": "33", "pac": "67"},
      "top_industries": ["Lawyers/Lobbyists", "Finance/Insurance/Real Estate", "Misc. Business"]
      },
        ...
  ]
}
Natural Schemas
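
One way to read "natural schemas": the stored document already has the shape the page needs, so rendering is a walk over one object rather than a set of joins. The sketch below uses a shortened copy of the Poligraft document above; the `summarize` helper is hypothetical and only for illustration.

```javascript
// Shortened copy of the document from the slides.
const article = {
  title: "President Obama's climate 'Plan B' in hot water",
  slug: 'EOsc',
  entities: [
    {
      name: 'Barack Obama',
      type: 'politician',
      breakdown: { indiv: '33', pac: '67' },
      top_industries: ['Lawyers/Lobbyists', 'Misc. Business'],
    },
  ],
};

// Hypothetical helper: one line of text per embedded entity.
function summarize(doc) {
  return doc.entities.map(
    (e) =>
      `${e.name} (${e.type}): indiv=${e.breakdown.indiv}, ` +
      `pac=${e.breakdown.pac}`
  );
}

console.log(summarize(article).join('\n'));
```

Everything the summary needs travels with the article, which is the point of embedding the entities rather than referencing them.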


Three Projects
✴   Poligraft
✴   Real Time Congress API
✴   Open State Project




Real-Time Congress API




                 Credit: vgm8383 on Flickr
Android App: “Congress”
Politiwidgets
Requirements
✴   Aggregate lots of data
      Biographical, Bills, Votes, Earmarks,
      Video Clips, Floor Updates, Legislative
      Documents, Committee Schedules,
      Contributions, Interest Group Ratings
✴   Lightweight responses



{legislator: {
    in_office: true,
    title: "Rep",
    nickname: "",
    district: "9",
    bioguide_id: "L000551",
    govtrack_id: "400237",
    phone: "202-225-2661",
    website: "http://lee.house.gov/index.html",
    twitter_id: "",
    last_name: "Lee",
    name_suffix: "",
    last_updated: "2010/04/13 00:00:14 +0000",
    party: "D",
    chamber: "house",
    state: "CA",
    youtube_url: "http://www.youtube.com/RepLee",
    first_name: "Barbara",
    gender: "F",
    congress_office: "2444 Rayburn House Office Building",
    earmarks: {
          average_number: 20,
          total_amount: 10000000,
          average_amount: 22994535,
          total_number: 28,
          last_updated: "2010-03-18",
          fiscal_year: 2010,
    },
    ...
}}
// limit selection to a subset of fields
db.people.find( { 'first_name' : 'john' },
                { 'last_name' : 1,
                  'address' : 1 } );

// use dot-notation to dig into an object
db.people.find( { 'state': 'CA' },
                { 'address.zip_code': 1 } );
?sections=last_name,first_name,state,earmarks

  {legislator: {
      last_name: "Lee",
      first_name: "Barbara",
      state: "CA",
      earmarks: {
            average_number: 20,
            total_amount: 10000000,
            average_amount: 22994535,
            total_number: 28,
            last_updated: "2010-03-18",
            fiscal_year: 2010,
      }
  }}
?sections=last_name,first_name,state,earmarks.total_amount,earmarks.total_number

  {legislator: {
      last_name: "Lee",
      first_name: "Barbara",
      state: "CA",
      earmarks: {
            total_amount: 10000000,
            total_number: 28
      }
  }}
Partial responses make payloads smaller
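
The `sections` parameter lines up closely with MongoDB's field selection. As a minimal sketch (not the actual Real Time Congress implementation, which these slides do not show), the query string could be turned into a projection object like this. `sectionsToProjection` and `applyProjection` are hypothetical helpers; the second only mimics, in plain JavaScript, what the database would return for simple inclusion projections.

```javascript
// Turn "a,b,c.d" into { a: 1, b: 1, 'c.d': 1 }, the shape find()
// accepts as its second argument.
function sectionsToProjection(sections) {
  const projection = {};
  for (const field of sections.split(',')) {
    const name = field.trim();
    if (name) projection[name] = 1;
  }
  return projection;
}

// In-memory stand-in for the database: copy only the requested paths.
function applyProjection(doc, projection) {
  const out = {};
  for (const path of Object.keys(projection)) {
    const keys = path.split('.');
    let src = doc;
    let dst = out;
    for (let i = 0; i < keys.length; i++) {
      const key = keys[i];
      if (src === null || typeof src !== 'object' || !(key in src)) break;
      if (i === keys.length - 1) {
        dst[key] = src[key]; // leaf: copy the value
      } else {
        if (!(key in dst)) dst[key] = {}; // descend, creating as needed
        dst = dst[key];
        src = src[key];
      }
    }
  }
  return out;
}

// Trimmed sample record based on the slides.
const legislator = {
  last_name: 'Lee', first_name: 'Barbara', state: 'CA', party: 'D',
  earmarks: { total_amount: 10000000, total_number: 28, fiscal_year: 2010 },
};

const projection = sectionsToProjection(
  'last_name,first_name,state,earmarks.total_amount,earmarks.total_number'
);
const partial = applyProjection(legislator, projection);
console.log(JSON.stringify(partial));
```

With a real collection, the projection object would be passed as the second argument to `find()`, as in the shell examples above. MongoDB also returns `_id` unless it is explicitly excluded.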

Three Projects
✴   Poligraft
✴   Real Time Congress API
✴   Open State Project




50 States = 50 Formats

Schemalessness allows for granular control

Custom Fields
✴   Traditional RDBMS
     ✴   Update the schema for new fields, run a
         migration, feel icky
     ✴   Create a custom_fields table
✴   MongoDB
     ✴   Just store it
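
To make "just store it" concrete, the sketch below keeps records from two states side by side even though they carry different fields. A plain array stands in for a MongoDB collection, and all field names and values are invented for illustration.

```javascript
// Stand-in for a collection; with MongoDB these would be inserted as-is.
const bills = [];

// Two made-up records with different state-specific fields.
bills.push({
  state: 'CA', bill_id: 'AB 1', title: 'Sample bill',
  urgency_clause: true,
});
bills.push({
  state: 'TX', bill_id: 'HB 2', title: 'Another sample bill',
  companion_bill: 'SB 3',
});

// Shared fields work across all records...
const ids = bills.map((b) => b.bill_id);

// ...and custom fields are simply present or absent.
const withCompanion = bills.filter((b) => 'companion_bill' in b);

console.log(ids, withCompanion.length);
```

No migration or side table is involved: a new state-specific field only appears on the records that have it.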


Speaking JSON natively

Python
Source -> Scraped JSON -> (Transform) -> PostgreSQL
Source -> Scraped JSON -> MongoDB
Three Projects
✴   Poligraft
✴   Real Time Congress API
✴   Open State Project




Developer Happiness
Thanks!
sunlightlabs.com
@LuigiMontanez



