2. 2
Don’t Need All 3 Vs to be Big Data
Patient Records (volume, variety)
PGD/Crowd Sourcing (variety)
Cyber (volume, velocity, variety)
Asset Management (variety, velocity)
4. 4
With the right tools the government can
• Build new applications not possible before
• Enhance service to citizens significantly
• Remove data redundancy
• Decrease time to mission
• Reduce cost
Big Data Is An Opportunity
7. 7
4,000,000+ MongoDB Downloads
100,000+ Online Education Registrants
20,000+ MongoDB User Group Members
20,000+ MongoDB Days Attendees
15,000+ MongoDB Management Service (MMS) Users
Global Community
8. 8
Indeed.com Trends
Top Job Trends
1. HTML5
2. MongoDB
3. iOS
4. Android
5. Mobile Apps
6. Puppet
7. Hadoop
8. jQuery
9. PaaS
10. Social Media
Leading NoSQL Database
LinkedIn Job Skills (chart: MongoDB vs. Competitors 1 to 5 and all others)
Google Search (chart: MongoDB vs. Competitors 1 to 4)
Jaspersoft Big Data Index, Direct Real-Time Downloads (chart: MongoDB vs. Competitors 1 to 3)
9. 9
To provide the best database for how we build and run apps today
MongoDB Vision
Build
– New, complex, varying data
– Flexibility
– New languages
– Faster development
Run
– Big Data scalability
– Real-time
– Commodity hardware
– Cloud
11. 11
RDBMS
Agility – Document Oriented Model
MongoDB
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits : [
    { type : "Health",
      plan : "PPO Plus" },
    { type : "Dental",
      plan : "Standard" }
  ]
}
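As a rough illustration of the RDBMS-versus-document contrast on this slide, the sketch below folds relational-style rows into a single document with an embedded array. The row layout (an `employees` table, a `benefits` table, and the `id` / `employee_id` keys) is an assumption made for the example and is not taken from the slide. It runs as plain JavaScript without a database.

```javascript
// Hypothetical relational-style rows; table and key names are assumptions.
const employees = [
  { id: 1, employee_name: "Dunham, Justin", department: "Marketing",
    title: "Product Manager, Web", report_up: "Neray, Graham", pay_band: "C" },
];
const benefits = [
  { employee_id: 1, type: "Health", plan: "PPO Plus" },
  { employee_id: 1, type: "Dental", plan: "Standard" },
];

// Fold the child rows into an embedded array, giving the document shape
// shown on the slide (minus the database-assigned _id).
function toDocument(employee, benefitRows) {
  const { id, ...fields } = employee;
  return {
    ...fields,
    benefits: benefitRows
      .filter((row) => row.employee_id === id)
      .map(({ type, plan }) => ({ type, plan })),
  };
}

const employeeDoc = toDocument(employees[0], benefits);
console.log(JSON.stringify(employeeDoc, null, 2));
```

In the document model the join above happens once, at write time, so a read returns the employee and the benefits together.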
12. 12
• MongoDB does not need any pre-defined data schema.
• Every document can have different fields
Agility – Dynamic Schema
{name: "jeff",
 eyes: "blue",
 height: 72,
 boss: "ben"}
{name: "brendan",
 aliases: ["el diablo"]}
{name: "ben",
 hat: "yes"}
{name: "matt",
 pizza: "DiGiorno",
 height: 74,
 boss: "555.555.1212"}
{name: "will",
 eyes: "blue",
 birthplace: "NY",
 aliases: ["bill", "la ciacco"],
 gender: "???",
 boss: "ben"}
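One way to see what a dynamic schema means for application code is to hold the five documents in a plain array and ask which of them carry a given field. This is only a sketch in plain JavaScript: the helper names `namesWithField` and `allFields` are made up for the example, and the phone-number-like `boss` value is quoted as a string so the code parses. On the database side, the comparable check would use the `$exists` query operator, for example `{ height: { $exists: true } }`.

```javascript
// The five documents from the slide, held in a plain array for illustration.
const people = [
  { name: "jeff", eyes: "blue", height: 72, boss: "ben" },
  { name: "brendan", aliases: ["el diablo"] },
  { name: "ben", hat: "yes" },
  { name: "matt", pizza: "DiGiorno", height: 74, boss: "555.555.1212" },
  { name: "will", eyes: "blue", birthplace: "NY",
    aliases: ["bill", "la ciacco"], gender: "???", boss: "ben" },
];

// Names of the documents that carry a given field (hypothetical helper).
function namesWithField(docs, field) {
  return docs.filter((doc) => field in doc).map((doc) => doc.name);
}

// Every distinct field name used anywhere in the array (hypothetical helper).
function allFields(docs) {
  return [...new Set(docs.flatMap((doc) => Object.keys(doc)))];
}

console.log(namesWithField(people, "height")); // [ 'jeff', 'matt' ]
console.log(allFields(people).length);         // 9
```

No document had to be migrated to add `pizza` or `hat`; code that reads the data simply has to tolerate a field being absent.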
13. 13
Agility – Rich Query and Aggregation
{
  object: 'M1 Abrams 3123',
  type: 'Armored Vehicle',
  owner: '5th Armored',
  location: [45.123,47.232],
  current_range: 245,
  armament: [
    { model: '105mm M68A1',
      type: 'Rifled Cannon',
      range: 100000, … },
    { model: '120mm M256',
      type: 'Smooth Bore Cannon'}],
  crew: [ {name: 'Paul', …
  weight: 126000,
  equipment: […
  desc: "This unit is highly …
}
Rich Queries
• Find all armored vehicles under 64 tons with a smooth bore cannon and a crew member with IED removal training
Geospatial
• Find all units within a 220 mile radius of a position with transport capacity of 20, sorted by proximity
Text Search
• Find all units having Arabic mentioned
Aggregation
• Calculate the average range of units within the Afghanistan Theater of Operation
Map Reduce
• Find correlations between co-located units and mission casualties
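The first four query types above could be expressed roughly as the query documents below. This is a sketch under several assumptions, not the presenter's actual queries. The collection name `units` and the fields `crew.training`, `transport_capacity`, and `theater` are hypothetical, `weight` is assumed to be in pounds (64 short tons = 128,000 lb), and `location` is assumed to be in [longitude, latitude] order with a 2dsphere index. The text query additionally assumes a text index. The operators themselves (`$lt`, `$elemMatch`, `$nearSphere`, `$text`, `$match`, `$group`, `$avg`) are standard MongoDB operators. The code only builds the query objects, so it runs without a database.

```javascript
const POUNDS_PER_SHORT_TON = 2000; // assumption: weight is stored in pounds
const METERS_PER_MILE = 1609.344;

// Rich query: armored vehicles under 64 tons with a smooth bore cannon
// and a crew member with IED removal training (crew.training is hypothetical).
const richQuery = {
  type: "Armored Vehicle",
  weight: { $lt: 64 * POUNDS_PER_SHORT_TON },
  armament: { $elemMatch: { type: "Smooth Bore Cannon" } },
  crew: { $elemMatch: { training: "IED removal" } },
};

// Geospatial: units within a radius of a point. $nearSphere returns results
// nearest first; with $geometry, $maxDistance is in meters.
function nearQuery(longitude, latitude, miles, capacity) {
  return {
    location: {
      $nearSphere: {
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $maxDistance: miles * METERS_PER_MILE,
      },
    },
    transport_capacity: capacity, // hypothetical field
  };
}

// Text search: units whose indexed text mentions "Arabic".
const textQuery = { $text: { $search: "Arabic" } };

// Aggregation: average current_range of units in one theater
// (theater is a hypothetical field).
const averageRangePipeline = [
  { $match: { theater: "Afghanistan" } },
  { $group: { _id: null, average_range: { $avg: "$current_range" } } },
];

console.log(JSON.stringify(nearQuery(47.232, 45.123, 220, 20), null, 2));
```

In the shell these would be passed along the lines of `db.units.find(richQuery)` and `db.units.aggregate(averageRangePipeline)`. The sample document's weight of 126000 falls under the 128,000 lb threshold, so it would satisfy the weight clause.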
14. 14
• Organization focused on mobile development
• Many apps geospatially enabled
• Agile development, Scrum
• Originally Native/Java Services/Postgres
• Moved to Native/Node.js/MongoDB
Use Case: Mobile Development
• Constant model changes
• JSON throughout
• Development velocity more than doubled
15. 15
• Executive order for agencies to open data
• History of MongoDB as back end for data services
• Data services expose data in JSON/XML
• Existing tools lack flexibility and scalability
• We are seeing powerful and scalable platforms built on top of MongoDB
• Platform doesn’t need to know or care what the data looks like
• http://cfpb.github.io/qu/
Use Case: Open Data Initiative
16. 16
Agility – Native Language Drivers
Shell
Command-line shell for interacting directly with the database
Drivers
Drivers for most popular programming languages and frameworks
> db.collection.insert({company: "10gen", product: "MongoDB"})
>
> db.collection.findOne()
{
  "_id": ObjectId("5106c1c2fc629bfe52792e86"),
  "company": "10gen",
  "product": "MongoDB"
}
Java
Python
Perl
Ruby
Haskell
JavaScript
22. 22
• Project in the IC
• Needle in the haystack for changing datasets
Use Case: The Graveyard
• 7 previous technologies
• Speed in ingestion, simple and agile usage
• 500k TPS over 50 nodes
• Breadth of querying and indexing important
• Millions saved in total cost
23. 23
Developer/Ops Savings
• Ease of Use
• Agile development
• Less maintenance
Hardware Savings
• Commodity servers
• Internal storage (no SAN)
• Scale out, not up
Software/Support Savings
• No upfront license
• Cost visibility for usage growth
Value - Lower TCO
DB Alternative
24. 24
• MongoDB is a great fit for many problems in healthcare
• Lots of document oriented data already
• Complex, varying, unstructured data
• $10 Billion non-profit health conglomerate
• Insurance companies
• RelayHealth
• Usage within the VA
• popHealth
• Cypress
Use Case: Healthcare
25. 25
MongoDB Products and Services
Training
Online and In-Person for Developers and Administrators
MongoDB Management Service (MMS)
Cloud-Based Suite of Services for Managing MongoDB Deployments
Subscriptions
MongoDB Enterprise, MMS (On-Prem), Professional Support, Commercial License
Consulting
Expert Resources for All Phases of MongoDB Implementations
Editor's notes
Introduction: I run the Federal business, which of course spans Civilian, IC, and DoD. Technical: 8 years in NoSQL in the government space, SPSS, ad serving, code slinging at DARPA.
I accept the general definition of big data equating to VVV. You don’t need all 3 Vs to be big data, though. What do they have in common? They are all difficult to handle with the traditional data stack. Should have called it Awkward Data. I’ll never get a job at Gartner.
More demands. Expectations are shaped by rapid change and innovation in consumer markets. Government must react faster. Time to mission is shortening. One example: enemy threats are more varied and change rapidly. Fiscal constraints (do more with less).
If you have the technology to handle big data, then it’s a massive opportunity, not a problem to deal with. Exploit unstructured, semi-structured, machine, and sensor data. New kinds of data and density mean new applications. The most innovative and agile will gain the most advantage from big data. It may seem obvious that big data increases cost, but big data projects are solving old problems with new solutions and delivering cost reductions that can often dwarf the associated IT spend. In most commercial and government organizations, IT spend accounts for only 10% of cost.
NoSQL shouldn’t be viewed as a pejorative term. It simply refers to databases that don’t use SQL as their primary access method and aren’t relational. MongoDB has a rich feature set and data model that allow it to work across many problem domains. It shares this with the RDBMS, which has been a general-purpose database for 30 years.
How does MongoDB help governments be faster, better, cheaper? It comes down to three key aspects that MongoDB enables: agility, scalability, and value. I promise you this is not my most technical slide.
Using a document-oriented model made having a dynamic schema natural. What does having a dynamic schema mean?
I first learned about this use case about 1.5 years ago. It was originally built on native mobile app code, a Java services layer, and Postgres. There were constant data model changes because the vast majority of stories required new data. They were finding that it was taking too long to implement new features and that a lot of time was lost in data schema management and data conversion. Velocity went from around 90 points to 180, despite the team being new to the technologies. JSON throughout almost entirely removed coding around persistence (storage, querying, retrieval). Velocity is probably better now because they have more experience, and with JavaScript everywhere (Titanium) it has improved even more. It is difficult to quantify because of the auto-scaling nature of Scrum velocity. Obviously I’m terrible at marketing; I should do a WAG somehow so I could sensationalize it more.
Talk about workloads working off of the secondary
Once again I showed my inability to sensationalize. The webcast description mentioned 3, but it was actually 7. The 7 previous technologies were Oracle, Presharding into Application Memory*, Terracotta, HBase, ExtremeDB, GemFire, and Voldemort. Millions were saved on TCO on a single project. This was over many facets, which is a good segue.
10 billion dollar use case: numerous data islands of patient/health records. Consolidating this data into MongoDB: data records, documents, PDFs, text, X-rays, metadata. One system to maintain and search. Huge improvement in data accuracy. Multiple insurance companies. RelayHealth, which is part of McKesson, provides clinical connectivity to physicians, patients, and hospitals. In the gov space they have done work on Blue Button and PMP (prescription monitoring programs). VA: projects sprouting throughout, from mobile to patient data. popHealth: the project was sponsored by HHS and built by MITRE. It’s an open source reference implementation software service that automates the reporting of Meaningful Use quality measures. popHealth integrates with a healthcare provider’s electronic health record (EHR) system using continuity of care records. popHealth streamlines the automated generation of summary quality measure reports on the provider’s patient population. Cypress: Meaningful Use Stage 2 Clinical Quality Measure Testing and Certification Tool.