Pentaho and MongoDB Partner to Solve Government Big Data Challenges
1. Pentaho & MongoDB Partner to Solve Government Big Data Challenges
December 2013
Bob Gourley
Publisher, CTOvision.com
Will LaForest
Director of Federal, MongoDB
Dave Henry
SVP Enterprise Solutions, Pentaho
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
2. Big Data Management
Best Practices for Federal Big
Data Projects
Bob Gourley
Publisher, CTOvision.com
3. Brief Purpose
Research & Reports
• A focus on a new discipline of “Big Data Management”
• Intro to top 5 “Best Practices” of Federal Data activities
• Invitation to collaborate and refine approaches
• A perpetual draft - your input is requested
• Contribute your thoughts at CTOvision.com
4. Update Sources
Big Data Government Newsletter - reader survey
2,600 readers
2% response rate, across Federal agencies
Review of openly published research by Wikibon, TDWI, IDC, Gartner,
Forrester and of course our own CTOvision
Review of best practices and use cases from the best vendors in
Enterprise Big Data
Engagement of the community at events like Strata and Hadoop World
Planning Assumption
The ability to collect, parse, and analyze machine data in real time,
whether on premises or in the cloud, will continue to grow
5. Big Data Management
Agencies are thinking through the right changes to concepts and technologies
Old approaches still important, but cannot solve emerging problems
Big Data Management is an evolved discipline which builds on existing data
management approaches to leverage new concepts, technologies and best
practices to optimize mission support
6. Solutions That Require Big Data Management
• Open Source Information: analysis and integration
• Situational Awareness across disparate data sets
• Two use cases: “Connect the Dots” and “Needle in Haystack”
• Cyber Security: rapid real time analysis of all relevant data
• Asset catalog across extensive/dynamic enterprises
• Rapid return of geospatial data
• Location based push of data
• Real time return of relevant search
• Real time suggestion of topics
• Bioinformatics:
  • Human Genome
  • Patient location, treatment, outcomes
• Law Enforcement: Predictive Policing
• Data Hub: Unified storage, governance, security, functionality
7. Best Practices in Big Data Management
VISION
Start with a mission-focused vision. This will vary by organization. Support
to mission will drive everything else. Consider that analytics and Big Data
go together.
STRATEGY
Prioritize and tackle challenges like: changes to governance processes,
right mix of skills for workforce, learning new technology, and
prioritizing which workload types will be handled by which part of the
architecture.
KNOW
Know existing infrastructure and process with focus on: understanding of
legal/policy dynamics relevant to your agency, understanding of new
capabilities available, current and required throughputs/capacities, types of
workloads supported by each component in the architecture, and available
tech choices.
DESIGN
Document and continuously improve. Architect to manage data in its
original form. Include the right mix of traditional and new in your design.
Don’t assume any one platform will be a solution. Architect to insulate
applications and users from a variety of disparate big data platforms.
EXECUTE
Avoid custom coding wherever possible. Don’t let new Big Data platforms
become proprietary silos. ETL remains important. Ensure training for all
based on job function. Don’t neglect your own training. Serve the analyst.
8. Next Steps
Continue your market surveys, stay aware of what new
technologies can do for you.
Revisit your vision. As you do, ponder this: How can you leverage
data to support your mission?
Continue to study use-cases and exchange best practices. Dialog
with others in and out of your sector. Great lessons are coming
from other industries.
Continue to engage with the broader community. Sign up for our
Government Big Data Weekly.
Share your lessons learned.
9. Provide Your Thoughts, Input, Questions
E-mail: bob@ctovision.com
Blog: http://ctovision.com
Twitter: http://www.twitter.com/bobgourley
Facebook, LinkedIn, etc: See the blog
10. The Modern Operational
Database for Government
Will LaForest
Director of Federal, MongoDB
11. The Evolution of Databases
[Figure: timeline of database technologies across 1990, 2000 and 2010.
Online (operational & real-time): RDBMS, NoSQL.
Offline: data warehouse, OLAP/BI, Hadoop.]
12. Relational Database Challenges
Variety
• Unstructured data
• Semi-structured data
• Polymorphic data
Agile Development
• Iterative
• Short development cycles
• New workloads
Volume & Velocity
• Petabytes of data
• Trillions of records
• Millions of queries per second
New Architectures
• Horizontal scaling
• Commodity servers
• Cloud computing
13. MongoDB
The Modern Operational Database
• General Purpose
• Document Oriented
• Open Source
14. Fully Featured
Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul’s car collection
Native Indexes
• Secondary
• Compound
• Geospatial
• Full Text
• Hash
• Covering
MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
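The query types on this slide can be illustrated without a database. The following is a minimal sketch in plain Python, assuming an in-memory list stands in for a collection; the helper names (cars_of, owners_in_city_with_car_built, average_car_value) are hypothetical and are not MongoDB APIs. The fields elided with "…" on the slide are simply left out.

```python
# Hypothetical in-memory stand-in for a collection; no database is involved.
owners = [
    {
        "first_name": "Paul",
        "surname": "Miller",
        "city": "London",
        "location": [45.123, 47.232],
        "cars": [
            {"model": "Bentley", "year": 1973, "value": 100000},
            {"model": "Rolls Royce", "year": 1965, "value": 330000},
        ],
    },
]


def cars_of(first_name):
    """Rich query: list the cars of every owner with this first name."""
    return [car for o in owners if o["first_name"] == first_name for car in o["cars"]]


def owners_in_city_with_car_built(city, start, end):
    """Rich query: owners in a city with at least one car built in [start, end]."""
    return [
        o for o in owners
        if o["city"] == city and any(start <= c["year"] <= end for c in o["cars"])
    ]


def average_car_value(first_name):
    """Aggregation: mean value of the owner's car collection (None if empty)."""
    values = [c["value"] for c in cars_of(first_name)]
    return sum(values) / len(values) if values else None


paul_cars = [c["model"] for c in cars_of("Paul")]
london_70s = [o["surname"] for o in owners_in_city_with_car_built("London", 1970, 1980)]
paul_average = average_car_value("Paul")
```

Because the cars are nested inside the owner document, each question is answered from a single record with no join.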
15. MongoDB and Enterprise IT Stack
[Figure: enterprise IT stack.
Applications: CRM, ERP, Collaboration, Mobile, BI
Data Management: Online Data and Offline Data (boxes: RDBMS, RDBMS, Hadoop, EDW)
Infrastructure: OS & Virtualization, Compute, Storage, Network
Also shown: Security & Auditing, Management & Monitoring]
17. Document Data Model
Relational
MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
18. Dynamic Schema
MongoDB does not need any defined data schema.
Every document could have different data.
{name: "will",
 eyes: "blue",
 birthplace: "NY",
 aliases: ["bill", "la ciacco"],
 gender: "???",
 boss: "ben"}
{name: "jeff",
 eyes: "blue",
 height: 72,
 boss: "ben"}
{name: "ben",
 hat: "yes"}
{name: "brendan",
 aliases: ["el diablo"]}
{name: "matt",
 pizza: "DiGiorno",
 height: 74,
 boss: 555.555.1212}
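One consequence of a dynamic schema is that application code has to tolerate missing fields. A minimal sketch in plain Python, assuming the slide's documents are held in a list; the helper names are hypothetical, and matt's boss value is written as a string here (an assumption, since the slide shows it unquoted).

```python
from collections import Counter

# The slide's documents as plain dictionaries; every one has a different shape.
people = [
    {"name": "will", "eyes": "blue", "birthplace": "NY",
     "aliases": ["bill", "la ciacco"], "gender": "???", "boss": "ben"},
    {"name": "jeff", "eyes": "blue", "height": 72, "boss": "ben"},
    {"name": "ben", "hat": "yes"},
    {"name": "brendan", "aliases": ["el diablo"]},
    # Assumption: the unquoted value on the slide is treated as a string.
    {"name": "matt", "pizza": "DiGiorno", "height": 74, "boss": "555.555.1212"},
]


def field_counts(docs):
    """Count how many documents carry each field."""
    counts = Counter()
    for doc in docs:
        for field in doc:
            counts[field] += 1
    return counts


def names_where(docs, field, value):
    """Names of documents whose field equals value; a missing field never matches."""
    return [d["name"] for d in docs if d.get(field) == value]


counts = field_counts(people)
reports_to_ben = names_where(people, "boss", "ben")
tall = [d["name"] for d in people if d.get("height", 0) >= 73]
```

Only "name" appears in every document; the other fields are looked up with a default so that absent data does not raise an error.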
20. Automatic Sharding
• Increase or decrease capacity as you go
• Automatic balancing
• Optimized for commodity servers and cloud
infrastructure
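The idea behind the bullets above can be sketched as follows. This is a simplified illustration in plain Python, not MongoDB's actual balancer: it assumes a fixed number of chunks, a hashed shard key, and hypothetical shard names, and it rebalances by moving chunks until shard loads differ by at most one.

```python
import hashlib
from collections import Counter

NUM_CHUNKS = 12  # hypothetical: the key space is split into a fixed number of chunks


def chunk_for(key):
    """Map a shard key to a chunk by hashing it (stable across runs)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_CHUNKS


def balance(assignment, shards):
    """Spread chunks over `shards` so chunk counts differ by at most one."""
    load = {shard: [] for shard in shards}
    orphans = []  # chunks whose shard has been removed from the cluster
    for chunk, owner in sorted(assignment.items()):
        (load[owner] if owner in load else orphans).append(chunk)
    for chunk in orphans:
        target = min(shards, key=lambda s: len(load[s]))
        load[target].append(chunk)
        assignment[chunk] = target
    while True:
        fullest = max(shards, key=lambda s: len(load[s]))
        emptiest = min(shards, key=lambda s: len(load[s]))
        if len(load[fullest]) - len(load[emptiest]) <= 1:
            return assignment
        chunk = load[fullest].pop()
        load[emptiest].append(chunk)
        assignment[chunk] = emptiest


# Start with every chunk on one shard, then grow and shrink the cluster.
assignment = {chunk: "shard-a" for chunk in range(NUM_CHUNKS)}
balance(assignment, ["shard-a", "shard-b"])
two_shards = Counter(assignment.values())
balance(assignment, ["shard-a", "shard-b", "shard-c"])
three_shards = Counter(assignment.values())
balance(assignment, ["shard-a", "shard-b"])
after_removal = Counter(assignment.values())

# A document lives on whichever shard currently owns its chunk.
home_shard = assignment[chunk_for("customer-42")]
```

Moving whole chunks rather than rehashing every key is what lets capacity change without touching most of the data.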
21. High Availability
• Automated replication and failover
• Zero downtime during hardware failures and upgrades
• Multi-data center support
• Improved operational simplicity (e.g., HW swaps)
• Data durability and consistency
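Automated failover can be pictured with a small simulation. This is a deliberately simplified sketch, assuming a three-member set with hypothetical node names and a hypothetical last_applied counter; real elections involve more factors than shown here. It keeps two rules: a new primary needs a majority of members to be reachable, and the most up-to-date healthy member is preferred.

```python
def elect_primary(members):
    """Return the name of the new primary, or None if no majority is reachable."""
    healthy = [m for m in members if m["healthy"]]
    if 2 * len(healthy) <= len(members):
        return None  # a minority must not elect a primary
    # Prefer the member that has applied the most operations.
    return max(healthy, key=lambda m: m["last_applied"])["name"]


members = [
    {"name": "node-1", "healthy": True, "last_applied": 120},  # current primary
    {"name": "node-2", "healthy": True, "last_applied": 118},
    {"name": "node-3", "healthy": True, "last_applied": 120},
]

members[0]["healthy"] = False          # the primary's hardware fails
new_primary = elect_primary(members)   # two of three members remain

members[2]["healthy"] = False          # a second failure leaves a minority
no_primary = elect_primary(members)
```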
22. MongoDB Performance*
          Top 5 Marketing Firm    Government Agency       Top 5 Investment Bank
Data      Key/value               10+ fields, arrays,     20+ fields, arrays,
                                  nested documents        nested documents
Queries   Key-based               Compound queries        Compound queries
          1 – 100 docs/query      Range queries           Range queries
          80/20 read/write        MapReduce               50/50 read/write
                                  20/80 read/write
Servers   ~250                    ~50                     ~40
Ops/sec   1,200,000               500,000                 30,000
* These figures are provided as examples. Your application governs your performance.
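The slide's figures can be reduced to a rough per-server throughput. This is only arithmetic on the numbers shown, assuming the approximate server counts (~250, ~50, ~40) are taken at face value; it says nothing about what any other deployment would achieve.

```python
# Figures copied from the slide; server counts are approximate on the slide.
deployments = {
    "Top 5 Marketing Firm": {"servers": 250, "ops_per_sec": 1_200_000},
    "Government Agency": {"servers": 50, "ops_per_sec": 500_000},
    "Top 5 Investment Bank": {"servers": 40, "ops_per_sec": 30_000},
}

# Operations per second handled by each server, on average.
per_server = {
    name: d["ops_per_sec"] / d["servers"] for name, d in deployments.items()
}

for name, rate in per_server.items():
    print(f"{name}: about {rate:,.0f} ops/sec per server")
```

The spread between the three rows tracks the differences in data shape and query mix listed on the slide, which is the point of its footnote.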
24. Operational and Analytical Workloads
• Application interacts with primaries
• Analytical workloads on secondaries
• Workloads are isolated from one
another
• Working set appropriate for each
application
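The isolation described above amounts to a routing rule. A minimal sketch in plain Python, assuming a hypothetical ReplicaSet class and node names; in a real deployment this choice is expressed through the driver's read preference settings rather than hand-written routing.

```python
import itertools


class ReplicaSet:
    """Hypothetical model of one primary plus its secondaries."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self._next_secondary = itertools.cycle(self.secondaries)

    def node_for(self, workload):
        """Send analytical reads to secondaries; everything else to the primary."""
        if workload == "analytical" and self.secondaries:
            return next(self._next_secondary)
        return self.primary


rs = ReplicaSet("node-1", ["node-2", "node-3"])
operational_node = rs.node_for("operational")
analytical_nodes = [rs.node_for("analytical") for _ in range(3)]
```

Long-running analytical scans then never compete with the application for the primary's working set.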
26. Read Global / Write Local
Primary:LON   Secondary:NYC   Secondary:SYD
Primary:NYC   Secondary:LON   Secondary:SYD
Primary:SYD   Secondary:LON   Secondary:NYC
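The layout on this slide can be read as a routing table. A minimal sketch in plain Python, assuming three replica sets named after the city that hosts their primary (the set names and function names are hypothetical): a client writes to the set whose primary is in its own city, and can read every set's data from a member in that same city.

```python
# Hypothetical description of the three sets shown on the slide.
REPLICA_SETS = {
    "LON": {"primary": "LON", "secondaries": ["NYC", "SYD"]},
    "NYC": {"primary": "NYC", "secondaries": ["LON", "SYD"]},
    "SYD": {"primary": "SYD", "secondaries": ["LON", "NYC"]},
}


def write_target(client_city):
    """Write local: pick the set whose primary is in the client's city."""
    for name, rs in REPLICA_SETS.items():
        if rs["primary"] == client_city:
            return (name, "primary", client_city)
    raise ValueError(f"no primary located in {client_city}")


def read_targets(client_city):
    """Read global: for every set, pick the member located in the client's city."""
    targets = []
    for name, rs in REPLICA_SETS.items():
        if rs["primary"] == client_city:
            targets.append((name, "primary", client_city))
        elif client_city in rs["secondaries"]:
            targets.append((name, "secondary", client_city))
    return targets


nyc_write = write_target("NYC")
nyc_reads = read_targets("NYC")
```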
27. Solving Big Data
Challenges in the
Federal Government
Dave Diegtel
Head of Federal Sales, Pentaho
28. Why Pentaho for Federal Government
• Company and Product Maturity: Pentaho has been around for over 9 years,
with thousands of paid customers and a 5.0 version release. Pentaho is proven
and less risky.
• Business Model and Subscription: Pentaho’s subscription model and
server-based pricing allow for lower upfront investment and risk compared to
legacy BI vendors who traditionally cost an average of 4X for similar size
deployments.
• Government Certifications: Pentaho has made significant investments in
Government Certifications and Compliance such as 508 and Security.
• Open APIs and extensible architecture enable ease of integration and
reduce potential for vendor lock-in.
• Existing Government Customers and Cleared Personnel
29. A Comprehensive Big
Data Platform
Dave Henry
Senior VP Enterprise Solutions, Pentaho
30. Pentaho 5.0 Architected for the Future
Simplified analytics experience for all users
[Figure labels: Billing, Customer, Social Media, Web, Location, Network;
Analytics; Existing & New Data Infrastructure & Processes]
ANY Data
• Relational
• Operational
• Big Data
• Data sources not yet anticipated…
ANY Environment
• Data warehouses
• Data marts
• Stack vendors
• Cloud
• Embedded
ANY Analytics
• Reports
• Dashboards
• Visualizations
• Discovery
• Predictive
31. The New Reality
Simplified analysis for all users
• Simplified Analytics Experience
• Blended Big Data
• Enterprise Big Data Integration
32. Pentaho & MongoDB Enable Key Use Cases
Customer 360 and Device Data Analytics enable comprehensive insight
• MongoDB delivers Scalable, Low-Latency Enterprise Data Store
• Visual ETL development with Pentaho Data Integration (PDI)
• Reporting, Dashboards, Visualization and Discovery with Pentaho Analytics
[Figure labels: Mission, Scope, …; Pentaho Data Integration;
Pentaho Analytics (Reporting, Dashboards, Visualization, Discovery);
Pentaho Data Integration]
33. Enterprise Customer Data Store
Powerful data integration for MongoDB
[Figure: Customer Master, POS Data and Web Event Data flow through PDI ETL,
which uses $push to data arrays in the mongoDB cluster]
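The "$push to data arrays" step can be illustrated without a database. This is a minimal sketch in plain Python that emulates the effect of an upsert with a $push update on one customer document; the store, the push_event helper, and all field names (pos, web_events, sku, page) are hypothetical and are not taken from the slide or from PDI.

```python
# Hypothetical in-memory stand-in for the customer collection, keyed by _id.
customers = {}


def push_event(store, customer_id, array_field, event):
    """Append event to an array on the customer's document, creating both if needed."""
    doc = store.setdefault(customer_id, {"_id": customer_id})
    doc.setdefault(array_field, []).append(event)
    return doc


# Customer master data arrives first for one customer.
customers[42] = {"_id": 42, "name": "Paul Miller"}

# POS and web events from different feeds land on the same document.
push_event(customers, 42, "pos", {"sku": "A-100", "amount": 19.99})
push_event(customers, 42, "web_events", {"page": "/home"})
push_event(customers, 42, "pos", {"sku": "B-200", "amount": 5.00})

# An event for an unknown customer creates a new document (upsert behaviour).
push_event(customers, 7, "web_events", {"page": "/pricing"})
```

Each source appends to its own array, so a single document accumulates the full view of a customer without overwriting the master fields.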
38. James Dixon
Founder and CTO, Pentaho
As CTO at Pentaho, James Dixon is responsible for Pentaho's architecture and
technology roadmap. James has over 15 years of professional experience in
software architecture, development and systems consulting. Prior to Pentaho,
James held key technical roles at AppSource Corporation (acquired by Arbor
Software which later merged into Hyperion Solutions) and Keyola (acquired by
Lawson Software). Earlier in his career, James was a technology consultant
working with large and small firms to deliver the benefits of innovative
technology in real-world environments.
39. Why Pentaho?
• Pentaho is the best platform to connect, integrate, and analyze both
traditional sources and MongoDB
• Pentaho embraces and extends the MongoDB environment with rich
visualization and exploration of data
• Pentaho’s Subscription-based business model lowers upfront investments,
enabling faster ROI
• Pentaho has dozens of Federal Government customers and has made
significant investments in government certifications and cleared personnel
• Pentaho and MongoDB are established partners – Pentaho carefully
engineers its products to use the latest MongoDB APIs to provide the best
possible performance
40. Next Steps and Q&A
• Needs Assessment with Pentaho and MongoDB
• Dave Diegtel - ddiegtel@pentaho.com
• Will LaForest - will@mongodb.com
• Try Pentaho (30-Day Free Trial) -- pentaho.com/download
• Learn More about Big Data and Government Solutions
• Pentaho
• Big Data Website: pentahobigdata.com/
• Government Solutions: pentaho.com/solutions/government
• MongoDB:
• Government Solutions: mongodb.com/industries/government
• Big Data: Examples and Guidelines for the Enterprise Decision Maker
mongodb.com/lp/whitepaper/big-data-nosql
• MongoDB Top 5 Considerations When Evaluating NoSQL Databases
mongodb.com/lp/whitepaper/nosql-considerations
• Sign up for the Big Data Government Newsletter at CTOvision.com and
take the reader survey