This document discusses a platform called EzBake that was created to help a US government customer modernize their systems and better analyze large amounts of data. EzBake provides tools to easily develop and deploy applications, integrate and analyze data from various sources, and implement security controls. It improved the customer's ability to share data and applications across many teams and networks, decreased development times from 6-8 months to 3-4 weeks, and reduced costs while increasing capabilities.
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
1. November 13, 2014 | Las Vegas
Matt Carroll, CTO, Defense & Intelligence
CSC
2. The problem
•Apps: Over 400 apps within its enterprise
•Data: Over 1,000 active data sources consuming data on the order of TBs daily
•Users: Network supports over 230,000 daily users with mission and business needs
•Network: Multiple networks deployed worldwide on multiple continents
•Security: Every capability runs through a lengthy certification and accreditation process (4–6 months)
•Metrics: Disparate activities across apps and data have left little quantitative data
We faced a highly complex environment for a US Government customer that had a large dependency on legacy systems with a need to modernize quickly
3. Customer challenges
Budget
•Not enough money to transition every app to take advantage of Big Data or a distributed system
•Outsourcing IaaS needs to be monitored for accounting, security, scale, etc. without complex software
•Application elasticity is critical to understanding the true costs of operations and maintenance
•Storage (data) is a much bigger cost than expected
•Need to consolidate systems engineering support
While we faced many challenges, it became clear early on that budget and ease of integration for apps must be our two driving forces
App migration is not simple
•Most apps are CRUD-based: write a report, find a report
•Security business logic is baked into each app
•Number one question: Why can’t I choose the technology that best fits my app?
•Cannot disrupt operations by any means!
•Applications must reside on multiple networks and work together
•Takes too long to get started, laying down databases, web tiers, etc
Security is the ultimate killer of time
The process around security became complicated, burdensome and still insufficient to counter threats at scale
4. Our mission
Our Mission is to facilitate Big Data analytics across the enterprise by providing the tools necessary to align the work of the application engineer, analytic developer, and data scientist, freeing them to focus on end products, not infrastructure; we provide this through EzBake
Big Data should be easy
Big Data should drive insight
Big Data should be ubiquitous
Big Data should be secure
5. EzBake
It’s all about making application transition easier! Rather than assembling your own big data stack, EzBake provides an integrated way to compose the different elements of your application: collecting, processing, storing, and querying data
Ease of application development
•Time to market of apps and reuse
•Auto-deployment and high-availability scaling
•Integrated analytics and audit trails for logs, metrics, data access, and security events
Built-in security layer
•Role-based access and complex policies
•Down to the object / cell-level controls
•Encryption in transit
Data layer
•Ubiquitous data access (no stovepipes!)
•Simplified streaming / batch analytics
•Tailorable and technology agnostic
•Abstracted index patterns
[Architecture diagram: custom applications sit on a data layer, an execution layer (stream, batch, query, events, and more), and a security layer, which together abstract the physical databases: MongoDB, Accumulo, PostgreSQL (RDS), Redis, HBase, Elasticsearch, Titan, and custom stores]
6. Key features
•Streaming ingest (Frack): Interface for building data flow topologies which abstract physical stream processors
•Common services: Scaled and commonly used Thrift services, typically used during streaming ingest
•Data persistence: Indexing patterns exposed as Thrift services that abstract the physical database
•Distributed query: Both direct access to indices and aggregate query across the various data sets
•Security: Both at the data persistence and user access layers
•Batch analytics: Amazon Elastic MapReduce (EMR) abstractions that enable complex, multidimensional discovery
•Deployment: Automated elasticity through a GUI-based deployment
7. Technology agnostic
•Instead of a jack-of-all-trades indexing for free text search, geospatial search, etc., use mission-specific indices for specific application logic needs
•Focus on storage patterns vice database specific operations, thereby enforcing data access standards across the enterprise
•Allow for new cartridges for web frameworks including Node.js, Python, Ruby, etc.
Each app has its own needs, and it is not for the platform builder to force the team into a particular technology, but rather to offer a solution that meets the use case
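The "mission-specific indices" idea above can be sketched as a small registry that routes each query type to a purpose-built index instead of one jack-of-all-trades store. This is a minimal illustration; the class names (`FreeTextIndex`, `GeoIndex`, `IndexRegistry`) are invented for this sketch, not EzBake APIs:

```python
class FreeTextIndex:
    """Toy full-text index: substring match over stored documents."""
    def __init__(self):
        self.docs = {}

    def put(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        return [d for d, t in self.docs.items() if term in t]


class GeoIndex:
    """Toy geospatial index: bounding-box lookup over lat/lon points."""
    def __init__(self):
        self.points = {}

    def put(self, doc_id, lat, lon):
        self.points[doc_id] = (lat, lon)

    def within(self, lat_min, lat_max, lon_min, lon_max):
        return [d for d, (la, lo) in self.points.items()
                if lat_min <= la <= lat_max and lon_min <= lo <= lon_max]


class IndexRegistry:
    """Apps ask for the index pattern they need; the platform picks the backend."""
    def __init__(self):
        self._indices = {"free_text": FreeTextIndex(), "geo": GeoIndex()}

    def get(self, pattern):
        return self._indices[pattern]


registry = IndexRegistry()
registry.get("free_text").put("r1", "daily report on port activity")
registry.get("geo").put("r1", 36.1, -115.2)

print(registry.get("free_text").search("port"))        # ['r1']
print(registry.get("geo").within(30, 40, -120, -110))  # ['r1']
```

The point of the registry is that an app declares *what kind* of lookup it needs; which physical index serves it is a platform decision.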
8. Easy to deploy and secure
The platform provisions and scales, like classic PaaS, and embeds data layer connections and security on Amazon EC2
•Developers pull-down sandbox from the collaboration environment to develop on their local box
•App / service is output as a WAR and YML file (buildpack)
•The app registration page allows engineers to deploy and register apps, data feeds, and services on the platform
•EzDeployer supports dynamic resource management for all hosted capabilities and provisions through Amazon Elastic Compute Cloud (EC2)
9. App registration
•Applications carry role-based access controls with human inserted deployment authorization
•Registration to include data feeds, services, batch jobs, and intents.
•Ability to assign other users as admin controllers through AWS Identity and Access Management (IAM) controls or other IdAM
•Cuts down time to deploy and removes the need for app developers to write Puppet scripts
•Built-in account management policies for financial tracking of PaaS and IaaS costs
Deploy with buildpacks securely through the application registration page and provide elasticity as a service by abstracting Amazon EC2 services
10. Lab76: Collaborative development
•Speed start of development from weeks to hours by enabling a truly agile development environment
•GitLab was exposed for source control and promoting the sharing of code across the enterprise through governance and oversight
•Customized RedMine was exposed for task management and to allow task oversight and alignment
•DevOps could clone an Amazon Virtual Private Cloud and stand up new environments in a day vs. months of setting up for each app or system
The key to speeding transition was to remove redundancy; by providing a one-stop shop for devtools (Git, RedMine, Jenkins), a means to share code and common development environments, we gained months back from each development team
11. Leveraging a data layer over SQL and NoSQL, the platform abstracts physical data stores and promotes storage patterns to enable ease of sharing, enforce object-level security, and provide the ability to plug and play databases
Breaking-down disparate data stores
•That’s not to say we implement Big Data SQL
•Instead, we have the model that binds app development, Big Data, and security
•Focus developers towards database abstractions extensible to any database
So what?
•Move to production with Big Data without impacting existing SQL based production architecture (think PostgreSQL to RDS)
•Brings data together across the enterprise helping customers with disparate engineering teams build to a standard
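The storage-pattern idea described on this slide can be sketched as an abstract pattern that app code targets, with the physical store swappable underneath. This is a hedged, minimal illustration; `KeyValuePattern` and the in-memory stand-ins are invented names, not the real EzBake data layer:

```python
from abc import ABC, abstractmethod


class KeyValuePattern(ABC):
    """The storage pattern apps code against; never a physical database."""
    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...


class InMemorySqlStore(KeyValuePattern):
    """Stand-in for a SQL-backed implementation (think PostgreSQL/RDS)."""
    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class InMemoryNoSqlStore(KeyValuePattern):
    """Stand-in for a NoSQL-backed implementation (think Accumulo/HBase)."""
    def __init__(self):
        self._cells = {}

    def put(self, key, value):
        self._cells[key] = value

    def get(self, key):
        return self._cells.get(key)


def save_report(store: KeyValuePattern, report_id, body):
    # App logic only sees the pattern, so the backend can be swapped
    # (e.g. SQL to Big Data) without touching the application
    store.put(report_id, body)


for backend in (InMemorySqlStore(), InMemoryNoSqlStore()):
    save_report(backend, "rpt-1", "quarterly summary")
    print(backend.get("rpt-1"))  # same result regardless of backend
```

This is what lets a team "move to production with Big Data without impacting existing SQL-based production architecture": only the binding between pattern and physical store changes.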
12. Distributed query
We distribute object-specific queries across disparate data sets exposed through the data layer while controlling access through the service and at the data level
•Migrate off legacy data stores without disrupting production instances
•Focus on object-based queries across many data sets as well as across Amazon VPC within an environment
•Work with Cloudera to modify Impala to run against multiple data stores
•Common access controls across multiple data sets
So what?
•Common method to discover data across many apps, great for BI tools and third-party apps like Palantir, Tableau, etc.
•Decreases the duplication of storage across the enterprise through common indexing patterns
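The distributed-query behavior described above can be sketched as a fan-out that runs the same object query against every registered store, enforces access controls at the data level, and deduplicates the merged results. A minimal illustration; `query_all` and the toy stores are assumptions for this sketch, not EzBake code:

```python
def query_all(stores, predicate, user_auths):
    """Fan one object query out to every data store and merge the hits."""
    results = []
    for store in stores:
        for obj in store:
            # Common access control across data sets: the user must hold
            # every authorization the object requires
            if obj["auths"] <= user_auths and predicate(obj):
                results.append(obj["id"])
    # Dedupe objects that are replicated across stores
    return sorted(set(results))


mongo_like = [{"id": "a1", "topic": "ports", "auths": {"X"}}]
accumulo_like = [
    {"id": "a1", "topic": "ports", "auths": {"X"}},  # duplicate copy
    {"id": "b2", "topic": "ports", "auths": {"R"}},  # caller lacks R
]

hits = query_all([mongo_like, accumulo_like],
                 lambda o: o["topic"] == "ports",
                 user_auths={"X", "Y"})
print(hits)  # ['a1']
```

Because deduplication happens at query time against common indexing patterns, the same object no longer needs to be copied into every app's private store.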
13. Security becomes an API
•All data is encrypted in transit
•All transactions are authorized by the security service
•All data is secured at the object level
•Robust security service that scales horizontally and generates authorization tokens based on external IdAM properties
•Internal group management service scales to trillions of groups and beyond
•Compressed bit-vector representation of data visibility and access authorizations speeds security computations
Following several zero-day attacks, the enterprise is waking up to security but has no understanding of how to secure its Big Data platforms, a major reason many are not in production
Example: Bob has authorizations X, Y, and Z. He queries data tagged X, Y, and R. Sorry Bob! Only X and Y for you!
Object-level security across all data stores through a common API will provide dramatic efficiencies as it decreases time to model data across multiple data stores
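The bit-vector check behind the Bob example can be sketched as follows: each visibility label maps to one bit, so deciding which cells a user may see becomes a bitwise AND. The Bob scenario comes from the slide; `LABEL_BITS` and `visible_cells` are illustrative names, not the actual security API:

```python
# One bit per visibility label (toy mapping for this sketch)
LABEL_BITS = {"X": 0b0001, "Y": 0b0010, "Z": 0b0100, "R": 0b1000}


def mask(labels):
    """Compress a set of labels into a single bit vector."""
    m = 0
    for label in labels:
        m |= LABEL_BITS[label]
    return m


def visible_cells(cells, user_labels):
    """Return only the cells whose label the user is authorized for."""
    user_mask = mask(user_labels)
    # Cell-level check is a single AND per cell, not a set comparison
    return [val for val, label in cells if LABEL_BITS[label] & user_mask]


data = [("cell-x", "X"), ("cell-y", "Y"), ("cell-r", "R")]
bob = {"X", "Y", "Z"}

print(visible_cells(data, bob))  # ['cell-x', 'cell-y'] -- no R for Bob
```

Representing authorizations as compressed bit vectors is what keeps per-cell security checks cheap enough to apply on every query.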
14. Metering and monitoring
•JavaScript API for web apps, Thrift API for services, and REST for others
•Improve application usability and usefulness by examining analytics on usage patterns
•Diagnose issues with system, services, and apps
•Determine cost allocation based on what agencies and organizations are using the system
To bring the focus back to understanding the environment, we needed the platform to provide a comprehensive visualization to monitor users, data, and services on AWS
15. Batch (Amino)
•Removes complexity of Amazon EMR for the average engineer
•Crowd-source micro-analytics through analysts and engineers
•Data agnostic
•Not a black box
•Fully scalable
•Inherent cross-data source linked indexes
•Encourages sharing of knowledge, discovery
•Index built to support machine learning
•Security considered up front —index is in Accumulo
•Utilized AWS to enable rapid load-balancing to support demand based on data and usage
Developers can write Amazon Elastic MapReduce (EMR) code to analyze data, but don’t know what to look for; the analysts know what to look for, but don’t know how to write code. Technology is not the problem. It’s enabling the analyst to effectively leverage technology and reuse it.
16. The impact
So What? What were the overall accomplishments to date? Well…
Time: The platform and the development model decreased time to production from 6–8 months to 3–4 weeks.
Lean and Mean: Application teams went from being heavy on DevOps, security, and testing to smaller, more agile teams focused on mission-specific use cases
Most importantly…
We revectored teams back to their users, providing more capabilities in less time, thereby saving lives and protecting our country
Data Shared: Legacy REST/SOAP interfaces have begun to die off; time spent on sharing data is down significantly without impacting operations, and more apps have access to more data
Money: Removal of redundant code and systems, faster app deployment, cuts in total storage costs, and decreases in team size led to significant up-front cost savings for the customer