Data-driven agencies face extreme data integration and analytics challenges. Decades of point solutions have solved specific mission problems while creating valuable data stores. However, these data stores are not integrated and are stored in information silos. AWS's powerful data ingestion and integration services now allow agencies to rapidly store more in data lakes for deeper analytics. Join this discussion on how FAA and other agencies have leveraged AWS data integration and analytic services to optimize and innovate with their previously untapped information silos. Learn More: https://aws.amazon.com/government-education/
2. Challenges
• Data-driven agencies face data
integration challenges caused
by stove-piped data
• Data is growing at exponential
rates – of all types
• Staff available to
analyze data is
not growing
Overview
Our AWS-Based Solution
• A secure cloud platform to
accelerate time to analysis and
innovation
• Data integration services to
rapidly ingest, store and
integrate data for deeper
analytics
• Big data analytics services to
optimize operations and create
valuable mission impacts
3. AWS-based Data Analytic Platform
Secure Platform Data Integration Data Analytics
Enables a smaller number of staff to meet mission demands
by smartly leveraging rapidly growing data
Elastic, Services-Based
Architecture
5. Start with a secure, extensible platform
AWS Security
Services integrated
with Security and
Management
Framework
FAA Cloud Services (FCS)
Platform Features
Approach
Mission – provide a
Government Cloud for FAA
applications, data, and
analytics work streams
• FedRAMP-certified
• Enterprise integration with
FAA services
• Extensible, agile
infrastructure and services
• Security-as-a-Service
model
BENEFITS Certified, agile, fully managed secure cloud platform
6. FCS Platform accelerates ATO for new workloads
• Baseline reference
architecture
• CONOPS for IaaS
• Inheritance of security
boundaries and controls
• SOC integration
• Procurement acquisition
guidelines
• Reduced review cycles for
compliance
7. FAA Cloud Services Impacts
Key Lessons Learned
• Adapt Contracts to enable cloud
acquisition
• Be Agile - use an iterative
approach
• Align Security Engineering with
Compliance
Key Benefits
• Established a Secure Platform for
community reuse
• New work streams inherit secure
foundation – can be accredited
much faster
• Security-as-a-Service lowers costs
and increases repeatability
9. Tear Down the Walls
AWS IaaS to host
Open-Source
Solutions for Data
Integration and
Analysis
Mission – provide a
data integration
platform for FAA data
analytics
• FCS Secure Platform
• AWS IaaS
• Open Source Tools
• Big Data Tools
• Streaming & Batch
• Land, Ingest, Enrich,
Store, and Serve
Data
BENEFITS Secure, high availability with little O&M cost, scalable to
Peta- and Exa-bytes, deeper analytics in weeks
FAA Enterprise
Information Management
Analytic Features
Approach
10. Challenge – Stove-Piped Data Slows Analysis
Loss of Separation Flight Tracks Weather Data
Determining the root cause of a “Loss of Separation” event requires information from
numerous source systems. Collecting and integrating data for analysis is time-consuming.
11. Many Data Sources impact LOS Events
Cockpit Recording Cloud Tops Flight Plans
Air Traffic Controllers Aircraft Sensors Maintenance Logs
13. Big Data
Medium Data
High-writes
Freetext searches
In-memory Data
High-speed,
caching
Analytics
Data science
Data
EIM Architecture – Open Source on AWS IaaS
VISUALIZATION
Applications
Pipeline Management
Routing, mediation
INGEST
Apache NiFi
Data Processing
Normalizations,
enrichments
analytics
Apache Storm
Apache Spark
Small Data
RDBMS
MongoDB PostgreSQL
Reporting Dashboards Web Apps
HortonWorks
Elasticsearch
Pandas Redis
CONSUMER
ACCESS
UNIFIED DATA LAYER
Data Transformation Deep Analytics
and Data
Exploration
Large Scale
Data Storage
LEGEND
Logical Grouping
Example
feature/ function
Example
Technology
15. FAA EIM Impacts
Key Lessons Learned
• Stove-piped data can be
integrated and accessed faster
• Enriched data frees up time for
deeper analytics
• Data treatments lower costs
(HDFS vs RDBMS)
• Infrastructure as a Service
reduces Capital Expenses and
O&M costs
Key Benefits
• Overcame data deluge in unified
platform
• Re-host, Re-point Apps to one
data platform in weeks
• Derive valuable, new Insights with
Analytics in weeks
19. Challenge #1: Prevent Coliform Contamination
Public Water Systems are monitored for a wide array of health-impacting contaminants.
Coliform bacteria treatments are postulated to be overwhelmed by precipitation events.
20. Approach – Integrate and Prepare Data
Hypothesis
Can we accurately
predict the risk of a
health-impact coliform
violation for public
water systems based
on known violations
combined with
weather data?
Violation Data
• EPA SDWIS
• Health impacts
• Coliform violation
Weather Data
• NOAA Quality
Controlled Local
Climatological Data
Transformations
• Remove PWS with
no violations
• Standardize
location
• Join by time and
nearest weather
station location
• Store in
Amazon S3
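The transformation steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the proof of concept: the record layouts, field names (`pws_id`, `station_id`, `precip_in`) and sample values are all hypothetical, the nearest-station lookup uses a plain haversine distance, and the final write to Amazon S3 is left out.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_station(pws, stations):
    """Return the weather station closest to a public water system (PWS)."""
    return min(stations, key=lambda s: haversine_km(pws["lat"], pws["lon"], s["lat"], s["lon"]))


def join_violations_with_weather(systems, violations, stations, weather):
    """Build one row per violation, enriched with same-day weather.

    Iterating over violations (not systems) drops every PWS with no violations.
    """
    systems_by_id = {s["pws_id"]: s for s in systems}
    rows = []
    for v in violations:
        pws = systems_by_id[v["pws_id"]]
        station = nearest_station(pws, stations)
        # Join by time (day) and nearest weather station location.
        obs = weather.get((station["station_id"], v["date"]))
        rows.append({
            "pws_id": v["pws_id"],
            "date": v["date"],
            "station_id": station["station_id"],
            "precip_in": obs["precip_in"] if obs else None,
        })
    return rows


# Hypothetical sample records, for illustration only.
systems = [
    {"pws_id": "PWS-A", "lat": 38.90, "lon": -77.00},
    {"pws_id": "PWS-B", "lat": 39.30, "lon": -76.60},  # no violations, so it is dropped
]
violations = [{"pws_id": "PWS-A", "date": date(2016, 6, 1)}]
stations = [
    {"station_id": "STN-1", "lat": 38.85, "lon": -77.04},
    {"station_id": "STN-2", "lat": 39.18, "lon": -76.67},
]
weather = {("STN-1", date(2016, 6, 1)): {"precip_in": 1.2}}

rows = join_violations_with_weather(systems, violations, stations, weather)
print(rows)
```

A linear scan over stations is fine for a sketch; a real data set with many stations would likely call for a spatial index.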
21. Athena Serverless Data Query
Approach – Explore Data for Discovery
QuickSight Data Visualization
22. Approach – Model with Amazon Machine Learning
Best Model
• 80% Precision; 61% Recall
Findings – Utility of Model
• Weather impacts certain but
not all PWS
• Proactive water treatment in
face of precipitation
• Prioritize improvements
Allows business analysts, citizen scientists, and data scientists alike
to build and deploy predictive models with a simple process
23. Challenge #2: Identify fuel economy label errors
Fuel economy estimates are useful tools for consumers and regulators alike. Consumers
use MPG as a selection criterion. Manufacturer fleet averages must meet targets.
24. Approach – Prepare Data and Model
Hypothesis
Given examples of
“re-labeled” fuel
economy metrics, can
we develop a model
to locate other
potential revisions?
Attributes Used
• Horsepower
• Weight
• Adjusted City MPG
• Transmission Type
• Transmission Gears
• Cylinders
• Valves
• Labeling Approach
• Re-labeled Flag
Models
16 Models
Algorithms
• Logistic Regression
• Support Vector
Machine
• Neural Net
• Conditional Tree
• Recursive Partition
Tree
25. AWS Data Science Linux AMI - R Studio
Best Model
• 97% Precision; 91% Recall
Findings – Utility of Model
• “Re-labeled” fuel economy
ratings can be detected
• Model may be applied to
other car types to detect
label errors
• Prioritize review
• Lower costs
26. Environmental Proofs – Impacts to Date
Key Lessons Learned
• We showed that agencies can
improve monitoring, compliance,
and safety with data currently
collected
• AWS advanced analytics services
create useful data science
solutions in hours
• Still a need for some IaaS
• Predictive power of data increases
with data from related agencies
Key Benefits
• Platform expedites new analytics
• Enables agency scientists,
business analysts, and citizen
scientists alike to discover new
relationships in public data
27. CSRA AWS Data Analytics Platform
Secure
FISMA
FedRAMP
ATO
Integration
Treatments
Persistent
Ephemeral
Analytics
Predictive
Streaming
ML
PaaS
Elastic
Managed
Serverless
28. The future is already here,
it’s just not evenly distributed
- William Gibson ca. 1999
29. CSRA - Think Next. Now.
• We deliver a broad range of innovative,
next-generation IT solutions and
professional services - Bringing tomorrow’s
solutions, today.
• We meet our clients on their journey to the
cloud, to manage, analyze, optimize and
innovate
• We help customers modernize, protect their
networks, and improve effectiveness of
mission-critical functions for our warfighters
and citizens
AWS Security Services
AWS CloudTrail
security groups
IAM Users and Groups
Shared Responsibility Model
Managed by AWS - AWS IAM - foundation services and AWS global infrastructure
Managed by CSRA – Customer IAM –
Data; Platform & Application Management; OS, Network & Firewall Configuration; Encryption (Client-side, Server-side), Network Traffic Protection
Architecture Elements
Direct Connect (DX) connection between our primary and secondary colocation facilities.
Three VPCs (dev, test, production) hosting workloads.
Each VPC leverages the same DX connection, and
traffic is separated using VLANs, as is typical of every DX connection.
The FAA FTI network, which connects all FAA sites, is also connected to CSRA routers in each colocation facility.
The Management and Control tools are in a network in the colocation facilities.
COTS
Bit9, Carbon Black
Splunk, RSA Archer, ArcSight
Use an Iterative Approach
“Security is a process” – Bruce Schneier
Start security work early and iterate
Security engineering and compliance are different
Security engineering should be baked in
Ensure hardened images are part of dev/test/ops
Work out the problems with hardening first and adapt
Engineering Comes First
Make sure design and engineering artifacts come first to help define boundaries and controls
Develop Ops CONOPS and engineering design early
Identify security controls and iteratively design and build
Security engineering becomes a collaborative part of this work and helps simplify the path to compliance
Compliance work comes later
Iterate on security engineering deliverables
Assessment and compliance require a stable, deliverable, production-ready environment
SSP is dependent on the engineering and O&M design
Maturity in engineering and O&M makes SSP deliverables and assessment easier
Separating security engineering from compliance is a key tenet. There should be clear separation in the process, reviews, and oversight of security engineering and compliance. Efficiency is achieved by separating these two concepts and allowing engineering and design to work at their own pace as a key input to the compliance process, which comes later. Develop the engineering design and operations CONOPS early, and everything else will fall into place. The blocking and tackling with DevSecOps (see next slide) should be an integral part of this process. Otherwise, the risk is that the compliance process will lengthen and that assessments and reviews, including third-party independent assessments, will take much longer than planned. Preparing the engineering and operations model early and producing a well-tested, integrated product makes the job of the assessors much easier.
Another example is that security deliverables for compliance should be separated from engineering reviews to avoid inefficient review cycles. The security documents like the SSP depend on the engineering design, so make sure the design process allows for these reviews early. Don’t use the compliance deliverables and process to review engineering design and operations processes.
Adapt Contracts to Cloud
Carefully set boundaries with a clear SOW
Make sure the contract is aligned with FedRAMP as the authoritative compliance process
Separate security deliverables and reviews for assessment from other deliverables for engineering lifecycle, etc.
First, adapting to the cloud is not just a technology approach; it should also be built into the contract and into the guidance for how oversight and programs work. Make sure boundaries are set clearly in contractual documents and work statements. The SOW should be clear on scope and follow the principle that FedRAMP is the authoritative compliance process. Going fast means that the program office and the contracts office have to be in sync on what will be built out and secured, and how, with a focus on the essentials of how security engineering and compliance are achieved.
Today, major events like a loss of separation require analysts to access data from numerous stove-piped systems. Retrospective analysis is overwhelmed by the tasks of data identification, collection, and integration, leaving less time for meaningful data analysis.
15 Apps integrated with Data Mall
Apache Tools
Hadoop Tools
Search Tools
FAA Analytics
20 data sets ingested in 8 weeks
Data Mall
Original Data
Enriched Data
Joined Data
Given the power of the cloud (storage and processing), enhanced analytics are enabled, allowing programs to correlate data outside of their immediate interest to determine possible cause-and-effect relationships which
DID WE BUILD THIS ???
This is the AWS Big Data portfolio. We have tools like AWS Direct Connect and AWS Import/Export that can bring in a lot of data. We can persist that data in a number of services, from S3 to DynamoDB to EMR and Redshift, for further analysis.
Amazon Redshift provides a fast, fully managed, petabyte-scale data warehouse for less than $1000 per terabyte per year. Amazon Elastic MapReduce provides a managed, easy to use analytics platform built around the powerful Hadoop framework.
Amazon Kinesis is a managed service for real-time processing of streaming big data. Amazon Glacier allows you to back up and archive an unlimited amount of data at just 1 cent per GB per month. Automate and schedule big data processing workloads with AWS Data Pipeline.
With AWS, the tools to support big data collection and computation, along with collaboration and sharing, are all available in a couple of clicks.
Demonstrate the viability of a model to accurately predict the risk of a health-impact coliform violation for public water systems based on known violations combined with weather data
The Safe Drinking Water Information System (SDWIS) contains information about public water systems and their violations of EPA's drinking water regulations, as reported to EPA by the states. These regulations establish maximum contaminant levels, treatment techniques, and monitoring and reporting requirements to ensure that water systems provide safe water to their customers. This search will help you to find your drinking water supplier and view its violations and enforcement history since 1993.
Local Climatological Data (LCD) is only available for stations and locations within the United States and its territories.
Car Fleet ThinkStock item: 78459143
Reduced to nine variables
Attributes standardized with Z-scores, box-cox, log transforms and PCA
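As a rough illustration of the standardization step, the sketch below applies z-scores, a log transform, and a Box-Cox transform to a single attribute. The horsepower values and the Box-Cox parameter `lam` are made up for the example, the z-score uses the sample standard deviation (an assumption about how the attributes were scaled), and PCA is left out for brevity.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Centre on the mean and scale by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]


def log_transform(values):
    """Natural log of each value; values must be strictly positive."""
    return [math.log(v) for v in values]


def box_cox(values, lam):
    """Box-Cox transform: (v**lam - 1) / lam, or log(v) when lam is 0."""
    if lam == 0:
        return log_transform(values)
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical horsepower values, for illustration only.
horsepower = [150, 200, 250]

hp_z = z_scores(horsepower)
hp_log = log_transform(horsepower)
hp_bc = box_cox(horsepower, lam=0.5)  # lam chosen arbitrarily here

print(hp_z)
print(hp_log)
print(hp_bc)
```

In practice the Box-Cox parameter would be estimated from the data rather than fixed by hand.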
Model 1, Logistic Regression with z-scores: Relabeled ~ HP * INERTIA_WT + AdjCity
Model 2, Logistic Regression with z-scores: Relabeled ~ HP + INERTIA_WT + AdjCity
Model 3, Logistic Regression with z-scores: Relabeled ~ HP * INERTIA_WT * AdjCity
Model 4, Logistic Regression with z-scores: Relabeled ~ ECID * HP * INERTIA_WT * AdjCity
Model 5, Same Logistic Regression as above, but with Box-Cox transformed values: Relabeled ~ ECID * HP * INERTIA_WT * AdjCity
Model 6, Same Logistic Regression as above, but with log-transformed values and PCA: Relabeled ~ ECID * HP * INERTIA_WT * AdjCity
Model 7, SVM: HP_z, INERTIA_WT_z, ECID_z, AdjCity_z, Transmission, NCYL
Model 9, Neural Net: HP_z, INERTIA_WT_z, ECID_z, AdjCity_z, TRANS_TYPE, TRANS_GEARS, NCYL
Model 10, Neural Net: Adding FE_LABEL_CALC_APPROACH and NVAL to predictor variables
Model 11, Logistic regression with more categorical predictors: ECID + HP + INERTIA_WT + AdjCity + TRANS_TYPE + TRANS_GEARS + NCYL + FE_LABEL_CALC_APPROACH + NVAL
Model 12, Logistic as above but log transform the continuous predictor variables
Model 13, Conditional Tree
Model 14, Recursive Partitioning Tree
Model 15, Recursive Partitioning Tree with two more predictors
Model 16, Neural Net with the same predictors
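If the formulas above are read as R model formulas (an assumption, based on the R Studio environment mentioned in the deck), then `HP * INERTIA_WT` is shorthand for both main effects plus their interaction. The sketch below shows what that means for Model 1, `Relabeled ~ HP * INERTIA_WT + AdjCity`. The coefficients are invented purely to make the example runnable; they are not fitted values from the study.

```python
import math


def model1_features(hp_z, inertia_wt_z, adj_city_z):
    """Design row for Relabeled ~ HP * INERTIA_WT + AdjCity.

    Columns: intercept, HP, INERTIA_WT, AdjCity, HP:INERTIA_WT interaction.
    """
    return [1.0, hp_z, inertia_wt_z, adj_city_z, hp_z * inertia_wt_z]


def predict_probability(coefficients, features):
    """Logistic regression prediction: sigmoid of the linear combination."""
    linear = sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and z-scored inputs, for illustration only.
coefficients = [-1.0, 0.5, -0.25, 1.0, 2.0]
row = model1_features(hp_z=2.0, inertia_wt_z=3.0, adj_city_z=1.0)

print(row)
print(predict_probability(coefficients, row))
```

Under the same reading, Model 2 (`HP + INERTIA_WT + AdjCity`) would simply drop the last column, and Model 3 (`HP * INERTIA_WT * AdjCity`) would add the remaining two-way interactions and the three-way interaction.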
256 Car types – 196 TN; 53 TP; 5 FN; 2 FP ...
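For reference, precision and recall can be derived from these confusion-matrix counts as in the short sketch below. Only the four counts come from the notes; the function and variable names are illustrative.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Counts from the notes: 256 car types in total.
tn, tp, fn, fp = 196, 53, 5, 2

precision, recall = precision_recall(tp, fp, fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

With these counts, precision is 53/55 (about 0.96), recall is 53/58 (about 0.91), and accuracy is 249/256 (about 0.97), close to the figures quoted for the best model.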
Amazon Web Services accelerate Data Analytics
AWS services enable organizations with only limited data analytics capabilities to tackle expert challenges
When integrated with a robust cloud and security strategy, the platform can scale to support an agency's data needs
Agencies can realize mission value quickly - in days and weeks - with significant opportunity for continued innovation