Modernizing Upstream Workflows with AWS Storage
Accelerating seismic data retrieval, improving data protection and reliability, and providing a common AWS data platform for compute- and graphics-intensive processing, simulation, and visualization workloads.
Modernizing and transforming exploration and production workflows with AWS Storage services
Capturing and processing streaming sensor data from remote oil rigs with Snowball Edge
Providing a Data Lake foundation for a next generation Digital Oilfield IoT analytics platform with Amazon S3
Speaker: John Mallory - AWS Storage Business Development Manager
3. “The Digital Oilfield is not merely about computer chips, processors and software. It is
about the melding of operations technology with information technology and the
Internet of Things. It involves a powerful combination of distributed network sensors,
ubiquitous mobile connectivity, cloud computing, advanced big data analytics and
artificial intelligence. It has the ability to “learn” from what works in the best producing
wells and apply those learnings to entire fields. It will predict equipment breakdown
before it happens and bring about “condition-based” maintenance rather than
“schedule-based” methods. It will track workers in the field, feed them the data they
need via various platforms, “coach” their work in real-time and remove them from
hazardous situations. Ultimately, it will produce more oil and gas for less cost.”
– Accenture 2016 Digital Oilfield Outlook
6. Analytics = Value From Data
1% of information gathered from the field is currently made available to oil and gas decision-makers
What Keeps Us From Using More?
7. Upstream Information Management
• Very large number of diverse, complex, multi-modal and multi-scale datasets
• Not a sequential series of separate tasks, but a continuum of multiple scenario iterations
Source: Common Data Access Limited
Source: Schlumberger
8. Data Silos Are a Key Challenge
Hadoop/Stream Analytics Clusters
HPC Clusters
SAP, EDW, Databases
Exploration
Production Optimization
Operations & Planning
9. Enter the Data Lake Architecture
A data lake is a new and increasingly popular architecture for storing and analyzing massive volumes of heterogeneous data.
Benefits of a Data Lake
• All Data in One Place
• Quick Ingest
• Storage Decoupled from Compute
• Schema on Read
• Multi-User Environment
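Schema on read means raw records land in the lake untouched, and each consumer applies its own schema when it reads them. A minimal Python sketch of the idea, assuming hypothetical sensor records stored as JSON lines (the field names `well_id`, `ts`, `pressure_psi` and `temp_c` are illustrative and do not come from the deck):

```python
import json

# Raw records as they might land in the lake: no schema is enforced at write time.
# The second record carries an extra field and a differently formatted number.
RAW_LINES = [
    '{"well_id": "W-1", "ts": "2017-06-01T00:00:00Z", "pressure_psi": "2150.5"}',
    '{"well_id": "W-2", "ts": "2017-06-01T00:00:05Z", "pressure_psi": "1980", "temp_c": 71}',
]

# A reader-side schema (hypothetical): field name -> type to cast to.
PRESSURE_SCHEMA = {"well_id": str, "pressure_psi": float}


def read_with_schema(lines, schema):
    """Parse raw JSON lines, keeping and casting only the fields the reader asked for."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({name: cast(record[name]) for name, cast in schema.items()})
    return rows


rows = read_with_schema(RAW_LINES, PRESSURE_SCHEMA)
```

A second consumer could read the same raw lines with a different schema (for example one that includes `temp_c`) without any change to what is stored.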
10. Cloud Data Migration
Direct Connect
Snow* data transport family
3rd Party Connectors
Transfer Acceleration
Storage Gateway
Kinesis Firehose
AWS Storage Platform and Solutions
The AWS Storage Portfolio
Object: Amazon S3, Amazon Glacier
Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
File: Amazon EFS
11. Consolidate Data & Separate Storage & Compute
• Use Amazon S3 as the data lake storage tier, not a single analytics tool such as an IoT streaming analytics cluster or a seismic processing HPC cluster
• Decoupled storage and compute is cheaper and more efficient to operate
• Decoupled storage and compute allows us to evolve to clusterless architectures (e.g. AWS Lambda, Amazon Athena, Redshift Spectrum & AWS Glue)
• Do not build data silos in Hadoop or HPC clusters
• Gain the flexibility to use all the analytics tools and compute options in the ecosystem around S3, and future-proof the architecture
12. Why Choose Amazon S3
Durable
• Designed for 11 9s of durability
Secure
• Multiple encryption options
• Robust, highly flexible access controls
High performance
• Multipart upload
• Range GET
• Scalable throughput
Integrated
• Amazon EMR
• Amazon Redshift
• Amazon DynamoDB
• Amazon Athena
• Amazon Rekognition
• AWS Glue
Easy to use
• Simple REST API
• AWS SDKs
• Read-after-create consistency
• Event notification
• Lifecycle policies
• Simple management tools
• Hadoop compatibility
Scalable
• Store as much as you need
• Scale storage and compute independently
• Scale without limits
Affordable
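Range GET lets a client fetch a large object, such as a seismic volume, as independent byte ranges that can be downloaded in parallel. The sketch below only plans the ranges; it does not call AWS. The 8 MiB chunk size is an arbitrary assumption, and each returned string is meant to be passed as the `Range` parameter of an S3 GetObject request.

```python
MIB = 1024 * 1024


def byte_ranges(object_size, chunk_size):
    """Return HTTP Range header values that cover an object in fixed-size chunks."""
    if object_size <= 0 or chunk_size <= 0:
        raise ValueError("sizes must be positive")
    ranges = []
    start = 0
    while start < object_size:
        # HTTP byte ranges are inclusive at both ends.
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


# Plan the ranges for a hypothetical 100 MiB object read in 8 MiB chunks.
plan = byte_ranges(100 * MIB, 8 * MIB)
```

The same arithmetic applies when splitting a file into parts for a multipart upload, where every part except the last must be at least 5 MiB.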
13. Choice of storage classes on Amazon S3
• S3 Standard: active data; millisecond access; $0.021/GB/mo
• S3 Standard - Infrequent Access: infrequently accessed data; millisecond access; $0.0125/GB/mo
• Amazon Glacier: archive data; access in minutes to hours; $0.004/GB/mo
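To make the price differences concrete, the sketch below compares monthly storage cost across the three classes using the per-GB prices quoted on this slide. It assumes decimal units (1 TB = 1,000 GB) and ignores request, retrieval, and data transfer charges; the 100 TB dataset size is a hypothetical example.

```python
# Per-GB monthly prices as quoted on the slide (storage only).
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.021,
    "S3 Standard - Infrequent Access": 0.0125,
    "Amazon Glacier": 0.004,
}


def monthly_storage_cost(size_gb, storage_class):
    """Storage-only monthly cost in USD for size_gb gigabytes in one class."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


# Hypothetical 100 TB dataset, expressed in GB.
DATASET_GB = 100 * 1000
costs = {name: monthly_storage_cost(DATASET_GB, name) for name in PRICE_PER_GB_MONTH}
```

At these prices the 100 TB example works out to roughly $2,100 per month in S3 Standard, $1,250 in Standard - Infrequent Access, and $400 in Glacier.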
14. Object Storage is Foundational
Object: Amazon S3, Amazon Glacier
Compute: Lambda, EC2
Streaming Analytics: EMR, Spark, Kinesis
Data Query: Athena, DynamoDB, Redshift
Data Presentation: API Gateway, QuickSight
16. What About Data Management?
• Do we have all the latest data for this well?
• Do we keep all the relevant data for this well?
• Do we have data from all domains?
• Do we have data in formats that I can use?
• Are there multiple copies of the same data?
• Do we have data adjacent to this area?
• Where do I find all the data I need?
• Do we know the history of all our data?
18. Catalog Your Data
Put data in S3
Metadata in Amazon DynamoDB and Amazon Elasticsearch Service
What is in the data lake?
• Documents the data lake
• Summary statistics
• Classification
• Data sources
• Search capabilities
https://aws.amazon.com/answers/big-data/data-lake-solution/
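A catalog entry is just a small metadata record written alongside each object. The sketch below builds such a record in plain Python and uses the checksum to spot multiple copies of the same data. The field names (`object_id`, `source`, `classification`) and the bucket and key names are hypothetical; in the architecture on this slide the record would be stored in DynamoDB and indexed in Elasticsearch, which this sketch does not do.

```python
import hashlib


def catalog_entry(bucket, key, body, source, classification):
    """Build a metadata record describing one object in the lake."""
    return {
        "object_id": f"s3://{bucket}/{key}",
        "size_bytes": len(body),
        "sha256": hashlib.sha256(body).hexdigest(),
        "source": source,
        "classification": classification,
    }


def find_duplicates(entries):
    """Group object ids by checksum; any group with more than one id is a duplicate set."""
    by_hash = {}
    for entry in entries:
        by_hash.setdefault(entry["sha256"], []).append(entry["object_id"])
    return [ids for ids in by_hash.values() if len(ids) > 1]


entries = [
    catalog_entry("lake", "wells/W-1/log.las", b"depth,gr\n100,45\n", "field", "well-log"),
    catalog_entry("lake", "archive/W-1-log-copy.las", b"depth,gr\n100,45\n", "vendor", "well-log"),
    catalog_entry("lake", "wells/W-2/log.las", b"depth,gr\n100,52\n", "field", "well-log"),
]
duplicates = find_duplicates(entries)
```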
19. Glue Crawlers: auto-populate data catalogs
Automatic schema inference:
• Built-in classifiers detect file type and extract schema: record structure and data types.
• Add your own or share with others in the Glue community. It's all Grok and Python.
Auto-detects Hive-style partitions, grouping similar files into one table.
Run crawlers on a schedule to discover new data and schema changes.
Serverless: only pay when crawls run.
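Hive-style partitioning encodes partition columns as `name=value` segments in the object key. The sketch below shows what that layout looks like and how partition values can be read back from a key. It illustrates the naming convention only and is not how Glue crawlers are implemented; the key used in the example is hypothetical.

```python
def parse_hive_partitions(key):
    """Extract `name=value` path segments from an object key as a dict."""
    partitions = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            name, value = segment.split("=", 1)
            partitions[name] = value
    return partitions


def table_prefix(key):
    """Return the path before the first partition segment (the table location)."""
    parts = []
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            break
        parts.append(segment)
    return "/".join(parts)


KEY = "sensors/field=north/year=2017/month=06/part-0001.json"
partitions = parse_hive_partitions(KEY)
prefix = table_prefix(KEY)
```

Files that share a prefix and the same partition column names would naturally belong to one table, with each `name=value` segment becoming a partition column.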
20. Choose the Right Ingestion Methods
AWS Snowball & Snowmobile
• Accelerate petabyte-scale transfers with AWS-provided appliances
• 50, 80, 100 TB models
• 100 PB Snowmobile
AWS Storage Gateway
• Instant hybrid cloud
• Up to 120 MB/s cloud upload rate (4x improvement), and
Amazon Kinesis Firehose
• Ingest device streams directly into AWS data stores
AWS Direct Connect
• COLO to AWS
• Use native copy tools
Native/ISV Connectors
• Sqoop, Flume, DistCp
• Commvault, Veritas, etc.
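A quick back-of-the-envelope calculation helps choose between network and appliance transfer. The sketch below uses the 120 MB/s upload rate and 100 TB device size from this slide; it assumes decimal units, a fully sustained transfer rate, and that the nominal device capacity is entirely usable, all of which are simplifications.

```python
import math


def transfer_days(data_tb, rate_mb_per_s):
    """Days needed to move data_tb terabytes at a sustained rate in MB/s."""
    seconds = data_tb * 1_000_000 / rate_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 86_400


def snowballs_needed(data_tb, device_tb=100):
    """Number of appliances needed, assuming full nominal capacity is usable."""
    return math.ceil(data_tb / device_tb)


# Hypothetical 1 PB (1,000 TB) seismic archive.
days_over_network = transfer_days(1000, 120)
devices = snowballs_needed(1000)
```

Under these assumptions, 1 PB takes a little over 96 days at 120 MB/s, or ten 100 TB appliances, which is the kind of gap that makes the Snow* family attractive for bulk migration.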
21. AWS Snowball Edge
Petabyte-scale hybrid device with onboard compute and storage
• 100 TB local storage
• Local compute equivalent to an Amazon EC2 m4.4xlarge instance
• 10GBase-T, 10/25Gb SFP28, and 40Gb QSFP+ copper and optical networking
• Ruggedized and rack-mountable
22. Snowball Edge key features
S3-compatible endpoint
File interface (NFS)
Clustering
Run AWS Lambda functions
Faster data transfer
Encryption
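As an illustration of running a Lambda function against data written to the device, the sketch below is a handler that lists the objects named in an incoming event. It assumes the event follows the standard S3 notification record layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the bucket and key in the sample event are hypothetical, and a real function would go on to filter or summarize the sensor data.

```python
def handler(event, context):
    """Collect (bucket, key) pairs for the objects referenced in the event."""
    written = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        written.append((s3["bucket"]["name"], s3["object"]["key"]))
    return written


# Sample event in the assumed S3 notification shape.
SAMPLE_EVENT = {
    "Records": [
        {"s3": {"bucket": {"name": "rig-42"}, "object": {"key": "sensors/pump-7.json"}}}
    ]
}
result = handler(SAMPLE_EVENT, None)
```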
23. What Do We Do With the Data?
Field Data
Well Data
Geophysical Data
Geological Data
Reservoir Data
Production Data
Reserves
Biometric Data
24. What Do We Do With the Data? (Part 2)
• Well Placement Optimization
• Production Optimization
• Predictive Maintenance
• Fleet & Asset Management
• Improved Safety & Compliance
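As one small illustration of the predictive maintenance use case, the sketch below flags sensor readings that deviate sharply from the recent trend, which is the simplest form of a condition-based alert. The window size, the threshold of three standard deviations, and the sample readings are all assumptions chosen for illustration; a production system would use a model trained on historical failure data.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indexes of readings far outside the trailing window's spread."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Flag the reading if it is more than `threshold` sigmas from the mean.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical pump vibration readings with one spike at index 6.
READINGS = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 15.0, 10.0]
alerts = flag_anomalies(READINGS)
```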
25. How Do We Do It? Choose the Right Tools
• Amazon Redshift, Spectrum: Enterprise Data Warehouse
• Amazon EMR: Hadoop/Spark
• Amazon Athena: Clusterless SQL
• AWS Glue: Clusterless ETL
• Amazon Aurora: Managed Relational Database
• Amazon Machine Learning: Predictive Analytics
• Amazon QuickSight: Business Intelligence/Visualization
• Amazon Elasticsearch Service: Elasticsearch
• Amazon ElastiCache: Redis In-memory Datastore
• Amazon DynamoDB: Managed NoSQL Database
• Amazon Rekognition & Amazon Polly: Image Recognition & Text-to-Speech AI APIs
• Amazon Lex: Voice or Text Chatbots
26. The Emerging Analytics Architecture
Amazon Athena: Interactive Query
AWS Glue: ETL & Data Catalog
Storage
Serverless Compute
Data Processing
Amazon S3: Exabyte-scale Object Storage
Amazon Kinesis Firehose: Real-Time Data Streaming
Amazon EMR: Managed Hadoop Applications
AWS Lambda: Trigger-based Code Execution
AWS Glue Data Catalog: Hive-compatible Metastore
Amazon Redshift Spectrum: Fast @ Exabyte scale
Amazon Redshift: Petabyte-scale Data Warehousing
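With data in S3 and table definitions in the Glue Data Catalog, an interactive query needs only a SQL string, a database name, and an S3 location for results. The sketch below assembles those request parameters in the shape used by the Athena StartQueryExecution API; it does not call AWS. The database, table, and bucket names are hypothetical, and the string formatting is for illustration only since it does no input sanitization.

```python
def athena_query_request(database, table, partition_filters, output_location):
    """Build StartQueryExecution parameters for a partition-filtered query."""
    where = " AND ".join(
        f"{name} = '{value}'" for name, value in sorted(partition_filters.items())
    )
    sql = f"SELECT * FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = athena_query_request(
    database="oilfield",                       # hypothetical catalog database
    table="sensor_readings",                   # hypothetical table
    partition_filters={"year": "2017", "month": "06"},
    output_location="s3://example-athena-results/",  # hypothetical bucket
)
```

Filtering on partition columns limits the data scanned, which matters because no cluster is sized in advance for the query.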
27. AWS Data Lake Analytic Capabilities
Amazon S3: Data Lake
Amazon Kinesis Streams & Firehose
Hadoop / Spark
Streaming Analytics Tools
Amazon Redshift: Data Warehouse
Amazon DynamoDB: NoSQL Database
AWS Lambda
Spark Streaming on EMR
Amazon Elasticsearch Service
Amazon EMR
Amazon Aurora: Relational Database
Amazon Machine Learning: Predictive Analytics
Any Open Source Tool of Choice on EC2
Data Science Sandbox
Visualization / Reporting
Apache Storm on EMR
Apache Flink on EMR
Amazon Kinesis Analytics
Serving Tier
Amazon Athena: Clusterless SQL Query
Data Sources
Transactional Data
AWS Glue: Clusterless ETL
Amazon ElastiCache: Redis