SlideShare une entreprise Scribd logo
1  sur  21
The Life of a Data
Engineer
Alex Chalini
alejandro.chalini@unosquare.com
First:What to
they do?
Data engineers build massive reservoirs
for information.
They develop, construct, test and
maintain architectures such as
databases and large-scale data
processing systems.
Once continuous pipelines are installed
to – and from – these huge “pools” of
filtered information, data scientists can
pull relevant data sets for their
analyses.
Toolset
TechnicalSkills
required for the
Data Engineer
role
Basic:
 Database architectures
 SQL-based technologies (e.g. PostgreSQL and MySQL)
 Data modeling tools (e.g. ERWin, Enterprise Architect andVisio)
 ExtractTransform and Load (ETL) proficiency
 Python, C/C++ Java, Perl
 Data warehousing solutions
Advanced:
 NoSQL technologies (e.g.Cassandra and MongoDB)
 Hadoop-based technologies (e.g. MapReduce, Hive and Pig)
 Data mining
 Machine learning
BusinessSkills
required for the
Data Engineer
role
Creative Problem-Solving: Approaching data
organization challenges with a clear eye on what is
important; employing the right approach/methods
to make the maximum use of time and human
resources.
Effective Collaboration: Carefully listening to
management, data scientists and data architects to
establish their needs.
Intellectual Curiosity: Exploring new territories and
finding creative and unusual ways to solve data
management problems.
Industry Knowledge: Understanding the way your
chosen industry functions and how data can be
collected, analyzed and utilized; maintaining
flexibility in the face of big data developments.
Data Engineer
Responsibilities
(part 1)
Design, construct, install, test and maintain
highly scalable data management systems
Data Engineer
Responsibilities
(part 1)
Ensure systems meet business
requirements and industry practices
Data Engineer
Responsibilities
(part 1)
Build high-performance algorithms, prototypes,
predictive models and proof of concepts
Data Engineer
Responsibilities
(part 1)
Research opportunities for data acquisition and new uses
for existing data
Data Engineer
Responsibilities
(part 2)
Develop data set processes for data modeling, mining
and production
Data Engineer
Responsibilities
(part 2)
Integrate new data management technologies and
software engineering tools into existing structures
Data Engineer
Responsibilities
(part 2)
Create custom software components (e.g.
specialized UDFs) and analytics applications
Data Engineer
Responsibilities
(part 2)
Employ a variety of languages and tools (e.g.
scripting languages) to marry systems together
Data Engineer
Responsibilities
(part 3)
Clean and prune data
to discard irrelevant
information
Data Engineer
Responsibilities
(part 3)
Analyze and interpret results using standard
statistical tools and techniques
Data Engineer
Responsibilities
(part 3)
Recommend ways to improve data
reliability, efficiency and quality
Data Engineer
Responsibilities
(part 3)
Collaborate with data architects, modelers and IT
team members on project goals
The DataAnalysis
Lifecycle
Carreer
Path
How is the role
evolving ?
 RDBMS will always have a place in the data stack BUT the
industries are going strong with NoSQL and Big Data
technologies.
 A Data Engineer needs to be aware of the changes in the backend
and frontend development technologies, as the data they provide
might be consumed at all levels of the development stack.
THANKS!!!

Contenu connexe

Tendances

Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
Rajesh Kumar
 
Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data Governance
DATAVERSITY
 

Tendances (20)

Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Data Lifecycle Management
Data Lifecycle ManagementData Lifecycle Management
Data Lifecycle Management
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Architecting Modern Data Platforms
Architecting Modern Data PlatformsArchitecting Modern Data Platforms
Architecting Modern Data Platforms
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftBDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data Governance
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 

Similaire à (The life of a) Data engineer

Levin_Michael_2016-04
Levin_Michael_2016-04Levin_Michael_2016-04
Levin_Michael_2016-04
Michael Levin
 
Larry_Somers_Resume
Larry_Somers_ResumeLarry_Somers_Resume
Larry_Somers_Resume
Larry Somers
 
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Simplilearn
 

Similaire à (The life of a) Data engineer (20)

Data Engineer Course In Bangalore-October
Data Engineer Course In Bangalore-OctoberData Engineer Course In Bangalore-October
Data Engineer Course In Bangalore-October
 
Paper presentation
Paper presentationPaper presentation
Paper presentation
 
Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...
 
Big Data SE vs. SE for Big Data
Big Data SE vs. SE for Big DataBig Data SE vs. SE for Big Data
Big Data SE vs. SE for Big Data
 
Lviv Data Science Club (Sergiy Lunyakin)
Lviv Data Science Club (Sergiy Lunyakin)Lviv Data Science Club (Sergiy Lunyakin)
Lviv Data Science Club (Sergiy Lunyakin)
 
pretesh2015
pretesh2015pretesh2015
pretesh2015
 
Ch1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptxCh1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptx
 
Levin_Michael_2016-04
Levin_Michael_2016-04Levin_Michael_2016-04
Levin_Michael_2016-04
 
Programming Assignment Help
Programming Assignment HelpProgramming Assignment Help
Programming Assignment Help
 
TAKE A LOOK AT THE TOP 7 SKILLS THAT A DATA ENGINEER CERTAINLY HAS TO HAVE
TAKE A LOOK AT THE TOP 7 SKILLS THAT A DATA ENGINEER CERTAINLY HAS TO HAVETAKE A LOOK AT THE TOP 7 SKILLS THAT A DATA ENGINEER CERTAINLY HAS TO HAVE
TAKE A LOOK AT THE TOP 7 SKILLS THAT A DATA ENGINEER CERTAINLY HAS TO HAVE
 
Lect 01
Lect 01Lect 01
Lect 01
 
System analysis and design
System analysis and designSystem analysis and design
System analysis and design
 
Larry_Somers_Resume
Larry_Somers_ResumeLarry_Somers_Resume
Larry_Somers_Resume
 
Datastage developer Resume
Datastage developer ResumeDatastage developer Resume
Datastage developer Resume
 
How Data Virtualization Adds Value to Your Data Science Stack
How Data Virtualization Adds Value to Your Data Science StackHow Data Virtualization Adds Value to Your Data Science Stack
How Data Virtualization Adds Value to Your Data Science Stack
 
Chapter 1(1) system development life .ppt
Chapter 1(1) system development life .pptChapter 1(1) system development life .ppt
Chapter 1(1) system development life .ppt
 
Data_Engineer_VS_Data_Scientist.pdf
Data_Engineer_VS_Data_Scientist.pdfData_Engineer_VS_Data_Scientist.pdf
Data_Engineer_VS_Data_Scientist.pdf
 
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
 
Full-Stack Development or Data Science, Which is the more advantageous Career...
Full-Stack Development or Data Science, Which is the more advantageous Career...Full-Stack Development or Data Science, Which is the more advantageous Career...
Full-Stack Development or Data Science, Which is the more advantageous Career...
 
Arunkumar_Resume
Arunkumar_ResumeArunkumar_Resume
Arunkumar_Resume
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

(The life of a) Data engineer