Prepared By
Dr. P L Pradhan, Ph D
CSE ( System Security)
Dept of
Information
Technology
TGPCET, RTM
NAGPUR University,
NAGPUR,INDIA
Database, BigData, Data Science
BD
DS
D A T U M
DSc
BD & DS
data
data
V3
• 1897
V x V x V
•
Keys
PK & FK
• What is the difference between a primary key
and a foreign key?
• A primary key uniquely identifies each row of
its own table. In a foreign key reference, a link
is created between two tables when the column
or columns that hold the primary key value
for one table are referenced by the column or
columns in another table. That referencing column
is the foreign key of the second table.
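The PK and FK relationship can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original slides: the `dept` and `student` tables, their columns, and the helper `insert_student` are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# dept.dept_id is the primary key of the first table.
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# student.dept_id is a foreign key: it references the primary key of dept.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES dept(dept_id))"
)
conn.execute("INSERT INTO dept VALUES (1, 'Information Technology')")
conn.execute("INSERT INTO student VALUES (101, 'Asha', 1)")


def insert_student(roll_no, name, dept_id):
    """Hypothetical helper: returns False when a key constraint rejects the row."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", (roll_no, name, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False


orphan_ok = insert_student(102, "Ravi", 99)      # dept 99 does not exist: FK violation
duplicate_ok = insert_student(101, "Meena", 1)   # roll_no 101 already used: PK violation
```

Both inserts are rejected: the first because the foreign key points at no existing department, the second because the primary key value is already taken.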
4- Modules
Module-1 DBMS
Module-2 Data Warehousing
Module-3 Big Data
Module-4 Data Science
Database
• What is Data?
Information
• A set of data items that satisfies a specific
objective.
• Data about data = metadata
Database
• A set of logically interconnected data items that
serves several users simultaneously over
a LAN or WAN.
Oracle
Sybase
MS-SQL
Ingres
Types of database
Hierarchical
database
Network
Database
Relational
database
Data Model
One-one
One-many
Many-many
One Many
Relational Model
• Primary key
• Foreign Key
• RN Name Dept RM Sub dept-id
• Prod_id Desp, Location, Store, Salsman_id
RDBMS
RDB- TABLE
Data-Items- Records-Tables
• Tuples make tables
• Tables make a database
• Databases make Big Data => Data Science
• Hadoop helps to extract the desired data and
information
Types of RDBMS
Oracle
Sybase
Ms-Sql
Ms Access
Advantage of RDBMS
Key Concept of PK & FK
Design of complex application
Disadvantage
Dirty Data
Sybase, Oracle, MS Access, Excel
Different countries use different date formats
Ex: dd/mm/yyyy (UK format)
mm/dd/yyyy (USA format)
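The date-format problem above can be illustrated with a short Python sketch. It assumes the source country of each value is known; the `FORMATS` mapping and the `to_iso` helper are hypothetical names, and the sample date is made up.

```python
from datetime import datetime

# Assumed mapping from source country to its date layout.
FORMATS = {"UK": "%d/%m/%Y", "USA": "%m/%d/%Y"}


def to_iso(text, country):
    """Parse a date string in the country's layout and return ISO yyyy-mm-dd."""
    return datetime.strptime(text, FORMATS[country]).date().isoformat()


# The same string means two different days depending on the source system.
uk = to_iso("03/07/2018", "UK")    # 3 July 2018
us = to_iso("03/07/2018", "USA")   # 7 March 2018
```

Converting every source to one unambiguous layout before loading is one common way to keep such dirty data out of a warehouse.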
Operational data
Operational data
Operational data is not permanent; it is current data
Data is volatile
At any time, data can be read, written and
executed (RWX): insert, delete and update.
Modifying and updating data is very risky
Therefore, operational data has no security or privacy
Operational data
High Risk to Business, HW & SW
Not suitable for DSS (decision support for top management)
OLTP
Oracle
Sybase DW
OLAP
Operational data
Sybase
Oracle
MS-SQL
DW OLAP
Comparisons
MILK-BUTTER-GHEE
Milk1
Cow Milk
Buffalo Milk
Butter
Ghee
OLAP
DATA WAREHOUSING
• Separate
• High Availability, Reliability & Scalability
• Integrated
• Time Stamped( RX)
• Subject Oriented
• Non volatile-Permanent
• Accessible for all the time
OLTP-OLAP
OLAP: DW data for DSS, read-only (view only)
OLTP
OLTP-OLAP
• Source of data
• OLTP: Operational data; OLTPs are the original source
of the data.
• OLAP: Consolidation data; OLAP data comes from the
various OLTP Databases
• Purpose of data
• OLTP: To control and run fundamental business tasks-
Raw-Current data
• OLAP: To help with planning, problem solving, and
decision support-Life data
OLTP-OLAP
• What the data reveals
• OLTP: Reveals a snapshot of ongoing business processes
• OLAP: Multi-dimensional views of various kinds of business activities
• Inserts and Updates
• OLTP: Short and fast inserts and updates initiated
by end users
• OLAP: Periodic long-running batch jobs refresh
the data
OLTP-OLAP
• Queries
OLTP: Relatively standardized and simple queries
returning relatively few records
OLAP: Often complex queries involving
aggregation, association and collaboration
• Processing Speed
OLTP: Typically very fast
OLAP: Depends on the amount of data involved;
batch data refreshes and complex queries may
take many hours; query speed can be improved
by creating indexes
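The contrast between the two query styles can be sketched with Python's sqlite3 module. This is only an illustration on a toy table: the `sales` table, its columns, the sample rows, and the index name `idx_sales_region` are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
rows = [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 50.0), (4, "North", 75.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# An index on the grouping/filter column is one way to speed up analytical queries.
conn.execute("CREATE INDEX idx_sales_region ON sales(region)")

# OLTP style: a simple lookup by key that returns a single record.
oltp_row = conn.execute(
    "SELECT region, amount FROM sales WHERE order_id = ?", (2,)
).fetchone()

# OLAP style: an aggregation that scans many records.
olap_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```

On four rows the index makes no measurable difference; the point is only the shape of the two queries.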
OLTP~OLAP
• Space Requirements
• OLTP: Can be relatively small if historical data is
archived
• OLAP: Larger due to the existence of aggregation
structures and history data; requires more indexes than
OLTP
• Database Design
• OLTP: Highly normalized with many tables (3-NF)
• OLAP: Typically de-normalized with fewer tables; use of
star and/or snowflake schemas
Backup and Recovery
• Backup and Recovery
OLTP: Backup religiously; operational data is
critical to run the business, data loss is likely
to entail significant monetary loss and legal
liability.
OLAP: Instead of regular backups, some
environments may consider simply reloading
the OLTP data as a recovery method.
OLTP-DW
WRX-Read
OLTP
• Oracle
Sybase
DW-Butter
MS-SQL
Ghee
OLTP
• Online transaction processing, or OLTP, is a
class of information systems that facilitate and
manage transaction-oriented applications,
typically for data entry and retrieval
transaction processing.
• Temporary Data- Current Data
OLAP
• OLAP is an acronym for Online Analytical
Processing. OLAP performs multidimensional
analysis of business data and provides the
capability for complex calculations, trend
analysis, and sophisticated data modeling.
• Past & Present Data
OLTP Vs OLAP
DW
Example-DW
Role of Data
Big Data
Big Data
• Extremely large data sets that may be
analysed computationally to reveal patterns,
trends, and associations, especially relating to
human behaviour and interactions.
• HCI-Human Computer Interaction on BD
• “Much more IT investment is going towards
managing and maintaining big data”
Big-Data
• Challenges include analysis, capture, data
curation, search, sharing, storage, transfer,
visualization, querying,
updating and information privacy.
• The term often refers simply to the use of
predictive analytics, user behavior analytics,
or certain other advanced data analytics
methods that extract value from data, and
seldom to a particular size of data set
Characteristics
• Big Data represents the Information assets
characterized by such a High Volume, Velocity
and Variety to require specific Technology and
Analytical Methods for its transformation into
Value
Big data
• Big data is arriving from multiple sources at
an alarming velocity, volume and variety. To
extract meaningful value from big data, you
need optimal processing power, analytics
capabilities and skills. ... Insights from big
data can enable all employees to make
better decisions ...
V3
BRT
• Big Data is a collection of large datasets that
cannot be processed using traditional
computing techniques. It is not a single
technique or a tool, rather it involves many
areas of Business, Resource and Technology.
Big Data
• What Comes Under Big Data?
BRT Model
T
RB
Characteristics
• Volume: big data doesn't sample; it just observes
and tracks what happens
• Velocity: big data is often available in real-time
• Variety: big data draws from text, images, audio,
video; plus it completes missing pieces
through data fusion
• Machine Learning: big data often doesn't ask why
and simply detects patterns
• Digital footprint: big data is often a cost-free
byproduct of digital interaction
Characteristics
• Volume
• The quantity of generated and stored data. The size of the data determines the
value and potential insight- and whether it can actually be considered big data or
not.
• Variety
• The type and nature of the data. This helps people who analyze it to effectively use
the resulting insight.
• Velocity
• In this context, the speed at which the data is generated and processed to meet
the demands and challenges that lie in the path of growth and development.
• Variability
• Inconsistency of the data set can hamper processes to handle and manage it.
• Veracity
• The quality of captured data can vary greatly, affecting accurate analysis.
6C
• Factory work and Cyber-physical systems may
have a 6C system:
• Connection (sensor and networks)
• Cloud (computing and data on demand)
• Cyber (model and memory)
• Content/context (meaning and correlation)
• Community (sharing and collaboration)
• Customization (personalization and value)
What Comes Under Big Data?
• Black Box Data : The black box is a component of helicopters, airplanes,
jets, etc. It captures the voices of the flight crew, recordings of microphones
and earphones, and the performance information of the aircraft.
• Social Media Data : Social media such as Facebook and Twitter hold
information and the views posted by millions of people across the
globe.
• Stock Exchange Data : The stock exchange data holds information
about the ‘buy’ and ‘sell’ decisions made by customers on the shares
of different companies.
• Power Grid Data : The power grid data holds information about the
power consumed by a particular node with respect to a base station.
• Transport Data : Transport data includes model, capacity, distance
and availability of a vehicle.
• Search Engine Data : Search engines retrieve lots of data from
different databases.
Under Big Data
3V
• Thus Big Data includes huge volume, high
velocity, and an extensible variety of data.
The data in it will be of three types.
• Structured data : Relational data.
• Semi Structured data : XML data.
• Unstructured data : Word, PDF, Text, Media
Logs.
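The three data types can be sketched side by side in Python using only the standard library. The sample records are invented for the illustration, and CSV rows with fixed columns stand in here for a relational table.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every record has the same fixed columns (stand-in for a relational table).
structured = list(csv.DictReader(io.StringIO("id,name\n1,Asha\n2,Ravi\n")))

# Semi-structured: XML carries its own tags, but records need not share one schema.
semi = ET.fromstring(
    "<students><student id='1'><name>Asha</name></student></students>"
)
names = [s.findtext("name") for s in semi.iter("student")]

# Unstructured: free text has no fields at all; even counting words needs parsing.
unstructured = "Asha emailed Ravi about the exam schedule."
word_count = len(unstructured.split())
```

The further down the list, the more work is needed before the data can be queried like a table.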
V3 Technology
D-V3
data
Volume
Variety
Velocity
Big Data
Challenges
The major challenges associated with big data are
as follows:
• Capturing data
• Data Curation
• Storage
• Searching
• Sharing
• Transfer
• Analysis, visualization, association, collaboration,
communications (OOS, OOP, UML)
• Presentation
Data Science-Really a great thing
DSP
• The Data Science Process
• The Data Science Process is a framework for
approaching data science tasks, and is crafted
by Joe Blitzstein and Hanspeter Pfister of
Harvard's CS 109. The goal of CS 109, as per
Blitzstein himself, is to introduce students to
the overall process of data science
investigation, a goal which should provide
some insight into the framework itself.
DSP
Data Science
• Data science is an interdisciplinary field about
processes and systems to
extract knowledge or insights from data in
various forms, either structured or
unstructured, which is a continuation of some
of the data analysis fields such
as statistics, data mining, and predictive
analytics, similar to Knowledge Discovery in
Databases (KDD).
DS
• Data science employs techniques and theories drawn from
many fields within the broad areas of mathematics,
statistics, operations research, information science, and
computer science, including signal processing, probability
models, machine learning, statistical learning, data mining,
database, data engineering, pattern recognition and
learning, visualization, predictive analytics, uncertainty
modelling, data warehousing, data compression, computer
programming, artificial intelligence, and high performance
computing. Methods that scale to big data are of particular
interest in data science, although the discipline is not
generally considered to be restricted to such big data, and
big data solutions are often focused on organizing and pre-
processing the data instead of analysis. The development of
machine learning has enhanced the growth and importance
of data science.
CRISP-DM
• CRISP-DM
• As a comparison to the Data Science Process put
forth by Blitzstein & Pfister, and elaborated upon
by Squire, we take a quick look at the de facto
official (yet unquestionably falling out of fashion)
data mining framework (which has been
extended to data science problems), the Cross
Industry Standard Process for Data Mining
(CRISP-DM). Though the standard is no longer
actively maintained, it remains a popular
framework for navigating data science projects.
DS Process
DSP
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
Knowledge Discovery in Databases
• KDD Process
• Around the same time that CRISP-DM was emerging, the KDD
Process had finished developing. The KDD (Knowledge Discovery
in Databases) Process, by Fayyad, Piatetsky-Shapiro, and Smyth, is
a framework which has, at its core, "the application of specific data-
mining methods for pattern discovery and extraction." The
framework consists of the following steps:
• Selection
• Preprocessing
• Transformation
• Data Mining
• Interpretation
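The five KDD steps can be sketched as a chain of small Python functions. This is a toy illustration, not the method of Fayyad et al.: the sample records are invented, the function bodies are assumptions, and a simple frequency count stands in for a real data-mining algorithm.

```python
from collections import Counter

raw = [
    {"customer": "A", "item": "milk", "qty": "2"},
    {"customer": "B", "item": "ghee", "qty": None},   # missing value
    {"customer": "A", "item": "butter", "qty": "1"},
    {"customer": "C", "item": "milk", "qty": "3"},
]


def selection(records):
    """Keep only the attributes relevant to the question."""
    return [{"item": r["item"], "qty": r["qty"]} for r in records]


def preprocessing(records):
    """Clean the data: drop records with missing quantities."""
    return [r for r in records if r["qty"] is not None]


def transformation(records):
    """Convert to the form the mining step expects: (item, int quantity)."""
    return [(r["item"], int(r["qty"])) for r in records]


def data_mining(pairs):
    """Stand-in for pattern discovery: total quantity sold per item."""
    totals = Counter()
    for item, qty in pairs:
        totals[item] += qty
    return totals


def interpretation(totals):
    """Turn the pattern into a finding: the best-selling item."""
    return totals.most_common(1)[0][0]


best_seller = interpretation(
    data_mining(transformation(preprocessing(selection(raw))))
)
```

Each function consumes the output of the previous one, which mirrors the ordering of the steps in the list above.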
DSP
SAS-SEMMA
• Discussion
• It is important to note that these are not the only
frameworks in this space; SEMMA (for Sample, Explore,
Modify, Model and Assess), from SAS, and the agile-oriented
Guerrilla Analytics both come to mind. There
are also numerous in-house processes that various
data science teams and individuals no doubt employ
across any number of companies and industries in
which data scientists work.
• So, is the Data Science Process a new take on CRISP-
DM, which is just a reworking of KDD, or is it a new,
independent framework in its own right?
Infographic
Data visualization
Data science
Exploratory data analysis
Information design
Interactive data visualization
Descriptive statistics
Inferential statistics
Statistical graphics
Plot
Data analysis • Infographic
DSP
DS
• Data science affects academic and applied research in
many domains, including machine translation, speech
recognition, robotics, search engines, digital economy,
but also the biological sciences, medical
informatics, health care, social sciences and the
humanities.
• It heavily influences economics, business and finance.
From the business perspective, data science is an
integral part of competitive intelligence, a newly
emerging field that encompasses a number of
activities, such as data mining and data analysis.
Data scientist
• Data scientists use their data and analytical ability to
find and interpret rich data sources; manage large
amounts of data despite hardware, software, and
bandwidth constraints; merge data sources; ensure
consistency of datasets; create visualizations to aid in
understanding data; build mathematical models using
the data; and present and communicate the data
insights/findings. They are often expected to produce
answers in days rather than months, work by
exploratory analysis and rapid iteration, and to produce
and present results with dashboards (displays of
current values) rather than papers/reports, as
statisticians normally do.
Background
Data Science
OLAP is built on data collected from OLTP systems
Data mining works on data collected in OLAP
Data Layers
Data
OLTP
OLAP
Data Mining
Big Data
[Diagram labels: DM, OLTP, OLAP, DW, Data Mining, Customers, Scientist]
Fact Data
• Facts of a business process
• Quality of Business: sales , cost , and profit
• In data warehousing, a Fact table consists of the measurements,
metrics or facts of a business process. It is located at the center of a
star schema or a snowflake schema surrounded by
dimension tables. Where multiple fact tables are used, these are
arranged as a fact constellation schema.
• Fact tables are the large tables in our warehouse schema that store
business measurements. Fact tables typically contain facts and
foreign keys to the dimension tables. Fact tables represent data,
usually numeric and additive, that can be analyzed and
examined. Examples include sales , cost , and profit .
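A fact table at the center of a star schema can be sketched with Python's sqlite3 module. The table names `fact_sales`, `dim_product` and `dim_store`, their columns, and the sample figures are all hypothetical; profit is derived here as sales minus cost, which is an assumption of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/where" of each measurement.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

    -- The fact table holds numeric, additive measures plus foreign keys
    -- to the dimension tables.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        sales REAL,
        cost  REAL
    );

    INSERT INTO dim_product VALUES (1, 'Milk'), (2, 'Ghee');
    INSERT INTO dim_store   VALUES (1, 'Nagpur'), (2, 'Pune');
    INSERT INTO fact_sales  VALUES (1, 1, 100, 60), (1, 2, 80, 50), (2, 1, 300, 200);
    """
)

# Join the fact table to one dimension and add up the derived profit measure.
profit_by_product = dict(
    conn.execute(
        "SELECT p.name, SUM(f.sales - f.cost) "
        "FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id "
        "GROUP BY p.name"
    )
)
```

Because the measures are additive, the same fact rows can be summed along any dimension, for example by city instead of by product.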
Fact Table
Star Schema
Starflake Schema
Sales Data
Star Model-Multidimensional table
Starflake Schema
Snowflake Schema
OLTP-OLAP
OLTP-OLAP
MOLAP-Cube- Hypercube
OLTP-ETL-DW-DM-OLAP
OLTP-OLAP
Customer-Scientist
OLAP
Source data-Destination data
Operation & Services
RFOS
• Relation Function Operation Services
Oracle
DB
ERP
ETL
Staging Area
Function-
Operation
DW
OLAP
Services
Business Analyst-Engineer Role
Role of Mgmt
Low Level mgmt OLTP:
Engineers & operators
High Level mgmt
OLAP=Top Mgmt-
Scientist, CEO= DSS
Data
Low level & High level
Operations & Services
Real World-OLTP & OLAP
Data Action- Role
DM=BD
OLAP
OLTP
data
Traditional Complex IT Infrastructure
Client
End Of Session
THANK YOU

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 

Dw 07032018-dr pl pradhan

  • 1. Prepared by Dr. P L Pradhan, PhD, CSE (System Security), Dept. of Information Technology, TGPCET, RTM Nagpur University, Nagpur, India
  • 2. Database, Big Data, Data Science
  • 3. BD
  • 4. DS
  • 5. D A T U M
  • 6. DSc
  • 9. V x V x V
  • 18. • What is the difference between a primary key and a foreign key? • A primary key uniquely identifies each row of its own table. • In a foreign key reference, a link is created between two tables when the column or columns that hold the primary key value for one table are referenced by the column or columns in another table. The referencing column becomes a foreign key in the second table.
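A minimal sketch of the primary key / foreign key link described in slide 18, using Python's built-in `sqlite3` module. The `dept` and `student` tables, their columns, and the sample rows are hypothetical names chosen for illustration; they are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# dept_id is the primary key of dept (hypothetical table).
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# student.dept_id is a foreign key that references dept's primary key.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES dept(dept_id))"
)

conn.execute("INSERT INTO dept VALUES (10, 'IT')")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 10)")

# A student row that points at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES (2, 'Ravi', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# The foreign key is what lets the two tables be joined.
rows = conn.execute(
    "SELECT s.name, d.name FROM student s JOIN dept d ON s.dept_id = d.dept_id"
).fetchall()
print(fk_rejected, rows)
```

The primary key keeps each row unique within its own table, while the foreign key restricts the second table to values that already exist in the first.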
  • 20. 4 Modules • Module 1: DBMS • Module 2: Data Warehousing • Module 3: Big Data • Module 4: Data Science
  • 22. Information • A set of data items that serves a specific objective. • Data about data = metadata
  • 23. Database • A set of logically interconnected data items that serves several users simultaneously over a LAN or WAN. • Oracle • Sybase • MS-SQL • Ingres
  • 26. Relational Model • Primary key • Foreign key • RN, Name, Dept, RM, Sub, dept-id • Prod_id, Desp, Location, Store, Salesman_id
  • 27. RDBMS
  • 29. Data Items, Records, Tables • Tuples make tables • Tables make a database • Databases make Big Data, which leads to Data Science • Hadoop helps to extract the desired data and information
  • 31. Advantages of RDBMS • Key concept of PK & FK • Design of complex applications
  • 32. Disadvantage • Dirty data • Sybase, Oracle, MS Access, Excel • Different countries use different formats, e.g. dd/mm/yyyy (UK) and mm/dd/yyyy (USA) • Operational data
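The date-format problem in slide 32 can be sketched with Python's standard `datetime` module. The sample string and the helper name `to_iso` are assumptions made for illustration; the point is only that the same text yields two different dates until the source format is known.

```python
from datetime import datetime

raw = "03/04/2018"  # ambiguous: 3 April (UK) or 4 March (USA)?

# The same string parsed under each national convention.
as_uk = datetime.strptime(raw, "%d/%m/%Y").date()
as_us = datetime.strptime(raw, "%m/%d/%Y").date()


def to_iso(text, source_format):
    """Normalise a date string to ISO 8601 (yyyy-mm-dd), given its source format."""
    return datetime.strptime(text, source_format).date().isoformat()


print(as_uk, as_us)
print(to_iso("07/03/2018", "%d/%m/%Y"))
```

One common way to clean such data is to record the format of each source system and convert every date to a single representation before loading.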
  • 33. Operational data • Operational data is not permanent; it is current data • Data is volatile • At any time, data can be read, written and executed (RWX): insert, delete and update • Modifying and updating data is very risky • Therefore, operational data has no security or privacy
  • 34. Operational data • High risk to business, HW & SW • Not suitable for DSS (for top management)
  • 38. DATA WAREHOUSING • Separate • High availability, reliability & scalability • Integrated • Time-stamped (RX) • Subject-oriented • Non-volatile (permanent) • Accessible all the time
  • 39. OLTP-OLAP • OLAP: DW data, DSS, read only (view only)
  • 40. OLTP
  • 41. OLTP-OLAP • Source of data • OLTP: Operational data; OLTPs are the original source of the data. • OLAP: Consolidation data; OLAP data comes from the various OLTP databases • Purpose of data • OLTP: To control and run fundamental business tasks (raw, current data) • OLAP: To help with planning, problem solving, and decision support (life data)
  • 42. OLTP-OLAP • What the data • OLTP: Reveals a snapshot of ongoing • OLAP: Multi-dimensional views of various kinds of • Inserts and Updates • OLTP: Short and fast inserts and updates initiated by end users • OLAP: Periodic long-running batch jobs refresh the data
  • 43. OLTP-OLAP • Queries • OLTP: Relatively standardized and simple queries returning relatively few records • OLAP: Often complex queries involving aggregations, association, collaboration • Processing Speed • OLTP: Typically very fast • OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes
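As a rough illustration of the query contrast in slide 43, the sketch below runs one OLTP-style lookup and one OLAP-style aggregation against the same small table with Python's `sqlite3` module. The `orders` table, its columns, its rows, and the index name are hypothetical; a real OLAP system would normally query a separate warehouse rather than the operational table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "North", 2017, 100.0),
        (2, "North", 2018, 150.0),
        (3, "South", 2017, 80.0),
        (4, "South", 2018, 120.0),
    ],
)

# OLTP style: a simple, standardized query that returns very few records.
one_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# OLAP style: an aggregation across many records, grouped by dimensions.
totals = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()

# An index on the grouping columns is one way to speed up such queries.
conn.execute("CREATE INDEX idx_orders_region_year ON orders (region, year)")

print(one_order)
print(totals)
```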
  • 44. OLTP~OLAP • Space Requirements • OLTP: Can be relatively small if historical data is archived • OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP • Database Design • OLTP: Highly normalized with many tables (3-NF) • OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas
  • 45. Backup and Recovery • OLTP: Backup religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability. • OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
  • 49. OLTP • Online transaction processing, or OLTP, is a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. • Temporary data, current data
  • 50. OLAP • OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling. • Past & Present Data
  • 53. DW
  • 57. Big Data • Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions. • HCI: Human Computer Interaction on BD • "Much more IT investment is going towards managing and maintaining big data"
  • 58. Big-Data • Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy. • The term often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set
  • 59. Characteristics • Big Data represents the information assets characterized by such a high volume, velocity and variety as to require specific technology and analytical methods for its transformation into value
  • 60. Big data • Big data is arriving from multiple sources at an alarming velocity, volume and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills. ... Insights from big data can enable all employees to make better decisions ...
  • 61. V3
  • 62. BRT • Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it involves many areas of Business, Resource and Technology.
  • 63. Big Data • What Comes Under Big Data?
  • 65. Characteristics • Volume: big data doesn't sample; it just observes and tracks what happens • Velocity: big data is often available in real-time • Variety: big data draws from text, images, audio, video; plus it completes missing pieces through data fusion • Machine Learning: big data often doesn't ask why and simply detects patterns • Digital footprint: big data is often a cost-free by product of digital interaction
  • 66. Characteristics • Volume • The quantity of generated and stored data. The size of the data determines the value and potential insight- and whether it can actually be considered big data or not. • Variety • The type and nature of the data. This helps people who analyze it to effectively use the resulting insight. • Velocity • In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. • Variability • Inconsistency of the data set can hamper processes to handle and manage it. • Veracity • The quality of captured data can vary greatly, affecting accurate analysis.
  • 67. 6C • Factory work and Cyber-physical systems may have a 6C system: • Connection (sensor and networks) • Cloud (computing and data on demand) • Cyber (model and memory) • Content/context (meaning and correlation) • Community (sharing and collaboration) • Customization (personalization and value)
  • 68. What Comes Under Big Data? • Black Box Data: It is a component of helicopters, airplanes, jets, etc. It captures voices of the flight crew, recordings of microphones and earphones, and the performance information of the aircraft. • Social Media Data: Social media such as Facebook and Twitter hold information and the views posted by millions of people across the globe. • Stock Exchange Data: The stock exchange data holds information about the 'buy' and 'sell' decisions that customers make on shares of different companies. • Power Grid Data: The power grid data holds information about the power consumed by a particular node with respect to a base station. • Transport Data: Transport data includes model, capacity, distance and availability of a vehicle. • Search Engine Data: Search engines retrieve lots of data from different databases.
  • 70. 3V • Thus Big Data includes huge volume, high velocity, and an extensible variety of data. The data falls into three types. • Structured data: Relational data. • Semi-structured data: XML data. • Unstructured data: Word, PDF, Text, Media Logs.
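The three data types in slide 70 can be told apart by how much structure a program can rely on when reading them. The sketch below uses only the Python standard library; the sample product, city, and field names are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table.
structured = "prod_id,location\n1,Nagpur\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing tags, but no fixed table schema (XML).
semi_structured = "<product id='1'><location>Nagpur</location></product>"
root = ET.fromstring(semi_structured)
location = root.findtext("location")

# Unstructured: free text; meaning has to be extracted, here by a naive search.
unstructured = "Shipment for product 1 left Nagpur this morning."
mentions_city = "Nagpur" in unstructured

print(rows[0]["location"], location, mentions_city)
```

The same fact (product 1 is in Nagpur) is present in all three inputs, but only the first two expose it through a field name.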
  • 73. Big Data Challenges • The major challenges associated with big data are as follows: • Capturing data • Data curation • Storage • Searching • Sharing • Transfer • Analysis, visualization, association, collaboration, communications (OOS, OOP, UML) • Presentation
  • 74. Data Science-Really a great thing
  • 75. DSP • The Data Science Process • The Data Science Process is a framework for approaching data science tasks, and is crafted by Joe Blitzstein and Hanspeter Pfister of Harvard's CS 109. The goal of CS 109, as per Blitzstein himself, is to introduce students to the overall process of data science investigation, a goal which should provide some insight into the framework itself.
  • 76. DSP
  • 77. Data Science • Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics, similar to Knowledge Discovery in Databases (KDD).
  • 78. DS • Data science employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, operations research, information science, and computer science, including signal processing, probability models, machine learning, statistical learning, data mining, database, data engineering, pattern recognition and learning, visualization, predictive analytics, uncertainty modelling, data warehousing, data compression, computer programming, artificial intelligence, and high performance computing. Methods that scale to big data are of particular interest in data science, although the discipline is not generally considered to be restricted to such big data, and big data solutions are often focused on organizing and pre-processing the data instead of analysis. The development of machine learning has enhanced the growth and importance of data science.
  • 79. CRISP-DM • CRISP-DM • As a comparison to the Data Science Process put forth by Blitzstein & Pfister, and elaborated upon by Squire, we take a quick look at the de facto official (yet unquestionably falling out of fashion) data mining framework (which has been extended to data science problems), the Cross Industry Standard Process for Data Mining (CRISP-DM). Though the standard is no longer actively maintained, it remains a popular framework for navigating data science projects.
  • 81. DSP • Business Understanding • Data Understanding • Data Preparation • Modeling • Evaluation • Deployment
  • 82. Knowledge Discovery in Databases • KDD Process • Around the same time that CRISP-DM was emerging, the KDD Process had finished developing. The KDD (Knowledge Discovery in Databases) Process, by Fayyad, Piatetsky-Shapiro, and Smyth, is a framework which has, at its core, "the application of specific data-mining methods for pattern discovery and extraction." The framework consists of the following steps: • Selection • Preprocessing • Transformation • Data Mining • Interpretation
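The five KDD steps listed in slide 82 can be walked through on a toy data set. This is only a sketch under stated assumptions: the records are invented, and a simple quantity count stands in for a real data-mining method, which the slide does not specify.

```python
from collections import Counter

# Hypothetical raw purchase records (invented for illustration).
records = [
    {"customer": "a", "item": "milk", "qty": "2"},
    {"customer": "b", "item": "Milk ", "qty": "1"},
    {"customer": "c", "item": "butter", "qty": None},
    {"customer": "d", "item": "ghee", "qty": "2"},
]

# 1. Selection: keep only the attributes relevant to the question.
selected = [(r["item"], r["qty"]) for r in records]

# 2. Preprocessing: drop incomplete rows and normalise item names.
cleaned = [(item.strip().lower(), qty) for item, qty in selected if qty is not None]

# 3. Transformation: convert quantities into a numeric form.
transformed = [(item, int(qty)) for item, qty in cleaned]

# 4. Data mining: a stand-in pattern search, total quantity per item.
totals = Counter()
for item, qty in transformed:
    totals[item] += qty

# 5. Interpretation: read off the strongest pattern.
best_item, best_qty = totals.most_common(1)[0]
print(best_item, best_qty)
```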
  • 83. DSP
  • 84. SAS-SEMMA • Discussion • It is important to note that these are not the only frameworks in this space; SEMMA (for Sample, Explore, Modify, Model and Assess), from SAS, and the agile-oriented Guerrilla Analytics both come to mind. There are also numerous in-house processes that various data science teams and individuals no doubt employ across any number of companies and industries in which data scientists work. • So, is the Data Science Process a new take on CRISP-DM, which is just a reworking of KDD, or is it a new, independent framework in its own right?
  • 86. Data science • Exploratory data analysis • Information design • Interactive data visualization • Descriptive statistics • Inferential statistics • Statistical graphics • Plot • Data analysis • Infographic
  • 87. DSP
  • 88. DS • Data science affects academic and applied research in many domains, including machine translation, speech recognition, robotics, search engines, digital economy, but also the biological sciences, medical informatics, health care, social sciences and the humanities. • It heavily influences economics, business and finance. From the business perspective, data science is an integral part of competitive intelligence, a newly emerging field that encompasses a number of activities, such as data mining and data analysis.
  • 89. Data scientist • Data scientists use their data and analytical ability to find and interpret rich data sources; manage large amounts of data despite hardware, software, and bandwidth constraints; merge data sources; ensure consistency of datasets; create visualizations to aid in understanding data; build mathematical models using the data; and present and communicate the data insights/findings. They are often expected to produce answers in days rather than months, work by exploratory analysis and rapid iteration, and to produce and present results with dashboards (displays of current values) rather than papers/reports, as statisticians normally do.
  • 91. Data Science • Collection of OLTP is called OLAP • Collection of OLAP is called Data mining
  • 94. Fact Data • Facts of a business process • Quality of business: sales, cost, and profit • In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema. • Fact tables are the large tables in our warehouse schema that store business measurements. Fact tables typically contain facts and foreign keys to the dimension tables. Fact tables represent data, usually numeric and additive, that can be analyzed and examined. Examples include sales, cost, and profit.
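A minimal star-schema sketch for the fact table described in slide 94, again using Python's `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`), their columns, and the sample figures are assumptions for illustration; only the measures sales, cost and profit come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables surround the fact table (hypothetical names).
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);

    -- The fact table holds numeric, additive measures plus foreign keys.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id INTEGER REFERENCES dim_store(store_id),
        sales REAL,
        cost REAL
    );

    INSERT INTO dim_product VALUES (1, 'Milk'), (2, 'Butter');
    INSERT INTO dim_store VALUES (1, 'Nagpur'), (2, 'Mumbai');
    INSERT INTO fact_sales VALUES
        (1, 1, 100.0, 60.0),
        (1, 2, 50.0, 30.0),
        (2, 1, 80.0, 50.0);
    """
)

# Because the measures are additive, they can be summed along any dimension.
profit_by_product = conn.execute(
    """
    SELECT p.name, SUM(f.sales), SUM(f.cost), SUM(f.sales - f.cost)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
print(profit_by_product)
```

Profit is derived here as sales minus cost rather than stored, which is one possible design; a warehouse may equally store it as its own measure.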
  • 109. OLAP
  • 112. RFOS • Relation Function Operation Services Oracle DB ERP ETL Staging Area Function- Operation DW OLAP Services Business Analyst-Engineer Role
  • 113. Role of Mgmt • Low-level mgmt: OLTP (engineers & operators) • High-level mgmt: OLAP (top mgmt, scientist, CEO), DSS data
  • 114. Low level & High level
  • 118. Traditional Complex IT Infrastructure • Client

Editor's notes

  1. World wide Geographical DATA: Climate, Environment, Weather, temperature etc.
  2. Difference between the primary key and the foreign key
  3. MILK-BUTTER-GHEE
  4. Data visualization
  5. Data science Activities
  6. Data- Kernel- BIG-DATA
  7. RFOS
  8. OPERATION- SERVICES RFOS- Relation Function Operation Services-
  9. Operation & Services