Dw 07032018-dr pl pradhan
1. Prepared By
Dr. P L Pradhan, Ph D
CSE (System Security)
Dept of Information Technology
TGPCET, RTM Nagpur University,
Nagpur, India
2. Database, BigData, Data Science
18. • What is the difference between a primary key
and a foreign key?
• A primary key uniquely identifies each row in
its own table. In a foreign key reference, a link is
created between two tables when the column
or columns that hold the primary key value
for one table are referenced by the column or
columns in another table. That column
becomes a foreign key in the second table.
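The link described above can be sketched with SQLite from the Python standard library; the table and column names (departments, employees, dept_id) are illustrative, not from the slides:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Primary key: uniquely identifies each row in departments.
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT)")

# Foreign key: employees.dept_id references departments.dept_id,
# creating the link between the two tables.
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES departments(dept_id)
    )
""")

conn.execute("INSERT INTO departments VALUES (1, 'IT')")
conn.execute("INSERT INTO employees VALUES (10, 'Asha', 1)")

# A row that references a non-existent department is rejected.
try:
    conn.execute("INSERT INTO employees VALUES (11, 'Ravi', 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The rejected insert shows the practical effect of the foreign key: the second table can only hold values that exist as primary keys in the first.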
22. Information
• A set of data items satisfying a specific
objective.
• Data about data = metadata
23. Database
• A set of logically interconnected data items
serving several users simultaneously over
a LAN or WAN.
Oracle
Sybase
MS-SQL
Ingres
29. Data Items - Records - Tables
• Tuples make tables
• Tables make a database
• Databases make Big Data => Data Science
• Hadoop helps to extract the desired data &
information
33. Operational data
Operational data is not permanent; it is current data.
The data is volatile.
At any time, data can be read, written & executed
(RWX), i.e. inserted, deleted & updated.
Modification & updating of data is therefore risky.
Hence, operational data offers little security & privacy.
38. DATA WAREHOUSING
• Separate
• High Availability, Reliability & Scalability
• Integrated
• Time-stamped (RX)
• Subject-oriented
• Non-volatile (permanent)
• Accessible at all times
41. OLTP-OLAP
• Source of data
• OLTP: Operational data; OLTPs are the original source
of the data.
• OLAP: Consolidated data; OLAP data comes from the
various OLTP databases.
• Purpose of data
• OLTP: To control and run fundamental business tasks
(raw, current data)
• OLAP: To help with planning, problem solving, and
decision support (past & present data)
42. OLTP-OLAP
• What the data reveals
• OLTP: A snapshot of ongoing business processes
• OLAP: Multi-dimensional views of various kinds of
business activities
• Inserts and Updates
• OLTP: Short and fast inserts and updates initiated
by end users
• OLAP: Periodic long-running batch jobs refresh
the data
43. OLTP-OLAP
• Queries
• OLTP: Relatively standardized and simple queries
returning relatively few records
• OLAP: Often complex queries involving
aggregations, association, and collaboration
• Processing Speed
• OLTP: Typically very fast
• OLAP: Depends on the amount of data involved;
batch data refreshes and complex queries may
take many hours; query speed can be improved
by creating indexes
44. OLTP-OLAP
• Space Requirements
• OLTP: Can be relatively small if historical data is
archived
• OLAP: Larger due to the existence of aggregation
structures and history data; requires more indexes than
OLTP
• Database Design
• OLTP: Highly normalized with many tables (3-NF)
• OLAP: Typically de-normalized with fewer tables; use of
star and/or snowflake schemas
45. Backup and Recovery
• OLTP: Backup religiously; operational data is
critical to run the business, and data loss is likely
to entail significant monetary loss and legal
liability.
• OLAP: Instead of regular backups, some
environments may consider simply reloading
the OLTP data as a recovery method.
49. OLTP
• Online transaction processing (OLTP) is a
class of information systems that facilitate and
manage transaction-oriented applications,
typically for data entry and retrieval
transaction processing.
• Temporary data (current data)
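The defining feature of such a transaction-oriented system is that each short unit of work either completes fully or not at all. A toy sketch in Python's sqlite3, with invented account names and amounts:

```python
import sqlite3

# A toy OLTP-style transaction: the debit and the credit are applied
# together, or neither is applied at all.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
db.executemany("INSERT INTO accounts VALUES (?, ?)",
               [("alice", 100.0), ("bob", 50.0)])

try:
    # The connection used as a context manager opens a transaction:
    # it commits on success and rolls back if an exception occurs.
    with db:
        db.execute("UPDATE accounts SET balance = balance - 30 "
                   "WHERE name = 'alice'")
        db.execute("UPDATE accounts SET balance = balance + 30 "
                   "WHERE name = 'bob'")
except sqlite3.Error:
    pass  # on failure the whole transfer is undone as a unit

print(dict(db.execute("SELECT name, balance FROM accounts")))
```

After the transfer both sides reflect the change (alice 70.0, bob 80.0), which is exactly the short, fast, end-user-initiated update pattern the OLTP slides describe.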
50. OLAP
• OLAP is an acronym for Online Analytical
Processing. OLAP performs multidimensional
analysis of business data and provides the
capability for complex calculations, trend
analysis, and sophisticated data modeling.
• Past & Present Data
57. Big Data
• Extremely large data sets that may be
analysed computationally to reveal patterns,
trends, and associations, especially relating to
human behaviour and interactions.
• HCI-Human Computer Interaction on BD
• “Much more IT investment is going towards
managing and maintaining big data"
58. Big-Data
• Challenges include analysis, capture, data
curation, search, sharing, storage, transfer,
visualization, querying,
updating and information privacy.
• The term often refers simply to the use of
predictive analytics, user behavior analytics,
or certain other advanced data analytics
methods that extract value from data, and
seldom to a particular size of data set
59. Characteristics
• Big Data represents the information assets
characterized by such high Volume, Velocity
and Variety as to require specific technology and
analytical methods for their transformation into
Value.
60. Big data
• Big data is arriving from multiple sources at
an alarming velocity, volume and variety. To
extract meaningful value from big data, you
need optimal processing power, analytics
capabilities and skills. ... Insights from big
data can enable all employees to make
better decisions ...
62. BRT
• Big Data is a collection of large datasets that
cannot be processed using traditional
computing techniques. It is not a single
technique or a tool; rather, it involves many
areas of Business, Resource and Technology.
65. Characteristics
• Volume: big data doesn't sample; it just observes
and tracks what happens
• Velocity: big data is often available in real-time
• Variety: big data draws from text, images, audio,
video; plus it completes missing pieces
through data fusion
• Machine Learning: big data often doesn't ask why
and simply detects patterns
• Digital footprint: big data is often a cost-free
byproduct of digital interaction
66. Characteristics
• Volume
• The quantity of generated and stored data. The size of the data determines the
value and potential insight- and whether it can actually be considered big data or
not.
• Variety
• The type and nature of the data. This helps people who analyze it to effectively use
the resulting insight.
• Velocity
• In this context, the speed at which the data is generated and processed to meet
the demands and challenges that lie in the path of growth and development.
• Variability
• Inconsistency of the data set can hamper processes to handle and manage it.
• Veracity
• The quality of captured data can vary greatly, affecting accurate analysis.
67. 6C
• Factory work and Cyber-physical systems may
have a 6C system:
• Connection (sensor and networks)
• Cloud (computing and data on demand)
• Cyber (model and memory)
• Content/context (meaning and correlation)
• Community (sharing and collaboration)
• Customization (personalization and value)
68. What Comes Under Big Data?
• Black Box Data : It is a component of helicopters, airplanes, jets,
etc. It captures the voices of the flight crew, recordings of microphones
and earphones, and the performance information of the aircraft.
• Social Media Data : Social media such as Facebook and Twitter hold
information and the views posted by millions of people across the
globe.
• Stock Exchange Data : The stock exchange data holds information
about the ‘buy’ and ‘sell’ decisions made by customers on shares of
different companies.
• Power Grid Data : The power grid data holds information
consumed by a particular node with respect to a base station.
• Transport Data : Transport data includes model, capacity, distance
and availability of a vehicle.
• Search Engine Data : Search engines retrieve lots of data from
different databases.
70. 3V
• Thus Big Data includes huge volume, high
velocity, and an extensible variety of data.
The data in it will be of three types:
• Structured data : Relational data.
• Semi-structured data : XML data.
• Unstructured data : Word, PDF, text, media
logs.
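The three types above can be contrasted in a short sketch using only the Python standard library; the sample values are invented for illustration:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: relational data with a fixed schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
db.execute("INSERT INTO sales VALUES ('pen', 3)")
structured = db.execute("SELECT item, qty FROM sales").fetchone()

# Semi-structured: XML carries its own tags, but the shape can vary
# from record to record.
root = ET.fromstring("<order><item>pen</item><qty>3</qty></order>")
semi = (root.find("item").text, int(root.find("qty").text))

# Unstructured: free text has no schema; extracting fields requires
# parsing or analytics rather than a simple query.
log_line = "sold 3 pens at the counter"
unstructured_tokens = log_line.split()

print(structured, semi, unstructured_tokens)
```

The same fact ("3 pens sold") is trivial to query in the structured form, needs tag navigation in the semi-structured form, and needs text processing in the unstructured form, which is why variety is counted among the defining challenges of Big Data.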
73. Big Data Challenges
The major challenges associated with big data are
as follows:
• Capturing data
• Data curation
• Storage
• Searching
• Sharing
• Transfer
• Analysis, visualization, association, collaboration,
communications ( OOS, OOP, UML)
• Presentation
75. DSP
• The Data Science Process
• The Data Science Process is a framework for
approaching data science tasks, and is crafted
by Joe Blitzstein and Hanspeter Pfister of
Harvard's CS 109. The goal of CS 109, as per
Blitzstein himself, is to introduce students to
the overall process of data science
investigation, a goal which should provide
some insight into the framework itself.
77. Data Science
• Data science is an interdisciplinary field about
processes and systems to
extract knowledge or insights from data in
various forms, either structured or
unstructured, which is a continuation of some
of the data analysis fields such
as statistics, data mining, and predictive
analytics, similar to Knowledge Discovery in
Databases (KDD).
78. DS
• Data science employs techniques and theories drawn from
many fields within the broad areas of mathematics,
statistics, operations research, information science, and
computer science, including signal processing, probability
models, machine learning, statistical learning, data mining,
database, data engineering, pattern recognition and
learning, visualization, predictive analytics, uncertainty
modelling, data warehousing, data compression, computer
programming, artificial intelligence, and high performance
computing. Methods that scale to big data are of particular
interest in data science, although the discipline is not
generally considered to be restricted to such big data, and
big data solutions are often focused on organizing and pre-
processing the data instead of analysis. The development of
machine learning has enhanced the growth and importance
of data science.
79. CRISP-DM
• CRISP-DM
• As a comparison to the Data Science Process put
forth by Blitzstein & Pfister, and elaborated upon
by Squire, we take a quick look at the de facto
official (yet unquestionably falling out of fashion)
data mining framework (which has been
extended to data science problems), the Cross
Industry Standard Process for Data Mining
(CRISP-DM). Though the standard is no longer
actively maintained, it remains a popular
framework for navigating data science projects.
82. Knowledge Discovery in Databases
• KDD Process
• Around the same time that CRISP-DM was emerging, the KDD
Process had finished developing. The KDD (Knowledge Discovery
in Databases) Process, by Fayyad, Piatetsky-Shapiro, and Smyth, is
a framework which has, at its core, "the application of specific data-
mining methods for pattern discovery and extraction." The
framework consists of the following steps:
Selection
Preprocessing
Transformation
Data Mining
Interpretation
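The five KDD steps above can be sketched as a toy pipeline; the records and the "mining" rule are invented purely to make each step concrete:

```python
# Invented raw records: (period, reading), one of them dirty.
raw = [("2024-01", "42"), ("2024-02", "oops"),
       ("2024-03", "58"), ("2024-04", "61")]

# Selection: pick the fields relevant to the task.
selected = [value for _, value in raw]

# Preprocessing: drop records that are not clean numbers.
clean = [v for v in selected if v.isdigit()]

# Transformation: convert to the representation mining needs.
numbers = [int(v) for v in clean]

# Data Mining: apply a specific method for pattern extraction
# (here, a trivial monotone-trend check stands in for a real miner).
rising = all(a < b for a, b in zip(numbers, numbers[1:]))

# Interpretation: turn the extracted pattern into a statement.
print("values are rising" if rising else "no clear trend")
```

Each line maps one-to-one onto a KDD stage, which is the point Fayyad et al. make: data mining proper is only one step inside a longer discovery process.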
84. SAS-SEMMA
• Discussion
• It is important to note that these are not the only
frameworks in this space; SEMMA (for Sample, Explore,
Modify, Model and Assess), from SAS, and the
agile-oriented Guerilla Analytics both come to mind. There
are also numerous in-house processes that various
data science teams and individuals no doubt employ
across any number of companies and industries in
which data scientists work.
• So, is the Data Science Process a new take on CRISP-
DM, which is just a reworking of KDD, or is it a new,
independent framework in its own right?
86. Data science
Exploratory data analysis
Information design
Interactive data visualization
Descriptive statistics
Inferential statistics
Statistical graphics
Plot
Data analysis
Infographic
88. DS
• Data science affects academic and applied research in
many domains, including machine translation, speech
recognition, robotics, search engines, digital economy,
but also the biological sciences, medical
informatics, health care, social sciences and the
humanities.
• It heavily influences economics, business and finance.
From the business perspective, data science is an
integral part of competitive intelligence, a newly
emerging field that encompasses a number of
activities, such as data mining and data analysis.
89. Data scientist
• Data scientists use their data and analytical ability to
find and interpret rich data sources; manage large
amounts of data despite hardware, software, and
bandwidth constraints; merge data sources; ensure
consistency of datasets; create visualizations to aid in
understanding data; build mathematical models using
the data; and present and communicate the data
insights/findings. They are often expected to produce
answers in days rather than months, work by
exploratory analysis and rapid iteration, and to produce
and present results with dashboards (displays of
current values) rather than papers/reports, as
statisticians normally do.
94. Fact Data
• Facts of a business process
• Quality of business: sales, cost, and profit
• In data warehousing, a fact table consists of the measurements,
metrics, or facts of a business process. It is located at the center of a
star schema or a snowflake schema, surrounded by
dimension tables. Where multiple fact tables are used, these are
arranged as a fact constellation schema.
• Fact tables are the large tables in our warehouse schema that store
business measurements. Fact tables typically contain facts and
foreign keys to the dimension tables. Fact tables represent data,
usually numeric and additive, that can be analyzed and
examined. Examples include sales, cost, and profit.
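A minimal star-schema sketch in sqlite3: one fact table with foreign keys into two dimension tables. All table names and figures here are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: numeric, additive measures plus dimension keys.
    CREATE TABLE sales_fact (
        date_id    INTEGER REFERENCES date_dim(date_id),
        product_id INTEGER REFERENCES product_dim(product_id),
        sales      REAL,
        cost       REAL
    );

    INSERT INTO date_dim    VALUES (1, 'Jan'), (2, 'Feb');
    INSERT INTO product_dim VALUES (1, 'pen');
    INSERT INTO sales_fact  VALUES (1, 1, 100.0, 60.0),
                                   (2, 1, 150.0, 80.0);
""")

# Because the facts are additive, profit per month is just
# SUM(sales) - SUM(cost), grouped through the dimension table.
for month, profit in db.execute("""
    SELECT d.month, SUM(f.sales) - SUM(f.cost)
    FROM sales_fact f
    JOIN date_dim d ON f.date_id = d.date_id
    GROUP BY d.month
    ORDER BY d.date_id
"""):
    print(month, profit)
```

The query illustrates why fact tables hold additive measures: aggregating sales and cost across any dimension (here, month) stays meaningful without special handling.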