10. BIG DATA is the term for a collection of
data sets so large and complex that it
becomes difficult to process using on-hand
database management tools or traditional
data processing applications.
Friday, April 4, 14
11. • Big data is a popular term used to describe
the exponential growth and availability of
data, both structured and unstructured.
12. Big data is a buzzword, or catch-phrase, used to
describe a massive volume of both structured
and unstructured data that is so large that it's
difficult to process using traditional database
and software techniques. In most enterprise
scenarios the data is too big or it moves too
fast or it exceeds current processing capacity.
13. In BIG DATA there are
3Vs (volume, velocity and variety),
which are the defining properties
and the dimensions of
Big Data
16. Volume-
Big Volume covers both simple
SQL analytics and complex
non-SQL analytics. In other
words, volume refers to the
amount of data.
17. SQL
• SQL stands for Structured Query Language.
• SQL is a standardized query language for
requesting information from a database.
• SQL first appeared in a commercial
database system in 1979, released by the
company that later became Oracle Corporation.
• Historically, SQL has been the favorite query
language for database management systems
running on minicomputers and mainframes.
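As a rough illustration of "requesting information from a database", here is a minimal sketch using Python's built-in sqlite3 module. The books table, its columns and its rows are hypothetical, made up only for this example.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Book A", "Author X", 1999),
        ("Book B", "Author Y", 2005),
        ("Book C", "Author X", 2012),
    ],
)


def titles_by_author(connection, author):
    """Ask the database for all titles by one author, oldest first."""
    rows = connection.execute(
        "SELECT title FROM books WHERE author = ? ORDER BY year",
        (author,),
    )
    return [title for (title,) in rows]


print(titles_by_author(conn, "Author X"))
```

The SELECT statement says what is wanted (titles by one author, ordered by year) and leaves the how to the database.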
19. Variety-
A large number of diverse data
sources to integrate. In other
words, variety refers to the
number of different types of
data.
21. Structured Data
• Structured data is data that resides in a fixed
field within a record or file.
This includes data contained
in relational databases and spreadsheets.
Structured data has the advantage of being
easily entered, stored, queried and analyzed.
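A small sketch of why fixed fields make data easy to query, using Python's csv module. The census-style records and field names below are hypothetical.

```python
import csv
import io

# Hypothetical census-style records: every row has the same fixed fields.
RAW = """name,birth_year,income
Ana,1980,52000
Ben,1975,61000
Cy,1990,38000
"""


def load_records(text):
    """Parse CSV text into a list of dicts keyed by the header fields."""
    return list(csv.DictReader(io.StringIO(text)))


def average_income(records):
    """Because 'income' is a known field, querying it is straightforward."""
    return sum(int(r["income"]) for r in records) / len(records)


records = load_records(RAW)
print(average_income(records))
```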
22. EXAMPLES OF STRUCTURED DATA
• Library catalogues (date, author, place, subject, etc.)
• Census records (birth, income, employment, place, etc.)
• Phone numbers (and the phone book)
• Economic data (GDP, PPI, ASX, etc.)
• XML-TEI (bringing structure to the text by tagging particular
elements, such as versions of the word “canal” in 17th-century Dutch)
• Databases
• Data warehouses
• Enterprise systems (CRM, ERP, etc.)
23. Semi-structured Data
• Semi-structured data is a form of
structured data that does not conform to
the formal structure of data models
associated with relational databases or
other forms of data tables.
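To make the idea concrete, here is a minimal sketch with JSON documents whose fields are labelled but do not follow one shared table layout. The records and field names are hypothetical.

```python
import json

# Hypothetical records: tagged fields, but no fixed schema shared by all.
RAW = """[
  {"name": "Ana", "email": "ana@example.com"},
  {"name": "Ben", "phones": ["555-0100", "555-0101"]},
  {"name": "Cy", "address": {"city": "Delft"}, "email": "cy@example.com"}
]"""


def fields_used(documents):
    """Collect every field name that appears in at least one document."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return sorted(names)


def emails(documents):
    """Read a field that may be absent, skipping documents without it."""
    return [doc["email"] for doc in documents if "email" in doc]


docs = json.loads(RAW)
print(fields_used(docs))
print(emails(docs))
```

Each document carries its own structure, so code reading it has to cope with missing or nested fields instead of relying on fixed columns.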
24. EXAMPLES OF SEMI STRUCTURED DATA
• Web Pages
• Information Integration
• XML
25. Unstructured Data
• Unstructured Data refers to information
that either does not have a pre-defined data
model or is not organized in a predefined
manner. Unstructured information is typically
text-heavy. In other words unstructured data
is something that is at the other end of the
spectrum. It might be in any form: text, audio,
video. We definitely don’t know from looking
at the data what it means, unless we apply
human understanding to it.
26. EXAMPLES OF UNSTRUCTURED DATA
• Book
• Story
• Text-heavy documents
• Audio
• Video
• RSS Feeds
• Word documents
• Excel Spreadsheets
• Email messages
29. Benefits of Batch
Processing
• It can shift the time of job processing to when the computing
resources are less busy.
• It avoids idling the computing resources with minute-by-minute
manual intervention and supervision.
• By keeping the overall rate of utilization high, it better amortizes the
cost of the computer, especially an expensive one.
• It allows the system to use different priorities for batch and
interactive work.
• Rather than running one program multiple times to process one
transaction each time, batch processes will run the program only
once for many transactions, reducing system overhead.
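The last point can be sketched in a few lines. In this toy example the Ledger class is hypothetical, and its opens counter stands in for the fixed start-up cost of one program run.

```python
class Ledger:
    """Hypothetical job target that counts how often it is started."""

    def __init__(self):
        self.opens = 0
        self.total = 0

    def run(self, amounts):
        # One 'open' stands in for the fixed start-up cost of a program run.
        self.opens += 1
        for amount in amounts:
            self.total += amount


def process_one_at_a_time(ledger, transactions):
    """Run the program once per transaction."""
    for amount in transactions:
        ledger.run([amount])


def process_as_batch(ledger, transactions):
    """Run the program once for all transactions."""
    ledger.run(transactions)


transactions = [10, 25, 5, 60]
interactive, batch = Ledger(), Ledger()
process_one_at_a_time(interactive, transactions)
process_as_batch(batch, transactions)
print(interactive.opens, batch.opens)
```

Both ledgers end with the same total, but the batch run pays the start-up cost once instead of once per transaction.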
33. ORACLE BIG DATA
SOLUTION
• Oracle is the first vendor to offer a complete and
integrated solution to address the full spectrum of
enterprise big data requirements. Oracle’s big data
strategy is centered on the idea that you can
extend your current enterprise information
architecture to incorporate big data. New big data
technologies, such as Hadoop and Oracle NoSQL
database, run alongside your Oracle data
warehouse to deliver business value and address
your big data requirements.
36. ADVANTAGES
• Data mining makes it easier to find correlations
• Analysis is more calculated, so accuracy is higher
• Data is now combined into one big mass, which allows links to be
found
• For example: a company with decades of information can make use of
Big Data and data analysis to create competitive advantages and
open new business opportunities
• Big Data started because companies have been finding it hard to manage all
their data
• Creates new growth opportunities and lots of jobs
37. DISADVANTAGES
• Big risks on security and privacy
• Challenges arise: it is expensive, and you need to spend a lot to get it working
• Requires a lot of analysis: uncovering patterns, applying algorithms, and finding connections and
relationships
• Still requires specialized analysts; the right skill set is hard to
find
40. • Apache Hadoop is an open-source software framework for storage and
large-scale processing of data sets on clusters of commodity
hardware. It is licensed under the Apache License 2.0. The Apache
Hadoop framework is composed of the following modules:
• Hadoop Common – contains libraries and utilities needed by other
Hadoop modules.
• Hadoop Distributed File System (HDFS) – a distributed file system
that stores data on commodity machines, providing very high
aggregate bandwidth across the cluster.
• Hadoop YARN – a resource-management platform responsible for
managing compute resources in clusters and using them for scheduling
of users' applications.
• Hadoop MapReduce – a programming model for large-scale data
processing.
• Written in: Java
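The MapReduce programming model can be sketched on a single machine as a word count. This is plain Python, not Hadoop's Java API; the function names are hypothetical, and a real cluster would spread the map and reduce calls across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data moves fast"])
print(counts)
```

Because each map call looks at one document and each reduce call at one key, the work can be split across machines without the pieces needing to talk to each other.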
41. • MongoDB is big data software whose name comes from the word
“humongous”. MongoDB is a cross-platform document-oriented
database. A document-oriented database is a computer program
designed for storing, retrieving, and managing document-oriented
information, also known as semi-structured data. This is classified as
NoSQL. A NoSQL database provides a mechanism for storage and
retrieval of data that is modeled by means other than the tabular
relations used in relational databases.
• MarkLogic is an American software company that makes a NoSQL
database.
• Written in: C++
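To show what "document-oriented" means in practice, here is a toy in-memory sketch. The DocumentStore class and its insert and find methods are hypothetical and are not the API of MongoDB or MarkLogic.

```python
class DocumentStore:
    """Toy in-memory document collection; not a real database API."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, document):
        """Store a copy of the document and return its generated id."""
        doc_id = self._next_id
        self._docs[doc_id] = dict(document)
        self._next_id += 1
        return doc_id

    def find(self, **criteria):
        """Return documents whose fields equal every given criterion."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(k) == v for k, v in criteria.items())
        ]


store = DocumentStore()
store.insert({"type": "email", "sender": "ana", "subject": "Hello"})
store.insert({"type": "tweet", "sender": "ben", "hashtags": ["bigdata"]})
store.insert({"type": "email", "sender": "ben"})
print(store.find(type="email"))
```

Unlike rows in a relational table, the three documents do not share one set of columns, yet they live in the same collection and can be queried together.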
44. Enterprise NoSQL Database
Technology
• For more than a decade, MarkLogic has
delivered a powerful, agile, and trusted
enterprise-grade NoSQL (Not Only SQL)
database that enables organizations to turn all
data into valuable and actionable information.
Key features include ACID transactions,
horizontal scaling, real-time indexing, high
availability, disaster recovery, government-
grade security, and more.
45. Best Big Data Research
• Search all data for more value. Bring all relevant content back to users
– unstructured and structured, internal and public.
• Real-time updates. Real-time results. When documents are updated or
inserted, they are available for search immediately.
• Able to query all types of data. Structured, semi-structured, and
unstructured content are all supported within the same queries.
• Real-time alerts for fast response. MarkLogic has the highest
performance alerting engine available, capable of running millions of
custom queries on each and every change to the document repository
– no polling required.
• Search you can bank on. Businesses that count on revenue through
paid content search and retrieval trust MarkLogic to deliver.
MarkLogic’s scale-out, real-time platform is more than a
search engine linked to a content repository – it is the most
complete platform for building search-oriented applications.
46. Real Time your Hadoop
Get more power out of Hadoop. Hadoop and MarkLogic together can
allow you to tackle problems that would be difficult or impossible to
address by either technology alone.
Save money by leveraging common infrastructure. Using MarkLogic and
Hadoop Distributed File System (HDFS) enables common batch-
processing infrastructure to be used across many different projects and
applications.
Enterprise-class support for Hadoop. Our partnership with Intel provides
a strong, supported platform for building secure, enterprise-class Big Data
Applications with Apache Hadoop.
Seamlessly combine the power of MapReduce with MarkLogic’s real-time,
interactive analysis and indexing on a single, unified platform.
48. What can you
accomplish
with
BIG DATA?
49. Dialogue with Consumers
• Today’s consumers are a tough nut to crack. They look around a lot
before they buy. You want to persuade customers to buy your
products.
• Big Data allows you to profile these increasingly vocal and fickle
little ‘tyrants’ in a far-reaching manner so that you can engage in an
almost one-on-one, real-time conversation with them. This is not
actually a luxury. If you don’t treat them the way they want to be treated, they will
leave you in the blink of an eye.
50. Re-develop your Products
• Big Data can also help you understand how others perceive your
products so that you can adapt them.
• Analysis of unstructured social media text allows you to uncover
the sentiments of your customers and even segment those in
different geographical locations or among different demographic
groups.
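A very rough sketch of segmenting customer sentiment by location. The word lists, region labels and posts are hypothetical, and real sentiment analysis uses far richer models than counting words.

```python
from collections import defaultdict

# Hypothetical, tiny sentiment lexicons used only for this illustration.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "broken"}


def score(text):
    """Count positive words minus negative words in one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_segment(posts):
    """Average the scores of (segment, text) posts per segment."""
    totals = defaultdict(list)
    for segment, text in posts:
        totals[segment].append(score(text))
    return {seg: sum(s) / len(s) for seg, s in totals.items()}


posts = [
    ("north", "Love the new model, great battery!"),
    ("north", "Good value."),
    ("south", "Screen arrived broken. Bad support."),
]
print(sentiment_by_segment(posts))
```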
51. Perform Risk Analysis
• Success depends not only on how you run your company. Social and
economic factors are crucial for your accomplishments as well.
Predictive analytics, fueled by Big Data, allows you to scan and
analyze newspaper reports or social media feeds so that you
permanently keep up to speed on the latest developments in your
industry and its environment.
• Detailed health-tests on your suppliers and customers are another
goodie that comes with Big Data. This will allow you to take action
when one of them is at risk of defaulting.
52. Keeping your data safe
• You can map the entire data landscape across your company with
Big Data tools, thus allowing you to analyze the threats that you
face internally.
• You will be able to detect potentially sensitive information that is
not protected in an appropriate manner and make sure it is stored
according to regulatory requirements.
55. Big Data is used in
many fields like....
56. Car Makers
• Fault logging and cost predictions- Car makers
place hundreds of sensors on components around the car which
constantly log data on performance and faults. All of this data can be
used to reengineer designs for more efficient products and to predict
how much strain warranty repairs are likely to put on cost and
manpower.
58. TOYOTA
WHERE: From factories and from sensors
Data Center (Headquarters)
NEEDS: Safety and Quality Analysis
BENEFITS: Feedback from Design
59. Finance
• B2B supplier profiling- Finance professionals can use big
data to check on the ‘health’ of their suppliers and business
partners. They can monitor a variety of indicators including when
creditors pay their bills and whether there is any change.
• Fraud detection- Companies like Visa are using big data to
create fraud detection models which can flag up potential
fraudsters.
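As a toy illustration of flagging potential fraud, the sketch below marks card payments that are far from a customer's usual amounts. It is a simple outlier rule with a hypothetical threshold and made-up amounts, not the model any real card company uses.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return new amounts more than `threshold` standard deviations
    away from the mean of the historical amounts."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: anything different is unusual.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


# Hypothetical past payments for one cardholder.
history = [20, 25, 22, 30, 18, 27, 24]
print(flag_unusual(history, [26, 400]))
```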
60. VISA
WHERE: Wherever customers buy
Data Center (Headquarters)
NEEDS: Detect Fraud, Customer’s Behavior
BENEFITS: Personal Recommendation
61. General Manufacturing
• Simulations- Manufacturers can take real data from their
products on the market and then run simulations based on what
would happen if they changed one particular component or design
aspect. They can then find ways to make the product cheaper, more
reliable or more environmentally friendly. The Formula 1 racing
teams are particularly adept in this area, as are advanced aerospace
companies.
• Expanded product design modeling- Similarly, with
new big-data enabled computer aided design programs, product
designers can substitute components or materials from huge
databases and then access in-depth information on how this affects
the final product, including the ramifications on cost, production
processes, environmental effects, legislative requirements, supply
chain and so on.
63. GM
WHERE: Several Branches
Data Center (GM Headquarters in Gurgaon)
NEEDS: Safety and Quality Analysis.
BENEFITS: Awareness and Indication on what to fix.
64. Policing
• Suspect tracking- By combining CCTV images, facial
recognition software, travel trends and identifiers on travel cards,
police forces can capture criminals by automatically linking people
to their likely destinations on buses and metro systems. This allows
police to catch those that they miss at the scene of the crime and
also to control arrest statistics, meeting targets for arrests in one
London borough, for instance, as needed.
66. CBI
WHERE: Several Branches
Data Center (CBI Headquarters in Delhi)
NEEDS: To identify a person’s behavior and actions
BENEFITS: Gives awareness of what that person is
going to do next. What is their next plan?
67. Utilities (oil & gas)
• Asset monitoring- As with the machines in manufacturing
plants, the utilities companies use big data to keep track of all of
their assets spread across a country, continent or the globe. This
enables them to fix any broken asset (such as a sewage cleansing
plant, a leaking pipe or a gas pump), perform pre-emptive running
maintenance or isolate areas in which repair actions have been
ineffective.
69. CHEVRON
WHERE: From the machines in the manufacturing plants
Data Center (Chevron Headquarters)
NEEDS: To keep track of what is going on in the
manufacturing plant, such as broken pipes and
leakage.
BENEFITS: This gives them feedback from designs so
they know how to improve the
construction of the manufacturing plant,
because that is their main source of
oil and gas.
70. Retail and Marketing
• Mood mapping- Retailers use feeds from social networks to
build an understanding of how their products and company
reputation are seen among the public. With the constant streams of
opinions from Facebook, Twitter, Google+ and the like, companies
are able to cheaply and quickly gather large samples of customer
opinion.
73. Air Jordan
WHERE: From social media networking sites
Data Center (Air Jordan Headquarters)
NEEDS: Customers’ behavior; helps to find out opinions,
feelings and feedback about their brand.
BENEFITS: This gives them feedback on what the
customers are thinking about their
product, which they can use
to improve it.