Welcome to big data

Saravanan Subburayal

This presentation provides a good introduction to the basics of Big Data.

Technology Business

Agenda
• What is Big data?
• Some BIG facts
• Objective
• Sources
• 3 V’s of Big data
• 3 + 1 V’s of Big data
• Technologies
• Opportunities
• Major Players
• Questions
• Conclusion

Some BIG facts
• 90% of the data in the world today has been created in the
last two years alone
• IDC forecast: the global universe of data will double
every two years, reaching 40,000 exabytes (40 trillion GB)
by 2020
• The Large Hadron Collider near Geneva, Switzerland, will
produce about 15 petabytes of data per year.
• Ancestry.com, the genealogy site, stores around 2.5
petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and
is growing at a rate of 20 terabytes per month.
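
As a rough check on the IDC figure above, the doubling rule can be turned into simple arithmetic. The sketch below assumes exact doubling every two years, anchored at the slide's 40,000 exabytes in 2020. The function name and the anchoring are illustrative assumptions, not part of the IDC forecast itself.

```python
# Illustrative only: assumes exact doubling every 2 years, anchored at the
# slide's figure of 40,000 EB in 2020. The function name is hypothetical.
ANCHOR_YEAR = 2020
ANCHOR_VOLUME_EB = 40_000
DOUBLING_PERIOD_YEARS = 2


def projected_volume_eb(year: int) -> float:
    """Volume in exabytes implied by the doubling rule for a given year."""
    periods = (year - ANCHOR_YEAR) / DOUBLING_PERIOD_YEARS
    return ANCHOR_VOLUME_EB * 2 ** periods


if __name__ == "__main__":
    for year in (2012, 2014, 2016, 2018, 2020):
        print(year, f"{projected_volume_eb(year):,.0f} EB")
```

Under these assumptions, working backwards from 2020 gives 20,000 EB for 2018 and 2,500 EB for 2012.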

Some BIG facts – What happens every day?
• The New York Stock Exchange generates about one
terabyte of new trade data
• Zynga processes 1 petabyte of content
• 30 billion pieces of content are added to Facebook
• 2 billion videos are watched on YouTube
• 2.5 quintillion bytes of data are created

Some BIG facts – What happens every minute?

Courtesy: http://practicalanalytics.files.wordpress.com

Big data – Objective

Effectively store, manage and analyze all
the data to create meaningful information
out of it

Big data – 3 V’s of Big data

Courtesy: bigdatablog.emc.com

Big data – 3 + 1 V’s of Big data

Courtesy: http://www.datasciencecentral.com/

Big data - Volume

Volumes are measured in:
• Terabytes
• Petabytes
• Exabytes
• Zettabytes

Courtesy: http://www.datasciencecentral.com/

Big data - Volume

Name                 Value

1 GB                 1,073,741,824 bytes
1 Terabyte (TB)      1,024 GB
1 Petabyte (PB)      1,048,576 GB
1 Exabyte (EB)       1,073,741,824 GB
1 Zettabyte (ZB)     1,099,511,627,776 GB
1 Yottabyte (YB)     1,125,899,906,842,624 GB

Courtesy: http://www.datasciencecentral.com/
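
The values in the Volume table are successive powers of 1024. A minimal sketch that reproduces them is shown here; the names `UNITS` and `in_gigabytes` are chosen for illustration only.

```python
# Binary (power-of-1024) units, matching the values in the Volume table.
# UNITS and in_gigabytes are illustrative names, not from the slides.
UNITS = ["GB", "TB", "PB", "EB", "ZB", "YB"]

BYTES_PER_GB = 1024 ** 3  # 1,073,741,824 bytes


def in_gigabytes(unit: str) -> int:
    """Number of GB in one `unit`, with a factor of 1024 between units."""
    return 1024 ** UNITS.index(unit)


if __name__ == "__main__":
    print(f"1 GB = {BYTES_PER_GB:,} bytes")
    for unit in UNITS[1:]:
        print(f"1 {unit} = {in_gigabytes(unit):,} GB")
```

Note that the IDC figure quoted earlier ("40,000 exabytes or 40 trillion GB") only matches with decimal units (1 EB = 1,000,000,000 GB). With the binary units of this table, 40,000 EB would be about 42.9 trillion GB.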

Big data - Velocity

• Live Stream
• Real time
• Batch

Courtesy: http://www.datasciencecentral.com/
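
To make the difference between batch and streaming velocity concrete, the sketch below computes the same average in two ways: once over a complete data set, and once incrementally as values arrive. The sample readings and function names are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Sequence


def batch_mean(values: Sequence[float]) -> float:
    """Batch: all data is available before processing starts."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming: update the result as each value arrives."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


if __name__ == "__main__":
    readings = [10.0, 20.0, 30.0, 40.0]  # hypothetical sensor readings
    print(batch_mean(readings))
    print(list(streaming_mean(readings)))
```

The streaming version never holds the full data set in memory, which is the property that matters when data arrives as a live stream.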

Big data - Variety

• Structured (Tables)
• Unstructured (Tweets, SMSes)
• Semi-structured (Logfiles, RFID)

Courtesy: http://www.datasciencecentral.com/
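
The three kinds of variety differ mainly in how much work is needed before the data can be queried. A small sketch using only the Python standard library is shown here; all sample data is invented for illustration.

```python
import csv
import io
import json
import re

# Structured: fixed columns, e.g. a table exported as CSV.
table = io.StringIO("id,amount\n1,9.50\n2,12.00\n")
rows = list(csv.DictReader(table))

# Semi-structured: self-describing but flexible, e.g. a JSON log entry.
log_line = '{"level": "ERROR", "tag": "rfid-42", "msg": "read timeout"}'
entry = json.loads(log_line)

# Unstructured: free text; any structure has to be extracted, here hashtags.
tweet = "Loving the #bigdata talk at the #hadoop meetup"
hashtags = re.findall(r"#(\w+)", tweet)

if __name__ == "__main__":
    print(rows[1]["amount"])
    print(entry["level"])
    print(hashtags)
```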

Big data - Veracity

• Data veracity (uncertain or imprecise data) is often
overlooked
• It is now considered as
important as the 3 V’s of Big Data
• The effort needed to clean up data is often
not given enough importance
• Poor data quality costs the U.S.
economy around $3.1 trillion a
year

Source: McKinsey, Gartner, Twitter, Cisco, EMC, SAS, IBM, MEPTEC, QAS

Big data Technologies
Technologies & Solution providers:
• Storage (Microsoft SQL Server, Apache Hadoop, MongoDB)
• Processing (MapReduce, Impala)
• Analytics (SAS, R, Business Intelligence)
• Integration (Flume, Sqoop)
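
Of the processing technologies listed, MapReduce is a programming model that can be illustrated in a few lines. The sketch below runs the map, shuffle and reduce phases in a single Python process to count words. It shows the model only and does not use the Hadoop API; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big facts"]))
```

In a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the shuffle between them.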

Big data - Opportunities

• Storage
• Processing
• Analytics
• Integration
• Solution

Welcome to big data

2. Agenda • What is Big data? • Some BIG facts • Objective • Sources • 3 V’s of Big data • 3 + 1 V’s of Big data • Technologies • Opportunities • Major Players • Questions • Conclusion

3. What is Big data? Data Big Data

4. What is Big data? Data Big Data

5. Some BIG facts • 90% of the data in the world today has been created in the last two years alone • IDC forecast: the global universe of data will double every two years, reaching 40,000 exabytes (40 trillion GB) by 2020 • The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year. • Ancestry.com, the genealogy site, stores around 2.5 petabytes of data. • The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.

6. Some BIG facts – What happens every day? • The New York Stock Exchange generates about one terabyte of new trade data • Zynga processes 1 petabyte of content • 30 billion pieces of content are added to Facebook • 2 billion videos are watched on YouTube • 2.5 quintillion bytes of data are created

7. Some BIG facts – What happens every minute? Courtesy: http://practicalanalytics.files.wordpress.com

8. Big data – Objective Effectively store, manage and analyze all the data to create meaningful information out of it

9. Big data – Sources

10. Big data – 3 V’s of Big data Courtesy: bigdatablog.emc.com

11. Big data – 3 + 1 V’s of Big data Courtesy: http://www.datasciencecentral.com/

12. Big data - Volume Volumes are measured in: • Terabytes • Petabytes • Exabytes • Zettabytes Courtesy: http://www.datasciencecentral.com/

13. Big data - Volume Name / Value: 1 GB = 1,073,741,824 bytes; 1 Terabyte (TB) = 1,024 GB; 1 Petabyte (PB) = 1,048,576 GB; 1 Exabyte (EB) = 1,073,741,824 GB; 1 Zettabyte (ZB) = 1,099,511,627,776 GB; 1 Yottabyte (YB) = 1,125,899,906,842,624 GB Courtesy: http://www.datasciencecentral.com/

14. Big data - Velocity • Live Stream • Real time • Batch Courtesy: http://www.datasciencecentral.com/

15. Big data - Variety • Structured (Tables) • Unstructured (Tweets, SMSes) • Semi-structured (Logfiles, RFID) Courtesy: http://www.datasciencecentral.com/

16. Big data - Veracity • Data veracity (uncertain or imprecise data) is often overlooked • It is now considered as important as the 3 V’s of Big Data • The effort needed to clean up data is often not given enough importance • Poor data quality costs the U.S. economy around $3.1 trillion a year Source: McKinsey, Gartner, Twitter, Cisco, EMC, SAS, IBM, MEPTEC, QAS

17. Big data Technologies Technologies & Solution providers: • Storage (Microsoft SQL Server, Apache Hadoop, MongoDB) • Processing (MapReduce, Impala) • Analytics (SAS, R, Business Intelligence) • Integration (Flume, Sqoop)

18. Big data - Opportunities • Storage • Processing • Analytics • Integration • Solution

19. Big data – Major Players

20. Big data – Questions?

21. Big data – Thank you !!!

Editor's Notes

Data Veracity, uncertain or imprecise data, is often overlooked yet may be as important as the 3 V's of Big Data: Volume, Velocity and Variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data pursuant to unreasonably large amounts of human capital spent on data preparation, ETL/ELT and master data management. Yet the big data revolution forces us to rethink the traditional DW/BI architecture to accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise data. For example, social media data is inherently uncertain. Considering the variety and velocity of big data, an organization can no longer commit time and resources to traditional ETL/ELT and data preparation to clean up the data and make it certain and precise for analysis. While there are tools to help automate data preparation and cleansing, they are still in the pre-industrial age. As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies on a case-by-case basis yet must be factored in. It may be prudent to assign a Data Veracity score and ranking for specific data sets to avoid making decisions based on analysis of uncertain and imprecise data.
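
The note above suggests assigning a Data Veracity score to each data set but does not define one. One possible sketch, under the assumption that the score is simply the fraction of records passing a set of quality checks, is shown here. The checks, field names and sample records are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


def veracity_score(records: List[Record],
                   checks: List[Callable[[Record], bool]]) -> float:
    """Fraction of records that pass every check (0.0 to 1.0)."""
    if not records:
        return 0.0
    passed = sum(1 for r in records if all(check(r) for check in checks))
    return passed / len(records)


# Hypothetical checks for a hypothetical data set.
def has_user(record: Record) -> bool:
    return bool(record.get("user"))


def has_numeric_age(record: Record) -> bool:
    age = record.get("age")
    return age is not None and age.isdigit()


if __name__ == "__main__":
    sample = [
        {"user": "a", "age": "31"},
        {"user": "", "age": "27"},       # fails has_user
        {"user": "c", "age": "unknown"},  # fails has_numeric_age
        {"user": "d", "age": "45"},
    ]
    print(veracity_score(sample, [has_user, has_numeric_age]))
```

Scores computed this way could then be used to rank data sets, so that analysis based on a low-scoring set is treated with more caution.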
