To archive and tier data in Hadoop successfully, you must understand data heat, age, size, and usage. FactorData HDFSplus provides this visibility and enables automation and simplicity. The result is reduced infrastructure, better performance, and better planning for existing HDFS Hadoop clusters.
Reasons For Storage Tiering with Hadoop:
• A single tier leads to a large imbalance of compute and storage resources
• More applications create varying workloads
• Large percent of data is cold in most cases
• More recently ingested data can be better balanced
• Fewer nodes per GB with archive nodes
• Lower infrastructure costs
Archive Node Example:
• Existing tier node: medium compute, medium capacity
• Cold tier node: low compute, high-density capacity, 4x less cost per GB
• Over 65% less hardware
• 60% fewer nodes (reduced software licensing costs)
• Significant performance improvement
• Immediate ROI for cloud and private infrastructures
HDFS Storage: Single Tier vs. Tiered (10PB capacity each)
• Single tier: 100% disk data nodes
• Tiered: 20% disk data nodes, 80% archive data nodes (4x fewer nodes)
“The price per GB of the ARCHIVE tier is 4x less”
-eBay Hadoop Engineering Blog
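The 80/20 split and the node-count savings can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming hypothetical per-node capacities (the TB-per-node figures below are illustrative assumptions, not numbers from this deck):

```python
# Back-of-the-envelope node-count arithmetic for a 10 PB cluster.
# Per-node capacities are hypothetical assumptions for the sketch.
total_tb = 10_000                      # 10 PB expressed in TB
disk_node_tb = 48                      # assumed usable TB per standard disk data node
archive_node_tb = 4 * disk_node_tb     # archive nodes assumed 4x denser per node

# Single tier: all 10 PB lands on standard disk data nodes.
single_tier_nodes = total_tb / disk_node_tb

# Tiered: 20% hot data on disk nodes, 80% cold data on archive nodes.
hot_nodes = 0.20 * total_tb / disk_node_tb
cold_nodes = 0.80 * total_tb / archive_node_tb
tiered_nodes = hot_nodes + cold_nodes

node_reduction = 1 - tiered_nodes / single_tier_nodes
print(round(single_tier_nodes), round(tiered_nodes), round(node_reduction, 2))
```

With these assumptions the tiered layout needs roughly 60% fewer nodes, consistent with the "60% fewer nodes" bullet above; the exact figure depends on the real node densities.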
• Access frequency of data is the most important metric for effective tiering.
• Age is the easiest metric to determine. CAUTION: some data is long-term active, so age cannot be the only criterion.
• Zero-byte and small files should be treated differently when tiering Hadoop; large cold files should have priority for archive.
• Knowing how long data remains accessed after it is ingested enables better capacity planning for your tiers.
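A minimal sketch of how these criteria could combine into a tier suggestion. The thresholds, field names, and tier labels are illustrative assumptions, not HDFSplus behavior:

```python
# Illustrative tiering decision combining heat (access), age, and size.
# All thresholds are assumptions; a real deployment would tune them.
from dataclasses import dataclass
import time

DAY = 86_400  # seconds per day

@dataclass
class FileMeta:
    path: str
    size_bytes: int
    mtime: float   # last modification time (proxy for data age)
    atime: float   # last access time (HDFS records this when access-time tracking is enabled)

def suggest_tier(f: FileMeta, now: float) -> str:
    age_days = (now - f.mtime) / DAY
    idle_days = (now - f.atime) / DAY
    if f.size_bytes < 1_000_000:
        return "HOT"        # small files: archiving buys little, handle separately
    if idle_days < 30:
        return "HOT"        # recently accessed data stays on fast storage
    if age_days > 120 and idle_days > 90:
        return "ARCHIVE"    # old *and* idle: prime archive candidate
    return "WARM"           # aging but still occasionally accessed

now = time.time()
old_cold = FileMeta("/data/logs/2019/part-0", 2_000_000_000,
                    now - 400 * DAY, now - 200 * DAY)
print(suggest_tier(old_cold, now))  # ARCHIVE
```

Note the small-file check comes first: per the caution above, neither age nor size alone is a sufficient criterion, so the rule consults access recency before anything is archived.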
Tier Hadoop HDFS By Heat, Age, Size & Activity
In Three Easy Steps

01/ INSTALL WITHOUT CHANGES TO CLUSTER
Installed on a server or VM outside your existing Hadoop cluster, without inserting any proprietary technology on the cluster or in the data path.

02/ VISUALIZE & REPORT
Report data usage (heat), small files, user activity, replication, and HDFS tier utilization. Customize rules and queries to properly utilize infrastructure and plan better for future scale.

03/ AUTOMATE OPTIMIZATION
Automatically archive, promote, or change the replication factor of data based on usage patterns and user-defined rules.
HDFSplus
1. Query list based on size, heat, activity, and age
2. Apply storage policy based on custom query
3. Files are optimized during the normal balancing window
Custom Query Example: Automated Tiering
• Move all files 120 days old and not accessed for 90 days to ARCHIVE
• FactorData creates a data list based on the query
• Limit automated runs by maximum files or capacity
• FactorData tracks completion of each run
• Data can be excluded from a run by path, size, and application
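A sketch of what such a query-driven run might look like in code. The metadata layout, exclusion rules, and run limit are illustrative assumptions; the deck does not specify HDFSplus internals:

```python
# Sketch: select archive candidates per the example query
# "files 120 days old and not accessed for 90 days", with an
# assumed path-based exclusion rule and a per-run file cap.
import time

DAY = 86_400
now = time.time()

# Illustrative file metadata: (path, size_bytes, mtime, atime)
files = [
    ("/data/logs/2019/part-0", 2_000_000_000, now - 400 * DAY, now - 200 * DAY),
    ("/data/logs/2024/part-0", 2_000_000_000, now - 10 * DAY,  now - 1 * DAY),
    ("/tmp/scratch/a",         5_000_000_000, now - 300 * DAY, now - 150 * DAY),
]

EXCLUDE_PREFIXES = ("/tmp/",)   # assumed exclusion rule by path
MAX_FILES_PER_RUN = 100         # cap each automated run by file count

def query_archive_candidates(files):
    """Files older than 120 days AND not accessed for 90 days."""
    out = []
    for path, size, mtime, atime in files:
        if path.startswith(EXCLUDE_PREFIXES):
            continue  # excluded from the run
        if (now - mtime) > 120 * DAY and (now - atime) > 90 * DAY:
            out.append(path)
    return out[:MAX_FILES_PER_RUN]

candidates = query_archive_candidates(files)
print(candidates)  # only the old, idle 2019 log file qualifies
```

Applying the policy to the resulting list maps onto standard HDFS facilities, e.g. `hdfs storagepolicies -setStoragePolicy` on each path followed by the `hdfs mover` tool during the balancing window.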
Completely out of the data path
FactorData HDFSplus sits outside the Hadoop cluster and collects only metadata from the cluster.
No software to install on the existing Hadoop cluster
Because HDFSplus leverages only existing Hadoop APIs and features, there is no software to install on the cluster.
Provides a highly scalable solution in a small footprint
HDFS visibility and automation for thousands of Hadoop nodes on a single node, VM, or server.
HDFSplus communicates with namenodes through the existing Hadoop API.
Requirements (VM or physical machine): 32GB RAM, 4 CPUs or vCPUs, 500GB free disk.
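The "existing Hadoop APIs" approach is plausible because HDFS already exposes the needed metadata over the WebHDFS REST API: a `LISTSTATUS` call returns per-file size, access time, modification time, and replication. A sketch of parsing such a response (the JSON below is a hand-written sample for illustration; a live collector would issue `GET http://<namenode>:9870/webhdfs/v1/<dir>?op=LISTSTATUS`):

```python
# Extract tiering-relevant metadata from a WebHDFS LISTSTATUS response.
# The payload here is a hand-written sample so the sketch is self-contained.
import json

sample_response = json.dumps({
    "FileStatuses": {"FileStatus": [
        {"pathSuffix": "part-0", "type": "FILE", "length": 134217728,
         "accessTime": 1577836800000, "modificationTime": 1546300800000,
         "replication": 3},
        {"pathSuffix": "archive", "type": "DIRECTORY", "length": 0,
         "accessTime": 0, "modificationTime": 1546300800000,
         "replication": 0},
    ]}
})

def extract_file_meta(payload: str):
    """Return (name, size_bytes, access_time_ms, replication) for plain files only."""
    statuses = json.loads(payload)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["length"], s["accessTime"], s["replication"])
            for s in statuses if s["type"] == "FILE"]

print(extract_file_meta(sample_response))
```

Because only metadata crosses the wire, a collector like this stays out of the data path, consistent with the deployment model described above.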
Simplify and Automate Archive and Tiering in Hadoop Today
• Move less-accessed data to storage-dense nodes for better utilization
• Lower software licensing costs
• Free resources on existing namenodes and datanodes
• How can we get more performance out of our existing Hadoop cluster?
• How can we move data not accessed for 90 days to archive nodes?
• How can we better plan for future scale with real Hadoop storage metrics?
Result: Better Performance, Lower Hardware Costs, Lower Software Costs
Plus: Get Necessary Storage Visibility To Answer These Questions & More with FactorData HDFSplus