No notes for this slide.
The digital universe grew by 62% last year to 800,000 petabytes (a petabyte is a million gigabytes) and will grow to 1.2 zettabytes (a zettabyte is a million petabytes) this year; by 2020 we expect 35 zettabytes.
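The figures above imply roughly a 1.5x jump in a single year and nearly a 30x jump by 2020. A quick back-of-the-envelope check (decimal units assumed, as in the talk):

```python
# Sanity check on the growth figures quoted above
# (decimal units: 1 ZB = 1,000,000 PB).
PB_PER_ZB = 1_000_000

last_year_pb = 800_000        # 800,000 PB after 62% growth
this_year_zb = 1.2            # forecast: 1.2 ZB this year
forecast_2020_zb = 35         # forecast: 35 ZB in 2020

this_year_pb = this_year_zb * PB_PER_ZB            # 1,200,000 PB
growth_factor = this_year_pb / last_year_pb        # 1.5x year over year
decade_factor = forecast_2020_zb / this_year_zb    # ~29x by 2020
```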
Big Data Processing. New connectivity in Informatica 9.1 enables IT to load data from any source into Hadoop, and to extract data from Hadoop for delivery to any target. The connectivity also allows Informatica data quality, data profiling, and other techniques to be applied to data in Hadoop. These capabilities open new possibilities for enterprises combining transaction and interaction data either inside or outside of Hadoop:
- Confidently deploy the Hadoop platform for Big Data processing with seamless source-and-target data integration.
- Integrate insights from Hadoop Big Data analytics into traditional enterprise systems to improve business processes and decision-making.
- Leverage petabyte-scale performance to process large data sets of virtually any type and origin.
We are also looking to develop a graphical integrated development environment for Hadoop in a future release.
Business Problem: Develop a centralized clearing house of sensor data for continual analytics to improve yield and safety. Raw data size is 2+ TB per rig per day; the centralized storage environment will easily reach 4+ PB in 18 months. Shell has 40,000 sensors per rig but only uses data from 10% of them.
Technical Challenges:
- Log on to sensor units from a central location.
- Preprocess and manage large amounts of data at multiple remote sites.
- Move the data from the site to a more central location, often over poor communications links.
- Load the sensor data onto a server (separate from the hardware at the remote site) and determine whether the data streams from this server to the central database can be optimized.
- Create a central repository where data from multiple sites can be collected and kept for a long period of time.
Opportunity Identification:
- What percentage of your sensor data do you actually use?
- How are you doing your real-time analytics?
- What is your big data strategy for dealing with these challenges?
- How are you doing your cross-rig correlation and learning?
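One common answer to the "preprocess at the remote site, then move over a poor link" challenge is to downsample the raw stream before transfer. A minimal sketch, assuming simple fixed-window averaging (real pipelines would use richer reduction and keep the raw data on site):

```python
from statistics import mean

def downsample(readings, window):
    """Average consecutive readings in fixed-size windows to shrink
    the stream before shipping it over a constrained link.
    `readings` is a list of floats; `window` is the bucket size."""
    return [mean(readings[i:i + window])
            for i in range(0, len(readings), window)]

raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
compact = downsample(raw, 3)   # two values in place of six
```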
There is news of a large meteorite approaching your datacenters. Some call it Big Data. Others ignore it. But early signs of cosmic particles tell us that its arrival is imminent. I'd like to help you prepare for it, at least from the perspective of your storage strategy.
Big Data is not just big. It is very fast, more real-time than we are used to, and will need to be widely shared. Quite the contrast to the more batch-oriented workloads we know today. Latency can be crippling. Source: Gartner G00211490, G00226066.
All the interest comes from the promise of bigger fortunes. Real-time inputs bring us greater situational awareness, which leads to better, more timely decisions, which result in better financial outcomes. I'm feeling rich already.
When you take a closer look at Big Data, you uncover some very challenging attributes. Whereas today information appears to settle into convenient buckets and is relatively easy to characterize, Big Data is never at rest. It roars in while it's hot and quickly becomes lukewarm, almost stale. Which means that our retention policies must change as well, or we'll become hoarders. Source: Gartner G00211490.
I see this as one of the few opportunities in our short careers to make major structural renovations. A rare chance to justify modernizing and aligning to the business needs by re-architecting our storage management techniques and making them highly adaptable. Fortunately, it's not that difficult to pull this off. Source: Gartner G00214426.
Let's consider a major innovation that helps facilitate our task: the recent development of a storage hypervisor, a layer sitting between apps and storage that insulates data management from all the hardware variables Big Data throws our way. I'm going to spend a few minutes on this topic, since it has broad applicability across your infrastructure, from the on-premises resources you are so familiar with to the new cloud-based assets available for harnessing. You'll also find it an essential ally in accelerating access to data.
In less abstract terms, the storage hypervisor is your agent of change in making raw data not only quicker to get to, but far more shareable. It automatically directs traffic among the assortment of storage devices at your disposal, and caches data close to the apps. Operationally, it gives you centralized control. You may already be employing these techniques in your server virtualization efforts; now apply them to disks. I'll speak about each of them individually.
Resource pooling has the most immediate impact, enabling you to aggregate the combined disk capacity across your IT infrastructure. This has the effect of making disks shareable to the neediest app while reclaiming previously inaccessible space. You'll need a storage hypervisor to pull this off. Just like a server hypervisor, the specialized software emulates hardware so as to hide incompatibilities between different models.
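The pooling idea can be sketched in a few lines. This is a toy model with invented names, not any vendor's API: capacity from several dissimilar devices is aggregated, so a volume can be larger than any single device's free space.

```python
class StoragePool:
    """Toy model of resource pooling: capacity from several devices
    is aggregated and handed out on demand, so no single device's
    free space is stranded. Illustrative only."""
    def __init__(self, device_capacities_gb):
        self.free = sum(device_capacities_gb)
        self.allocated = {}

    def provision(self, volume, size_gb):
        if size_gb > self.free:
            raise RuntimeError("pool exhausted")
        self.free -= size_gb
        self.allocated[volume] = size_gb

pool = StoragePool([500, 300, 200])   # three dissimilar arrays
pool.provision("app-db", 600)         # bigger than any one device
```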
This diagram may give you a good idea of the relative position of the storage hypervisor in the processing stack.
Perhaps it even makes more sense when seen alongside other forms of hypervisors, notably server and desktop variants.
The storage virtualization software incorporates a great deal of automation: first to avoid waste, and, more dynamically, to direct higher-priority workloads to the fastest disks. Underneath, you'll discover the magic of device-independent thin provisioning and auto-tiering at work.
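Auto-tiering boils down to tracking which blocks are hot and promoting them to the fast tier. A minimal sketch, assuming a simple access-count heuristic (real products use far richer policies and migrate data in the background):

```python
from collections import Counter

class AutoTier:
    """Sketch of auto-tiering: the most frequently accessed blocks
    are promoted to the fast (e.g. SSD) tier; the rest stay on
    slower disk. Names and heuristic are illustrative only."""
    def __init__(self, fast_slots):
        self.fast_slots = fast_slots   # how many blocks fit on fast media
        self.hits = Counter()

    def access(self, block):
        self.hits[block] += 1          # record I/O against this block

    def fast_tier(self):
        # The N hottest blocks currently earn a spot on fast media.
        return {b for b, _ in self.hits.most_common(self.fast_slots)}

tier = AutoTier(fast_slots=2)
for b in ["a", "b", "a", "c", "a", "b"]:
    tier.access(b)
```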
Ready for more adventure? Travel with me one more hop into the hybrid cloud. That's where you auto-tier between your on-premises capacity and off-site disks rented from one of the commercial cloud providers. It comes in real handy when you need a little scratch space, or when you are archiving documents that don't require the same security or regulatory oversight as others. It's also a great option for storing contents that may need to be recovered during a disaster. More on that in a minute.
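The hybrid-cloud decision described above is essentially a placement policy. A hypothetical sketch (the attribute names are invented for illustration): regulated or hot data stays on-premises; scratch space and archives go to rented cloud capacity.

```python
def place(volume):
    """Toy placement policy for hybrid-cloud tiering. `volume` is a
    dict of attributes; keys and rules are purely illustrative."""
    if volume.get("regulated") or volume.get("hot"):
        return "on-premises"          # keep sensitive/latency-bound data local
    if volume.get("purpose") in ("scratch", "archive"):
        return "cloud"                # cheap rented capacity is fine here
    return "on-premises"              # default to the conservative choice
```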
Such dynamic juggling of diverse resources, particularly operating across equipment from different suppliers, is on the leading edge of 21st century cloud technologies. What seems like exceptionally well-running apps to the user is largely a product of a well-balanced arsenal of purpose-built devices orchestrated by DataCore's storage hypervisor. Combined with thin provisioning, they translate into major savings and big-time agility.
The answer to speed needs a little more explanation.
During your selection, look for the storage hypervisor to encompass these off-site disks as merely an extension of on-premises capacity.
The most visible aspect of your newly enlightened sky view comes from centralized management. While much has been said in the past about monitoring dissimilar units, the innovations we're speaking about extend into achieving common control: one menu with discrete actions across device families, whether from the same manufacturer or different suppliers. Similar to a universal remote, with equally powerful universal scripting commands used by 3rd parties for rich cross-integration.
Which brings me to all the standardization talk going around. You may have noticed how small groups of vendors are banding together under the guise of standardization to dictate building blocks for private clouds. Each club has a different recipe calling out their hardware. They also imply that choosing components outside that elite member list jeopardizes the outcome. In stark contrast, the DataCore angle on standardization is all about interchangeability. Giving you the freedom to harness the best purpose-built equipment for each tier in the cloud. Allowing you to shop for the best value among competing hardware suppliers, all of which can do a good job. Key to making this work is sticking to established disk interfaces, and treating storage as no more than largely interchangeable chunks of disk space.
Usually, after incorporating the principles of pooling, automation, caching and centralized management, our clients are ready to reinvent themselves in other ways. They tap into nearby facilities which help them inexpensively overcome the confines of their four walls.
These measures bring significant benefit well beyond mere expansion. They are key to achieving continuous availability in the face of routine causes of planned and unplanned downtime. With equipment reliability hitting five 9s, outages these days are more frequently the result of ongoing changes in the surrounding environment, be it upgrades to the devices or to the physical plant. Sometimes the interruptions are expected; other times they are not. My number one suggestion: mirror your critical volumes between two rooms, as far apart as possible within a metro area so you can still treat them as one logical site. Normally, you will be OK within 100 kilometers. Once again, automation in the storage hypervisor kicks in to replicate the information in real time, even between unlike storage devices. Then when one site has to be taken down, the other site takes over transparently. For added safeguard against regional disasters (earthquakes, storms, floods, etc.), you may want to keep a third copy very far away at a contingency site.
Where will you experience the biggest payoffs? Our clients quantify them this way: they postpone and even avoid major disk acquisitions as a result of fully utilizing the capacity already on hand, and they attain much better service levels through faster provisioning, eliminating storage-related disruptions, and speeding up response from applications.
I'd be glad to spend more time with you individually to discuss these techniques and hear what you anticipate running into as the Big Data meteor gets closer. Thank you.