Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Shu Yin

Advisor: Dr. Xiao Qin
Committee Members: Dr. Sanjeev Baskiyar, Dr. Alvin Lim
University Reader: Dr. Shiwen Mao

Presentation Outline
• Motivation
• MINT Model
• MREED Model
• Model Validation
• Reliability Improvement
• Conclusion and Future Work

Motivation

Data-Intensive Applications:
• Streaming Multimedia
• Bioinformatics
• 3D Graphics
• Weather Forecasting

Data Intensive Computing Application

Cluster System


Problem: Energy Dissipation

EPA Report to Congress on Server and Data Center Energy Efficiency, 2007


Problem: Energy Dissipation (cont.)

Using the 2010 Historical Trends Scenario:
• Data Centers Consume 110 Billion kWh per Year;
• Assume the Average Commercial End User Is Charged 9.46¢ per kWh;
• The Disk System Can Account for 27% of the Computing Energy Cost of Data Centers.

[Pie chart: Disk System 27%, Other 73%]

The Disk System May Have an Electrical Cost of 2.8 Billion Dollars!
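
The 2.8 billion dollar figure follows from multiplying the three numbers on this slide. A minimal sketch of that arithmetic; the function and variable names are illustrative only and do not come from the slides:

```python
def disk_electricity_cost(total_kwh, price_per_kwh_usd, disk_share):
    """Yearly electricity cost (USD) attributed to the disk system."""
    return total_kwh * price_per_kwh_usd * disk_share


# Values taken from the slide: 110 billion kWh, 9.46 cents per kWh, 27% disk share.
cost = disk_electricity_cost(total_kwh=110e9, price_per_kwh_usd=0.0946, disk_share=0.27)
print(f"${cost / 1e9:.2f} billion per year")
```

The product is roughly 2.81 billion dollars, which the slide rounds to 2.8 billion.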


Existing Energy Conservation Techniques

Software-Directed Power Management
Dynamic Power Management
Redundancy Technique
Multi-speed Setting

How Reliable Are They?


Conflict Between Energy Efficiency and Reliability

Energy Efficiency

Reliability

Example: Disk Spin Up and Down


MINT
(MATHEMATICAL RELIABILITY MODELS FOR ENERGY-EFFICIENT PARALLEL DISK SYSTEMS)

Energy Conservation Techniques

Single Disk Reliability Model

System-Level Reliability Model

MINT (Single Disk)

Factors: Disk Age, Temperature, Frequency, Utilization

Result: Reliability of a Single Disk

MINT (Single Disk)

R = α * BaseValue [1] * TemperatureFactor + β * FrequencyAdder [2]

α and β are two coefficients of R

Assumption: α = β = 1 in our research

[1] E. Pinheiro, W.-D. Weber, and L.A. Barroso. Failure trends in a large disk drive population. Proc. USENIX Conf. File and Storage Tech., February 2007.
[2] IDEMA Standards. Specification of hard disk drive reliability.
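
A minimal sketch of the single-disk formula above, assuming R is read as an annual failure rate (AFR) expressed as a fraction. The function name and all numeric inputs are hypothetical placeholders; they are not values from the Google report [1] or the IDEMA standard [2].

```python
def single_disk_afr(base_value, temperature_factor, frequency_adder,
                    alpha=1.0, beta=1.0):
    """R = alpha * BaseValue * TemperatureFactor + beta * FrequencyAdder.

    alpha and beta default to 1, matching the assumption on the slide.
    """
    return alpha * base_value * temperature_factor + beta * frequency_adder


# Hypothetical inputs, for illustration only:
#   base_value         AFR implied by disk age and utilization
#   temperature_factor multiplier for the operating temperature
#   frequency_adder    extra AFR caused by spin up/down transitions
r = single_disk_afr(base_value=0.02, temperature_factor=1.2, frequency_adder=0.005)
print(f"single-disk AFR: {r:.3%}")
```

Keeping the temperature term multiplicative and the frequency term additive mirrors the structure of the formula: temperature scales the base failure rate, while power-state transitions add to it.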


MINT (Single Disk)

R = α * BaseValue * TemperatureFactor + β * FrequencyAdder

[Figures: Utilization Impact on AFR; Temperature Impact on Temperature Factor; Transition Frequency Impact on Frequency Adder]

MINT (Single Disk)

R = α * BaseValue * TemperatureFactor + β * FrequencyAdder

Frequency = 350/Month, T = 40°C

Base Value from Google Report [1]

[Figure: Single Disk Reliability]

MINT (Energy Conservation Techniques - PDC)

Popular Data Concentration (PDC) [3] System Structure
[Figure legend: cold data, hot data]

[3] E. Pinheiro and R. Bianchini. Energy conservation techniques for disk array-based servers. Int’l Conf. on Supercomputing, pages 68–78, June 2004.


[Figure: PDC data migration between a More Popular Disk and a Less Popular Disk, triggered when Access Rate > MAX(Access Rate) or Access Rate < MIN(Access Rate). Legend: cold data, hot data]


Popular Data Concentration (PDC) [3] System Structure
(Optimal Result for Certain Time Phases)
[Figure legend: cold data, hot data]
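
The idea of concentrating popular data can be sketched as a ranking step followed by filling disks in order. This is a simplified illustration, not the PDC algorithm from [3]: it assumes placement depends only on each file's access rate and that every disk holds a fixed number of files. The function name, file names, and rates are hypothetical.

```python
def pdc_placement(access_rates, files_per_disk):
    """Return a list of disks, each a list of file names.

    Disk 0 receives the most popular files, the last disk the least popular.
    """
    ranked = sorted(access_rates, key=access_rates.get, reverse=True)
    return [ranked[i:i + files_per_disk]
            for i in range(0, len(ranked), files_per_disk)]


# Hypothetical access rates (requests per month).
rates = {"a": 5, "b": 100, "c": 1, "d": 50}
layout = pdc_placement(rates, files_per_disk=2)
print(layout)  # [['b', 'd'], ['a', 'c']]
```

With this layout the last disks hold only cold data and can stay spun down longer, while the first disk absorbs most of the load.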

MINT (Energy Conservation Techniques - MAID)

Massive Array of Idle Disks (MAID) [4] System Structure
[Figure legend: cold data, hot data]

[4] D. Colarelli and D. Grunwald. Massive arrays of idle disks for storage archives. Supercomputing ’02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pages 1–11, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.

MINT (Energy Conservation Techniques - MAID)

Massive Array of Idle Disks (MAID) [4] System Structure
[Figure: Cache Disk and Data Disk, with the condition Access Rate > MAX(Access Rate). Legend: cold data, hot data]

MINT (System-Level)

Factors: Access Pattern, Disk Age, Temperature

Energy Conservation Techniques

Frequency and Utilization of Disk 1 ... Frequency and Utilization of Disk n

Reliability of Disk 1 ... Reliability of Disk n

System-Level Reliability Model

Reliability of a Parallel Disk System
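
The slides do not state how the per-disk values are combined into a system-level value. One common choice, shown here purely as an assumption, is a series system: disks fail independently and the system counts as failed when any single disk fails (no redundancy). The function name and the AFR values are hypothetical.

```python
import math


def system_afr(disk_afrs):
    """AFR of a series system of independent disks.

    ASSUMPTION (not from the slides): the system fails if any disk fails,
    so system survival is the product of the per-disk survival probabilities.
    """
    survive_all = math.prod(1.0 - afr for afr in disk_afrs)
    return 1.0 - survive_all


# Hypothetical per-disk AFRs, e.g. produced by a single-disk model.
print(f"{system_afr([0.02, 0.03]):.4f}")
```

Under this assumption the system-level AFR is always at least as large as the worst single-disk AFR, which is why one heavily loaded disk can dominate the result.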

Preliminary Results (Experimental Setting)

Energy-efficiency Scheme   Number of Disks                    File Access Rate (No. per month)   File Size (KB)
PDC                        20 data (20 in total)              0~10^6                             300
MAID-1                     15 data + 5 cache (20 in total)    0~10^6                             300
MAID-2                     20 data + 5 cache (25 in total)    0~10^6                             300

Read-only Disks

Preliminary Result
Comparison Between PDC and MAID

[Figure: AFR Comparison of PDC and MAID. Access Rate (×10^4) Impacts on AFR (T = 35°C)]

Preliminary Result
Comparison Between PDC and MAID

[Figure legend: PDC, MAID]

MAID under High Access Rate

[Figures: MAID-1 and MAID-2]

MINT (Conclusion)

Mathematical Model for Disk Systems
MINT Study on PDC and MAID
But ...

What about RAID?
• Data Striping Mechanism
• Energy Consumption Issues
• Reliability Issues
• Complexity


MREED Model
(MATHEMATICAL RELIABILITY MODELS FOR ENERGY-EFFICIENT RAID SYSTEMS)

Factors: Access Pattern, Temperature

Energy Conservation Techniques

Utilization, Frequency

Weibull Analysis

Annual Failure Rate

Weibull Analysis
A Leading Method for Fitting Life Data
Advantages:
• Accurate
• Small Samples
• Widely Used
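
For reference, the standard two-parameter Weibull functions used in life-data analysis can be written in a few lines. This sketch only evaluates the distribution; it does not reproduce how MREED fits or applies it. The shape and scale values below are hypothetical, and the one-year window of 8760 power-on hours is an assumption of the example.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a unit survives beyond time t: exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at time t: (shape/scale) * (t/scale)^(shape-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def first_year_failure_probability(shape, scale_hours, hours_per_year=8760):
    """Probability of failing within the first year of power-on time."""
    return 1.0 - weibull_reliability(hours_per_year, shape, scale_hours)


# Hypothetical parameters: shape > 1 models wear-out, scale is in hours.
print(first_year_failure_probability(shape=1.2, scale_hours=200_000))
```

A shape parameter below 1 gives a decreasing hazard (infant mortality), exactly 1 gives a constant hazard, and above 1 gives an increasing hazard (wear-out).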

MREED Model
(Energy Conservation Techniques- PARAID)

Power-Aware RAID (PARAID) [5] System Structure
[Figure: Soft State layered over RAID, with Gears 1, 2, 3]

[5] C. Weddle, M. Oldham, J. Qian, and A.-I A. Wang. PARAID: A Gear-Shifting Power-Aware RAID. USENIX FAST 2007.

Reliability Evaluation (Experiment Setup)

Disk Type                       Seagate ST3146855FC
Capacity                        146 GB
Cache Size                      SATA 16 MB
Buffer to Host Transfer Rate    4 Gb/s (Max)
Total Number of Disks           5
File Size                       100 MB
Number of Files                 1000
Synthetic Trace                 Poisson Distribution
Time Period                     24 Hours
Interval Time (Time Phase)      1 Hour
Power-On Hours per Year         8760 Hours
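
A synthetic trace with Poisson arrivals over 24 one-hour phases could be generated as sketched below. This is an illustration under stated assumptions, not the generator used in the experiments: arrivals form a homogeneous Poisson process with a fixed hourly rate, and the function names and seed are hypothetical.

```python
import random


def poisson_trace(rate_per_hour, duration_hours=24.0, seed=0):
    """Arrival times (in hours) of a Poisson process with the given rate."""
    rng = random.Random(seed)  # fixed seed keeps the trace reproducible
    t, arrivals = 0.0, []
    while True:
        # Inter-arrival times of a Poisson process are exponential.
        t += rng.expovariate(rate_per_hour)
        if t >= duration_hours:
            return arrivals
        arrivals.append(t)


def requests_per_phase(arrivals, phase_hours=1.0, duration_hours=24.0):
    """Count the requests that fall into each time phase."""
    counts = [0] * int(duration_hours / phase_hours)
    for t in arrivals:
        counts[int(t // phase_hours)] += 1
    return counts


trace = poisson_trace(rate_per_hour=20)  # e.g. a low access rate of 20 requests/hr
print(requests_per_phase(trace))
```

Per-phase counts like these are the kind of input from which a per-disk utilization could be derived for each one-hour time phase.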

Reliability Evaluation
(Disk Utilization Comparison)

[Figure: Disk Utilization Comparison Between PARAID-0 and RAID-0 at a Low Access Rate (20/hr)]

Reliability Evaluation
(Disk Utilization Comparison)

[Figure: Disk Utilization Comparison Between PARAID-0 and RAID-0 at a High Access Rate (80/hr)]

Reliability Evaluation (AFR Comparison)

[Figure: AFR Comparison Between PARAID-0 and RAID-0 at a Low Access Rate (20/hr)]

Reliability Evaluation (AFR Comparison)

[Figure: AFR Comparison Between PARAID-0 and RAID-0 at a High Access Rate (80/hr)]


Model Validation
Techniques
– Run the Systems for a Couple of Decades
– The Event Validity Validation Technique [6]

[6] R.G. Sargent, “Verification and Validation of Simulation Models”, in Proceedings of the 37th Conference on Winter Simulation (WSC ’05), 2005.

Model Validation
Challenges
• Unable to Monitor PARAID Running for Years
• Sample Size Is Small from a Validation Perspective (e.g., 100 Disks for Five Years)

Model Validation (DiskSim[7] Simulation)

[Figure: File-to-Block-Level Converter]

[7] J.S. Bucy, J. Schindler, S.W. Schlosser, and G.R. Ganger, “The DiskSim Simulation Environment Version 4.0 Reference Manual”, 2008.

Model Validation (DiskSim Simulation)

[Figure: Diagram of the Storage System Corresponding to the DiskSim RAID-0]

Model Validation (Result)

[Figure: Utilization Comparison Between MREED and DiskSim Simulator]

Model Validation (Result)

[Figure: Gear Shifting Comparison Between MREED and DiskSim Simulator]


Recall PDC

Popular Data Concentration (PDC) System Structure
(Optimal Result for Certain Time Phases)
[Figure legend: cold data, hot data]

Problem of PDC
The Most Popular Disk:
• High AFR
• No Replica

Reliability Improvement of PDC
Methods of Improving Reliability
• Mirroring: Extra Disks for Replication -> More Energy Consumption
• Disk Swapping: Swap Existing Disks

Disk Swapping Scheme

PDC: Swap the Most Popular Disk with the Least Popular Disk

PDC: Swap the Highest AFR Disk with the Lowest AFR Disk

MAID: Swap the Cache Disks with the Data Disks
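
The second PDC strategy (swap the highest-AFR disk with the lowest-AFR disk) can be sketched as a single swap triggered by a threshold. The slides do not specify the trigger in detail, so this example assumes a swap fires once the busiest disk's access rate reaches the threshold, and it models a swap as the two physical disks exchanging workloads. The function name, dictionary layout, and all values are hypothetical.

```python
def swap_highest_lowest_afr(disks, threshold):
    """Perform at most one swap; return True if a swap happened.

    Each disk is a dict with keys "id", "afr", and "access_rate"
    (requests per month). ASSUMPTION: the trigger is the busiest disk's
    access rate reaching `threshold`.
    """
    busiest = max(disks, key=lambda d: d["access_rate"])
    if busiest["access_rate"] < threshold:
        return False
    worst = max(disks, key=lambda d: d["afr"])
    best = min(disks, key=lambda d: d["afr"])
    if worst is best:
        return False
    # The healthiest disk takes over the workload of the most worn disk.
    worst["access_rate"], best["access_rate"] = best["access_rate"], worst["access_rate"]
    return True


disks = [
    {"id": 0, "afr": 0.05, "access_rate": 3e5},
    {"id": 1, "afr": 0.01, "access_rate": 1e4},
    {"id": 2, "afr": 0.02, "access_rate": 5e4},
]
swapped = swap_highest_lowest_afr(disks, threshold=2e5)
print(swapped, [d["access_rate"] for d in disks])
```

After the swap, the disk with the lowest AFR carries the heaviest workload, which is the intent of the scheme: the disk most likely to fail is relieved of the hot data.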

Preliminary Results (Experimental Setting)

Energy-efficiency Scheme   Number of Disks                    File Access Rate (No. per month)   File Size (KB)
PDC                        20 data (20 in total)              0~10^6                             300
MAID-1                     15 data + 5 cache (20 in total)    0~10^6                             300
MAID-2                     20 data + 5 cache (25 in total)    0~10^6                             300

• Read-only Disks
• Mean Time to Data Loss (MTTDL)
• Swapping Thresholds (2×10^5, 5×10^5, 8×10^5 No./Month)
• Single Swapping

Comparison of Disk Swap
PDC

[Figure: AFR Comparison of PDC, Threshold = 2×10^5 No./Month]

Comparison of Disk Swap
PDC
AFR: Swap2 < Swap1 < No Swap

Comparison Between Different Thresholds
PDC

[Figures: AFR of PDC for Threshold = 2×10^5, 5×10^5, and 8×10^5 No./Month]

AFR: Higher Threshold -> Lower AFR

Limitations
• Read-Only Disk Scenario
• Data Migration within Certain Time Phases
• Simple File Access Patterns

Future Work
• Extend the models to investigate mixed read/write workloads;
• Research the trade-offs between reliability and energy efficiency;
• Extend the schemes to real-world environments;
• Develop a multi-swapping mechanism that balances utilization and lowers the failure rate;
• Evaluate more control groups.

Conclusion
• Generic models coupled with power management optimization policies;
• Two reliability models for three well-known energy-saving schemes: PDC, MAID, and PARAID;
• Disk swapping strategies to improve disk reliability for PDC.
