Latency in storage

Latency in IT infrastructure in general, and on NetApp ONTAP storage in particular.

Published in: Technology

  1. Components of Latency in an IT Infrastructure

     Note: This applies to on-premises components only; cloud is not covered here.

     A simple rule of thumb:
     • If a particular system or application is running slower than you expect, or slower than it has historically, it might be a performance issue.
     • If a particular system or application is not working at all, it is likely not a performance-related issue.

     Latency (response time in milliseconds) is the time it takes for a storage controller [volume/LUN] to respond to I/O requests from client applications.

     Cause: High latency could originate on the storage itself, or on one or more components end to end, i.e. Host <> Network <> Storage, such as:
     1) Application, such as SQL/Exchange/Oracle/SAP [incorrect configuration/settings]
     2) a) Physical host [incorrect settings and/or outdated firmware/driver on the NIC or HBA, and/or insufficient CPU/memory]
        b) Virtual machine [insufficient CPU/memory/NIC capabilities]
        c) ESX host [contention for resources such as disk/CPU/memory/NIC (1G/10G)]
     3) Network components that attach the host to storage [cables/switches/routers etc.] [settings: flow control]
     4) Storage components [hardware: CPU/memory/disk [volume - CIFS/NFS datastore]/LUN/internal components of the storage controller] [software: filesystem/software/OS bug]
  2. As a storage administrator, it is my responsibility to identify whether any bottlenecks originate from the storage end. Only once I have collected the stats, and only when those stats suggest no bottleneck on the storage end, can I confidently pass the ticket to the Application/Host Infrastructure team for analysis on their side. Similarly, once they have stats to prove that the host/application is not the bottleneck, the cause has to be something in between, most likely the network, and the Network team can then do their own analysis to identify any latency issues. Honestly, it has to be one of those three areas, or all of them may be collectively responsible for application latency.

     In general: if all applications are experiencing slowness irrespective of the storage tier [DISK/FLASH/SSD], it is worth checking with the network team whether any disruption or issue has been reported before starting an investigation on the storage side [unless the whole storage is down ;) ].

     As a NetApp storage administrator for clustered ONTAP: my job has been made easier by the introduction of clustered Data ONTAP, now simply called 'ONTAP'. I recommend the ONTAP 9.x major release versions here, as they bring major improvements over the previous 8.3.x versions.

     The three most important commands to cover your back when latency-related tickets are passed to the storage team are:

     ::> qos statistics volume performance show
     ::> qos statistics volume characteristics show
     ::> qos statistics volume latency show

     All three commands are stepping stones toward identifying the root cause; however, 'volume latency' is possibly the most useful, as it breaks down the latency contribution of the individual clustered Data ONTAP components, making the job easier than on 7-Mode filers.

     Please note: these three commands give you real-time stats. If you are asked to investigate an issue that occurred in the past, you must use the OCUM [OPM] Performance Manager tool to view the historical stats. Honestly, OCUM/OPM is sufficient to view, collect and identify performance-related issues on NetApp storage, and it's free. There is another heavy-duty tool called 'OCI', a licensed product that can do the same thing across heterogeneous storage and components, but if you don't have the budget for it, you can live without it.
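To make the idea of a per-component latency breakdown concrete, here is a minimal Python sketch that picks the largest contributor out of such a breakdown. It is only an illustration: the component names and millisecond values in `sample` are made up for the example and are not actual output of `qos statistics volume latency show`, and `dominant_component` is a hypothetical helper, not part of any NetApp tool.

```python
from typing import Dict, Tuple


def dominant_component(breakdown_ms: Dict[str, float]) -> Tuple[str, float]:
    """Return (component, share of total) for the largest latency contributor."""
    total = sum(breakdown_ms.values())
    if total <= 0:
        raise ValueError("breakdown has no latency to attribute")
    # Pick the component with the highest latency contribution.
    name = max(breakdown_ms, key=lambda component: breakdown_ms[component])
    return name, breakdown_ms[name] / total


# Hypothetical per-component latencies in milliseconds (illustrative values only).
sample = {
    "network": 0.4,
    "cluster": 0.0,
    "data": 0.3,
    "disk": 6.1,
    "qos": 0.0,
    "nvram": 0.2,
}

name, share = dominant_component(sample)
print(f"{name} contributes {share:.0%} of the total latency")
```

If most of the latency sits in a component inside the storage controller, the investigation stays with the storage team; if it sits on the network side, that supports handing the ticket over with evidence.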
  3. In the next article, I will demonstrate how to use the three commands along with the OCUM/OPM tool, and show how to interpret latency across the different ONTAP components and how they relate to each other. I will also show how application I/O size can play a critical role with respect to your networking environment, such as a 1500 or 9000 MTU, and how to mitigate it from the storage side.

     ashwinwriter@gmail.com, April 2019
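As a rough illustration of why I/O size and MTU interact, the Python sketch below estimates how many Ethernet frames a single I/O needs at a 1500 versus a 9000 MTU. It assumes plain IPv4 and TCP headers of 20 bytes each with no options, and it ignores the storage protocol's own overhead (NFS, iSCSI, SMB), so treat the numbers as a back-of-the-envelope estimate. The function name `frames_per_io` is hypothetical.

```python
import math

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
IP_TCP_HEADER_BYTES = 40


def frames_per_io(io_size_bytes: int, mtu: int) -> int:
    """Estimate the number of frames needed to carry one I/O of the given size."""
    payload_per_frame = mtu - IP_TCP_HEADER_BYTES
    if payload_per_frame <= 0:
        raise ValueError("MTU too small to carry any payload")
    return math.ceil(io_size_bytes / payload_per_frame)


for io_size in (4 * 1024, 64 * 1024):
    for mtu in (1500, 9000):
        frames = frames_per_io(io_size, mtu)
        print(f"I/O {io_size} bytes, MTU {mtu}: {frames} frame(s)")
```

Under these assumptions a 64 KiB I/O needs 45 frames at MTU 1500 but only 8 at MTU 9000, while a 4 KiB I/O needs 3 frames versus 1, which is why larger application I/O sizes tend to be more sensitive to the MTU setting.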