SlideShare a Scribd company logo
1 of 22
Download to read offline
Instrumenting the
real-time web:
Running node.js in production

Bryan Cantrill
VP, Engineering

bryan@joyent.com
@bcantrill
“Real-time web?”

   • The term has enjoyed some popularity, but there is
     clearly confusion about the definition of “real-time”
   • A real-time system is one in which the correctness of the
     system is relative to its timeliness
   • A hard real-time system is one which the latency
     constraints are rigid: violation constitutes total system
     failure (e.g., an actuator on a physical device)
   • A soft real-time system is one in which latency
     constraints are more flexible: violation is undesirable but
     non-fatal (e.g., a video game or MP3 player)
   • Historically, the only real-time aspect of the web has
     been in some of its static content (e.g. video, audio)
The rise of the real-time web

    • The rise of mobile + HTML5 has given rise to a new
     breed of web application: ones in which dynamic data
     has real-time semantics
    • These data-intensive real-time applications present new
     semantics for web-facing applications
    • These present new data semantics for web applications:
     CRUD, ACID, BASE, CAP — meet DIRT!
The challenge of DIRTy apps

   • DIRTy applications tend to have the human in the loop
      • Good news: deadlines are soft — microseconds only
        matter when they add up to tens of milliseconds

      • Bad news: because humans are in the loop, demand
        for the system can be non-linear

   • One must deal not only with the traditional challenge of
     scalability, but also the challenge of a real-time system!
Building DIRTy apps

   • Embedded real-time systems are sufficiently controlled
     that latency bubbles can be architected away
   • Web-facing systems are far too sloppy to expect this!
   • Focus must shift from preventing latency bubbles to
     preventing latency bubbles from cascading
   • Operations that can induce latency (network, I/O, etc.)
     must not be able to take the system out with them!
   • Implies purely asynchronous and evented architectures,
     which are notoriously difficult to implement...
Enter node.js

   • node.js is a JavaScript-based framework for building
     event-oriented servers:
      var http = require(‘http’);

      http.createServer(function (req, res) {
             res.writeHead(200, {'Content-Type': 'text/plain'});
             res.end('Hello Worldn');
      }).listen(8124, "127.0.0.1");

      console.log(‘Server running at http://127.0.0.1:8124!’);
node.js as building block

    • node.js is a confluence of three ideas:
       • JavaScriptʼs rich support for asynchrony (i.e. closures)
       • High-performance JavaScript VMs (e.g. V8)
       • The system abstractions that God intended (i.e. UNIX)
    • Because everything is asynchronous, node.js is ideal for
     delivering scale in the presence of long-latency events!
The primacy of latency

   • As the correctness of the system is its timeliness, we
     must be able to measure the system to verify it
   • In a real-time system, it does not make sense to
     measure operations per second!
   • The only metric that matters is latency
   • This is dangerous to distill to a single number; the
     distribution of latency over time is essential
   • This poses both instrumentation and visualization
     challenges!
Instrumenting for latency

    • Instrumenting for latency requires modifying the system
     twice: as an operation starts and as it finishes
    • During an operation, the system must track — on a per-
     operation basis — the start time of the operation
    • Upon operation completion, the resulting stored data
     cannot be a scalar — the distribution is essential when
     understanding latency
    • Instrumentation must be systemic; must be able to
     reach to the sources of latency deep within the system
    • These constraints eliminate static instrumentation; we
     need a better way to instrument the system
Enter DTrace

   • Facility for dynamic instrumentation of production
     systems originally developed circa 2003 for Solaris 10
   • Open sourced (along with the rest of Solaris) in 2005;
     subsequently ported to many other systems (MacOS X,
     FreeBSD, NetBSD, QNX, nascent Linux port)
   • Support for arbitrary actions, arbitrary predicates, in
     situ data aggregation, statically-defined instrumentation
   • Designed for safe, ad hoc use in production: concise
     answers to arbitrary questions
   • Particularly well suited to real-time: the original design
     center was the understanding of latency bubbles
DTrace + Node?

   • DTrace instruments the system holistically, which is to
    say, from the kernel, which poses a challenge for
    interpreted environments
   • User-level statically defined tracing (USDT) providers
    describe semantically relevant points of instrumentation
   • Some interpreted environments (e.g., Ruby, Python,
    PHP, Erlang) have added USDT providers that
    instrument the interpreter itself
   • This approach is very fine-grained (e.g., every function
    call) and doesnʼt work in JITʼd environments
   • We decided to take a different tack for Node
DTrace for node.js

    • Given the nature of the paths that we wanted to
     instrument, we introduced a function into JavaScript that
     Node can call to get into USDT-instrumented C++
    • Introduces disabled probe effect: calling from JavaScript
     into C++ costs even when probes are not enabled
    • We use USDT is-enabled probes to minimize disabled
     probe effect once in C++
    • If (and only if) the probe is enabled, we prepare a
     structure for the kernel that allows for translation into a
     structure that is familiar to node programmers
Node USDT Provider

   • Example one-liners:
     dtrace -n ‘node*:::http-server-request{
        printf(“%s of %s from %sn”, args[0]->method,
            args[0]->url, args[1]->remoteAddress)}‘

     dtrace -n http-server-request’{@[args[1]->remoteAddress] = count()}‘

     dtrace -n gc-start’{self->ts = timestamp}’ 
        -n gc-done’/self->ts/{@ = quantize(timestamp - self->ts)}’



   • A script to measure HTTP latency:
     http-server-request
     {
            self->ts[args[1]->fd] = timestamp;
     }

     http-server-response
     /self->ts[args[0]->fd]/
     {
            @[zonename] = quantize(timestamp - self->ts[args[0]->fd]);
     }
User-defined USDT probes in node.js

   • Our USDT technique has been generalized by Chris
     Andrews in his node-dtrace-provider npm module:
       https://github.com/chrisa/node-dtrace-provider
   • Used by Joyentʼs Mark Cavage in his ldap.js to measure
     and validate operation latency
   • But how to visualize operation latency?
Visualizing latency

    • Could visualize latency as a scalar (i.e., average):




    • This hides outliers — and in a real-time system, it is the
     outliers that you care about!
    • Using percentiles helps to convey distribution — but
     crucial detail remains hidden
Visualizing latency as a heatmap

    • Latency is much better visualized as a heatmap, with
     time on the x-axis, latency on the y-axis, and frequency
     represented with color saturation:




    • Many patterns are now visible (as in this example of
     MySQL query latency), but critical data is still hidden
Visualizing latency as a 4D heatmap

   • Can use hue to represent higher dimensionality: time on
     the x-axis, latency on the y-axis, frequency via color
     saturation, and hue representing the new dimension:




   • In this example, the higher dimension is the MySQL
     database table associated with the operation
Visualizing node.js latency

    • Using the USDT probes as foundation, we developed a
     cloud analytics facility that visualizes latency in real-time
     via four dimensional heatmaps:




    • Facility is available via Joyentʼs no.de service, Joyentʼs
     public cloud, or Joyentʼs SmartDataCenter
Debugging latency

   • Latency visualization is essential for understanding
     where latency is being induced in a complicated system,
     but how can we determine why?
   • This requires associating an external event — an I/O
     request, a network packet, a profiling interrupt — with
     the code thatʼs inducing it
   • For node.js — like other dynamic environments — this is
     historically very difficult: the VM is opaque to the OS
   • Using DTraceʼs helper mechanism, we have developed
     a V8 ustack helper that allows OS-level events to be
     correlated to the node.js-backtrace that induced them
   • Available for node 0.6.7 on Joyentʼs SmartOS
Visualizing node.js CPU latency

   • Using the node.js ustack helper and the DTrace profile
     provider, we can determine the relative frequency of
     stack backtraces in terms of CPU consumption
   • Stacks can be visualized with flame graphs, a stack
     visualization developed by Joyentʼs Brendan Gregg:
node.js in production

    • node.js is particularly amenable for the DIRTy apps that
     typify the real-time web
    • The ability to understand latency must be considered
     when deploying node.js-based systems into production!
    • Understanding latency requires dynamic instrumentation
     and novel visualization
    • At Joyent, we have added DTrace-based dynamic
     instrumentation for node.js to SmartOS, and novel
     visualization into our cloud and software offerings
    • Better production support — better observability, better
     debuggability — remains an important area of node.js
     development!
Thank you!

   • @ryah and @rmustacc for Node DTrace USDT
    integration
   • @dapsays, @rmustacc, @rob_ellis and @notmatt for
    cloud analytics
   • @chrisandrews for node-dtrace-provider and
    @mcavage for putting it to such great use in ldap.js
   • @dapsays for the V8 DTrace ustack helper
   • @brendangregg for both the heatmap and flame graph
    visualizations
   • More information: http://dtrace.org/blogs/dap,
    http://dtrace.org/blogs/brendan and http://smartos.org

More Related Content

What's hot

2015年度GPGPU実践プログラミング 第6回 パフォーマンス解析ツール
2015年度GPGPU実践プログラミング 第6回 パフォーマンス解析ツール2015年度GPGPU実践プログラミング 第6回 パフォーマンス解析ツール
2015年度GPGPU実践プログラミング 第6回 パフォーマンス解析ツール
智啓 出川
 
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
智啓 出川
 

What's hot (20)

2015年度GPGPU実践プログラミング 第6回 パフォーマンス解析ツール
2015年度GPGPU実践プログラミング 第6回 パフォーマンス解析ツール2015年度GPGPU実践プログラミング 第6回 パフォーマンス解析ツール
2015年度GPGPU実践プログラミング 第6回 パフォーマンス解析ツール
 
計算機アーキテクチャを考慮した高能率画像処理プログラミング
計算機アーキテクチャを考慮した高能率画像処理プログラミング計算機アーキテクチャを考慮した高能率画像処理プログラミング
計算機アーキテクチャを考慮した高能率画像処理プログラミング
 
MRU : Monobit Reliable UDP ~5G世代のモバイルゲームに最適な通信プロトコルを目指して~
MRU : Monobit Reliable UDP ~5G世代のモバイルゲームに最適な通信プロトコルを目指して~MRU : Monobit Reliable UDP ~5G世代のモバイルゲームに最適な通信プロトコルを目指して~
MRU : Monobit Reliable UDP ~5G世代のモバイルゲームに最適な通信プロトコルを目指して~
 
20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)
 
Gpu vs fpga
Gpu vs fpgaGpu vs fpga
Gpu vs fpga
 
CPU / GPU高速化セミナー!性能モデルの理論と実践:理論編
CPU / GPU高速化セミナー!性能モデルの理論と実践:理論編CPU / GPU高速化セミナー!性能モデルの理論と実践:理論編
CPU / GPU高速化セミナー!性能モデルの理論と実践:理論編
 
RISC-V User level ISA
RISC-V User level ISARISC-V User level ISA
RISC-V User level ISA
 
LLVM最適化のこつ
LLVM最適化のこつLLVM最適化のこつ
LLVM最適化のこつ
 
第14回 配信講義 計算科学技術特論A(2021)
第14回 配信講義 計算科学技術特論A(2021)第14回 配信講義 計算科学技術特論A(2021)
第14回 配信講義 計算科学技術特論A(2021)
 
一般的なチートの手法と対策について
一般的なチートの手法と対策について一般的なチートの手法と対策について
一般的なチートの手法と対策について
 
全部知ってたらTwinmotionマスター!TwinmotionのぷちTips・テクニック
全部知ってたらTwinmotionマスター!TwinmotionのぷちTips・テクニック全部知ってたらTwinmotionマスター!TwinmotionのぷちTips・テクニック
全部知ってたらTwinmotionマスター!TwinmotionのぷちTips・テクニック
 
Topology Managerについて / Kubernetes Meetup Tokyo 50
Topology Managerについて / Kubernetes Meetup Tokyo 50Topology Managerについて / Kubernetes Meetup Tokyo 50
Topology Managerについて / Kubernetes Meetup Tokyo 50
 
Chainer でのプロファイリングをちょっと楽にする話
Chainer でのプロファイリングをちょっと楽にする話Chainer でのプロファイリングをちょっと楽にする話
Chainer でのプロファイリングをちょっと楽にする話
 
第1回 配信講義 計算科学技術特論A (2021)
第1回 配信講義 計算科学技術特論A (2021)第1回 配信講義 計算科学技術特論A (2021)
第1回 配信講義 計算科学技術特論A (2021)
 
第 1 回 Jetson ユーザー勉強会
第 1 回 Jetson ユーザー勉強会第 1 回 Jetson ユーザー勉強会
第 1 回 Jetson ユーザー勉強会
 
ネットワーク ゲームにおけるTCPとUDPの使い分け
ネットワーク ゲームにおけるTCPとUDPの使い分けネットワーク ゲームにおけるTCPとUDPの使い分け
ネットワーク ゲームにおけるTCPとUDPの使い分け
 
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
 
深層学習向け計算機クラスター MN-3
深層学習向け計算機クラスター MN-3深層学習向け計算機クラスター MN-3
深層学習向け計算機クラスター MN-3
 
HoloLensでコンテンツを操作する方法 - Gaze And Dwell -
HoloLensでコンテンツを操作する方法 - Gaze And Dwell -HoloLensでコンテンツを操作する方法 - Gaze And Dwell -
HoloLensでコンテンツを操作する方法 - Gaze And Dwell -
 
nftables: the Next Generation Firewall in Linux
nftables: the Next Generation Firewall in Linuxnftables: the Next Generation Firewall in Linux
nftables: the Next Generation Firewall in Linux
 

Viewers also liked

ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale
Subbu Allamaraju
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
Brendan Gregg
 

Viewers also liked (8)

Node Summit 2012
Node Summit 2012Node Summit 2012
Node Summit 2012
 
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
 
ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale
 
Rqa14 secondary
Rqa14 secondaryRqa14 secondary
Rqa14 secondary
 
Probabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitProbabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profit
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 

Similar to Instrumenting the real-time web: Node.js in production

Birmingham-20060705
Birmingham-20060705Birmingham-20060705
Birmingham-20060705
Miguel Vidal
 

Similar to Instrumenting the real-time web: Node.js in production (20)

John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves Goeleven
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Birmingham-20060705
Birmingham-20060705Birmingham-20060705
Birmingham-20060705
 
node.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontiernode.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontier
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World  (Ram, ITSF 2016)Sync in an NFV World  (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)
 
Onboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud JourneyOnboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud Journey
 
Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...
 
Fiware: Connecting to robots
Fiware: Connecting to robotsFiware: Connecting to robots
Fiware: Connecting to robots
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
Tech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentationTech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentation
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
 
Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival
 
Intro to Databases
Intro to DatabasesIntro to Databases
Intro to Databases
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 

More from bcantrill

More from bcantrill (20)

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Present
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmaking
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systems
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systems
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolution
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Age
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Law
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemaps
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system software
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the union
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systems
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after dark
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data path
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

Instrumenting the real-time web: Node.js in production

  • 1. Instrumenting the real-time web: Running node.js in production Bryan Cantrill VP, Engineering bryan@joyent.com @bcantrill
  • 2. “Real-time web?” • The term has enjoyed some popularity, but there is clearly confusion about the definition of “real-time” • A real-time system is one in which the correctness of the system is relative to its timeliness • A hard real-time system is one which the latency constraints are rigid: violation constitutes total system failure (e.g., an actuator on a physical device) • A soft real-time system is one in which latency constraints are more flexible: violation is undesirable but non-fatal (e.g., a video game or MP3 player) • Historically, the only real-time aspect of the web has been in some of its static content (e.g. video, audio)
  • 3. The rise of the real-time web • The rise of mobile + HTML5 has given rise to a new breed of web application: ones in which dynamic data has real-time semantics • These data-intensive real-time applications present new semantics for web-facing applications • These present new data semantics for web applications: CRUD, ACID, BASE, CAP — meet DIRT!
  • 4. The challenge of DIRTy apps • DIRTy applications tend to have the human in the loop • Good news: deadlines are soft — microseconds only matter when they add up to tens of milliseconds • Bad news: because humans are in the loop, demand for the system can be non-linear • One must deal not only with the traditional challenge of scalability, but also the challenge of a real-time system!
  • 5. Building DIRTy apps • Embedded real-time systems are sufficiently controlled that latency bubbles can be architected away • Web-facing systems are far too sloppy to expect this! • Focus must shift from preventing latency bubbles to preventing latency bubbles from cascading • Operations that can induce latency (network, I/O, etc.) must not be able to take the system out with them! • Implies purely asynchronous and evented architectures, which are notoriously difficult to implement...
  • 6. Enter node.js • node.js is a JavaScript-based framework for building event-oriented servers: var http = require(‘http’); http.createServer(function (req, res) { res.writeHead(200, {'Content-Type': 'text/plain'}); res.end('Hello Worldn'); }).listen(8124, "127.0.0.1"); console.log(‘Server running at http://127.0.0.1:8124!’);
  • 7. node.js as building block • node.js is a confluence of three ideas: • JavaScriptʼs rich support for asynchrony (i.e. closures) • High-performance JavaScript VMs (e.g. V8) • The system abstractions that God intended (i.e. UNIX) • Because everything is asynchronous, node.js is ideal for delivering scale in the presence of long-latency events!
  • 8. The primacy of latency • As the correctness of the system is its timeliness, we must be able to measure the system to verify it • In a real-time system, it does not make sense to measure operations per second! • The only metric that matters is latency • This is dangerous to distill to a single number; the distribution of latency over time is essential • This poses both instrumentation and visualization challenges!
  • 9. Instrumenting for latency • Instrumenting for latency requires modifying the system twice: as an operation starts and as it finishes • During an operation, the system must track — on a per- operation basis — the start time of the operation • Upon operation completion, the resulting stored data cannot be a scalar — the distribution is essential when understanding latency • Instrumentation must be systemic; must be able to reach to the sources of latency deep within the system • These constraints eliminate static instrumentation; we need a better way to instrument the system
  • 10. Enter DTrace • Facility for dynamic instrumentation of production systems originally developed circa 2003 for Solaris 10 • Open sourced (along with the rest of Solaris) in 2005; subsequently ported to many other systems (MacOS X, FreeBSD, NetBSD, QNX, nascent Linux port) • Support for arbitrary actions, arbitrary predicates, in situ data aggregation, statically-defined instrumentation • Designed for safe, ad hoc use in production: concise answers to arbitrary questions • Particularly well suited to real-time: the original design center was the understanding of latency bubbles
  • 11. DTrace + Node? • DTrace instruments the system holistically, which is to say, from the kernel, which poses a challenge for interpreted environments • User-level statically defined tracing (USDT) providers describe semantically relevant points of instrumentation • Some interpreted environments (e.g., Ruby, Python, PHP, Erlang) have added USDT providers that instrument the interpreter itself • This approach is very fine-grained (e.g., every function call) and doesnʼt work in JITʼd environments • We decided to take a different tack for Node
  • 12. DTrace for node.js • Given the nature of the paths that we wanted to instrument, we introduced a function into JavaScript that Node can call to get into USDT-instrumented C++ • Introduces disabled probe effect: calling from JavaScript into C++ costs even when probes are not enabled • We use USDT is-enabled probes to minimize disabled probe effect once in C++ • If (and only if) the probe is enabled, we prepare a structure for the kernel that allows for translation into a structure that is familiar to node programmers
  • 13. Node USDT Provider • Example one-liners: dtrace -n ‘node*:::http-server-request{ printf(“%s of %s from %sn”, args[0]->method, args[0]->url, args[1]->remoteAddress)}‘ dtrace -n http-server-request’{@[args[1]->remoteAddress] = count()}‘ dtrace -n gc-start’{self->ts = timestamp}’ -n gc-done’/self->ts/{@ = quantize(timestamp - self->ts)}’ • A script to measure HTTP latency: http-server-request { self->ts[args[1]->fd] = timestamp; } http-server-response /self->ts[args[0]->fd]/ { @[zonename] = quantize(timestamp - self->ts[args[0]->fd]); }
  • 14. User-defined USDT probes in node.js • Our USDT technique has been generalized by Chris Andrews in his node-dtrace-provider npm module: https://github.com/chrisa/node-dtrace-provider • Used by Joyentʼs Mark Cavage in his ldap.js to measure and validate operation latency • But how to visualize operation latency?
  • 15. Visualizing latency • Could visualize latency as a scalar (i.e., average): • This hides outliers — and in a real-time system, it is the outliers that you care about! • Using percentiles helps to convey distribution — but crucial detail remains hidden
  • 16. Visualizing latency as a heatmap • Latency is much better visualized as a heatmap, with time on the x-axis, latency on the y-axis, and frequency represented with color saturation: • Many patterns are now visible (as in this example of MySQL query latency), but critical data is still hidden
  • 17. Visualizing latency as a 4D heatmap • Can use hue to represent higher dimensionality: time on the x-axis, latency on the y-axis, frequency via color saturation, and hue representing the new dimension: • In this example, the higher dimension is the MySQL database table associated with the operation
  • 18. Visualizing node.js latency • Using the USDT probes as foundation, we developed a cloud analytics facility that visualizes latency in real-time via four dimensional heatmaps: • Facility is available via Joyentʼs no.de service, Joyentʼs public cloud, or Joyentʼs SmartDataCenter
  • 19. Debugging latency • Latency visualization is essential for understanding where latency is being induced in a complicated system, but how can we determine why? • This requires associating an external event — an I/O request, a network packet, a profiling interrupt — with the code thatʼs inducing it • For node.js — like other dynamic environments — this is historically very difficult: the VM is opaque to the OS • Using DTraceʼs helper mechanism, we have developed a V8 ustack helper that allows OS-level events to be correlated to the node.js-backtrace that induced them • Available for node 0.6.7 on Joyentʼs SmartOS
  • 20. Visualizing node.js CPU latency • Using the node.js ustack helper and the DTrace profile provider, we can determine the relative frequency of stack backtraces in terms of CPU consumption • Stacks can be visualized with flame graphs, a stack visualization developed by Joyentʼs Brendan Gregg:
  • 21. node.js in production • node.js is particularly amenable for the DIRTy apps that typify the real-time web • The ability to understand latency must be considered when deploying node.js-based systems into production! • Understanding latency requires dynamic instrumentation and novel visualization • At Joyent, we have added DTrace-based dynamic instrumentation for node.js to SmartOS, and novel visualization into our cloud and software offerings • Better production support — better observability, better debuggability — remains an important area of node.js development!
  • 22. Thank you! • @ryah and @rmustacc for Node DTrace USDT integration • @dapsays, @rmustacc, @rob_ellis and @notmatt for cloud analytics • @chrisandrews for node-dtrace-provider and @mcavage for putting it to such great use in ldap.js • @dapsays for the V8 DTrace ustack helper • @brendangregg for both the heatmap and flame graph visualizations • More information: http://dtrace.org/blogs/dap, http://dtrace.org/blogs/brendan and http://smartos.org