SlideShare a Scribd company logo
1 of 78
EBtree
QCon New York, June 24-26, 2019
Design for a scheduler, and use (almost)
everywhere
Andjelko Iharos
aiharos@haproxy.com
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
ebtree-design/
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
EBtree
QCon New York, June 24-26, 2019
Design for a scheduler, and use (almost)
everywhere
Andjelko Iharos
aiharos@haproxy.com
● Fast tree descent & search
● Memory efficient
● Lookup by mask or prefix (i.e. IPv4 and IPv6)
● Optimized for inserts and deletes
● Great with bit-addressable data
EBtree features
● Scheduling requirements
● Candidate solutions
● EBtree design
● Implementation
● Production use
● Results
Outline
Scheduling
requirements
● Handle network connections
● Run active tasks
● Check suspended tasks, wake them up
HAProxy event loop
HAProxy event loop
HAProxy task
● Active & suspended tasks
● Insert
● Duplicates
● Read sorted
● Delete
● Priorities
Scheduler features
● Up to high frequency of events
● Up to very large number of entries
● Large variations in rate of entry change
● Frequent lookups
Scheduling environment
● Speed
● Predictability
● Simplicity
Desirable qualities
Candidate
solutions
● Array
● Linked list
● Stack, Queue
● Hash Map
● Tree
Basic data structures
Linked list
By Lasindi - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2194027
By No machine-readable author provided. Dcoetzee assumed (based on copyright claims). - No machine-readable source provided. Own work assumed (based
on copyright claims)., Public Domain, https://commons.wikimedia.org/w/index.php?curid=488330
Binary search tree
By Bruno Schalch - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=64250599
AVL tree rotations
By Claudio Rocchini - Own work, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=2118795
Prefix (Radix) trees
● O(log n) insert, O(1) delete
● Fast comparison even for long keys
● Prefix matching
● Nodes and leaves are different
● Not balanced
Prefix (Radix) trees
EBtree design
● Simplify memory management
● Reduce impact of imbalance and tree height
Can we improve?
Basic elements
Inserting elements
0
0
0
1
0
0
1
0
0
0
1
1
0
1
0
0
0
1
0
1
Deleting elements
Implementation
● 32 bits = 4 bytes word, 2 bits available
● 64 bits = 8 bytes word, 3 bits available
0x846010 = 100001000110000000010000
0x846030 = 100001000110000000110000
0x846050 = 100001000110000001010000
Pointer tagging
● regparm compiler directive (historical reason)
● forced inlining
● __builtin_expect
● ASM
C is portable Assembler
typedef void eb_troot_t;
struct eb_root {
eb_troot_t *b[2]; /* left and right branches */
};
struct eb_node {
struct eb_root branches; /* branches, must be at the beginning */
short int bit; /* link's bit position. */
eb_troot_t *node_p; /* link node's parent */
eb_troot_t *leaf_p; /* leaf node's parent */
};
Base structs
ebtree/ebtree.h
/* Return next leaf node after an existing leaf node, or NULL if none. */
static inline struct eb_node *eb_next(struct eb_node *node)
{
eb_troot_t *t = node->leaf_p;
while (eb_gettag(t) != EB_LEFT)
/* Walking up from right branch, so we cannot be below root */
t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
/* Note that <t> cannot be NULL at this stage */
t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
if (eb_clrtag(t) == NULL)
return NULL;
return eb_walk_down(t, EB_LEFT);
}
Base functions
eb_next() in ebtree/ebtree.h
● eb32 / eb64
● ebpt for pointers
● ebim and ebis for indirect memory and strings
● ebmb and ebst for memory block and strings
(allocated after just after the node)
● All support storage and ordered retrieval of duplicate
keys
EBtree data types
struct eb64_node {
struct eb_node node; /* the tree node, must be at the beginning */
u64 key;
};
eb64 node
ebtree/eb64tree.h
troot = root->b[EB_LEFT];
if (unlikely(troot == NULL))
return NULL;
while (1) {
if ((eb_gettag(troot) == EB_LEAF)) {
node = container_of(eb_untag(troot, EB_LEAF),
struct eb64_node, node.branches);
if (node->key == x)
return node;
else
return NULL;
}
eb64 specific functions
eb64_lookup() in ebtree/eb64tree.h
node = container_of(eb_untag(troot, EB_NODE),
struct eb64_node, node.branches);
y = node->key ^ x;
if (!y) {
/* Either we found the node which holds the key, or
* we have a dup tree. */
return node;
}
if ((y >> node->node.bit) >= EB_NODE_BRANCHES) /* 2 */
return NULL; /* no more common bits */
troot = node->node.branches.b[(x >> node->node.bit) &
EB_NODE_BRANCH_MASK];
}
}
eb64 specific functions
eb64_lookup() in ebtree/eb64tree.h
Production use
● computational load associated with a proxied
connection
● active
● suspended
● millisecond resolution
HAProxy tasks
● EBtree
● indexed on expiration date
Suspended HAProxy tasks
● EBtree
● indexed on expiration date, taking priority into
consideration
Active HAProxy tasks
while (1) {
/* Process a few tasks */
process_runnable_tasks();
/* Check if we can expire some tasks */
next = wake_expired_tasks();
/* expire immediately if events are pending */
if (fd_cache_num || run_queue) next = now_ms;
/* The poller will ensure it returns around <next> */
cur_poller.poll(&cur_poller, next);
fd_process_cached_events();
}
HAProxy event loop
run_poll_loop() in haproxy-1.7.x/src/haproxy.c
/* The base for all tasks */
struct task {
struct eb32_node rq; /* ebtree node used to hold the task in the run queue */
struct eb32_node wq; /* ebtree node used to hold the task in the wait queue */
unsigned short state; /* task state : bit field of TASK_* */
short nice; /* the task's current nice value from -1024 to +1024 */
unsigned int calls; /* number of times ->process() was called */
struct task * (*process)(struct task *t); /* the function which processes the task */
void *context; /* the task's context */
int expire; /* next expiration date for this task, in ticks */
};
Task struct
haproxy-1.7.x/include/types/task.h
if (likely(last_timer && last_timer->node.bit < 0 &&
last_timer->key == task->wq.key && last_timer->node.node_p)) {
eb_insert_dup(&last_timer->node, &task->wq.node);
if (task->wq.node.bit < last_timer->node.bit)
last_timer = &task->wq;
return;
}
eb32_insert(&timers, &task->wq);
/* Make sure we don't assign the last_timer to a node-less entry */
if (task->wq.node.node_p && (!last_timer || (task->wq.node.bit < last_timer->node.bit)))
last_timer = &task->wq;
return;
}
Scheduling tasks for later
__task_queue() in haproxy-1.7.x/src/task.c
if (likely(t->nice)) {
int offset;
niced_tasks++;
if (likely(t->nice > 0))
offset = (unsigned)((tasks_run_queue * (unsigned int)t->nice) / 32U);
else
offset = -(unsigned)((tasks_run_queue * (unsigned int)-t->nice) / 32U);
t->rq.key += offset;
}
eb32_insert(&rqueue, &t->rq);
rq_next = NULL;
return t;
}
Waking up tasks to run
__task_wakeup() in haproxy-1.7.x/src/task.c
while (max_processed--) {
if (unlikely(!rq_next)) {
rq_next = eb32_lookup_ge(&rqueue, rqueue_ticks - TIMER_LOOK_BACK);
if (!rq_next) {
/* we might have reached the end of the tree, typically because
* <rqueue_ticks> is in the first half and we're first scanning
* the last half. Let's loop back to the beginning of the tree now.
*/
rq_next = eb32_first(&rqueue);
if (!rq_next)
break;
}
}
Running tasks
process_runnable_tasks() in haproxy-1.7.x/src/task.c
t = eb32_entry(rq_next, struct task, rq);
rq_next = eb32_next(rq_next);
__task_unlink_rq(t);
t->state |= TASK_RUNNING;
t->calls++;
t = t->process(t);
if (likely(t != NULL)) {
t->state &= ~TASK_RUNNING;
if (t->expire)
task_queue(t);
}
}
Running tasks
process_runnable_tasks() in haproxy-1.7.x/src/task.c
● timers
● schedulers
● ACL
● stick-tables (stats, counters)
● LRU cache
EBtree in HAProxy
● Down to 100ns inserts
● > 200k TCP conn/s
● > 350k HTTP req/s
● scheduler using up only 3-5% CPU
● Halog utility - up to 4 million log lines per second
● 450000 BGP routes table: >2 million lookups per
second
EBtree performing in HAProxy
struct lru64_list {
struct lru64_list *n;
struct lru64_list *p;
};
struct lru64_head {
struct lru64_list list;
struct eb_root keys;
struct lru64 *spare;
int cache_size;
int cache_usage;
};
LRU cache structs
ebtree/examples/lru.h
struct lru64 {
struct eb64_node node; /* indexing key, typically a hash64 */
struct lru64_list lru; /* LRU list */
void *domain; /* who this data belongs to */
unsigned long long revision; /* data revision (to avoid use-after-free) */
void *data; /* returned value, user decides how to use this */
};
LRU cache structs
ebtree/examples/lru.h
struct lru64 *lru64_get(unsigned long long key, struct lru64_head *lru,
void *domain, unsigned long long revision)
{
struct eb64_node *node;
struct lru64 *elem;
if (!lru->spare) {
if (!lru->cache_size)
return NULL;
lru->spare = malloc(sizeof(*lru->spare));
if (!lru->spare)
return NULL;
lru->spare->domain = NULL;
}
LRU cache get/store
lru64_get() in ebtree/examples/lru.c
/* Lookup or insert */
lru->spare->node.key = key;
node = __eb64_insert(&lru->keys, &lru->spare->node);
elem = container_of(node, typeof(*elem), node);
if (elem != lru->spare) {
/* Existing entry found, check validity then move it at the head of the LRU list. */
return elem;
}
else {
/* New entry inserted, initialize and move to the head of the
* LRU list, and lock it until commit. */
lru->cache_usage++;
lru->spare = NULL; // used, need a new one next time
}
LRU cache get/store
lru64_get() in ebtree/examples/lru.c
if (lru->cache_usage > lru->cache_size) {
struct lru64 *old;
old = container_of(lru->list.p, typeof(*old), lru);
if (old->domain) {
/* not locked */
LIST_DEL(&old->lru);
__eb64_delete(&old->node);
if (!lru->spare)
lru->spare = old;
else
free(old);
lru->cache_usage--;
}
}
LRU cache get/store
lru64_get() in ebtree/examples/lru.c
Results
● Fast tree descent & search
● Memory efficient
● Lookup by mask or prefix (i.e. IPv4 and IPv6)
● Optimized for inserts and deletes
● Great with bit-addressable data
EBtree features
Check out EBtree at http://git.1wt.eu/web/ebtree.git/
Check out HAProxy at haproxy.org or haproxy.com
Join development at haproxy@formilux.org
aiharos@haproxy.com
Q&A
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
ebtree-design/

More Related Content

What's hot

Virtual Memory (Making a Process)
Virtual Memory (Making a Process)Virtual Memory (Making a Process)
Virtual Memory (Making a Process)David Evans
 
Putting a Fork in Fork (Linux Process and Memory Management)
Putting a Fork in Fork (Linux Process and Memory Management)Putting a Fork in Fork (Linux Process and Memory Management)
Putting a Fork in Fork (Linux Process and Memory Management)David Evans
 
Making a Process
Making a ProcessMaking a Process
Making a ProcessDavid Evans
 
Redis: Lua scripts - a primer and use cases
Redis: Lua scripts - a primer and use casesRedis: Lua scripts - a primer and use cases
Redis: Lua scripts - a primer and use casesRedis Labs
 
My talk about Tarantool and Lua at Percona Live 2016
My talk about Tarantool and Lua at Percona Live 2016My talk about Tarantool and Lua at Percona Live 2016
My talk about Tarantool and Lua at Percona Live 2016Konstantin Osipov
 
Multi-Tasking Map (MapReduce, Tasks in Rust)
Multi-Tasking Map (MapReduce, Tasks in Rust)Multi-Tasking Map (MapReduce, Tasks in Rust)
Multi-Tasking Map (MapReduce, Tasks in Rust)David Evans
 
Intro to Rust from Applicative / NY Meetup
Intro to Rust from Applicative / NY MeetupIntro to Rust from Applicative / NY Meetup
Intro to Rust from Applicative / NY Meetupnikomatsakis
 
20100712-OTcl Command -- Getting Started
20100712-OTcl Command -- Getting Started20100712-OTcl Command -- Getting Started
20100712-OTcl Command -- Getting StartedTeerawat Issariyakul
 
Ganga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridGanga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridMatt Williams
 
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015NAVER / MusicPlatform
 
Pylons + Tokyo Cabinet
Pylons + Tokyo CabinetPylons + Tokyo Cabinet
Pylons + Tokyo CabinetBen Cheng
 
2013.02.02 지앤선 테크니컬 세미나 - Xcode를 활용한 디버깅 팁(OSXDEV)
2013.02.02 지앤선 테크니컬 세미나 - Xcode를 활용한 디버깅 팁(OSXDEV)2013.02.02 지앤선 테크니컬 세미나 - Xcode를 활용한 디버깅 팁(OSXDEV)
2013.02.02 지앤선 테크니컬 세미나 - Xcode를 활용한 디버깅 팁(OSXDEV)JiandSon
 
Preparation for mit ose lab4
Preparation for mit ose lab4Preparation for mit ose lab4
Preparation for mit ose lab4Benux Wei
 

What's hot (20)

Memory management
Memory managementMemory management
Memory management
 
Virtual Memory (Making a Process)
Virtual Memory (Making a Process)Virtual Memory (Making a Process)
Virtual Memory (Making a Process)
 
Putting a Fork in Fork (Linux Process and Memory Management)
Putting a Fork in Fork (Linux Process and Memory Management)Putting a Fork in Fork (Linux Process and Memory Management)
Putting a Fork in Fork (Linux Process and Memory Management)
 
Nicety of Java 8 Multithreading
Nicety of Java 8 MultithreadingNicety of Java 8 Multithreading
Nicety of Java 8 Multithreading
 
RealmDB for Android
RealmDB for AndroidRealmDB for Android
RealmDB for Android
 
Making a Process
Making a ProcessMaking a Process
Making a Process
 
Redis: Lua scripts - a primer and use cases
Redis: Lua scripts - a primer and use casesRedis: Lua scripts - a primer and use cases
Redis: Lua scripts - a primer and use cases
 
My talk about Tarantool and Lua at Percona Live 2016
My talk about Tarantool and Lua at Percona Live 2016My talk about Tarantool and Lua at Percona Live 2016
My talk about Tarantool and Lua at Percona Live 2016
 
Multi-Tasking Map (MapReduce, Tasks in Rust)
Multi-Tasking Map (MapReduce, Tasks in Rust)Multi-Tasking Map (MapReduce, Tasks in Rust)
Multi-Tasking Map (MapReduce, Tasks in Rust)
 
NS2 Object Construction
NS2 Object ConstructionNS2 Object Construction
NS2 Object Construction
 
Intro to Rust from Applicative / NY Meetup
Intro to Rust from Applicative / NY MeetupIntro to Rust from Applicative / NY Meetup
Intro to Rust from Applicative / NY Meetup
 
20100712-OTcl Command -- Getting Started
20100712-OTcl Command -- Getting Started20100712-OTcl Command -- Getting Started
20100712-OTcl Command -- Getting Started
 
Ganga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridGanga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing grid
 
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
 
packet destruction in NS2
packet destruction in NS2packet destruction in NS2
packet destruction in NS2
 
Scheduling
SchedulingScheduling
Scheduling
 
NS2 Shadow Object Construction
NS2 Shadow Object ConstructionNS2 Shadow Object Construction
NS2 Shadow Object Construction
 
Pylons + Tokyo Cabinet
Pylons + Tokyo CabinetPylons + Tokyo Cabinet
Pylons + Tokyo Cabinet
 
2013.02.02 지앤선 테크니컬 세미나 - Xcode를 활용한 디버깅 팁(OSXDEV)
2013.02.02 지앤선 테크니컬 세미나 - Xcode를 활용한 디버깅 팁(OSXDEV)2013.02.02 지앤선 테크니컬 세미나 - Xcode를 활용한 디버깅 팁(OSXDEV)
2013.02.02 지앤선 테크니컬 세미나 - Xcode를 활용한 디버깅 팁(OSXDEV)
 
Preparation for mit ose lab4
Preparation for mit ose lab4Preparation for mit ose lab4
Preparation for mit ose lab4
 

Similar to EBtree - Design for a Scheduler and Use (Almost) Everywhere

Dragoncraft Architectural Overview
Dragoncraft Architectural OverviewDragoncraft Architectural Overview
Dragoncraft Architectural Overviewjessesanford
 
Beyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic AnalysisBeyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic AnalysisC4Media
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with HadoopJosh Devins
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)Daniel Nüst
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2Dvir Volk
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2Itamar Haber
 
2 architecture anddatastructures
2 architecture anddatastructures2 architecture anddatastructures
2 architecture anddatastructuresSolin TEM
 
4Developers 2018: Ile (nie) wiesz o strukturach w .NET (Łukasz Pyrzyk)
4Developers 2018: Ile (nie) wiesz o strukturach w .NET (Łukasz Pyrzyk)4Developers 2018: Ile (nie) wiesz o strukturach w .NET (Łukasz Pyrzyk)
4Developers 2018: Ile (nie) wiesz o strukturach w .NET (Łukasz Pyrzyk)PROIDEA
 
ROS Course Slides Course 2.pdf
ROS Course Slides Course 2.pdfROS Course Slides Course 2.pdf
ROS Course Slides Course 2.pdfduonglacbk
 
Deep Dive into Futures and the Parallel Programming Library
Deep Dive into Futures and the Parallel Programming LibraryDeep Dive into Futures and the Parallel Programming Library
Deep Dive into Futures and the Parallel Programming LibraryJim McKeeth
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDPlcplcp1
 
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Threestackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick ThreeNETWAYS
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/MultitaskingSasha Kravchuk
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Holden Karau
 

Similar to EBtree - Design for a Scheduler and Use (Almost) Everywhere (20)

Dragoncraft Architectural Overview
Dragoncraft Architectural OverviewDragoncraft Architectural Overview
Dragoncraft Architectural Overview
 
Beyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic AnalysisBeyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic Analysis
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Writing MySQL UDFs
Writing MySQL UDFsWriting MySQL UDFs
Writing MySQL UDFs
 
Return of c++
Return of c++Return of c++
Return of c++
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
 
2 architecture anddatastructures
2 architecture anddatastructures2 architecture anddatastructures
2 architecture anddatastructures
 
4Developers 2018: Ile (nie) wiesz o strukturach w .NET (Łukasz Pyrzyk)
4Developers 2018: Ile (nie) wiesz o strukturach w .NET (Łukasz Pyrzyk)4Developers 2018: Ile (nie) wiesz o strukturach w .NET (Łukasz Pyrzyk)
4Developers 2018: Ile (nie) wiesz o strukturach w .NET (Łukasz Pyrzyk)
 
Ropython-windbg-python-extensions
Ropython-windbg-python-extensionsRopython-windbg-python-extensions
Ropython-windbg-python-extensions
 
freertos-proj.pdf
freertos-proj.pdffreertos-proj.pdf
freertos-proj.pdf
 
ROS Course Slides Course 2.pdf
ROS Course Slides Course 2.pdfROS Course Slides Course 2.pdf
ROS Course Slides Course 2.pdf
 
Deep Dive into Futures and the Parallel Programming Library
Deep Dive into Futures and the Parallel Programming LibraryDeep Dive into Futures and the Parallel Programming Library
Deep Dive into Futures and the Parallel Programming Library
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDP
 
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Threestackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
 
RxJava on Android
RxJava on AndroidRxJava on Android
RxJava on Android
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014
 

More from C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

More from C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

EBtree - Design for a Scheduler and Use (Almost) Everywhere

  • 1. EBtree QCon New York, June 24-26, 2019 Design for a scheduler, and use (almost) everywhere Andjelko Iharos aiharos@haproxy.com
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ ebtree-design/
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4.
  • 5.
  • 6.
  • 7. EBtree QCon New York, June 24-26, 2019 Design for a scheduler, and use (almost) everywhere Andjelko Iharos aiharos@haproxy.com
  • 8.
  • 9. ● Fast tree descent & search ● Memory efficient ● Lookup by mask or prefix (i.e. IPv4 and IPv6) ● Optimized for inserts and deletes ● Great with bit-addressable data EBtree features
  • 10. ● Scheduling requirements ● Candidate solutions ● EBtree design ● Implementation ● Production use ● Results Outline
  • 12. ● Handle network connections ● Run active tasks ● Check suspended tasks, wake them up HAProxy event loop
  • 15. ● Active & suspended tasks ● Insert ● Duplicates ● Read sorted ● Delete ● Priorities Scheduler features
  • 16.
  • 17. ● Up to high frequency of events ● Up to very large number of entries ● Large variations in rate of entry change ● Frequent lookups Scheduling environment
  • 18. ● Speed ● Predictability ● Simplicity Desirable qualities
  • 20. ● Array ● Linked list ● Stack, Queue ● Hash Map ● Tree Basic data structures
  • 21. Linked list By Lasindi - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2194027
  • 22. By No machine-readable author provided. Dcoetzee assumed (based on copyright claims). - No machine-readable source provided. Own work assumed (based on copyright claims)., Public Domain, https://commons.wikimedia.org/w/index.php?curid=488330 Binary search tree
  • 23. By Bruno Schalch - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=64250599 AVL tree rotations
  • 24. By Claudio Rocchini - Own work, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=2118795 Prefix (Radix) trees
  • 25. ● O(log n) insert, O(1) delete ● Fast comparison even for long keys ● Prefix matching ● Nodes and leaves are different ● Not balanced Prefix (Radix) trees
  • 27.
  • 28. ● Simplify memory management ● Reduce impact of imbalance and tree height Can we improve?
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 37.
  • 38.
  • 41.
  • 42.
  • 44.
  • 45. ● 32 bits = 4 bytes word, 2 bits available ● 64 bits = 8 bytes word, 3 bits available 0x846010 = 100001000110000000010000 0x846030 = 100001000110000000110000 0x846050 = 100001000110000001010000 Pointer tagging
  • 46. ● regparm compiler directive (historical reason) ● forced inlining ● __builtin_expect ● ASM C is portable Assembler
  • 47.
  • 48. typedef void eb_troot_t; struct eb_root { eb_troot_t *b[2]; /* left and right branches */ }; struct eb_node { struct eb_root branches; /* branches, must be at the beginning */ short int bit; /* link's bit position. */ eb_troot_t *node_p; /* link node's parent */ eb_troot_t *leaf_p; /* leaf node's parent */ }; Base structs ebtree/ebtree.h
  • 49. /* Return next leaf node after an existing leaf node, or NULL if none. */ static inline struct eb_node *eb_next(struct eb_node *node) { eb_troot_t *t = node->leaf_p; while (eb_gettag(t) != EB_LEFT) /* Walking up from right branch, so we cannot be below root */ t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p; /* Note that <t> cannot be NULL at this stage */ t = (eb_untag(t, EB_LEFT))->b[EB_RGHT]; if (eb_clrtag(t) == NULL) return NULL; return eb_walk_down(t, EB_LEFT); } Base functions eb_next() in ebtree/ebtree.h
  • 50. ● eb32 / eb64 ● ebpt for pointers ● ebim and ebis for indirect memory and strings ● ebmb and ebst for memory block and strings (allocated after just after the node) ● All support storage and ordered retrieval of duplicate keys EBtree data types
  • 51. struct eb64_node { struct eb_node node; /* the tree node, must be at the beginning */ u64 key; }; eb64 node ebtree/eb64tree.h
  • 52. troot = root->b[EB_LEFT]; if (unlikely(troot == NULL)) return NULL; while (1) { if ((eb_gettag(troot) == EB_LEAF)) { node = container_of(eb_untag(troot, EB_LEAF), struct eb64_node, node.branches); if (node->key == x) return node; else return NULL; } eb64 specific functions eb64_lookup() in ebtree/eb64tree.h
  • 53. node = container_of(eb_untag(troot, EB_NODE), struct eb64_node, node.branches); y = node->key ^ x; if (!y) { /* Either we found the node which holds the key, or * we have a dup tree. */ return node; } if ((y >> node->node.bit) >= EB_NODE_BRANCHES) /* 2 */ return NULL; /* no more common bits */ troot = node->node.branches.b[(x >> node->node.bit) & EB_NODE_BRANCH_MASK]; } } eb64 specific functions eb64_lookup() in ebtree/eb64tree.h
  • 55. ● computational load associated with a proxied connection ● active ● suspended ● millisecond resolution HAProxy tasks
  • 56. ● EBtree ● indexed on expiration date Suspended HAProxy tasks
  • 57. ● EBtree ● indexed on expiration date, taking priority into consideration Active HAProxy tasks
  • 58.
  • 59. while (1) { /* Process a few tasks */ process_runnable_tasks(); /* Check if we can expire some tasks */ next = wake_expired_tasks(); /* expire immediately if events are pending */ if (fd_cache_num || run_queue) next = now_ms; /* The poller will ensure it returns around <next> */ cur_poller.poll(&cur_poller, next); fd_process_cached_events(); } HAProxy event loop run_poll_loop() in haproxy-1.7.x/src/haproxy.c
  • 60.
  • 61. /* The base for all tasks */ struct task { struct eb32_node rq; /* ebtree node used to hold the task in the run queue */ struct eb32_node wq; /* ebtree node used to hold the task in the wait queue */ unsigned short state; /* task state : bit field of TASK_* */ short nice; /* the task's current nice value from -1024 to +1024 */ unsigned int calls; /* number of times ->process() was called */ struct task * (*process)(struct task *t); /* the function which processes the task */ void *context; /* the task's context */ int expire; /* next expiration date for this task, in ticks */ }; Task struct haproxy-1.7.x/include/types/task.h
  • 62. if (likely(last_timer && last_timer->node.bit < 0 && last_timer->key == task->wq.key && last_timer->node.node_p)) { eb_insert_dup(&last_timer->node, &task->wq.node); if (task->wq.node.bit < last_timer->node.bit) last_timer = &task->wq; return; } eb32_insert(&timers, &task->wq); /* Make sure we don't assign the last_timer to a node-less entry */ if (task->wq.node.node_p && (!last_timer || (task->wq.node.bit < last_timer->node.bit))) last_timer = &task->wq; return; } Scheduling tasks for later __task_queue() in haproxy-1.7.x/src/task.c
  • 63. if (likely(t->nice)) { int offset; niced_tasks++; if (likely(t->nice > 0)) offset = (unsigned)((tasks_run_queue * (unsigned int)t->nice) / 32U); else offset = -(unsigned)((tasks_run_queue * (unsigned int)-t->nice) / 32U); t->rq.key += offset; } eb32_insert(&rqueue, &t->rq); rq_next = NULL; return t; } Waking up tasks to run __task_wakeup() in haproxy-1.7.x/src/task.c
  • 64. while (max_processed--) { if (unlikely(!rq_next)) { rq_next = eb32_lookup_ge(&rqueue, rqueue_ticks - TIMER_LOOK_BACK); if (!rq_next) { /* we might have reached the end of the tree, typically because * <rqueue_ticks> is in the first half and we're first scanning * the last half. Let's loop back to the beginning of the tree now. */ rq_next = eb32_first(&rqueue); if (!rq_next) break; } } Running tasks process_runnable_tasks() in haproxy-1.7.x/src/task.c
  • 65. t = eb32_entry(rq_next, struct task, rq); rq_next = eb32_next(rq_next); __task_unlink_rq(t); t->state |= TASK_RUNNING; t->calls++; t = t->process(t); if (likely(t != NULL)) { t->state &= ~TASK_RUNNING; if (t->expire) task_queue(t); } } Running tasks process_runnable_tasks() in haproxy-1.7.x/src/task.c
  • 66. ● timers ● schedulers ● ACL ● stick-tables (stats, counters) ● LRU cache EBtree in HAProxy
  • 67. ● Down to 100ns inserts ● > 200k TCP conn/s ● > 350k HTTP req/s ● scheduler using up only 3-5% CPU ● Halog utility - up to 4 million log lines per second ● 450000 BGP routes table: >2 million lookups per second EBtree performing in HAProxy
  • 68.
  • 69. struct lru64_list { struct lru64_list *n; struct lru64_list *p; }; struct lru64_head { struct lru64_list list; struct eb_root keys; struct lru64 *spare; int cache_size; int cache_usage; }; LRU cache structs ebtree/examples/lru.h
  • 70. struct lru64 { struct eb64_node node; /* indexing key, typically a hash64 */ struct lru64_list lru; /* LRU list */ void *domain; /* who this data belongs to */ unsigned long long revision; /* data revision (to avoid use-after-free) */ void *data; /* returned value, user decides how to use this */ }; LRU cache structs ebtree/examples/lru.h
  • 71. struct lru64 *lru64_get(unsigned long long key, struct lru64_head *lru, void *domain, unsigned long long revision) { struct eb64_node *node; struct lru64 *elem; if (!lru->spare) { if (!lru->cache_size) return NULL; lru->spare = malloc(sizeof(*lru->spare)); if (!lru->spare) return NULL; lru->spare->domain = NULL; } LRU cache get/store lru64_get() in ebtree/examples/lru.c
  • 72. /* Lookup or insert */ lru->spare->node.key = key; node = __eb64_insert(&lru->keys, &lru->spare->node); elem = container_of(node, typeof(*elem), node); if (elem != lru->spare) { /* Existing entry found, check validity then move it at the head of the LRU list. */ return elem; } else { /* New entry inserted, initialize and move to the head of the * LRU list, and lock it until commit. */ lru->cache_usage++; lru->spare = NULL; // used, need a new one next time } LRU cache get/store lru64_get() in ebtree/examples/lru.c
  • 73. if (lru->cache_usage > lru->cache_size) { struct lru64 *old; old = container_of(lru->list.p, typeof(*old), lru); if (old->domain) { /* not locked */ LIST_DEL(&old->lru); __eb64_delete(&old->node); if (!lru->spare) lru->spare = old; else free(old); lru->cache_usage--; } } LRU cache get/store lru64_get() in ebtree/examples/lru.c
  • 75.
  • 76. ● Fast tree descent & search ● Memory efficient ● Lookup by mask or prefix (i.e. IPv4 and IPv6) ● Optimized for inserts and deletes ● Great with bit-addressable data EBtree features
  • 77. Check out EBtree at http://git.1wt.eu/web/ebtree.git/ Check out HAProxy at haproxy.org or haproxy.com Join development at haproxy@formilux.org aiharos@haproxy.com Q&A
  • 78. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ ebtree-design/