SlideShare une entreprise Scribd logo
1  sur  15
Télécharger pour lire hors ligne
Hyperscan in Rspamd
When PCRE is not enough
Problem definition
• Need to match many regular expressions for the
same data:
len: 610591, time: 2492.457ms real, 882.251ms virtual
regexp statistics: 4095 pcre regexps scanned, 18 regexps matched, 694M bytes scanned using pcre
• No need to capture patterns
• Need to match both UTF8 and raw data
• Need to support Perl compatible syntax
Naive solution
• Join all expressions with ‘|’:
/abc/ + /cde/ => /(?:abc)|(?:cde)/
• Won’t work:
/abc/ + /a/ => /(?:abc)|(?:a)/ -> the second rule won’t match for ‘abc’
Hyperscan
Text
/re1/
/re2/
/re3/
/re4/
/re5/
/re6/
/re7/
Scan
Match
Match
Match
Rules and Hyperscan
From
Subject
Text Part
HTML Part
Attachment
Header RE class
Header RE class
Text Parts class
Raw Body class
Whole message
m1
m2
m3
m4
m5
m6
MIME message Hyperscan regexps
Problems with Hyperscan
• Slow compile time (like 100 times slower than
PCRE+JIT)
• Many PCRE features are unsupported (lookahead,
look-behind)
• Is not packaged for the major OSes (requires
relatively fresh boost)
Lazy hyperscan compile
HS compiler
Main
Scanners
Compiled
class
Compiled
class
Compiled
class
Use PCRE only
Compile to disk
Lazy hyperscan compile
HS compiler
Main
Scanners
Compiled
class
Compiled
class
Compiled
class
Use PCRE only
All done Broadcast message
Lazy hyperscan compile
HS compiler
Main
Scanners
Compiled
class
Compiled
class
Compiled
class
Switch to hyperscan
Loaded from cache
Wait for ‘recompile’ command
Lazy hyperscan compile
• Does not affect the starting time
• If nothing changes then scanners loads pre-
compiled expressions on startup (using crypto
hashes)
• Recompile command is used to force recompilation
process:
rspamadm recompile
Unsupported features
Pre-filter mode
Text
/re1/pre
/re2/
/re3/pre
/re4/
/re5/
/re6/
/re7/
Scan
Maybe match?
Match
Match
Check
PCRE
Match!
Replaces unsupported constructs with "weaker"
supported ones, guaranteeing a match superset:
Unsupported features
Approximation
Try hyperscan
Use hyperscan Try prefilter
Use PCRE Use prefilter
SuccessFail
SuccessFail
Timeout
Results
• Use Hyperscan for the vast majority of RE rules:
len: 610591, time: 2492.457ms real, 882.251ms virtual
regexp statistics: 4095 pcre regexps scanned, 18 regexps matched, 694M bytes scanned using pcre
len: 610591, time: 654.596ms real, 309.785ms virtual
regexp statistics: 34 pcre regexps scanned, 41 regexps matched, 8.41M bytes scanned using pcre,
9.56M bytes scanned total
Results
• The default rules all compile with Hyperscan
without prefiltering mode:
compiled 235 regular expressions to the hyperscan tree
loading hyperscan expressions after receiving compilation notice
hyperscan database of 235 regexps has been loaded
hyperscan database of 4118 regexps has been loaded
• Works for most of the SA rules (some are
prefiltering):
• Speeds up both small and large messages
Questions?
Vsevolod Stakhov
https://rspamd.com

https://01.org/hyperscan

Contenu connexe

Tendances

Donatas Mažionis, Building low latency web APIs
Donatas Mažionis, Building low latency web APIsDonatas Mažionis, Building low latency web APIs
Donatas Mažionis, Building low latency web APIsTanya Denisyuk
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Brian Brazil
 
SignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer OptimizationSignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer OptimizationSignalFx
 
High Performance Systems in Go - GopherCon 2014
High Performance Systems in Go - GopherCon 2014High Performance Systems in Go - GopherCon 2014
High Performance Systems in Go - GopherCon 2014Derek Collison
 
HBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon
 
OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017HBaseCon
 
Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Brian Brazil
 
Applied cryptanalysis - everything else
Applied cryptanalysis - everything elseApplied cryptanalysis - everything else
Applied cryptanalysis - everything elseVlad Garbuz
 
Graylog2 (MongoBerlin/MongoHamburg 2010)
Graylog2 (MongoBerlin/MongoHamburg 2010)Graylog2 (MongoBerlin/MongoHamburg 2010)
Graylog2 (MongoBerlin/MongoHamburg 2010)lennartkoopmann
 
0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with ErlangMaxim Kharchenko
 
Applied cryptanalysis - stream ciphers
Applied cryptanalysis - stream ciphersApplied cryptanalysis - stream ciphers
Applied cryptanalysis - stream ciphersVlad Garbuz
 
NPF scripting with Lua by Lourival Vieira Neto
NPF scripting with Lua by Lourival Vieira NetoNPF scripting with Lua by Lourival Vieira Neto
NPF scripting with Lua by Lourival Vieira Netoeurobsdcon
 
Introduction to OpenHFT for Melbourne Java Users Group
Introduction to OpenHFT for Melbourne Java Users GroupIntroduction to OpenHFT for Melbourne Java Users Group
Introduction to OpenHFT for Melbourne Java Users GroupPeter Lawrey
 
Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipcPeter Lawrey
 
Determinism in finance
Determinism in financeDeterminism in finance
Determinism in financePeter Lawrey
 
GC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconGC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconPeter Lawrey
 
Profiling with Devel::NYTProf
Profiling with Devel::NYTProfProfiling with Devel::NYTProf
Profiling with Devel::NYTProfbobcatfish
 
Deterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systemsDeterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systemsPeter Lawrey
 
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietachPLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietachPROIDEA
 

Tendances (20)

Donatas Mažionis, Building low latency web APIs
Donatas Mažionis, Building low latency web APIsDonatas Mažionis, Building low latency web APIs
Donatas Mažionis, Building low latency web APIs
 
erlang 101
erlang 101erlang 101
erlang 101
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)
 
SignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer OptimizationSignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer Optimization
 
High Performance Systems in Go - GopherCon 2014
High Performance Systems in Go - GopherCon 2014High Performance Systems in Go - GopherCon 2014
High Performance Systems in Go - GopherCon 2014
 
HBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at Xiaomi
 
OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017
 
Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)
 
Applied cryptanalysis - everything else
Applied cryptanalysis - everything elseApplied cryptanalysis - everything else
Applied cryptanalysis - everything else
 
Graylog2 (MongoBerlin/MongoHamburg 2010)
Graylog2 (MongoBerlin/MongoHamburg 2010)Graylog2 (MongoBerlin/MongoHamburg 2010)
Graylog2 (MongoBerlin/MongoHamburg 2010)
 
0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with Erlang
 
Applied cryptanalysis - stream ciphers
Applied cryptanalysis - stream ciphersApplied cryptanalysis - stream ciphers
Applied cryptanalysis - stream ciphers
 
NPF scripting with Lua by Lourival Vieira Neto
NPF scripting with Lua by Lourival Vieira NetoNPF scripting with Lua by Lourival Vieira Neto
NPF scripting with Lua by Lourival Vieira Neto
 
Introduction to OpenHFT for Melbourne Java Users Group
Introduction to OpenHFT for Melbourne Java Users GroupIntroduction to OpenHFT for Melbourne Java Users Group
Introduction to OpenHFT for Melbourne Java Users Group
 
Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipc
 
Determinism in finance
Determinism in financeDeterminism in finance
Determinism in finance
 
GC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconGC free coding in @Java presented @Geecon
GC free coding in @Java presented @Geecon
 
Profiling with Devel::NYTProf
Profiling with Devel::NYTProfProfiling with Devel::NYTProf
Profiling with Devel::NYTProf
 
Deterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systemsDeterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systems
 
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietachPLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
 

En vedette

New solver for FreeBSD pkg
New solver for FreeBSD pkgNew solver for FreeBSD pkg
New solver for FreeBSD pkgVsevolod Stakhov
 
Dovecot & Postfix バージョンアップ動向 201401-201505
Dovecot & Postfix バージョンアップ動向 201401-201505Dovecot & Postfix バージョンアップ動向 201401-201505
Dovecot & Postfix バージョンアップ動向 201401-201505Narimichi Takamura
 
Pkg slides from BSDCan conference
Pkg slides from BSDCan conferencePkg slides from BSDCan conference
Pkg slides from BSDCan conferenceVsevolod Stakhov
 
Linux Based Mail Server
Linux Based Mail ServerLinux Based Mail Server
Linux Based Mail ServerBernics Gábor
 

En vedette (7)

ast-rspamd
ast-rspamdast-rspamd
ast-rspamd
 
Rspamd symbols
Rspamd symbolsRspamd symbols
Rspamd symbols
 
Rspamd testing
Rspamd testingRspamd testing
Rspamd testing
 
New solver for FreeBSD pkg
New solver for FreeBSD pkgNew solver for FreeBSD pkg
New solver for FreeBSD pkg
 
Dovecot & Postfix バージョンアップ動向 201401-201505
Dovecot & Postfix バージョンアップ動向 201401-201505Dovecot & Postfix バージョンアップ動向 201401-201505
Dovecot & Postfix バージョンアップ動向 201401-201505
 
Pkg slides from BSDCan conference
Pkg slides from BSDCan conferencePkg slides from BSDCan conference
Pkg slides from BSDCan conference
 
Linux Based Mail Server
Linux Based Mail ServerLinux Based Mail Server
Linux Based Mail Server
 

Similaire à rspamd-hyperscan

Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMfnothaft
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAMfnothaft
 
20140406 loa days-tdd-with_puppet_tutorial
20140406 loa days-tdd-with_puppet_tutorial20140406 loa days-tdd-with_puppet_tutorial
20140406 loa days-tdd-with_puppet_tutorialgarrett honeycutt
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMfnothaft
 
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"Fwdays
 
Fast Variant Calling with ADAM and avocado
Fast Variant Calling with ADAM and avocadoFast Variant Calling with ADAM and avocado
Fast Variant Calling with ADAM and avocadofnothaft
 
BayFP: Concurrent and Multicore Haskell
BayFP: Concurrent and Multicore HaskellBayFP: Concurrent and Multicore Haskell
BayFP: Concurrent and Multicore HaskellBryan O'Sullivan
 
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Spark Summit
 
Atlanta Hadoop Users Meetup 09 21 2016
Atlanta Hadoop Users Meetup 09 21 2016Atlanta Hadoop Users Meetup 09 21 2016
Atlanta Hadoop Users Meetup 09 21 2016Chris Fregly
 
Generating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in juliaGenerating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in juliaAndre Pemmelaar
 
Seven deadly sins of ElasticSearch Benchmarking
Seven deadly sins of ElasticSearch BenchmarkingSeven deadly sins of ElasticSearch Benchmarking
Seven deadly sins of ElasticSearch BenchmarkingFan Robbin
 
Encode, tag, realize high precision text editing
Encode, tag, realize high precision text editingEncode, tag, realize high precision text editing
Encode, tag, realize high precision text editingtaeseon ryu
 
Максим Харченко. Erlang lincx
Максим Харченко. Erlang lincxМаксим Харченко. Erlang lincx
Максим Харченко. Erlang lincxAlina Dolgikh
 
0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with ErlangMaxim Kharchenko
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightAshish Thapliyal
 
PHP Performance with APC + Memcached
PHP Performance with APC + MemcachedPHP Performance with APC + Memcached
PHP Performance with APC + MemcachedFord AntiTrust
 

Similaire à rspamd-hyperscan (20)

Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAM
 
20140406 loa days-tdd-with_puppet_tutorial
20140406 loa days-tdd-with_puppet_tutorial20140406 loa days-tdd-with_puppet_tutorial
20140406 loa days-tdd-with_puppet_tutorial
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
 
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
 
Fast Variant Calling with ADAM and avocado
Fast Variant Calling with ADAM and avocadoFast Variant Calling with ADAM and avocado
Fast Variant Calling with ADAM and avocado
 
BayFP: Concurrent and Multicore Haskell
BayFP: Concurrent and Multicore HaskellBayFP: Concurrent and Multicore Haskell
BayFP: Concurrent and Multicore Haskell
 
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
 
Designing REST APIs
Designing REST APIsDesigning REST APIs
Designing REST APIs
 
Asynchronous apex
Asynchronous apexAsynchronous apex
Asynchronous apex
 
Atlanta Hadoop Users Meetup 09 21 2016
Atlanta Hadoop Users Meetup 09 21 2016Atlanta Hadoop Users Meetup 09 21 2016
Atlanta Hadoop Users Meetup 09 21 2016
 
Generating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in juliaGenerating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in julia
 
Seven deadly sins of ElasticSearch Benchmarking
Seven deadly sins of ElasticSearch BenchmarkingSeven deadly sins of ElasticSearch Benchmarking
Seven deadly sins of ElasticSearch Benchmarking
 
Encode, tag, realize high precision text editing
Encode, tag, realize high precision text editingEncode, tag, realize high precision text editing
Encode, tag, realize high precision text editing
 
Максим Харченко. Erlang lincx
Максим Харченко. Erlang lincxМаксим Харченко. Erlang lincx
Максим Харченко. Erlang lincx
 
0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with Erlang
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
 
XSEDE15_PhastaGateway
XSEDE15_PhastaGatewayXSEDE15_PhastaGateway
XSEDE15_PhastaGateway
 
Regexes in .NET
Regexes in .NETRegexes in .NET
Regexes in .NET
 
PHP Performance with APC + Memcached
PHP Performance with APC + MemcachedPHP Performance with APC + Memcached
PHP Performance with APC + Memcached
 

rspamd-hyperscan