Big Data for Security
Ping @opendns.com
  
Umbrella Security Lab @OpenDNS
•  100+ sensors across 200+ countries
•  200 million unique registered domain names
•  40 million active users
•  50 billion daily DNS requests
  
The Platform
[Platform diagram; labels: HDFS, HBase; Kafka; Production; Storm; native MR; Python; Backup; Analytics; Pig; Hive; R]
 
Transaction View of DNS Lookups
[Diagram of lookups between client IPs (15.83.5.1, 128.13.18.67, 62.8.20.54, 154.1.32.15) and domains (usirk.ws, gpioegjrhsf.ws, dncdh.nl, pbqxdwwv.ws, hzkfooak.cn, jflyyruea.com, google.com, labs.kaspersky.com)]
  
DNS vs. Retail
–  Amazon’s Collaborative Filtering
–  Apriori algorithm (frequent item set mining)
  
Modeling Methodologies
•  Reasoning tech (generative, empirical, iterative, recursive …)
•  Data abstraction/representation (link graph, social graph …)
•  Behavior abstraction (random walk)
  
[Diagram: client IP and domain, each marked with “?”]
  
DNS transactions
•  The less a domain is visited by good clients, the higher the chance that it is bad
•  Two types of node: a node is either visiting or being visited, but never both
•  There are super nodes that link to millions of other nodes
•  Domains are classified as benign, malicious, or unknown

PageRank
•  The more a page is linked to by good pages, the higher it is ranked
•  One type of node: a node can have both inlinks and outlinks
•  Most nodes link to a limited number of other nodes
•  Pages are not classified
  
DNS transactions
•  Domains visited by more good visitors are ranked high (inlink)
–  Assign a “positive” initial value
•  Visitors visiting more good domains are ranked high (outlink)
–  Assign a “positive” initial value
•  Linkage matrix N×M (N being the total number of domains, M being the total number of IPs)
•  Potentially, we can consider query count as linkage weight

PageRank
•  Damping factor (the user gets bored)
•  Random sinks and cycles
•  PageRank values are numbers between 0 and 1 and sum to one in total
•  Linkage matrix N×N (N being the total number of pages)
  
Recursive definition

\[ r(dn)_{t+1} = \sum_{ip \text{ visiting } dn} \frac{r(ip)_t}{L(ip)} \]

for all ips visiting domain dn
 --- r(ip)_t: the rank for ip at time t
 --- L(ip): the total number of domains ip connects to (in a certain time window)

\[ r(ip)_{t+1} = \sum_{dn \text{ visited by } ip} \frac{r(dn)_t}{L(dn)} \]

for all domains dn visited by ip
 --- r(dn)_t: the rank for dn at time t
 --- L(dn): the total number of ips visiting domain dn (does not vary with time)

The denominator gives the marginal (the sum of the counts of the conditioning variable co-occurring with anything else).
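A minimal Python sketch of one synchronous update under the recursive definition, where each domain's new rank is the sum of r(ip)/L(ip) over its visitors and each IP's new rank is the sum of r(dn)/L(dn) over the domains it visits. The function name `update_ranks`, the dictionary layout, and the toy link data are assumptions for illustration; none of them come from the slides.

```python
from collections import defaultdict


def update_ranks(ip_to_domains, ip_rank, domain_rank):
    """Return (new_ip_rank, new_domain_rank) computed from the ranks at time t.

    ip_to_domains maps each client IP to the set of domains it queried.
    """
    # Invert the link lists: for each domain, the IPs that visit it.
    domain_to_ips = defaultdict(set)
    for ip, domains in ip_to_domains.items():
        for dn in domains:
            domain_to_ips[dn].add(ip)

    # r(dn)_{t+1} = sum over visiting ips of r(ip)_t / L(ip)
    new_domain_rank = {
        dn: sum(ip_rank[ip] / len(ip_to_domains[ip]) for ip in ips)
        for dn, ips in domain_to_ips.items()
    }
    # r(ip)_{t+1} = sum over visited domains of r(dn)_t / L(dn)
    new_ip_rank = {
        ip: sum(domain_rank[dn] / len(domain_to_ips[dn]) for dn in domains)
        for ip, domains in ip_to_domains.items()
    }
    return new_ip_rank, new_domain_rank


# Toy data (hypothetical): ip1 visits d1 and d2, ip2 visits only d2.
links = {"ip1": {"d1", "d2"}, "ip2": {"d2"}}
ip_rank = {"ip1": 1.0, "ip2": 1.0}
domain_rank = {"d1": 1.0, "d2": 1.0}
new_ip_rank, new_domain_rank = update_ranks(links, ip_rank, domain_rank)
print(new_domain_rank["d1"], new_domain_rank["d2"])  # 0.5 1.5
print(new_ip_rank["ip1"], new_ip_rank["ip2"])        # 1.5 0.5
```

Both updates read only the ranks from time t, so the order in which the two dictionaries are computed does not matter.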
Tasks
•  Recursive definition
•  Build linkage matrix
•  Initialization
•  Iterating
•  Test for convergence
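The five tasks can be strung together in a small driver loop. This is a sketch under stated assumptions: the slides do not give the stopping rule, so the largest-absolute-change test, the tolerance `tol`, the cap `max_iters`, the uniform starting value `init`, and the name `secure_rank` are all hypothetical choices.

```python
def secure_rank(ip_to_domains, init=1.0, tol=1e-9, max_iters=100):
    """Return (ip_rank, domain_rank, iterations_run, converged)."""
    # Build linkage: ip -> domains is given, derive domain -> ips.
    domain_to_ips = {}
    for ip, domains in ip_to_domains.items():
        for dn in domains:
            domain_to_ips.setdefault(dn, set()).add(ip)

    # Initialization: every node starts from the same value (an assumption).
    ip_rank = {ip: init for ip in ip_to_domains}
    dn_rank = {dn: init for dn in domain_to_ips}

    for iteration in range(1, max_iters + 1):
        # Iterating: apply the recursive definition to the ranks from step t.
        new_dn = {dn: sum(ip_rank[ip] / len(ip_to_domains[ip]) for ip in ips)
                  for dn, ips in domain_to_ips.items()}
        new_ip = {ip: sum(dn_rank[dn] / len(domain_to_ips[dn]) for dn in dns)
                  for ip, dns in ip_to_domains.items()}

        # Test for convergence: largest absolute change across all nodes.
        delta = max(
            max((abs(new_dn[k] - dn_rank[k]) for k in dn_rank), default=0.0),
            max((abs(new_ip[k] - ip_rank[k]) for k in ip_rank), default=0.0),
        )
        ip_rank, dn_rank = new_ip, new_dn
        if delta < tol:
            return ip_rank, dn_rank, iteration, True
    return ip_rank, dn_rank, max_iters, False


# Toy data (hypothetical): two IPs and two domains, so total rank is balanced.
links = {"ip1": {"d1", "d2"}, "ip2": {"d2"}}
ip_rank, dn_rank, iterations, converged = secure_rank(links)
print(converged, round(dn_rank["d1"], 6), round(dn_rank["d2"], 6))
```

One caveat about this sketch: when every IP has at least one link, the total domain rank at step t+1 equals the total IP rank at step t. If the two totals differ at initialization the values alternate instead of settling, and the loop returns with `converged` set to `False`. The toy data above starts with equal totals.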
  	
  
Link analysis – build sparse linkage matrix (row-wise)

input
query log (each entry: client ip to hostnames)

output
dn -> ip  ip  ip  ip
ip -> dn  dn  dn  dn

//STRIPE DESIGN

//map job: parse query entry, filter bad hostnames, convert hostname to domain
emit [key(domain), value(ip)]
emit [key(ip), value(domain)]

//reduce job:
emit [key(domain), value(ip  ip  ip)]
emit [key(ip), value(domain  domain  domain)]
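The map and reduce jobs above can be imitated in memory to see what the link lists look like. This is an illustrative sketch, not the Hadoop job from the talk: the `(ip, hostname)` tuple format of the log, the helper `to_domain` (which naively keeps the last two labels of a hostname), and the sample entries are all assumptions.

```python
from collections import defaultdict


def to_domain(hostname):
    """Hypothetical helper: keep the last two labels, e.g. www.example.com -> example.com."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2 or not all(labels):
        return None  # treated as a bad hostname and filtered out
    return ".".join(labels[-2:])


def map_entry(ip, hostname):
    """Map step: emit (domain, ip) and (ip, domain) for one query log entry."""
    domain = to_domain(hostname)
    if domain is None:
        return
    yield domain, ip
    yield ip, domain


def reduce_links(pairs):
    """Reduce step: collect every value seen for a key into one link list."""
    links = defaultdict(set)
    for key, value in pairs:
        links[key].add(value)
    return {key: sorted(values) for key, values in links.items()}


# Made-up log entries using documentation-reserved addresses and names.
query_log = [
    ("192.0.2.1", "www.example.com"),
    ("192.0.2.1", "example.org"),
    ("192.0.2.2", "example.com"),
    ("192.0.2.2", "localhost"),  # single label, filtered as a bad hostname
]
pairs = [kv for ip, host in query_log for kv in map_entry(ip, host)]
link_lists = reduce_links(pairs)
print(link_lists["example.com"])  # ['192.0.2.1', '192.0.2.2']
print(link_lists["192.0.2.1"])    # ['example.com', 'example.org']
```

Using a set per key removes repeated queries; keeping a count instead would give the query-count weighting the slides mention as a possibility.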
  
	
  
Iterating
–  MapReduce iteration #n

map
–  input
key(domain), value(pagerank  ip  ip  ip)
or key(ip), value(pagerank  dn  dn  dn  dn)
–  output
key(ip/domain), value(x = pagerank/linklist.size())

reduce
–  input
key(domain/ip), values(x)  //x as defined above
key(domain/ip), value(x  ip  ip ... ip)
–  output
key(domain/ip), value(Σx  ip  ip  ip)
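One iteration in this map/reduce form can be sketched in plain Python, with a dictionary standing in for the shuffle phase. The record layout `(rank, links)`, the `"rank"`/`"links"` tags, the function names, and the toy records are assumptions made for the sketch; a real Hadoop job would read and write serialized records instead.

```python
from collections import defaultdict


def map_node(key, rank, links):
    """Map: pass the link list through and send rank/len(links) to each neighbour."""
    yield key, ("links", links)
    for target in links:
        yield target, ("rank", rank / len(links))


def reduce_node(values):
    """Reduce: sum the incoming contributions and reattach the link list."""
    links, total = [], 0.0
    for tag, payload in values:
        if tag == "links":
            links = payload
        else:
            total += payload
    return total, links


def run_iteration(records):
    """In-memory stand-in for one MapReduce pass: map, group by key, reduce."""
    shuffled = defaultdict(list)
    for key, (rank, links) in records.items():
        for out_key, out_value in map_node(key, rank, links):
            shuffled[out_key].append(out_value)
    return {key: reduce_node(values) for key, values in shuffled.items()}


# Toy records (hypothetical): key -> (current rank, link list).
records = {
    "ip1": (1.0, ["d1", "d2"]),
    "ip2": (1.0, ["d2"]),
    "d1": (1.0, ["ip1"]),
    "d2": (1.0, ["ip1", "ip2"]),
}
updated = run_iteration(records)
print(updated["d1"])  # (0.5, ['ip1'])
print(updated["d2"])  # (1.5, ['ip1', 'ip2'])
```

Emitting the link list alongside the rank contributions is what lets the reducer output serve directly as the next iteration's input.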
  
	
  	
  
Hadoop Implementation
•  MapReduce job #1
–  Building link lists
•  Iterate MapReduce job #2
–  Security ranking
•  MapReduce job #3
–  Sorting
  
Hadoop Job 1 – linkage creation, domain (or ip) mappings

Mapper
input: query log
output:
key       value
Domain    IP
IP        Domain

Reducer
output:
key       value (rank, previous rank, links)
IP        1.0, 1.0, d, d, d, d
Domain    1.0, 1.0, ip, ip, ip, ip
  
Hadoop Job 2 – Security Ranking (SR)

Mapper
input:
key    value
IP1    2.3, 1.0, d1, d2, d3
IP2    -9.5, 1.0, d1, d3
d1     24, 1.0, IP1, IP2

output:
key    value
d1     “rank” 2.3/(num_of_links=3)
d1     “rank” -9.5/(num_of_links=2)
d2     “rank” 2.3/(num_of_links=3)
d3     “rank” 2.3/(num_of_links=3)
d3     “rank” -9.5/(num_of_links=2)
IP1    “links” 2.3, 1.0, d1, d2, d3
IP2    “links” -9.5, 1.0, d1, d3

Updating security rank
SR = Σ SRi/K, for each outlink, K being the number of outlinks of entity i

Reducer
output:
key    value
d1     2.3/3 + -9.5/2, 24, IP1, IP2
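Worked arithmetic for the Job 2 example, as a check on the reducer output. The value for d1 is the sum shown on the slide; the values for d2 and d3 are not shown on the slide and are derived here from the mapper output records by the same rule.

```latex
\[
SR(d_1) = \frac{2.3}{3} + \frac{-9.5}{2} \approx 0.767 - 4.750 = -3.983
\]
\[
SR(d_2) = \frac{2.3}{3} \approx 0.767,
\qquad
SR(d_3) = \frac{2.3}{3} + \frac{-9.5}{2} \approx -3.983
\]
```

The large negative rank of IP2 pulls d1 and d3 below zero, while d2, visited only by IP1, stays positive.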
  
Risks/Issues
•  Behavior changes. A machine can be infected at any minute. Is a day or an hour a good window to measure the “cleanness” of a client?
•  Noise
•  Each individual source is one client IP, not necessarily one user or machine (e.g., school Wi-Fi, where no consistent client visiting behavior can be obtained). Are these IPs introducing noise, or are they the ones bringing in the most likely malicious connections?
•  Massive detection: is it massive FP (false positives)?
  
Take-away
•  Graph-based discovery
•  Take a different view of your data
•  Machine learning at a different scale
  


Securerank ping-opendns
