Scaling Big Data Mining Infrastructure:
The Smart Protection Network Experience
黃振修 (Chris Huang)
Architect, SPN Proactive Cloud Threat Interception Technology
About Me
• Architect, SPN proactive cloud threat interception technology
• Architect, SPN Hadoop computing infrastructure
• Speaker, Hadoop in Taiwan 2013
• Active member, Hadoop.TW
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 2
The Journey to Big Data
Yesterday
~40 Hadoop nodes
~15 Service/user accounts
3 Teams
<50 TB storage
<100 Jobs per day
Today
~200 Hadoop nodes
~130 Service/user accounts
11 Teams
~500 TB storage
>16000 Jobs per day
1 MapReduce job submitted every 5.4 seconds
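The 5.4-second figure is consistent with the "Today" job count if submissions are spread evenly over a 24-hour day. A minimal sketch of that arithmetic follows. The even spread is an assumption, the function name is illustrative, and the slide gives the count only as a lower bound (>16000).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_between_jobs(jobs_per_day: int) -> float:
    """Average gap between submissions, assuming jobs arrive evenly over a day."""
    if jobs_per_day <= 0:
        raise ValueError("jobs_per_day must be positive")
    return SECONDS_PER_DAY / jobs_per_day


if __name__ == "__main__":
    # "Today": more than 16,000 jobs per day, so the gap is at most this value.
    print(seconds_between_jobs(16_000))
    # "Yesterday": fewer than 100 jobs per day, so the gap was at least this value.
    print(seconds_between_jobs(100))
```

On the same assumption, the "Yesterday" cluster saw at most one job roughly every 14 minutes.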
Why?
Raw Data
Actionable Intelligence
Collaboration in the underground
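Network threats are growing explosively
New Unique Malware Discovered
Virus variants of every kind, spam, downloads from unknown sources, and other Internet-borne threats evade detection by traditional security systems and keep growing explosively, posing a serious information security threat.
1M unique malware samples every month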
Reality Check
[Figure: three panels titled "Skyrocketing Volume", "Avoiding Detection", and "Dangerous Risks"]
• New Unique Threats per Hour (worldwide estimate*); year labels: 2007, 2008, 2009, 2010, 2011; values shown: 2400 and 12600; callout: new threat every 0.28 seconds
• Threats Found in Enterprises (real-world data from 150+ assessments*); categories: Network Worms, Data-Stealing Malware, IRC Bots, Targeting Malware; values shown: 42%, 56%, 77%, 100%; axis labels: COMPLEXITY, DANGER
• 52% of companies failed to report or remediate a cyber breach in 2011. --- SAIC, 2011
• Two new pieces of malware are created every second. --- Trend Micro, 2012
• A cyber intrusion occurs every 5 minutes. --- US CERT, 2012
The traditional approach is no longer sufficient!
Big Data Exploration
A new approach to cyber threat solutions
• Web Crawler
• Trend Micro Endpoint Protection
• Trend Micro Mail Protection
• Trend Micro Web Protection
• Honeypot
• CDN / xSP
• Researcher Intelligence
3+ Billion Worldwide Sensors
SPN: Smart Protection Network
Collects
Protects
Identifies
BIG DATA ANALYTICS (Data Mining, Machine Learning, Modeling, Correlation)
DAILY STATS:
• 7.2 TB data correlated
• 1B IP addresses
• 90K malicious threats identified
• 100+M good files
SPN High Level Architecture
[Architecture diagram, components as labelled]
• Data Sourcing: CDN/xSP Log, Honey Pot, SPN Feedback
• Receiver
• Trend Message Exchange (Message Bus)
• Hadoop Distributed File System (HDFS)
• HBase, MapReduce
• Ad hoc Query (Pig)
• Oozie
• MySPN Platform
• Service Delivery: Solr Cloud, API Server/Portal, Service Platform, APP 1, APP 2
MySPN Ecosystem
[Diagram labels: Portal & API (Single Entry-Point); SPN Infrastructure; APT KB Service; TopCVE Service; APT KB; VE DB; FB Logs; Census; MySPN Market Place; Service Platform; SSO; New App; OPS; RD / Team; Monitor; SDK; All My Guard; Threat Connect; Dashboard; Service Catalog; Census; Profile; Alert; New App; Dispatcher; Access; Login; Trender; Need Solution; Customer; Publish; Implement; Operate; Develop Solution; backed-by; Data Catalogue]
SPN Solution Architecture
[Diagram labels]
• File, URL, Web / URL, Email, Domain, IP
• File Reputation Service, Email Reputation Service, Web Reputation Service
• Customer, SmartProtection, Community Intelligence (Feedback loop)
• Sourcing, Processing & Analysis, Validate & Create Solution, Quality Assurance, Solution Distribution, Solution Adoption
• SPN Correlation
Big Data Case Study
Case Study: Web Reputation Services
200K+ new URLs created every day
[Diagram labels: Internet, Web Server, SPN Cloud; steps shown: 1. Intercept URL, 4. Access page]
8+ billion URLs processed daily
• Stages: User Traffic / Sourcing, CDN vendor, Rating Server for Known Threats, Unknown & Prefilter, Page Download, Threat Analysis
• Daily volume: 8 billion/day, then 4.8 billion/day (40% filtered), then 860 million/day (82% filtered), then 25,000 malicious URLs/day (99.98% filtered)
• Trend Micro Products / Technology: CDN Cache, High Throughput Web Service, Hadoop Cluster, Web Crawling, Machine Learning, Data Mining
• Technology, Process, Operation
Block a malicious URL within 15 minutes of it going online!
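The filter percentages on the Web Reputation Services funnel can be checked against the quoted daily volumes. The sketch below computes the share removed between consecutive counts. It assumes the four volumes are consecutive funnel stages in the order quoted, which the slide text suggests but does not state, and the function name is illustrative.

```python
def filtered_fraction(count_in: float, count_out: float) -> float:
    """Share of items removed between two consecutive funnel stages."""
    if count_in <= 0 or not 0 <= count_out <= count_in:
        raise ValueError("expected count_in > 0 and 0 <= count_out <= count_in")
    return 1.0 - count_out / count_in


# Daily volumes as quoted on the slide (URLs per day); the ordering is assumed.
DAILY_COUNTS = [8_000_000_000, 4_800_000_000, 860_000_000, 25_000]

if __name__ == "__main__":
    for before, after in zip(DAILY_COUNTS, DAILY_COUNTS[1:]):
        share = filtered_fraction(before, after)
        print(f"{before:>14,} -> {after:>14,}: {share:.3%} filtered")
```

The first two steps reproduce the slide's 40% and 82%. The last step comes out slightly above the quoted 99.98%, so that figure may be rounded or measured against a different stage count; the slide text does not say which.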
WRS Architecture Overview
Big Data Lessons Learned
How to Scale?
• Keep data unstructured at first
• If you really need structured data
– Use Google Protocol Buffers or
– a JSON string
• Clean your data before processing
• Leverage HBase more
– Design row keys carefully to prevent hot-spotting
• Use MapReduce to build Lucene indexes
• Leverage SolrCloud for complex real-time use cases
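Two of these points, cleaning records before processing and designing row keys that avoid hot-spots, can be sketched together. The example below is illustrative and not the SPN implementation. The JSON field names (`url`, `ts`), the bucket count, and the key layout are all assumptions. It uses key salting, one common technique: a bucket prefix derived from a hash of the URL, so that keys sharing a prefix do not all land on the same HBase region.

```python
import hashlib
import json
from typing import Optional

NUM_SALT_BUCKETS = 16  # hypothetical; often sized to the number of regions


def clean_record(raw_line: str) -> Optional[dict]:
    """Parse one JSON log line; return None for malformed or incomplete records."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    url, ts = record.get("url"), record.get("ts")
    if not isinstance(url, str) or not url.strip():
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(ts, bool) or not isinstance(ts, int) or ts < 0:
        return None
    return {"url": url.strip(), "ts": ts}


def salted_row_key(url: str, ts: int) -> bytes:
    """Build a row key with a deterministic hash-derived bucket prefix."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    bucket = digest[0] % NUM_SALT_BUCKETS
    # The same URL always maps to the same bucket, so readers can recompute it.
    return f"{bucket:02d}|{url}|{ts:013d}".encode("utf-8")


if __name__ == "__main__":
    lines = [
        '{"url": "http://example.com/a", "ts": 1378771200000}',
        "not json",
        '{"url": "", "ts": 1}',
    ]
    for line in lines:
        record = clean_record(line)
        if record is not None:
            print(salted_row_key(record["url"], record["ts"]))
```

Deriving the bucket from the URL rather than at random keeps point lookups cheap. The trade-off is that a range scan across URLs has to fan out over every bucket.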
Our Learning
• Have a clear strategy first
• Start small, scale quickly
• Choose the right solution for the right problem
Q&A
Big Challenge
Big Opportunity
Thank You

