Practical Applications of Machine Learning in Cybersecurity

  1. Practical Applications of Machine Learning in Cybersecurity. Celeste Fralick, Ph.D., CQA, Senior Principal Engineer, Chief Data Scientist, Office of the CTO, McAfee. April 25, 2019. McAfee Confidential.
  2. The Analytics Hype-line (not to scale; loosely based on https://en.wikipedia.org/wiki/Timeline_of_machine_learning)
     • 1940: Predictive Analytics Emerge
     • 1956: AI Proposed by John McCarthy
     • 1958: Neural Networks Emerge by Frank Rosenblatt
     • 1969: Neural Networks Dismissed
     • 2001: Data Scientists Emerge
     • 2005: Big Data Emerges
     • 2011: Watson Makes AI Interesting Again
     • 2015: Neural Networks Acceptable
     • 2016: Machine Learning Solves Everything
     • 2018: AI = All Analytics
  3. Demystifying Analytic Terms
     • Structured data: Data that resides in a fixed field within a record or file, including relational databases and spreadsheets
     • Unstructured data: Data that is not organized in a pre-defined manner, including text-heavy docs & social media
     • Semi-structured data: Data that does not conform strictly to relational databases, but contains tags/markers to enable hierarchy
     • Reinforcement learning: Learning that maximizes rewards through exploration and exploitation of known environments (e.g., a baby learning to walk)
     Why do we care about these terms? They help in selecting models & features!
  4. Demystifying Analytic Terms: What’s a “Feature”?
     A feature is an individual measurable property or characteristic that enables the desired output.
     Example features (coffee): type of machine, age of machine, cleanliness of machine, temperature of water, type of water, brand of coffee, origin of coffee, grind of coffee, type of roast, organic coffee, mug or cup.
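As a rough illustration of the definition above, the sketch below turns a few of the coffee attributes into a numeric feature vector that a model could consume. The field names (`machine_age_years`, `water_temp_c`, `organic`, `roast`) and the roast levels are hypothetical choices made for this example, not something the slides specify.

```python
def encode_features(record, roast_levels=("light", "medium", "dark")):
    """Turn one raw coffee record into a numeric feature vector.

    Numeric attributes pass through as floats, the boolean becomes 0/1,
    and the categorical roast is one-hot encoded (hypothetical levels).
    """
    one_hot = [1.0 if record["roast"] == level else 0.0 for level in roast_levels]
    return [
        float(record["machine_age_years"]),
        float(record["water_temp_c"]),
        1.0 if record["organic"] else 0.0,
        *one_hot,
    ]


# Hypothetical record: each key is one measurable property, i.e. one feature.
sample = {"machine_age_years": 3, "water_temp_c": 93.5, "organic": True, "roast": "dark"}
vector = encode_features(sample)
print(vector)
```

Each position in the resulting list corresponds to one measurable property, which is what makes the record usable as model input.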
  5. Pyramid of Complexity and Intelligence in Analytics (top to bottom, in decreasing complexity & intelligence)
     • AI: Reason, logic, value judgments
     • Deep Learning: Complex, layered
     • Machine Learning: Trains & learns, patterns
     • Statistics: Models, summary stats
     • Architecture and Data Management: Data lineage, compute capability
     Example products: McAfee Investigator, McAfee ATD, Real Protect, Mobile Security
  6. The McAfee Analytic Ecosystem: ML/DL/AI Applications
     [Diagram: Cloud, McAfee Threat Research, On Premises, Security Operation Center, and Gateway, with ML, DL, and AI labels distributed across them]
     Via telemetry, threat analyses, and industry feeds, McAfee integrates expert analytics throughout the security ecosystem.
  7. The Process of “Learning”
  8. Risks in Analytic Development
     • Poor intelligence leads to bad business decisions
     • Unhappy customers, reduced ROI & ROA
     • Lack of growth and cash generation
     • Increased False Positives and False Negatives
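To make the last risk on this slide concrete, here is a minimal sketch of how false positive and false negative rates might be computed for a binary detector. The label convention (1 = malicious, 0 = benign) and the sample data are assumptions made for illustration only.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate) for binary labels.

    Assumed convention: 1 = malicious, 0 = benign.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    # Guard against division by zero when a class is absent from the sample.
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Illustrative data: four benign files and two malicious ones.
truth = [0, 0, 0, 0, 1, 1]
preds = [0, 1, 0, 0, 1, 0]
fpr, fnr = error_rates(truth, preds)
print(f"FPR={fpr:.2f} FNR={fnr:.2f}")
```

In a security setting the two rates carry different costs (alert fatigue versus missed threats), so they are usually tracked separately rather than folded into a single accuracy figure.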
  9. Examples of Specific Risks in Analytic Development (Risk: Description)
     • Bias: Statistical, human, ethics, intent
     • Adversarial Machine Learning: Evading or poisoning of training or test sets
     • Lack of Explainability (XAI): How are decisions made? Liability?
     • Citizen Data Scientists: Data + one model ≠ data science
     • Poor Scientific Protocol: Repeatable analytic development process
     • Data Decay: How long will the model last in the field? Implications of changes, periodic training?
  10. Why are there so many “citizen” data scientists?
     • “Sexy” title (HBR), LOTS of data
     • Demand for immediate business intelligence & action
     • Too many areas to learn
     • Too few data scientists
     • Ill-defined job role
     • “Easy to learn” mentality without underlying statistical fundamentals
     What a Data Scientist Needs to Know: Statistics, Math, SW/HW, Domain, Data Mgmt & Arch, System Engineering, Analytics
     Credits: CIO Journal (2014) and B. Marr (2016)
  11. Analytic Life Cycle (Waterfall)
     Phases: Exploration, Planning, Development, Production
     Steps: Define Usage Model & Problem Framing; State of Art Assessment; Analytic Risk Assessment; Define Requirements; Analytic Plan & Peer Review; Discover, develop & iterate analytics; Analytic Report & Peer Review; Verification & Validation; Post Production Release Analytic Review(s); Analytic Discontinuance
  12. Analytic Risk Assessment (Exploration phase)
     Identify, Quantify, Mitigate, and Learn Analytic Risks (also, use these questions to check your Data Scientist!)
     • Does the training sample represent the larger and final population? How do you know?
     • Is the sample balanced? If not, why not?
     • What is your expected compute footprint?
     • What 3-5 models will be attempted? What error rates will be compared?
     • How well will the proposed models explain the expected output? (Explainability)
     • How vulnerable are the algorithms to adversarial machine learning (AML)?
     • How often will the algorithm learn?
     • How will model drift be detected in the field?
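Two of the questions above (sample balance and drift detection) lend themselves to a quick numeric check. The sketch below is one possible approach, not the method used in the talk: it reports class shares for the balance question and uses the Population Stability Index (PSI), a technique swapped in here for illustration, to compare a feature's training-time distribution with what is seen in the field. The function names, bin count, and `eps` floor are all assumptions.

```python
import math
from collections import Counter


def class_balance(labels):
    """Return each class's share of the sample."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a training-time feature sample and a field sample.

    Bins are equal-width over the training range; field values outside
    that range are clamped into the first or last bin. eps avoids log(0).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Illustrative data: the field sample is the training sample shifted upward.
train = [float(x) for x in range(100)]
field = [x + 30.0 for x in train]
print(class_balance([0, 0, 0, 1]))
print(population_stability_index(train, train))   # identical samples
print(population_stability_index(train, field))   # shifted sample
```

PSI is zero for identical distributions and grows as the field distribution moves away from the training one; what threshold should trigger retraining is a judgment call that depends on the model and the feature.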
  13. Analytic Life Cycle (Agile)
     Steps: Define Usage Model & Problem Framing; State of Art Assessment; Analytic Risk Assessment; Define Requirements; Analytic Plan & Peer Review; Discover, develop & iterate analytics; Analytic Report & Peer Review; Validation & Verification; Post Production Release Analytic Review(s); Analytic Discontinuance
  14. Validation & Verification
     Validation: Have you done the RIGHT analytic?
     • Trace back to customer use case and contract
     • e.g., causal relationships, flow charts, visuals, graphs
     Verification: Have you done the analytic RIGHT?
     • Verify the mathematics and model fit
     • e.g., ROC, RMSE, R², confidence limits
     ROC image: https://commons.wikimedia.org/wiki/File%3ARoccurves.png
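The verification metrics named above can be sketched in a few lines. The following is a minimal standard-library illustration of RMSE, R², and the area under the ROC curve; the function names and sample data are chosen for this example, and the AUC uses the pairwise-ranking formulation (ties count half), which is quadratic in sample size and meant only for small illustrative inputs.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def roc_auc(labels, scores):
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative, with ties counted as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative regression and classification data.
print(rmse([3, 5, 7], [2, 5, 8]))
print(r_squared([3, 5, 7], [2, 5, 8]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

RMSE and R² apply to models that predict a continuous value, while ROC and its area summarize how well a scoring classifier separates the two classes across all thresholds.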
  15. Summary
     • Understand & mitigate the hype
     • Risks are inherent in Analytics
     • Utilize an Analytic Development Protocol
     • Perform an Analytic Risk Assessment
     • Validate & Verify!
  16. McAfee and the McAfee logo are trademarks or registered trademarks of McAfee, LLC or its subsidiaries in the U.S. and/or other countries. Other names and brands may be claimed as the property of others. Copyright © 2017 McAfee, LLC.