SlideShare a Scribd company logo
1 of 31
Securing & Personalizing Commerce

             Using Identity Data Mining




                              Jonathan LeBlanc
                 Developer Evangelist (PayPal)
              Github: http://github.com/jcleblanc
                              Twitter: @jcleblanc
The Problem
Commerce Relies on Static Data Contributions
Premise



 You can determine the personality profile of
 a person based on their usage habits


 Personalization == Security
Technology was the Solution!
Then I Read This…



               Us & Them
               The Science of Identity
               By David Berreby
The Different States of Knowledge


 What a person knows


 What a person knows they don’t know


 What a person doesn’t know they don’t know
Technology was NOT the Solution



   Identity and discovery are
   NOT a technology solution
Our Subject Material
Our Subject Material

        HTML content is poorly structured


        You can’t trust that anything
        semantically valid will be present


        There are some pretty bad web
        practices on the interwebz
How We’ll Capture This Data




             Start with base linguistics

             Extend with available extras
The Basic Pieces




  Page Data   Keywords      Weighting
   Scrapey    Without all   Word diets
   Scrapey     the fluff      FTW
Capture Raw Page Data


             Semantic data on the web
             is sucktastic

             Assume 5 year olds built
             the sites
             Language is the key
Extract Keywords



              We now have a big jumble
              of words. Let’s extract

              Why is “and” a top word?
              Stop words = sad panda
Weight Keywords


             All content is not created
             equal

             Meta and headers and
             semantics oh my!

             This is where we leech
             off the work of others
Questions to Keep in Mind

   Should I use regex to parse web
   content?

    How do users interact with page
    content?

   What key identifiers can be monitored
   to detect interest?
Fetching the Data: The Request

The Simple Way

  $html = file_get_contents('URL');


The Controlled Way

  $c = curl_init('URL');
Fetching the Data: cURL
 $req = curl_init($url);

 $options = array(
    CURLOPT_URL => $url,
    CURLOPT_HEADER => $header,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_AUTOREFERER => true,
    CURLOPT_TIMEOUT => 15,
    CURLOPT_MAXREDIRS => 10
 );

 curl_setopt_array($req, $options);
//list of findable / replaceable string characters
$find = array('/r/', '/n/', '/ss+/'); $replace = array(' ', ' ', ' ');

//perform page content modification
$mod_content = preg_replace('#<script(.*?)>(.*?)</
      script>#is', '', $page_content);
$mod_content = preg_replace('#<style(.*?)>(.*?)</
      style>#is', '', $mod_content);

$mod_content = strip_tags($mod_content);
$mod_content = strtolower($mod_content);
$mod_content = preg_replace($find, $replace, $mod_content);
$mod_content = trim($mod_content);
$mod_content = explode(' ', $mod_content);

natcasesort($mod_content);
//set up list of stop words and the final found stopped list
$common_words = array('a', ..., 'zero');
$searched_words = array();

//extract list of keywords with number of occurrences
foreach($mod_content as $word) {
   $word = trim($word);
   if (preg_match('/[^a-zA-Z]/', $word) == 1){ $word = ''; }
   if(strlen($word) > 2 && !in_array($word, $common_words)){
       $searched_words[$word]++;
   }
}

arsort($searched_words, SORT_NUMERIC);
Scraping Site Meta Data



 //load scraped page data as a valid DOM document
 $dom = new DOMDocument();
 @$dom->loadHTML($page_content);

 //scrape title
 $title = $dom->getElementsByTagName("title");
 $title = $title->item(0)->nodeValue;
//loop through all found meta tags
$metas = $dom->getElementsByTagName("meta");

for ($i = 0; $i < $metas->length; $i++){
  $meta = $metas->item($i);
  if($meta->getAttribute("property")){
    if ($meta->getAttribute("property") == "og:description"){
      $dataReturn["description"] = $meta->getAttribute("content");
    }
  } else {
    if($meta->getAttribute("name") == "description"){
      $dataReturn["description"] = $meta->getAttribute("content");
    } else if($meta->getAttribute("name") == "keywords”){
      $dataReturn[”keywords"] = $meta->getAttribute("content");
    }
  }
}
Weighting Important Data


              Tags you should care
              about: meta (include OG),
              title, description, h1+,
              header

              Bonus points for adding in
              content location modifiers
Weighting Important Tags


//our keyword weights
$weights = array("keywords"   => "3.0",
                 "meta"       => "2.0",
                 "header1"    => "1.5",
                 "header2"    => "1.2");

//add modifier here
if(strlen($word) > 2 && !in_array($word, $common_words)){
   $searched_words[$word]++;
}
Expanding to Phrases


            2-3 adjacent words, making
            up a direct relevant callout

            Seems easy right? Just like
            single words

            Language gets wonky
            without stop words
Working with Unknown Users



            The majority of users won’t
            be immediately targetable

            Use HTML5 LocalStorage &
            Cookie backup
Adding in Time Interactions

             Interaction with a site does
             not necessarily mean
             interest in it

             Time needs to also include
             an interaction component

             Gift buying seasons see
             interest variations
Grouping Using Commonality




                  Common
                  Interests
      Interests               Interests
      User A                    User B
Thank You! Questions?
    www.slideshare.com/jcleblanc




                        Jonathan LeBlanc
           Developer Evangelist (PayPal)
        Github: http://github.com/jcleblanc
                        Twitter: @jcleblanc

More Related Content

What's hot

JSON-LD: Linked Data for Web Apps
JSON-LD: Linked Data for Web AppsJSON-LD: Linked Data for Web Apps
JSON-LD: Linked Data for Web AppsGregg Kellogg
 
Spiffy Applications With JavaScript
Spiffy Applications With JavaScriptSpiffy Applications With JavaScript
Spiffy Applications With JavaScriptMark Casias
 
Dream House Project Presentation
Dream House Project PresentationDream House Project Presentation
Dream House Project Presentationjongosling
 
JSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataJSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataGregg Kellogg
 
Contacto server API in PHP
Contacto server API in PHPContacto server API in PHP
Contacto server API in PHPHem Shrestha
 
Elastify you application: from SQL to NoSQL in less than one hour!
Elastify you application: from SQL to NoSQL in less than one hour!Elastify you application: from SQL to NoSQL in less than one hour!
Elastify you application: from SQL to NoSQL in less than one hour!David Pilato
 
Why Python Web Frameworks Are Changing the Web
Why Python Web Frameworks Are Changing the WebWhy Python Web Frameworks Are Changing the Web
Why Python Web Frameworks Are Changing the Webjoelburton
 
Hi5 Opensocial Code Lab Presentation
Hi5 Opensocial Code Lab PresentationHi5 Opensocial Code Lab Presentation
Hi5 Opensocial Code Lab Presentationplindner
 
20180424 #18 we_are_javascripters
20180424 #18 we_are_javascripters20180424 #18 we_are_javascripters
20180424 #18 we_are_javascripters将一 深見
 
nodum.io MongoDB Meetup (Dutch)
nodum.io MongoDB Meetup (Dutch)nodum.io MongoDB Meetup (Dutch)
nodum.io MongoDB Meetup (Dutch)Wietse Wind
 
Introduction to jQuery
Introduction to jQueryIntroduction to jQuery
Introduction to jQueryGunjan Kumar
 
Findability Bliss Through Web Standards
Findability Bliss Through Web StandardsFindability Bliss Through Web Standards
Findability Bliss Through Web StandardsAarron Walter
 
jQuery Presentation
jQuery PresentationjQuery Presentation
jQuery PresentationRod Johnson
 
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...Ícaro Medeiros
 
So cal0365productivitygroup feb2019
So cal0365productivitygroup feb2019So cal0365productivitygroup feb2019
So cal0365productivitygroup feb2019RonRohlfs1
 

What's hot (20)

JSON-LD: Linked Data for Web Apps
JSON-LD: Linked Data for Web AppsJSON-LD: Linked Data for Web Apps
JSON-LD: Linked Data for Web Apps
 
HTML5 Essentials
HTML5 EssentialsHTML5 Essentials
HTML5 Essentials
 
Spiffy Applications With JavaScript
Spiffy Applications With JavaScriptSpiffy Applications With JavaScript
Spiffy Applications With JavaScript
 
Dream House Project Presentation
Dream House Project PresentationDream House Project Presentation
Dream House Project Presentation
 
JSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataJSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked Data
 
Contacto server API in PHP
Contacto server API in PHPContacto server API in PHP
Contacto server API in PHP
 
Elastify you application: from SQL to NoSQL in less than one hour!
Elastify you application: from SQL to NoSQL in less than one hour!Elastify you application: from SQL to NoSQL in less than one hour!
Elastify you application: from SQL to NoSQL in less than one hour!
 
Why Python Web Frameworks Are Changing the Web
Why Python Web Frameworks Are Changing the WebWhy Python Web Frameworks Are Changing the Web
Why Python Web Frameworks Are Changing the Web
 
Hi5 Opensocial Code Lab Presentation
Hi5 Opensocial Code Lab PresentationHi5 Opensocial Code Lab Presentation
Hi5 Opensocial Code Lab Presentation
 
20180424 #18 we_are_javascripters
20180424 #18 we_are_javascripters20180424 #18 we_are_javascripters
20180424 #18 we_are_javascripters
 
nodum.io MongoDB Meetup (Dutch)
nodum.io MongoDB Meetup (Dutch)nodum.io MongoDB Meetup (Dutch)
nodum.io MongoDB Meetup (Dutch)
 
jQuery Best Practice
jQuery Best Practice jQuery Best Practice
jQuery Best Practice
 
Introduction to jQuery
Introduction to jQueryIntroduction to jQuery
Introduction to jQuery
 
Findability Bliss Through Web Standards
Findability Bliss Through Web StandardsFindability Bliss Through Web Standards
Findability Bliss Through Web Standards
 
jQuery Presentation
jQuery PresentationjQuery Presentation
jQuery Presentation
 
jQuery
jQueryjQuery
jQuery
 
Google Dorks
Google DorksGoogle Dorks
Google Dorks
 
jQuery
jQueryjQuery
jQuery
 
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...
 
So cal0365productivitygroup feb2019
So cal0365productivitygroup feb2019So cal0365productivitygroup feb2019
So cal0365productivitygroup feb2019
 

Viewers also liked

Browser MVC with YQL and YUI
Browser MVC with YQL and YUIBrowser MVC with YQL and YUI
Browser MVC with YQL and YUIJonathan LeBlanc
 
Rrv head & shoulders new
Rrv head & shoulders newRrv head & shoulders new
Rrv head & shoulders newBhupesh sahu
 
SXSWi 2012: Programming Social Applications
SXSWi 2012: Programming Social ApplicationsSXSWi 2012: Programming Social Applications
SXSWi 2012: Programming Social ApplicationsJonathan LeBlanc
 
Secure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication MediaSecure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication MediaJonathan LeBlanc
 
Modern API Security with JSON Web Tokens
Modern API Security with JSON Web TokensModern API Security with JSON Web Tokens
Modern API Security with JSON Web TokensJonathan LeBlanc
 

Viewers also liked (6)

Browser MVC with YQL and YUI
Browser MVC with YQL and YUIBrowser MVC with YQL and YUI
Browser MVC with YQL and YUI
 
2011 HackU UCSD
2011 HackU UCSD2011 HackU UCSD
2011 HackU UCSD
 
Rrv head & shoulders new
Rrv head & shoulders newRrv head & shoulders new
Rrv head & shoulders new
 
SXSWi 2012: Programming Social Applications
SXSWi 2012: Programming Social ApplicationsSXSWi 2012: Programming Social Applications
SXSWi 2012: Programming Social Applications
 
Secure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication MediaSecure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication Media
 
Modern API Security with JSON Web Tokens
Modern API Security with JSON Web TokensModern API Security with JSON Web Tokens
Modern API Security with JSON Web Tokens
 

Similar to Securing and Personalizing Commerce Using Identity Data Mining

Integrating Government Data New
Integrating Government Data NewIntegrating Government Data New
Integrating Government Data Newguest4543bb
 
Introduction to Linked Data
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked DataJuan Sequeda
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and RetrievalOptum
 
Learning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client DevelopersLearning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client DevelopersKathy Brown
 
Some thoughts on social tagging
Some thoughts on social taggingSome thoughts on social tagging
Some thoughts on social taggingmarti_hearst
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2Sara Hooker
 
You Don't Know SEO
You Don't Know SEOYou Don't Know SEO
You Don't Know SEOMichael King
 
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments' SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments' Distilled
 
Semantic Web: A web that is not the Web
Semantic Web: A web that is not the WebSemantic Web: A web that is not the Web
Semantic Web: A web that is not the WebBruce Esrig
 
Outranking.io Summit Entity Analysis In SEO
Outranking.io Summit Entity Analysis In SEOOutranking.io Summit Entity Analysis In SEO
Outranking.io Summit Entity Analysis In SEODan Taylor
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search SystemTrey Grainger
 
Bearish SEO: Defining the User Experience for Google’s Panda Search Landscape
Bearish SEO: Defining the User Experience for Google’s Panda Search LandscapeBearish SEO: Defining the User Experience for Google’s Panda Search Landscape
Bearish SEO: Defining the User Experience for Google’s Panda Search LandscapeMarianne Sweeny
 
The New Content SEO - Sydney SEO Conference 2023
The New Content SEO - Sydney SEO Conference 2023The New Content SEO - Sydney SEO Conference 2023
The New Content SEO - Sydney SEO Conference 2023Amanda King
 
Structured SEO Data Overview and How To
Structured SEO Data Overview and How ToStructured SEO Data Overview and How To
Structured SEO Data Overview and How Tocgmonroe
 
The Semantic Web
The Semantic WebThe Semantic Web
The Semantic Webostephens
 
The Django Web Application Framework
The Django Web Application FrameworkThe Django Web Application Framework
The Django Web Application FrameworkSimon Willison
 

Similar to Securing and Personalizing Commerce Using Identity Data Mining (20)

Integrating Government Data New
Integrating Government Data NewIntegrating Government Data New
Integrating Government Data New
 
Oops Concepts
Oops ConceptsOops Concepts
Oops Concepts
 
Introduction to Linked Data
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked Data
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and Retrieval
 
Learning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client DevelopersLearning To Run - XPages for Lotus Notes Client Developers
Learning To Run - XPages for Lotus Notes Client Developers
 
Some thoughts on social tagging
Some thoughts on social taggingSome thoughts on social tagging
Some thoughts on social tagging
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2
 
You Don't Know SEO
You Don't Know SEOYou Don't Know SEO
You Don't Know SEO
 
En24877880
En24877880En24877880
En24877880
 
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments' SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments'
 
Sweo talk
Sweo talkSweo talk
Sweo talk
 
Semantic Web, e-commerce
Semantic Web, e-commerceSemantic Web, e-commerce
Semantic Web, e-commerce
 
Semantic Web: A web that is not the Web
Semantic Web: A web that is not the WebSemantic Web: A web that is not the Web
Semantic Web: A web that is not the Web
 
Outranking.io Summit Entity Analysis In SEO
Outranking.io Summit Entity Analysis In SEOOutranking.io Summit Entity Analysis In SEO
Outranking.io Summit Entity Analysis In SEO
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search System
 
Bearish SEO: Defining the User Experience for Google’s Panda Search Landscape
Bearish SEO: Defining the User Experience for Google’s Panda Search LandscapeBearish SEO: Defining the User Experience for Google’s Panda Search Landscape
Bearish SEO: Defining the User Experience for Google’s Panda Search Landscape
 
The New Content SEO - Sydney SEO Conference 2023
The New Content SEO - Sydney SEO Conference 2023The New Content SEO - Sydney SEO Conference 2023
The New Content SEO - Sydney SEO Conference 2023
 
Structured SEO Data Overview and How To
Structured SEO Data Overview and How ToStructured SEO Data Overview and How To
Structured SEO Data Overview and How To
 
The Semantic Web
The Semantic WebThe Semantic Web
The Semantic Web
 
The Django Web Application Framework
The Django Web Application FrameworkThe Django Web Application Framework
The Django Web Application Framework
 

More from Jonathan LeBlanc

JavaScript App Security: Auth and Identity on the Client
JavaScript App Security: Auth and Identity on the ClientJavaScript App Security: Auth and Identity on the Client
JavaScript App Security: Auth and Identity on the ClientJonathan LeBlanc
 
Improving Developer Onboarding Through Intelligent Data Insights
Improving Developer Onboarding Through Intelligent Data InsightsImproving Developer Onboarding Through Intelligent Data Insights
Improving Developer Onboarding Through Intelligent Data InsightsJonathan LeBlanc
 
Better Data with Machine Learning and Serverless
Better Data with Machine Learning and ServerlessBetter Data with Machine Learning and Serverless
Better Data with Machine Learning and ServerlessJonathan LeBlanc
 
Best Practices for Application Development with Box
Best Practices for Application Development with BoxBest Practices for Application Development with Box
Best Practices for Application Development with BoxJonathan LeBlanc
 
Box Platform Developer Workshop
Box Platform Developer WorkshopBox Platform Developer Workshop
Box Platform Developer WorkshopJonathan LeBlanc
 
Modern Cloud Data Security Practices
Modern Cloud Data Security PracticesModern Cloud Data Security Practices
Modern Cloud Data Security PracticesJonathan LeBlanc
 
Understanding Box UI Elements
Understanding Box UI ElementsUnderstanding Box UI Elements
Understanding Box UI ElementsJonathan LeBlanc
 
Understanding Box applications, tokens, and scoping
Understanding Box applications, tokens, and scopingUnderstanding Box applications, tokens, and scoping
Understanding Box applications, tokens, and scopingJonathan LeBlanc
 
The Future of Online Money: Creating Secure Payments Globally
The Future of Online Money: Creating Secure Payments GloballyThe Future of Online Money: Creating Secure Payments Globally
The Future of Online Money: Creating Secure Payments GloballyJonathan LeBlanc
 
Creating an In-Aisle Purchasing System from Scratch
Creating an In-Aisle Purchasing System from ScratchCreating an In-Aisle Purchasing System from Scratch
Creating an In-Aisle Purchasing System from ScratchJonathan LeBlanc
 
Protecting the Future of Mobile Payments
Protecting the Future of Mobile PaymentsProtecting the Future of Mobile Payments
Protecting the Future of Mobile PaymentsJonathan LeBlanc
 
Node.js Authentication and Data Security
Node.js Authentication and Data SecurityNode.js Authentication and Data Security
Node.js Authentication and Data SecurityJonathan LeBlanc
 
PHP Identity and Data Security
PHP Identity and Data SecurityPHP Identity and Data Security
PHP Identity and Data SecurityJonathan LeBlanc
 
Secure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication MediaSecure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication MediaJonathan LeBlanc
 
Protecting the Future of Mobile Payments
Protecting the Future of Mobile PaymentsProtecting the Future of Mobile Payments
Protecting the Future of Mobile PaymentsJonathan LeBlanc
 
Future of Identity, Data, and Wearable Security
Future of Identity, Data, and Wearable SecurityFuture of Identity, Data, and Wearable Security
Future of Identity, Data, and Wearable SecurityJonathan LeBlanc
 

More from Jonathan LeBlanc (20)

JavaScript App Security: Auth and Identity on the Client
JavaScript App Security: Auth and Identity on the ClientJavaScript App Security: Auth and Identity on the Client
JavaScript App Security: Auth and Identity on the Client
 
Improving Developer Onboarding Through Intelligent Data Insights
Improving Developer Onboarding Through Intelligent Data InsightsImproving Developer Onboarding Through Intelligent Data Insights
Improving Developer Onboarding Through Intelligent Data Insights
 
Better Data with Machine Learning and Serverless
Better Data with Machine Learning and ServerlessBetter Data with Machine Learning and Serverless
Better Data with Machine Learning and Serverless
 
Best Practices for Application Development with Box
Best Practices for Application Development with BoxBest Practices for Application Development with Box
Best Practices for Application Development with Box
 
Box Platform Overview
Box Platform OverviewBox Platform Overview
Box Platform Overview
 
Box Platform Developer Workshop
Box Platform Developer WorkshopBox Platform Developer Workshop
Box Platform Developer Workshop
 
Modern Cloud Data Security Practices
Modern Cloud Data Security PracticesModern Cloud Data Security Practices
Modern Cloud Data Security Practices
 
Box Authentication Types
Box Authentication TypesBox Authentication Types
Box Authentication Types
 
Understanding Box UI Elements
Understanding Box UI ElementsUnderstanding Box UI Elements
Understanding Box UI Elements
 
Understanding Box applications, tokens, and scoping
Understanding Box applications, tokens, and scopingUnderstanding Box applications, tokens, and scoping
Understanding Box applications, tokens, and scoping
 
The Future of Online Money: Creating Secure Payments Globally
The Future of Online Money: Creating Secure Payments GloballyThe Future of Online Money: Creating Secure Payments Globally
The Future of Online Money: Creating Secure Payments Globally
 
Creating an In-Aisle Purchasing System from Scratch
Creating an In-Aisle Purchasing System from ScratchCreating an In-Aisle Purchasing System from Scratch
Creating an In-Aisle Purchasing System from Scratch
 
Protecting the Future of Mobile Payments
Protecting the Future of Mobile PaymentsProtecting the Future of Mobile Payments
Protecting the Future of Mobile Payments
 
Node.js Authentication and Data Security
Node.js Authentication and Data SecurityNode.js Authentication and Data Security
Node.js Authentication and Data Security
 
PHP Identity and Data Security
PHP Identity and Data SecurityPHP Identity and Data Security
PHP Identity and Data Security
 
Secure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication MediaSecure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication Media
 
Protecting the Future of Mobile Payments
Protecting the Future of Mobile PaymentsProtecting the Future of Mobile Payments
Protecting the Future of Mobile Payments
 
Future of Identity, Data, and Wearable Security
Future of Identity, Data, and Wearable SecurityFuture of Identity, Data, and Wearable Security
Future of Identity, Data, and Wearable Security
 
Kill All Passwords
Kill All PasswordsKill All Passwords
Kill All Passwords
 
BattleHack Los Angeles
BattleHack Los Angeles BattleHack Los Angeles
BattleHack Los Angeles
 

Recently uploaded

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Recently uploaded (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Securing and Personalizing Commerce Using Identity Data Mining

  • 1. Securing & Personalizing Commerce Using Identity Data Mining Jonathan LeBlanc Developer Evangelist (PayPal) Github: http://github.com/jcleblanc Twitter: @jcleblanc
  • 2. The Problem Commerce Relies on Static Data Contributions
  • 3. Premise You can determine the personality profile of a person based on their usage habits Personalization == Security
  • 4. Technology was the Solution!
  • 5. Then I Read This… Us & Them The Science of Identity By David Berreby
  • 6. The Different States of Knowledge What a person knows What a person knows they don’t know What a person doesn’t know they don’t know
  • 7. Technology was NOT the Solution Identity and discovery are NOT a technology solution
  • 9. Our Subject Material HTML content is poorly structured You can’t trust that anything semantically valid will be present There are some pretty bad web practices on the interwebz
  • 10. How We’ll Capture This Data Start with base linguistics Extend with available extras
  • 11.
  • 12. The Basic Pieces Page Data Keywords Weighting Scrapey Without all Word diets Scrapey the fluff FTW
  • 13. Capture Raw Page Data Semantic data on the web is sucktastic Assume 5 year olds built the sites Language is the key
  • 14. Extract Keywords We now have a big jumble of words. Let’s extract Why is “and” a top word? Stop words = sad panda
  • 15. Weight Keywords All content is not created equal Meta and headers and semantics oh my! This is where we leech off the work of others
  • 16.
  • 17. Questions to Keep in Mind Should I use regex to parse web content? How do users interact with page content? What key identifiers can be monitored to detect interest?
  • 18. Fetching the Data: The Request The Simple Way $html = file_get_contents('URL'); The Controlled Way $c = curl_init('URL');
  • 19. Fetching the Data: cURL $req = curl_init($url); $options = array( CURLOPT_URL => $url, CURLOPT_HEADER => $header, CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true, CURLOPT_AUTOREFERER => true, CURLOPT_TIMEOUT => 15, CURLOPT_MAXREDIRS => 10 ); curl_setopt_array($req, $options);
  • 20. //list of findable / replaceable string characters $find = array('/r/', '/n/', '/ss+/'); $replace = array(' ', ' ', ' '); //perform page content modification $mod_content = preg_replace('#<script(.*?)>(.*?)</ script>#is', '', $page_content); $mod_content = preg_replace('#<style(.*?)>(.*?)</ style>#is', '', $mod_content); $mod_content = strip_tags($mod_content); $mod_content = strtolower($mod_content); $mod_content = preg_replace($find, $replace, $mod_content); $mod_content = trim($mod_content); $mod_content = explode(' ', $mod_content); natcasesort($mod_content);
  • 21. //set up list of stop words and the final found stopped list $common_words = array('a', ..., 'zero'); $searched_words = array(); //extract list of keywords with number of occurrences foreach($mod_content as $word) { $word = trim($word); if (preg_match('/[^a-zA-Z]/', $word) == 1){ $word = ''; } if(strlen($word) > 2 && !in_array($word, $common_words)){ $searched_words[$word]++; } } arsort($searched_words, SORT_NUMERIC);
  • 22. Scraping Site Meta Data //load scraped page data as a valid DOM document $dom = new DOMDocument(); @$dom->loadHTML($page_content); //scrape title $title = $dom->getElementsByTagName("title"); $title = $title->item(0)->nodeValue;
  • 23. //loop through all found meta tags $metas = $dom->getElementsByTagName("meta"); for ($i = 0; $i < $metas->length; $i++){ $meta = $metas->item($i); if($meta->getAttribute("property")){ if ($meta->getAttribute("property") == "og:description"){ $dataReturn["description"] = $meta->getAttribute("content"); } } else { if($meta->getAttribute("name") == "description"){ $dataReturn["description"] = $meta->getAttribute("content"); } else if($meta->getAttribute("name") == "keywords”){ $dataReturn[”keywords"] = $meta->getAttribute("content"); } } }
  • 24.
  • 25. Weighting Important Data Tags you should care about: meta (include OG), title, description, h1+, header Bonus points for adding in content location modifiers
  • 26. Weighting Important Tags //our keyword weights $weights = array("keywords" => "3.0", "meta" => "2.0", "header1" => "1.5", "header2" => "1.2"); //add modifier here if(strlen($word) > 2 && !in_array($word, $common_words)){ $searched_words[$word]++; }
  • 27. Expanding to Phrases 2-3 adjacent words, making up a direct relevant callout Seems easy right? Just like single words Language gets wonky without stop words
  • 28. Working with Unknown Users The majority of users won’t be immediately targetable Use HTML5 LocalStorage & Cookie backup
  • 29. Adding in Time Interactions Interaction with a site does not necessarily mean interest in it Time needs to also include an interaction component Gift buying seasons see interest variations
  • 30. Grouping Using Commonality Common Interests Interests Interests User A User B
  • 31. Thank You! Questions? www.slideshare.com/jcleblanc Jonathan LeBlanc Developer Evangelist (PayPal) Github: http://github.com/jcleblanc Twitter: @jcleblanc

Editor's Notes

  1. Technology is the solution!
  2. We’ll be looking at unstructured web page data
  3. The semantic data movement was an abysmal failure. Strip down the site to its basic components – the language and words used on the page
  4. Open graph protocol
  5. Different methods for making the request
  6. This is why I prefer using cURL: customization of requests, timeouts, allows redirects, etc.
  7. Stripping irrelevant data
  8. Scraping site keywords