STRUCTURAL PROFILING OF WEB SITES IN THE WILD
LABORATOIRE D’INFORMATIQUE FORMELLE, UNIVERSITÉ DU QUÉBEC À CHICOUTIMI
XAVIER CHAMBERLAND-THIBEAULT AND SYLVAIN HALLÉ
ICWE, JUNE 9, 2020
1
THE REASONING BEHIND THIS PAPER
2
DEBUGGING AND FIXING WEB APPLICATIONS
 A growing number of tools are being created to help analyze, debug, detect errors in, or even process the output of web applications.
 Most of these tools focus on analyzing the Document Object Model (DOM) and the Cascading Style Sheets (CSS) of a page.
 These tools serve varied purposes:
 Fixing cross-browser issues;
 Interpreting the DOM;
 Detecting responsive web design bugs;
 Etc.
3
WHAT DOES A WEB PAGE LOOK LIKE?
 The scalability of most of the aforementioned tools, and sometimes even their success, depends on size-related features.
 What is the average size of a web page?
 Walsh et al. (2015) ran experiments on pages of up to 196 DOM nodes, whereas Choudhary et al. (2013) chose pages of up to 39,146 DOM nodes.
 This paper addresses the question through a large-scale analysis of 708 websites, measuring an array of parameters related to the size and structure of web pages.
4
METHODOLOGY
5
METHODOLOGY
Website collection DOM harvesting Data processing
6
WEBSITE COLLECTION
 To obtain a pool of websites representative of what users actually browse, we first needed the sites visited by the most users.
 For this, we used the Moz list of the top 500 most visited websites. However, it contained many duplicates in the form of country-specific versions of the same web application.
 Of those 500 sites, only 300 non-duplicates remained.
 Yet the sites visited by the most users do not tell the whole story: this notion is orthogonal to the sites a given individual visits most often.
 Therefore, we informally asked people around us to provide the list of websites they use daily.
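The slides do not say how the country-specific duplicates were identified. As one possible illustration, the Python sketch below groups hostnames by their first label (after an optional "www."). This heuristic, the function names and the sample hostnames are all assumptions made for the example, not the authors' procedure.

```python
def brand_key(hostname: str) -> str:
    """Return a crude grouping key: the first label after an optional 'www.'.

    Assumption: country-specific versions share this first label
    (google.com, google.fr, www.google.co.uk all map to 'google').
    """
    labels = hostname.lower().strip(".").split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    return labels[0] if labels else ""


def drop_country_duplicates(hostnames):
    """Keep the first hostname seen for each key, preserving list order."""
    seen = set()
    kept = []
    for host in hostnames:
        key = brand_key(host)
        if key not in seen:
            seen.add(key)
            kept.append(host)
    return kept


# Hypothetical sample, not taken from the Moz list.
sample = [
    "google.com", "google.fr", "www.google.co.uk",
    "wikipedia.org", "amazon.com", "amazon.de",
]
unique_sites = drop_country_duplicates(sample)
```

A heuristic this simple would also merge unrelated sites that happen to share a first label, so a real deduplication pass would likely need manual review.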
7
DOM HARVESTING
 To collect data on the DOM of each of these sites, a JavaScript program was designed to run once a page has finished loading.
 The script starts at the body node of a page and performs a pre-order traversal of the entire DOM tree, recording and computing various features:
 Tag names;
 CSS classes;
 Visibility status;
 Structural information.
 The script then generates two files: a JSON file containing all the data, and a DOT file accepted by Graphviz, so that we could obtain both statistics and a visual representation of a web page.
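The traversal described on this slide can be sketched as follows. This is a minimal illustration in Python, not the authors' JavaScript: a small hand-built tree stands in for the browser-rendered DOM, and the `Node` class, the record fields, and the conventions that the body node has depth 0 and that "degree" means the number of children are assumptions made for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    classes: list = field(default_factory=list)
    visible: bool = True
    children: list = field(default_factory=list)


def harvest(root):
    """Pre-order walk: visit a node, then its children from left to right."""
    records = []  # one dict per node, in visit order
    edges = []    # (parent id, child id) pairs for the DOT output
    stack = [(root, 0, None)]  # (node, depth, parent id)
    while stack:
        node, depth, parent_id = stack.pop()
        node_id = len(records)
        records.append({
            "id": node_id,
            "tag": node.tag,
            "classes": node.classes,
            "visible": node.visible,
            "depth": depth,
            "degree": len(node.children),
        })
        if parent_id is not None:
            edges.append((parent_id, node_id))
        # Push children reversed so the leftmost child is popped first.
        for child in reversed(node.children):
            stack.append((child, depth + 1, node_id))
    return records, edges


def to_dot(records, edges):
    """Render the tree as a Graphviz DOT digraph, one node per element."""
    lines = ["digraph dom {"]
    for r in records:
        lines.append(f'  n{r["id"]} [label="{r["tag"]}"];')
    for parent_id, child_id in edges:
        lines.append(f"  n{parent_id} -> n{child_id};")
    lines.append("}")
    return "\n".join(lines)


# Made-up page used only to exercise the functions.
page = Node("body", children=[
    Node("div", classes=["header"],
         children=[Node("h1"), Node("nav", visible=False)]),
    Node("div", classes=["content"],
         children=[Node("p"), Node("p"), Node("img")]),
])
records, edges = harvest(page)
summary = {
    "size": len(records),
    "depth": max(r["depth"] for r in records),
    "max_degree": max(r["degree"] for r in records),
    "invisible_fraction": sum(not r["visible"] for r in records) / len(records),
}
json_text = json.dumps({"summary": summary, "nodes": records})
dot_text = to_dot(records, edges)
```

The explicit stack avoids recursion limits on very deep trees. The summary fields mirror the quantities plotted in the results slides (tree size, depth, maximum node degree, fraction of invisible nodes).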
8
DOM HARVESTING – RUNNING ON EVERY PAGE
 To run the script on every page, the TamperMonkey extension was used.
 This extension, available for multiple browsers, lets the user inject and run custom JavaScript code every time a new page is loaded in the browser.
 Note that the harvesting was done on the browser-rendered DOM and properties.
9
DATA PROCESSING
 LabPal was used to process the 62 MB of raw data:
 Every website was made into an experiment that processes the associated JSON file;
 It was then possible to aggregate all the collected data and perform deeper statistical analysis.
 Note that some files were discarded, since the automated loading also retrieved many pop-ups.
 Manually inspecting each recovered file to detect pop-ups would have been tedious; instead, a more generic filter removed most of these pages by discarding every file with fewer than 5 DOM nodes or whose URL belonged to a list of known advertisement pages.
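The filter described above can be sketched as a single predicate. The Python version below is illustrative only: the threshold of 5 nodes comes from the slide, while the hostnames in `AD_HOSTS`, the function name, the sample data, and the choice to match on hostname rather than on the full URL are assumptions.

```python
from urllib.parse import urlparse

MIN_DOM_NODES = 5  # threshold stated on the slide

# Hypothetical hostnames; the actual advertisement list is not given in the slides.
AD_HOSTS = {"ads.example.com", "popup.example.net"}


def keep_page(url: str, dom_size: int, ad_hosts=AD_HOSTS) -> bool:
    """Return True when the harvested file should be kept for analysis."""
    if dom_size < MIN_DOM_NODES:
        return False  # too small to be a real page, likely a pop-up
    host = urlparse(url).hostname or ""
    return host not in ad_hosts


# Made-up (URL, DOM size) pairs standing in for harvested files.
harvested = [
    ("https://example.org/", 412),
    ("https://ads.example.com/banner", 37),
    ("https://example.org/popup", 3),
]
kept = [url for url, size in harvested if keep_page(url, size)]
```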
10
RESULTS
11
GRAPHICAL REPRESENTATION OF A WEBSITE
 Each color represents a different HTML tag name.
 The root of the tree, the body tag, is represented by the black square.
 This is the representation of Zippyshare.com.
12
RESULTS
Cumulative distribution of websites based on the size of the DOM tree
Distribution of websites based on DOM tree depth
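For readers unfamiliar with cumulative distributions, the sketch below shows how such a curve could be computed from a list of per-site DOM sizes: each point gives the fraction of sites whose tree has at most that many nodes. The function name and the sample sizes are made up for the example and are not the study's data.

```python
def cumulative_distribution(values):
    """Return (value, fraction of sites with a value <= it) pairs, sorted by value."""
    ordered = sorted(values)
    total = len(ordered)
    points = []
    for index, value in enumerate(ordered, start=1):
        if points and points[-1][0] == value:
            # Same value as the previous point: keep one point with the larger fraction.
            points[-1] = (value, index / total)
        else:
            points.append((value, index / total))
    return points


dom_sizes = [120, 450, 450, 980, 2300]  # made-up DOM sizes, one per site
cdf = cumulative_distribution(dom_sizes)
```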
13
RESULTS
Cumulative distribution of websites based on maximum node degree
Distribution of websites based on maximum node degree
14
RESULTS
Total number of elements using each visibility
Distribution of websites according to the fraction of all DOM nodes that are invisible
15
RESULTS
Size of the DOM tree vs. number of CSS classes
Cumulative distribution of websites based on the average size of a CSS class
16
THREATS TO VALIDITY
Website sample
Variance due to browser
Homepage analysis
17
REFERENCES
 Walsh, T.A., McMinn, P., Kapfhammer, G.M.: Automatic detection of potential layout faults following changes to responsive web pages (N). In: Cohen, M.B., Grunske, L., Whalen, M. (eds.) Proc. ASE 2015. pp. 709–714. IEEE Computer Society (2015)
 Choudhary, S.R., Prasad, M.R., Orso, A.: X-PERT: accurate identification of cross-browser issues in web applications. In: Notkin, D., Cheng, B.H.C., Pohl, K. (eds.) Proc. ICSE 2013. pp. 702–711. IEEE Computer Society (2013)
 The Moz top 500 websites, https://moz.com/top500, accessed October 20th, 2019
 All pictures used are license-free
18
