SlideShare a Scribd company logo
1 of 27
Download to read offline
Syndicated content on
your web pages
Jon Warbrick
University of Cambridge Computing Service
jw35@cam.ac.uk / @jw35
What?
• Fetching other people’s content and
including it in your web pages
• RSS, Atom, JavaScript, Server-side, client-
side
• We’ll look at
• Fetching pitfalls
• Interpretation pitfalls
• Inclusion pitfalls
All a bit low level and
geeky...
• Mostly low-level stuff
• May be handled for you by your
development environment
• But how do you otherwise know what to
look for?
Fetching pitfalls
Including an RSS feed
• When rendering the page
• grab the feed
• parse it
• mark it up
• include it
• What could possibly go wrong?
<?php
$rss = new DOMDocument();
$rss->load('http://wordpress.org/news/feed/');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
$item = array (
'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
);
array_push($feed, $item);
}
$limit = 5;
for($x=0;$x<$limit;$x++) {
$title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
$link = $feed[$x]['link'];
$description = $feed[$x]['desc'];
$date = date('l F d, Y', strtotime($feed[$x]['date']));
echo '<p><strong><a href="'.$link.'" title="'.$title.'">'.$title.'</a></strong><br />';
echo '<small><em>Posted on '.$date.'</em></small></p>';
echo '<p>'.$description.'</p>';
}
?>
http://bavotasan.com/2010/display-rss-feed-with-php/
What could go wrong?
All of these things:
• Slow (connecting, parsing XML, ...)
• Load on local server
• Load on source server
• What if it’s unreachable?
• What do you display if it goes wrong?
Better:
• Seperate fetching and page rendering
• Periodic fetch, or cache on demand
• support conditional fetch
• Validate result, keep last known good
• warn of failure
• Cache pre-parsed, or even pre-rendered
Downsides
A web developer had a problem.
He said “I know, I’ll use caching!”
Now he had two problems.
“
”
Interpretation pitfalls
Computers can’t handle text
It’s all numbers
Content-Type: text/html; charset=ISO-8859-1
[In passing...]
• You really want to be using
• The Universal Character Set (AKA
Unicode)
• The UTF8 encoding
• http://www.cl.cam.ac.uk/~mgk25/
unicode.html
ASCII
32 sp 33 ! 34 " 35 # 36 $ 37 % 38 & 39 '
40 ( 41 ) 42 * 43 + 44 , 45 - 46 . 47 /
48 0 49 1 50 2 51 3 52 4 53 5 54 6 55 7
56 8 57 9 58 : 59 ; 60 < 61 = 62 > 63 ?
64 @ 65 A 66 B 67 C 68 D 69 E 70 F 71 G
72 H 73 I 74 J 75 K 76 L 77 M 78 N 79 O
80 P 81 Q 82 R 83 S 84 T 85 U 86 V 87 W
88 X 89 Y 90 Z 91 [ 92  93 ] 94 ^ 95 _
96 ` 97 a 98 b 99 c 100 d 101 e 102 f 103 g
104 h 105 i 106 j 107 k 108 l 109 m 110 n 111 o
112 p 113 q 114 r 115 s 116 t 117 u 118 v 119 w
120 x 121 y 122 z 123 { 124 | 125 } 126 ~
Mojibake
The fee will be £210
‫ا%$#"ن‬ ‫+*)(ق‬ ,-".+‫ا‬ ‫ا%0/ن‬ الإعلان العالم
ى لحقوق الإنسان
‘She said—not without some
trepidation—‟Time to go�’
Nescafé
Can’t take ‘text as numbers’ and stuff it
into other ‘text as numbers’ (unless they
happen to both use the same encoding)
Bottom line
Either: convert numbers → text → numbers
Or: directly convert numbers → numbers
An HTML alternative
• ‘Numeric entities’
• e.g.‘pound’ can always be &#163;
• Numbers always from UCS, whatever the
encoding of the document
• So your document need only contain ASCII
• But should still declare a character
encoding
Inclusion pitfalls
The RSS “Is it HTML?” question
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Latest news</title>
<link>http://www.cam.ac.uk/news</link>
<item>
<title>Fun Lab: a big hit at the Big Weekend!</title>
<link>http://www.cam.ac.uk/public-engagement/news/</link>
<description>Some had just wandered in to see what all
the fuss was about, as children and adults queued
to take part in the hands-on activities, while
others had heard about the Fun Lab on the radio
or read about it in the Cambridge Evening News.
</description>
</item>
<item>
...
</item>
</channel>
</rss>
So, if you find
&lt;strong&gt;Boo!&lt;/strong&gt;
do you render it as
<strong>Boo!</strong>
or as ‘Boo!’ ?
[Atom gets this right]
<title type="html">
AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!
</title>
<title type="text">AT&amp;T bought by SBC!</title>
<title type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
AT&amp;T bought <b>by SBC</b>!
</div>
</title>
• If the answer is ‘plain text’ that that’s easy:
• encode ‘<‘ and ‘>’ as ‘&lt;’ and ‘&gt;’
• But if not, you have a new problem
• you can’t include HTML verbatim from an
untrusted source
• http://www.bbc.co.uk/news/
technology-12608651
• Only include safe content
• note: not ‘Exclude dangerous content’
• parse → filter → serialise
• And worry about character encodings
• Who’s have thought that
+ADw-script+AD4-
could be dangerous?
And the answer is...
Promiscuous JavaScript
considered dangerous
• Common JavaScript document.write() trick
• Attractive - no server-side work required
• Hands entire page to JS author
• ...or anyone else
• Actually applies to any 3rd party JS
So, to sum up...
• It’s harder than it looks
• Think about (at least):
• How you retrieve content
• How you intrepret it
• How you include it
If you have been,
thanks for listening.
Any questions?

More Related Content

Viewers also liked

(Why) Passwords don't work
(Why) Passwords don't work(Why) Passwords don't work
(Why) Passwords don't workJon Warbrick
 
Banner+H&amp;W+Feb+2011
Banner+H&amp;W+Feb+2011Banner+H&amp;W+Feb+2011
Banner+H&amp;W+Feb+2011judiburns6
 
The 'New [University of Cambridge] Map
The 'New [University of Cambridge] MapThe 'New [University of Cambridge] Map
The 'New [University of Cambridge] MapJon Warbrick
 
Google Apps @ Cambridge - What we did
Google Apps @ Cambridge - What we didGoogle Apps @ Cambridge - What we did
Google Apps @ Cambridge - What we didJon Warbrick
 
Banner+H&amp;W+Feb+2011
Banner+H&amp;W+Feb+2011Banner+H&amp;W+Feb+2011
Banner+H&amp;W+Feb+2011judiburns6
 
An introduction to Version Control Systems
An introduction to Version Control SystemsAn introduction to Version Control Systems
An introduction to Version Control SystemsJon Warbrick
 
Dunbar's Number, and what it means to the UIS
Dunbar's Number, and what it means to the UISDunbar's Number, and what it means to the UIS
Dunbar's Number, and what it means to the UISJon Warbrick
 
Google Apps - SSO and Identity Management at the University of Cambridge
Google Apps - SSO and Identity Management at the University of CambridgeGoogle Apps - SSO and Identity Management at the University of Cambridge
Google Apps - SSO and Identity Management at the University of CambridgeJon Warbrick
 
Lessons from IPv6 Day
Lessons from IPv6 DayLessons from IPv6 Day
Lessons from IPv6 DayJon Warbrick
 
Lessons fro IPv6 day, 2011
Lessons fro IPv6 day, 2011Lessons fro IPv6 day, 2011
Lessons fro IPv6 day, 2011Jon Warbrick
 
Web Authenication with Shibboleth - a view from the Flat East
Web Authenication with Shibboleth - a view from the Flat EastWeb Authenication with Shibboleth - a view from the Flat East
Web Authenication with Shibboleth - a view from the Flat EastJon Warbrick
 
Wechsler Intelligence Scale For Children
Wechsler Intelligence Scale For ChildrenWechsler Intelligence Scale For Children
Wechsler Intelligence Scale For Childrendavidjcarey
 
Raven progressive matrices ( RPM )
Raven progressive matrices ( RPM )Raven progressive matrices ( RPM )
Raven progressive matrices ( RPM )Ai Nurhasanah
 
State of the Raven
State of the RavenState of the Raven
State of the RavenJon Warbrick
 
Measures of Intelligence: Wechsler & Raven
Measures of Intelligence: Wechsler & RavenMeasures of Intelligence: Wechsler & Raven
Measures of Intelligence: Wechsler & RavenNabilah Azahari
 

Viewers also liked (20)

(Why) Passwords don't work
(Why) Passwords don't work(Why) Passwords don't work
(Why) Passwords don't work
 
Banner+H&amp;W+Feb+2011
Banner+H&amp;W+Feb+2011Banner+H&amp;W+Feb+2011
Banner+H&amp;W+Feb+2011
 
The 'New [University of Cambridge] Map
The 'New [University of Cambridge] MapThe 'New [University of Cambridge] Map
The 'New [University of Cambridge] Map
 
Google Apps @ Cambridge - What we did
Google Apps @ Cambridge - What we didGoogle Apps @ Cambridge - What we did
Google Apps @ Cambridge - What we did
 
Banner+H&amp;W+Feb+2011
Banner+H&amp;W+Feb+2011Banner+H&amp;W+Feb+2011
Banner+H&amp;W+Feb+2011
 
Active Lifestyles
Active LifestylesActive Lifestyles
Active Lifestyles
 
An introduction to Version Control Systems
An introduction to Version Control SystemsAn introduction to Version Control Systems
An introduction to Version Control Systems
 
Dunbar's Number, and what it means to the UIS
Dunbar's Number, and what it means to the UISDunbar's Number, and what it means to the UIS
Dunbar's Number, and what it means to the UIS
 
Google Apps - SSO and Identity Management at the University of Cambridge
Google Apps - SSO and Identity Management at the University of CambridgeGoogle Apps - SSO and Identity Management at the University of Cambridge
Google Apps - SSO and Identity Management at the University of Cambridge
 
Lessons from IPv6 Day
Lessons from IPv6 DayLessons from IPv6 Day
Lessons from IPv6 Day
 
Lessons fro IPv6 day, 2011
Lessons fro IPv6 day, 2011Lessons fro IPv6 day, 2011
Lessons fro IPv6 day, 2011
 
Monitoring Skills
Monitoring SkillsMonitoring Skills
Monitoring Skills
 
Hypertension
HypertensionHypertension
Hypertension
 
Web Authenication with Shibboleth - a view from the Flat East
Web Authenication with Shibboleth - a view from the Flat EastWeb Authenication with Shibboleth - a view from the Flat East
Web Authenication with Shibboleth - a view from the Flat East
 
Hyperlipidemia
HyperlipidemiaHyperlipidemia
Hyperlipidemia
 
Wechsler Intelligence Scale For Children
Wechsler Intelligence Scale For ChildrenWechsler Intelligence Scale For Children
Wechsler Intelligence Scale For Children
 
Raven progressive matrices ( RPM )
Raven progressive matrices ( RPM )Raven progressive matrices ( RPM )
Raven progressive matrices ( RPM )
 
State of the Raven
State of the RavenState of the Raven
State of the Raven
 
Measures of Intelligence: Wechsler & Raven
Measures of Intelligence: Wechsler & RavenMeasures of Intelligence: Wechsler & Raven
Measures of Intelligence: Wechsler & Raven
 
F.I.T.T. Principles and METs
F.I.T.T. Principles and METsF.I.T.T. Principles and METs
F.I.T.T. Principles and METs
 

Similar to Syndicated content on your web pages

JavaScript For People Who Don't Code
JavaScript For People Who Don't CodeJavaScript For People Who Don't Code
JavaScript For People Who Don't CodeChristopher Schmitt
 
Even faster web sites
Even faster web sitesEven faster web sites
Even faster web sitesFelipe Lavín
 
[HEWEBAR 2012] Beyond Desktop Browsing (HTML5)
[HEWEBAR 2012] Beyond Desktop Browsing (HTML5)[HEWEBAR 2012] Beyond Desktop Browsing (HTML5)
[HEWEBAR 2012] Beyond Desktop Browsing (HTML5)Christopher Schmitt
 
Browser Wars Episode 1: The Phantom Menace
Browser Wars Episode 1: The Phantom MenaceBrowser Wars Episode 1: The Phantom Menace
Browser Wars Episode 1: The Phantom MenaceNicholas Zakas
 
Quick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map ReduceQuick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map Reduceohkura
 
TOSSUG HTML5 讀書會 新標籤與表單
TOSSUG HTML5 讀書會 新標籤與表單TOSSUG HTML5 讀書會 新標籤與表單
TOSSUG HTML5 讀書會 新標籤與表單偉格 高
 
JavaScript front end performance optimizations
JavaScript front end performance optimizationsJavaScript front end performance optimizations
JavaScript front end performance optimizationsChris Love
 
Stefan Judis "Did we(b development) lose the right direction?"
Stefan Judis "Did we(b development) lose the right direction?"Stefan Judis "Did we(b development) lose the right direction?"
Stefan Judis "Did we(b development) lose the right direction?"Fwdays
 
RWD in the Wild
RWD in the WildRWD in the Wild
RWD in the WildRich Quick
 
The DiSo Project and the Open Web
The DiSo Project and the Open WebThe DiSo Project and the Open Web
The DiSo Project and the Open WebChris Messina
 
What you need to know bout html5
What you need to know bout html5What you need to know bout html5
What you need to know bout html5Kevin DeRudder
 
A bug bounty tale: Chrome, stylesheets, cookies, and AES
A bug bounty tale: Chrome, stylesheets, cookies, and AESA bug bounty tale: Chrome, stylesheets, cookies, and AES
A bug bounty tale: Chrome, stylesheets, cookies, and AEScgvwzq
 

Similar to Syndicated content on your web pages (20)

[O'Reilly] HTML5 Design
[O'Reilly] HTML5 Design[O'Reilly] HTML5 Design
[O'Reilly] HTML5 Design
 
Graduating to Grid
Graduating to GridGraduating to Grid
Graduating to Grid
 
JavaScript For People Who Don't Code
JavaScript For People Who Don't CodeJavaScript For People Who Don't Code
JavaScript For People Who Don't Code
 
Even faster web sites
Even faster web sitesEven faster web sites
Even faster web sites
 
[heweb11] HTML5 Makeover
[heweb11] HTML5 Makeover[heweb11] HTML5 Makeover
[heweb11] HTML5 Makeover
 
[HEWEBAR 2012] Beyond Desktop Browsing (HTML5)
[HEWEBAR 2012] Beyond Desktop Browsing (HTML5)[HEWEBAR 2012] Beyond Desktop Browsing (HTML5)
[HEWEBAR 2012] Beyond Desktop Browsing (HTML5)
 
Browser Wars Episode 1: The Phantom Menace
Browser Wars Episode 1: The Phantom MenaceBrowser Wars Episode 1: The Phantom Menace
Browser Wars Episode 1: The Phantom Menace
 
Mume HTML5 Intro
Mume HTML5 IntroMume HTML5 Intro
Mume HTML5 Intro
 
[edUi] HTML5 Workshop
[edUi] HTML5 Workshop[edUi] HTML5 Workshop
[edUi] HTML5 Workshop
 
Quick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map ReduceQuick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map Reduce
 
TOSSUG HTML5 讀書會 新標籤與表單
TOSSUG HTML5 讀書會 新標籤與表單TOSSUG HTML5 讀書會 新標籤與表單
TOSSUG HTML5 讀書會 新標籤與表單
 
JavaScript front end performance optimizations
JavaScript front end performance optimizationsJavaScript front end performance optimizations
JavaScript front end performance optimizations
 
Stefan Judis "Did we(b development) lose the right direction?"
Stefan Judis "Did we(b development) lose the right direction?"Stefan Judis "Did we(b development) lose the right direction?"
Stefan Judis "Did we(b development) lose the right direction?"
 
Ruby Robots
Ruby RobotsRuby Robots
Ruby Robots
 
RWD in the Wild
RWD in the WildRWD in the Wild
RWD in the Wild
 
Hardboiled Web Design
Hardboiled Web DesignHardboiled Web Design
Hardboiled Web Design
 
[PSU Web 2011] HTML5 Design
[PSU Web 2011] HTML5 Design[PSU Web 2011] HTML5 Design
[PSU Web 2011] HTML5 Design
 
The DiSo Project and the Open Web
The DiSo Project and the Open WebThe DiSo Project and the Open Web
The DiSo Project and the Open Web
 
What you need to know bout html5
What you need to know bout html5What you need to know bout html5
What you need to know bout html5
 
A bug bounty tale: Chrome, stylesheets, cookies, and AES
A bug bounty tale: Chrome, stylesheets, cookies, and AESA bug bounty tale: Chrome, stylesheets, cookies, and AES
A bug bounty tale: Chrome, stylesheets, cookies, and AES
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Syndicated content on your web pages

  • 1. Syndicated content on your web pages Jon Warbrick University of Cambridge Computing Service jw35@cam.ac.uk / @jw35
  • 2. What? • Fetching other people’s content and including it in your web pages • RSS, Atom, JavaScript, Server-side, client- side • We’ll look at • Fetching pitfalls • Interpretation pitfalls • Inclusion pitfalls
  • 3. All a bit low level and geeky... • Mostly low-level stuff • May be handled for you by your development environment • But how do you otherwise know what to look for?
  • 5. Including an RSS feed • When rendering the page • grab the feed • parse it • mark it up • include it • What could possibly go wrong?
  • 6. <?php $rss = new DOMDocument(); $rss->load('http://wordpress.org/news/feed/'); $feed = array(); foreach ($rss->getElementsByTagName('item') as $node) { $item = array ( 'title' => $node->getElementsByTagName('title')->item(0)->nodeValue, 'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue, 'link' => $node->getElementsByTagName('link')->item(0)->nodeValue, 'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue, ); array_push($feed, $item); } $limit = 5; for($x=0;$x<$limit;$x++) { $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']); $link = $feed[$x]['link']; $description = $feed[$x]['desc']; $date = date('l F d, Y', strtotime($feed[$x]['date'])); echo '<p><strong><a href="'.$link.'" title="'.$title.'">'.$title.'</a></strong><br />'; echo '<small><em>Posted on '.$date.'</em></small></p>'; echo '<p>'.$description.'</p>'; } ?> http://bavotasan.com/2010/display-rss-feed-with-php/
  • 7. What could go wrong? All of these things: • Slow (connecting, parsing XML, ...) • Load on local server • Load on source server • What if it’s unreachable? • What do you display if it goes wrong?
  • 8. Better: • Seperate fetching and page rendering • Periodic fetch, or cache on demand • support conditional fetch • Validate result, keep last known good • warn of failure • Cache pre-parsed, or even pre-rendered
  • 9. Downsides A web developer had a problem. He said “I know, I’ll use caching!” Now he had two problems. “ ”
  • 12. It’s all numbers Content-Type: text/html; charset=ISO-8859-1
  • 13. [In passing...] • You really want to be using • The Universal Character Set (AKA Unicode) • The UTF8 encoding • http://www.cl.cam.ac.uk/~mgk25/ unicode.html
  • 14. ASCII 32 sp 33 ! 34 " 35 # 36 $ 37 % 38 & 39 ' 40 ( 41 ) 42 * 43 + 44 , 45 - 46 . 47 / 48 0 49 1 50 2 51 3 52 4 53 5 54 6 55 7 56 8 57 9 58 : 59 ; 60 < 61 = 62 > 63 ? 64 @ 65 A 66 B 67 C 68 D 69 E 70 F 71 G 72 H 73 I 74 J 75 K 76 L 77 M 78 N 79 O 80 P 81 Q 82 R 83 S 84 T 85 U 86 V 87 W 88 X 89 Y 90 Z 91 [ 92 93 ] 94 ^ 95 _ 96 ` 97 a 98 b 99 c 100 d 101 e 102 f 103 g 104 h 105 i 106 j 107 k 108 l 109 m 110 n 111 o 112 p 113 q 114 r 115 s 116 t 117 u 118 v 119 w 120 x 121 y 122 z 123 { 124 | 125 } 126 ~
  • 15. Mojibake The fee will be £210 ‫ا%$#"ن‬ ‫+*)(ق‬ ,-".+‫ا‬ ‫ا%0/ن‬ الإعلان العالم Ù‰ لحقوق الإنسان ‘She said—not without some trepidation—‟Time to goâ€?’ Nescafé
  • 16. Can’t take ‘text as numbers’ and stuff it into other ‘text as numbers’ (unless they happen to both use the same encoding) Bottom line Either: convert numbers → text → numbers Or: directly convert numbers → numbers
  • 17. An HTML alternative • ‘Numeric entities’ • e.g.‘pound’ can always be &#163; • Numbers always from UCS, whatever the encoding of the document • So your document need only contain ASCII • But should still declare a character encoding
  • 19. The RSS “Is it HTML?” question
  • 20. <?xml version="1.0" encoding="utf-8" ?> <rss version="2.0"> <channel> <title>Latest news</title> <link>http://www.cam.ac.uk/news</link> <item> <title>Fun Lab: a big hit at the Big Weekend!</title> <link>http://www.cam.ac.uk/public-engagement/news/</link> <description>Some had just wandered in to see what all the fuss was about, as children and adults queued to take part in the hands-on activities, while others had heard about the Fun Lab on the radio or read about it in the Cambridge Evening News. </description> </item> <item> ... </item> </channel> </rss>
  • 21. So, if you find &lt;strong&gt;Boo!&lt;/strong&gt; do you render it as <strong>Boo!</strong> or as ‘Boo!’ ?
  • 22. [Atom gets this right] <title type="html"> AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;! </title> <title type="text">AT&amp;T bought by SBC!</title> <title type="xhtml"> <div xmlns="http://www.w3.org/1999/xhtml"> AT&amp;T bought <b>by SBC</b>! </div> </title>
  • 23. • If the answer is ‘plain text’ that that’s easy: • encode ‘<‘ and ‘>’ as ‘&lt;’ and ‘&gt;’ • But if not, you have a new problem • you can’t include HTML verbatim from an untrusted source • http://www.bbc.co.uk/news/ technology-12608651
  • 24. • Only include safe content • note: not ‘Exclude dangerous content’ • parse → filter → serialise • And worry about character encodings • Who’s have thought that +ADw-script+AD4- could be dangerous? And the answer is...
  • 25. Promiscuous JavaScript considered dangerous • Common JavaScript document.write() trick • Attractive - no server-side work required • Hands entire page to JS author • ...or anyone else • Actually applies to any 3rd party JS
  • 26. So, to sum up... • It’s harder than it looks • Think about (at least): • How you retrieve content • How you intrepret it • How you include it
  • 27. If you have been, thanks for listening. Any questions?