
Freeing Tower Bridge

How to make useful data feeds of the Tower Bridge lift times.

Published in: Software

  1. Freeing Tower Bridge
  2. It’s 2017! Why the f*ck are we still scraping web sites?
  3. Bad Londoner Confession
  4. I have lived in London for over 35 years
  5. I have never seen Tower Bridge lift
  6. * Who are hiring
  7. * Who are hiring (obviously)
  8. Geographical Convenience
  9. Opportunity!
  10. Notification
  11. Data feed
  12. Data feed Android→
  13. Data feed Android→ Pebble→
  14. Profit!
  15. Data
  16. No Data
  17. No Data (Machine readable)
  18. DIY
  19. It’s 2017! Why the f*ck are we still scraping web sites?
  20. Web::Query
  21. wq('http://www.towerbridge.org.uk/lift-times/')
        ->find('table.lined tbody tr')
        ->each(sub {
          push @lifts, [ map { $_->text } $_[1]->contents ];
        });
  26. $VAR1 = [
        [ 'Sat', '11 Mar', '07:30', 'Maintenance Lift ', 'Up river' ],
        [ 'Sat', '11 Mar', '08:00', 'Maintenance Lift ', 'Down river' ],
        ...
      ];
  27. Munge
  28. iCal
  29. Profit!
  30. Well...
  31. Data::ICal
  32. Data::ICal::Entry::Event
  33. my $ical = Data::ICal->new();
      for (@lifts) {
        my $date = ...;
        my $event = Data::ICal::Entry::Event->new();
        $event->add_properties(...);
        $ical->add_entry($event);
      }
      print $fh $ical->as_string;
  34. No Year
  35. my $date = $dt_parser->parse_datetime(
        "$_->[2] $_->[1] $curr_year"
      );
      # If the month number of this event is less
      # than the current month number then we've
      # gone to the next year. Increment the year
      # number and re-calculate.
      if ($date->mon < $curr_mon) {
        ++$curr_year;
        $date = $dt_parser->parse_datetime(
          "$_->[2] $_->[1] $curr_year"
        );
      }
  36. # Tower Bridge web site occasionally
      # has duplicates
      next if $seen{$date->epoch}++;
  37. my $event = Data::ICal::Entry::Event->new();
      $event->add_properties(
        summary     => 'Tower Bridge Lift',
        description => "$_->[3] ($_->[4])",
        dtstart     => dt2ical($date),
        duration    => 'PT30M',
        dtstamp     => $now_ical,
        uid         => $date->epoch . '@towerbridge.dave.org.uk',
      );
  38. A detour
  39. Different Timezones
  40. (I assume)
  41. Timezones are easy
  42. my $dt_parser = DateTime::Format::Strptime->new(
        pattern   => '%H:%M %d %b %Y',
        time_zone => 'Europe/London',
      );
  43. sub dt2ical {
        my ($dt) = @_;
        return $dt->ymd('') . 'T' .
               $dt->hms('') .
               # Or something like this.
               # Check iCal specs.
               $dt->time_zone_short_name;
      }
  44. Failed validation
  45. https://icalendar.org/validator.html
  46. DateTime::Format::ICal
  47. TZID=Europe/London:20170311T105600
  48. Looked OK
  49. Failed validation
  50. “Invalid TZID”
  51. TZID=Europe/London:20170311T105600
  53. To the standard definition!
  54. This property specifies the text value that uniquely identifies the "VTIMEZONE" calendar component in the scope of an iCalendar object.
  55. If present, the "VTIMEZONE" calendar component defines the set of Standard Time and Daylight Saving Time observances (or rules) for a particular time zone for a given interval of time.
  56. Add VTIMEZONE section to the iCal file
  57. Data::ICal::Entry::TimeZone
  58. This module is not yet useful, because every time zone declaration needs to contain at least one STANDARD or DAYLIGHT component, and these have not yet been implemented.
  59. Plan C
  60. Back to the iCal standard definition
  61. DATE WITH LOCAL TIME: The date with local time form is simply a DATE-TIME value that does not contain the UTC designator nor does it reference a time zone. For example, the following represents January 18, 1998, at 11 PM: 19980118T230000
  62. sub dt2ical {
        my ($dt) = @_;
        return $dt->ymd('') . 'T' . $dt->hms('');
      }
  63. And we have a valid iCal feed
  64. Throw together a web site
  65. http://towerbridge.dave.org.uk
  66. Rebuild the data daily
  67. Stick the code on GitHub
  68. https://github.com/davorg/towerbridge
  69. Subscribe to the calendar
  70. Profit!
  71. 13th Feb 2017 13:30
  72. Prior Art
  73. Sun 2nd April: 20:30 & 21:15
  74. Dave Cross @davorg @perlhacks https://perlhacks.com/
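
The "No Year" and duplicate-row slides describe two small data-cleaning steps: guessing the missing year from the order of the rows, and skipping rows whose timestamp has already been seen. The following is a minimal sketch of both, not the talk's actual code. It assumes the rows arrive in chronological order, swaps in the core module `Time::Piece` for `DateTime::Format::Strptime` so that it runs without CPAN dependencies, and uses a hypothetical helper named `infer_dates` with made-up sample rows. Unlike the snippet on the slide, it also moves the reference month forward after each row, so that a later row in February does not trigger a second year increment.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical sample rows in the shape of the scraped data:
# [ day name, day and month, time, vessel, direction ]
my @lifts = (
    [ 'Sat', '30 Dec', '07:30', 'Maintenance Lift', 'Up river' ],
    [ 'Sat', '30 Dec', '07:30', 'Maintenance Lift', 'Up river' ],    # duplicate
    [ 'Mon', '01 Jan', '09:15', 'Maintenance Lift', 'Down river' ],
    [ 'Thu', '01 Feb', '10:00', 'Maintenance Lift', 'Up river' ],
);

# infer_dates (hypothetical name) takes the year and month to start from,
# plus the rows, and returns one Time::Piece object per unique lift.
sub infer_dates {
    my ($curr_year, $curr_mon, @rows) = @_;
    my (%seen, @dates);

    for my $row (@rows) {
        # The closure reads $curr_year each time it is called, so calling
        # it again after the increment re-parses with the new year.
        my $parse = sub {
            Time::Piece->strptime(
                "$row->[2] $row->[1] $curr_year",
                '%H:%M %d %b %Y',
            );
        };

        my $date = $parse->();

        # A month earlier than the reference month means the listing has
        # rolled over into the next year.
        if ($date->mon < $curr_mon) {
            ++$curr_year;
            $date = $parse->();
        }

        # Assumption beyond the slide: track the month of the latest row.
        $curr_mon = $date->mon;

        # Skip rows that repeat an already-seen timestamp.
        next if $seen{ $date->epoch }++;

        push @dates, $date;
    }

    return @dates;
}

my @dates = infer_dates(2017, 12, @lifts);
print $_->strftime('%Y-%m-%d %H:%M'), "\n" for @dates;
```

Keying the `%seen` hash on the epoch value means two rows count as duplicates only when they have the same date and time, which matches the check shown on the slide.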

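The "Plan C" slides settle on the date-with-local-time form: a DATE-TIME value with no trailing `Z` and no `TZID` parameter. As a rough illustration of what that looks like in a feed, the sketch below builds one event by hand with core Perl only. It is not the talk's implementation, which uses `Data::ICal` and `DateTime`. The helpers `ical_text` and `vevent`, the `PRODID` value, and the sample lift are hypothetical. Writing `DTSTAMP` in UTC is an assumption about what `$now_ical` holds on the slides. Line folding for long lines is left out.

```perl
use strict;
use warnings;
use Time::Piece;

# Local-time ("floating") form from the final dt2ical slide:
# YYYYMMDDTHHMMSS with no UTC designator and no time zone reference.
sub dt2ical {
    my ($t) = @_;
    return $t->strftime('%Y%m%dT%H%M%S');
}

# Hypothetical helper: escape backslash, semicolon, comma and newline
# in iCalendar TEXT values.
sub ical_text {
    my ($text) = @_;
    $text =~ s/([\\;,])/\\$1/g;
    $text =~ s/\n/\\n/g;
    return $text;
}

# Hypothetical helper: one VEVENT with the same properties as the
# add_properties call in the talk.
sub vevent {
    my ($start, $description, $now) = @_;
    return join "\r\n",
        'BEGIN:VEVENT',
        'UID:' . $start->epoch . '@towerbridge.dave.org.uk',
        'DTSTAMP:' . dt2ical($now) . 'Z',    # assumption: stamp in UTC
        'DTSTART:' . dt2ical($start),        # local time, no TZID
        'DURATION:PT30M',
        'SUMMARY:Tower Bridge Lift',
        'DESCRIPTION:' . ical_text($description),
        'END:VEVENT';
}

# Made-up sample lift using the time from the TZID slides.
my $start = Time::Piece->strptime('10:56 11 Mar 2017', '%H:%M %d %b %Y');
my $now   = gmtime;    # scalar context gives a Time::Piece object in UTC

my $ics = join "\r\n",
    'BEGIN:VCALENDAR',
    'VERSION:2.0',
    'PRODID:-//example//tower-bridge-sketch//EN',    # placeholder value
    vevent($start, 'Maintenance Lift (Up river)', $now),
    'END:VCALENDAR',
    '';

print $ics;
```

Because `Time::Piece->strptime` treats the parsed time as UTC, the epoch used in the `UID` here will not match what `DateTime` produces with `Europe/London` during summer time. In this sketch it only needs to be unique per lift.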