@dawnieando from @MoveItMarketing #StateOfSearch
REDUCING THE BURDEN OF TECHNICAL DEBT IN SEO

The Great 302s Pass PageRank Debate

MULTIPLE GENERATIONS OF A WEBSITE

JUST WHAT IS GENERATIONAL CRUFT?

GENERATIONAL CRUFT MAKES CRAWLING, INDEXING, QUERY CLUSTERING & SEMANTIC UNDERSTANDING MORE COMPLEX

TECHNICAL DEBT

MARTIN FOWLER TECHNICAL DEBT QUADRANT
RECKLESS & DELIBERATE DEBT: “We don’t have time for design”
PRUDENT & DELIBERATE DEBT: “We must launch now and deal with consequences”
RECKLESS & INADVERTENT DEBT: “What’s layering?”
PRUDENT & INADVERTENT DEBT: “Now we know how we should have done it”
Credit: Martin Fowler Technical Debt Quadrant

SEO TECHNICAL DEBT QUADRANT
RECKLESS & DELIBERATE DEBT: “SEO is dead / doesn’t matter”
PRUDENT & DELIBERATE DEBT: “We must launch now and deal with SEO issues after”, “We’ll SEO ‘it’ later”
RECKLESS & INADVERTENT DEBT: “What’s a URL parameter?”, “What’s a canonical?”, “What’s internationalization?”
PRUDENT & INADVERTENT DEBT: “Now we know how we should have done it” (Further learnings were discovered as knowledge grew)

Project Success === Produce planned deliverables, within budget, on time (including approved changes)
Source: http://4pm.com/2015/09/27/project-failure/
70% Of All Projects Fail
Source: http://4pm.com/2015/09/27/project-failure/

PAST LEGACY
PRESENT APPLICATION
FUTURE PLANS

SEOs WAIT A LONG TIME FOR DEV CHANGES
Over 40% wait 12 months+ to get their most crucial SEO changes implemented
https://moz.com/blog/how-long-are-seos-waiting-for-their-most-important-changes

MAJOR CAUSES OF THIS
“Legacy technology or outdated processes hampering progress”
“The change they want is ‘not possible’ with current platform (37%)”
Source: Will Critchlow, Distilled Research, on Moz Blog, May 2016
https://moz.com/blog/how-long-are-seos-waiting-for-their-most-important-changes

Blame it on the marketers
We’ll SEO ‘it’ later
FRAGILE … NOT AGILE
Credit: Dilbert by Scott Adams

THE ACCUMULATION OF TECHNICAL DEBT

FIGHTING A LOSING BATTLE
UNPAID SEO TECHNICAL DEBT === SEO BANKRUPTCY

A Clean Slate
LET’S START WITH A CLEAN SLATE

Websites are not disposable
BUT…

SEARCH ENGINES NEVER FORGET
Search engines have a long memory and a lot of storage

A NEW URL HAS NO
BUT YOUR OLD ONES HAVE LOTS

Web Crawler System History Logs
GOOGLE WON’T WAIT (LONG) FOR YOU TO DEAL WITH YOUR TECHNICAL DEBT

THE CHALLENGE IS NOT IN INDEXING… BUT IN KEEPING EVERYTHING INDEXED UP TO DATE

EVEN AS FAR BACK AS 2003
40% of ALL web pages changed weekly
7% of web pages changed 1/3 or more of their page content weekly
“If ‘change’ means ‘any change’, then about 40% of all web pages change weekly [12]. Even if we consider only pages that change by a third or more, about 7% of all web pages change weekly [17].” (Broder, A.Z., Najork, M. and Wiener, J.L., 2003)

HOW MUCH BIGGER & MORE DYNAMIC IS THE WEB NOW IN 2017?
http://www.internetlivestats.com/total-number-of-websites/

INCREMENTAL CRAWLING NEVER ENDS
Ongoing Crawling Which Never Ends
“Crawling method based on crawl frequency based on URL historical change & importance rate”

CRAWLING FRONTIER
Shestakov, D., 2013, July. Current challenges in web crawling. In International Conference on Web Engineering (pp. 518-521). Springer, Berlin, Heidelberg.

The Crawling ‘Frontier’ (THE URL QUEUE)
‘TO BE EXPLORED’ (OR REVISITED)

URLs Are Prioritized By Importance & Take Their Place in The Frontier Queue (New & Revisit)

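The frontier idea above can be sketched as a priority queue. This is only an illustration, assuming importance is a single number per URL; the `Frontier` class, its method names and the example URLs are hypothetical and do not describe how any search engine actually implements its queue.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class FrontierEntry:
    # heapq is a min-heap, so the importance is stored negated
    # to make the most important URL come out first.
    sort_key: float
    url: str = field(compare=False)


class Frontier:
    """Toy crawl frontier: a URL queue ordered by an assumed importance score."""

    def __init__(self):
        self._heap = []
        self._seen = set()  # a very simple "URL seen" test

    def add(self, url, importance):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, FrontierEntry(-importance, url))
        return True

    def pop(self):
        """Return the most important URL still waiting to be visited."""
        return heapq.heappop(self._heap).url


frontier = Frontier()
frontier.add("https://example.com/", 0.9)
frontier.add("https://example.com/old-page", 0.2)
frontier.add("https://example.com/category", 0.6)
crawl_order = [frontier.pop() for _ in range(3)]
```

In this toy model a low-importance legacy URL is never dropped, it simply waits at the back of the queue, which is one way to picture old URLs lingering in the frontier.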
DATA FROM HISTORY LOGS CONTRIBUTE TO WHEN TO REVISIT URIs ON THE WEB

PAST DATA IS A GREAT PREDICTOR OF FUTURE DATA
PREDICTION BASED PRIORITY SCHEDULING … WHEN THERE IS CONSISTENCY

‘Sampling’ in Crawling for Efficiency
‘SMALL TEST VISITS TO A SITE TO UNDERSTAND WHETHER IT IS WORTH CRAWLING & UNDERSTAND URL PATTERNS & RESOURCES THERE’
CRAWLING ‘HINTS’ & ‘HINT RANGES’

DUSTBUSTER & DUST CRAWLING RULES
DO NOT CRAWL IN THE DUST
BUILDS ‘HINTS’ ON WHAT NOT TO CRAWL
EVERY SITE WILL HAVE ITS OWN CRAWLING RULES

Popular CMS ‘Rule Patterns’ (URL Parameters)
ALL WILL HAVE COMMON CANONICALIZATION PATTERNS WHICH CAN BE LEARNED

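A learned canonicalization pattern for URL parameters might look roughly like the sketch below. It assumes a hand-written list of parameters that never change page content; the `IGNORED_PARAMS` set and the `normalize_url` helper are hypothetical examples for an imaginary site, not rules any crawler is known to use.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change the content on this imaginary site.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url, ignored=IGNORED_PARAMS):
    """Collapse parameter variants of a URL into one canonical form."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    kept.sort()  # parameter order should not create a new URL
    path = parts.path or "/"
    # Lower-case scheme and host, drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


variant_a = normalize_url("HTTP://Example.com/shoes?sort=price&colour=red&sessionid=abc123")
variant_b = normalize_url("http://example.com/shoes?utm_source=news&colour=red")
```

Both variants reduce to the same address, so a rule like this turns many crawlable duplicates into one target.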
Every Version of Your Past Ecommerce Sites
Had potential to spew… at some point…
“Exponentially multiplicative URLs”
DIFFERENT PARAMETERS & URL PATTERNS WHICH ARE LEARNED BY CRAWLERS… AND REMEMBERED… FOREVER

SEVERAL TYPES OF CRUFT MAY CONTRIBUTE

SOFTWARE CRUFT
SOFTWARE ROT
CODE SMELL

SPAGHETTI CODE

DEPRECATION

LEGACY CODE BASES & DEPRECATED VERSIONS
The hottest job on the block at one point
Once described by W3C Schools as ‘The Developers Dream’
How Did That Work Out For Your SEO?

WHAT ABOUT ALL THAT CSS & JS YOU COLLECTED?

PEOPLE APPEND (ADD TO FILES) - SOMETIMES IT’S FEAR OF DEPENDENCIES

GUTENBERG
“WordPress Core is a minefield of design decisions that were made for what WordPress was at the time, and didn’t age well” (Greg Schoppe, 2017)
SOURCE: https://speckyboy.com/meet-greg-schoppe-developer-gutenberg/

WordPress now powers 26% of the web
HUGE EXAMPLE OF GENERATIONAL SOFTWARE CRUFT
https://managewp.com/statistics-about-wordpress-usage

CONTENT CRUFT

WELL… WE DID MAKE QUITE A BIT OF CONTENT
http://www.internetlivestats.com/total-number-of-websites/

CONTENT CRUFT
https://moz.com/blog/clean-site-cruft-before-it-causes-ranking-problems-whiteboard-friday

Poor quality content signals build up over time… incremental crawling just keeps on rolling and crawling… and gathering signals

SIGNALS OF LOW QUALITY JUST KEEP COMPOUNDING OVER TIME
Source: https://plus.google.com/u/0/+GlennGabe/posts/fXZw2BuSa5B

PEOPLE CANONICALIZE WRONG ON MULTIPLE GENERATIONS OF SITES

GOOGLEBOT GETS WHERE WATER COULDN’T
https://petermeadit.com/blog/block-web-crawlers/

EVEN YOUR STAGING & DEV SITES
Found with a very simple wildcard * site: query

ARCHITECTURAL & SEMANTIC CRUFT

WONKY TOPICAL STRENGTH
SEMANTIC LOSS
HOW MUCH STUFF DID YOU MOVE AROUND OVER THE YEARS?

YOU BROKE YOUR SILO STRUCTURE
SEMANTIC LOSS
YOU BROKE YOUR CORPUS ‘RELATEDNESS’
MANY SIGNALS GONE
1st level relatedness
2nd level relatedness
“You shall know a word by the company it keeps” (Firth, 1957) (ITS CO-OCCURRENCE VECTOR)
CO-OCCURRENCE: Of words together & High commonality of other shared co-occurring words
Image credit: https://www.slideshare.net/patrickstox/nlp-sitemap-smx-2016-patrick-stox-latest-in-advanced-technical-seo

RELATEDNESS EXAMPLES
APPLE: Eat, Bake, Cake, Peel
AUTOMOBILE: Accident, Traffic, Driver, Car, Motor
FURNACE: Hearth, Blast, Fiery, Gas, Electric
Miller, G.A. and Charles, W.G., 1991. Contextual correlates of semantic similarity. Language and cognitive processes, 6(1), pp.1-28.

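Co-occurrence vectors can be illustrated with a few lines of code. The sketch below assumes a tiny made-up corpus and counts which words share a sentence; the function names and sentences are hypothetical, and real systems use far larger corpora and more refined weighting.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the other words it shares a sentence with."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = set(sentence.lower().split())
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up corpus: "car" and "automobile" never appear in the same sentence.
corpus = [
    "the driver crashed the car in traffic",
    "the driver parked the automobile in traffic",
    "bake the apple cake",
    "peel and eat the apple",
]
vectors = cooccurrence_vectors(corpus)
car_vs_automobile = cosine(vectors["car"], vectors["automobile"])
car_vs_apple = cosine(vectors["car"], vectors["apple"])
```

Here "car" and "automobile" score highly only because they keep the same company ("driver", "traffic"), which is the second-level kind of relatedness; moving or pruning the pages that supply that shared company would lower the score.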
‘CONCEPT DRIFT’ IS A THING
AI ALERT
fuzzy: difficult to perceive; indistinct or vague.
synonyms: blurry, blurred, indistinct, unclear, bleary, misty, distorted, out of focus, unfocused, lacking definition, low resolution, nebulous, ill-defined, indefinite, vague, hazy, imprecise, inexact, loose, woolly
"a fuzzy picture"
https://en.wikipedia.org/wiki/Concept_drift

BOOLEAN LOGIC – EXTREME CASES OF TRUTH - (TRUE (1) OR FALSE (0))

FUZZY LOGIC
• Rule based logic
• Been around for 20+ years
• Is a subset of AI

‘FUZZY LOGIC’ – DEGREES OF TRUTH
SEMANTIC LOSS

FUZZY LOGIC – DEGREES OF TRUTH
0.8 Doc ID likely to be a correct URI to choose from term / query cluster

Semantic & concept relatedness may be the ‘secret sauce’ when it comes to ‘precision’ over ‘recall’

TWO-PHASE RANKING IN A SEARCH NODE
Presented by B Cambazoglu at European Summer School Information Retrieval 2017 (Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced topics in information retrieval (pp. 27-50). Springer Berlin Heidelberg.)

URL CRUFT

‘URL CRUFT’ IS A THING
“characters relevant or meaningful only to the people who created the site, such as implementation details of the computer system which serves the page. Examples of URL cruft include filename extensions such as .php or .html, and internal organizational details such as /public/ or /Users/john/work/drafts/.” (Wikipedia Definition)
ALL THE RANDOM URLS YOU CREATED OVER THE YEARS & SITES (EVEN BY ACCIDENT)

410 Gone
• “Some, we’ll just kill off with a 410…”
• “Then the URLs will be gone”

THE DIFFERENCE BETWEEN HOW GOOGLE TREATS 404 VERSUS 410s
https://www.youtube.com/watch?v=xp5Nf8ANfOw

302 == Default, 301 == Intentional
404 == Default, 410 == Intentional
ARE YOU SURE?
MAYBE YES
“The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.” (RFC 7231)
https://tools.ietf.org/html/rfc7231#section-6.5.9

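The default versus intentional distinction can be made concrete with a small lookup. This is a minimal sketch, assuming the site keeps explicit lists of moved and retired paths; the tables and the `respond` function are hypothetical and stand in for whatever the real server or CMS configuration does.

```python
from http import HTTPStatus

# Hypothetical records for an imaginary site.
MOVED = {"/old-category": "/new-category"}   # deliberately relocated
RETIRED = {"/discontinued-product"}          # deliberately removed
LIVE = {"/", "/new-category"}


def respond(path):
    """Return (status, redirect_target) for a requested path."""
    if path in MOVED:
        return HTTPStatus.MOVED_PERMANENTLY, MOVED[path]  # 301: intentional move
    if path in RETIRED:
        return HTTPStatus.GONE, None                      # 410: intentional removal
    if path in LIVE:
        return HTTPStatus.OK, None
    return HTTPStatus.NOT_FOUND, None                     # 404: the default "no idea"


moved = respond("/old-category")
retired = respond("/discontinued-product")
unknown = respond("/never-existed")
```

The point of the sketch is that 301 and 410 only happen when someone has recorded a decision, while 404 is what falls out when nobody has.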
https://twitter.com/JohnMu/status/903904602617204738

DO NOT THINK 410s WON’T BE RECRAWLED AGAIN
Source: https://www.docsplace.org/4578/09/410-gone-stops-crawling-dead-urls/

In Reality… Gone Is Never Gone
“We knew there was content there at some point so we just swing by every now and then to see if anything came back” (John Mueller, 2016)

A URL IS ‘NOT’ CONTENT
IT IS A LOCATION WHERE A RESOURCE LIVES / LIVED
IT MERELY JUST BOILS DOWN TO A DOC ID MAPPED TO TERM IDS IN A MATRIX

ZOMBIES ARE NEVER GONE
NO URLS ARE EVER GONE
ONLY THE RESOURCE THERE IS GONE
5 YEARS LATER
https://www.seroundtable.com/google-410-indexing-22584.html

HOW ABOUT 14 YEARS LATER?
2 HOURS ALIVE… 14 YEARS LATER
https://www.webmasterworld.com/google/4864613.htm

SMALL TOPICAL URL FISH IN A BIG TOPICAL POND
SEMANTIC LOSS

COME TO OUR ANNUAL EVENT WITH THE SAME NAME BUT A NEW URL EVERY YEAR

“COOL URIs DON’T CHANGE”
Sir Tim Berners-Lee (Inventor of the World Wide Web)
https://www.w3.org/Provider/Style/URI
Attribution: By Uldis Bojārs (Flickr.) [CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons

YOU END UP WITH A CONGA LINE OF LEGACY URLS, SUBDOMAINS & VARIOUS SITE PROTOCOLS
…In the URL queue

URL_SEEN TEST
YOU CAN’T JUST KEEP TRYING TO JUMP THE INDEXING QUEUE EITHER
PUSH INDEXING: E.G. FETCH AS GOOGLEBOT & SUBMIT TO INDEX, XML SITEMAP SUBMISSIONS
PULL INDEXING: VISITS BY NATURAL CRAWLING & DISCOVERY OF URLS / URL VISIT SCHEDULING / REVISITS

CRAWL CRUFT IS A SYMPTOM

IMPORTANCE TIERING FOR SCALE (EFFICIENCY)

TWO STEPS FORWARD & ONE STEP BACK
STRONG CANONICAL CONTEXT URL
YES YES YES
NO NO
YES YES YES YES YES
NO NO NO NO
ANOTHER OR MULTIPLE WEAK ALTERNATIVES

PAST DATA ON CHANGE IS A GREAT PREDICTOR OF FUTURE DATA
PREDICTION BASED PRIORITY SCHEDULING … WHEN THERE IS CONSISTENCY
“past changes to a page are a good predictor of future changes. This result has practical implications for incremental web crawlers that seek to maximize the freshness of a web page collection or index.” (

TO BUILD PROBABILITY & PREDICTABILITY MODELS

BASED ON ROLLING AVERAGES / SIGNALS FROM PAST CRAWL VISITS

‘Transitive’?? - ‘THE WHOLE TREE IS ROTTEN’
SAMPLING
Transitive: if A == B and B == C, then A == C
For some types of content more than others, e.g. ecommerce/directories but not news

CRAWL SAMPLES ALSO HELP WITH MODELLING TO MAP DOCS TO TOPIC RELEVANCE & RELATEDNESS

TOPICAL DILUTION & URL IMPORTANCE DILUTION
RELATEDNESS HELPS WITH ‘GROUNDING’ (confirmation of other signals)

SOME SYMPTOMS
WRONG URL RANKING
‘SWAPPING OUT’ (Especially multiple child nodes)
SHARP & VOLATILE RANKING FLUX

SOME SOLUTIONS
• How can you change the hints associated with your site for better rankings and SEO?

ACKNOWLEDGE & CALCULATE THE DEBT

THE BOSTON MATRIX & SEO
PRIORITIZE THE DEBT
MARKET GROWTH / MARKET SHARE
Stars: High potential. Worth more effort
Question Marks: Jury’s out
Cash Cows: High converting queries & URLs
Dogs: Low return, low conversion queries / URLs

ADD EVERYTHING TO GSC FROM THE PAST & PRESENT
THERE MAY STILL BE UNDETECTED ACTIVITY GOING ON THERE

IDENTIFY PAGES IN QUERY CLUSTERS (QUERY CLASSES & INTENT MEETING SAME INFORMATION NEED CATEGORY)

REVIEW RELATIVE IMPORTANCE SIGNALS OF INTERNAL LINKS
ARE THESE REALLY AMONGST YOUR MOST IMPORTANT URLS?

BUT… REVIEW THE ‘RELATEDNESS’ OF INTERNAL LINKS TO PAGES
Domain URL
INTERNALLY LINKING PAGES TO THE TARGET URL
IS THE ‘RELATEDNESS’ HIGHLY RELEVANT TO ASSIST WITH CONTEXTUAL & SEMANTIC SIGNALS?
IS RELATEDNESS HIGH?

FIND SITES ON THE SAME SERVER

DIAGNOSE: HEAD BACK TO THE SERVER
YOU NEED TO KNOW WHAT’S ON THAT SERVER

SOME QUESTIONS TO ASK
HOW MANY MICRO-SITES HAVE YOU HAD?
HOW MANY SUBDOMAINS?
HOW MANY OTHER DOMAINS?
WHO IS RESPONSIBLE FOR DOMAIN REGISTRATION?
WHO KNOWS WITHIN THE ORGANISATION?
WHO REGISTERED THE DOMAINS?
WHO CAN UPDATE DNS RECORDS?
ARE THESE SITES STILL ON SERVERS?
HAVE ANY OF THESE SITES HAD MANUAL ACTIONS?
HOW ARE THESE SITES REDIRECTED?
ARE THEY PARKED DOMAINS?

NEARLY 1/3 OF SEOs SAY THEY DON’T NEED LOG FILES
Source: https://www.seroundtable.com/poll-log-files-seo-24523.html

DIAGNOSE: SERVER LOG FILE ANALYSIS
ANALYSE THE LOGS FOR ‘ALL’ YOUR SITES AND ‘ALL’ PROTOCOLS TO SEE THE CRAWL PATTERNS EMERGE
BUT WATCH OUT FOR OTHER TOOLS EMULATING GOOGLEBOT AND FILTER THEM OUT
NB: YOU MAY BE LOOKING AT URLS QUEUED LONG AGO

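One common way to filter out tools that only claim to be Googlebot is a reverse DNS lookup followed by a forward lookup. The sketch below assumes the hostname suffixes `.googlebot.com` and `.google.com` and IPv4 log entries; the `is_verified_googlebot` helper and its injectable lookup parameters are hypothetical conveniences so the logic can be tried without network access.

```python
import socket

# Assumed hostname suffixes for genuine Googlebot reverse DNS records.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check a log entry's IP with a reverse DNS lookup, then a forward lookup."""
    # Default to real DNS lookups (IPv4 only) unless test doubles are supplied.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False  # user agent may say Googlebot, DNS says otherwise
        # The hostname must resolve back to the same IP address.
        return ip in forward_lookup(host)
    except OSError:
        return False  # lookup failed, so treat the hit as unverified


# Offline illustration with made-up DNS data (documentation IP range).
fake_reverse = {
    "192.0.2.10": "crawl-192-0-2-10.googlebot.com",
    "192.0.2.99": "scraper.example.net",
    "192.0.2.50": "crawl-192-0-2-10.googlebot.com",  # spoofed reverse record
}
fake_forward = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}

genuine = is_verified_googlebot("192.0.2.10", fake_reverse.__getitem__, fake_forward.__getitem__)
emulator = is_verified_googlebot("192.0.2.99", fake_reverse.__getitem__, fake_forward.__getitem__)
spoofed = is_verified_googlebot("192.0.2.50", fake_reverse.__getitem__, fake_forward.__getitem__)
```

Only hits that pass both lookups would be kept when studying crawl patterns; everything else is another tool emulating the user agent.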
SET UP A PLAN TO REPAY THE DEBT

MoSCoW Approach
MUST HAVE
SHOULD HAVE
COULD HAVE
WON’T HAVE THIS TIME

MoSCoW Prioritization
Source: https://www.agilebusiness.org/content/moscow-prioritisation-0

SEO Refactoring Away The Past Whilst Working Towards The Future

TECHNICAL DEBT COMES WITH INTEREST
TO BE REPAID VIA REFACTORING

Refactoring Definition
“Refactoring …is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior. Its heart is a series of small behavior preserving transformations.”
https://en.wikipedia.org/wiki/Code_refactoring

SEO REFACTORING
HOUSE KEEPING
WORKING ON THE PAST, PRESENT & FUTURE SIMULTANEOUSLY
USING APPROACHES LIKE MoSCoW
ONGOING ITERATIVE IMPROVEMENTS
ONGOING ROLLING AUDITS
PAYING OFF DEBT ‘A BIT AT A TIME’
MARGINAL GAINS

SKIP & DIVERT THE DEBT

HAVE YOUR SAY IN CRAWLING ‘RULES’
Help Google Build ‘Crawling Rules’ for your site rather than wasting time on ‘sampling’ and giving a bad impression
GIVE HELP AND GUIDANCE WITH THE CRAWL RULE AND HINT BUILDING
BE VERY CAREFUL

REVISIT ALL PAST .HTACCESS FILES
.HTACCESS SITE 1, .HTACCESS SITE 2, .HTACCESS SITE 3
Can you rewrite the rules to be more efficient with regex or cut out some old rules still firing unnecessarily? (CREATE SHORTCUTS)
REMEMBER .HTACCESS RULES RUN IN ORDER OF THEIR APPEARANCE IN THE FILE.
CAN YOU USE WILDCARDS TO OPTIMIZE OR SKIP STEPS?

Learn By Heart Regular Expressions & How URLs with Multiple Parameters Are Handled
The most restrictive parameter blocked overrules lesser restrictions

FIND & CHOP BACK REDIRECT CHAINS

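Chopping back redirect chains can be pictured as rewriting every old URL to point straight at its final destination. The following is a sketch under the assumption that the redirects from all site generations have been collected into one source-to-target mapping; `flatten_redirects` and the sample paths are hypothetical.

```python
def flatten_redirects(redirects):
    """Point every source straight at its final target, skipping intermediate hops."""
    flat = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until the target is no longer itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat


# Made-up chain left behind by three site generations.
legacy = {
    "/2009/widgets.php": "/2013/widgets.html",
    "/2013/widgets.html": "/products/widgets",
    "/products/widgets": "/widgets/",
}
direct = flatten_redirects(legacy)
```

Each legacy URL then needs a single hop, and any loop is surfaced as an error instead of being served to crawlers.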
REVIEW & REMOVE REDUNDANT FILES

WHAT SUPERFLUOUS JAVASCRIPT & CSS IS THERE UNNECESSARILY?
Avoid relative URLs; use absolute URLs (particularly in WordPress)

REVIEW & UNDERSTAND - THE CANONICAL LINK RELATION
In ‘ALL’ its forms
• 30X redirects
• Canonical tag
• Hreflang
• HTTPS protocol
• Global canonicalization rules
• URL normalization
RFC6596

CUT THROUGH TO THE DEVELOPERS

REBUILD STRONG SEMANTICS & ‘RELATEDNESS’

BE CAREFUL WITH THE CONTENT PRUNING ‘CHAINSAW’
Did you just ‘prune away’ your corpus ‘relatedness’?

UPCYCLING URLs
RATHER THAN ‘REMOVE’ CONSIDER ‘IMPROVE’
EXPAND, DE-GROUP & RE-GROUP

THOUGHTFUL QUERY CLUSTER BASED ‘PRUNING’, ‘CONTENT MORPHING’ AND ‘QUERY CLUSTER RE-GROUPING’

Pass Strong Clues - Highly Relevant New Conceptual Structures
STRONG SEMANTICS & CONCEPTUALLY CO-OCCURRING TERMS

WHAT CORRELATES?
https://www.google.com/trends/correlate/search

SOLUTION: Wiki Page Redirects on Topics
Wikipedia Redirects
https://dbpedia.org/sparql
thesaurus.com
OR A GOOD OLD FASHIONED THESAURUS

TIE YOUR THEMATIC CORPUS BACK TOGETHER

USE ‘STRONGLY CONNECTED COMPONENTS’ (TOPICAL HUBS FOR FOCUSED CRAWLING) TO REAFFIRM THE SEMANTIC STRENGTH YOU ONCE HAD

BUILD RICH CONTENT HUBS FOR PRIMARY TARGET TOPICS
STRONGLY CONNECTED HUB
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. and Wiener, J., 2000. Graph structure in the web. Computer networks, 33(1), pp.309-320.

BUILD WELL CATEGORIZED AND CONCEPTUALLY STRUCTURED SITEMAPS
https://www.slideshare.net/patrickstox/nlp-sitemap-smx-2016-patrick-stox-latest-in-advanced-technical-seo

XML Sitemaps Are Your Friend… (Strong Foundations)
‘The foundations’ underneath a site
They help to pass ‘importance’ signals to URLs
But… never leave them to just autogenerate without periodically checking

CREATE WELL ORGANISED XML SITEMAPS WITH IMPORTANT URLS

EXTERNALLY HOSTED XML SITEMAPS
• Take back control
• Jump the dev queue
• Allows for custom configuration of optimal canonical click paths
• Allows for consistent signals of importance to included URLs
• Forget about setting priority
• Forget about last modified
• Even a simple list of URLs FTW will do
• Keep them organised for granular analysis of problem site sections
“Increase and decrease importance via internal link optimization to signal key quality sections”

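A hand-curated sitemap along the lines of the bullets above needs very little: a list of important URLs per site section, with no priority or last-modified values. The sketch below assumes the standard sitemaps.org namespace; the helper names, file naming scheme and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a bare-bones XML sitemap: only <loc>, no priority or lastmod."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    body = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return body.decode("utf-8")


def build_section_sitemaps(sections):
    """One sitemap per site section, so problem sections can be analysed separately."""
    return {f"sitemap-{name}.xml": build_sitemap(urls) for name, urls in sections.items()}


# Made-up sections and URLs for illustration.
files = build_section_sitemaps(
    {
        "shoes": ["https://example.com/shoes/", "https://example.com/shoes/red"],
        "bags": ["https://example.com/bags/"],
    }
)
```

Keeping one file per section means indexing reports can be read section by section, and a weak section can be left out simply by not generating its file.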
EXCLUDE LOWER QUALITY SITE SECTIONS (for now)
Excluded sections

BUT… REVIEW THE ‘RELATEDNESS’ OF INTERNAL LINKS TO PAGES
Domain URL
INTERNALLY LINKING PAGES TO THE TARGET URL
IS THE ‘RELATEDNESS’ HIGHLY RELEVANT TO ASSIST WITH CONTEXTUAL & SEMANTIC SIGNALS?
IS RELATEDNESS HIGH?

DON’T LET THE DEBT HOLD YOU BACK
MAKE GREAT CONTENT & BRAND BUZZ

A ‘TWO-OARED’ ROWING BOAT GOES FURTHER

THEN… MONITOR & BE PATIENT

“It’s simple really, the businesses seeing growth in natural search are those implementing technical changes successfully, the most common cause of decline is either ignoring technical or getting it wrong.”
Tim Grice, Branded3, 2017
https://www.branded3.com/blog/link-spam-migration-disasters-penguin-organic-growth-2017/

Positive Consistency is KEY
‘ROLLING AVERAGES CAN GO BOTH WAYS’

APPENDIX & EDITOR’S CUT

BUT WHEN DATA IS INCONSISTENT FUZZY LOGIC MAY FAIL
‘DEGREES OF TRUTH’ MORE BLURRED / VAGUE

CO-OCCURRENCE VECTORS
1st level relatedness: A measure of words that directly occur together in a text or ‘corpus’ (collection of documents together)
‘TWO WORDS WHICH TEND TO CO-OCCUR MUST BE RELATED’
EXAMPLES: car/automobile, coast/shore, furnace/stove (Miller & Charles, 1991)

CO-OCCURRENCE VECTORS
2nd level relatedness: Share common words they co-occur with aside from directly co-occurring together (both appear in same types of text as each other === related)
EXAMPLE: FURNACE & OVEN BOTH SHARE HEAT, MOTOR & ROAD, CAR & AUTOMOBILE BOTH SHARE PASSENGERS

MORE Solutions
• Do a bit of ‘up front’ thinking (AVOID TECHNICAL DEBT IN FIRST PLACE)
• Measure SEO technical debt
• Refactor SEO technical debt away
• Reducing SEO technical debt should be inbuilt
• Accept some SEO (least impactful) technical debt is necessary for agility
• ‘Chip away’ at SEO technical debt

‘Fuzzy’ URL Targets with Each Site Generation
EVERYTHING GETS A BIT BLURRED
‘Which is the target URL again?’

“The URL page importance score can be retrieved from the … URL history log … or it can be obtained by obtaining the historical page importance score for the URL for a predefined number of prior crawls and then performing a predefined filtering function on those values to obtain the URL page importance score.”
Scheduler for Search Engine Crawler
https://www.google.com/patents/US8042112
Importance record per crawl:
DOC ID     CRAWL 1  CRAWL 2  CRAWL 3  CRAWL 4  CRAWL 5  CRAWL 6
DOC ID 1   1        0.8      0.6      0.4      0.2      0
DOC ID 2   0        0.2      0.4      0.6      0.8      1
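A rough sketch of how a filtering function over prior crawls might behave, using the two example documents. The patent text quoted here does not say which filter is used; a plain mean over the last three crawls is an assumption made purely for illustration, and `filtered_importance` is a hypothetical name.

```python
def filtered_importance(history, window=3):
    """Average the most recent `window` importance scores (a simple rolling mean).

    The choice of a mean and of window=3 is an assumption for this sketch.
    """
    recent = history[-window:]
    return sum(recent) / len(recent)

# Importance records for crawls 1 to 6, as in the example table.
crawl_history = {
    "doc_1": [1.0, 0.8, 0.6, 0.4, 0.2, 0.0],
    "doc_2": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
}
scores = {doc: round(filtered_importance(h), 2) for doc, h in crawl_history.items()}
```

Under these assumptions the declining document ends with a low smoothed score and the improving one with a high score, which is the sense in which rolling averages can go both ways.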
@dawnieando from	
  @MoveItMarketing #StateOfSearch
Example MoSCoW Prioritisation
MUST HAVE / SHOULD HAVE / COULD HAVE / WON’T HAVE THIS TIME
Remove infinite loops
Identify ideal click paths
Canonicalize to superset
e.g. Page title rewriting
Redirect true dupes
Check server log files
Upcycle conflict content
e.g. All meta-description
Review parameter handling
Analyse queries on near-dupes
Strengthen categories & subcategories (relevance)
Review ‘added-value’ difference in ‘similars’
Add seasonal & TIME IS OF THE ESSENCE content pieces (topical / evergreen)
Review internal link popularity of important pages
Build topic hub static pages (Strongly connected component)
Add ‘flow’ content to amplify via social
Check soft 404s
Review crawling on near dupes & similars
Sectional content audit
Add site section properties in GSC
Check server errors
Review queries on similars
Add categorized XML sitemaps
Add content from sub to superset & canonicalize
PAST
PRESENT
FUTURE
@dawnieando from	
  @MoveItMarketing #StateOfSearch
URL NORMALIZATION
Can be problematic and ‘crufty’ too
https://en.wikipedia.org/wiki/URL_normalization
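As a minimal sketch of what URL normalization involves, the function below applies only a few of the safer steps (lowercase scheme and host, drop default ports, drop fragments, add a root path). `normalize_url` is a hypothetical helper, not any search engine's implementation, and it deliberately ignores user-info in the authority part.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url):
    """Apply a handful of conservative normalization steps to a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    # An empty path and "/" address the same resource.
    path = parts.path or "/"
    # Drop the fragment: it is not sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))

normalized = normalize_url("HTTP://Example.COM:80/Page?a=1#top")
```

The path case and the query string are left untouched on purpose: whether `/Page` and `/page`, or `?a=1&b=2` and `?b=2&a=1`, are the same document depends on the server, which is one reason normalization can itself become ‘crufty’.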
@dawnieando from	
  @MoveItMarketing #StateOfSearch
SOLUTION – Think Carefully About Creating New Dynamic Parameters
QUEUEING… AGAIN
Waiting for good URLs to be visited… AGAIN
@dawnieando from	
  @MoveItMarketing #StateOfSearch
SEMANTIC DRIFT
@dawnieando from	
  @MoveItMarketing #StateOfSearch
A DOG IS NOT ALWAYS A DOG
@dawnieando from	
  @MoveItMarketing #StateOfSearch
BIG TOPICAL URL FISH IN A SMALL TOPICAL POND
@dawnieando from	
  @MoveItMarketing #StateOfSearch
TERM-FREQUENCY INVERSE DOCUMENT FREQUENCY
Architectural, URL, software & content cruft can also skew term-frequency inverse document frequency
AND THE QUERY CLUSTERS DOCUMENTS BELONG TO
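One way to see the skew is to compute inverse document frequency for the same term on a clean page set and on the same set plus duplicated parameter variants. The page texts, the number of duplicates and the `idf` helper below are all hypothetical, and the formula is the textbook log(N / df) rather than whatever weighting a search engine actually uses.

```python
from math import log

def idf(term, documents):
    """Inverse document frequency: log(N / number of documents containing the term)."""
    containing = sum(1 for doc in documents if term in doc.lower().split())
    return log(len(documents) / containing) if containing else 0.0

clean_site = [
    "red running shoes",
    "blue hiking boots",
    "leather office shoes",
    "waterproof hiking jacket",
]
# Hypothetical cruft: three parameter variants of the first page are crawled too.
crufty_site = clean_site + ["red running shoes"] * 3

idf_clean = idf("running", clean_site)    # term appears in 1 of 4 documents
idf_crufty = idf("running", crufty_site)  # term appears in 4 of 7 documents
```

With the duplicates included, "running" looks like a common term on the site, so its weight drops even though no real content changed.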
@dawnieando from	
  @MoveItMarketing #StateOfSearch
YOU INHERITED SEO TECHNICAL DEBT
• Previous content / link manual actions
• Previous algorithmic suppressions
• Past infinite loops
• “We’ll SEO it after launch”
• “SEO is dead… so we won’t optimise”
• Dodgy URL parameters
• SEO is a ‘one time audit’
• Misconfigured URL parameters
• Old URL crawling ‘rules / hints’
@dawnieando from	
  @MoveItMarketing #StateOfSearch
CRAWLING PATTERNS ARE DEVELOPED FOR EFFICIENCY
- CRAWLERS TAKE ‘HINTS’ AND ‘HINT RANGES’ (rules / patterns)
Help Google build ‘crawling rules’ for your site rather than wasting time on ‘sampling’ and giving a bad impression
GIVE HELP AND GUIDANCE WITH THE CRAWL RULE AND HINT BUILDING
@dawnieando from	
  @MoveItMarketing #StateOfSearch
“REL=NEXT / REL=PREV” is NOT a form of canonicalization
@dawnieando from	
  @MoveItMarketing #StateOfSearch
“301s and 302s are BOTH forms of canonicalization”
@dawnieando from	
  @MoveItMarketing #StateOfSearch
Hreflang is a form of Canonicalization (Internationalization)
@dawnieando from	
  @MoveItMarketing #StateOfSearch
History Log Records Include:
• URL fingerprint
• Timestamp (last crawl or download attempt)
• Crawl status (success or error) (response code)
• Content checksum (binary code)
• Source ID (accessed from cache or downloaded)
• Segment identifier (crawl segment assigned to??)
• Page importance (a measure of importance assigned to the URL)
@dawnieando from	
  @MoveItMarketing #StateOfSearch
INSTEAD OF REMOVE… CONSIDER… DISTRACT & ITERATIVELY IMPROVE
@dawnieando from	
  @MoveItMarketing #StateOfSearch
SYSTEM & PEOPLE CRUFT
@dawnieando from	
  @MoveItMarketing #StateOfSearch
PEOPLE CHURN
INTERNAL TEAM CHURN
EXTERNAL AGENCY CHURN
“The average staff turnover rate for agencies is 17% each year” (Drum, 2017)
“We’ve Always Done It This Way”
HIPPO
“What’s The Business Case?”
Mowarrr Data Please
@dawnieando from	
  @MoveItMarketing #StateOfSearch
THINK CAREFULLY ABOUT URL CREATION
Not EVERYTHING is worthy of its own URL
VARIANTS
STEMMINGS
PLURALS
RANDOM TAGS
LONG, LONG, LONG TAIL PARAMETERS
@dawnieando from	
  @MoveItMarketing #StateOfSearch
‘DANGLY’ NODES AND UNLINKED SITES
@dawnieando from	
  @MoveItMarketing #StateOfSearch
A CAT IS NOT ALWAYS A CAT
@dawnieando from	
  @MoveItMarketing #StateOfSearch
ARE THEY CHIPS OR ARE THEY CRISPS?
@dawnieando from	
  @MoveItMarketing #StateOfSearch
MIXED CONTENT & MULTIPLE SITE VERSIONS
http://www.itv.com/news/
BOTH HTTP & HTTPS FIGHTING EACH OTHER
@dawnieando from	
  @MoveItMarketing #StateOfSearch
ROGUE INTERNAL LINKS TO PREVIOUS DOMAIN
@dawnieando from	
  @MoveItMarketing #StateOfSearch
410s DO USE CRAWL BUDGET (MAYBE NOT TOO MUCH ON REVISITS, BUT THESE THINGS ADD UP). THEY ALSO STILL NEED TO BE DISCOVERED, WHICH USES BUDGET
https://twitter.com/dawnieando/status/906465965029969920
@dawnieando from	
  @MoveItMarketing #StateOfSearch
GENERATIONAL CRUFT CAN SNOWBALL
• Past infinite loops
• Dodgy URL parameters
• Misconfigured URL parameters
• Old URL crawling ‘rules / hints’
• Old ‘importance / quality’ scores
• Filtered dupes & near-dupes
• Mixed messaging canonicals
• 410s still being revisited
• Internal links to old sites / protocols
@dawnieando from	
  @MoveItMarketing #StateOfSearch
“Failure is simply a few
errors in judgement
repeated every day”
Jim Rohn
@dawnieando from	
  @MoveItMarketing #StateOfSearch
The Generational ‘Snail Trail’
• Old XML sitemaps
• Redirects drop away on old site .htaccess
• DNS issues
• People link to old site but wrong protocol
• Old sites no longer verified in GSC
• Not all protocols redirecting
Leaving its slithery footprint
@dawnieando from	
  @MoveItMarketing #StateOfSearch
History Log Records Include:
• URL fingerprint
• Timestamp (last crawl or download attempt)
• Crawl status (success or error) (response code)
• Content checksum (binary code)
• Source ID (accessed from cache or downloaded)
• Segment identifier (crawl segment assigned to??)
• Page importance (a measure of importance assigned to the URL)
May be calculated by identifying historical importance scores based on the past X number of crawls
@dawnieando from	
  @MoveItMarketing #StateOfSearch
EVERY SINGLE TIME YOU MIGRATE, CHANGE DESIGN, REDIRECT, REINVENT A SITE / URL
A CLEAN START
REDIRECTIONS
ANOTHER STRUCTURE
FIRST SITE STRUCTURE
NEW CRAWLING ‘RULES’ BUILT
CRAWLING ‘RULES’ BUILT
EVERYTHING IS ‘200 OK’
MORE URLs
MIXED RESPONSE CODES
REDIRECTIONS
‘FUZZINESS’ IS EMERGING
NEW CRAWLING ‘RULES’ BUILT
MORE URLs
REDIRECT CHAINS & MIXED RESPONSE CODES
NEW SEOs DON’T KNOW THE ‘HISTORY’
TARGET URLs NOW ‘VERY FUZZY’
@dawnieando from	
  @MoveItMarketing #StateOfSearch
SOLUTION: Wiki Page Redirects on Topics
https://dbpedia.org/sparql
Wikipedia Redirects
@dawnieando from	
  @MoveItMarketing #StateOfSearch
Time Seems To Fly… The Older You Get
Your new site URL is just one of very many historical URLs on your IP to be visited periodically
A tiny fish in a very big URL pond queue
@dawnieando from	
  @MoveItMarketing #StateOfSearch
A New Beginning
§ “A new website will solve ALL our problems”
“Let’s start again”
“We’ll just migrate… and redirect everything”
@dawnieando from	
  @MoveItMarketing #StateOfSearch
A LONG, LONG TIME AGO
• You need to go right back to the beginning
• What domains did the organisation EVER register?
• Where do they redirect to?
• Is it via 301, 302 or are they merely parked domains?
• Who would know? Who is responsible?
• Verify them all in Google Search Console
• Some of these may EVEN HAVE PENALTIES HISTORICALLY
• If there are links to any, there is likely still crawling activity there
• Analyse logs across multiple subdomains & protocols
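The last point, analysing logs across subdomains, could start from something like the sketch below. It assumes an access log in combined format with the virtual host prepended to each line; real server configurations differ, and the sample lines, hostnames and IP addresses are invented. Matching on the user-agent string alone does not prove a hit came from Google, so treat the counts as a first pass.

```python
import re
from collections import Counter

# Assumed line layout (vhost + combined log format):
# <host> <ip> - - [<time>] "<method> <path> <proto>" <status> <bytes> "<referer>" "<agent>"
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

def googlebot_hits(lines):
    """Count requests whose user agent mentions Googlebot, keyed by (host, status)."""
    hits = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match and "Googlebot" in match["agent"]:
            hits[(match["host"], int(match["status"]))] += 1
    return hits

# Hypothetical sample lines for an old subdomain and the current site.
sample = [
    'old.example.com 66.249.66.1 - - [08/Oct/2017:10:00:00 +0000] "GET /page.html HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    'old.example.com 66.249.66.1 - - [08/Oct/2017:10:00:05 +0000] "GET /gone HTTP/1.1" 410 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    'www.example.com 203.0.113.9 - - [08/Oct/2017:10:00:09 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
hits = googlebot_hits(sample)
```

Grouping by host and status code shows at a glance whether legacy hostnames are still being crawled and what they answer with.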
@dawnieando from	
  @MoveItMarketing #StateOfSearch
SOME TYPES OF URL CRUFT
• INCORRECTLY APPLIED CANONICAL TAGS
• CONFLICTING HREFLANG & CANONICAL TAGS
• MIXED CONTENT
• URL SHORTENERS
• SESSION IDS
• UTM TAGGING
• OLD AJAX FRAGMENTS
• PARAMETERS FROM MULTI FACET DROP DOWN CHOICES
• .html, .php, .index.html, .aspx
• LEGACY URL REWRITING & PARAMETERS IN .HTACCESS FILES
• LEGACY FOLDERS WHICH CONTRIBUTE NO MEANING TO SITE ONTOLOGY
UNCRUFTY
www.myeasyurlwillmakeyouwonder.com/resume
CRUFTY
www.myeasyurlwillmakeyouwonder.com/resume.html
CRUFTY
http://nymag.com/scienceofus/2015/07/how-to-recover-from-an-all-nighter.html?om_rid=AAENcg&om_mid=_BTtFa0B869PyJp&utm_content=buffer8fdd1&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
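To make the tracking-parameter kind of cruft concrete, here is a small sketch that strips it from the long example URL. The parameter list in `CRUFT_PARAMS` is an assumption for illustration (only `om_rid`, `om_mid` and the `utm_` prefix come from the example); each site needs its own list, and `strip_cruft` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking / session parameter names; adjust for the site in question.
CRUFT_PARAMS = {"om_rid", "om_mid", "sessionid", "sid", "phpsessid"}

def strip_cruft(url):
    """Remove utm_* and known tracking/session parameters, plus any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in CRUFT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

crufty = (
    "http://nymag.com/scienceofus/2015/07/how-to-recover-from-an-all-nighter.html"
    "?om_rid=AAENcg&om_mid=_BTtFa0B869PyJp&utm_content=buffer8fdd1"
    "&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer"
)
clean = strip_cruft(crufty)
```

Parameters that change the content, such as a facet or page number, are kept; deciding which ones those are is the part that needs human judgement.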
‘RELATEDNESS’ (DISTRIBUTIONAL SIMILARITY)
1st level relatedness
2nd level relatedness
@dawnieando from	
  @MoveItMarketing #StateOfSearch
IT’S VERY IMPORTANT… YOU STAY OUT OF SERVER ERROR STATUS
500
‘Try again’ intervals likely extended between each failed connection attempt
@dawnieando from	
  @MoveItMarketing #StateOfSearch
“Forever,
And ever,
And ever,
And ever…
You’ll be a
URL”
@dawnieando from	
  @MoveItMarketing #StateOfSearch
LEGACY ISSUES VIA CANONICALS OR REDIRECTION (COMMON MISTAKES)
• PAGE CANONICALIZED TO IS NOT A SUPERSET OR DUPLICATIVE (IT IS NOT RELEVANT ENOUGH)
• 301s TO IRRELEVANT PAGES BECOME SOFT 404s
• FOLDING UP PRODUCT PAGES TO CATEGORIES (PEOPLE WERE LOOKING FOR A SPECIFIC PRODUCT)
• CANONICALIZATION TO PAGES WHICH IN THE FUTURE 301 REDIRECT TO ANOTHER URL, THEREFORE NEGATING THE PAGES CANONICALIZING TO THEM
• CONFLICTS BETWEEN HREFLANG AND CANONICALIZATION
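The canonical-then-redirect mistake in the list above is easy to check mechanically once crawl data is available. The sketch below assumes two lookups exported from a crawler: each page's rel=canonical target and each URL's HTTP status. The URLs, statuses and the `invalid_canonicals` name are hypothetical.

```python
def invalid_canonicals(canonicals, status_codes):
    """Return (page, target, status) for canonical targets that do not answer 200."""
    problems = []
    for page, target in canonicals.items():
        status = status_codes.get(target)  # None means the target was never crawled
        if status != 200:
            problems.append((page, target, status))
    return problems

# Hypothetical crawl export: page -> rel=canonical target, URL -> HTTP status.
canonicals = {
    "/shoes?colour=red": "/shoes",
    "/boots?size=9": "/boots-old",
}
status_codes = {"/shoes": 200, "/boots-old": 301}
problems = invalid_canonicals(canonicals, status_codes)
```

This only catches targets that redirect, error, or are missing from the crawl; whether a target is a true superset or duplicate of the page still needs a content comparison.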
@dawnieando from	
  @MoveItMarketing #StateOfSearch
SOLUTION: Increase ‘Importance’ of target URLs quickly
• Internal link optimization
• Canonicalise to (if relevant)
• Strengthen up importance signals
• Inclusion in front facing HTML and XML sitemaps
• Improve the content & keep it updated
• 301 redirect to (if relevant redundant content)
• Topical hubs and strong information views to navigate users & add relevance
@dawnieando from	
  @MoveItMarketing #StateOfSearch
SOLUTION: Reduce ‘Importance’ of old URLs quickly
• Internal link UNOPTIMIZATION
• 410
• Dig out URLs with links to them
• Orphan URLs
• Canonicals to HTTPS
• EXCLUSION from XML sitemaps (even old ones on the server)
• Archiving of content
@dawnieando from	
  @MoveItMarketing #StateOfSearch
404 NOT FOUND & 410 GONE
§ “Of course, we won’t redirect everything…”
§ “Not everything will be worth redirecting”
@dawnieando from	
  @MoveItMarketing #StateOfSearch
“Usually seeing it (410) 1-2 times is enough for us to drop those URLs from the index”
John Mueller on Google+
(https://plus.google.com/u/0/+JohnMueller/posts/NEsqE7Sr4Z4)
@dawnieando from	
  @MoveItMarketing #StateOfSearch
410s Likely Get Deindexed Quicker
https://plus.google.com/+JohnMueller/posts/NEsqE7Sr4Z4
@dawnieando from	
  @MoveItMarketing #StateOfSearch
“404 vs 410 doesn't affect the recrawl rate: we'll still occasionally check to see if these pages are still gone, especially when we spot a new link to them”
John Mueller, Google+, 2015
https://plus.google.com/u/0/+JohnMueller/posts/NEsqE7Sr4Z4
410 – DOES THAT PAGE NEED TO BE REINDEXED?
@dawnieando from	
  @MoveItMarketing #StateOfSearch
The URL Generational ‘Snail Trail’
• Old XML sitemaps
• Badly coded subcategory & attribute parameters
• Redirects drop away on old site .htaccess
• Canonicalizing and then later ‘301ing’ ‘context’ URL (invalid canonical)
• DNS issues
• People link to old site but wrong protocol
• Old sites not verified in GSC
• Not all protocols redirecting
• Relative WordPress URLs appending /wwws on current viewed pages
• JS fired URLs on language drop down (internationalization) crawled
• Legacy Ajax issues with parts of page content pulled
• Canonical URLs NOT a superset or duplicate of canonicals pointing at them
Leaving its slithery footprint
@dawnieando from	
  @MoveItMarketing #StateOfSearch
INSTEAD OF REMOVE… CONSIDER… DISTRACT & ITERATIVELY IMPROVE
STRATEGIC USE OF INTERNAL LINK POPULARITY
REDUCE IMPORTANCE SIGNALS TO DIFFERENT PAGES
INCLUDE IMPORTANT PAGES IN XML SITEMAPS
EXCLUDE LOW IMPORTANCE PAGES FROM XML SITEMAPS
INCLUDE IMPORTANT PAGES IN HTML SITEMAPS
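The sitemap points could be automated along these lines. This is a sketch only: the importance scores, the 0.5 threshold and the example URLs are invented, and how a site scores its own pages (internal links, traffic, revenue) is left open.

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages, min_importance=0.5):
    """Return sitemap XML listing only pages at or above `min_importance`."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url, importance in sorted(pages.items()):
        if importance >= min_importance:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical site-assigned importance scores per URL.
pages = {
    "https://www.example.com/": 1.0,
    "https://www.example.com/category/boots": 0.8,
    "https://www.example.com/tag/random?page=37": 0.1,
}
sitemap_xml = build_sitemap(pages)
```

Regenerating the file from a scored list, rather than editing it by hand, also helps avoid leaving old sitemaps with stale URLs on the server.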
@dawnieando from	
  @MoveItMarketing #StateOfSearch
“404 vs 410 doesn't affect the recrawl rate: we'll still occasionally check to see if these pages are still gone, especially when we spot a new link to them”
John Mueller, Google+, 2015
https://plus.google.com/u/0/+JohnMueller/posts/NEsqE7Sr4Z4
ESPECIALLY IF THERE ARE LINKS TO IT
@dawnieando from	
  @MoveItMarketing #StateOfSearch
Aged ‘Patchwork Quilt’ Sites
A LITTLE BIT OF THIS CMS AND A LITTLE BIT OF THAT CMS
MANY HISTORICAL PARAMETERS CREATED & CRAWLING SAMPLE PATTERNS
@dawnieando from	
  @MoveItMarketing #StateOfSearch
LACK OF PROCESS OR UNDERSTANDING
• Lack of process or understanding
• No or poor documentation to work to
• Insufficient testing facilities & staging / optimizing environments
• Lack of collaboration between depts
• Parallel development & version control issues (too much happening)
• Small improvements left till last
• Business pressures / business case demands
• Insufficient ‘up front’ definition (scope creep)
LOTS OF REASONS FOR TECHNICAL DEBT
@dawnieando from	
  @MoveItMarketing #StateOfSearch
A JAGUAR IS NOT ALWAYS A JAGUAR
Disambiguation
TECHNICAL DEBT IS NOT ALWAYS ABOUT BAD CODE
IT OFTEN COMES AS A RESULT OF MINIMUM VIABLE PRODUCT
@dawnieando from	
  @MoveItMarketing #StateOfSearch
THESE THINGS ADD UP
THEY ALSO STILL NEED TO BE DISCOVERED, WHICH REQUIRES INITIAL CRAWLING
https://twitter.com/dawnieando/status/906465965029969920
LEGACY SITES COST BOTH TO MAINTAIN & IMPROVE
DOUBLE DEBT
DOUBLE INTEREST
@dawnieando from	
  @MoveItMarketing #StateOfSearch
REFERENCES
@dawnieando from	
  @MoveItMarketing #StateOfSearch
Sources & References
Bar-Yossef, Z., Keidar, I. and Schonfeld, U., 2009. Do not crawl in the dust: different urls with similar text. ACM Transactions on the Web (TWEB), 3(1), p.3.
Broder, A.Z., Najork, M. and Wiener, J.L., 2003, May. Efficient URL caching for world wide web crawling. In Proceedings of the 12th international conference on World Wide Web (pp. 679-689). ACM.
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. and Wiener, J., 2000. Graph structure in the web. Computer networks, 33(1), pp.309-320.
Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced topics in information retrieval (pp. 27-50). Springer Berlin Heidelberg.
Cho, J., Garcia-Molina, H. and Page, L., 1998. Efficient crawling through URL ordering. Computer Networks and ISDN Systems, 30(1), pp.161-172.
Fetterly, D., Manasse, M., Najork, M. and Wiener, J., 2003, May. A large-scale study of the evolution of web pages. In Proceedings of the 12th international
Grice, T., 2017. Link spam, migration disasters and Penguin is nowhere to be seen - Organic growth in 2017. [ONLINE] Available at: https://www.branded3.com/blog/link-spam-migration-disasters-penguin-organic-growth-2017/. [Accessed 08 October 2017].
Olston, C. and Najork, M., 2010. Web crawling. Foundations and Trends® in Information Retrieval, 4(3), pp.175-246.
Pandey, S. and Olston, C., 2008, February. Crawl ordering by search impact. In Proceedings of the 2008 International Conference on Web Search and Data Mining (pp. 3-14). ACM.
Olston, C. and Pandey, S., 2008, April. Recrawl scheduling based on information longevity. In Proceedings of the 17th international conference on World Wide Web (pp. 437-446). ACM.
Pandey, S. and Olston, C., 2005, May. User-centric web crawling. In Proceedings of the 14th international conference on World Wide Web (pp. 401-411). ACM.
martinfowler.com. 2009. TechnicalDebtQuadrant. [ONLINE] Available at: https://martinfowler.com/bliki/TechnicalDebtQuadrant.html. [Accessed 03 October 2017].
Malte Ubl on Twitter - https://twitter.com/cramforce/status/897502737268592640
Is Technology Debt Bankrupting Your Competitiveness – Accenture 2017 - https://www.accenture.com/t20170504T221347__w__/ie-en/_acnmedia/PDF-43/Accenture-Strategy-Technology-Debt-PoV.pdf
Project Management Certification. 2015. Project Failure - Why Projects Fail So Often. [ONLINE] Available at: http://4pm.com/2015/09/27/project-failure/. [Accessed 30 September 2017].
Randall, K.H., Google Inc., 2010. Scheduler for search engine crawler. U.S. Patent 7,725,452. https://patentimages.storage.googleapis.com/US8042112B1/US08042112-20111018-D00000.png
The Drum. 2017. On trend? The Wow Company reports on what the average UK agency looks like | The Drum. [ONLINE] Available at: http://www.thedrum.com/opinion/2017/04/12/trend-the-wow-company-reports-what-the-average-uk-agency-looks. [Accessed 28 September 2017].

Contenu connexe

Tendances

From Web Site to Web App: Fantastic Optimisations and Where To Find Them
From Web Site to Web App: Fantastic Optimisations and Where To Find ThemFrom Web Site to Web App: Fantastic Optimisations and Where To Find Them
From Web Site to Web App: Fantastic Optimisations and Where To Find ThemMobileMoxie
 
BrightonSEO Takeaways September 2017
BrightonSEO Takeaways September 2017BrightonSEO Takeaways September 2017
BrightonSEO Takeaways September 2017Semrush
 
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEO
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEORendering SEO Manifesto - Why we need to go beyond JavaScript SEO
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEOOnely
 
Setting AMP for Success at #BrightonSEO
Setting AMP for Success at #BrightonSEOSetting AMP for Success at #BrightonSEO
Setting AMP for Success at #BrightonSEOAleyda Solís
 
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017Angry Creative (UK)
 
SearchLove Boston 2018 - Tom Anthony - Hacking Google: what you can learn fro...
SearchLove Boston 2018 - Tom Anthony - Hacking Google: what you can learn fro...SearchLove Boston 2018 - Tom Anthony - Hacking Google: what you can learn fro...
SearchLove Boston 2018 - Tom Anthony - Hacking Google: what you can learn fro...Distilled
 
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...Dawn Anderson MSc DigM
 
Perfect Starts: How to Get the Right Traffic with a Content Audit
Perfect Starts: How to Get the Right Traffic with a Content AuditPerfect Starts: How to Get the Right Traffic with a Content Audit
Perfect Starts: How to Get the Right Traffic with a Content AuditMichael King
 
BrightonSEO - How to use XPath with eCommerce Websites
BrightonSEO - How to use XPath with eCommerce WebsitesBrightonSEO - How to use XPath with eCommerce Websites
BrightonSEO - How to use XPath with eCommerce WebsitesJanet Plumpton
 
Matching Keywords to Pages - Information Architecture
Matching Keywords to Pages - Information ArchitectureMatching Keywords to Pages - Information Architecture
Matching Keywords to Pages - Information ArchitectureDominic Woodman
 
Moving URLs: Structural Web changes 
without losing rankings #SearchLove
Moving URLs: Structural Web changes 
without losing rankings #SearchLoveMoving URLs: Structural Web changes 
without losing rankings #SearchLove
Moving URLs: Structural Web changes 
without losing rankings #SearchLoveAleyda Solís
 
Building your outreach machine
Building your outreach machineBuilding your outreach machine
Building your outreach machineMichael King
 
TechSEO Boost - Apps script for SEOs
TechSEO Boost - Apps script for SEOsTechSEO Boost - Apps script for SEOs
TechSEO Boost - Apps script for SEOsDavid Sottimano
 
SEO Cannibalisation of Your Own SEO Success
SEO Cannibalisation of Your Own SEO SuccessSEO Cannibalisation of Your Own SEO Success
SEO Cannibalisation of Your Own SEO SuccessDawn Anderson MSc DigM
 
How to Improve Your Website's Indexation - Sean Butcher Brighton SEO Present...
How to Improve Your Website's Indexation  - Sean Butcher Brighton SEO Present...How to Improve Your Website's Indexation  - Sean Butcher Brighton SEO Present...
How to Improve Your Website's Indexation - Sean Butcher Brighton SEO Present...Sean Butcher
 
PWAs: Why you want one and how to optimize them #SearchY
PWAs: Why you want one and how to optimize them #SearchYPWAs: Why you want one and how to optimize them #SearchY
PWAs: Why you want one and how to optimize them #SearchYAleyda Solís
 
The Technical Seo Renaissance - Mike King
 The Technical Seo Renaissance - Mike King   The Technical Seo Renaissance - Mike King
The Technical Seo Renaissance - Mike King Glen Dimaandal
 
PubCon Last Vegas 2015 - Editing AdWords Scripts
PubCon Last Vegas 2015 - Editing AdWords ScriptsPubCon Last Vegas 2015 - Editing AdWords Scripts
PubCon Last Vegas 2015 - Editing AdWords ScriptsChristi Olson
 
SearchLove Boston 2016 | Mary Bowling | Local Search Experience Optimization
SearchLove Boston 2016 | Mary Bowling | Local Search Experience OptimizationSearchLove Boston 2016 | Mary Bowling | Local Search Experience Optimization
SearchLove Boston 2016 | Mary Bowling | Local Search Experience OptimizationDistilled
 
Modern Day Link Building by Jon Cooper
Modern Day Link Building by Jon CooperModern Day Link Building by Jon Cooper
Modern Day Link Building by Jon CooperGlen Dimaandal
 

Tendances (20)

From Web Site to Web App: Fantastic Optimisations and Where To Find Them
From Web Site to Web App: Fantastic Optimisations and Where To Find ThemFrom Web Site to Web App: Fantastic Optimisations and Where To Find Them
From Web Site to Web App: Fantastic Optimisations and Where To Find Them
 
BrightonSEO Takeaways September 2017
BrightonSEO Takeaways September 2017BrightonSEO Takeaways September 2017
BrightonSEO Takeaways September 2017
 
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEO
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEORendering SEO Manifesto - Why we need to go beyond JavaScript SEO
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEO
 
Setting AMP for Success at #BrightonSEO
Setting AMP for Success at #BrightonSEOSetting AMP for Success at #BrightonSEO
Setting AMP for Success at #BrightonSEO
 
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
 
SearchLove Boston 2018 - Tom Anthony - Hacking Google: what you can learn fro...
SearchLove Boston 2018 - Tom Anthony - Hacking Google: what you can learn fro...SearchLove Boston 2018 - Tom Anthony - Hacking Google: what you can learn fro...
SearchLove Boston 2018 - Tom Anthony - Hacking Google: what you can learn fro...
 
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
 
Perfect Starts: How to Get the Right Traffic with a Content Audit
Perfect Starts: How to Get the Right Traffic with a Content AuditPerfect Starts: How to Get the Right Traffic with a Content Audit
Perfect Starts: How to Get the Right Traffic with a Content Audit
 
BrightonSEO - How to use XPath with eCommerce Websites
BrightonSEO - How to use XPath with eCommerce WebsitesBrightonSEO - How to use XPath with eCommerce Websites
BrightonSEO - How to use XPath with eCommerce Websites
 
Matching Keywords to Pages - Information Architecture
Matching Keywords to Pages - Information ArchitectureMatching Keywords to Pages - Information Architecture
Cruft busting technical debt code smell and refactoring for seo - state of search

  • 1. @dawnieando from @MoveItMarketing #StateOfSearch REDUCING THE BURDEN OF TECHNICAL DEBT IN SEO
  • 2. The Great 302s Pass PageRank Debate
  • 3. MULTIPLE GENERATIONS OF A WEBSITE
  • 4. JUST WHAT IS GENERATIONAL CRUFT?
  • 5. GENERATIONAL CRUFT MAKES CRAWLING, INDEXING, QUERY CLUSTERING & SEMANTIC UNDERSTANDING MORE COMPLEX
  • 6. TECHNICAL DEBT
  • 7. TECHNICAL DEBT
  • 8.
  • 9. MARTIN FOWLER TECHNICAL DEBT QUADRANT. Reckless, deliberate: “We don’t have time for design”. Prudent, deliberate: “We must launch now and deal with consequences”. Reckless, inadvertent: “What’s layering?” Prudent, inadvertent: “Now we know how we should have done it”. Credit: Martin Fowler Technical Debt Quadrant
  • 10. SEO TECHNICAL DEBT QUADRANT. Reckless, deliberate: “SEO is dead / doesn’t matter”. Prudent, deliberate: “We must launch now and deal with SEO issues after”, “We’ll SEO ‘it’ later”. Reckless, inadvertent: “What’s a URL parameter?”, “What’s a canonical?”, “What’s internationalization?” Prudent, inadvertent: “Now we know how we should have done it” (further learnings were discovered as knowledge grew)
  • 11. Project Success === Produce planned deliverables, within budget, on time (including approved changes) Source: http://4pm.com/2015/09/27/project-failure/
  • 12. 70% Of All Projects Fail Source: http://4pm.com/2015/09/27/project-failure/
  • 13. @dawnieando from  @MoveItMarketing #StateOfSearch PAST   LEGACY PRESENT   APPLICATION FUTURE   PLANS
  • 14. SEOs WAIT A LONG TIME FOR DEV CHANGES: Over 40% wait 12 months+ to get their most crucial SEO changes implemented. https://moz.com/blog/how-long-are-seos-waiting-for-their-most-important-changes
  • 15. MAJOR CAUSES OF THIS: “Legacy technology or outdated processes hampering progress”; “The change they want is ‘not possible’ with current platform (37%)”. Source: Will Critchlow, Distilled Research, on Moz Blog, May 2016. https://moz.com/blog/how-long-are-seos-waiting-for-their-most-important-changes
  • 16. @dawnieando from  @MoveItMarketing #StateOfSearch Credit:  Dilbert  by  Scott  Adams Blame it on the marketers
  • 19. @dawnieando from  @MoveItMarketing #StateOfSearch THE ACCUMULATION OF TECHNICAL DEBT
  • 20. @dawnieando from  @MoveItMarketing #StateOfSearch FIGHTING  A  LOSING  BATTLE UNPAID   SEO  TECHNICAL   DEBT ===  SEO   BANKRUPTCY
  • 21. @dawnieando from  @MoveItMarketing #StateOfSearch A Clean Slate LET’S START WITH A CLEAN SLATE
  • 22. @dawnieando from  @MoveItMarketing #StateOfSearch Websites are not disposable BUT…
  • 23. SEARCH ENGINES NEVER FORGET: Search engines have a long memory and a lot of storage
  • 24. @dawnieando from  @MoveItMarketing #StateOfSearch A NEW URL HAS NO BUT YOUR OLD ONES HAVE LOTS
  • 25. @dawnieando from  @MoveItMarketing #StateOfSearch Web Crawler System History Logs
  • 26. GOOGLE  WON’T   WAIT   (LONG)   FOR  YOU   TO  DEAL   WITH  YOUR   TECHNICAL  DEBT
  • 27. @dawnieando from  @MoveItMarketing #StateOfSearch THE CHALLENGE IS NOT IN INDEXING… BUT IN KEEPING EVERYTHING INDEXED UP TO DATE
  • 28. EVEN AS FAR BACK AS 2003: 40% of ALL web pages changed weekly; 7% of web pages changed a third or more of their page content weekly. “If “change” means “any change”, then about 40% of all web pages change weekly [12]. Even if we consider only pages that change by a third or more, about 7% of all web pages change weekly [17].” (Broder, A.Z., Najork, M. and Wiener, J.L., 2003)
  • 29. HOW MUCH BIGGER & MORE DYNAMIC IS THE WEB NOW IN 2017? http://www.internetlivestats.com/total-number-of-websites/
  • 30. INCREMENTAL CRAWLING NEVER ENDS (ongoing): a crawling method in which crawl frequency is based on a URL’s historical change rate & importance
  • 31. CRAWLING FRONTIER. Shestakov, D., 2013, July. Current challenges in web crawling. In International Conference on Web Engineering (pp. 518-521). Springer, Berlin, Heidelberg.
  • 32. The Crawling ‘Frontier’ (THE URL QUEUE): ‘TO BE EXPLORED’ (OR REVISITED)
  • 33. @dawnieando from  @MoveItMarketing #StateOfSearch URLs Are Prioritized By Importance & Take Their Place in The Frontier Queue (New & Revisit)
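The frontier described on the slides above can be pictured as a priority queue. The following is only a minimal sketch under stated assumptions: the `Frontier` class, its method names and the numeric importance scores are hypothetical illustrations, not how any real search engine implements its scheduler.

```python
import heapq
import itertools


class Frontier:
    """Toy crawl frontier: URLs wait in a queue ordered by importance."""

    def __init__(self):
        self._heap = []
        self._seen = set()                 # stands in for a 'URL seen' test
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def add(self, url, importance):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so negate importance to pop the highest first.
        heapq.heappush(self._heap, (-importance, next(self._counter), url))
        return True

    def pop(self):
        """Return the most important URL still waiting."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)
```

In this toy model a low-importance legacy URL still takes a place in the queue, which is the point the slides make about old URLs competing with new ones.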
  • 34. @dawnieando from  @MoveItMarketing #StateOfSearch DATA FROM HISTORY LOGS CONTRIBUTE TO WHEN TO REVISIT URIs ON THE WEB
  • 35. @dawnieando from  @MoveItMarketing #StateOfSearch PAST DATA IS A GREAT PREDICTOR OF FUTURE DATA PREDICTION  BASED   PRIORITY   SCHEDULING …  WHEN   THERE  IS   CONSISTENCY
  • 36. @dawnieando from  @MoveItMarketing #StateOfSearch ‘Sampling’ in Crawling for Efficiency ‘SMALL  TEST  VISITS  TO  A  SITE  TO   UNDERSTAND  WHETHER  IT  IS  WORTH   CRAWLING  &  UNDERSTAND    URL   PATTERNS  &  RESOURCES  THERE’
  • 38. DUSTBUSTER & DUST CRAWLING RULES (DUST: Different URLs with Similar Text): ‘DO NOT CRAWL IN THE DUST’. BUILDS ‘HINTS’ ON WHAT NOT TO CRAWL. EVERY SITE WILL HAVE ITS OWN CRAWLING RULES
  • 39. @dawnieando from  @MoveItMarketing #StateOfSearch Popular CMS ’Rule Patterns’ (URL Parameters) ALL  WILL  HAVE  COMMON   CANONICALIZATION  PATTERNS  WHICH   CAN  BE  LEARNED
  • 40. Every Version of Your Past Ecommerce Sites Had potential to spew… at some point… “Exponentially multiplicative URLs”: DIFFERENT PARAMETERS & URL PATTERNS WHICH ARE LEARNED BY CRAWLERS… AND REMEMBERED… FOREVER
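To make “exponentially multiplicative URLs” concrete, here is a minimal sketch of how parameter variants of one page can be collapsed to a single form. The parameter names in `IGNORED_PARAMS` and the `collapse_variants` helper are assumptions chosen for illustration; every site would need its own list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change the page content.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source", "utm_medium"}


def collapse_variants(url):
    """Drop ignorable parameters and sort the rest so variants compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


variants = [
    "https://example.com/shoes?sort=price&colour=red",
    "https://example.com/shoes?colour=red&sessionid=abc123",
]
canonical_forms = {collapse_variants(u) for u in variants}
```

Both variants reduce to the same URL here, which is roughly the kind of pattern the slides describe crawlers learning per site.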
  • 41. @dawnieando from  @MoveItMarketing #StateOfSearch SEVERAL  TYPES  OF   CRUFT  MAY   CONTRIBUTE
  • 42. @dawnieando from  @MoveItMarketing #StateOfSearch SOFTWARE   ROT CODE SMELL SOFTWARE  CRUFT
  • 43. @dawnieando from  @MoveItMarketing #StateOfSearch SPAGHETTI  CODE
  • 44. @dawnieando from  @MoveItMarketing #StateOfSearch DEPRECATION
  • 45. LEGACY CODE BASES & DEPRECATED VERSIONS: The hottest job on the block at one point. Once described by W3Schools as ‘The Developers Dream’
  • 46. How  Did  That   Work  Out  For  Your   SEO?
  • 47. @dawnieando from  @MoveItMarketing #StateOfSearch WHAT ABOUT ALL THAT CSS & JS YOU COLLECTED?
  • 48. @dawnieando from  @MoveItMarketing #StateOfSearch PEOPLE APPEND (ADD TO FILES) - SOMETIMES IT’S FEAR OF DEPENDENCIES
  • 49. GUTENBERG: “WordPress Core is a minefield of design decisions that were made for what WordPress was at the time, and didn’t age well” (Greg Schoppe, 2017) SOURCE: https://speckyboy.com/meet-greg-schoppe-developer-gutenberg/
  • 50. WordPress now powers 26% of the web: HUGE EXAMPLE OF GENERATIONAL SOFTWARE CRUFT https://managewp.com/statistics-about-wordpress-usage
  • 51. @dawnieando from  @MoveItMarketing #StateOfSearch CONTENT  CRUFT
  • 52. WELL… WE DID MAKE QUITE A BIT OF CONTENT http://www.internetlivestats.com/total-number-of-websites/
  • 53. CONTENT CRUFT https://moz.com/blog/clean-site-cruft-before-it-causes-ranking-problems-whiteboard-friday
  • 54. @dawnieando from  @MoveItMarketing #StateOfSearch Poor  quality  content  signals  build  up   over  time…   incremental  crawling  just  keeps  on   rolling  and  crawling…  and  gathering   signals
  • 55. @dawnieando from  @MoveItMarketing #StateOfSearch Source:  https://plus.google.com/u/0/+GlennGabe/posts/fXZw2BuSa5B SIGNALS  OF  LOW   QUALITY  JUST   KEEP   COMPOUNDING   OVER  TIME
  • 56. @dawnieando from  @MoveItMarketing #StateOfSearch PEOPLE CANONICALIZE WRONG ON  MULTIPLE  GENERATIONS  OF  SITES
  • 58. GOOGLEBOT GETS WHERE WATER COULDN’T https://petermeadit.com/blog/block-web-crawlers/
  • 59. @dawnieando from  @MoveItMarketing #StateOfSearch EVEN YOUR STAGING & DEV SITES Found  with  a  very  simple  wildcard  *  site:  query
  • 60. @dawnieando from  @MoveItMarketing #StateOfSearch ARCHITECTURAL  &   SEMANTIC  CRUFT
  • 61. @dawnieando from  @MoveItMarketing #StateOfSearch SEMANTIC   LOSS WONKY TOPICAL STRENGTH HOW  MUCH   STUFF  DID  YOU   MOVE  AROUND   OVER  THE  YEARS?
  • 62. YOU BROKE YOUR SILO STRUCTURE (SEMANTIC LOSS) Image credit: https://www.slideshare.net/patrickstox/nlp-sitemap-smx-2016-patrick-stox-latest-in-advanced-technical-seo
  • 63. YOU BROKE YOUR CORPUS ‘RELATEDNESS’: 1st level relatedness, 2nd level relatedness. MANY SIGNALS GONE
  • 64. “You shall know a word by the company it keeps” (Firth, 1957) (ITS CO-OCCURRENCE VECTOR)
  • 65. CO-OCCURRENCE: Of words together & High commonality of other shared co-occurring words
  • 66. RELATEDNESS EXAMPLES
      • APPLE: Eat, Bake, Cake, Peel
      • AUTOMOBILE: Accident, Traffic, Driver, Car, Motor
      • FURNACE: Hearth, Blast, Fiery, Gas, Electric
      Miller, G.A. and Charles, W.G., 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1), pp.1-28.
  • 67. ‘CONCEPT DRIFT’ IS A THING (AI ALERT) https://en.wikipedia.org/wiki/Concept_drift fuzzy: difficult to perceive; indistinct or vague. synonyms: blurry, blurred, indistinct, unclear, bleary, misty, distorted, out of focus, unfocused, lacking definition, low resolution, nebulous, ill-defined, indefinite, vague, hazy, imprecise, inexact, loose, woolly. "a fuzzy picture"
  • 68. @dawnieando from  @MoveItMarketing #StateOfSearch BOOLEAN LOGIC – EXTREME CASES OF TRUTH - (TRUE (1) OR FALSE (0))
  • 69. @dawnieando from  @MoveItMarketing #StateOfSearch FUZZY  LOGIC • Rule  based  logic • Been  around  for  20+   years • Is  within  a  subset  of  AI
  • 70. @dawnieando from  @MoveItMarketing #StateOfSearch ‘FUZZY LOGIC’ – DEGREES OF TRUTH SEMANTIC   LOSS
  • 71. @dawnieando from  @MoveItMarketing #StateOfSearch FUZZY LOGIC – DEGREES OF TRUTH 0.8  Doc  ID  likely  to   be  a  correct  URI  to   choose  from  term  /   query  cluster
  • 72. Semantics & concept relatedness may be ‘secret sauce’ when it comes to ‘precision’ over ‘recall’
  • 73. TWO-PHASE RANKING IN A SEARCH NODE. Presented by B. Cambazoglu at European Summer School Information Retrieval 2017 (Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced Topics in Information Retrieval (pp. 27-50). Springer Berlin Heidelberg.)
  • 74. @dawnieando from  @MoveItMarketing #StateOfSearch URL  CRUFT
  • 75. ‘URL CRUFT’ IS A THING: “characters relevant or meaningful only to the people who created the site, such as implementation details of the computer system which serves the page. Examples of URL cruft include filename extensions such as .php or .html, and internal organizational details such as /public/ or /Users/john/work/drafts/.” (Wikipedia Definition)
  • 76. ALL  THE  RANDOM   URLS  YOU  CREATED   OVER  THE  YEARS  &   SITES (EVEN  BY   ACCIDENT)
  • 77. 410 Gone: “Some, we’ll just kill off with a 410…” “Then the URLs will be gone”
  • 78. @dawnieando from  @MoveItMarketing #StateOfSearch https://www.youtube.com/watch?v=xp5Nf8ANfOw THE  DIFFERENCE  BETWEEN  HOW  GOOGLE  TREATS  404  VERSUS  410s
  • 79. 302 == Default; 301 == Intentional; 404 == Default; 410 == Intentional. ARE YOU SURE? MAYBE / YES. “The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.” (RFC 7231) https://tools.ietf.org/html/rfc7231#section-6.5.9
  • 80. @dawnieando from  @MoveItMarketing #StateOfSearch https://twitter.com/JohnMu/status/903904602617204738
  • 81. DO NOT THINK 410s WON’T BE RECRAWLED AGAIN Source: https://www.docsplace.org/4578/09/410-gone-stops-crawling-dead-urls/
  • 82. @dawnieando from  @MoveItMarketing #StateOfSearch “We  knew  there  was  content   there  at  some  point  so  we   just  swing  by  every  now  and   then  to  see  if  anything  came   back”  (John  Mueller,  2016) In Reality… Gone Is Never Gone
  • 83. @dawnieando from  @MoveItMarketing #StateOfSearch A URL IS ’NOT’ CONTENT IT IS A LOCATION WHERE A RESOURCE LIVES / LIVED IT MERELY JUST BOILS DOWN TO A DOC ID MAPPED TO TERM IDS IN A MATRIX
  • 84. ZOMBIES ARE NEVER GONE: NO URLS ARE EVER GONE, ONLY THE RESOURCE THERE IS GONE. 5 YEARS LATER https://www.seroundtable.com/google-410-indexing-22584.html
  • 85. @dawnieando from  @MoveItMarketing #StateOfSearch HOW ABOUT 14 YEARS LATER? https://www.webmasterworld.com/google/4864613.htm 2  HOURS  ALIVE…   14  YEARS  LATER
  • 86. @dawnieando from  @MoveItMarketing #StateOfSearch SMALL TOPICAL URL FISH IN A BIG TOPICAL POND SEMANTIC   LOSS
  • 87. @dawnieando from  @MoveItMarketing #StateOfSearch COME TO OUR ANNUAL EVENT WITH THE SAME NAME BUT A NEW URL EVERY YEAR
  • 88. “COOL URIs DON’T CHANGE” Sir Tim Berners-Lee (Inventor of the World Wide Web) https://www.w3.org/Provider/Style/URI Attribution: By Uldis Bojārs (Flickr) [CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons
  • 89. @dawnieando from  @MoveItMarketing #StateOfSearch YOU END UP WITH A CONGA LINE OF LEGACY URLS, SUBDOMAINS & VARIOUS SITE PROTOCOLS …In  the  URL  queue
  • 90. URL_SEEN TEST: YOU CAN’T JUST KEEP TRYING TO JUMP THE INDEXING QUEUE EITHER
      • PUSH INDEXING: E.G. FETCH AS GOOGLEBOT & SUBMIT TO INDEX, XML SITEMAP SUBMISSIONS
      • PULL INDEXING: VISITS BY NATURAL CRAWLING & DISCOVERY OF URLS / URL VISIT SCHEDULING / REVISITS
  • 91. @dawnieando from  @MoveItMarketing #StateOfSearch CRAWL  CRUFT  IS  A   SYMPTOM
  • 92. @dawnieando from  @MoveItMarketing #StateOfSearch IMPORTANCE TIERING FOR SCALE (EFFICIENCY)
  • 93. TWO STEPS FORWARD & ONE STEP BACK [Figure: YES / NO signals for a STRONG CANONICAL CONTEXT URL versus ANOTHER OR MULTIPLE WEAK ALTERNATIVES]
  • 94. @dawnieando from  @MoveItMarketing #StateOfSearch PAST DATA ON CHANGE IS A GREAT PREDICTOR OF FUTURE DATA PREDICTION  BASED   PRIORITY   SCHEDULING …  WHEN   THERE  IS   CONSISTENCY “past  changes  to  a  page  are  a  good  predictor  of  future  changes.  This  result   has  practical  implications  for  incremental  web  crawlers  that  seek  to   maximize  the  freshness  of  a  web  page  collection  or  index.”  (
  • 95. @dawnieando from  @MoveItMarketing #StateOfSearch TO  BUILD   PROBABILITY  &   PREDICTABILITY   MODELS
  • 96. @dawnieando from  @MoveItMarketing #StateOfSearch BASED  ON  ROLLING   AVERAGES  /  SIGNALS FROM  PAST CRAWL  VISITS
  • 97. ‘Transitive’?? ‘THE WHOLE TREE IS ROTTEN’ (SAMPLING). Transitive: if A == B and B == C, then A == C. For some types of content more than others, e.g. ecommerce/directories but not news
  • 98. @dawnieando from  @MoveItMarketing #StateOfSearch CRAWL  SAMPLES  ALSO   HELP  WITH  MODELLING   TO  MAP  DOCS  TO  TOPIC   RELEVANCE  &   RELATEDNESS
  • 99. @dawnieando from  @MoveItMarketing #StateOfSearch TOPICAL  DILUTION  & URL  IMPORTANCE  DILUTION
  • 100. RELATEDNESS  HELPS   WITH   ‘GROUNDING’   (confirmation  of  other  signals)
  • 101. @dawnieando from  @MoveItMarketing #StateOfSearch WRONG URL RANKING ’SWAPPING OUT’ (Especially   multiple   child  nodes) SHARP  &   VOLATILE RANKING   FLUX SOME  SYMPTOMS
  • 102. SOME SOLUTIONS: How can you change the hints associated with your site for better rankings and SEO?
  • 103. @dawnieando from  @MoveItMarketing #StateOfSearch ACKNOWLEDGE  &   CALCULATE  THE  DEBT
  • 105. THE BOSTON MATRIX & SEO: PRIORITIZE THE DEBT (axes: MARKET GROWTH, MARKET SHARE)
      • Stars: High potential. Worth more effort
      • Cash Cows: High converting queries & URLs
      • Question Marks (?): Jury’s out
      • Dogs: Low return, low conversion queries / URLs
  • 106. @dawnieando from  @MoveItMarketing #StateOfSearch ADD EVERYTHING TO GSC FROM THE PAST & PRESENT THERE  MAY  STILL  BE  UNDETECTED  ACTIVITY  GOING  ON  THERE
  • 107. @dawnieando from  @MoveItMarketing #StateOfSearch IDENTIFY  PAGES  IN  QUERY  CLUSTERS   (QUERY  CLASSES  &  INTENT  MEETING  SAME   INFORMATION  NEED  CATEGORY)
  • 108. @dawnieando from  @MoveItMarketing #StateOfSearch REVIEW RELATIVE IMPORTANCE SIGNALS OF INTERNAL LINKS ARE  THESE   REALLY   AMONGST   YOUR   MOST IMPORTANT   URLS?
  • 109. @dawnieando from  @MoveItMarketing #StateOfSearch BUT… REVIEW THE ‘RELATEDNESS’ OF INTERNAL LINKS TO PAGES Domain URL INTERNALLY  LINKING  PAGES  TO  THE   TARGET  URL IS  THE  ‘RELATEDNESS’  HIGHLY   RELEVANT  TO  ASSIST  WITH   CONTEXTUAL  &  SEMANTIC  SIGNALS? IS   RELATEDNESS   HIGH?
  • 110. @dawnieando from  @MoveItMarketing #StateOfSearch FIND SITES ON THE SAME SERVER
  • 111. @dawnieando from  @MoveItMarketing #StateOfSearch YOU  NEED   TO  KNOW   WHAT’S  ON   THAT   SERVER DIAGNOSE: HEAD BACK TO THE SERVER
  • 112. SOME QUESTIONS TO ASK
      • HOW MANY MICRO-SITES HAVE YOU HAD?
      • HOW MANY SUBDOMAINS?
      • HOW MANY OTHER DOMAINS?
      • WHO IS RESPONSIBLE FOR DOMAIN REG?
      • WHO KNOWS WITHIN THE ORGANISATION?
      • WHO REGISTERED THE DOMAINS?
      • WHO CAN UPDATE DNS RECORDS?
      • ARE THESE SITES STILL ON SERVERS?
      • HAVE ANY OF THESE SITES HAD MANUAL ACTIONS?
      • HOW ARE THESE SITES REDIRECTED?
      • ARE THEY PARKED DOMAINS?
  • 113. NEARLY 1/3 OF SEOs SAY THEY DON’T NEED LOG FILES Source: https://www.seroundtable.com/poll-log-files-seo-24523.html
  • 114. DIAGNOSE: SERVER LOG FILE ANALYSIS. ANALYSE THE LOGS FOR ‘ALL’ YOUR SITES AND ‘ALL’ PROTOCOLS TO SEE THE CRAWL PATTERNS EMERGE, BUT WATCH OUT FOR OTHER TOOLS EMULATING GOOGLEBOT AND FILTER THEM OUT. NB: YOU MAY BE LOOKING AT URLS QUEUED LONG AGO
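As a starting point for the log analysis described above, here is a minimal sketch that counts requests per URL and status code for user agents claiming to be Googlebot. It assumes logs in the common Apache/Nginx “combined” format; the `googlebot_hits` name and the sample paths are hypothetical.

```python
import re
from collections import Counter

# Assumes the 'combined' access log format; adjust the pattern for other formats.
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count (path, status) pairs for requests whose user agent claims Googlebot."""
    hits = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), int(match.group("status")))] += 1
    return hits
```

A user-agent match alone does not filter out tools emulating Googlebot; an extra check such as a reverse DNS lookup on the requesting IP would be needed for that, and is left out of this sketch. Running it over the logs of every past site and protocol would show, for example, which 410 URLs are still being revisited.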
  • 115. @dawnieando from  @MoveItMarketing #StateOfSearch SET  UP  A  PLAN  TO   REPAY  THE  DEBT
  • 116. @dawnieando from  @MoveItMarketing #StateOfSearch MoSCoW Approach MUST   HAVE SHOULD   HAVE COULD   HAVE WON’T   HAVE   THIS  TIME
  • 117. MoSCoW Prioritization Source: https://www.agilebusiness.org/content/moscow-prioritisation-0
  • 118. @dawnieando from  @MoveItMarketing #StateOfSearch SEO Refactoring Away The Past Whilst Working Towards The Future
  • 119. @dawnieando from  @MoveItMarketing #StateOfSearch TECHNICAL  DEBT COMES  WITH  INTEREST TO   BE   REPAID   VIA  REFACTORING
  • 120. @dawnieando from  @MoveItMarketing #StateOfSearch Refactoring  Definition “Refactoring  …is  a  disciplined  technique  for  restructuring  an   existing  body  of  code,  altering  its  internal  structure  without   changing  its  external  behavior.   Its  heart  is  a  series  of  small  behavior  preserving   transformations.” https://en.wikipedia.org/wiki/Code_refactoring
  • 121. SEO REFACTORING
      • HOUSE KEEPING
      • WORKING ON THE PAST, PRESENT & FUTURE SIMULTANEOUSLY
      • USING APPROACHES LIKE MoSCoW
      • ONGOING ITERATIVE IMPROVEMENTS
      • ONGOING ROLLING AUDITS
      • PAYING OFF DEBT ‘A BIT AT A TIME’
      • MARGINAL GAINS
  • 122. @dawnieando from  @MoveItMarketing #StateOfSearch SKIP  &  DIVERT  THE   DEBT
  • 123. @dawnieando from  @MoveItMarketing #StateOfSearch HAVE YOUR SAY IN CRAWLING ‘RULES’ Help  Google  Build  ‘Crawling   Rules’  for  your  site  rather   than  wasting  time  on   ‘sampling’  and  giving  a  bad   impression GIVE  HELP  AND   GUIDANCE  WITH  THE   CRAWL  RULE  AND   HINT  BUILDING
  • 124. @dawnieando from  @MoveItMarketing #StateOfSearch Help  Google  Build   ‘Crawling  Rules’  for   your  site  rather  than   wasting  time  on   ‘sampling’  and  giving   a  bad  impression BE  VERY   CAREFUL
  • 125. REVISIT ALL PAST .HTACCESS FILES (.HTACCESS SITE 1, .HTACCESS SITE 2, .HTACCESS SITE 3). Can you rewrite the rules to be more efficient with regex or cut out some old rules still firing unnecessarily? (CREATE SHORTCUTS) REMEMBER .HTACCESS RULES RUN IN ORDER OF THEIR APPEARANCE IN THE FILE. CAN YOU USE WILDCARDS TO OPTIMIZE OR SKIP STEPS?
  • 126. Learn Regular Expressions By Heart & How URLs with Multiple Parameters Are Handled: the most restrictive parameter setting overrules less restrictive ones
  • 127. @dawnieando from  @MoveItMarketing #StateOfSearch FIND & CHOP BACK REDIRECT CHAINS
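One way to find and chop back redirect chains is to export every known redirect into a source-to-target map and resolve each source to its final destination. This is a minimal sketch under that assumption; the `resolve_chain` and `flatten` helpers and the example paths are hypothetical.

```python
def resolve_chain(redirects, start):
    """Follow a redirect map from start; return every URL visited, in order."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in seen:
            raise ValueError("redirect loop: " + " -> ".join(chain))
        seen.add(target)
    return chain


def flatten(redirects):
    """Point every source straight at its final destination (one hop each)."""
    return {source: resolve_chain(redirects, source)[-1] for source in redirects}


# Three generations of a site, each redirecting to the next.
legacy = {"/2009/page": "/2013/page", "/2013/page": "/2017/page"}
single_hop = flatten(legacy)
```

The flattened map could then be used to rewrite the rules so that each legacy URL redirects once, and the loop check surfaces any infinite loops on the way.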
  • 128. @dawnieando from  @MoveItMarketing #StateOfSearch REVIEW & REMOVE REDUNDANT FILES
  • 129. @dawnieando from  @MoveItMarketing #StateOfSearch WHAT SUPERFLUOUS JAVASCRIPT & CSS IS THERE UNNECESSARILY?
  • 130. Use absolute URLs rather than relative URLs (particularly in WordPress)
  • 131. REVIEW & UNDERSTAND THE CANONICAL LINK RELATION (RFC 6596) In ‘ALL’ its forms:
      • 30X redirects
      • Canonical tag
      • hreflang
      • HTTPS protocol
      • Global canonicalization rules
      • URL normalization
  • 132. @dawnieando from  @MoveItMarketing #StateOfSearch CUT   THROUGH  TO   THE   DEVELOPERS
  • 133. @dawnieando from  @MoveItMarketing #StateOfSearch REBUILD  STRONG  SEMANTICS  &   ‘RELATEDNESS’
  • 134. @dawnieando from  @MoveItMarketing #StateOfSearch BE  CAREFUL   WITH  THE   CONTENT   PRUNING   ‘CHAINSAW’
  • 135. Did  you  just   ‘prune  away’   your  corpus   ‘relatedness’?  
  • 136. @dawnieando from  @MoveItMarketing #StateOfSearch UPCYCLING URLs RATHER  THAN  ’REMOVE’   CONSIDER  ‘IMPROVE’ EXPAND,  DE-­‐GROUP  &  RE-­‐GROUP
  • 137. THOUGHTFUL QUERY CLUSTER BASED ‘PRUNING’, ‘CONTENT MORPHING’ AND ‘QUERY CLUSTER RE-GROUPING’
  • 138. Pass Strong Clues: Highly Relevant New Conceptual Structures. STRONG SEMANTICS & CONCEPTUALLY CO-OCCURRING TERMS
  • 139. @dawnieando from  @MoveItMarketing #StateOfSearch WHAT  CORRELATES? https://www.google.com/trends/correlate/search
  • 140. SOLUTION: Wiki Page Redirects on Topics. Wikipedia Redirects (https://dbpedia.org/sparql) OR A GOOD OLD FASHIONED THESAURUS (thesaurus.com)
  • 141. @dawnieando from  @MoveItMarketing #StateOfSearch TIE  YOUR  THEMATIC  CORPUS   BACK  TOGETHER
  • 142. @dawnieando from  @MoveItMarketing #StateOfSearch USE  ‘STRONGLY  CONNECTED   COMPONENTS’  (TOPICAL  HUBS   FOR  FOCUSED  CRAWLING)  TO   REAFFIRM  THE  SEMANTIC   STRENGTH  YOU  ONCE  HAD
  • 143. BUILD RICH CONTENT HUBS FOR PRIMARY TARGET TOPICS (STRONGLY CONNECTED HUB). Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. and Wiener, J., 2000. Graph structure in the web. Computer Networks, 33(1), pp.309-320.
  • 144. BUILD WELL CATEGORIZED AND CONCEPTUALLY STRUCTURED SITEMAPS https://www.slideshare.net/patrickstox/nlp-sitemap-smx-2016-patrick-stox-latest-in-advanced-technical-seo
  • 145. XML Sitemaps Are Your Friend… (Strong Foundations): ‘The foundations’ underneath a site. They help to pass ‘importance’ signals to URLs. But… never leave them to just autogenerate without periodically checking
  • 146. @dawnieando from  @MoveItMarketing #StateOfSearch CREATE WELL ORGANISED XML SITEMAPS WITH IMPORTANT URLS
  • 147. EXTERNALLY HOSTED XML SITEMAPS
      • Take back control
      • Jump the dev queue
      • Allows for custom configuration of optimal canonical click paths
      • Allows for consistent signals of importance to included URLs
      • Forget about setting priority
      • Forget about last modified
      • Even a simple list of URLs FTW will do
      • Keep them organised for granular analysis of problem site sections
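The “simple list of URLs” approach above can be sketched in a few lines. This is an illustrative example only: the `build_sitemap` helper and the example URLs are hypothetical, and it deliberately emits just `<loc>` entries, leaving out priority and last-modified as the slide suggests.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a minimal XML sitemap containing only <loc> entries."""
    urlset = ET.Element("urlset", xmlns=NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes '&' etc.
    return ET.tostring(urlset, encoding="unicode")


# One sitemap per site section keeps problem areas easy to analyse separately.
category_sitemap = build_sitemap([
    "https://example.com/shoes/",
    "https://example.com/shoes/red/",
])
```

Generating one file per section, rather than one autogenerated file for the whole site, fits the advice to keep sitemaps organised for granular analysis.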
  • 148. “Increase and decrease importance via internal link optimization to signal key quality sections”
  • 149. @dawnieando from  @MoveItMarketing #StateOfSearch EXCLUDE LOWER QUALITY SITE SECTIONS (for now) Excluded   sections
  • 150. @dawnieando from  @MoveItMarketing #StateOfSearch BUT… REVIEW THE ‘RELATEDNESS’ OF INTERNAL LINKS TO PAGES Domain URL INTERNALLY  LINKING  PAGES  TO  THE   TARGET  URL IS  THE  ‘RELATEDNESS’  HIGHLY   RELEVANT  TO  ASSIST  WITH   CONTEXTUAL  &  SEMANTIC  SIGNALS? IS   RELATEDNESS   HIGH?
  • 151. @dawnieando from  @MoveItMarketing #StateOfSearch DON’T  LET  THE   DEBT  HOLD   YOU  BACK MAKE  GREAT CONTENT  &   BRAND  BUZZ
  • 152. A ‘TWO-OARED’ ROWING BOAT GOES FURTHER
  • 153. @dawnieando from  @MoveItMarketing #StateOfSearch THEN… MONITOR & BE PATIENT
  • 154. “It’s simple really, the businesses seeing growth in natural search are those implementing technical changes successfully, the most common cause of decline is either ignoring technical or getting it wrong.” Tim Grice, Branded3, 2017 https://www.branded3.com/blog/link-spam-migration-disasters-penguin-organic-growth-2017/
  • 155. @dawnieando from  @MoveItMarketing #StateOfSearch Positive Consistency is KEY ’ROLLING  AVERAGES   CAN  GO  BOTH  WAYS’
  • 156. APPENDIX & EDITOR’S CUT
  • 157. @dawnieando from  @MoveItMarketing #StateOfSearch BUT WHEN DATA IS INCONSISTENT FUZZY LOGIC MAY FAIL ‘DEGREES  OF   TRUTH’ MORE   BLURRED  /   VAGUE
  • 158. 1st level relatedness (CO-OCCURRENCE VECTORS): A measure of words that directly occur together in a text or ‘corpus’ (collection of documents together). ‘TWO WORDS WHICH TEND TO CO-OCCUR MUST BE RELATED’. EXAMPLES: car/automobile, coast/shore, furnace/stove (Miller & Charles, 1991)
  • 159. 2nd level relatedness (CO-OCCURRENCE VECTORS): Words share common words they co-occur with, aside from directly co-occurring together (both appear in same types of text as each other === related). EXAMPLE: FURNACE & OVEN BOTH SHARE HEAT, MOTOR & ROAD, CAR & AUTOMOBILE BOTH SHARE PASSENGERS
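The two levels of relatedness above can be illustrated with toy co-occurrence vectors. This is a minimal sketch, assuming whitespace tokenization and sentence-level co-occurrence; the helper names and the three-sentence corpus are made up for the example and are not taken from Miller & Charles.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vector(word, sentences):
    """Count the words that appear in the same sentence as `word`."""
    vector = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if word in tokens:
            vector.update(t for t in tokens if t != word)
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


corpus = [
    "the furnace gives heat",
    "the oven gives heat",
    "the car carries passengers",
]
furnace = cooccurrence_vector("furnace", corpus)
oven = cooccurrence_vector("oven", corpus)
car = cooccurrence_vector("car", corpus)
```

Here “furnace” and “oven” never appear in the same sentence, yet their vectors are almost identical because both co-occur with “heat”: that is second-level relatedness. Removing or moving the pages that carry those shared terms would change the vectors.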
  • 160. MORE Solutions
      • Do a bit of ‘up front’ thinking (AVOID TECHNICAL DEBT IN FIRST PLACE)
      • Measure SEO technical debt
      • Refactor SEO technical debt away
      • Reducing SEO technical debt should be inbuilt
      • Accept some SEO (least impactful) technical debt is necessary for agility
      • ‘Chip away’ at SEO technical debt
  • 161. ‘Fuzzy’ URL Targets with Each Site Generation: EVERYTHING GETS A BIT BLURRED. ‘Which is the target URL again?’
  • 162. “The URL page importance score can be retrieved from the … URL history log … or it can be obtained by obtaining the historical page importance score for the URL for a predefined number of prior crawls and then performing a predefined filtering function on those values to obtain the URL page importance score.” Scheduler for Search Engine Crawler https://www.google.com/patents/US8042112
      IMPORTANCE RECORD PER CRAWL (CRAWL 1 to CRAWL 6):
      • DOC ID 1: 1, 0.8, 0.6, 0.4, 0.2, 0
      • DOC ID 2: 0, 0.2, 0.4, 0.6, 0.8, 1
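The patent text only says that a “predefined filtering function” is applied to the scores from a predefined number of prior crawls. As a hedged illustration, the sketch below assumes that function is a plain mean over the last three crawls; the `smoothed_importance` name, the window size and the choice of a mean are all assumptions.

```python
def smoothed_importance(history, window=3):
    """Average the importance scores from the most recent `window` crawls."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# The two importance histories from the slide, oldest crawl first.
doc_1 = [1, 0.8, 0.6, 0.4, 0.2, 0]   # importance falling crawl after crawl
doc_2 = [0, 0.2, 0.4, 0.6, 0.8, 1]   # importance rising crawl after crawl

score_1 = smoothed_importance(doc_1)
score_2 = smoothed_importance(doc_2)
```

Under this assumed filter, DOC ID 2 ends with the higher score even though both documents have the same lifetime average, which is why consistent recent signals matter.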
  • 163. Example MoSCoW Prioritisation (items relate to PAST, PRESENT & FUTURE)
      • MUST HAVE: Remove infinite loops; Redirect true dupes; Review parameter handling; Add seasonal & TIME IS OF THE ESSENCE content pieces (topical / evergreen); Check soft 404s; Check server errors
      • SHOULD HAVE: Identify ideal click paths; Check server log files; Analyse queries on near-dupes; Review internal link popularity of important pages; Review crawling on near dupes & similars; Review queries on similars
      • COULD HAVE: Canonicalize to superset; Upcycle conflict content; Strengthen categories & subcategories (relevance); Build topic hub static pages (Strongly connected component); Sectional content audit; Add categorized XML sitemaps
      • WON’T HAVE THIS TIME: e.g. Page title rewriting; e.g. All meta-description; Review ‘added-value’ difference in ‘similars’; Add ‘flow’ content to amplify via social; Add site section properties in GSC; Add content from sub to superset & canonicalize
  • 164. @dawnieando from  @MoveItMarketing #StateOfSearch URL NORMALIZATION Can be problematic and ‘crufty’ too https://en.wikipedia.org/wiki/URL_normalization
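To show what URL normalization involves, and why it can get ‘crufty’, here is a minimal sketch covering a few common steps: lower-casing scheme and host, dropping default ports and fragments, and resolving dot segments. The `normalize` helper is hypothetical and makes simplifying assumptions (for instance, it ignores userinfo and IPv6 hosts).

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(url):
    """Apply a few common, content-preserving URL normalization steps."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    # Resolve '.' and '..' segments; an empty path becomes '/'.
    path = posixpath.normpath(parts.path) if parts.path else "/"
    if parts.path.endswith("/") and path != "/":
        path += "/"  # normpath strips trailing slashes; restore them
    return urlunsplit((scheme, host, path, parts.query, ""))
```

Each rule is a judgment call (a trailing slash, for example, can identify a different resource on some servers), which is one reason normalization applied inconsistently across site generations adds cruft rather than removing it.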
  • 165. @dawnieando from  @MoveItMarketing #StateOfSearch SOLUTION – Think Carefully About Creating New Dynamic Parameters QUEUEING…  AGAIN Waiting  for  good  URLs  to  be   visited…  AGAIN
  • 166. @dawnieando from  @MoveItMarketing #StateOfSearch SEMANTIC  DRIFT
  • 167. @dawnieando from  @MoveItMarketing #StateOfSearch A DOG IS NOT ALWAYS A DOG
  • 168. @dawnieando from  @MoveItMarketing #StateOfSearch BIG TOPICAL URL FISH IN A SMALL TOPICAL POND
  • 169. TERM-FREQUENCY INVERSE DOCUMENT FREQUENCY: Architectural, URL, software & content cruft can also skew term-frequency inverse document frequency AND THE QUERY CLUSTERS DOCUMENTS BELONG TO
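A small worked example helps show how cruft can skew these scores. The sketch below uses one textbook variant of the weighting (relative term frequency multiplied by the natural log of N over document frequency); the `tf_idf` helper and the toy documents are assumptions, and real search engines use more elaborate weighting.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


clean = tf_idf(["red shoes sale", "blue boots guide"])
# The same site after two near-duplicate legacy pages were left behind.
crufty = tf_idf(["red shoes sale", "blue boots guide",
                 "red shoes sale", "red shoes sale archive"])
```

In the crufty corpus the term “red” appears in three of four documents instead of one of two, so its weight on the original page drops: duplicates left behind by old site generations dilute the terms that used to distinguish a page.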
  • 170. YOU INHERITED SEO TECHNICAL DEBT
      • Previous content / link manual actions
      • Previous algorithmic suppressions
      • Past infinite loops
      • “We’ll SEO it after launch”
      • “SEO is dead… so we won’t optimise”
      • Dodgy URL parameters
      • SEO is a ‘one time audit’
      • Misconfigured URL parameters
      • Old URL crawling ‘rules / hints’
  • 171. CRAWLING PATTERNS ARE DEVELOPED FOR EFFICIENCY: CRAWLERS TAKE ‘HINTS’ AND ‘HINT RANGES’ (rules / patterns). Help Google Build ‘Crawling Rules’ for your site rather than wasting time on ‘sampling’ and giving a bad impression. GIVE HELP AND GUIDANCE WITH THE CRAWL RULE AND HINT BUILDING
  • 172. “REL=NEXT / REL=PREV” is NOT a form of canonicalization
  • 173. @dawnieando from  @MoveItMarketing #StateOfSearch “301s  and  302s  are   BOTH  forms  of   canonicalization”
  • 174. hreflang is a form of Canonicalization (Internationalization)
  • 175. History Log Records Include:
      • URL fingerprint
      • Timestamp (last crawl or download attempt)
      • Crawl status (success or error) (Response code)
      • Content checksum (binary code)
      • Source ID (accessed from cache or downloaded)
      • Segment identifier (Crawl segment assigned to??)
      • Page importance (a measure of importance assigned to the URL)
  • 176. @dawnieando from  @MoveItMarketing #StateOfSearch INSTEAD  OF  REMOVE…   CONSIDER…  DISTRACT   &  ITERATIVELY IMPROVE
  • 177. @dawnieando from  @MoveItMarketing #StateOfSearch SYSTEM  &  PEOPLE   CRUFT
  • 178. @dawnieando from  @MoveItMarketing #StateOfSearch PEOPLE CHURN INTERNAL  TEAM   CHURN EXTERNAL   AGENCY  CHURN
  • 179. “The average staff turnover rate for agencies is 17% each year” (Drum,  2017)
  • 180. “We’ve Always Done It This Way” HIPPO
  • 181. “What’s The Business Case?” Mowarrr Data  Please
  • 182. @dawnieando from  @MoveItMarketing #StateOfSearch THINK CAREFULLY ABOUT URL CREATION Not  EVERYTHING  is   worthy  of  its  own  URL VARIANTS STEMMINGS PLURALS RANDOM  TAGS LONG,  LONG,  LONG   TAIL  PARAMETERS
  • 183. @dawnieando from  @MoveItMarketing #StateOfSearch ’DANGLY’ NODES AND UNLINKED SITES
  • 184. @dawnieando from  @MoveItMarketing #StateOfSearch A CAT IS NOT ALWAYS A CAT
  • 185. @dawnieando from  @MoveItMarketing #StateOfSearch ARE THEY CHIPS OR ARE THEY CRISPS?
  • 186. @dawnieando from  @MoveItMarketing #StateOfSearch MIXED CONTENT & MULTIPLE SITE VERSIONS http://www.itv.com/news/
  • 187. @dawnieando from  @MoveItMarketing #StateOfSearch MIXED CONTENT & MULTIPLE SITE VERSIONS http://www.itv.com/news/ BOTH  HTTP  &   HTTPS  FIGHTING   EACH  OTHER
  • 188. @dawnieando from  @MoveItMarketing #StateOfSearch ROGUE  INTERNAL   LINKS  TO  PREVIOUS   DOMAIN
  • 189. @dawnieando from @MoveItMarketing #StateOfSearch 410s DO USE CRAWL BUDGET (MAYBE NOT TOO MUCH ON REVISITS, BUT THESE THINGS ADD UP). THEY ALSO STILL NEED TO BE DISCOVERED WHICH USES BUDGET https://twitter.com/dawnieando/status/906465965029969920
  • 190. @dawnieando from  @MoveItMarketing #StateOfSearch GENERATIONAL   CRUFT  CAN   SNOWBALL • Past  infinite  loops • Dodgy  URL  parameters • Misconfigured  URL  parameters • Old  URL  crawling  ‘rules  /  hints’ • Old  ‘importance  /  quality’   scores • Filtered  dupes  &  near-­‐dupes • Mixed  messaging  canonicals • 410s  still  being  revisited • Internal  links  to  old  sites  /   protocols
  • 191. @dawnieando from  @MoveItMarketing #StateOfSearch “Failure is simply a few errors in judgement repeated every day” Jim Rohn
  • 192. @dawnieando from @MoveItMarketing #StateOfSearch The Generational ’Snail Trail’ • Old XML sitemaps • Redirects drop away on old site .htaccess • DNS issues • People link to old site but wrong protocol • Old sites no longer verified in GSC • Not all protocols redirecting Leaving its slithery footprint
  • 193. @dawnieando from  @MoveItMarketing #StateOfSearch History Log Records Include: • URL  fingerprint • Timestamp  (last  crawl  or  download   attempt) • Crawl  status  (success  or  error)  (Response   code) • Content  checksum  (binary  code) • Source  ID  (accessed  from  cache  or   downloaded) • Segment  identifier  (Crawl  segment  assigned   to??) • Page  importance  (a  measure  of  importance   assigned  to  the  URL) May  be   calculated  by   identifying   historical   importance   scores  based  on   past  X  number  of   crawls
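The history log record listed above can be pictured as a small data structure. The sketch below is an assumption throughout: the field names and types are invented to mirror the slide's bullet points, and averaging the last X importance scores is only one possible reading of "based on past X number of crawls", not a documented method.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class HistoryLogRecord:
    # Field names are hypothetical; they mirror the bullets on the slide.
    url_fingerprint: str
    timestamp: float          # last crawl or download attempt
    crawl_status: int         # response code
    content_checksum: str
    source_id: str            # e.g. "cache" or "download"
    segment_id: int           # crawl segment the URL is assigned to
    page_importance: float


def historical_importance(records: List[HistoryLogRecord], last_x: int = 3) -> float:
    """Average the importance scores of the most recent `last_x` crawls.

    Averaging is an illustrative assumption, not a known formula.
    """
    if not records:
        raise ValueError("no crawl history for this URL")
    recent = sorted(records, key=lambda r: r.timestamp)[-last_x:]
    return mean(r.page_importance for r in recent)


history = [
    HistoryLogRecord("fp-1", float(t), 200, "abc", "download", 1, score)
    for t, score in enumerate([0.2, 0.4, 0.6, 0.8], start=1)
]
print(historical_importance(history, last_x=3))
```

The point of the sketch is that an old URL carries its past scores with it, so a new target URL starts with no history to draw on.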
  • 194. @dawnieando from  @MoveItMarketing #StateOfSearch EVERY  SINGLE  TIME  YOU  MIGRATE,  CHANGE  DESIGN,  REDIRECT,  REINVENT  A  SITE  /  URL A  CLEAN  START REDIRECTIONS ANOTHER  STRUCTURE FIRST  SITE   STRUCTURE NEW  CRAWLING  ‘RULES’   BUILT CRAWLING   ‘RULES’  BUILT EVERYTHING   IS  ‘200  OK’ MORE  URLs MIXED  RESPONSE  CODES REDIRECTIONS ‘FUZZINESS’  IS  EMERGING NEW  CRAWLING  ‘RULES’  BUILT MORE  URLs REDIRECT  CHAINS  &  MIXED   RESPONSE  CODES NEW  SEO’s  DON’T   KNOW  THE  ‘HISTORY’ TARGET  URLs  NOW  ‘VERY  FUZZY’
  • 195. @dawnieando from  @MoveItMarketing #StateOfSearch SOLUTION: Wiki Page Redirects on Topics https://dbpedia.org/sparql Wikipedia   Redirects
  • 196. @dawnieando from  @MoveItMarketing #StateOfSearch Time Seems To Fly… The Older You Get Your  new  site  URL  is  just   one  of  very  many  historical   URLs  on  your  IP  to  be   visited  periodically A  tiny  fish  in  a  very   big  URL  pond  queue
  • 198. @dawnieando from @MoveItMarketing #StateOfSearch A New Beginning “A new website will solve ALL our problems” “Let’s start again” “We’ll just migrate… and redirect everything”
  • 199. @dawnieando from @MoveItMarketing #StateOfSearch A LONG, LONG TIME AGO • You need to go right back to the beginning • What domains did the organisation EVER register? • Where do they redirect to? • Is it via 301, 302 or are they merely parked domains? • Who would know? Who is responsible? • Verify them all in Google Search Console • Some of these may EVEN HAVE PENALTIES HISTORICALLY • If there are links to any, there is likely still crawling activity there • Analyse logs across multiple subdomains & protocols
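For the last bullet, a minimal sketch of counting crawler hits per protocol and host once logs from every legacy domain have been parsed. The entry layout (`scheme`, `host`, `user_agent` keys) and the example hosts are hypothetical; real server logs need their own parser, and matching on a user-agent substring does not verify that a request really came from Googlebot.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def crawler_hits_by_origin(
    entries: Iterable[Dict[str, str]], bot_token: str = "Googlebot"
) -> Counter:
    """Count log entries per (scheme, host) whose user agent mentions bot_token."""
    hits: Counter = Counter()
    for entry in entries:
        if bot_token in entry["user_agent"]:
            origin: Tuple[str, str] = (entry["scheme"], entry["host"])
            hits[origin] += 1
    return hits


# Hypothetical, already-parsed log entries from old and new properties.
entries = [
    {"scheme": "http", "host": "old-brand.example", "user_agent": "Googlebot/2.1"},
    {"scheme": "http", "host": "old-brand.example", "user_agent": "Googlebot/2.1"},
    {"scheme": "https", "host": "www.new-brand.example", "user_agent": "Googlebot/2.1"},
    {"scheme": "https", "host": "www.new-brand.example", "user_agent": "Mozilla/5.0"},
]

for origin, count in crawler_hits_by_origin(entries).most_common():
    print(origin, count)
```

Any old host or protocol that still shows crawler hits is a candidate for verification in Google Search Console and a redirect check.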
  • 200. @dawnieando from @MoveItMarketing #StateOfSearch SOME TYPES OF URL CRUFT • INCORRECTLY APPLIED CANONICAL TAGS • CONFLICTING HREFLANG & CANONICAL TAGS • MIXED CONTENT • URL SHORTENERS • SESSION IDS • UTM TAGGING • OLD AJAX FRAGMENTS • PARAMETERS FROM MULTI FACET DROP DOWN CHOICES • .html, .php, .index.html, .aspx • LEGACY URL REWRITING & PARAMETERS IN .HTACCESS FILES • LEGACY FOLDERS WHICH CONTRIBUTE NO MEANING TO SITE ONTOLOGY UNCRUFTY www.myeasyurlwillmakeyouwonder.com/resume CRUFTY www.myeasyurlwillmakeyouwonder.com/resume.html CRUFTY http://nymag.com/scienceofus/2015/07/how-to-recover-from-an-all-nighter.html?om_rid=AAENcg&om_mid=_BTtFa0B869PyJp&utm_content=buffer8fdd1&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
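A minimal sketch of stripping some of the cruft types listed above from a URL, using only Python's standard `urllib.parse`. The parameter lists are assumptions for illustration: `utm_` and `om_` prefixes are taken from the crufty example URL, the session parameter names are common guesses, and dropping the fragment is a simplification that would not suit every site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed cruft patterns; adjust per site.
TRACKING_PREFIXES = ("utm_", "om_")
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def strip_cruft(url: str) -> str:
    """Remove tracking/session query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in SESSION_PARAMS
    ]
    # An empty fragment drops old AJAX-style "#..." suffixes.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


crufty = (
    "http://nymag.com/scienceofus/2015/07/how-to-recover-from-an-all-nighter.html"
    "?om_rid=AAENcg&om_mid=_BTtFa0B869PyJp&utm_content=buffer8fdd1"
    "&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer"
)
print(strip_cruft(crufty))
```

A helper like this is useful for grouping log or crawl data by the underlying page, so that parameter variants of one URL are counted together.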
  • 202. @dawnieando from  @MoveItMarketing #StateOfSearch IT’S  VERY   IMPORTANT…   YOU  STAY  OUT   OF  SERVER   ERROR  STATUS 500 ‘Try  again’  intervals  likely  extended   between  each  failed  connection   attempt
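To picture what "intervals likely extended" could mean in practice, here is a sketch of a retry schedule that grows after each failed attempt. The doubling factor, the one-hour base and the cap are assumptions chosen purely for illustration; the slide only suggests that intervals lengthen, and no real crawler's schedule is being described.

```python
from typing import List


def retry_intervals(
    base_hours: float, failures: int, factor: float = 2.0, cap_hours: float = 720.0
) -> List[float]:
    """Return the wait (in hours) before each retry after consecutive failures.

    Each interval is the previous one multiplied by `factor`, up to `cap_hours`.
    """
    return [min(base_hours * factor ** i, cap_hours) for i in range(failures)]


# Hypothetical: first retry after 1 hour, then progressively longer waits.
print(retry_intervals(base_hours=1.0, failures=6))
```

Under a schedule like this, a site that keeps returning 500 errors is visited less and less often, which is why staying out of server error status matters.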
  • 203. @dawnieando from  @MoveItMarketing #StateOfSearch “Forever, And ever, And ever, And ever… You’ll be a URL”
  • 204. @dawnieando from @MoveItMarketing #StateOfSearch LEGACY ISSUES VIA CANONICALS OR REDIRECTION (COMMON MISTAKES) • PAGE CANONICALIZED TO IS NOT A SUPERSET OR DUPLICATIVE (IT IS NOT RELEVANT ENOUGH) • 301s TO IRRELEVANT PAGES BECOME SOFT 404s • FOLDING UP PRODUCT PAGES TO CATEGORIES (PEOPLE WERE LOOKING FOR A SPECIFIC PRODUCT) • CANONICALIZING TO PAGES THAT LATER 301 REDIRECT TO ANOTHER URL, NEGATING THE CANONICALS POINTING AT THEM • CONFLICTS BETWEEN HREFLANG AND CANONICALIZATION
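One of the mistakes above, canonicals that point at a URL which itself now 301 redirects elsewhere, can be checked mechanically once crawl data is available. The sketch below assumes two hypothetical mappings exported from a crawl (page to canonical target, and URL to 301 destination); the function name and example URLs are invented for illustration.

```python
from typing import Dict, List, Tuple


def find_negated_canonicals(
    canonicals: Dict[str, str], redirects: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (page, canonical_target, redirect_destination) for every page
    whose canonical target is itself redirected."""
    problems = []
    for page, target in canonicals.items():
        if target in redirects:
            problems.append((page, target, redirects[target]))
    return problems


# Hypothetical crawl export.
canonicals = {
    "/shoes?colour=red": "/shoes",
    "/boots?size=9": "/boots",
}
redirects = {
    "/shoes": "/footwear",  # /shoes was later 301'd, so its canonicals are stale
}

for page, target, destination in find_negated_canonicals(canonicals, redirects):
    print(f"{page} canonicalizes to {target}, which redirects to {destination}")
```

Each reported page would need its canonical updated to the final destination, assuming that destination is still a relevant superset or duplicate.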
  • 205. @dawnieando from  @MoveItMarketing #StateOfSearch SOLUTION: Increase ‘Importance’ quickly of target URLs • Internal  link  optimization • Canonicalise to  (if  relevant) • Strengthen  up  importance  signals • Inclusion  in  front  facing  HTML  and  XML   sitemaps • Improve  the  content  &  keep  it  updated • 301  redirect  to  (if  relevant  redundant   content) • Topical  hubs  and  strong  information   views  to  navigate  users  &  add  relevance
  • 206. @dawnieando from @MoveItMarketing #StateOfSearch SOLUTION: Reduce ‘Importance’ quickly of old URLs • Internal link UNOPTIMIZATION • 410 • Dig out URLs with links to them • Orphan URLs • Canonicals to HTTPS • EXCLUSION from XML sitemaps (even old ones on the server) • Archiving of content
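For the sitemap exclusion bullet, a small sketch that lists retired URLs still present in an XML sitemap, using the standard library's `xml.etree.ElementTree` and the sitemaps.org namespace. The sitemap content and the set of retired URLs are made up for the example; in practice the XML would be read from every sitemap file still on the server, old ones included.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def retired_urls_in_sitemap(sitemap_xml: str, retired: Set[str]) -> List[str]:
    """Return the <loc> values in the sitemap that appear in the retired set."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]
    return [url for url in locs if url in retired]


# Hypothetical old sitemap left on the server.
old_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/old-range/</loc></url>"
    "<url><loc>https://www.example.com/old-offer.html</loc></url>"
    "</urlset>"
)
retired = {
    "https://www.example.com/old-range/",
    "https://www.example.com/old-offer.html",
}

print(retired_urls_in_sitemap(old_sitemap, retired))
```

Any URL the check returns is still being advertised to crawlers as worth visiting, which works against the aim of reducing its importance.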
  • 207. @dawnieando from @MoveItMarketing #StateOfSearch 404 NOT FOUND & 410 GONE “Of course, we won’t redirect everything…” “Not everything will be worth redirecting”
  • 208. @dawnieando from @MoveItMarketing #StateOfSearch “Usually seeing it (410) 1-2 times is enough for us to drop those URLs from the index” John M on Google+ (https://plus.google.com/u/0/+JohnMueller/posts/NEsqE7Sr4Z4)
  • 209. @dawnieando from @MoveItMarketing #StateOfSearch 410s Likely Get Deindexed Quicker https://plus.google.com/+JohnMueller/posts/NEsqE7Sr4Z4
  • 210. @dawnieando from @MoveItMarketing #StateOfSearch “404 vs 410 doesn't affect the recrawl rate: we'll still occasionally check to see if these pages are still gone, especially when we spot a new link to them” John Mueller, Google+ 2015 https://plus.google.com/u/0/+JohnMueller/posts/NEsqE7Sr4Z4 410 – DOES THAT PAGE NEED TO BE REINDEXED?
  • 211. @dawnieando from @MoveItMarketing #StateOfSearch The URL Generational ’Snail Trail’ • Old XML sitemaps • Badly coded subcategory & attribute parameters • Redirects drop away on old site .htaccess • Canonicalizing and then later ‘301ing’ ‘context’ URL (invalid canonical) • DNS issues • People link to old site but wrong protocol • Old sites not verified in GSC • Not all protocols redirecting • Relative WordPress URLs appending /wwws on current viewed pages • JS fired URLs on Language drop down Internationalization crawled • Legacy Ajax issues with parts of page content pulled • Canonical URLs NOT a superset or duplicate of canonicals pointing at them Leaving its slithery footprint
  • 212. @dawnieando from  @MoveItMarketing #StateOfSearch INSTEAD  OF   REMOVE…   CONSIDER…   DISTRACT  &   ITERATIVELY IMPROVE STRATEGIC  USE  OF  INTERNAL  LINK   POPULARITY REDUCE  IMPORTANCE  SIGNALS   TO  DIFFERENT  PAGES INCLUDE  IMPORTANT  PAGES  IN   XML  SITEMAPS EXCLUDE  LOW  IMPORTANCE   PAGES  IN  XML  SITEMAPS INCLUDE  IMPORTANT  PAGES  IN   HTML  SITEMAPS
  • 213. @dawnieando from @MoveItMarketing #StateOfSearch “404 vs 410 doesn't affect the recrawl rate: we'll still occasionally check to see if these pages are still gone, especially when we spot a new link to them” John Mueller, Google+ 2015 https://plus.google.com/u/0/+JohnMueller/posts/NEsqE7Sr4Z4 ESPECIALLY IF THERE ARE LINKS TO IT
  • 214. 2nd level relatedness Share common words they co-occur with aside from directly co-occurring together (both appear in same types of text as each other === related) EXAMPLE: FURNACE & OVEN BOTH SHARE HEAT, MOTOR & ROAD, CAR & AUTOMOBILE BOTH SHARE PASSENGERS CO-OCCURRENCE VECTORS
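Second-level relatedness can be sketched with simple co-occurrence vectors: two words that never appear together can still be judged related when they share the same neighbours. The example below is a toy under stated assumptions: whole sentences are used as the co-occurrence window, cosine similarity is used as the comparison, and the three sentences are invented around the slide's furnace / oven example.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def cooccurrence_vectors(sentences: List[str]) -> Dict[str, Counter]:
    """Map each word to a count of the other words sharing a sentence with it."""
    vectors: Dict[str, Counter] = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = vectors.setdefault(word, Counter())
            for j, other in enumerate(tokens):
                if i != j:
                    context[other] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


sentences = [
    "the furnace gives heat",
    "the oven gives heat",
    "the car carries passengers",
]
vectors = cooccurrence_vectors(sentences)

# "furnace" and "oven" never co-occur directly, yet share the same contexts.
print("oven" in vectors["furnace"])
print(cosine(vectors["furnace"], vectors["oven"]))
print(cosine(vectors["furnace"], vectors["car"]))
```

In this toy corpus the furnace and oven vectors are near-identical while furnace and car overlap only on "the", which is the second-level signal the slide describes.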
  • 215. @dawnieando from  @MoveItMarketing #StateOfSearch Aged ‘Patchwork Quilt’ Sites A  LITTLE  BIT  OF  THIS  CMS  AND  A   LITTLE  BIT  OF  THAT  CMS MANY  HISTORICAL  PARAMETERS   CREATED  &  CRAWLING  SAMPLE   PATTERNS
  • 216. @dawnieando from  @MoveItMarketing #StateOfSearch LACK  OF  PROCESS  OR   UNDERSTANDING • Lack  of  process  or  understanding • No  or  poor  documentation  to  work  to • Insufficient  testing  facilities  &  staging  /   optimizing  environments • Lack  of  collaboration  between  depts • Parallel  development  &  version  control   issues  (too  much  happening) • Small  improvements  left  till  last • Business  pressures  /  business  case  demands • Insufficient  ‘up  front’  definition  (scope   creep) LOTS  OF   REASONS   FOR   TECHNICAL   DEBT
  • 217. @dawnieando from  @MoveItMarketing #StateOfSearch A JAGUAR IS NOT ALWAYS A JAGUAR Disambiguation
  • 218. TECHNICAL  DEBT  IS  NOT  ALWAYS  ABOUT  BAD  CODE IT  OFTEN  COMES   AS  A  RESULT  OF   MINIMUM  VIABLE   PRODUCT
  • 219. @dawnieando from  @MoveItMarketing #StateOfSearch THESE  THINGS  ADD  UP THEY  ALSO  STILL  NEED  TO  BE  DISCOVERED   WHICH  REQUIRES  INITIAL  CRAWLING https://twitter.com/dawnieando/status/906465965029969920
  • 220. LEGACY SITES COST BOTH TO MAINTAIN & IMPROVE DOUBLE DEBT DOUBLE INTEREST
  • 221. @dawnieando from  @MoveItMarketing #StateOfSearch REFERENCES
  • 222. @dawnieando from @MoveItMarketing #StateOfSearch Sources & References Bar-Yossef, Z., Keidar, I. and Schonfeld, U., 2009. Do not crawl in the dust: different urls with similar text. ACM Transactions on the Web (TWEB), 3(1), p.3. Broder, A.Z., Najork, M. and Wiener, J.L., 2003, May. Efficient URL caching for world wide web crawling. In Proceedings of the 12th international conference on World Wide Web (pp. 679-689). ACM. Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. and Wiener, J., 2000. Graph structure in the web. Computer networks, 33(1), pp.309-320. Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced topics in information retrieval (pp. 27-50). Springer Berlin Heidelberg. Cho, J., Garcia-Molina, H. and Page, L., 1998. Efficient crawling through URL ordering. Computer Networks and ISDN Systems, 30(1), pp.161-172. Fetterly, D., Manasse, M., Najork, M. and Wiener, J., 2003, May. A large-scale study of the evolution of web pages. In Proceedings of the 12th international
  • 223. @dawnieando from @MoveItMarketing #StateOfSearch Sources & References Grice, T, 2017. Link spam, migration disasters and Penguin is nowhere to be seen - Organic growth in 2017 [ONLINE] Available at: https://www.branded3.com/blog/link-spam-migration-disasters-penguin-organic-growth-2017/. [Accessed 08 October 2017]. Olston, C. and Najork, M., 2010. Web crawling. Foundations and Trends® in Information Retrieval, 4(3), pp.175-246. Pandey, S. and Olston, C., 2008, February. Crawl ordering by search impact. In Proceedings of the 2008 International Conference on Web Search and Data Mining (pp. 3-14). ACM. Olston, C. and Pandey, S., 2008, April. Recrawl scheduling based on information longevity. In Proceedings of the 17th international conference on World Wide Web (pp. 437-446). ACM. Pandey, S. and Olston, C., 2005, May. User-centric web crawling. In Proceedings of the 14th international conference on World Wide Web (pp. 401-411). ACM.
  • 224. @dawnieando from @MoveItMarketing #StateOfSearch Sources & References martinfowler.com. 2009. TechnicalDebtQuadrant. [ONLINE] Available at: https://martinfowler.com/bliki/TechnicalDebtQuadrant.html. [Accessed 03 October 2017]. Malte Ubl on Twitter - https://twitter.com/cramforce/status/897502737268592640 Is Technology Debt Bankrupting Your Competitiveness – Accenture 2017 - https://www.accenture.com/t20170504T221347__w__/ie-en/_acnmedia/PDF-43/Accenture-Strategy-Technology-Debt-PoV.pdf Project Management Certification. 2015. Project Failure - Why Projects Fail So Often. [ONLINE] Available at: http://4pm.com/2015/09/27/project-failure/. [Accessed 30 September 2017]. https://patentimages.storage.googleapis.com/US8042112B1/US08042112-20111018-D00000.png Randall, K.H., Google Inc., 2010. Scheduler for search engine crawler. U.S. Patent 7,725,452.
  • 225. @dawnieando from @MoveItMarketing #StateOfSearch Sources & References The Drum. 2017. On trend? The Wow Company reports on what the average UK agency looks like | The Drum. [ONLINE] Available at: http://www.thedrum.com/opinion/2017/04/12/trend-the-wow-company-reports-what-the-average-uk-agency-looks. [Accessed 28 September 2017].