How to Leverage APIs for SEO #TTTLive2019

Learn the basics of APIs and how they can be leveraged for SEO and marketing. Chock-full of Python code examples.

The URL to the GitHub gist link on slide 54 has changed to the following:
https://gist.github.com/pshapiro/a86dc340f57c38fc22d0545ddec1fc9e

Published in: Marketing

  1. How Anyone Can Leverage APIs for SEO, by Paul Shapiro
  2. #TTTLIVE19 Paul Shapiro: Partner, Director of Strategy & Innovation / SEO Practice Lead @ Catalyst
  3. (no text on this slide)
  4. (no text on this slide)
  5. WTF is an API!? (RESTful Web API)
  6. Application Programming Interface
  7. Basically, APIs give you a way to interface with an external web service. This enables automation, lets you incorporate third-party systems into your own application, and lets you extend both systems by combining their services and features.
  8. So, how does this work exactly?
  9. HTTP is the protocol that facilitates communication between the client computer and the server computer via requests and responses. [Diagram label: SERVER]
  10. CRUD Operations: Create, Read, Update, Delete

          Operation          SQL      HTTP                 RESTful Web Services
          Create             INSERT   PUT / POST           POST
          Read (Retrieve)    SELECT   GET                  GET
          Update (Modify)    UPDATE   PUT / POST / PATCH   PUT
          Delete (Destroy)   DELETE   DELETE               DELETE

      Source: https://en.wikipedia.org/wiki/Create,_read,_update_and_delete
  11. The interaction between client and server can be facilitated via several structured methods (sometimes referred to as verbs).
  12. GET and POST are the most common methods used with web APIs.
  13. (no text on this slide)
  14. Quoted from https://www.w3schools.com/tags/ref_httpmethods.asp:
      • “GET is used to request data from a specified resource.”
      • “POST is used to send data to a server to create/update a resource.”
  15. Quoted from https://www.w3schools.com/tags/ref_httpmethods.asp:
      • “PUT is used to send data to a server to create/update a resource.”
      • “DELETE method deletes the specified resource.”
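As a rough illustration of the four methods quoted above, the sketch below builds (but does not send) one request per method using Python's standard library. The URL `https://api.example.com/items/86` and the `build_request` helper are hypothetical and not part of the talk.

```python
import json
import urllib.request

# Hypothetical resource URL, used only for illustration.
RESOURCE_URL = "https://api.example.com/items/86"


def build_request(method, payload=None):
    """Build an HTTP request object for the given method without sending it."""
    data = None
    headers = {}
    if payload is not None:
        # Request bodies are sent as bytes; JSON is a common choice for web APIs.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(RESOURCE_URL, data=data, headers=headers, method=method)


requests_by_operation = {
    "read": build_request("GET"),
    "create": build_request("POST", {"quantity": 5}),
    "update": build_request("PUT", {"quantity": 6}),
    "delete": build_request("DELETE"),
}

for operation, req in requests_by_operation.items():
    print(operation, req.get_method(), req.full_url)
```

Nothing touches the network here; passing one of these objects to `urllib.request.urlopen` would actually send it.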
  16. (no text on this slide)
  17. APIs are a little bit like this antiquated ordering system…
      1. You need to look at the available inventory. You look at Spice Company’s catalogue via the GET method. This gives you a list of products you can order.
      2. Once you know what you would like to purchase, your internal system marks it down according to some predefined business logic (in the form of item numbers and corresponding quantities).
      3. Your program places an order by sending this payload to the corresponding API endpoint using the POST method, and you receive the product at your physical address sometime after.
  18. Example order payload (JSON):

          {
            "accountId": "8675309",
            "shipAddress": {
              "name": "Bob SpiceyMcSpiceFace",
              "address": "237 South Broad Street",
              "city": "Philadelphia",
              "state": "PA"
            },
            "order": {
              "itemNumber": 86,
              "quantity": 5
            }
          }
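A minimal sketch of step 3 of the ordering analogy: serialize the order payload to JSON and wrap it in a POST request. The endpoint `https://api.example.com/orders` is made up for illustration (the Spice Company API is imaginary), so the request is built but never sent.

```python
import json
import urllib.request

# Hypothetical endpoint for the imaginary Spice Company API.
ORDER_ENDPOINT = "https://api.example.com/orders"

order = {
    "accountId": "8675309",
    "shipAddress": {
        "name": "Bob SpiceyMcSpiceFace",
        "address": "237 South Broad Street",
        "city": "Philadelphia",
        "state": "PA",
    },
    "order": {"itemNumber": 86, "quantity": 5},
}

# Serialize the payload and attach it as the body of a POST request.
body = json.dumps(order).encode("utf-8")
post_request = urllib.request.Request(
    ORDER_ENDPOINT,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(post_request.get_method())
print(json.dumps(order, indent=2))
# Sending it would be urllib.request.urlopen(post_request); omitted here
# because the endpoint does not exist.
```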
  19. API Endpoint: http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=board%20games
      The q value is the variable, and it is URL-encoded. Simple API example via GET request.
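To make the "variable, encoded" note concrete, here is a small sketch that builds the same endpoint URL from a plain keyword using the standard library. The `suggest_url` helper name is an assumption for illustration.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://suggestqueries.google.com/complete/search"


def suggest_url(keyword):
    """Return the autosuggest URL for a keyword, with the query URL-encoded."""
    params = {"output": "toolbar", "hl": "en", "q": keyword}
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, uses +).
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(suggest_url("board games"))
```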
  20. Response (XML): (not captured in this transcript)
  21. Parse the XML:
      • board games
      • board games for kids
      • board games for adults
      • board games near me
      • board games online
      • board games list
      • board games walmart
      • board games boston
      • board games 2018
      • board games for toddlers
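The parsing step can be sketched offline against a small sample. The XML below is an assumed, abbreviated stand-in for the response (the slide's XML is not in the transcript); it follows the shape the talk's code relies on, where each `suggestion` element carries its text in a `data` attribute.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the shape the parsing code expects (assumed, not
# copied from the slide): each <suggestion> carries its text in "data".
sample_xml = """<toplevel>
  <CompleteSuggestion><suggestion data="board games"/></CompleteSuggestion>
  <CompleteSuggestion><suggestion data="board games for kids"/></CompleteSuggestion>
  <CompleteSuggestion><suggestion data="board games for adults"/></CompleteSuggestion>
</toplevel>"""

tree = ET.fromstring(sample_xml)
# iter() walks the whole tree, so nesting depth does not matter.
suggestions = [node.attrib["data"] for node in tree.iter("suggestion")]
print(suggestions)
```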
  22. Answer The Public? Ubersuggest? Keywordtool.io
      http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=board%20games
      • q=board%20games%20can
      • q=board%20games%20vs
      • q=how%20board%20games
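The query variations on this slide can be generated rather than typed by hand. A minimal sketch, assuming a seed keyword plus short lists of prefix and suffix modifiers (the lists here only contain the three words shown on the slide):

```python
from urllib.parse import quote

BASE = "http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q="
seed = "board games"

# Modifier lists for illustration; the slide shows "can", "vs", and "how".
suffixes = ["can", "vs"]
prefixes = ["how"]

# Append suffix modifiers after the seed and put prefix modifiers before it.
queries = [f"{seed} {word}" for word in suffixes] + [f"{word} {seed}" for word in prefixes]
urls = [BASE + quote(q) for q in queries]

for url in urls:
    print(url)
```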
  23. API Endpoint: http://api.grepwords.com/lookup?apikey=api_key_string&q=keyword
      The apikey string is unique to you, like a password (authentication). The q value is the variable; it changes and is often looped. Simple API example via GET request (with authentication).
  24. http://api.grepwords.com/lookup?apikey=secret&q=board+games
      Response (JSON):

          [{"keyword":"board games","updated_cpc":"2018-04-30","updated_cmp":"2018-04-30","updated_lms":"2018-04-30","updated_history":"2018-04-30","lms":246000,"ams":246000,"gms":246000,"competition":0.86204091185173,"competetion":0.86204091185173,"cmp":0.86204091185173,"cpc":0.5,"m1":201000,"m1_month":"2018-02","m2":246000,"m2_month":"2018-01","m3":450000,"m3_month":"2017-12","m4":368000,"m4_month":"2017-11","m5":201000,"m5_month":"2017-10","m6":201000,"m6_month":"2017-09","m7":201000,"m7_month":"2017-08","m8":201000,"m8_month":"2017-07","m9":201000,"m9_month":"2017-06","m10":201000,"m10_month":"2017-05","m11":201000,"m11_month":"2017-04","m12":201000,"m12_month":"2017-03"}]

      Simple API example via GET request
  25. Parse the JSON:

          keyword        gms
          board games    246,000
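A short offline sketch of the JSON parsing step, using a trimmed copy of the response from slide 24 (most fields are left out for brevity):

```python
import json

# Trimmed copy of the response on the slide (most fields omitted).
response_text = '[{"keyword":"board games","lms":246000,"ams":246000,"gms":246000,"cpc":0.5}]'

# The response is a JSON array with one object per keyword.
results = json.loads(response_text)
for item in results:
    # The ":," format spec adds thousands separators.
    print(item["keyword"], f'{item["gms"]:,}')
# prints: board games 246,000
```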
  26. https://www.catalystdigital.com/techseoboost/
  27. Full Python Script (GrepWords/JSON):

          import json
          import requests

          boardgames = ["Gaia Project", "Great Western Trail", "Spirit Island"]

          for x in boardgames:
              # Look up each keyword; replace "key" with your GrepWords API key.
              apiurl = "http://api.grepwords.com/lookup?apikey=key&q=" + x
              r = requests.get(apiurl)
              parsed_json = json.loads(r.text)
              # Print the "gms" value from the first result.
              print(parsed_json[0]['gms'])
  28. (no text on this slide)
  29. Full Python Script (Google Autosuggest/XML):

          import requests
          import xml.etree.ElementTree as ET

          boardgames = ["Gaia Project", "Great Western Trail", "Spirit Island"]

          for x in boardgames:
              apiurl = "http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=" + x
              r = requests.get(apiurl)
              # Parse the XML response and print each suggestion's "data" attribute.
              tree = ET.fromstring(r.content)
              for child in tree.iter('suggestion'):
                  print(child.attrib['data'])
  30. (no text on this slide)
  31. Combine Them Together?

          import json
          import requests
          import xml.etree.ElementTree as ET

          boardgames = ["board game", "bgg", "board game geek"]

          for x in boardgames:
              suggest_url = "http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=" + x
              r = requests.get(suggest_url)
              tree = ET.fromstring(r.content)
              for child in tree.iter('suggestion'):
                  print(child.attrib['data'])
                  # Look up each suggestion in GrepWords.
                  grep_url = "http://api.grepwords.com/lookup?apikey=key&q=" + child.attrib['data']
                  r = requests.get(grep_url)
                  parsed_json = json.loads(r.text)
                  try:
                      print(parsed_json[0]['gms'])
                  except (KeyError, IndexError):
                      # IndexError also covers an empty result list.
                      print("No data available in GrepWords.")
  32. (no text on this slide)
  33. Google Autocomplete ✓
  34. GrepWords ✓
  35. Other API Examples
  36. WebPageTest.org
  37. WebPageTest.org script:

          import json
          import time
          import xml.etree.ElementTree as ET
          import requests

          testurls = ["https://searchwilderness.com/", "https://trafficthinktank.com/", "https://searchengineland.com/"]

          for x in testurls:
              # Submit the test; replace KEY with your WebPageTest API key.
              apiurl = "http://www.webpagetest.org/runtest.php?fvonly=1&k=KEY&lighthouse=1&f=xml&url=" + x
              r = requests.get(apiurl)
              tree = ET.fromstring(r.content)
              for child in tree.findall('data'):
                  wpturl = child.find('jsonUrl').text
                  print(wpturl)
              # Poll the JSON result URL until the test has finished.
              waiting = True
              while waiting:
                  r = requests.get(wpturl)
                  parsed_json = json.loads(r.text)
                  try:
                      if parsed_json['data']['statusCode'] == 100:
                          print("Not yet ready. Trying again in 20 seconds.")
                          time.sleep(20)
                  except KeyError:
                      # The script relies on a KeyError here to detect a finished test.
                      waiting = False
              print(x + "\r\n")
              print("Lighthouse Average First Contentful Paint: " + str(parsed_json['data']['average']['firstView']['chromeUserTiming.firstContentfulPaint']))
  38. SEMRush
  39. SEMRush script:

          import csv
          import requests

          domain = "trafficthinktank.com"
          key = "YOUR API KEY"
          api_url = ("https://api.semrush.com/?type=domain_organic&key=" + key +
                     "&display_filter=%2B%7CPh%7CCo%7Cseo&display_limit=10"
                     "&export_columns=Ph,Po,Pp,Pd,Nq,Cp,Ur,Tr,Tc,Co,Nr,Td&domain=" + domain +
                     "&display_sort=tr_desc&database=us")

          with requests.Session() as s:
              download = s.get(api_url)
              decoded_content = download.content.decode('utf-8')
              # The report is parsed as semicolon-delimited text.
              cr = csv.reader(decoded_content.splitlines(), delimiter=';')
              my_list = list(cr)
              for row in my_list:
                  print(row[0])  # Keyword
                  print(row[1])  # Position
                  print(row[4])  # Search Volume
                  print(row[6])  # URL
  40. Google Analytics?
  41. Sample Code: https://developers.google.com/analytics/devguides/reporting/core/v4/quickstart/installed-py
      JSON Payload Help: https://ga-dev-tools.appspot.com/query-explorer/
  42. Moz (Linkscape)
  43. Moz (Linkscape) script:

          from mozscape import Mozscape
          import pandas as pd
          import numpy as np
          import requests
          import time

          def divide_chunks(l, n):
              """Yield successive n-sized chunks from list l."""
              for i in range(0, len(l), n):
                  yield l[i:i + n]

          client = Mozscape('access_id', 'secret_key')

          csv = pd.read_csv('./all_outlinks.csv', skiprows=1)
          links = csv[csv['Type'] == 'AHREF']
          # filter out CDNs, self-references, and other known cruft
          links = links[~links['Destination'].str.match('https?://boardgamegeek.com.*')]
          # Reduce each URL to its scheme and host.
          Domains = links['Destination'].replace(to_replace="(.*://)?([^/?]+).*", value=r"\1\2", regex=True)
          x = list(divide_chunks(Domains.unique().tolist(), 5))

          rows = []
          for vals in x:
              da_pa = client.urlMetrics(vals, Mozscape.UMCols.domainAuthority | Mozscape.UMCols.pageAuthority)
              # Attach the queried URL to each result row.
              for url, y in zip(vals, da_pa):
                  y['url'] = url
                  rows.append(y)
              print("Processing a batch of 5 URLs. Total URLs: " + str(len(Domains.unique())))
              time.sleep(5)

          df = pd.DataFrame(rows, columns=['pda', 'upa', 'url'])
          print(df)

      https://github.com/seomoz/SEOmozAPISamples/tree/master/python
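The two data-preparation steps in the Moz script, reducing URLs to scheme plus host and splitting the list into small batches, can be tried without an API key. This sketch uses `urllib.parse.urlsplit` in place of the regex; the `to_domain` helper and the sample URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def divide_chunks(items, n):
    """Yield successive n-sized chunks from items."""
    for i in range(0, len(items), n):
        yield items[i:i + n]


def to_domain(url):
    """Reduce a URL to scheme + host, e.g. https://example.com."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


# Made-up outlinks for illustration.
outlinks = [
    "https://example.com/a?x=1",
    "https://example.com/b",
    "http://example.org/page",
    "https://example.net/",
]

# Deduplicate with a set, then sort for a stable order before batching.
domains = sorted({to_domain(u) for u in outlinks})
batches = list(divide_chunks(domains, 2))
print(batches)
```

Batching keeps each API call small; the talk's script uses batches of 5 with a 5-second pause between calls.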
  44. Search Console
  45. Schedule to run monthly with Cron and back up to a SQL database: https://searchwilderness.com/gwmt-data-python/
      JR Oakes’ BigQuery vision: http://bit.ly/2vmjDe8
  46. Webhose.io
  47. Webhose.io script:

          import requests
          import json
          import datetime as dt
          import urllib.parse

          apikey = "KEY"
          search = 'title:"board games" -shipping -sale site_type:news language:english'
          query = urllib.parse.quote(search)
          time_diff = -30
          # Timestamp for 30 days ago (UTC).
          ts = int((dt.datetime.now(dt.timezone.utc) + dt.timedelta(time_diff)).timestamp())
          apiurl = "http://webhose.io/filterWebContent?token=" + apikey + "&format=json&ts=" + str(ts) + "&sort=crawled&q=" + query
          r = requests.get(apiurl)
          parsed_json = json.loads(r.text)
          for i in range(int(parsed_json['totalResults'])):
              try:
                  print(parsed_json['posts'][i]['title'])
                  print(parsed_json['posts'][i]['thread']['social']['facebook'])
              except IndexError:
                  # Raised when i runs past the posts included in this response.
                  print("error occurred")
  48. Reddit
  49. https://searchwilderness.com/reddit-python-code/
  50. Wayback Machine
  51. Wayback Machine script:

          import json
          import requests

          domain = "trafficthinktank.com"
          apiurl = ("https://web.archive.org/cdx/search/cdx?url=" + domain +
                    "&matchType=domain&fl=original,timestamp&collapse=urlkey"
                    "&filter=mimetype:text/html&filter=!original:.*%3A80.*"
                    "&filter=!original:.*.(png%7Cjs%7Ccss%7Cjpg%7Csvg%7Cjpeg%7Cgif%7Cxml%7Crss"
                    "%7CPNG%7CJS%7CCSS%7CJPG%7CSVG%7CJPEG%7CGIF%7CXML%7CRSS"
                    "%7Ctxt%7CTXT%7Cico%7CICO%7Cpdf%7CPDF).*&output=json")
          r = requests.get(apiurl)
          parsed_json = json.loads(r.text)
          # Each row holds the requested fields (original, timestamp); print the URL.
          for row in parsed_json:
              print(row[0])
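With `output=json`, the CDX server returns a list of rows whose first row holds the field names, so the loop in the Wayback Machine script prints the header too. A small offline sketch of skipping that header; the sample rows are made up for illustration.

```python
import json

# Made-up sample in the CDX server's output=json shape: a list of rows,
# where the first row holds the field names.
sample = json.dumps([
    ["original", "timestamp"],
    ["https://trafficthinktank.com/", "20190101000000"],
    ["https://trafficthinktank.com/about/", "20190102000000"],
])

rows = json.loads(sample)
# Split the header row from the data rows, then look fields up by name.
header, records = rows[0], rows[1:]
urls = [dict(zip(header, record))["original"] for record in records]
print(urls)
```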
  52. Other APIs
      • STAT / Rank Tracking
      • Google Natural Language Processing
      • Various Machine Learning Services
      • DeepCrawl / Botify / Cloud Crawlers
      • Stripe (for payment)
      • Map / Geolocation Data (Google Maps/Foursquare)
      • Slack
      • Whois data
  53. Putting things together and making magic
  54. The combined script:
      1. Takes the outlink report from Screaming Frog
      2. Distills URLs to domains
      3. Runs the Moz Linkscape API against the list for PA & DA
      4. Checks the HTTP status code
      5. Runs a WHOIS API to see if the domain is available
      https://gist.github.com/pshapiro/819cd172ff8fe576f2a4e1f74395ec47
  55. https://github.com/MLTSEO/MLTS
  56. (no text on this slide)
  57. Thanks! TTT: @Paul Shapiro, Twitter: @fighto, Blog: SearchWilderness.com
