#SMX #25A @portentint
THANK YOU!
SEE YOU @SMX WEST
SAN JOSE, CA
MARCH 1-3, 2016
Oh, also: Portent’s hiring SEOs,
PPCs & all around smarty-pants
Editor's Notes
A few things: This presentation assumes you know how to use Excel and a basic web crawler. This is, after all, advanced.
I’m not using particularly elegant tools. Fact is, this is ugly as hell. We’ve invested in building something ourselves because we got tired of melting computers.
Why even use force-directed diagrams? They’re hard to generate. You could generate a more basic sitemap.
Somehow it feels… unnecessary, right? We’re just adding plumage when the information is already good.
I’m just going to show you how to get data in and make it reasonably consumable
Basically, you can configure how you display link counts, duplication, etc.
Every force directed diagram needs nodes and edges. That’s each unique page, if you’re doing a website, and every link.
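To make that concrete, here's a minimal sketch (my own illustration, not from the deck) of how a crawl's link list becomes the nodes and edges a force-directed diagram needs. The URLs are made up:

```python
# Hypothetical link list: (source page, destination page) pairs,
# the kind of data a site crawl gives you.
links = [
    ("/", "/blog/"),
    ("/", "/about/"),
    ("/blog/", "/blog/post-1/"),
    ("/about/", "/"),
]

# Every unique URL becomes a node; every link becomes an edge.
nodes = sorted({url for pair in links for url in pair})
edges = links

print(nodes)       # ['/', '/about/', '/blog/', '/blog/post-1/']
print(len(edges))  # 4
```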
Like this. More links = bigger circle, and links increase ‘gravity.’ See how a page with one link is smaller, and its one link sits further away? That makes this a valuable visualization tool above & beyond a standard sitemap.
So here’s how you get your serious geek on.
First, we need a list of every URL on the site, and a list of all internal links to and from each URL. I recommend only doing all links and pages if you have a fast computer. Really, really fast. Really really really fast. More on that in a minute. Note: Filter the crap out of everything. Remove any page types you don’t want. In general, I find this whole process chokes if you have more than 5,000 links in your crawl.
I use Screaming Frog to crawl the site. I filter out non-HTML stuff and non-200 responses. For my purposes here, that’s the right way to go.
Then I export the crawl.
Then I export a report: The ‘All Inlinks’ report shows me every link to and from every page. It’s a monster. It’s also what will give us the connections we need. It’ll become our edges and nodes. Delete all columns except Source and Destination. Then strip the domain (https://www.portent.com, or whatever yours is) so only the paths remain.
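If you'd rather not do the column-deleting and domain-stripping by hand, here's a hedged sketch in Python. The column names match what Screaming Frog's export typically uses, but check them against your own file; the domain and sample rows are assumptions for illustration:

```python
import csv
import io

# Assumed domain prefix to strip -- swap in your own site.
DOMAIN = "https://www.portent.com"

# Stand-in for the 'All Inlinks' CSV export (inlined here so the
# sketch is self-contained; normally you'd open the exported file).
raw = io.StringIO(
    "Type,Source,Destination,Anchor\n"
    "AHREF,https://www.portent.com/,https://www.portent.com/blog/,Blog\n"
    "AHREF,https://www.portent.com/blog/,https://www.portent.com/,Home\n"
)

reader = csv.DictReader(raw)
# Keep only Source and Destination, with the domain prefix removed.
edges = [
    (row["Source"].replace(DOMAIN, "", 1),
     row["Destination"].replace(DOMAIN, "", 1))
    for row in reader
]

print(edges)  # [('/', '/blog/'), ('/blog/', '/')]
```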
And now, one last pain in the ass. You only want unique links. So you have to filter in-place or use another tool. I use the advanced filter in Excel.
Run it, then go for a walk or something. It takes a long time. Or, use a text editor or grep or something to do it faster.
(My secret – I actually use Sublime Text to do this – I just wanted to make fun of Excel)
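A scripted alternative to Excel's advanced filter (my addition, not from the deck — the deck uses Excel or Sublime Text): dedupe the (source, destination) pairs in Python, preserving their order. On a big edge list this finishes in seconds instead of a walk's worth of waiting:

```python
# Sample edge list with a duplicate link (made-up URLs).
edges = [
    ("/", "/blog/"),
    ("/", "/blog/"),   # duplicate -- same source and destination
    ("/blog/", "/"),
]

# Keep only the first occurrence of each (source, destination) pair.
seen = set()
unique = []
for edge in edges:
    if edge not in seen:
        seen.add(edge)
        unique.append(edge)

print(unique)  # [('/', '/blog/'), ('/blog/', '/')]
```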
It is very important that the URLs in this sheet match the URLs in the other.
Now, combine the data: Paste the contents of nodes.xls into a new tab in this sheet. Save it as ‘site-nodes-edges.xls’ or whatever.
Don’t use tables. Your computer will melt.
You’ll want to be able to pull the first link to any given page, ignoring the others. I just add a field called ‘first.’ Then add a field using the formula above. This is a convenience thing you’ll see in a minute.
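The slide's actual formula isn't reproduced in these notes, so here's an equivalent sketch of what that ‘first’ field does, written in Python as an illustration: flag the first link pointing at each destination page and ignore the rest.

```python
# Sample edges: (source, destination) pairs. URLs are made up.
edges = [
    ("/", "/blog/"),
    ("/about/", "/blog/"),  # second link to /blog/ -- not 'first'
    ("/", "/about/"),
]

# 'first' is True for the first link to each destination, False after.
seen = set()
first = []
for source, dest in edges:
    first.append(dest not in seen)
    seen.add(dest)

print(first)  # [True, False, True]
```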
Export edges
I use Gephi. Note: There is a really annoying issue in Gephi relating to Java. Google ‘Gephi won’t start Java’ for solutions.
Create a new project.
Go to the Data Laboratory.
Import spreadsheet
Import edges
Do the import. Set Connections to ‘BigInteger’
Do the import. Check ‘create missing nodes.’ That makes it create the nodes for you, which is a lot easier.
You should get something that looks like this
oookaaayyyyyy
Now you can filter! Woot! Filter the edges table, removing ‘false.’
That’ll get you just unique edges – remember?
You’ll know it’s working because edges will be less than 100% visible
I like to use the Force Atlas layout. It’s up to you. You’ll want to play around to find the ideal mix.
Still not great, but improved.
Prettier, using colors, messing w/ settings, etc. Something may not be right here.
Clean it up to emphasize one or two things. Here, I’ve found a serious pagination tunnel. Reduce the number of edges and nodes for a little more clarity.
Add some labels
Yep
Export edges
This is a hideously ugly process. Learn it, then refine it and perfect it as much as you can.