The SEO Spider Tool Crawls & Reports On The Following
A quick summary of some of the data collected in a crawl includes:
- Errors – Client errors such as broken links & server errors (no responses, 4XX, 5XX).
- Redirects – Permanent or temporary redirects (3XX responses).
- Blocked URLs – View & audit URLs disallowed by the robots.txt protocol.
- External Links – All external links and their status codes.
- Protocol – Whether the URLs are secure (HTTPS) or insecure (HTTP).
- URI Issues – Non-ASCII characters, underscores, uppercase characters, parameters, or long URLs.
- Duplicate Pages – Hash value / MD5 checksum algorithmic check for exact duplicate pages.
- Page Titles – Missing, duplicate, over 65 characters, short, pixel width truncation, same as H1, or multiple.
- Meta Description – Missing, duplicate, over 156 characters, short, pixel width truncation or multiple.
- Meta Keywords – Mainly for reference, as they are not used by Google, Bing or Yahoo.
- File Size – Size of URLs & images.
- Response Time.
- Last-Modified Header.
- Page Depth Level.
- Word Count.
- H1 – Missing, duplicate, over 70 characters, multiple.
- H2 – Missing, duplicate, over 70 characters, multiple.
- Meta Robots – Index, noindex, follow, nofollow, noarchive, nosnippet, noodp, noydir etc.
- Meta Refresh – Including target page and time delay.
- Canonical link element & canonical HTTP headers.
- X-Robots-Tag.
- rel="next" and rel="prev".
- AJAX – The SEO Spider obeys Google’s AJAX Crawling Scheme.
- Inlinks – All pages linking to a URI.
- Outlinks – All pages a URI links out to.
- Anchor Text – All link text, including alt text from linked images.
- Follow & Nofollow – At page and link level (true/false).
- Images – All URIs with the image link & all images from a given page. Images over 100 KB, missing alt text, or alt text over 100 characters.
- User-Agent Switcher – Crawl as Googlebot, Bingbot, Yahoo! Slurp, mobile user-agents or your own custom UA.
- Configurable Accept-Language Header – Supply an Accept-Language HTTP header to crawl locale-adaptive content.
- Redirect Chains – Discover redirect chains and loops.
- Custom Source Code Search – Find anything you want in the source code of a website, whether that’s Google Analytics code, specific text or other code.
- Custom Extraction – You can collect any data from the HTML of a URL using XPath, CSS Path selectors or regex.
- Google Analytics Integration – You can connect to the Google Analytics API and pull in user and conversion data directly during a crawl.
- Google Search Console Integration – You can connect to the Google Search Analytics API and collect impression, click and average position data against URLs.
- XML Sitemap Generator – You can create an XML sitemap and an image sitemap using the SEO Spider.
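To make the URI Issues checks above concrete, here is a minimal Python sketch of how such rules might be applied to a single URL. It is not the SEO Spider's own implementation: the function name `uri_issues`, the issue labels and the 115-character `MAX_URL_LENGTH` threshold are hypothetical choices, since the list only says "long URLs".

```python
from urllib.parse import urlsplit

# Hypothetical threshold: the feature list says "long URLs" without a number.
MAX_URL_LENGTH = 115


def uri_issues(url: str) -> list[str]:
    """Return the URI issue labels that apply to a single URL."""
    parts = urlsplit(url)
    issues = []
    if not url.isascii():
        issues.append("non-ascii characters")
    # Only the path is checked for underscores and uppercase letters,
    # because the scheme and host are case-insensitive.
    if "_" in parts.path:
        issues.append("underscores")
    if any(ch.isupper() for ch in parts.path):
        issues.append("uppercase")
    if parts.query:
        issues.append("parameters")
    if len(url) > MAX_URL_LENGTH:
        issues.append("over length")
    return issues


print(uri_issues("https://example.com/Blog_Post?id=3"))
# ['underscores', 'uppercase', 'parameters']
print(uri_issues("https://example.com/café"))
# ['non-ascii characters']
```

A percent-encoded URL is plain ASCII, so this sketch only flags non-ASCII characters that appear unencoded in the string it is given.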
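The Duplicate Pages check relies on hashing: two pages whose raw bodies produce the same MD5 digest are exact duplicates. The sketch below groups already-fetched responses by digest. The helper `find_exact_duplicates` and the sample pages are hypothetical, and the crawler's real hashing may differ in what it hashes (for example, which parts of the response).

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(pages: dict[str, bytes]) -> list[list[str]]:
    """Group URLs whose response bodies have identical MD5 digests."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.md5(body).hexdigest()
        groups[digest].append(url)
    # A digest shared by more than one URL marks a set of exact duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "https://example.com/a": b"<html><body>Hello</body></html>",
    "https://example.com/a?ref=1": b"<html><body>Hello</body></html>",
    "https://example.com/b": b"<html><body>Other</body></html>",
}
print(find_exact_duplicates(pages))
# [['https://example.com/a', 'https://example.com/a?ref=1']]
```

MD5 is adequate here because the goal is detecting identical content, not resisting deliberate collisions. Any byte-level difference, even a timestamp, makes two pages non-duplicates under this check.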
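The Page Titles, Meta Description, H1 and H2 filters share a common pattern: missing, duplicate, over a character limit, or present more than once. The following sketch applies that pattern to one element type at a time, using the character limits stated in the list. The function `audit_element` and the input shape (each URL mapped to every value found on that page) are assumptions. "Short" and pixel-width truncation are left out because no thresholds are given for them.

```python
from collections import Counter

# Character limits taken from the feature list; other thresholds are not stated.
LIMITS = {"title": 65, "meta description": 156, "h1": 70, "h2": 70}


def audit_element(kind: str, pages: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flag missing, over-length, duplicate and multiple values for one element type.

    `pages` maps each URL to every value of that element found on the page
    (e.g. all <title> texts), in document order.
    """
    limit = LIMITS[kind]
    # Count the first non-empty value of each page to detect cross-page duplicates.
    firsts = Counter(v[0].strip() for v in pages.values() if v and v[0].strip())
    report = {}
    for url, values in pages.items():
        issues = []
        first = values[0].strip() if values else ""
        if not first:
            issues.append("missing")
        else:
            if len(first) > limit:
                issues.append(f"over {limit} characters")
            if firsts[first] > 1:
                issues.append("duplicate")
        if len(values) > 1:
            issues.append("multiple")
        report[url] = issues
    return report


titles = {
    "https://example.com/": ["Home"],
    "https://example.com/about": ["Home"],
    "https://example.com/blog": [],
    "https://example.com/shop": ["Shop " * 20, "Shop"],
}
print(audit_element("title", titles))
# {'https://example.com/': ['duplicate'], 'https://example.com/about': ['duplicate'],
#  'https://example.com/blog': ['missing'], 'https://example.com/shop': ['over 65 characters', 'multiple']}
```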
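Redirect chains and loops can be found by following each 3XX response's target until a non-redirecting URL is reached, remembering every URL already visited. In this offline sketch the `redirects` dictionary is a hypothetical stand-in for real HTTP responses, mapping each redirecting URL to its `Location` target. A chain in the usual sense is more than one hop, i.e. a returned list longer than two URLs.

```python
def trace_redirects(start: str, redirects: dict[str, str]) -> tuple[list[str], bool]:
    """Follow redirects from `start`; return (every URL visited, whether it loops).

    `redirects` maps a URL to the target of its 3XX response; URLs that are
    not keys are treated as final (non-redirecting) responses.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, True  # revisited a URL: redirect loop
        seen.add(current)
    return chain, False


redirects = {
    "http://example.com/": "https://example.com/",
    "https://example.com/": "https://www.example.com/",
    "https://example.com/a": "https://example.com/b",
    "https://example.com/b": "https://example.com/a",
}
print(trace_redirects("http://example.com/", redirects))
# (['http://example.com/', 'https://example.com/', 'https://www.example.com/'], False)
print(trace_redirects("https://example.com/a", redirects))
# (['https://example.com/a', 'https://example.com/b', 'https://example.com/a'], True)
```

Because every visited URL is recorded, the loop always terminates on a finite set of redirects without needing a separate hop limit.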
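Custom source code search and regex-based custom extraction both operate on the raw HTML of each crawled URL. The sketch below shows the idea in its simplest form: a substring test for search and `re.findall` for extraction. The helper names, sample pages and SKU markup are hypothetical, and regex is the least robust of the three extraction methods listed; XPath or CSS selectors are usually safer for structured HTML.

```python
import re


def custom_search(pages: dict[str, str], needle: str) -> list[str]:
    """Return the URLs whose raw HTML contains `needle`."""
    return [url for url, html in pages.items() if needle in html]


def custom_extract(pages: dict[str, str], pattern: str) -> dict[str, list[str]]:
    """Return every regex match per URL (the captured group if the pattern has one)."""
    regex = re.compile(pattern)
    return {url: regex.findall(html) for url, html in pages.items()}


pages = {
    "https://example.com/": '<script async src="https://www.googletagmanager.com/gtag/js"></script>',
    "https://example.com/item": '<span class="sku">ABC-123</span>',
}
print(custom_search(pages, "googletagmanager.com/gtag/js"))
# ['https://example.com/']
print(custom_extract(pages, r'<span class="sku">([^<]+)</span>'))
# {'https://example.com/': [], 'https://example.com/item': ['ABC-123']}
```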
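For reference, a standard XML sitemap is a `<urlset>` element in the sitemaps.org namespace containing one `<url><loc>…</loc></url>` entry per page. This sketch builds that minimal structure with the standard library from a list of crawled URLs. The function `build_sitemap` is hypothetical, it omits optional fields such as `<lastmod>`, and it does not cover image sitemaps, which add a separate image namespace.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Serialize `urls` as a minimal sitemaps.org <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in text content.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap(["https://example.com/", "https://example.com/?a=1&b=2"]))
```

Escaping matters here: the sitemap protocol requires characters like `&` in URLs to be entity-escaped, which ElementTree handles automatically.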