How Crawling Works

Crawling is how Google finds out which sites and pages exist on the web. Once Google crawls a page, it can add that information to its index, its database of web pages. Google finds new pages by following links from one page to another. Website owners can also ask Google to visit, or crawl, their website by submitting a list of pages called a sitemap.
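A sitemap is usually an XML file in the format defined at sitemaps.org, with one `<url><loc>` entry per page. The sketch below builds one with Python's standard library. The `build_sitemap` helper and the example.com URLs are hypothetical, and it writes only the required `<loc>` field, leaving out optional fields such as `<lastmod>`.

```python
# Minimal sketch: build a sitemap in the sitemaps.org XML format.
# build_sitemap and the example.com URLs are hypothetical placeholders.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a sitemap document with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration by hand so it always says UTF-8
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    pages = [
        "https://www.example.com/",
        "https://www.example.com/clothing/",
        "https://www.example.com/clothing/dresses/",
    ]
    print(build_sitemap(pages))
```

The resulting file is typically served from the site (for example at `/sitemap.xml`) and submitted to Google.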

“The crawling process begins with a list of web addresses from past crawls and sitemaps provided by website owners. As our crawlers visit these websites, they use links on those sites to discover other pages. The software pays special attention to new sites, changes to existing sites and dead links.” 

Google
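The process Google describes (start from known addresses, then follow links to find more) can be illustrated with a breadth-first search over a link graph. This is a simplified sketch, not Google's crawler. `LINK_GRAPH` is a hypothetical in-memory stand-in for fetching real pages and extracting their links, and the seed list plays the role of "past crawls and sitemaps".

```python
# Minimal sketch of link-following discovery: start from seed URLs (standing in
# for "past crawls and sitemaps") and follow links breadth-first.
# LINK_GRAPH is a hypothetical, in-memory stand-in for fetching and parsing pages.
from collections import deque

LINK_GRAPH: dict[str, list[str]] = {
    "/": ["/clothing/", "/about/"],
    "/clothing/": ["/clothing/shorts/", "/clothing/dresses/", "/"],
    "/clothing/shorts/": ["/clothing/"],
    "/clothing/dresses/": ["/clothing/jackets/"],
    "/clothing/jackets/": [],
    "/about/": [],
    "/spring-sale/": [],  # no page links here
}


def discover(seeds: list[str]) -> list[str]:
    """Return URLs in the order a breadth-first crawler first reaches them."""
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated, order kept
    seen = set(frontier)
    order = []
    while frontier:
        url = frontier.popleft()
        order.append(url)
        # URLs missing from the graph (e.g. dead links) contribute no new links
        for link in LINK_GRAPH.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


if __name__ == "__main__":
    print(discover(["/"]))
    print(discover(["/", "/spring-sale/"]))  # e.g. the sitemap lists the sale page
```

Starting from the homepage alone, `/spring-sale/` is never found because no page links to it. It only shows up when it is included as a seed, which is the role a sitemap plays.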

Google doesn’t have infinite time or resources to crawl every page of every website all the time. Over the last decade, as the web has grown in size and complexity, Google has acknowledged these limits and disclosed that it discovers only a fraction of the internet’s content. That makes it the webmaster’s job to factor “crawl budget” into their technical SEO strategy, so that Google discovers and crawls the “right” URLs more often. Continue reading to learn more.
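One way to picture a crawl budget is as a fixed number of fetch slots that URLs compete for. The toy model below fills those slots with the highest-priority URLs. The `plan_crawl` helper and the priority scores are hypothetical: Google does not publish how it scores or schedules URLs, so this shows only the general idea.

```python
# Toy model of crawl budget: the crawler can fetch only `budget` URLs this
# round, so it takes the highest-priority ones. The priority scores are
# hypothetical; Google does not publish how it scores URLs.
import heapq


def plan_crawl(priorities: dict[str, float], budget: int) -> list[str]:
    """Return up to `budget` URLs, highest priority first."""
    return heapq.nlargest(budget, priorities, key=priorities.__getitem__)


if __name__ == "__main__":
    candidates = {
        "/clothing/": 0.9,
        "/clothing/dresses/": 0.7,
        "/search?q=red&sort=price": 0.2,
        "/tag/misc/page/37/": 0.1,
    }
    print(plan_crawl(candidates, budget=2))
```

In this model every low-value URL competes for the same slots. Cutting the number of such URLs leaves more of the budget for the pages you care about.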

Control Crawling

You have some control over how Google crawls your site and what signals your site gives to search engine crawlers. Use it to signal to Google which URLs and sections of your site are the most important.

Google also provides a free tool, Google Search Console, to help you manage crawling. With Search Console you can provide detailed instructions about how to process pages on your site and request a recrawl of specific URLs. To opt out of crawling altogether, or to keep crawlers out of specific sections, you use a robots.txt file at the root of your site.
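Opting out of crawling for all or part of a site is done with a robots.txt file served at the site root. The sketch below uses Python's standard `urllib.robotparser` to check which URLs a set of rules would block. The rules, the example.com URLs, and the `allowed` helper are hypothetical. Python's parser does simple prefix matching and applies rules in file order, so it only approximates how Googlebot reads the file. Google also supports `*` and `$` wildcards and picks the most specific matching rule.

```python
# Minimal sketch: check URLs against a robots.txt with Python's standard library.
# The rules and example.com URLs are hypothetical; Python's parser uses simple
# prefix matching and is only an approximation of how Googlebot reads the file.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching them


def allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """True if the rules above permit `user_agent` to crawl `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/clothing/dresses/",
        "https://www.example.com/search?q=red+dress",
        "https://www.example.com/cart/checkout",
    ):
        print(url, "->", "crawl" if allowed(url) else "skip")
```

A single `Disallow: /` under `User-agent: *` would block the whole site.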

Organize Your Site Navigation

One way you can help search engines understand which parts of your site are most important is by organizing your pages by topical relevance and importance. For example, if your product pages are really important to you, your business, and the overall purpose of your website, you want to make that clear by putting those pages in your navigation menu. 

You can also organize your navigation into categories and subcategories. If you want to be seen first and foremost as a clothing brand, for example, you might create a page on your site called “Clothing” and then signal which clothing items you sell through subcategories such as shorts, dresses, and jackets.

Organizing your website this way helps Google understand which pages to prioritize in search rankings and which pages to crawl more frequently.
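A category tree like the “Clothing” example can be mirrored in both the menu and the URL paths, so each subcategory sits one level below its parent. The sketch below flattens a hypothetical navigation tree into menu links with their level. The `NAVIGATION` tree, the `/clothing/shorts/` path scheme, and the helper names are assumptions for illustration. The article does not prescribe a URL structure.

```python
# Minimal sketch: turn a hypothetical category tree into navigation links.
# NAVIGATION, the slug scheme, and the helper names are illustrative assumptions.
NAVIGATION: dict = {
    "Clothing": {
        "Shorts": {},
        "Dresses": {},
        "Jackets": {},
    },
    "About": {},
}


def slugify(name: str) -> str:
    """Lowercase a menu label and replace spaces with hyphens for use in a URL."""
    return name.lower().replace(" ", "-")


def menu_links(tree: dict, prefix: str = "/", depth: int = 1) -> list[tuple[str, int]]:
    """Flatten the tree into (URL path, menu level) pairs, parents before children."""
    links = []
    for name, children in tree.items():
        path = f"{prefix}{slugify(name)}/"
        links.append((path, depth))
        links.extend(menu_links(children, path, depth + 1))
    return links


if __name__ == "__main__":
    for path, level in menu_links(NAVIGATION):
        print("  " * (level - 1) + path)
```

If this menu appears on every page, each top-level category is one click from anywhere on the site and each subcategory is at most two.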

Internal Linking

Another way to influence which pages Google crawls is a strategy called internal linking. Internal linking is simply the practice of linking one page of your website to another page on the same website through hyperlinks.

The more pages on your website link to a specific page, the more opportunities Google has to find and crawl that page while it crawls your site. This reduces the chance that the page gets missed and signals to Google how relevant and important the page is to the rest of your website.
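To see which pages your internal linking favors, you can count how many distinct pages link to each one. Pages with zero internal links can only be found through other routes such as a sitemap. The sketch below assumes you already have each page's internal links, for example from a site crawl export. `SITE_LINKS` and `inbound_link_counts` are hypothetical names.

```python
# Minimal sketch: count how many distinct pages on a site link to each page.
# SITE_LINKS is a hypothetical map of page -> internal links found on that page.
from collections import Counter

SITE_LINKS: dict[str, list[str]] = {
    "/": ["/clothing/", "/about/"],
    "/clothing/": ["/clothing/shorts/", "/clothing/dresses/"],
    "/clothing/shorts/": ["/clothing/dresses/"],
    "/about/": ["/clothing/dresses/"],
    "/clothing/dresses/": ["/clothing/"],
    "/clothing/jackets/": [],
}


def inbound_link_counts(site_links: dict[str, list[str]]) -> Counter:
    """Number of distinct other pages linking to each page."""
    counts: Counter = Counter()
    for source, targets in site_links.items():
        for target in set(targets):  # a page linking twice still counts once
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


if __name__ == "__main__":
    counts = inbound_link_counts(SITE_LINKS)
    for page in SITE_LINKS:
        print(f"{counts[page]:>2}  {page}")
    orphans = [p for p in SITE_LINKS if p != "/" and counts[p] == 0]
    print("No internal links to:", orphans)
```

Here `/clothing/dresses/` has the most internal links (3), while `/clothing/jackets/` has none. That makes the jackets page a candidate for a link from the Clothing page or the menu.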