How Search Engines Learn What’s On Every Page

Spiders. The terrifying answer to how search engines know what is happening all over the internet all the time is that they have an army of millions of spiders. The internet is positively covered in spiders making their way into every nook and cranny and reporting back to their masters on what they find.

The less terrifying answer is that on the internet the term ‘spider’ is just a bad pun that persists. You see, a ‘spider’ is a type of bot that goes from webpage to webpage, following links and reporting back on the content it finds. It’s called a spider because it crawls the (World Wide) Web. Geddit?! Sigh.

If it makes you feel better, this type of spider is also technically known as a web-crawler or automatic indexer. It is also however called an ant. This means it would be perfectly reasonable to think of the internet as a picnic covered in ants. You probably shouldn’t though.

Anyway, that is how Google knows what is on pretty much every page of the internet – it goes out looking. However, if you have a website it wouldn’t hurt to give Google a hand to learn about your specific web pages.


How Can I Tell Google What's On My Site?


Indexing Webpages

As we previously explored, Google keeps its own index of web pages so that it can return search results as fast as possible. This index is like a stripped-down archive of the World Wide Web which only contains information that Google cares about. In other words, it’s filled with ugly versions of web pages.

To create this archive, Google sends out bots known as spiders to a list of URLs. The spiders then make a note of everything they find at each URL, including any links to other pages. Those links are added to the master list of URLs to crawl, and so more spiders are sent out.

There is of course a much larger technical discussion to be had about how these spiders work. The internet is huge and there are links going everywhere. This means the spiders have to prioritise and coordinate their efforts in order to be efficient. However, unless you are making your own spider, you don’t really need to know much more about it than this.
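If you’re curious, the crawl-and-queue loop described above can be sketched in a few lines of Python. This is a toy illustration, not Google’s actual crawler: the `TOY_WEB` pages and links are made up, and a real spider would fetch each page over the network instead of looking it up in a dictionary.

```python
from collections import deque

# A toy "web": each URL maps to the list of URLs it links to.
# All of these pages and links are invented for illustration.
TOY_WEB = {
    "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
    "https://example.com/about": ["https://example.com/"],
    "https://example.com/blog": ["https://example.com/post-1", "https://example.com/"],
    "https://example.com/post-1": ["https://example.com/blog"],
}

def crawl(seed_urls):
    """Breadth-first crawl: visit each URL once, queue every new link found."""
    index = {}                    # the stripped-down 'archive' of what was found
    queue = deque(seed_urls)      # the master list of URLs to crawl
    seen = set(seed_urls)
    while queue:
        url = queue.popleft()
        links = TOY_WEB.get(url, [])  # a real spider would fetch the page here
        index[url] = links            # note down what the page links to
        for link in links:            # newly discovered links join the queue
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl(["https://example.com/"])
print(sorted(index))  # every page reachable by links ends up in the index
```

Notice the catch this makes obvious: the crawl only ever reaches pages that something already links to, which is exactly why sitemaps (below) exist.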

As a website owner, what you do need to know is how to get Google to index your pages, and how to keep Google coming back every time you update them.


What it looks like: Non-Human Traffic


Make A Map

Spiders find new web pages by copying down links, so if nothing links to your pages, they will never be found. Never – unless you have a sitemap, that is.

All websites need a map. More specifically, an XML Sitemap. This is a file that lists every page on your website, along with when each page was last updated. There are many free websites that can generate one of these for you. There are also many paid services that will keep it updated automatically.

If you are using a CMS (such as WordPress) then there are plugins (such as Yoast) which will automatically generate and update the sitemap for you. This is important because if you update or add to your site regularly, you would otherwise have to generate and upload a new sitemap yourself each time. It might not take long to do once, but over time it would really become a chore.

On top of this, you want your sitemap to automatically update so that if you make your content better, then Google will know about it. If you are optimising your site for search, it wouldn’t be very helpful if Google never updated what it thought was on each of your pages.
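To make the format less mysterious, here is a sketch of what those generator tools actually produce. The helper below is hypothetical and written purely for illustration (the URLs and dates are made up); in practice you’d use a generator website or CMS plugin as described above.

```python
from xml.sax.saxutils import escape

def build_sitemap(pages):
    """Build a minimal XML sitemap from (url, lastmod) pairs.

    lastmod is a YYYY-MM-DD date string, one of the formats the
    sitemap protocol accepts.
    """
    entries = []
    for url, lastmod in pages:
        entries.append(
            "  <url>\n"
            f"    <loc>{escape(url)}</loc>\n"     # the page's address
            f"    <lastmod>{lastmod}</lastmod>\n"  # when it last changed
            "  </url>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )

sitemap = build_sitemap([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/blog", "2024-02-03"),
])
print(sitemap)
```

Each `<url>` entry is one page; the `<lastmod>` dates are what let Google notice that a page has changed and come back to re-crawl it.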

Web-crawlers should, in theory, find your sitemap eventually if you link to it on your site. However, most search engines actually make it easy for you by allowing you to register your sitemap with them directly.
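The standard way to ‘link to’ your sitemap for crawlers – a convention from the sitemaps.org protocol, not something unique to any one search engine – is a `Sitemap` line in your robots.txt file. The domain below is just a placeholder:

```
# robots.txt served at https://example.com/robots.txt
User-agent: *
Allow: /

# Tells any crawler that reads robots.txt where the sitemap lives
Sitemap: https://example.com/sitemap.xml
```

Registering the sitemap directly (next section) is still worth doing, but this line means even crawlers you never registered with can find it.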


Google Search Console (and others)

Once your site has a sitemap, the next thing you need to do is register with Google Search Console. It’s free, and it provides lots of stats about how your website performs in Google Search. These stats can be vital to your SEO efforts, but we’re not here to talk about that now.

The most important reason that every website should sign up to Google Search Console is so that you can register your sitemap with them. Just put the URL in the box and click submit.


Submit a sitemap to Google Search Console


Within a day or two, Google will have checked your sitemap and started indexing your site. This is a huge deal and a huge time saver. If your sitemap updates automatically, Google will always know what’s on your website. Considering that most web traffic comes from search engines, this is likely the single largest step you can take to get people to notice your site at all.

While Google still makes up the vast majority of search traffic, there are other search engines out there, and it would be foolish to ignore them. Unfortunately, only two of them are as helpful as Google in providing a search console-type service. These are:

  • Bing Webmaster Tools
  • Yandex Webmaster

These do basically the same things as Google Search Console, but for Bing and Yandex (Yandex is popular in Russia). Bing Webmaster Tools actually has a lot of interesting stuff in it, so it’s worth poking around in after you have signed up.


Get Some Links

The other key way to let Google know about your web pages is simply to get some links to your site. I say simply, but it’s actually an endless and highly involved task. Here are some key points when thinking about link building:

The more links the better (mostly)

If some big piece of news breaks, the article that broke it will quickly be linked to from many places. Google wants people searching for that news to find the original article, and as soon as possible. So when spiders find loads of links pointing to the same webpage, that webpage gets bumped up the priority list of pages to crawl. It also gets a bump up the rankings on SERPs, since it has received so many links.

This demonstrates how lots of links can help a webpage get indexed fast. News is a special case however as lots of different types of sites link to news stories. For other types of websites, you’ll need lots of links from related websites.


Links from related websites are the best links

If every lawnmower website links to this one article about the top ten lawnmowers, then we (and Google) can be pretty sure that it’s a good article about lawnmowers. However, if every website about cupcakes links to this one article about the top ten lawnmowers… then no one would know what to think. Is it a good article? Do you care what cupcake websites think about lawnmowers? Hmm.

Due to nonsense like this happening, Google decided to crack down on irrelevant links. So now if they find a link to a webpage from an unrelated webpage, they won’t pay that link much attention. If they find lots of links from lots of irrelevant web pages, then they will in fact punish the webpage being linked to.

Therefore if your goal is to get indexed quickly and to rank highly in search results, only links from relevant sites can help you.


What Search Engine Optimisation looks like


Links from good sites count more

If a high-authority website links to a page, then that page will be more highly prioritised by spiders (and SERPs) than if a low-authority website links to it. By ‘authority’ I mean that the website has been judged as trustworthy and important by Google. Authority in this sense is an ill-defined term used by many different companies, but this is the general idea.

For example, if Wikipedia links to a sewing website, then the page it links to will get indexed by Google pretty quickly. This is because Wikipedia is (mostly) a trustworthy site. However, if a brand new site pops up and links to that same sewing webpage, then Google won’t know if it’s trustworthy or not. The link from this new website will not cause the webpage to get any extra love from Google.


Social Media Links

One more thing – social media links. They don’t count directly, but if you post on social media then people are more likely to see your links and use them in their own content. These are called second-order links.

This means that you should always fill out your profiles on social media with links to your website. You should also post regularly, and try to get people to repost your links. You should really do this anyway to get traffic from social media, but that’s another story.

Posting on your own social media channels is like shouting out of your window. It might not get people to listen to you or trust you (i.e. send your webpage surging up the SERPs), but it will probably get people to notice you (i.e. get Google to index your pages faster). This is especially true if you have a lot of followers.



The internet is like a picnic covered in ants – ants owned by Google. Those ants read webpages, and bring back tasty morsels of content to be indexed by Google, as well as adding any links they find to the master list of links. This master list of links is where Google sends more ants.

While Google is out looking for web pages using these ants, you can help it to find your specific web pages by doing three things:

  • Creating an XML Sitemap (preferably one that automatically updates).
  • Signing up for Google Search Console (and registering your sitemap there).
  • Generating (good) links to your web pages.


Next: Types of SEO