For SEO experts, having a basic understanding of how search engines operate is essential. A review of crawling, rendering, indexing, and ranking is provided below.
What distinguishes crawling, rendering, indexing, and ranking from each other?
Lily Ray recently revealed that when interviewing candidates for the Amsive Digital SEO team, she asks them this question. Danny Sullivan from Google thinks it’s a great one.
As fundamental as it might seem, it is easy for some practitioners to mix up the primary phases of search and conflate the process as a whole.
In this post, we’ll review the operation of search engines and go through each step in detail.
Why It’s Important To Understand The Differences
In a recent trademark infringement case, where I served as an expert witness, the opposing witness misidentified the search stages.
Two tiny businesses asserted their respective rights to use identical brand names.
The opposing party’s “expert” incorrectly concluded that my client engaged in unethical or hostile SEO to outrank the plaintiff’s website.
Additionally, he made several serious errors when characterising Google’s procedures in his expert report, where he stated:
- That web crawling was the same thing as indexing.
- That search bots instruct the search engine how to rank pages in search results.
- That search bots can be “taught” to index pages for particular keywords.
A key tactic in litigation is to try to exclude the conclusions of a testifying expert, which is possible if one can persuade the court that the expert is unqualified to be believed.
I cited their expert’s inaccurate explanations of Google’s processes as evidence that he lacked the credentials to testify on SEO-related matters.
Although it may sound harsh, this untrained expert presented himself to the court as an authority while making numerous simple and obvious blunders. He misrepresented my client’s use of SEO as an unfair trade practice while ignoring the plaintiff’s questionable actions (the plaintiff was blatantly using black hat SEO, whereas my client was not).
The opposing expert in my legal case is far from the only one who misunderstands the various search phases used by the top search engines.
Numerous well-known search marketers have also muddled the steps of search engine procedures, which has led to false diagnoses of poor performance in the SERPs.
Some have said, “I fear Google has penalized us, so we can’t appear in search results!” when, in reality, they had overlooked a crucial setting on their web servers that made their website’s content inaccessible to Google.
They assumed an automated penalty, which would be part of the ranking stage, when their websites actually had crawling and rendering issues that made indexing and ranking difficult.
When there are no manual action notifications in Google Search Console, prioritise the typical problems in each of the four stages that determine how search functions.
It Goes Beyond Semantics
Ray and Sullivan’s focus on the significance of comprehending the distinctions between crawling, rendering, indexing, and ranking wasn’t shared by everyone.
I observed some practitioners dismiss these issues as trivial semantics or pointless “gatekeeping” by elitist SEOs.
Some seasoned SEO professionals do muddle the meanings of these terms. This happens in every field: people well-versed in a subject lean on jargon while sharing an understanding of what they mean, and there is nothing fundamentally wrong with that.
We also frequently anthropomorphize search engines and their workings, since explaining concepts in terms of recognizable traits makes them simpler to understand. There is nothing wrong with that, either.
However, this looseness when discussing technical procedures can be confusing and makes it harder for people trying to learn SEO.
Using the terms informally and imprecisely is fine only in a limited sense, or as conversational shorthand. Even so, it is always best to know and understand the precise definitions of the stages of search engine technology.
4 Stages Of Search
A variety of distinct procedures bring content from the web into your search results. Describing it as a handful of clear-cut steps is sometimes a vast oversimplification.
Numerous subprocesses can occur within each of the four stages I discuss in this article.
Even further than that, essential processes can occur asynchronously with these, including:
- Various kinds of spam enforcement.
- Adding elements to the Knowledge Graph and updating data in knowledge panels.
- Optical character recognition in image processing.
- Processing of audio and video files into text.
- Assessing and applying PageSpeed data.
- Plus more.
Stage 1: Crawling
Crawling is the process by which a search engine requests web pages from websites’ servers.
Imagine that Google and Microsoft Bing are both seated at a computer, entering text into their browser windows or clicking website links.
As a result, search engine computers browse websites much as you do. Each time the search engine visits a web page, it collects a copy of that page and notes all the links found on it. It then visits the next link on its list of links not yet visited.
This is referred to as “crawling” or “spidering,” which is appropriate given that the web is metaphorically an enormous, virtual web of interconnected links.
“Spiders,” “bots,” or “crawlers” are the terms used to describe the data collection tools employed by search engines.
While Microsoft Bing has “Bingbot,” Google’s primary crawling tool is called “Googlebot.” Each includes other specialised bots for accessing mobile pages, advertisements (like GoogleAdsBot and AdIdxBot), and more.
Although this step in the search engines’ analysis of web pages appears simple, there is a great deal of intricacy in what happens.
Consider the sheer number of web server systems out there, running different operating systems and versions, along with varying content management systems (such as WordPress, Wix, and Squarespace), plus each website’s unique customizations.
Crawlers from search engines can encounter several problems that prevent them from indexing pages, which is why it’s essential to understand the specifics of this step.
Before the search engine can request and visit a page, it must first discover a link to it somewhere. (Under certain configurations, search engines have been known to assume there might be other, undiscovered links, such as behind restricted internal site search forms or one level up in the link hierarchy at a subdirectory level.)
The following methods can help search engines discover links to web pages:
- The website owner submitting the link directly to the search engine or making a sitemap available.
- Other websites linking to the page.
- Links to the page from within the website’s own pages, provided some of those pages are already indexed.
- Posts on social media.
- Links found in documents.
- URLs that appear in text but are not hyperlinked.
- Information contained in various file types.
- Plus more.
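An XML sitemap, mentioned in the first item above, is simply a machine-readable list of a site’s URLs. A minimal sketch might look like this (the example.com URLs and dates are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal XML sitemap; URLs below are invented for illustration -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2022-10-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/about/</loc>
  </url>
</urlset>
```

The sitemap is typically placed at the root of the site and can be submitted through the search engines’ webmaster tools or referenced from robots.txt.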
Sometimes a website will use its robots.txt file, located at the root of its domain on the web server, to tell search engines not to crawl one or more of its pages.
Blocking a page or section of a website from crawling does not necessarily prevent those pages from appearing in search results, but preventing them from being crawled in this way can be very detrimental to their potential to rank well for their keywords.
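As a sketch of what such a file might contain (the paths and domain are hypothetical):

```text
# Example robots.txt, served from https://example.com/robots.txt
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /internal-search/

# Extra rule applied only to Google's crawler
User-agent: Googlebot
Disallow: /staging/

# Optional pointer to the site's sitemap
Sitemap: https://example.com/sitemap.xml
```

Note that robots.txt controls crawling, not indexing: a disallowed URL can still appear in results if other sites link to it.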
In other cases, search engines may have trouble crawling a website if the website automatically blocks the bots. This can happen when the website’s systems detect that:
- The bot is requesting more pages in a shorter amount of time than a human could.
- The bot simultaneously queries several pages.
- The server IP address of a bot is geolocated in a region that the website is set up to block.
- The server is overloaded with requests from the bot and other users, which slows down or fails the serving of pages.
However, search engine bots are built to automatically adjust the delay between requests when they detect that the server is struggling to keep up with demand.
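As a rough sketch of that idea (not any search engine’s actual implementation), a polite crawler might lengthen its delay when the server responds slowly or returns errors:

```python
import time
import urllib.request

def polite_fetch(urls, base_delay=1.0):
    """Fetch URLs in order, backing off when the server struggles."""
    delay = base_delay
    pages = {}
    for url in urls:
        try:
            start = time.monotonic()
            with urllib.request.urlopen(url, timeout=10) as resp:
                pages[url] = resp.read()
            elapsed = time.monotonic() - start
            # A slow response suggests server strain: double the delay.
            delay = base_delay * 2 if elapsed > 2.0 else base_delay
        except Exception:
            delay *= 2  # errors: back off harder
        time.sleep(delay)  # pause between requests
    return pages
```

Real crawlers layer far more on top of this (per-host queues, robots.txt checks, retry policies), but the core politeness mechanism is the same: throttle the request rate based on how the server is coping.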
“Crawl budget” may determine whether search bots will finish crawling a large website, or a website that frequently changes the material on its pages.
In essence, the internet is a space with an endless number of websites, each updated at a different rate. Because search engines do not have time to visit every page online, they prioritise the pages they will crawl.
If a website has many pages, or is slow to respond, its crawl budget may be exhausted before all of its pages are crawled, especially if it carries less ranking weight than other websites.
Just as with the webpage itself, if the supporting resources used to build the page are unavailable to the search engine, its interpretation of the page may be affected.
Stage 2: Rendering
The search engines treat rendering as a subprocess within the crawling stage. But because fetching a webpage and then parsing its content to determine how it would appear in a browser are two distinct operations, I treat rendering here as a separate stage.
Google renders pages with the same rendering engine used by the Chrome web browser, based on the free and open-source Chromium browser project.
Google keeps compressed copies of rendered pages in its repository, and Microsoft Bing presumably does the same (though I have not found documentation confirming this). Some search engines may keep a condensed version of webpages containing only the visible text, with none of the formatting.
I have also seen cases where infinitely scrolling category pages on ecommerce websites performed poorly in search engines because the search engine could not see many of the links to the products.
Pages that require cookies will often not be indexed by Googlebot or Bingbot. Cookies may also prevent pages from rendering fully or correctly when essential components are delivered conditionally.
Stage 3: Indexing
After a page has been crawled and rendered, the search engines further evaluate it to decide whether to store it in the index and to determine its topic.
The search engine index serves the same purpose as the index of terms found at the back of a book.
The index of a book contains a list of all the key terms and subjects covered in the book, arranged alphabetically by word, along with a page number list.
A search engine’s index contains many keywords and keyword combinations, each associated with a list of all the web pages where the keyword was found.
Conceptually, the index resembles a database lookup table, which may well have been the original structure of search engine indexes. To look up a keyword and retrieve all its associated URLs, the big search engines now probably use something a few generations more advanced.
Searching every web page for a keyword in real time, each time someone searched for it, would take an unworkably long time. Looking up all pages associated with a keyword in a precomputed index is a major time-saving architecture.
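To make the lookup-table idea concrete, here is a minimal inverted index sketched in Python (the pages and URLs are invented for the example):

```python
# Minimal inverted index: map each word to the set of pages containing it.
from collections import defaultdict

pages = {
    "https://example.com/coffee-guide": "how to brew great coffee at home",
    "https://example.com/tea-guide": "how to brew great tea at home",
    "https://example.com/espresso": "espresso is a concentrated coffee drink",
}

index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

# Lookup is now a single dictionary access instead of scanning every page.
print(sorted(index["coffee"]))
# → ['https://example.com/coffee-guide', 'https://example.com/espresso']
```

A production index adds far more (tokenization, stemming, positions, compression, sharding), but the core trade-off is the same: do the expensive scanning once at indexing time so each query is a fast lookup.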
For various reasons, only some crawled pages will remain in the search index. For instance, a robot meta tag with a “noindex” directive informs the search engine to exclude the page from its index.
Similarly, a website might have an X-Robots-Tag in its HTTP header that tells search engines not to index the page.
In other cases, the canonical tag on a webpage may inform a search engine that a different page from the one currently displayed is to be regarded as the main version of the page, causing other non-canonical copies of the page to be removed from the index.
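As a sketch of what these directives look like in practice (the example.com URL is hypothetical):

```html
<!-- Robots meta tag in the page's <head>: exclude this page from the index -->
<meta name="robots" content="noindex">

<!-- Canonical tag: declare a different URL as the main version of this page -->
<link rel="canonical" href="https://example.com/main-version/">
```

The same noindex directive can also be sent as an HTTP response header, `X-Robots-Tag: noindex`, which is useful for non-HTML files such as PDFs where a meta tag cannot be embedded.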
Additionally, Google has said that if a webpage is of poor quality, it might not be kept in the index (duplicate content pages, thin content pages, and pages containing all or too much irrelevant content).
History also suggests that larger websites with insufficient external links pointing at them may not be crawled thoroughly. This means that websites with low collective PageRank may have only some of their web pages indexed.
A website’s pages may be fully indexed only if its crawl budget is sufficient.
Diagnosing and correcting page indexing issues is a crucial aspect of SEO. As a result, it is wise to carefully research all the different problems that may hinder the indexing of web pages.
Stage 4: Ranking
The ranking of web pages is the search engine processing phase that likely receives the most attention.
Once a search engine has compiled a list of all the web pages associated with a particular keyword, it must decide how to order those pages in response to a search for that keyword.
If you operate in the SEO sector, you already understand several aspects of the ranking procedure. The term “algorithm” is also used to describe the search engine’s ranking system.
The ranking stage of search is so complicated that it deserves entire articles and books to explain it fully.
A wide range of factors can influence the ranking of a website in search results. According to Google, their algorithm considers more than 200 ranking parameters.
Several of those factors can involve up to 50 “vectors,” variables that can influence how a single ranking signal affects rankings.
PageRank, created in 1996, was the first iteration of Google’s ranking algorithm. It was based on the idea that one could count the links to a page, weight them by the relative importance of the linking sources, and thereby estimate the page’s ranking strength relative to all other pages.
This can be explained metaphorically by saying that links function like votes, and pages with more votes will rank higher than those with fewer links or votes.
In 2022, a significant portion of the old PageRank algorithm’s DNA is still present in Google’s ranking algorithm. That link analysis algorithm also influenced numerous other search engines, which developed similar techniques.
The original Google algorithm needed to recursively process all of the web’s links, passing the PageRank score between pages dozens of times, before the ranking process was complete. This series of repeated calculations across millions of pages could take about a month to finish.
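The recursive calculation described above can be sketched in a few lines of Python. This is a textbook-style simplification, not Google’s actual implementation, and the three-page link graph is invented for illustration:

```python
# Simplified PageRank: repeatedly pass score across links until it stabilizes.
links = {
    "A": ["B", "C"],  # page A links to B and C
    "B": ["C"],
    "C": ["A"],
}
damping = 0.85  # probability of following a link vs. jumping at random
ranks = {page: 1.0 / len(links) for page in links}  # start with equal scores

for _ in range(50):  # iterate until the scores settle
    new_ranks = {}
    for page in links:
        # Each linking page passes its score, split among its outbound links.
        incoming = sum(
            ranks[other] / len(outs)
            for other, outs in links.items()
            if page in outs
        )
        new_ranks[page] = (1 - damping) / len(links) + damping * incoming
    ranks = new_ranks

print({page: round(score, 3) for page, score in ranks.items()})
```

Pages with more (and better-scored) incoming links end up with higher scores, which is the “links as votes” metaphor made literal. The scores always sum to 1, so PageRank is a share of total importance rather than an absolute number.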
Today, new page links are introduced every day, and Google calculates rankings in a drip-like, continuous fashion, which allows pages and changes to be factored in much more quickly without requiring a month-long link calculation process.
Links are also evaluated in a sophisticated way, with bought links, exchanged links, spam links, links that are not editorially supported, and more having their ranking power revoked or reduced.
Broad categories of criteria other than links also have an impact on rankings, including:
- E-A-T, or expertise, authoritativeness, and trustworthiness.
- Quality.
- Location/proximity.
- History of personal searches.
- The “HTTPS” URL prefix, distinguishing webpages delivered securely (using SSL) from those that are not.
- Page speed.
- plus more.
To succeed in the SEO industry, one must have a solid understanding of the essential stages of search.
Some social media personalities believed it was “going too far” or “gatekeeping” to reject a candidate solely for being unfamiliar with the distinctions between crawling, rendering, indexing, and ranking.
Knowing the differences between these procedures is a good idea, and I wouldn’t consider having a hazy grasp of such things to be a deal-breaker.
SEO experts have a range of educational backgrounds and degrees of expertise. What matters is that they are teachable enough to pick up knowledge and comprehend it at a basic level.
If you have any enquiries about the topics discussed above, DMA Marketing can assist you in working toward mastering them.