How Do Search Engines Work? From Lexical to Semantic Retrieval.
- Mohamed Diab
- May 26
- 3 min read
The question is simple, right? I'm sure you already know how search engines approach ranking and how SEOs approach optimization.
SEO has shifted again, and the shift is huge.
Do you remember doing your keyword research, or asking your agency to do it and target the main keywords? Everything revolved around keywords.
That was the era of lexical information retrieval, when we optimized for a target keyword. Back then, search engines seemed simple and clear to many SEOs: crawling, indexing, and ranking.
My mate, this simple picture has become more complicated than ever. Why? Because search engines now implement machine learning and NLP (Natural Language Processing).
To summarize the process:
- Trawler: the system that discovers new documents on the web.
- The store server decides whether a URL is forwarded or remains in the sandbox.
- Alexandria: Google's indexing system, which assigns each URL a unique ID (DocID).
- Individual keyword phrases are integrated into the word index.
- Each DocID is assigned an IR (information retrieval) score.
- QBST (Query-Based Salient Terms): the first step for the search query, working with DeepRank (which embeds BERT) and RankBrain.
- Mustang: filters the results down to the first 1,000.
- Superroot: re-filters those 1,000 results down to 10 (Blue Ring, or the blue links) using Twiddler and Navboost.
- GWS (Google Web Server): assembles your SERP, including all other features, ads, and images.
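The staged narrowing in the list above (a word index, an IR score per DocID, a cut to about 1,000 candidates, then a re-ranking down to 10) can be illustrated with a toy retrieval cascade. This is a hypothetical Python sketch, not Google's code: the corpus, the naive term-count IR score, and the re-ranking rule are all assumptions chosen for readability, and the real Mustang and Superroot stages rely on far richer signals.

```python
from collections import Counter

# Hypothetical toy corpus: DocID -> page text (not real data).
CORPUS = {
    1: "crm for enterprises with sales pipeline tools",
    2: "free crm for small business",
    3: "enterprise customer relationship management platform",
    4: "how to bake sourdough bread",
}

def build_word_index(corpus):
    """Inverted index: term -> set of DocIDs containing it (a stand-in for a word index)."""
    index = {}
    for doc_id, text in corpus.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

def ir_score(query_terms, text):
    """Naive IR score: total occurrences of the query terms in the document."""
    counts = Counter(text.split())
    return sum(counts[t] for t in query_terms)

def rerank_score(query_terms, text):
    """Hypothetical second-stage score: favor full query coverage, then shorter documents."""
    terms = text.split()
    matched = sum(1 for t in query_terms if t in terms)
    return matched / len(query_terms) - 0.01 * len(terms)

def search(query, corpus, index, first_k=1000, final_k=10):
    query_terms = query.lower().split()
    # Stage 1: pull candidates from the inverted index, keep the top first_k by IR score.
    candidates = set().union(*(index.get(t, set()) for t in query_terms))
    stage1 = sorted(candidates, key=lambda d: ir_score(query_terms, corpus[d]), reverse=True)[:first_k]
    # Stage 2: re-rank the survivors and keep the top final_k.
    return sorted(stage1, key=lambda d: rerank_score(query_terms, corpus[d]), reverse=True)[:final_k]

index = build_word_index(CORPUS)
print(search("CRM for enterprises", CORPUS, index))  # [1, 2]
```

Note that DocID 3 ("enterprise customer relationship management platform") is never retrieved: it is on topic, but it shares no exact word with the query, so a purely lexical first stage cannot see it.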

Keyword Research in Lexical Information Retrieval
Say you recently received an email from a client asking you to optimize for the keyword "CRM for enterprises". You would do your best to make it easy for the search engine to discover your indexed document and match it with the search query (exact match or exact phrase).
You would insert the main keyword in the first paragraph, the title tag, and the meta description. With that optimization, your document meets the search query because the engine finds your target keyword in your content.
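As a minimal sketch of that classic on-page checklist, the snippet below tests whether the exact keyword phrase appears in the title tag, the meta description, and the first paragraph. The function name `keyword_placements` and the sample page text are hypothetical, and it assumes paragraphs are separated by blank lines.

```python
import re

def keyword_placements(keyword, title, meta_description, body):
    """Report whether the exact keyword phrase appears in each classic on-page slot."""
    # Case-insensitive exact-phrase match, anchored on word boundaries.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # Assumption: paragraphs in `body` are separated by blank lines.
    paragraphs = [p for p in body.split("\n\n") if p.strip()]
    first_paragraph = paragraphs[0] if paragraphs else ""
    return {
        "title": bool(pattern.search(title)),
        "meta_description": bool(pattern.search(meta_description)),
        "first_paragraph": bool(pattern.search(first_paragraph)),
    }

report = keyword_placements(
    "CRM for enterprises",
    title="The Best CRM for Enterprises in One Platform",
    meta_description="Compare enterprise customer relationship management tools.",
    body="Choosing a CRM for enterprises starts with scale.\n\nSecond paragraph.",
)
print(report)  # {'title': True, 'meta_description': False, 'first_paragraph': True}
```

The meta description is clearly about the same topic, yet the exact-phrase check reports `False` for it, because it only compares strings.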
"Lexical Information Retrieval can't understand the meaning or the context of the content."
Lexical IR is the basic information retrieval model search engines used before semantic (dense) representations. It matches the exact words of the query against the exact words of the document (a sparse representation), and search engines moved beyond purely lexical IR a long time ago.
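To make that limitation concrete, here is a minimal sketch of a sparse (bag-of-words) representation, assuming whitespace tokenization and raw term counts. Real lexical scoring (for example BM25) is more refined, but it shares the same blind spot. The function names and example texts are hypothetical.

```python
from collections import Counter

def sparse_vector(text):
    # Sparse lexical representation: only words that occur in the text get a weight (their count).
    return Counter(text.lower().split())

def lexical_overlap(query, document):
    # Counter & Counter keeps the minimum count of each shared word; unshared words contribute nothing.
    shared = sparse_vector(query) & sparse_vector(document)
    return sum(shared.values())

QUERY = "CRM for enterprises"
EXACT = "CRM for enterprises and sales teams"
SYNONYM = "customer relationship management software for large organizations"

print(lexical_overlap(QUERY, EXACT))    # 3
print(lexical_overlap(QUERY, SYNONYM))  # 1 (only "for" is shared)
```

The second document means almost the same thing as the query, but the only word it shares with it is "for", so a lexical model sees it as a weak match.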
Many SEO tools are still years behind semantic IR (the Vector Space Model).

Vector Space Model
The vector space model (VSM) is a mathematical model that represents text in a multidimensional vector space, where each document (page, URL, file) is represented as a vector.
Documents and queries are both represented as vectors in this space, so how does a search engine decide whether a document matches a query?
It ranks documents for a query by the cosine similarity between the query vector and each document vector: the higher the similarity, the more relevant the document.
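A minimal sketch of that matching step, assuming the classic sparse form of the VSM with TF-IDF weights (a smoothed IDF variant chosen here for illustration) and whitespace tokenization. Semantic retrieval replaces these sparse vectors with dense embeddings, but the cosine comparison works the same way. The function names and documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build a TF-IDF vector (dict term -> weight) for each text over a shared vocabulary."""
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: rarer terms get larger weights; every term stays positive.
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in df}
    return [{term: count * idf[term] for term, count in Counter(tokens).items()} for tokens in tokenized]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, documents):
    """Return document indices sorted by cosine similarity to the query, plus the raw scores."""
    # The query is vectorized together with the documents so all vectors share one vocabulary.
    vectors = tfidf_vectors([query] + documents)
    q_vec, doc_vecs = vectors[0], vectors[1:]
    scores = [cosine(q_vec, d) for d in doc_vecs]
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return order, scores

DOCS = [
    "crm for enterprises and large sales teams",
    "free crm for freelancers",
    "how to bake sourdough bread",
]
order, scores = rank("CRM for enterprises", DOCS)
print(order)  # [0, 1, 2]
```

The bread page shares no terms with the query, so its cosine similarity is exactly 0; the other two pages are ordered by how closely their weighted term vectors point in the query's direction.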

Cosine similarity is one of the most popular similarity measures used in search engines.
By calculating the cosine of the angle between two vectors, a search engine can estimate how similar a document is to a query: a value close to 1 means the two vectors point in nearly the same direction.
In semantic retrieval, the vectors in this space are embeddings, so texts with similar meanings in similar contexts end up close to each other.
cos(θ) = (x . y) / (||x|| ||y||)
Where:
(x . y) is the dot product of the vectors x and y.
||x|| is the magnitude (length) of vector x.
||y|| is the magnitude (length) of vector y.
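Putting the definitions above together, cosine similarity is the dot product of x and y divided by the product of their magnitudes. A minimal sketch for two dense vectors of equal length, with a small worked example (the vectors are arbitrary illustration values, not real embeddings):

```python
import math

def cosine_similarity(x, y):
    """(x . y) / (||x|| ||y||) for two equal-length dense vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))       # x . y
    norm_x = math.sqrt(sum(a * a for a in x))    # ||x||
    norm_y = math.sqrt(sum(b * b for b in y))    # ||y||
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

x = [1.0, 2.0, 2.0]
y = [2.0, 1.0, 2.0]
sim = cosine_similarity(x, y)          # 8 / (3 * 3)
angle = math.degrees(math.acos(sim))   # angle between the two vectors
print(round(sim, 4), round(angle, 1))  # 0.8889 27.3
```

Here x . y = 2 + 2 + 4 = 8 and both magnitudes are 3, so the similarity is 8/9, which corresponds to an angle of roughly 27 degrees. Orthogonal vectors score 0, and vectors pointing the same way score 1 regardless of their length.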
Search engines took a long road from lexical (sparse) to semantic (dense) retrieval, and SEO has shifted with them.
Google's architecture was transformed in several steps, from the classical web to the Transformer.
How do SEOs implement the Vector Space Model in optimization?
I can't put it more plainly: the keyword research perspective still followed by many SEO managers and CMOs is dead. I will leave embeddings, which are the core of our SEO alongside chunk embeddings and structured data, for another session.
Using Python in SEO is crucial for analyzing content the way search engines do, and it gives you insight into how Google treats your content. There will be a lot more to talk about here soon.
My advice to you as an SEO: advocate for topic targeting rather than keyword targeting. Just start with this simple tip.