Entity extraction is a technique highly valued by SEO professionals because it helps to identify relevant keywords and phrases for a website.
By analyzing the entities within content, an SEO can identify semantically relevant keywords and phrases to help a piece of content rank higher in search results for its niche or industry.
These key entities help optimize content and improve search engine rankings because they make it easier for algorithms to understand the text.
However, analyzing entities is often time-consuming and costly when you lack the right tools, many of which are paid and still involve some manual work.
For this reason, we have created a Python script for Google Colab that uses artificial intelligence to extract entities.
In the following sections, we explain what it consists of and how to put it to work for you.
Explanatory notes before we start:
Entity extraction is the process of identifying and extracting specific information or entities from unstructured text data. This process is made more efficient and accurate through the use of artificial intelligence and natural language processing. Entity extraction is a valuable tool for SEO, as it helps to identify relevant keywords and phrases and improve search engine rankings.
What is an entity extractor?
An entity extractor is a natural language processing (NLP) tool for identifying and classifying entities in a text.
Entities can be people, places, things, organizations, and concepts.
This technology can identify and extract specific entities, such as names, addresses, dates, etc., from various sources, such as text documents, social networks and web pages.
In SEO, entity extraction is important because it helps search engines better understand the content of a web page and relate it to user queries.
What you will need to extract entities from a URL
To use our entity extractor, you only need three ingredients:
- An OpenAI API key, which you can get by registering on the OpenAI platform.
- The URL whose content you want to analyze to extract the entities.
- And, of course, our script, without which we would be lost.
The advantage of our script is that it lets you use AI models trained by OpenAI without training your own machine learning model or asking ChatGPT to generate a script to run them, such as this one:
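```python
import openai  # legacy OpenAI Python SDK (openai<1.0)

openai.api_key = "YOUR_API_KEY"

def extract_entities(text):
    # Historical example: openai.Completion was removed in openai>=1.0 and the
    # "davinci" engine has since been retired, so it needs the old SDK and a
    # currently available model to run.
    response = openai.Completion.create(
        engine="davinci",
        prompt=f"Extract entities from text: '{text}'",
        max_tokens=1024,
        n=1,
        stop=None,
        temperature=0.5,
    )
    # Return the text of the first (and only) completion.
    entities = response.choices[0].text.strip()
    return entities
```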
In addition, if you know some Python, you can make small changes to the script to analyze many URLs in bulk.
All of this can save you time and money.
How our script identifies entities
To extract entities, our script uses the natural language processing (NLP) capabilities of the OpenAI API, applying a prompt designed for this purpose by Álvaro Peña de Luna and adapted by Luis Fernández.
When you provide the URL and run the Colab notebook, it returns:
- The 10 entities with the highest salience and the type of each one.
- The salience score associated with each entity.
In other words, everything you need to improve the semantic prominence of the content at the URL provided.
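Because the model answers in free text, it helps to turn its output into structured records. The sketch below assumes the model returns one entity per line as `entity, type, salience score` with a dot as the decimal separator; `Entity` and `parse_entities` are hypothetical names, not part of the authors' notebook. Lines that do not match the format are skipped, and the rest are sorted by salience and trimmed to the top 10.

```python
import re
from dataclasses import dataclass

# Optional list marker at the start of a line: "-", "*", "1." or "1)".
_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+")

@dataclass
class Entity:
    name: str
    type: str
    salience: float

def parse_entities(raw: str, limit: int = 10) -> list[Entity]:
    """Parse 'entity, type, score' lines, skipping anything malformed."""
    entities = []
    for line in raw.splitlines():
        line = _MARKER.sub("", line).strip()
        # Split from the right so commas inside the entity name survive.
        parts = [p.strip() for p in line.rsplit(",", 2)]
        if len(parts) != 3 or not parts[0]:
            continue
        name, etype, score = parts
        try:
            salience = float(score)
        except ValueError:  # e.g. a header row such as "Entity, Type, Salience"
            continue
        if 0.0 <= salience <= 1.0:
            entities.append(Entity(name, etype, salience))
    entities.sort(key=lambda e: e.salience, reverse=True)
    return entities[:limit]
```

Returning an empty or short list is a simple signal that the request should be repeated, which matters when you process many URLs.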
What the script does:
In short, given a URL, the script extracts the entities in its content, classifies each one by type, and assigns it a salience score.
Moreover, with some modifications to the code, you can also run it in bulk, executing the same process for several URLs imported from a CSV file.
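For that CSV-based bulk mode, reading the list of URLs and writing the results back out only needs Python's standard `csv` module. This is a minimal sketch that assumes an input file with a `url` column and writes one row per entity; the column names and the `read_urls` and `write_results` functions are assumptions, not the notebook's code.

```python
import csv

def read_urls(lines, column: str = "url") -> list[str]:
    """Read URLs from CSV lines (an open file works), skipping blanks and duplicates."""
    urls, seen = [], set()
    for row in csv.DictReader(lines):
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

def write_results(rows, out_file) -> None:
    """Write dicts with url/entity/type/salience keys as CSV rows."""
    writer = csv.DictWriter(out_file, fieldnames=["url", "entity", "type", "salience"])
    writer.writeheader()
    writer.writerows(rows)

# Example usage (file names are placeholders):
# with open("urls.csv", newline="", encoding="utf-8") as f:
#     urls = read_urls(f)
# ... run the extractor on each URL and collect one dict per entity in `rows` ...
# with open("entities.csv", "w", newline="", encoding="utf-8") as f:
#     write_results(rows, f)
```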
In the script, we use libraries such as BeautifulSoup, Requests and Trafilatura to scrape the URLs.
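The scraping step boils down to getting the page's title, its H1 and its visible text. As a dependency-free illustration of that idea, the sketch below uses only Python's standard library (`urllib.request` and `html.parser`) instead of BeautifulSoup, Requests and Trafilatura. It is a simplified substitute, not the notebook's code, and Trafilatura's main-content extraction is considerably more robust than this.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class PageParser(HTMLParser):
    """Collects the <title>, the first <h1> and the visible text of a page."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self._chunks = []       # visible text fragments
        self._skip_depth = 0    # > 0 while inside <script>, <style>, ...
        self._in_title = False
        self._in_h1 = False
        self._h1_seen = False   # only the first <h1> is kept

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "h1" and not self._h1_seen:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "h1" and self._in_h1:
            self._in_h1, self._h1_seen = False, True

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data
            return
        if self._in_h1:
            self.h1 += data
        self._chunks.append(data)

    @property
    def text(self) -> str:
        # Join fragments and collapse all runs of whitespace to single spaces.
        return " ".join(" ".join(self._chunks).split())

def extract_page_parts(html: str) -> dict:
    """Return the title, first H1 and whitespace-normalized body text."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "h1": " ".join(parser.h1.split()), "text": parser.text}

def fetch_html(url: str, timeout: float = 20.0) -> str:
    """Download a page; the User-Agent string is an arbitrary placeholder."""
    req = Request(url, headers={"User-Agent": "entity-extractor-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be `extract_page_parts(fetch_html(url))`, which yields the three pieces the prompt needs.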
To run it, you have to install the dependencies and enter the OpenAI API key. Then, you enter a URL and get the entities with their type and score.
Depending on the load on the OpenAI servers, it may take some time to respond, so be patient, especially if you run it during US business hours.
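When the servers are overloaded, requests can fail as well as slow down. A common way to cope is to retry with exponential backoff and a little random jitter. The helper below is a generic sketch, not part of the notebook; `call_with_retries` is a hypothetical name, and `retry_on` should be narrowed to the exceptions your client library actually raises for rate limits, timeouts and server errors.

```python
import random
import time

def call_with_retries(fn, *args, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying failures with exponential backoff.

    With the default base_delay, the wait is 1 s, 2 s, 4 s, ... (capped at
    max_delay) plus up to 50% random jitter. The last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

For example, `call_with_retries(extract_entities, page_text)` would wrap the single-request function shown earlier. The `sleep` parameter exists mainly so the backoff can be tested without actually waiting.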
The main difference between our script and others you can find out there is that ours extracts the 10 entities with the highest salience score without you having to enter the page's title or text yourself.
It scrapes all this information automatically from the URL provided, so you don’t have to do anything else.
This makes it very useful for semantic SEO tasks.
Running the script in Google Colab
To run the entity extractor, you only need to:
- Install the necessary dependencies.
- Enter your OpenAI API key.
- Paste the URL where Colab asks for it.
- Press Enter.
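Rather than pasting the key into a code cell, a notebook can read it from an environment variable or prompt for it without echoing it on screen. This sketch assumes the `OPENAI_API_KEY` variable (the one the official OpenAI Python library reads by default) and uses `getpass`; `get_api_key` and `ask_for_url` are hypothetical names and may not match the notebook's cells.

```python
import os
from getpass import getpass
from urllib.parse import urlparse

def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the key from the environment, or prompt for it without echoing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = getpass("OpenAI API key: ").strip()
    if not key:
        raise ValueError("An OpenAI API key is required")
    return key

def ask_for_url(read=input) -> str:
    """Ask for the URL to analyze and reject anything that is not http(s)."""
    url = read("URL to analyze: ").strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    return url
```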
It’s as simple as that.
If you have any doubts about how to do it, our colleague Luis has prepared a short video that you can watch right here:
This transcript was generated automatically from the audio voiceover, so it may contain errors.
(00:00) Hello and welcome to a new iSocialWeb video. I'm Luis Fernández, and today we continue our series of videos on artificial intelligence applied to SEO. This one is going to be a very quick video and I won't go into detail; for that, you have our previous videos, where we explain step by step how to program everything and how each line works. This time I want you to take away a very quick, very simple script that you can apply to your projects, or tweak and get ideas from. This
(00:28) script basically extracts the entities and gives us a salience score and the entity type from a URL. As always, the beauty of it is that you can apply it in bulk: you can adapt the code so that, instead of a single URL, it runs the same process for lots of URLs, using a CSV as the input or something similar, but for that you will have to modify the code. I'll go straight to the example and briefly explain what we're going to use: in this case, OpenAI and various libraries to scrape the
(01:02) URLs we pass to it. You have to run this line of code, this cell, to install all the dependencies, and then enter your OpenAI API key here. You hit Play, and it asks you for an input, a URL; here, for example, is one from La Vanguardia, and here it would extract the different entities. We would have the format entity, organization, sorry, entity, type and salience score. Here I have an example, a demo, so you can see it live. In this case we can pass it this URL. Let's run it:
(01:44) we pass the URL in the input and wait a bit for it to make the call, and you'll see what it returns. Depending on how loaded the OpenAI servers are, this can take a while; for example, these last few days they've been quite overloaded and it has even gone down. But you can see that here we have the response: in this case it extracts Google Colab, type organization, with a salience score of 0.84; artificial intelligence, which is a technology, with 0.69; and so on. It even extracts Trustpilot, which is a company, with 0.25.
(02:21) The prompt we used is the following. We ask it to ignore previous instructions. To be honest, this works a little worse every time; not worse, exactly, but you have to use more complex instructions to get it to respect them and to avoid errors, such as it telling you that, as a language model, it can't carry out this instruction. I recommend reviewing it and optimizing it as best you can; this very basic one works for these cases and serves as an example. Next we get to the prompt itself. In this case,
(02:52) we ask it to extract the 10 entities with the highest salience score, knowing that this is the title, and we pass it the title; this is the H1, and we pass it from the scraping we do beforehand; and the text we give it down here, and I'll explain why. This is the text, and here we pass it in. Then we ask for what we want it to give us: we give it the input, and now the output returns, in addition to the entity in Spanish, the type of entity and the salience score, in the format entity, type, salience score. It's
(03:21) very important to always give it the format you want, to avoid each result coming back in a different format or even skipping one of the fields because it didn't quite get it. I recommend always asking for a format. We put the text at the end in case it is too long and the tokens get cut off: if we put it directly after the H1 and the title, we might lose the final instruction to return a result, whereas if we put it at the end, it will simply cut off whatever is left over, and that's
(03:52) it. As you can see, here you have the result, and that's all. Honestly, it's a super useful script for any semantic SEO task: you can use it for internal linking, to detect new content to expand with, and a lot more; the limit is your imagination. As a final tip, if you want to do this in bulk, it's always a good idea to add a final parser, another Python function that checks that the data has the correct format, saves it in a DataFrame-type structure and then puts it into a CSV or a JSON, whichever
(04:25) you prefer, and then repeats the request for all the ones that didn't return this format. With that, you can run it in bulk and keep expanding, making this whole task much easier. That's all. If you have any questions, you know we're around. See you in the next video. Cheers!
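The prompt structure described in the video can be sketched as follows: the instructions and the required output format come first, then the title and H1, and the page text goes last, so that if the page is too long, only the text gets cut. The wording below is a hypothetical paraphrase, not the prompt used by Álvaro Peña de Luna and Luis Fernández, and trimming by characters is only a rough stand-in for counting tokens.

```python
def build_prompt(title: str, h1: str, text: str, max_text_chars: int = 12_000) -> str:
    """Assemble an entity-extraction prompt with the page text placed last."""
    instructions = (
        "Extract the 10 entities with the highest salience score from the web page below.\n"
        "Return exactly one entity per line using the format: entity, type, salience score\n"
        "The salience score must be a number between 0 and 1. Return nothing else.\n"
    )
    # Only the page text is truncated, so the instructions above always survive.
    trimmed = text[:max_text_chars]
    return f"{instructions}\nTitle: {title}\nH1: {h1}\nText: {trimmed}"
```

Asking for a fixed, line-based format also makes the answer easy to validate afterwards, as the video recommends.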
In the YouTube video, Luis walks through each step.
Note that if you know Python, you can adapt the code to parse several URLs at the same time, which is very useful when you have many URLs to analyze.
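One way to analyze several URLs at the same time is a small thread pool, since most of the time is spent waiting on the network and the API. In this sketch, `analyze` stands for whatever per-URL function you build (for example: scrape the page, call the API and parse the answer); it and `analyze_many` are hypothetical names, and `max_workers` should stay low to respect your API rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def analyze_many(urls, analyze, max_workers=4):
    """Run analyze(url) concurrently; return ({url: result}, {url: exception})."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going; failed URLs can be retried later
                errors[url] = exc
    return results, errors
```

Collecting failures separately makes it easy to repeat the request only for the URLs that did not return a usable answer.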
You can download the Google Colab notebook from the link above.
How does entity extraction work with artificial intelligence?
Our entity extractor relies on machine learning models developed by OpenAI to identify and extract entities from text.
In general terms, entity extraction consists of several steps:
- Preprocessing of the text data to remove noise and irrelevant information.
- Tokenization of the text into individual words or phrases.
- Identification of the part-of-speech (POS) tags of each token.
- Use of machine learning algorithms to classify each token as an entity or not.
- Grouping the entities according to their type and context.
At the end, you get the top 10 entities in the text at the URL provided, each with its salience score, as you can see in the image above.
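To make the general steps listed in this section concrete, here is a deliberately tiny, rule-based illustration. It is not how OpenAI's models work internally: the hypothetical `GAZETTEER` dictionary stands in for both part-of-speech tagging and a trained classifier, so it only recognizes entities it already knows.

```python
import re
from collections import defaultdict

# Hypothetical lookup table: lowercase surface form -> (canonical name, type).
GAZETTEER = {
    "openai": ("OpenAI", "Organization"),
    "google colab": ("Google Colab", "Product"),
    "python": ("Python", "Programming language"),
}

def preprocess(raw: str) -> str:
    """Step 1: remove noise (HTML tags) and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", raw)).strip()

def tokenize(text: str) -> list[str]:
    """Step 2: split the text into word tokens."""
    return re.findall(r"\w+", text)

def classify(tokens: list[str], max_len: int = 3) -> list[tuple[str, str]]:
    """Steps 3-4 (collapsed): greedy longest-match lookup of 1..max_len tokens."""
    lowered = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            match = GAZETTEER.get(" ".join(lowered[i:i + n]))
            if match:
                found.append(match)
                i += n
                break
        else:  # no known entity starts at this token
            i += 1
    return found

def group(entities: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Step 5: group entities by type and count their mentions."""
    grouped = defaultdict(lambda: defaultdict(int))
    for name, etype in entities:
        grouped[etype][name] += 1
    return {etype: dict(names) for etype, names in grouped.items()}
```

Running `group(classify(tokenize(preprocess(html))))` on a page returns, per type, how often each known entity is mentioned.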
Benefits of the AI Entity Extractor:
The AI entity extractor offers several benefits for those of us in SEO, including:
- Improved accuracy of structured data: AI Entity Extractor can accurately identify and extract entities from unstructured data, reducing the risk of errors and improving data accuracy.
- Improved efficiency: This tool can extract entities in a very short time, eliminating the effort required for manual data extraction.
- Customization: The AI entity extractor can be customized to extract domain-specific entities, making it ideal for companies dealing with industry-specific terminology.
- Scalability: Because it relies on the OpenAI API, the script can handle large volumes of requests, making it ideal for SEOs who handle large numbers of URLs.
SEO use cases for an entity extractor
An AI entity extraction tool is designed to analyze and identify specific entities, such as people, places and things, mentioned in a text.
Our script can be used in a number of ways to improve search engine optimization (SEO); here are just a few ideas:
1. Keyword research
2. Content optimization
3. Competitive analysis
4. Semantic search
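For competitive analysis, for instance, you could run the extractor on your page and on a competitor's page and compare the two entity lists. The helper below is a hypothetical sketch that takes each list as a `{entity: salience}` dictionary and returns the competitor's entities missing from your page (most salient first) plus the ones both pages share, matching names case-insensitively.

```python
def entity_gap(own: dict[str, float], competitor: dict[str, float]) -> tuple[list[str], list[str]]:
    """Compare two {entity: salience} maps, ignoring case in entity names."""
    own_keys = {name.casefold(): name for name in own}
    comp_keys = {name.casefold(): name for name in competitor}
    # Entities the competitor covers that your page does not, most salient first.
    missing = sorted(
        (comp_keys[k] for k in comp_keys.keys() - own_keys.keys()),
        key=lambda name: competitor[name],
        reverse=True,
    )
    # Entities both pages cover, listed with your page's spelling.
    shared = sorted(own_keys[k] for k in own_keys.keys() & comp_keys.keys())
    return missing, shared
```

The missing list is a natural starting point for content optimization: entities your competitor treats as salient that your page never mentions.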
In conclusion
AI entity extraction is a powerful tool for SEO professionals.
By leveraging the latest AI technologies, an SEO can quickly and accurately identify the key terms and phrases associated with a piece of content and incorporate them to increase its relevance by enriching its semantic context.
As mentioned, this makes it possible to optimize content with related terms, understand which terms competitors use to rank their content in search results, and even make the algorithms' job easier by better contextualizing a piece of content.
In addition, incorporating entity extraction into your SEO strategy can also help you stay ahead of the competition, making it easier to update your content as needed.
Ultimately, an AI entity extraction tool can be a valuable asset for businesses looking to improve their SEO efforts.
Extract the entities of your website with the help of artificial intelligence!
Co-CEO and Head of SEO at iSocialWeb, an agency specializing in SEO, SEM and CRO that manages more than 350M organic visits per year and operates with a 100% decentralized infrastructure.
Also at Virality Media, a company with its own projects totaling more than 150 million monthly active visits across different sectors and industries.
Systems Engineer by training and SEO by vocation. Tireless learner, fan of AI and dreamer of prompts.