How to extract entities from a URL with OpenAI

Table of contents

Entity extraction is a technique highly valued by SEO professionals because it helps to identify relevant keywords and phrases for a website. 

By analyzing the entities within content, an SEO can identify semantically relevant keywords and phrases to help a piece of content rank higher in search results for its niche or industry.

These key entities help to optimize content and improve search engine rankings.

Because they make it easier for algorithms to understand the text.

However, the process of analyzing entities is often time-consuming and costly when you lack access to the right tools.

Many of them are paid and to some extent also follow a manual process.

For this reason, we have created a Python-based script to be used in Google Colab as an entity extractor using Artificial Intelligence.

In the following lines, we explain how to put it to work for you and what exactly it consists of. 

Explanatory notes before we start::

Entity extraction is the process of identifying and extracting specific information or entities from unstructured text data. This process is made more efficient and accurate through the use of artificial intelligence and natural language processing. Entity extraction is a valuable tool for SEO, as it helps to identify relevant keywords and phrases and improve search engine rankings.

What is an entity extractor?

An entity extractor is a natural language processing (NLP) tool for identifying and classifying entities in a text. 

Entities can be people, places, things, organizations, and concepts.

This technology can identify and extract specific entities, such as names, addresses, dates, etc., from various sources, such as text documents, social networks and web pages.

In SEO, entity extraction is important because it helps search engines better understand the content of a web page and relate it to user queries.

What you will need to extract entities from a URL

In fact:

To use our entity extractor, all you will need are three ingredients:

  1. An API key to use the OpenAI API. You can register here to get an API key.  
  2. The URL on which you want to parse the content to extract the entities
  3. And of course, our script. Without it we will be lost.

The advantage of using our script is that it will allow you to use the AI models trained by OpenAI, without the need to train your own model with machine learning or ask chatGPT to generate a script to load it.

As in this example:

				
					import openai
openai.api_key = "YOUR_API_KEY"

def extract_entities(text):
response = openai.Completion.create(
engine="davinci",
prompt=f"Extract entities from text: '{text}'",
max_tokens=1024,
n=1,
stop=None,
temperature=0.5,
)

entities = response.choices[0].text.strip()
return entities
				
			

In addition, if you know some Python, you can adapt it to make small fixes that will allow you to do bulk analysis of different URLs at the same time.

All this is sure to save you time and money.

How our script identifies entities

To extract entities, our script uses the natural language processing (NLP) capabilities of the OpenAI API. Applying a prompt designed for this purpose by Álvaro Peña de Luna, and adapted by Luis Fernández:

So, when you provide the URL and run the Colab, it starts working and provides you with:

  • 10 entities and their typology. 
  • And the Salience score associated with each one. 

That is to say:

Everything you need to improve the semantic prominence of the content in the URL provided.

Script functions uncovered:

In short, the script extracts entities and gives us a Salience score of relevance according to the type of entity from a URL that we have provided.

Moreover, with some modifications, it is also possible to apply it en masse and adapt it to execute the same process to several URLs using a CSV as an import.

But, as we say, this requires a modification of the code. 

In the script, we use different libraries such as BeautifulSub, Request and Trilofilatura to scrape the URLs. 

To run it, you have to install the dependencies and enter the OpenAI API key. Then, you enter a URL and get the entities with their type and score. 

Depending on the load on the OpenAI servers, it may take some time to respond.

So, be patient. 

Especially if you run it at the time when the US is working.

The main difference between our script and others, that you can find out there, is that ours extracts the 10 entities with the highest score, without entering the title or the text of the page itself

It scraps all this information in an automated way from the URL provided.

So, you don’t have to do anything else.

It is very useful when you are doing semantic SEO tasks.

Running the script in Google Colab

To run the entity extractor, you only need to:

  1. install the necessary dependencies 
  2. Enter your OpenAI API key
  3. Paste the URL where Colab asks for it.
  4. And press the enter key on your keyboard

It’s as simple as that.

But if you have any doubt about how to do it, our colleague, Luis, has prepared a short video for you that you can watch right here:

In the YouTube video, Luis explains each step in detail. 

Note that if you know Python, it is possible to adapt the code to parse several URLs at the same time.

Something very useful if you have to parse many URLs at the same time.

And you can download the Google Colab from the link above.

How does entity extraction work with artificial intelligence?

Our entity extractor leverages machine learning algorithms developed by OpenAI to identify and extract entities from text. 

In a nutshell, the process consists of several steps:

  • Preprocessing of the text data to remove noise and irrelevant information.
  • Tokenization of the text into individual words or phrases.
  • Identification of the part-of-speech (POS) tags of each token.
  • Use of machine learning algorithms to classify each token as an entity or not.
  • Grouping the entities according to their type and context.

At the end, you get the top 10 entities associated with the loaded URL text with a Salience Score:

As you can see in the image above.

Benefits of the AI Entity Extractor:

AI Entity Extractor offers several benefits for those of us in SEO, including:

  1. Improved accuracy of structured data: AI Entity Extractor can accurately identify and extract entities from unstructured data, reducing the risk of errors and improving data accuracy.
  2. Improved efficiency: This tool can extract entities in a very short time, eliminating the effort required for manual data extraction.
  3. Customization: The AI entity extractor can be customized to extract domain-specific entities, making it ideal for companies dealing with industry-specific terminology.
  4. Scalability: The script can actually handle large volumes of requests when using OpenAI, making it ideal for SEOs who handle large numbers of URLs.

SEO use cases for entity extractor tool

An AI entity extraction tool is designed to analyze and identify specific entities, such as people, places and things, mentioned in a text. 

Our script can be used in a number of ways to improve search engine optimisation (SEO), and these are just a few ideas:

1. Keyword research

By analyzing text and extracting entities, an entity extraction tool can help identify keywords relevant to SEO. This can help you better understand search intent, allowing you to optimize your content accordingly.

2. Content optimization

An entity extraction tool helps to identify key entities in the content and ensure that they are properly optimized for search engines. For example, if a writer is creating content about a specific product, the tool can extract key features of the product and ensure that they are included in the content.

3. Competitive analysis

You can certainly use the entity extractor to analyze your competitors’ content and identify which entities they are targeting. Or even analyze the first results of a search. With this, you can gain valuable information about what is performing well for ranking and focus your SEO efforts.
With the increasing importance of semantic search, which focuses on understanding the intent behind a search query, an entity extraction tool can help companies optimize their content for this type of search. By identifying key entities in the content, the tool allows them to create content that is more likely to be relevant to search queries.

In conclusion

AI entity extraction is a powerful tool for SEO professionals. 

By leveraging the latest AI technologies, an SEO can act quickly and accurately by incorporating important keywords and phrases associated with a piece of content to increase relevance by enriching the semantic context.

This allows, as we have mentioned, optimizing content with related terms, understanding what our competitors are using to position their content in search results or even facilitating the work of algorithms by better contextualizing a piece of content.

In addition, incorporating entity extraction into your SEO strategy can also help you stay ahead of the competition, making it easier to update your content as needed.

Ultimately, an AI entity extraction tool can be a valuable asset for businesses looking to improve their SEO efforts.

Extract the entities of your website with the help of artificial intelligence!

Our entity extractor allows you to optimize your content and obtain surprising results. We help you create quality content with the help of AI.
Alvaro Peña de Luna
Head SEO y coCEO en iSocialWeb | + posts

Co-CEO and Head of SEO at iSocialWeb, an agency specializing in SEO, SEM and CRO that manages more than +350M organic visits per year and with a 100% decentralized infrastructure.

In addition to the company Virality Media, a company with its own projects with more than 150 million active monthly visits spread across different sectors and industries.

Systems Engineer by training and SEO by vocation. Tireless learner, fan of AI and dreamer of prompts.

Would you like to improve your project?