Python Script for Competitor SEO Content Analysis



Analyzing your competitors' content gives you valuable insight into their strategy and goals. This basic Python script can provide you with n-gram data in seconds.

This Python script is a basic version of a competitor content analysis. The main idea is to get a quick overview of where a competitor's content focus lies. A lean approach is to retrieve all the URLs in the sitemap, split off the URL slugs, and run an n-gram analysis on them. If you want to learn more about n-gram analysis, take a look at our free N-Gram tool. You can apply it not only to URLs but also to keywords, titles, etc.
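The first two steps (collecting the URLs and splitting off the slugs) can be sketched with the standard library alone. This is a simplified stand-in for what `adv.sitemap_to_df` does in the script, assuming a single `<urlset>` sitemap: unlike advertools, it does not follow sitemap index files. The example.com URLs are hypothetical. Instead of branching on a trailing slash, it simply takes the last non-empty path segment.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol for <urlset> files
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> value listed in a <urlset> sitemap (sitemap indexes are not followed)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def slug_from_url(url):
    """Take the last non-empty path segment, so a trailing slash does not matter."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    # Hyphens become spaces so each slug word can be counted separately
    return segments[-1].replace("-", " ") if segments else ""


# Hypothetical sitemap content, used only for illustration
sample_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/python-seo-scripts/</loc></url>
  <url><loc>https://example.com/blog/keyword-research-guide</loc></url>
</urlset>"""

slugs = [slug_from_url(u) for u in urls_from_sitemap(sample_xml)]
print(slugs)  # ['python seo scripts', 'keyword research guide']
```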

As a result, you will get a list of the n-grams used in the URL slugs, along with how often each one appears across the slugs, which for short slugs roughly matches the number of pages using it. The analysis takes only a few seconds, even on large sitemaps, and needs fewer than fifty lines of code.
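Strictly speaking, counting pages means counting each n-gram at most once per slug. A minimal sketch of that, assuming a small hand-picked stop-word list (advertools uses its own, larger list) and simply dropping any n-gram that contains a stop word:

```python
from collections import Counter

# Small illustrative stop-word list (an assumption, not the advertools list)
STOP_WORDS = {"a", "an", "and", "for", "how", "in", "of", "on", "the", "to", "with"}


def ngrams(words, n):
    """All runs of n consecutive words, joined with spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def pages_per_ngram(slugs, n=1):
    """Count how many slugs (pages) contain each n-gram at least once."""
    counts = Counter()
    for slug in slugs:
        words = slug.lower().split()
        # Skip n-grams that contain a stop word, e.g. "scripts for"
        grams = {g for g in ngrams(words, n) if not STOP_WORDS & set(g.split())}
        counts.update(grams)  # a set, so each page adds at most 1 per n-gram
    return counts


slugs = ["python seo scripts", "seo scripts for beginners", "python python tips"]
print(pages_per_ngram(slugs, 1)["python"])  # 2
print(pages_per_ngram(slugs, 2)["seo scripts"])  # 2
```

For comparison, a plain occurrence count would give "python" a count of 3 here, because the third slug repeats it.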

Additional approaches

If you want more in-depth insights, I suggest continuing with these approaches:

  • Retrieve the content of each URL in the sitemap
  • Create n-grams from titles
  • Create n-grams from content
  • Extract keywords with TextRank or RAKE
  • Extract entities relevant to your SEO business
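As an example of the title approach, here is a minimal sketch that pulls the `<title>` text out of a page and builds n-grams from it, using only Python's built-in `html.parser`. It assumes you have already downloaded the HTML (fetching is left out), and the sample page is made up.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self._seen_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._seen_title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._seen_title = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_ngrams(html, n=2):
    """Lower-case the page title, keep alphanumeric words, and build n-grams."""
    parser = TitleParser()
    parser.feed(html)
    words = re.findall(r"[a-z0-9]+", parser.title.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Made-up HTML standing in for a page you have already downloaded
page = "<html><head><title>Python SEO Scripts for Beginners</title></head><body></body></html>"
print(title_ngrams(page))  # ['python seo', 'seo scripts', 'scripts for', 'for beginners']
```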

But let's start simple and take a first look at the data with this script. Based on your feedback, I could add more sophisticated approaches. Before running the script, you only need to enter the URL of the sitemap you want to crawl. Once the script has run, you will find your results in sitemap_ngrams.csv. Open it in Excel or Google Sheets and have fun analyzing the data.

Here is the Python code:

```python
# Pemavor.com sitemap content analyzer
# Author: Stefan Neefischer

import advertools as adv
import pandas as pd


def sitemap_ngram_analyzer(site):
    # Download the sitemap (or sitemap index) into a DataFrame; URLs are in the "loc" column
    sitemap = adv.sitemap_to_df(site)
    sitemap = sitemap.dropna(subset=["loc"]).reset_index(drop=True)

    # Some sitemaps keep URLs with "/" at the end, others without "/"
    # If there is "/" at the end, we take the penultimate segment as the slug
    # Otherwise the last segment is the slug
    locs = sitemap["loc"]
    has_slash = locs.str.endswith("/")
    slugs = locs[has_slash].str.split("/").str[-2].str.replace("-", " ")
    slugs2 = locs[~has_slash].str.split("/").str[-1].str.replace("-", " ")

    # Merge the two series
    slugs = list(slugs) + list(slugs2)

    # adv.word_frequency automatically removes stop words
    word_counts_onegram = adv.word_frequency(slugs)
    word_counts_twogram = adv.word_frequency(slugs, phrase_len=2)

    output_csv = (
        pd.concat([word_counts_onegram, word_counts_twogram], ignore_index=True)
        .rename({"abs_freq": "Count", "word": "Ngram"}, axis=1)
        .sort_values("Count", ascending=False)
    )

    # Save the n-grams with their counts as CSV
    output_csv.to_csv("sitemap_ngrams.csv", index=False)
    print("saved csv file")


# Provide the sitemap to analyze
site = "https://searchengineland.com/sitemap_index.xml"
sitemap_ngram_analyzer(site)

# The results will be saved in the sitemap_ngrams.csv file
```
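If you would rather stay in Python than open the file in a spreadsheet, a small pandas helper can split the results by n-gram length. This is a hypothetical helper (`top_ngrams` is not part of the original script); it only assumes the `Ngram` and `Count` columns the script writes.

```python
import pandas as pd


def top_ngrams(df, words_per_ngram, k=20):
    """Return the k highest-count rows whose Ngram has the given number of words."""
    # Ngrams are space-separated, so the number of split parts is the n-gram length
    n_words = df["Ngram"].str.split().str.len()
    return df[n_words == words_per_ngram].nlargest(k, "Count")
```

For example, `top_ngrams(pd.read_csv("sitemap_ngrams.csv"), 2)` would list the 20 most frequent two-word phrases.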


