Lemmatization converts a word to its dictionary form, or lemma (e.g. “playing” becomes “play”). It is often preferred to stemming because it returns real words rather than truncated stems. This might all sound a bit complicated, but don’t let it dissuade you from pursuing this type of research. I’ll be linking out to resources throughout this article which break down exactly how you apply these processes to your corpus.

NGram Analysis & Co-Occurrence

The first and simplest approach that we can apply to our SERP content is an analysis of nGram co-occurrence. This means we’re counting the number of times a word or combination of words appears within our corpus.
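To make the counting idea concrete, here is a minimal sketch using only the Python standard library. The `ngrams` helper, the whitespace tokenization, and the three toy documents are assumptions for illustration; they are not the corpus or the exact code used in this post.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy stand-in for scraped SERP text (hypothetical data, not real results).
documents = [
    "yoga for beginners at home",
    "best yoga for beginners",
    "yoga poses for beginners at home",
]

bigram_counts = Counter()
trigram_counts = Counter()
for doc in documents:
    # Naive tokenization: lowercase and split on whitespace.
    tokens = doc.lower().split()
    bigram_counts.update(ngrams(tokens, 2))
    trigram_counts.update(ngrams(tokens, 3))

# Show the most frequent two- and three-word phrases.
print(bigram_counts.most_common(3))
print(trigram_counts.most_common(3))
```

On a real corpus you would likely swap the whitespace split for a proper tokenizer and apply the cleaning steps described earlier before counting.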
Why is this useful? Analyzing our SERPs for co-occurring sequences of words can provide a snapshot of what words or phrases Google deems most relevant to the set of keywords we are analyzing. For example, to create the corpus I’ll be using throughout this post, I have pulled the top 100 results for 100 keywords around yoga. This is just for illustrative purposes; if I were doing this exercise with more quality control, the structure of this corpus might look slightly different. All I’m going to use now is Python’s Counter, which is going to look for the most commonly occurring two- and three-word phrases in my corpus. The output looks like this:

[Figure: Ngram counts from a Yoga SERP]

You can already start to see some interesting trends appearing around topics that searchers might be interested in. I could also collect MSV (monthly search volume) for some of these phrases that I could target as additional campaign keywords. At this point, you might think that it’s obvious all these co-occurring phrases contain the word yoga, as that is the main focus of my dataset. This would be an astute observation: yoga is what’s known as a ‘corpus-specific stopword’, and because I’m working with Python it’s simple to create either a filter or a function that can remove those words.
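One way such a filter might look is sketched below. The stopword set, the function names, and the sample documents are hypothetical and chosen only for illustration; the sketch assumes simple whitespace tokenization.

```python
from collections import Counter

# Assumed corpus-specific stopwords; extend this set for your own dataset.
CORPUS_STOPWORDS = {"yoga"}


def remove_corpus_stopwords(tokens, stopwords=CORPUS_STOPWORDS):
    """Drop any token that appears in the corpus-specific stopword set."""
    return [t for t in tokens if t not in stopwords]


def count_ngrams(documents, n, stopwords=CORPUS_STOPWORDS):
    """Count n-word phrases across documents after removing stopwords."""
    counts = Counter()
    for doc in documents:
        tokens = remove_corpus_stopwords(doc.lower().split(), stopwords)
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


# Hypothetical sample text, not real SERP data.
documents = ["yoga mat for hot yoga", "best yoga mat for beginners"]
filtered_bigrams = count_ngrams(documents, 2)
print(filtered_bigrams.most_common(5))
```

Note the trade-off: removing words before building nGrams joins tokens that were not adjacent in the original text (here “best yoga mat” yields “best mat”). If that matters for your analysis, an alternative is to build the nGrams first and then discard any that contain a stopword.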