Text analytics for talent acquisition
Source: Great Learning Snippets | Medium
In this article published on Medium, the authors (Great Learning Snippets) present in detail how a model for the textual analysis of CVs works.
Broken down into 7 steps, the article describes each stage required to develop a CV analysis algorithm.
To develop an algorithm capable of analysing CVs, the researchers first collected at least 5,000 job descriptions. They then collected a sample of 1,000 CVs from candidates with a background as data scientists and 1,000 CVs from candidates with a variety of professional backgrounds.
The authors began by examining the data (size, structure, syntax) to determine how much data their algorithm would need.
Once they had the information needed for word-level analysis, the researchers applied lemmatization. Lemmatization is a text-normalization step that reduces the inflected forms of a word to its canonical dictionary form (its lemma), so that "analysed" and "analysing" are both counted as "analyse". This is followed by an in-depth analysis of the data using methods such as Latent Dirichlet Allocation (LDA), a topic-modelling technique.
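The idea behind lemmatization can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the lookup table LEMMA_TABLE and the function lemmatize are hypothetical names, and a real system would rely on a full dictionary and part-of-speech information rather than a hand-written table.

```python
# Toy lemmatizer: maps inflected forms to a dictionary form (lemma).
# LEMMA_TABLE is a hypothetical, hand-written table used only for illustration.
LEMMA_TABLE = {
    "analysed": "analyse",
    "analysing": "analyse",
    "models": "model",
    "built": "build",
    "was": "be",
}


def lemmatize(tokens):
    """Lowercase each token and replace it with its lemma when one is known."""
    lemmas = []
    for token in tokens:
        lowered = token.lower()
        # Unknown words are kept as they are (after lowercasing).
        lemmas.append(LEMMA_TABLE.get(lowered, lowered))
    return lemmas


cv_sentence = "Analysed data and built models".split()
cv_lemmas = lemmatize(cv_sentence)
```

After this step, different surface forms of the same word fall into a single count, which makes the later frequency-based analysis less fragmented.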
In their study, the researchers used the TF/IDF weighting method and the Word2Vec, GloVe and ELMo embedding models.
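TF/IDF = Term Frequency-Inverse Document Frequency. This is a weighting method used to estimate how important a word is to a document within a collection of documents: the score rises with the word's frequency in the document and falls as more documents contain the word.
TF(t, d) = (number of times term t appears in document d) / (total number of words in document d).
IDF(t) = log_e(total number of documents / number of documents containing term t).
The TF/IDF score of term t in document d is the product TF(t, d) × IDF(t).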
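The two formulas above can be checked on a small example. The following is a minimal sketch in plain Python, assuming documents are already split into word lists. The three-document corpus and the function names are invented for illustration, and returning 0 for a term that appears in no document is an assumption made here to avoid dividing by zero.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens):
    """TF: occurrences of the term divided by the total words in the document."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def inverse_document_frequency(term, corpus):
    """IDF: natural log of (total documents / documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: an unseen term gets weight 0
    return math.log(len(corpus) / containing)


def tf_idf(term, doc_tokens, corpus):
    """TF/IDF score: the product of the two quantities above."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus)


# Hypothetical mini-corpus of three already-tokenized documents.
corpus = [
    "python machine learning statistics".split(),
    "sales marketing communication".split(),
    "python sales reporting".split(),
]

score_python = tf_idf("python", corpus[0], corpus)          # appears in 2 of 3 documents
score_statistics = tf_idf("statistics", corpus[0], corpus)  # appears in 1 of 3 documents
```

Both words occur once in the first document, so they share the same TF (1/4). "statistics" ends up with the higher score because it appears in fewer documents and is therefore more distinctive.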
Word2Vec = a group of models used to learn word embeddings. Each word in a vocabulary is represented as a vector of real numbers, so that words used in similar contexts receive similar vectors. Such embeddings are used in tasks such as sentiment analysis.
GloVe = Global Vectors for Word Representation. This is an unsupervised learning algorithm that obtains vector representations of words from word co-occurrence statistics gathered over a whole corpus.
ELMo = Embeddings from Language Models. Not to be confused with the puppet from the children's programme Sesame Street, ELMo is a model that produces contextual word embeddings: the vector assigned to a word depends on the sentence in which it appears.
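Once words are represented as vectors of real numbers, they can be compared numerically, most commonly with cosine similarity. The sketch below illustrates this with three-dimensional toy vectors. The values are invented for this example and were not produced by Word2Vec, GloVe or ELMo, whose vectors typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: similarity with a zero vector is defined as 0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, chosen by hand for illustration only.
toy_vectors = {
    "python": [0.9, 0.1, 0.0],
    "pandas": [0.8, 0.2, 0.1],
    "marketing": [0.1, 0.9, 0.3],
}

sim_related = cosine_similarity(toy_vectors["python"], toy_vectors["pandas"])
sim_unrelated = cosine_similarity(toy_vectors["python"], toy_vectors["marketing"])
```

With these toy values, "python" is far closer to "pandas" than to "marketing". This is the kind of relationship a trained embedding model is meant to capture from real text.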