LinkedIn has published DeText, a framework for natural language processing (NLP). DeText does not provide its own language model but builds on approaches such as Google's BERT. It targets text classification, sequence tagging, and the ranking of text documents.
DeText is an open source framework intended to hide the complexity of using BERT (Bidirectional Encoder Representations from Transformers) and other models, allowing more flexibility in language processing. LinkedIn compares the tool to a power drill: the framework supplies the motor, while specialized attachments handle the different tasks.
Swapping models
Much like drill attachments, data scientists can select the model in DeText that suits the task at hand. For typical in-house tasks, LinkedIn uses a BERT model trained on its own data, named LiBERT. Applications use it, among other things, to understand the meaning of text queries and to determine what users intend with a query. As an example, the blog post cites the English query "sales consultant at insights", which the system should understand as a search for a job as a sales consultant at the company Insights.
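The idea of mapping a query to an intent can be sketched with sentence embeddings: encode the query and each candidate intent, then pick the nearest neighbor. The vectors and intent labels below are toy values for illustration, not actual LiBERT output.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for LiBERT sentence vectors (made up for illustration).
query_vec = np.array([0.9, 0.1, 0.2])   # e.g. "sales consultant at insights"
intents = {
    "job search": np.array([0.8, 0.2, 0.1]),
    "people search": np.array([0.1, 0.9, 0.3]),
    "content search": np.array([0.2, 0.1, 0.9]),
}

# Pick the intent whose embedding is closest to the query embedding.
best_intent = max(intents, key=lambda name: cosine(query_vec, intents[name]))
print(best_intent)  # with these toy vectors: job search
```

With a real encoder, the same nearest-neighbor logic applies; only the embeddings come from the trained model instead of hand-picked vectors.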
LinkedIn uses the framework in-house for various search tasks.
As an advantage of DeText, the blog post highlights that the pretrained models make it much easier to transfer existing use cases to further tasks. Among other things, the semantic understanding of texts can be carried over to other areas, such as finding additional relevant word sequences for a query or creating a ranking of documents.
The right order
Ranking is an important task for DeText. According to LinkedIn, building a ranking with BERT is not trivial, since there is no standard procedure for using it efficiently. With DeText, however, the company is able to put pretrained BERT models into production for ranking.
The DeText ranking framework operates on several layers, from word embeddings up to learning to rank.
The framework combines different sources such as search queries and user profiles with information about targets such as text documents. In a first step, the DeText ranking model computes the semantic relationship between sources and targets and combines these semantic features with traditional hand-crafted features to calculate a final score for source-target relevance.
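That final scoring step can be sketched as a linear combination of the semantic similarity and a few hand-crafted features. The feature names and weights below are illustrative stand-ins, not DeText's actual parameters.

```python
import numpy as np

def final_score(semantic_sim, traditional_feats, weights, bias=0.0):
    """Combine the semantic similarity with traditional hand-crafted
    features (e.g. historical click-through rate, exact-match flags)
    via a linear layer; the weights here are made-up stand-ins for
    values a real model would learn."""
    x = np.concatenate(([semantic_sim], traditional_feats))
    return float(np.dot(weights, x) + bias)

# One query/document pair: a semantic score plus two hand-crafted features.
semantic_sim = 0.87                  # e.g. cosine of query and document embeddings
traditional = np.array([0.4, 1.0])   # e.g. historical CTR, exact-title match
weights = np.array([2.0, 0.5, 1.0])  # illustrative learned weights

score = final_score(semantic_sim, traditional, weights)
print(round(score, 2))  # 2.0*0.87 + 0.5*0.4 + 1.0*1.0 = 2.94
```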
The ranking is built in several layers, starting with word embeddings and text embeddings. An interaction layer combines the lower layers, and an MLP (multilayer perceptron) layer computes the score, taking non-linear combinations of the features into account. At the top sits a learning-to-rank (LTR) layer that processes the individual target scores.
Further details about DeText can be found in a blog post on LinkedIn's engineering blog. The framework is available on GitHub as an open source project.