
Johnsnowlabs Embedding

Loading the Johnsnowlabs embedding class to generate and query embeddings

Models are loaded with nlp.load, and the Spark session is started with nlp.start() under the hood. For all 24,000+ models, see the John Snow Labs Models Hub.

! pip install johnsnowlabs

# If you have an enterprise license, you can run this to install enterprise features
# from johnsnowlabs import nlp
# nlp.install()
Import the necessary classes
from langchain.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings

Initialize Johnsnowlabs Embeddings and Spark Session

embedder = JohnSnowLabsEmbeddings("en.embed_sentence.biobert.clinical_base_cased")

Define some example texts. These could be any documents you want to analyze, for example news articles, social media posts, or product reviews.

texts = ["Cancer is caused by smoking", "Antibiotics aren't painkillers"]

Generate and print embeddings for the texts. The JohnSnowLabsEmbeddings class generates one embedding per document: a numerical vector that represents the document's content. These embeddings can be used for natural language processing tasks such as document similarity comparison or text classification.

# embed_documents returns one embedding (a list of floats) per input text
embeddings = embedder.embed_documents(texts)
for i, embedding in enumerate(embeddings):
    print(f"Embedding for document {i+1}: {embedding}")
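Because `embed_documents` returns one list of floats per input text, comparing two documents comes down to comparing two vectors. The following is a minimal sketch of cosine similarity in plain Python, one common way to do that comparison. The short vectors are made-up stand-ins (real BioBERT sentence embeddings have hundreds of dimensions); in practice you would pass entries of `embeddings` instead.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for rows of `embeddings`
doc_a = [0.2, 0.1, 0.7]
doc_b = [0.25, 0.05, 0.65]  # points in nearly the same direction as doc_a
doc_c = [-0.6, 0.8, 0.0]    # points in a very different direction

print(f"a vs b: {cosine_similarity(doc_a, doc_b):.3f}")
print(f"a vs c: {cosine_similarity(doc_a, doc_c):.3f}")
```

With the real output, `cosine_similarity(embeddings[0], embeddings[1])` scores the two example sentences against each other. Values closer to 1 indicate more similar content.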

Generate and print an embedding for a single piece of text, such as a search query. This is useful for tasks like information retrieval, where you want to find the documents most similar to a given query.

query = "Cancer is caused by smoking"
query_embedding = embedder.embed_query(query)
print(f"Embedding for query: {query_embedding}")
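For retrieval, the query embedding is scored against every document embedding, and the documents are sorted by that score. The sketch below assumes cosine similarity as the scoring function (a common choice, not something this page prescribes). The helper `rank_documents` and the toy vectors are hypothetical stand-ins for `texts`, `embeddings`, and `query_embedding`.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query_vec: List[float],
    doc_vecs: List[List[float]],
    docs: List[str],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return up to top_k (document, score) pairs, best first."""
    scored = [(doc, cosine_similarity(query_vec, vec)) for doc, vec in zip(docs, doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical stand-ins for real documents and their embeddings
docs = [
    "Smoking raises lung cancer risk",
    "Antibiotics treat bacterial infections",
    "Regular exercise improves sleep",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.1, 0.9, 0.1]]
query_vec = [0.8, 0.2, 0.1]

for doc, score in rank_documents(query_vec, doc_vecs, docs, top_k=2):
    print(f"{score:.3f}  {doc}")
```

With real data, you would call `rank_documents(query_embedding, embeddings, texts)`. For large collections, a vector store usually replaces this linear scan.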