Retrievers | 🦜️🔗 Langchain

📄️ Activeloop DeepLake's DeepMemory + LangChain + ragas or how to get +27% on RAG recall.

Retrieval-Augmented Generators (RAGs) have recently gained significant attention. As advanced RAG techniques and agents emerge, they expand the potential of what RAGs can accomplish. However, several challenges may limit the integration of RAGs into production. The primary factors to consider when implementing RAGs in production settings are accuracy (recall), cost, and latency. For basic use cases, OpenAI's Ada model paired with a naive similarity search can produce satisfactory results. Yet, for higher accuracy or recall during searches, one might need to employ advanced retrieval techniques. These methods might involve varying data chunk sizes, rewriting queries multiple times, and more, potentially increasing latency and costs. Activeloop's Deep Memory a feature available to Activeloop Deep Lake users, addresses these issuea by introducing a tiny neural network layer trained to match user queries with relevant data from a corpus. While this addition incurs minimal latency during search, it can boost retrieval accuracy by up to 27

📄️ Amazon Kendra

Amazon Kendra is an intelligent search service provided by Amazon Web Services (AWS). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. Kendra is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.

📄️ Arcee Retriever

This notebook demonstrates how to use the ArceeRetriever class to retrieve relevant document(s) for Arcee's Domain Adapted Language Models (DALMs).

📄️ Arxiv

arXiv is an open-access archive for 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

📄️ Azure Cognitive Search

Azure Cognitive Search (formerly known as Azure Search) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.

📄️ BM25

BM25 also known as the Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.

📄️ Chaindesk

Chaindesk platform brings data from anywhere (Datsources: Text, PDF, Word, PowerPpoint, Excel, Notion, Airtable, Google Sheets, etc..) into Datastores (container of multiple Datasources).

📄️ ChatGPT Plugin

OpenAI plugins connect ChatGPT to third-party applications. These plugins enable ChatGPT to interact with APIs defined by developers, enhancing ChatGPT's capabilities and allowing it to perform a wide range of actions.

📄️ Cohere Reranker

Cohere is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.

📄️ Cohere RAG retriever

This notebook covers how to get started with Cohere RAG retriever. This allows you to leverage the ability to search documents over various connectors or by supplying your own.

📄️ DocArray

DocArray is a versatile, open-source tool for managing your multi-modal data. It lets you shape your data however you want, and offers the flexibility to store and search it using various document index backends. Plus, it gets even better - you can utilize your DocArray document index to create a DocArrayRetriever, and build awesome Langchain apps!

📄️ ElasticSearch BM25

Elasticsearch is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

📄️ Fleet AI Libraries Context

The Fleet AI team is on a mission to embed the world's most important data. They've started by embedding the top 1200 Python libraries to enable code generation with up-to-date knowledge. They've been kind enough to share their embeddings of the LangChain docs and API reference.

📄️ Google Drive

This notebook covers how to retrieve documents from Google Drive.

📄️ Google Vertex AI Search

Vertex AI Search (formerly known as Enterprise Search on Generative AI App Builder) is a part of the Vertex AI machine learning platform offered by Google Cloud.

📄️ Kay.ai

Data API built for RAG 🕵️ We are curating the world's largest datasets as high-quality embeddings so your AI agents can retrieve context on the fly. Latest models, fast retrieval, and zero infra.

📄️ kNN

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression.

📄️ LOTR (Merger Retriever)

Lord of the Retrievers, also known as MergerRetriever, takes a list of retrievers as input and merges the results of their getrelevantdocuments() methods into a single list. The merged results will be a list of documents that are relevant to the query and that have been ranked by the different retrievers.

📄️ Metal

Metal is a managed service for ML Embeddings.

📄️ Pinecone Hybrid Search

Pinecone is a vector database with broad functionality.

📄️ PubMed

PubMed® by The National Center for Biotechnology Information, National Library of Medicine comprises more than 35 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full text content from PubMed Central and publisher web sites.

📄️ RePhraseQuery

RePhraseQuery is a simple retriever that applies an LLM between the user input and the query passed by the retriever.

📄️ SEC filing

The SEC filing is a financial statement or other formal document submitted to the U.S. Securities and Exchange Commission (SEC). Public companies, certain insiders, and broker-dealers are required to make regular SEC filings. Investors and financial professionals rely on these filings for information about companies they are evaluating for investment purposes.

🗃️ Self-querying retriever

14 items

📄️ SingleStoreDB

SingleStoreDB is a high-performance distributed SQL database that supports deployment both in the cloud and on-premises. It provides vector storage, and vector functions including dotproduct and euclideandistance, thereby supporting AI applications that require text similarity matching.

📄️ SVM

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.

📄️ Tavily Search API

Tavily's Search API is a search engine built specifically for AI agents (LLMs), delivering real-time, accurate, and factual results at speed.

📄️ TF-IDF

TF-IDF means term-frequency times inverse document-frequency.

📄️ Vespa

Vespa is a fully featured search engine and vector database. It supports vector search (ANN), lexical search, and search in structured data, all in the same query.

📄️ Weaviate Hybrid Search

Weaviate is an open-source vector database.

📄️ Wikipedia

Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.

📄️ you-retriever

Using the You.com Retriever

📄️ Zep

Retriever Example for Zep