Text - General, RAG, Unknown

wikipedia-2023-11-embed-multilingual-v3

by CohereLabs

52.3K downloads
245 likes

Description

Multilingual Embeddings for Wikipedia in 300+ Languages

This dataset contains the wikimedia/wikipedia dataset dump of 2023-11-01, covering Wikipedia in all 300+ languages. The individual articles have been chunked and embedded with the state-of-the-art multilingual Cohere Embed V3 embedding model. This makes it easy to search semantically across all of Wikipedia or to use it as a knowledge source for your RAG application. In total, it is close to 250M paragraphs / embeddings. You… See the full description on the dataset page: https://huggingface.co/datasets/CohereLabs/wikipedia-2023-11-embed-multilingual-v3.
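As a rough illustration of the semantic search use case mentioned above, the sketch below ranks paragraphs by the dot product between a query embedding and each paragraph embedding. It is a minimal sketch under stated assumptions, not the dataset authors' method: the record layout (a dict with "title", "text", and an "emb" vector) and the 3-dimensional toy vectors are hypothetical stand-ins, and the helper names `dot` and `top_k` are made up for this example. With real data, the query would presumably need to be embedded with the same Cohere Embed V3 model as the paragraphs.

```python
import heapq
from typing import Dict, Iterable, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k(
    query_emb: Sequence[float],
    docs: Iterable[Dict],
    k: int = 3,
    emb_field: str = "emb",  # assumed field name, check the dataset page
) -> List[Tuple[float, Dict]]:
    """Return the k (score, doc) pairs with the highest dot-product score."""
    scored = ((dot(query_emb, d[emb_field]), d) for d in docs)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Toy records standing in for dataset rows (hypothetical values).
toy_docs = [
    {"title": "Alan Turing", "text": "…", "emb": [1.0, 0.0, 0.0]},
    {"title": "Banana", "text": "…", "emb": [0.0, 1.0, 0.0]},
    {"title": "Turing machine", "text": "…", "emb": [0.7, 0.7, 0.0]},
]

# Toy query vector; a real one would come from the embedding model.
query = [1.0, 0.2, 0.0]

for score, doc in top_k(query, toy_docs, k=2):
    print(f"{score:.2f}  {doc['title']}")
```

Because `top_k` accepts any iterable, it can consume rows one at a time, which matters at a scale of roughly 250M embeddings; for full-corpus search, a vector index would be a more practical choice than this linear scan.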


Tags

size_categories:100M<n<1B
modality:text
region:us