Code, Human Annotated, Commercial OK
toxic_conversations_50k
by mteb
Description
ToxicConversationsClassification
An MTEB (Massive Text Embedding Benchmark) dataset
A collection of comments from the Civil Comments platform, each annotated with whether the comment is toxic.
Task category
t2c (text to category)
Domains
Social, Written
Reference
https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/overview
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code:
import …
See the full description on the dataset page: https://huggingface.co/datasets/mteb/toxic_conversations_50k
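The evaluation code itself is truncated above. As a rough illustration of what a classification task like this measures, the sketch below assumes the usual MTEB classification protocol: embed each comment with the model under test, fit a logistic-regression classifier on the training embeddings, and report accuracy on held-out comments. It is not the mteb API. The toy comments, the bag-of-words `embed` stand-in for a real embedding model, and the helpers `build_vocab`, `train_logreg`, `predict`, and `accuracy` are all hypothetical.

```python
import math

# Hypothetical toy comments standing in for real Civil Comments data.
# Label 1 = toxic, 0 = non-toxic.
TRAIN = [
    ("you idiot", 1),
    ("stupid idiot", 1),
    ("thanks friend", 0),
    ("great friend", 0),
]
TEST = [("stupid comment", 1), ("thanks so much", 0)]


def build_vocab(texts):
    """Map each lowercase whitespace token seen in training to a feature index."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def embed(text, vocab):
    """Toy bag-of-words vector; a real evaluation would call an embedding model here."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:  # tokens unseen in training are ignored
            vec[vocab[token]] += 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(xs, ys, lr=0.5, epochs=200):
    """Fit logistic regression (weights + intercept) by full-batch gradient descent on log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predict 1 (toxic) when the linear score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def accuracy(w, b, xs, ys):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(predict(w, b, x) == y for x, y in zip(xs, ys))
    return correct / len(ys)


vocab = build_vocab(text for text, _ in TRAIN)
train_x = [embed(text, vocab) for text, _ in TRAIN]
train_y = [label for _, label in TRAIN]
w, b = train_logreg(train_x, train_y)

test_x = [embed(text, vocab) for text, _ in TEST]
test_y = [label for _, label in TEST]
print("test accuracy:", accuracy(w, b, test_x, test_y))
```

Only the `embed` step depends on the model being evaluated. Keeping the downstream classifier simple and fixed means differences in accuracy mostly reflect how well the embeddings separate toxic from non-toxic comments.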
Tags
task_categories:text-classification, task_ids:sentiment-analysis, task_ids:sentiment-scoring, task_ids:sentiment-classification, task_ids:hate-speech-detection, annotations_creators:human-annotated, multilinguality:monolingual, language:eng, license:cc-by-4.0, size_categories:100K<n<1M, format:json, modality:text, library:datasets, library:pandas, library:polars, library:mlcroissant, arxiv:2502.13595, arxiv:2210.07316, region:us, mteb