Code, Human Annotated, Commercial OK

toxic_conversations_50k

by mteb

Bronze 43
3.0K downloads
19 likes

Description

ToxicConversationsClassification

An MTEB (Massive Text Embedding Benchmark) dataset.

A collection of comments from the Civil Comments platform, each annotated as toxic or not toxic.

Task category: t2c
Domains: Social, Written
Reference: https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/overview

How to evaluate on this task

You can evaluate an embedding model on this dataset using the following code: import…

See the full description on the dataset page: https://huggingface.co/datasets/mteb/toxic_conversations_50k.
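The evaluation snippet in the description is truncated in this listing, so it is not reconstructed here. As a rough illustration of what evaluating an embedding model on a binary text classification task like this one involves, the sketch below embeds training texts, fits a simple classifier on the embeddings, and scores held-out texts by accuracy. Everything in it is an assumption made for illustration: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the nearest-centroid classifier is a simplification and not necessarily what MTEB uses, the label convention (1 = toxic) is assumed, and the example sentences are invented rather than taken from the dataset.

```python
import math

# Toy examples invented for illustration; they are NOT rows from the dataset.
# Assumption: label 1 means "toxic", label 0 means "not toxic".
train = [
    ("you are an idiot", 1),
    ("what a stupid idiot", 1),
    ("thanks for the helpful article", 0),
    ("great article and helpful comments", 0),
]
test = [
    ("stupid idiot", 1),
    ("helpful article thanks", 0),
]


def build_vocab(texts):
    """Map each distinct lowercase token to a column index."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def embed(text, vocab):
    """Hypothetical stand-in for an embedding model: L2-normalised token counts."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def fit_centroids(embeddings, labels):
    """Average the embeddings of each class into one centroid per label."""
    grouped = {}
    for emb, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(emb)
    return {
        label: [sum(col) / len(embs) for col in zip(*embs)]
        for label, embs in grouped.items()
    }


def predict(embedding, centroids):
    """Return the label whose centroid has the largest dot product with the embedding."""
    def score(label):
        return sum(a * b for a, b in zip(embedding, centroids[label]))
    return max(centroids, key=score)


train_texts, train_labels = zip(*train)
test_texts, test_labels = zip(*test)

vocab = build_vocab(train_texts)
centroids = fit_centroids([embed(t, vocab) for t in train_texts], train_labels)
predictions = [predict(embed(t, vocab), centroids) for t in test_texts]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(f"accuracy = {accuracy:.2f}")
```

In a real evaluation, `embed` would be replaced by the model under test and the toy lists by the dataset's own train and test splits; the dataset page linked above has the full, authoritative snippet.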

Tags

task_categories:text-classification
task_ids:sentiment-analysis
task_ids:sentiment-scoring
task_ids:sentiment-classification
task_ids:hate-speech-detection
annotations_creators:human-annotated
multilinguality:monolingual
language:eng
license:cc-by-4.0
size_categories:100K<n<1M
format:json
modality:text
library:datasets
library:pandas
library:polars
library:mlcroissant
arxiv:2502.13595
arxiv:2210.07316
region:us
mteb