Twine DMP

Dashboard

My Datasets

A large, high-quality video dataset of URL links to approximately 650,000 YouTube video clips covering 700 human action classes.

24.3MB

MP4

Video

Casual Conversations Dataset

45,000 videos from 3,011 participants, intended for assessing the performance of already-trained models in computer vision and audio applications

15GB

MP4

Video

Urban Sound 8K dataset

Contains 8,732 urban sounds from 10 classes, such as air conditioner, dog bark, drilling, siren, and street music.

An audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube

11,827 videos covering 180 different tasks, all collected from YouTube

A large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities

51.92GB

JPG

Video

Recommended Marketplace Datasets

ActivityNet

A Large-Scale Video Benchmark for Human Activity Understanding

AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity.

The largest collection of poses, focused on very challenging and realistic human-centric analysis tasks in crowded and complex events, including getting on and off subways, collisions, fighting, and earthquake escape

n/a

MP4

Video

Yahoo-Flickr Creative Commons 100 Million Dataset

The YFCC100M is the largest publicly and freely usable multimedia collection, containing around 99.2 million photos and 0.8 million videos from Flickr, all of which were shared under one of the various Creative Commons licenses

UMDFaces is a face dataset divided into two parts: still images (367,888 face annotations for 8,277 subjects) and video frames (over 3.7 million annotated video frames from over 22,000 videos of 3,100 subjects).

A large-scale video dataset, featuring clips from movies with detailed captions.

AVSpeech is a new, large-scale audio-visual dataset comprising speech video clips with no interfering background noise

128MB

MP4

Video

Voices Obscured in Complex Environmental Settings (VOICES) Dataset

A Creative Commons speech dataset targeting acoustically challenging and reverberant environments, with robust labels and truth data for transcription, denoising, and speaker identification.

1.4GB

MP3

Audio

Free Spoken Digit Dataset

A simple audio/speech dataset consisting of recordings of spoken English digits

10MB

WAV

Audio

The WebVid-10M Dataset

A large-scale dataset of short videos with textual descriptions sourced from the web

2.5MB

MP4

Video