A large, high-quality video dataset of URL links t...
45,000 videos (3,011 participants) and intended to...
Contains 8732 urban sounds from 10 classes like an...
An audio-visual dataset consisting of short clips ...
11,827 videos related to 180 different tasks, whic...
A large-scale dataset that contains a diverse set ...
An open source, multi-language dataset of voices t...
A Large-Scale Video Benchmark for Human Activity U...
AVA is a project that provides audiovisual annotat...
The largest collection of poses which focuses on v...
The YFCC100M is the largest publicly and freely us...
UMDFaces is a face dataset divided into two parts:...
A large-scale video dataset, featuring clips from ...
AVSpeech is a new, large-scale audio-visual datase...
A creative commons speech dataset targeting acoust...
A simple audio or speech dataset which consists of re...
A large-scale dataset of short videos with textual...
The first dataset of egocentric videos to study hu...
A dataset of stereo image pairs suited for stereo ...
The VIRAT Video Dataset is designed to be realisti...
A large-scale dataset for recognizing and understa...
A large collection of labeled video clips that sho...
Comprises ten tasks and 100K videos to estimate th...
300+ videos from 20 different TV shows for predict...
A large collection of video clips of different kin...
Fully annotated 4.5 hour dataset of RGB-D video + ...
A database of face videos designed for studying th...
Facial recognition: 9,376 still images and 2,802 vi...
The largest video dataset for multi-modal person i...
Videos by 42 subjects, coming from 14 different na...
10 actors portraying 10 different emotional states...
12 hours of audiovisual data by 10 actors; 5 emoti...
3D video eye-tracking dataset...
A large multi-purpose human motion and video datas...
This is a corpus of aligned spoken Wikipedia artic...
Audio transcription of TED talks. 1495 TED talks a...
65,000 one-second long utterances of 30 short word...
This dataset contains 23 Persian consonants and 6 ...
This 38.7 GB dataset helps predict which letter-na...
Phonetic and orthographic transcriptions of more t...
Recordings of 630 speakers of eight major dialects...
6,000 events of surveillance applications, namely ...
1302 labeled sound recordings. Each recording is l...
A novel audio captioning dataset, consisting of 49...
An open dataset of human-labeled sound events cont...
A collection of crowd-sourced vocal imitations of ...
635 audio event classes and a collection of 2,084,...
120 unscripted 30-minute telephone conversations b...
1,000 hours of 16 kHz read English speech...
Parallel English speech samples from 177 countries...
Conversations in Dutch, Japanese, and Irish Englis...
Sample of 24 Alexa wake word recordings in four la...
Public domain speech dataset consisting of 13,100 ...
The largest free speech corpus available for Manda...
500 utterances by a diverse group of actors (over ...
1384 recordings by multiple speakers; 3 emotions: a...
Consists of 30000 audio samples of spoken digits (...
1935 recordings by 61 speakers (45 male and 16 fema...
65 hours of annotated video from more than 1000 sp...
2199 opinion utterances with annotated sentiment; ...
Speech dataset of many accents reciting passages from...
7,442 original clips from 91 actors. These clips w...
20 speakers (10 female and 10 male) reading 5 exce...
Training deep discriminative embeddings to solve t...
9365 emotional and 332 neutral samples produced by...
Dinner Party Corpus - The participants were record...
26 text passages read by 10 speakers; 4 main emotio...
800 recordings spoken by 10 actors (5 males and 5 f...
1115 audio instances (sentences) extracted from vari...
2,519 speech samples produced by 100 actors from 5...
Recordings and their associated transcriptions by ...
Recordings of 4 speakers: 2 males and 2 females; ...
6 actors who played 14 sentences; 6 emotions: disg...
A set of human speech with vocal emotion spoken by...
20 sentences by 12 actors; 4 emotions: angry, sad,...
100 hours by over 100 speakers - annotated with em...
Includes 14k speech samples with simulated (codecs...
9114 spontaneous utterances and 2656 acted utteran...
3.8 hours of recordings by 46 participants; negati...
7356 files (total size: 24.8 GB). The database con...
4 male actors in 7 different emotions, 480 British...
95 dyadic conversations from 21 subjects. Each sub...
More than 2000 minutes of audio-visual data of 398...
3000 semi-natural utterances, equivalent to 3 hour...
A test bed for voice activity detection algorithms...
2800 recordings by 2 actresses; 7 emotions: anger, ...
German language dataset, 22,668 recorded phrases, ...
400 utterances by 38 speakers (27 male and 11 fema...
110 English speakers with various accents; each sp...
Non-speech, 1085 audio files by ~12 speakers; non-s...
100K hours of unlabelled speech data for 23 langua...
High quality French speech recordings and associat...
The corpus made available includes two main catego...
This corpus consists of approximately 22 hours of ...
This corpus consists of audio files covering rough...
The CallFriend Spanish corpus of telephone speech ...
This corpus includes 240 hours of Catalan speech f...
The Pansori TEDxKR Corpus is a Korean speech recog...
This dataset consists of 6 actors who recite 14 se...
This speech material contains 2,656 acted utteranc...
This database was created by Nordic Language Techn...
This database contains speech data for Danish, mad...
FT Speech is a new speech corpus created from the ...
LaPS is a dataset used by the Fala Brasil group to...
The M-AILABS Speech Dataset is the first large dat...
26 text passages read by 10 speakers, covering 4 m...
The Acted Emotional Speech Dynamic Database (AESDD...
Microsoft Speech Corpus (Indian languages) release...
The Tunisian_MSA corpus was originally collected t...
400 people from different accent areas in China ar...
The Malayalam Speech Corpus (MSC) is one of the fi...
This data set contains transcribed high-quality au...
Multilingual LibriSpeech (MLS) is a large-scale, open...
BABEL was a joint European project under the COPER...
A "Crowd-Built" continuously growing speech datase...
The Microsoft Speech Language Translation Corpus r...