TY - CONF
T1 - ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
AU - Heilbron, Fabian Caba
AU - Castillo, Victor
AU - Ghanem, Bernard
AU - Niebles, Juan Carlos
N1 - KAUST Repository Item: Exported on 2020-04-23
Acknowledgements: IEEE Computer Society, Computer Vision Foundation - CVF
PY - 2015/10/15
Y1 - 2015/10/15
AB - In spite of many dataset efforts for human action recognition, current computer vision algorithms are still severely limited in terms of the variability and complexity of the actions that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on simple actions and movements occurring in manually trimmed videos. In this paper, we introduce ActivityNet, a new large-scale video benchmark for human activity understanding. Our benchmark aims at covering a wide range of complex human activities that are of interest to people in their daily living. In its current version, ActivityNet provides samples from 203 activity classes with an average of 137 untrimmed videos per class and 1.41 activity instances per video, for a total of 849 video hours. We illustrate three scenarios in which ActivityNet can be used to compare algorithms for human activity understanding: untrimmed video classification, trimmed activity classification, and activity detection.
UR - http://hdl.handle.net/10754/556141
UR - https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Heilbron_ActivityNet_A_Large-Scale_2015_CVPR_paper.pdf
UR - http://www.scopus.com/inward/record.url?scp=84959216468&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2015.7298698
DO - 10.1109/CVPR.2015.7298698
M3 - Conference contribution
AN - SCOPUS:84959216468
SN - 9781467369640
SP - 961
EP - 970
BT - Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -