Our paper presents a new approach for temporal detection of human actions in long, untrimmed video sequences. We introduce Single-Stream Temporal Action Proposals (SST), a new, effective, and efficient deep architecture for generating temporal action proposals. Our network can run continuously in a single stream over very long input video sequences, without the need to divide the input into short overlapping clips or temporal windows for batch processing. We demonstrate empirically that our model outperforms the state of the art on the task of temporal action proposal generation while achieving some of the fastest processing speeds in the literature. Finally, we demonstrate that using SST proposals in conjunction with existing action classifiers yields improved state-of-the-art temporal action detection performance.
|Original language||English (US)|
|Title of host publication||2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)|
|Publisher||Institute of Electrical and Electronics Engineers (IEEE)|
|Number of pages||10|
|State||Published - Nov 9 2017|