Abstract: Traditional video action detectors typically adopt a two-stage pipeline, in which a person detector first generates actor boxes and 3D RoIAlign is then used to extract ...
Abstract: Recognizing actions directly from compressed video offers significant advantages, such as low storage demands, efficient decoding, and fast inference. Existing methods on compressed ...