Action recognition using spatio-temporal distance classifier correlation filter
- Authors: Haq, Anwaar-Ul , Gondal, Iqbal , Murshed, Manzur
- Date: 2011
- Type: Text , Conference proceedings
- Relation: 2011 International Conference on Digital Image Computing Techniques and Applications (DICTA), Noosa, QLD, 6th-8th Dec, 2011
- Full Text: false
- Reviewed:
- Description: The problem of recognizing human actions is characterized by complex dynamics and strong variations in their execution. Despite this, space-time correlations provide valuable clues for their discrimination. Therefore, space-time correlators such as \emph{Maximum Average Correlation Height} (MACH) filters have been used successfully for action recognition, with encouraging results. However, their utility is challenged by a number of factors: (i) these filters are trained for only one class at a time, and separate filters are required for each class, increasing computational overhead; (ii) these filters simply take the average of similar action instances and behave no better than average filters; and (iii) misaligned action datasets create problems for these filters, as they are not shift-invariant. In this paper, we address these issues by posing action recognition as a multi-class discrimination problem and propose a \emph{single} 3D frequency-domain filter, named Action ST-DCCF, for multiple action classes that mitigates the inherent discrepancies of correlation filters. It presents a different interpretation of correlation filters as a method of applying a spatio-temporal transformation to the data rather than simply minimizing correlation energy across all possible shifts. Experiments on a variety of action datasets are performed to evaluate our approach. Experimental results are comparable to existing action recognition approaches.
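The frequency-domain correlation underlying MACH-style filters can be illustrated with a toy sketch: by the correlation theorem, cross-correlating a template against a scene reduces to an element-wise product of spectra, and a shifted copy of the template yields a correlation peak at the shift. This is a minimal illustration of the general principle only, not the Action ST-DCCF filter itself; the volume shapes and random data are assumptions for demonstration.

```python
import numpy as np

def correlate_freq(template, scene):
    """Cross-correlate two equal-shape volumes via the frequency domain:
    corr = IFFT( FFT(scene) * conj(FFT(template)) )."""
    F_t = np.fft.fftn(template)
    F_s = np.fft.fftn(scene)
    return np.fft.ifftn(F_s * np.conj(F_t)).real

# Toy space-time volume (x, y, t); a circularly shifted copy of the
# template produces a correlation peak exactly at the applied shift.
rng = np.random.default_rng(0)
template = rng.standard_normal((8, 8, 8))
scene = np.roll(template, shift=(2, 3, 1), axis=(0, 1, 2))
corr = correlate_freq(template, scene)
peak = np.unravel_index(np.argmax(corr), corr.shape)
print(peak)  # → (2, 3, 1)
```

This shift-locating behavior is also why misalignment matters for filters that are not designed to be shift-invariant: the response peak moves with the data.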
Contextual action recognition in multi-sensor nighttime video sequences
- Authors: Haq, Anwaar-Ul , Gondal, Iqbal , Murshed, Manzur
- Date: 2011
- Type: Text , Conference paper
- Relation: Proceedings of the 2011 Digital Image Computing: Techniques and Applications (DICTA 2011), Noosa, QLD, 6th-8th Dec, 2011, p. 256-261
- Full Text: false
- Reviewed:
- Description: Contextual information is important for interpreting human actions, especially when actions exhibit an interactive relationship with their context. Contextual clues become even more crucial when videos are captured in unfavorable conditions such as extreme low-light nighttime scenarios. These conditions encourage the use of multi-sensor imagery and context enhancement. In this paper, we explore the importance of contextual knowledge for recognizing human actions in multi-sensor nighttime videos. Information fusion is utilized for encapsulating visual information about actions and their context. Space-time action information is captured using the 3D Fourier transform of the fused action silhouette volume. In parallel, SIFT context images are extracted and fused using principal component analysis based feature fusion for each action class. Contextual dissimilarity is penalized by minimizing context SIFT flow energy. The action dataset comprises multi-sensor night-vision video data from the infra-red and visible spectrum. Experimental results show that fused contextual action information boosts action recognition performance as compared to the baseline action recognition approach.
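The space-time encoding step described above, taking the 3D Fourier transform of an action silhouette volume, can be sketched minimally as follows. The volume dimensions, the random binary silhouette, and the magnitude-normalization are illustrative assumptions, not the paper's exact pipeline; the sketch only shows why an FFT-magnitude descriptor is robust to spatial misalignment.

```python
import numpy as np

def spacetime_descriptor(volume):
    """Describe a (H, W, T) silhouette volume by its normalized
    3D FFT magnitude; the magnitude discards phase, so circular
    shifts of the volume leave the descriptor unchanged."""
    spectrum = np.abs(np.fft.fftn(volume))
    return spectrum.ravel() / np.linalg.norm(spectrum)

# Toy binary silhouette volume (height, width, time) -- an assumption
# for demonstration, not real action data.
rng = np.random.default_rng(1)
silhouette = (rng.random((16, 16, 10)) > 0.5).astype(float)
shifted = np.roll(silhouette, shift=3, axis=1)  # spatial misalignment

d1 = spacetime_descriptor(silhouette)
d2 = spacetime_descriptor(shifted)
print(np.allclose(d1, d2))  # → True
```

Because the FFT magnitude is invariant to circular shifts, the two descriptors match despite the misalignment, which is the property that makes such frequency-domain encodings attractive for loosely aligned action clips.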