Journal of Nanjing University (Natural Sciences) ›› 2012, Vol. 48 ›› Issue (1): 77–83.


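 Event recognition based on a bag of local space-time interest point features*

 Du Ji-Xiang**, Guo Yi-Lan, Zhai Chuan-Ming

  • Online: 2015-05-16 Published: 2015-05-16
  • About author: (College of Computer Science and Technology, Huaqiao University, Xiamen, 361021, China)
  • Funding:
     National Natural Science Foundation of China (60805021, 61175121), Program for New Century Excellent Talents in University of the Ministry of Education (NCET-10-0117), Natural Science Foundation of Fujian Province (2011J01349), Program for Outstanding Young Scientific Research Talents in Universities of Fujian Province (JA10006), Science and Technology Project of the Fujian Provincial Department of Education (JA11004), Scientific Research Fund of the Overseas Chinese Affairs Office of the State Council (11QZR05), Fundamental Research Funds of Huaqiao University (JB-SJ1003)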

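Summary:  This paper proposes a method for retrieving and recognizing complex events in movies based on a bag of local space-time interest point features. The method first represents an individual event video sequence as a bag of local space-time interest point features, and then combines this bag of features with a support vector machine to recognize events. The method uses local space-time feature descriptors to capture local events in video, and it can adapt to different sizes and velocities of event patterns. To verify its effectiveness, the Hollywood video dataset is used, whose shot sequences are collected from 32 different Hollywood movies and cover 8 event classes. Compared with other related methods, the experimental results show that the proposed method clearly improves the average accuracy and the average precision.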

Abstract:  Event detection aims to find interesting events that attract the attention of users. Much current research on video event detection depends strongly on domain-specific knowledge and prior models, which makes it difficult to apply to other domains or even to other databases. The aim of this paper is to address the recognition of natural events in diverse and realistic video settings. We propose a novel method based on a bag of local space-time interest point features to recognize and retrieve complex events in real movies. In this method, an individual video sequence is represented as a bag of local space-time features, and this bag of features is then combined with a support vector machine to recognize events. Local space-time features have become a popular video representation for event recognition. They capture characteristic shape and motion in video and provide a representation of events that is relatively independent of their spatio-temporal shifts and scales, as well as of background clutter and multiple motions in the scene. The local representation is computed in a bottom-up fashion: spatio-temporal interest points are detected first, local patches are computed around these points, and finally the patches are combined into a final representation. Space-time interest points are locations in space and time where changes of movement occur in the video, and these locations are assumed to be more informative for recognition. Local space-time features are therefore introduced to capture the local events in video, and they can also adapt to the size and velocity of the event pattern. Since the number of frames differs from one video clip to another and the number of space-time interest points varies from frame to frame, we adopt the bag-of-features idea to unify the representation of video sequences. For every event class, the local spatio-temporal features used for training are first quantized into visual words, and a video is then represented as a frequency histogram over the visual words. To evaluate the effectiveness of this method, this paper uses the Hollywood dataset, whose shot sequences were collected from 32 different Hollywood movies; the training samples come from 12 movies and the testing samples come from another 20 movies. The dataset includes 8 event classes: answering the phone, getting out of the car, hand shaking, hugging, kissing, sitting down, sitting up and standing up. The presented results show that the proposed method clearly improves the average accuracy and average precision compared with other related approaches.
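
The bag-of-features step described in the abstract can be illustrated with a minimal Python sketch. It only shows how clips with different numbers of interest points map to fixed-length visual-word histograms. The codebook, the two-dimensional descriptors and the function names are hypothetical toy values, not the authors' implementation; in practice the codebook would be learned from training descriptors, and the resulting histograms would be passed to a support vector machine.

```python
from collections import Counter
from math import dist


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bag_of_features(descriptors, codebook):
    """Normalized visual-word frequency histogram for one video clip."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    total = sum(counts.values())
    if total == 0:
        # A clip without interest points yields an all-zero histogram.
        return [0.0] * len(codebook)
    return [counts.get(k, 0) / total for k in range(len(codebook))]


# Hypothetical toy data: a 3-word codebook and two clips that contain
# different numbers of interest-point descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
clip_a = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
clip_b = [(0.0, 0.1), (0.05, 0.9)]

hist_a = bag_of_features(clip_a, codebook)
hist_b = bag_of_features(clip_b, codebook)
print(hist_a)
print(hist_b)
```

Both histograms have the same length as the codebook regardless of how many interest points each clip contains, which is what allows clips of different durations to be compared by a single classifier.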








