Journal of Nanjing University (Natural Sciences) ›› 2014, Vol. 50 ›› Issue (2): 228–.


Weak-modality image concept classification and detection assisted by a strong modality

Zou Xiaochuan, Ye Hanjia, Zhan Dechuan*

  • Publication date: 2014-04-07 Release date: 2014-04-07
  • Funding:
    National Natural Science Foundation of China, Young Scientists Fund (61105043); Jiangsu Provincial Foundation, General Program (BK2011566)

Weak model image classification and object detection with affluent strong model information

  • Online:2014-04-07 Published:2014-04-07
  • About author:Zou Xiaochuan, Ye Hanjia, Zhan Dechuan

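Abstract: Detecting the main object in an image, or classifying the image, requires several different types of image features, such as HoG and BoW. From the perspective of multi-modal learning, this amounts to exploiting features from several different channels. Although using multiple features together can improve object detection or image classification performance, extracting the features of multiple modalities takes a great deal of time, which makes it hard to meet the demands of settings with strict real-time requirements (mobile devices, web search, etc.). This paper proposes using strong-modality features during training to assist the learning of a weaker modality: the weak-modality classifier is made to agree with the strong modality on a large number of unlabeled samples, which strengthens its generalization performance. In the test phase, only the weak-modality features need to be extracted, and the classifier built on them can make predictions with good results. Experiments on the INRIA person and caltech101 data show that, because the method uses only the weak-modality features at test time, which are relatively cheap to extract, it can be applied in settings with strict real-time requirements while also improving generalization performance.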

Abstract: Object detection and image classification involve different kinds of features, such as HoG and BoW. From the perspective of multi-modal learning, these tasks can be viewed as learning with different channels of features. In concrete applications, however, different modalities use different features, and extracting each modality's features takes a lot of time. As a result, most learning models cannot be applied in certain situations (e.g. on mobile devices, or in search engines that face large-scale data). It is usually the case that a strong modality, which achieves good accuracy, relies on costly features, whereas a weak modality works fast but may perform worse. This article introduces a new multi-modal learning method that uses the informative strong modality to help the weak modality learn, by minimizing the gap between the two models' predictions on unlabeled data. In the training phase, both the strong and the weak modality are trained, and the weak modality is adjusted so that its predictions on a large amount of unlabeled data are similar to those of the strong modality. In the test phase, only the weak modality's features (i.e. features with low extraction cost) are needed. Our experiments on INRIA person and caltech101 show that the proposed method works efficiently and effectively on common computer vision tasks and that, given plenty of unlabeled data, the weak modality can even outperform the strong modality in some cases.
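
The training scheme described in the abstract can be illustrated with a small sketch. The abstract does not state the exact objective or the classifiers used, so everything below is an assumption made for illustration: both modalities are modeled as logistic regressions, the "prediction gap" is taken to be a cross-entropy between the weak model's outputs and the strong model's soft predictions on unlabeled data, and the data are synthetic. The names `make_data`, `train_logreg`, `predict_proba`, and the weight `lam` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clip the logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def add_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_logreg(X, y, X_u=None, targets_u=None, lam=0.0, lr=0.5, steps=2000):
    """Logistic regression fitted by plain gradient descent.

    Minimises the cross-entropy on the labeled pairs (X, y). If unlabeled
    features X_u and soft targets targets_u are given, lam times the
    cross-entropy between the model's outputs on X_u and targets_u is added
    (an assumed stand-in for the paper's prediction-gap term).
    """
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    Xub = add_bias(X_u) if X_u is not None else None
    for _ in range(steps):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        if Xub is not None:
            grad += lam * Xub.T @ (sigmoid(Xub @ w) - targets_u) / len(targets_u)
        w -= lr * grad
    return w

def predict_proba(w, X):
    return sigmoid(add_bias(X) @ w)

def make_data(n, rng):
    # Synthetic two-view data (assumption): the "strong" view is informative
    # but stands for costly features; the "weak" view is cheap and noisier.
    y = rng.integers(0, 2, size=n).astype(float)
    signal = (2.0 * y - 1.0)[:, None]
    X_strong = signal + 0.5 * rng.normal(size=(n, 5))
    X_weak = 0.5 * signal + rng.normal(size=(n, 2))
    return X_strong, X_weak, y

rng = np.random.default_rng(0)
Xs_l, Xw_l, y_l = make_data(20, rng)      # few labeled samples
Xs_u, Xw_u, _ = make_data(2000, rng)      # many unlabeled samples
Xs_t, Xw_t, y_t = make_data(1000, rng)    # held-out test samples

# Training phase: fit the strong model, then let it guide the weak model.
w_strong = train_logreg(Xs_l, y_l)
soft = predict_proba(w_strong, Xs_u)      # strong predictions on unlabeled data
w_weak_base = train_logreg(Xw_l, y_l)     # weak model without assistance
w_weak_assisted = train_logreg(Xw_l, y_l, Xw_u, soft, lam=1.0)

# Test phase: only the cheap weak-view features are used.
for name, w in [("weak baseline", w_weak_base), ("weak assisted", w_weak_assisted)]:
    acc = np.mean((predict_proba(w, Xw_t) > 0.5) == (y_t > 0.5))
    print(f"{name}: test accuracy = {acc:.3f}")
```

Note that the strong-view features (`Xs_t`) are never touched at prediction time; the only extra cost of the assisted model is paid during training, when the strong model labels the unlabeled pool. Whether the assisted model actually beats the baseline on a given dataset depends on the data and on `lam`, which would normally be tuned on a validation set.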
