Computer Integrated Manufacturing Systems, 2014, Vol. 20, Issue (1): 96-. DOI: 10.13196/j.cims.2014.01.mabaizhang.0096.8.20140112

Product feature extraction method for online reviews based on the Latent Dirichlet Allocation (LDA) model

Ma Baizhang (马柏樟), Yan Zhijun (颜志军)

  1. School of Management and Economics, Beijing Institute of Technology
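  • Published: 2014-01-25  Online: 2014-01-25
  • Supported by:
    National Natural Science Foundation of China (No. 71128003, 70972006, 71102111); Program for New Century Excellent Talents in University, China (No. NCET-11-0792)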

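Abstract: Product feature extraction in online review mining suffers from low accuracy, heavy reliance on manual work, and difficulty in handling colloquial expressions. To address these problems, a product feature extraction method based on the Latent Dirichlet Allocation (LDA) model is proposed. The method first applies a Chinese word segmentation tool to segment the online reviews and tag their parts of speech, which yields an initial set of candidate product feature nouns. An LDA text model is then trained to screen this set down to a set of candidate product feature words, and the final product feature set is obtained through expansion with the Tongyici Cilin synonym thesaurus followed by a set of filtering rules. Comparative experiments on camera and mobile phone reviews collected from JD.com verify the effectiveness of the proposed method.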

Key words: online reviews|product feature extraction|Latent Dirichlet Allocation|data mining
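The abstract describes a three-stage pipeline: noun extraction from segmented and POS-tagged reviews, LDA-based screening of candidate features, and synonym expansion plus rule-based filtering. The following is a minimal, self-contained sketch of that pipeline, not the authors' implementation. Several assumptions are not from the paper: reviews arrive already segmented and tagged as (word, tag) pairs, with tags starting with "n" marking nouns; LDA is fitted with a small collapsed Gibbs sampler instead of any particular library; candidates are the top-N words of each topic; and a tiny hand-made synonym dictionary and stop-word list stand in for Tongyici Cilin and the paper's filtering rules. The helper names (`extract_nouns`, `lda_gibbs`, `candidate_features`, `expand_and_filter`), parameter values, and toy reviews are all hypothetical.

```python
import random

# Hypothetical input convention (assumption): each review has already been
# segmented and POS-tagged into (word, tag) pairs; tags starting with "n" are nouns.


def extract_nouns(tagged_reviews):
    """Stage 1: keep only the nouns of each review (one noun 'document' per review)."""
    return [[w for w, tag in review if tag.startswith("n")] for review in tagged_reviews]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Stage 2: fit LDA by collapsed Gibbs sampling; return (vocab, topic-word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]       # tokens of topic k in document d
    nkw = [[0] * V for _ in range(num_topics)]   # tokens of word w assigned to topic k
    nk = [0] * num_topics                        # total tokens assigned to topic k
    z = []                                       # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha)(n_tw + beta) / (n_t + V*beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return vocab, nkw


def candidate_features(vocab, nkw, top_n):
    """Take the top_n highest-count words of every topic as candidates (assumed rule)."""
    candidates = set()
    for counts in nkw:
        ranked = sorted(range(len(vocab)), key=lambda v: (-counts[v], vocab[v]))
        candidates.update(vocab[v] for v in ranked[:top_n] if counts[v] > 0)
    return candidates


def expand_and_filter(candidates, synonyms, stop_words, min_len=2):
    """Stage 3: add synonyms of each candidate, then apply simple filtering rules."""
    expanded = set(candidates)
    for w in candidates:
        expanded.update(synonyms.get(w, ()))
    # Hypothetical rules: drop stop words and single-character words.
    return {w for w in expanded if w not in stop_words and len(w) >= min_len}


# Toy data invented for illustration; not taken from the paper's JD.com corpus.
reviews = [
    [("像素", "n"), ("很", "d"), ("高", "a")],
    [("电池", "n"), ("不", "d"), ("耐用", "a")],
    [("屏幕", "n"), ("清晰", "a"), ("东西", "n")],
    [("镜头", "n"), ("像素", "n"), ("好", "a")],
    [("电池", "n"), ("屏幕", "n"), ("机", "n")],
]
synonyms = {"电池": ["电板"], "屏幕": ["显示屏"]}  # stand-in for Tongyici Cilin
stop_words = {"东西"}

docs = extract_nouns(reviews)
vocab, nkw = lda_gibbs(docs, num_topics=2, iters=100, seed=1)
features = expand_and_filter(candidate_features(vocab, nkw, top_n=2), synonyms, stop_words)
print(sorted(features))
```

Taking the top-N words of each topic is only one plausible way to turn an LDA fit into candidate features; the abstract does not state the paper's actual selection criterion or filtering rules. For real review corpora, a tested LDA implementation would normally replace this toy sampler.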
