Computer Integrated Manufacturing System ›› 2025, Vol. 31 ›› Issue (7): 2438-2445. DOI: 10.13196/j.cims.2024.0393

Zero-shot defect detection method based on spatial semantic guidance

SONG Yanan1,2, PAN Baisong1,2+, YI Wenchao2, ZHANG Biao3

  1. Zhejiang Key Laboratory of High-Precision and Efficiency Hybrid Processing Technology and Equipment, Zhejiang University of Technology
  2. Key Laboratory of Special Purpose Equipment and Advanced Processing Technology, Ministry of Education, Zhejiang University of Technology
  3. School of Computer Science, Liaocheng University
  • Online: 2025-07-31 Published: 2025-08-04
  • Supported by:
    Project supported by the National Natural Science Foundation, China (No. 52005447), the Zhejiang Provincial Natural Science Foundation, China (No. LQN25E050014), and the Research Project of Department of Education of Zhejiang Province, China (No. Y202455016).

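  • Author biographies:
    SONG Yanan (1991-), male, born in Zhumadian, Henan; lecturer, Ph.D. Research interests: machine vision and its applications. E-mail: ynsong@zjut.edu.cn

    +PAN Baisong (1968-), male, born in Wenling, Zhejiang; professor, Ph.D., doctoral supervisor. Research interests: intelligent manufacturing and reliability engineering technology. Corresponding author. E-mail: panbsz@zjut.edu.cn

    YI Wenchao (1989-), female, born in Wuhan, Hubei; lecturer, Ph.D., master's supervisor. Research interests: intelligent optimization algorithms and their applications. E-mail: yiwenchao@zjut.edu.cn

    ZHANG Biao (1990-), male, born in Yanggu, Shandong; associate professor, Ph.D., master's supervisor. Research interests: machine learning and intelligent optimization. E-mail: zhangbiao@lcu-cs.com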

Abstract: Existing vision-language models focus too heavily on object-category semantics and neglect the fine-grained perception of local spatial defect regions. To address this problem, a zero-shot defect detection method based on spatial semantic guidance was proposed. A spatial semantic guidance network was designed to extract semantic distribution features from images, and these features were added to the visual encoding network of the vision-language model. Highly general learnable text prompts were designed for the normal and defect states, and the corresponding text embeddings were extracted by the designed text encoding network. The defect heat map was then predicted from the cosine similarity between the text embeddings and the multi-stage visual features. The proposed method achieved pixel-level defect detection accuracy of 88.5%, 95.3%, 97.0%, and 91.6% on the MVTec, VisA, MPDD, and BTAD datasets, respectively. The experimental results showed that the proposed method had strong zero-shot defect detection performance.
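
The heat-map step described in the abstract can be illustrated with a minimal sketch. It assumes each visual stage yields one feature vector per image patch and that the two text embeddings (normal, defect) live in the same space. The function names, the two-way softmax, the temperature of 0.07, and the plain averaging across stages are illustrative assumptions borrowed from common CLIP-style practice; the abstract states only that cosine similarity between text embeddings and multi-stage visual features is used.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def defect_probability(patch, normal_text, defect_text, temperature=0.07):
    """Two-way softmax over the state similarities; returns P(defect).

    Assumption: the softmax and the temperature value are not taken from
    the abstract; they follow common CLIP-style practice.
    """
    s_normal = cosine(patch, normal_text) / temperature
    s_defect = cosine(patch, defect_text) / temperature
    m = max(s_normal, s_defect)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - m)
    e_defect = math.exp(s_defect - m)
    return e_defect / (e_normal + e_defect)


def defect_heat_map(stage_features, normal_text, defect_text, height, width):
    """Average per-patch defect probabilities over all visual stages.

    stage_features: list of stages, each a row-major list of
    height * width patch vectors. Averaging is an assumed fusion rule.
    """
    if not stage_features:
        raise ValueError("at least one visual stage is required")
    heat = [[0.0] * width for _ in range(height)]
    for patches in stage_features:
        if len(patches) != height * width:
            raise ValueError("each stage needs height * width patch vectors")
        for idx, patch in enumerate(patches):
            row, col = divmod(idx, width)
            heat[row][col] += defect_probability(patch, normal_text, defect_text)
    n_stages = len(stage_features)
    return [[value / n_stages for value in row] for row in heat]


if __name__ == "__main__":
    # Toy 2-D embeddings: axis 0 stands for "normal", axis 1 for "defect".
    normal_text = [1.0, 0.0]
    defect_text = [0.0, 1.0]
    stage_a = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [1.0, 0.2]]
    stage_b = [[0.8, 0.0], [1.0, 0.1], [0.0, 0.7], [0.9, 0.1]]
    heat = defect_heat_map([stage_a, stage_b], normal_text, defect_text, 2, 2)
    for row in heat:
        print([round(value, 3) for value in row])
```

In the toy example only the third patch (row 1, column 0) points toward the defect embedding, so it is the only cell whose averaged probability exceeds 0.5. A real pipeline would upsample the patch grid to the image resolution before thresholding.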

Key words: zero-shot defect detection, vision-language model, learnable prompt, semantic guidance
