Computer Integrated Manufacturing System, 2025, Vol. 31, Issue (6): 2084-2097. DOI: 10.13196/j.cims.2024.0303


Interpretable visual question answering method for defect recognition

ZUO Daiyue, JIANG Wenbo, ZHENG Hangbin, BAO Jinsong+

  1. School of Mechanical Engineering, Donghua University
  • Online: 2025-06-30; Published: 2025-07-08

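  • About the authors:
    ZUO Daiyue (2000-), female, from Shanghai, master's student. Research interests: multimodal intelligence, industrial quality inspection. E-mail: zuody@mail.dhu.edu.cn;

    JIANG Wenbo (1999-), male, from Guilin, Guangxi, software R&D engineer. Research interest: intelligent manufacturing. E-mail: wenbo_jiang@163.com;

    ZHENG Hangbin (1998-), male, from Shaoxing, Zhejiang, Ph.D. candidate. Research interests: multimodal intelligence, digital twins. E-mail: zhb@mail.dhu.edu.cn;

    +BAO Jinsong (1972-), male, from Lujiang, Anhui, professor, Ph.D., corresponding author. Research interests: industrial intelligence, intelligent manufacturing systems. E-mail: bao@dhu.edu.cn.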

Abstract: To address the limited interpretability of deep learning approaches in photovoltaic (PV) panel defect recognition, an interpretable PV defect visual question answering framework driven by the fusion of data and knowledge was proposed. First, a tandem deep learning model was used to perform PV panel defect recognition. Then, an image-text multimodal model was fine-tuned to learn expert knowledge for evaluating the explanation heatmaps that the Grad-CAM method produces for the detection model. Finally, a dedicated prompt template was designed to integrate information from multiple stages into natural-language dialogues, so that a multimodal large language model could interpret the defect recognition model and extend the applicability of its results to industrial scenarios. The framework enhanced the interpretability of the defect detection model, achieved accurate and reliable visual question answering, and improved the efficiency and usability of PV panel defect recognition.

Key words: photovoltaic panel, defect recognition, multi-modal, interpretability, visual reasoning
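Grad-CAM, the explanation method named in the abstract, weights each feature map of a convolutional layer by the spatially averaged gradient of the target class score and keeps only the positive part of the weighted sum. The sketch below illustrates that computation in NumPy; it is not the authors' implementation. It assumes the activations and gradients of one layer of the detection model have already been extracted (for example with framework hooks), and the function name `grad_cam` is hypothetical. Upsampling the map to the input resolution and overlaying it on the PV image are omitted.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Hypothetical helper: a Grad-CAM sketch, not the authors' code.

    activations: feature maps A^k of one conv layer, shape (K, H, W)
    gradients:   d(y^c)/d(A^k) for the target class score y^c, same shape
    Returns an (H, W) heatmap scaled to [0, 1] (all zeros if nothing is positive).
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel weights alpha_k: global average pooling of the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum over channels, then ReLU keeps only positive evidence for class c.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # shape (H, W)
    # Normalize by the peak so maps from different images are comparable.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


if __name__ == "__main__":
    # Toy input: K=2 channels of 2x2 feature maps (made-up numbers, not from the paper).
    acts = np.array([[[1.0, 0.0], [0.0, 2.0]],
                     [[0.0, 3.0], [1.0, 0.0]]])
    grads = np.stack([np.full((2, 2), 0.5), np.full((2, 2), -0.25)])
    print(grad_cam(acts, grads))
```

In the framework described above, heatmaps of this kind are what the fine-tuned image-text model learns to evaluate against expert judgments.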

