Computer Integrated Manufacturing System, 2025, Vol. 31, Issue (9): 3228-3244. DOI: 10.13196/j.cims.2024.0311


Semi-supervised quality prediction for batch processes based on large-scale pseudo label optimization

JIN Huaiping1,2, ZHAO Pengfei3, RAO Feihong3+, YANG Biao1,2, QIAN Bin1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology
  2. Higher Educational Key Laboratory for Industrial Intelligence and Systems of Yunnan Province
  3. Kunming Branch of the 705 Research Institute of China Shipbuilding
  • Online: 2025-09-30 Published: 2025-10-11
  • Supported by:
    Project supported by the National Natural Science Foundation of China (No. 62163019), the Applied Basic Research Programs of Yunnan Province, China (No. 202101AT070096), and the Xingdian Talent Support Program of Yunnan Province, China (No. KKRD202203073).

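  • About the authors:
    JIN Huaiping (1987-), male, native of Xuanwei, Yunnan; professor, Ph.D., doctoral supervisor. Research interests: industrial process modeling, soft sensor technology and applications, machine learning and data mining. E-mail: jinhuaiping@126.com;

    ZHAO Pengfei (1985-), male, native of Yi County, Hebei; senior engineer, master's degree. Research interests: artificial intelligence. E-mail: oyueyueniao@126.com;

    +RAO Feihong (1998-), male, native of Chuxiong, Yunnan; engineer, master's degree. Research interests: artificial intelligence, complex industrial process modeling, data mining and optimal control. Corresponding author. E-mail: raofeihong7@163.com;

    YANG Biao (1974-), male, native of Qujing, Yunnan; professor, Ph.D., doctoral supervisor. Research interests: theory and methods of intelligent optimization and scheduling. E-mail: yb_chenggong@163.com;

    QIAN Bin (1976-), male, native of Kunming, Yunnan; professor, Ph.D., doctoral supervisor. Research interests: theory and methods of intelligent optimization and scheduling. E-mail: bin.qian@vip.163.com.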


Abstract: Batch processes often have scarce labeled data and abundant unlabeled data, which degrades the prediction performance of traditional supervised quality prediction models. To address this issue, a Two-stage Search based Large-scale Pseudo Label Optimization method (TS-LPLO) was proposed to alleviate label scarcity, and on this basis a semi-supervised quality prediction model, semi-supervised TS-LPLO (SSTS-LPLO), was developed. TS-LPLO recast label estimation for unlabeled samples as a large-scale pseudo label distribution optimization problem whose objective function is defined by prediction accuracy and probability distribution similarity. A two-stage approach combining fast global search with refined local search was proposed to solve this optimization problem: the global search quickly locates the region likely to contain the optimal solution, and a random evolution mechanism introduced into the local search improves the quality of that solution. Finally, the optimized pseudo-labeled data were added to the labeled training data to build a semi-supervised quality prediction model, which was then used for online prediction of key quality characteristics. The effectiveness and superiority of the proposed methods were verified on an industrial chlortetracycline fermentation process. Compared with traditional methods, TS-LPLO improved pseudo label estimation accuracy by 49.79%, and SSTS-LPLO improved quality prediction accuracy by 19%.

Key words: semi-supervised quality prediction, pseudo label, distribution estimation, large-scale optimization, batch process
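
The workflow summarized in the abstract (treat the pseudo labels as decision variables, score them by prediction accuracy plus distribution similarity, then search in two stages) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the ridge regressor, the quantile-based mismatch term, the weight `lam`, uniform sampling as the global stage, greedy random mutation as the local stage, the synthetic data, and the names `objective` and `two_stage_search`. The actual TS-LPLO objective and search operators are not specified in the abstract.

```python
import numpy as np

def ridge_fit(X, y, alpha=1e-2):
    """Closed-form ridge regression with an appended bias column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def ridge_predict(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def objective(z, X_u, X_l, y_l, lam=0.5):
    """Assumed objective (lower is better): accuracy term + distribution term."""
    # Accuracy: a model trained only on pseudo-labeled data is scored on true labels.
    w = ridge_fit(X_u, z)
    rmse = np.sqrt(np.mean((ridge_predict(w, X_l) - y_l) ** 2))
    # Similarity: compare quantiles of the pseudo labels with those of the true labels.
    q = np.linspace(0.05, 0.95, 10)
    mismatch = np.mean(np.abs(np.quantile(z, q) - np.quantile(y_l, q)))
    return float(rmse + lam * mismatch)

def two_stage_search(X_u, X_l, y_l, n_global=200, n_local=2000, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = float(y_l.min()), float(y_l.max())

    def f(z):
        return objective(z, X_u, X_l, y_l)

    # Stage 1 (global, assumed): sample label vectors uniformly, keep the best one.
    candidates = rng.uniform(lo, hi, size=(n_global, X_u.shape[0]))
    scores = np.array([f(z) for z in candidates])
    best_z, best_f = candidates[scores.argmin()].copy(), float(scores.min())
    f_global = best_f

    # Stage 2 (local, assumed): mutate a random subset of labels, accept improvements only.
    for _ in range(n_local):
        mask = rng.random(best_z.size) < 0.1
        if not mask.any():
            continue
        trial = best_z.copy()
        trial[mask] += rng.normal(0.0, 0.1 * (hi - lo), size=int(mask.sum()))
        trial = np.clip(trial, lo, hi)
        f_trial = f(trial)
        if f_trial < best_f:
            best_z, best_f = trial, f_trial
    return best_z, f_global, best_f

# Synthetic stand-in data (hypothetical): 15 labeled and 60 unlabeled samples.
rng = np.random.default_rng(1)
w_true = np.array([1.5, -2.0, 0.5])
X_l = rng.normal(size=(15, 3))
X_u = rng.normal(size=(60, 3))
y_l = X_l @ w_true + 0.1 * rng.normal(size=15)

z, f_global, f_final = two_stage_search(X_u, X_l, y_l)

# Final step: extend the labeled set with the optimized pseudo labels and refit.
X_aug = np.vstack([X_l, X_u])
y_aug = np.concatenate([y_l, z])
w_semi = ridge_fit(X_aug, y_aug)
print(f"objective after global stage: {f_global:.3f}, after local stage: {f_final:.3f}")
```

Because the local stage accepts a mutation only when it lowers the objective, the final score can never be worse than the score from the global stage. How much it improves depends entirely on the assumed objective and mutation settings, so the numbers printed by this toy example say nothing about the accuracy gains reported for the paper's method.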

