Improving Root Cause Analysis of Production Defect Using AI: A Case Study in an Automotive Manufacturing Plant

Muhammad Najib, Emon Rifa'i

Abstract

In automotive manufacturing, the same defects tend to recur across different time periods, producing a valuable historical dataset that pairs defect names with their corresponding root causes. Traditionally, identifying the root cause of a production defect relied heavily on human analysis, which required significant time and on-site inspection. This often delayed countermeasures, increased production downtime, and caused further problems such as line stops. This study presents an AI-based approach that uses historical defect data to assist root cause analysis, with the aim of reducing analysis time and improving feedback accuracy. To enable faster and more accurate identification of root causes, a machine learning model was integrated into the factory's defect recording system, ATPPM (Analisa Tindakan Penanggulangan dan Pencegahan Masalah, roughly "Problem Countermeasure and Prevention Action Analysis"). The development process involved data preprocessing, model training, and API deployment. The original dataset of 3,128 records was cleaned and reduced to 1,449 labeled entries, each annotated with one of 161 unique root cause labels. Eleven machine learning models were evaluated, including Logistic Regression, Random Forest, Support Vector Machine (SVM), and a Recurrent Neural Network (RNN). In the initial evaluation using F1-score, precision, and recall, Logistic Regression achieved the best F1-score of 0.83. Further validation with 5-fold cross-validation identified SVM as the best-performing model, with an average accuracy of 89.1%. This model was deployed via a Python Flask API and integrated into the existing ATPPM system. The AI-powered system substantially accelerated root cause analysis, reducing the average analysis time by 228 minutes.
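The pipeline summarized above (clean the historical records, represent each defect name as features, train a classifier that maps it to a root cause label, and score it with 5-fold cross-validation) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the feature extraction, so a lowercase bag-of-words representation is assumed; a multinomial Naive Bayes classifier stands in for the deployed SVM purely to keep the sketch runnable with the standard library; and the names `clean_records`, `NaiveBayesRootCause`, `k_fold_accuracy`, and the sample records are hypothetical.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Hypothetical sample records: (defect name, root cause). Not data from the study.
SAMPLE_RECORDS = [
    ("Scratch on door panel", "Jig contact during transfer"),
    ("Door panel scratch near hinge", "Jig contact during transfer"),
    ("scratch on door panel", "Jig contact during transfer"),  # duplicate
    ("Weld spatter on floor pan", "Worn welding tip"),
    ("Spatter at floor pan weld seam", "Worn welding tip"),
    ("Paint run on hood", "Paint viscosity out of spec"),
    ("Hood paint sagging run", "Paint viscosity out of spec"),
    ("", "Unknown"),  # missing defect name
]


def tokenize(text):
    """Lowercase text and split it into ASCII letter/digit tokens (bag of words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def clean_records(raw):
    """Drop rows missing a defect name or root cause, then drop duplicates.

    Duplicates are detected case-insensitively; the first spelling is kept.
    """
    seen, cleaned = set(), []
    for defect, cause in raw:
        defect, cause = (defect or "").strip(), (cause or "").strip()
        if not defect or not cause:
            continue
        key = (defect.lower(), cause.lower())
        if key not in seen:
            seen.add(key)
            cleaned.append((defect, cause))
    return cleaned


class NaiveBayesRootCause:
    """Multinomial Naive Bayes with Laplace smoothing.

    A stand-in for the SVM used in the study, chosen only to avoid dependencies.
    """

    def fit(self, records):
        self.class_counts = Counter(cause for _, cause in records)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for defect, cause in records:
            tokens = tokenize(defect)
            self.word_counts[cause].update(tokens)  # creates an entry even if empty
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_records = len(records)
        return self

    def predict(self, defect):
        """Return the root cause label with the highest log-posterior."""
        tokens = [t for t in tokenize(defect) if t in self.vocab]  # ignore unseen words
        best_cause, best_score = None, -math.inf
        for cause, count in self.class_counts.items():
            score = math.log(count / self.n_records)  # log prior
            denom = self.total_words[cause] + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[cause][t] + 1) / denom)
            if score > best_score:
                best_cause, best_score = cause, score
        return best_cause


def k_fold_accuracy(records, k=5, seed=42):
    """Mean accuracy over k shuffled, unstratified folds."""
    if not 2 <= k <= len(records):
        raise ValueError("k must be between 2 and the number of records")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = NaiveBayesRootCause().fit(train)
        correct = sum(model.predict(defect) == cause for defect, cause in test)
        scores.append(correct / len(test))
    return sum(scores) / k


if __name__ == "__main__":
    records = clean_records(SAMPLE_RECORDS)
    model = NaiveBayesRootCause().fit(records)
    print(model.predict("deep scratch on door"))  # → Jig contact during transfer
    print(f"5-fold accuracy: {k_fold_accuracy(records):.3f}")
```

With 1,449 entries spread over 161 labels (an average of 9 entries per label), many labels have only a handful of examples, so a stratified split such as scikit-learn's `StratifiedKFold`, combined with an actual SVM such as `sklearn.svm.LinearSVC` on `TfidfVectorizer` features, would likely sit closer to the setup the abstract describes. A `predict`-style call like the one above is the kind of function a Flask endpoint would wrap.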
Potential future enhancements include retraining the model automatically on a regular schedule (daily or weekly), integrating additional data sources such as big data platforms and quality management systems, and scaling the current API implementation to multiple production lines for wider impact.

Keywords

Root Cause Analysis; Production Defect; Machine Learning; Defect Prediction

DOI

https://doi.org/10.21107/ijseit.v9i2.31226

Copyright (c) 2025 Muhammad Najib, Emon Rifa'i

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.