Open Access | https://doi.org/10.55640/ijvsli-05-01-03

Design-for-Test (DFT) strategies for high-performance computing and graphics chips

Vikas Nagaraj, MTS at Advanced Micro Devices (AMD), San Jose, California, USA

Abstract

As the architectural complexity of silicon for high-performance computing (HPC) and graphics processing units (GPUs) grows, reliability, scalability, and first-time-right silicon cannot be achieved without advanced Design-for-Test (DFT) methodologies. This paper addresses how DFT must be tailored to the characteristics of HPC and GPU designs: massive parallelism, deep pipelining, multiple clock and power domains, and rising thermal and power density. It covers basic techniques, including scan-based testing, built-in self-test (BIST), and logic BIST (LBIST), along with a modular, hierarchical test planning framework. The paper also examines key infrastructure, such as test access mechanisms (IJTAG, IEEE 1500), remote debug orchestration, and centralized test control units. It then considers emerging trends, including AI/ML-enabled automatic test pattern generation (ATPG), in-field telemetry, predictive maintenance, and DFT innovations for chiplet-based and 3D-integrated architectures, which alter the test requirements of the overall multi-die system. Best practices are provided for early DFT planning, modular IP reuse, scan chain optimization, and power-aware test pattern generation to obtain high test coverage while maintaining silicon performance. The work presents actionable insights for high-yield silicon design and validation in next-generation compute platforms and is aimed at silicon architects, DFT engineers, and verification professionals.

Keywords

Design-for-Test (DFT), High-Performance Computing (HPC), GPU Validation, Built-In Self-Test (BIST), Semiconductor Test Automation

How to Cite

Design-for-Test (DFT) strategies for high-performance computing and graphics chips. (2025). International Journal of Signal Processing, Embedded Systems and VLSI Design, 5(01), 10-34. https://doi.org/10.55640/ijvsli-05-01-03