Block-based neural network mapping on graphics processor unit
Block-based neural networks (BbNN) were introduced to improve the training speed of artificial neural networks, and various previous works have sought to accelerate BbNN training further. Multithreaded BbNN training on a field-programmable gate array (FPGA) is limited by the low performance of the Nios II software used for communication between the central processing unit (CPU) and the FPGA. This project aims to improve the training speed of multithreaded BbNN blocks by mapping the BbNN model onto Compute Unified Device Architecture (CUDA) cores. Each BbNN block is mapped to a CUDA core, with each core running on a single thread. Functional verification of the BbNN core is carried out using the BbNN output accuracy value: the near-100-percent accuracy obtained verifies the CUDA-mapped BbNN. A performance trade-off analysis compares the accuracy obtained from BbNN evolution on GPU and CPU implementations. The results show that the CUDA-mapped BbNN is at best only as fast as the CPU implementation: although the CUDA implementation trains multiple BbNN blocks in parallel, the large data transfer between CPU and GPU dominates the performance gain. A significant improvement in training speed appears only when the computational complexity of GPU execution is of a higher order than that of the CPU-GPU data transfer. The results of this project provide recommendations for future research on further improving the training speed of CUDA-based BbNN implementations.
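The abstract's central finding, that per-block GPU parallelism is swallowed by CPU-GPU transfer unless GPU execution is of a higher complexity order than the transfer, can be sketched with a toy cost model. The function names, constants, and block counts below are illustrative assumptions, not measurements from the thesis:

```python
# Toy cost model for training n BbNN blocks. All constants are hypothetical:
# they only illustrate the regime the abstract describes, where per-block
# transfer cost is close to per-block compute cost.

def cpu_time(n_blocks, work_per_block=1.0):
    """Sequential CPU training: cost scales linearly with block count."""
    return n_blocks * work_per_block

def gpu_time(n_blocks, work_per_block=1.0, n_cores=1024, transfer_per_block=0.9):
    """GPU training: blocks run in parallel across cores, but every block's
    weights and training data must first cross the CPU-GPU bus."""
    transfer = n_blocks * transfer_per_block             # O(n) bus traffic
    compute = -(-n_blocks // n_cores) * work_per_block   # ceil(n / cores) waves
    return transfer + compute

for n in (64, 1024, 16384):
    print(n, cpu_time(n), round(gpu_time(n), 1))
```

Even with 1024-way parallelism, the O(n) transfer term keeps the GPU time within roughly 10 percent of the CPU time in this sketch, mirroring the thesis's observation; the GPU pulls decisively ahead only if per-block compute grows at a higher order than per-block transfer.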
Saved in:
Main Author: | Ong, Chin Tong |
---|---|
Format: | Thesis |
Language: | English |
Published: | Universiti Teknologi Malaysia, 2015 |
Subjects: | TK Electrical engineering. Electronics. Nuclear engineering |
Online Access: | http://eprints.utm.my/id/eprint/53959/1/OngChinTongMFKE2015.pdf http://eprints.utm.my/id/eprint/53959/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:86538 |
id |
my.utm.53959 |
record_format |
eprints |
spelling |
my.utm.53959 Ong, Chin Tong (2015) Block-based neural network mapping on graphics processor unit. Masters thesis, Universiti Teknologi Malaysia, Faculty of Electrical Engineering. Published 2015-06. Thesis, NonPeerReviewed, application/pdf, English. http://eprints.utm.my/id/eprint/53959/1/OngChinTongMFKE2015.pdf http://eprints.utm.my/id/eprint/53959/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:86538 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
TK Electrical engineering. Electronics Nuclear engineering |
description |
Block-based neural networks (BbNN) were introduced to improve the training speed of artificial neural networks, and various previous works have sought to accelerate BbNN training further. Multithreaded BbNN training on a field-programmable gate array (FPGA) is limited by the low performance of the Nios II software used for communication between the central processing unit (CPU) and the FPGA. This project aims to improve the training speed of multithreaded BbNN blocks by mapping the BbNN model onto Compute Unified Device Architecture (CUDA) cores. Each BbNN block is mapped to a CUDA core, with each core running on a single thread. Functional verification of the BbNN core is carried out using the BbNN output accuracy value: the near-100-percent accuracy obtained verifies the CUDA-mapped BbNN. A performance trade-off analysis compares the accuracy obtained from BbNN evolution on GPU and CPU implementations. The results show that the CUDA-mapped BbNN is at best only as fast as the CPU implementation: although the CUDA implementation trains multiple BbNN blocks in parallel, the large data transfer between CPU and GPU dominates the performance gain. A significant improvement in training speed appears only when the computational complexity of GPU execution is of a higher order than that of the CPU-GPU data transfer. The results of this project provide recommendations for future research on further improving the training speed of CUDA-based BbNN implementations. |
format |
Thesis |
author |
Ong, Chin Tong |
title |
Block-based neural network mapping on graphics processor unit |
publishDate |
2015 |
url |
http://eprints.utm.my/id/eprint/53959/1/OngChinTongMFKE2015.pdf http://eprints.utm.my/id/eprint/53959/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:86538 |