Block-based neural network mapping on graphics processor unit
Block-based neural networks (BbNN) were introduced to improve the training speed of artificial neural networks, and various previous works have sought to accelerate BbNN training further. Multithreaded BbNN training on a field-programmable gate array (FPGA) is limited by the low performance of the Nios II software used for communication between the central processing unit (CPU) and the FPGA. This project aims to improve the training speed of multithreaded BbNN blocks by mapping the BbNN model onto Compute Unified Device Architecture (CUDA) cores. Each BbNN block is mapped to a CUDA core, with each core running on a single thread. Functional verification of the BbNN core is carried out using the BbNN output accuracy value: the near-100-percent accuracy obtained verifies the CUDA-mapped BbNN. A performance trade-off analysis compares the accuracy obtained from BbNN evolution on GPU and CPU implementations. The results show that the CUDA-mapped BbNN is at best only as fast as the CPU implementation: although the CUDA implementation trains multiple BbNN blocks in parallel, the large data transfer between CPU and GPU dominates the performance gain. A significant improvement in training speed appears only when the computational complexity of GPU execution is of a higher order than that of the CPU-GPU data transfer. The results of this project provide recommendations for future research on further improving the training speed of CUDA-based BbNN implementations.
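The abstract's central finding, that per-block GPU parallelism is swallowed by CPU-GPU transfer unless GPU execution is of a higher complexity order than the transfer, can be sketched with a toy cost model. The function names, constants, and block counts below are illustrative assumptions, not measurements from the thesis:

```python
# Toy cost model for training n BbNN blocks. All constants are hypothetical:
# they only illustrate the regime the abstract describes, where per-block
# transfer cost is close to per-block compute cost.

def cpu_time(n_blocks, work_per_block=1.0):
    """Sequential CPU training: cost scales linearly with block count."""
    return n_blocks * work_per_block

def gpu_time(n_blocks, work_per_block=1.0, n_cores=1024, transfer_per_block=0.9):
    """GPU training: blocks run in parallel across cores, but every block's
    weights and training data must first cross the CPU-GPU bus."""
    transfer = n_blocks * transfer_per_block             # O(n) bus traffic
    compute = -(-n_blocks // n_cores) * work_per_block   # ceil(n / cores) waves
    return transfer + compute

for n in (64, 1024, 16384):
    print(n, cpu_time(n), round(gpu_time(n), 1))
```

Even with 1024-way parallelism, the O(n) transfer term keeps the GPU time within roughly 10 percent of the CPU time in this sketch, mirroring the thesis's observation; the GPU pulls decisively ahead only if per-block compute grows at a higher order than per-block transfer.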
Saved in:
Main Author: | Ong, Chin Tong |
---|---|
Format: | Thesis |
Language: | English |
Published: | Universiti Teknologi Malaysia, 2015 |
Subjects: | TK Electrical engineering. Electronics. Nuclear engineering |
Online Access: | http://eprints.utm.my/id/eprint/53959/1/OngChinTongMFKE2015.pdf http://eprints.utm.my/id/eprint/53959/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:86538 |
id |
my.utm.53959 |
record_format |
eprints |
spelling |
my.utm.53959 Ong, Chin Tong (2015) Block-based neural network mapping on graphics processor unit. Masters thesis, Universiti Teknologi Malaysia, Faculty of Electrical Engineering. Published 2015-06. Thesis, NonPeerReviewed, application/pdf, English. http://eprints.utm.my/id/eprint/53959/1/OngChinTongMFKE2015.pdf http://eprints.utm.my/id/eprint/53959/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:86538 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
TK Electrical engineering. Electronics Nuclear engineering |
description |
Block-based neural networks (BbNN) were introduced to improve the training speed of artificial neural networks, and various previous works have sought to accelerate BbNN training further. Multithreaded BbNN training on a field-programmable gate array (FPGA) is limited by the low performance of the Nios II software used for communication between the central processing unit (CPU) and the FPGA. This project aims to improve the training speed of multithreaded BbNN blocks by mapping the BbNN model onto Compute Unified Device Architecture (CUDA) cores. Each BbNN block is mapped to a CUDA core, with each core running on a single thread. Functional verification of the BbNN core is carried out using the BbNN output accuracy value: the near-100-percent accuracy obtained verifies the CUDA-mapped BbNN. A performance trade-off analysis compares the accuracy obtained from BbNN evolution on GPU and CPU implementations. The results show that the CUDA-mapped BbNN is at best only as fast as the CPU implementation: although the CUDA implementation trains multiple BbNN blocks in parallel, the large data transfer between CPU and GPU dominates the performance gain. A significant improvement in training speed appears only when the computational complexity of GPU execution is of a higher order than that of the CPU-GPU data transfer. The results of this project provide recommendations for future research on further improving the training speed of CUDA-based BbNN implementations. |
format |
Thesis |
author |
Ong, Chin Tong |
title |
Block-based neural network mapping on graphics processor unit |
publishDate |
2015 |
url |
http://eprints.utm.my/id/eprint/53959/1/OngChinTongMFKE2015.pdf http://eprints.utm.my/id/eprint/53959/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:86538 |