Big data processing on educational data mining using PySpark with Jupyter Notebook
The rapid advancement of information technology brings new challenges and places new demands on our education system. Teaching and learning have moved from the classroom to Computer Aided Learning (CAL) systems. Big data technology and machine learning play an important role in Computer...
Main Author: | Ravichandran, Vinitha |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2018 |
Online Access: | http://eprints.utm.my/id/eprint/81375/1/VinithaRavichandranMFC2018.pdf http://eprints.utm.my/id/eprint/81375/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:119718 |
id |
my.utm.81375 |
---|---|
record_format |
eprints |
spelling |
my.utm.81375 2019-08-23T04:06:50Z http://eprints.utm.my/id/eprint/81375/ Big data processing on educational data mining using pyspark with jupyter notebook Ravichandran, Vinitha 2018 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/81375/1/VinithaRavichandranMFC2018.pdf Ravichandran, Vinitha (2018) Big data processing on educational data mining using pyspark with jupyter notebook. Masters thesis, Universiti Teknologi Malaysia. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:119718 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
description |
The rapid advancement of information technology brings new challenges and places new demands on our education system. Teaching and learning have moved from the classroom to Computer Aided Learning (CAL) systems. Big data technology and machine learning play an important role in CAL systems because of the massive volume of data these systems generate. This has driven the rapid development of data mining in education, known as Educational Data Mining (EDM). The abundance of data collected by these systems can be used to analyse, predict and solve many societal issues in the education field, such as improving the quality of education and predicting and monitoring educational outcomes. Effectively analysing or predicting the growth of students’ performance can make a CAL system a better platform for learning than traditional classroom learning. Machine learning techniques were used to obtain reliable and accurate predictions of students’ performance. Apache Hadoop was the backbone of big data technology until the emergence of Apache Spark; however, only a few studies have applied Apache Spark to EDM. In this dissertation, PySpark was integrated with Jupyter Notebook to perform EDM on the Educational Process Mining (EPM) data set. Spark MLlib was used to compare four classification algorithms on the EPM data set: Logistic Regression, Naïve Bayes, Decision Tree and Random Forest. In this study, the Random Forest classifier outperformed the other classifiers in accuracy, area under the precision-recall (PR) curve and area under the receiver operating characteristic (ROC) curve, although its execution time was slightly longer. The Random Forest classifier is therefore the best of the four classifiers for EDM in this study. |
format |
Thesis |
author |
Ravichandran, Vinitha |
title |
Big data processing on educational data mining using pyspark with jupyter notebook |
publishDate |
2018 |
url |
http://eprints.utm.my/id/eprint/81375/1/VinithaRavichandranMFC2018.pdf http://eprints.utm.my/id/eprint/81375/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:119718 |
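The abstract compares the classifiers by accuracy, area under the PR curve, area under the ROC curve and execution time. The following is a minimal, library-free sketch of what those metrics measure. It assumes binary 0/1 labels and one real-valued score per example; the record does not say how the EPM target was encoded. All function names are hypothetical and this is not the author's code. In Spark MLlib, the toolkit used in the thesis, the corresponding built-ins are `BinaryClassificationEvaluator` (metric names `areaUnderROC` and `areaUnderPR`) and `MulticlassClassificationEvaluator` (metric name `accuracy`). The interpolation convention below is an assumption and is not guaranteed to reproduce Spark's numbers exactly.

```python
import time
from typing import Callable, List, Sequence, Tuple


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of predicted class labels that equal the true labels."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and the same length")
    return sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)


def _cumulative_counts(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[int, int]]:
    """(true positives, false positives) after each distinct threshold,
    sweeping from the highest score down; tied scores form a single step."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    counts: List[Tuple[int, int]] = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example sharing this score before recording a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        counts.append((tp, fp))
    return counts


def _trapezoid(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Trapezoidal area under the piecewise-linear curve through (xs, ys)."""
    return sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs)))


def area_under_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve (true positive rate against false positive rate)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    counts = _cumulative_counts(labels, scores)
    fpr = [0.0] + [fp / negatives for _, fp in counts]
    tpr = [0.0] + [tp / positives for tp, _ in counts]
    return _trapezoid(fpr, tpr)


def area_under_pr(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the precision-recall curve with linear interpolation.
    Assumed convention: the curve starts at recall 0 with the precision
    of the first (highest) threshold."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive example")
    counts = _cumulative_counts(labels, scores)
    recall = [tp / positives for tp, _ in counts]
    precision = [tp / (tp + fp) for tp, fp in counts]
    return _trapezoid([0.0] + recall, [precision[0]] + precision)


def timed(fn: Callable, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the EPM data set.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    y_pred = [1 if s >= 0.5 else 0 for s in y_score]
    print(accuracy(y_true, y_pred))                   # 0.75
    print(area_under_roc(y_true, y_score))            # 0.75
    print(round(area_under_pr(y_true, y_score), 4))   # 0.7917
    _, seconds = timed(area_under_roc, y_true, y_score)
    print(f"ROC computation took {seconds:.6f} s")
```

Tied scores are grouped into one step, so a tie adds a diagonal segment to the ROC curve instead of an arbitrary staircase that depends on input order. Timing with `time.perf_counter` is only a rough, single-machine analogue of the execution-time comparison reported in the study, which ran on Spark.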