Welcome to NLDML 2022

International Conference on NLP, Data Mining and Machine Learning (NLDML 2022)

March 12-13, 2022, Virtual Conference



Accepted Papers
Performance Analysis Stage of Cancer Patient Level using Machine Learning Approaches

Alex Han¹ and Maria Sultana Keya², ¹Independent Researcher, Chadwick International, Incheon, South Korea, ²Computer Science Department, Jahangirnagar University, Dhaka, Bangladesh

ABSTRACT

Cancer is a disease in which cells in some tissues and organs grow uncontrollably and spread to other parts of the body. Cancer can start in any of the trillions of cells that make up the human body. There are several causes of cancer, some of which are avoidable and some of which are not. With timely prediction of a patient's cancer level and avoidance of specific harmful habits, people with cancer can therefore benefit from early treatment. This paper uses data on 1000 cancer patients at different levels (high, mid, low), taken from Kaggle. Four machine learning models are used to estimate the likelihood of cancer: AdaBoost, logistic regression, random forest, and naive Bayes. The study includes correlation matrices and visual representations of the features, and uses confusion matrices and AUC to determine the best-performing method. Recursive Feature Elimination (RFE) is also used to find the most important attribute that causes cancer. In this study, random forest is the best model, with 100% AUC.

KEYWORDS

Machine Learning Algorithm, RFE, Correlation matrices, AUC.
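
The AUC figure reported in the abstract above can be read as the probability that a model scores a randomly chosen positive case higher than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition for a binary problem, not the authors' evaluation code. The function name `binary_auc` and the sample scores are hypothetical, and a three-level task such as the one described (high, mid, low) would presumably need a one-vs-rest or similar extension.

```python
def binary_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: sequence of 0/1 class labels (1 = positive class).
    scores: sequence of model scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four patients, two of them in the positive class.
print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 1.0 (100%) means every positive case is scored above every negative case.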


A Novel Model for Perspective Generation

Fatima Alkhawaldeh, Tommy Yuan and Dimitar Kazakov, Department of Computer Science, University of York, YO10 5GH, UK

ABSTRACT

Consideration of multiple viewpoints on a contentious issue is critical for avoiding bias and assisting in the formulation of rational decisions. We observe that the current model imposes a constraint on diversity. This is because the conventional attention mechanism is biased toward a single semantic aspect of the claim, whereas the claim may contain multiple semantic aspects. Additionally, disregarding common-sense knowledge may result in generated perspectives that violate known facts about the world. The proposed approach is divided into two stages: the first stage considers multiple semantic aspects, which results in more diverse generated perspectives; the second stage improves the quality of generated perspectives by incorporating common-sense knowledge. We train the model on each stage using reinforcement learning and the automated metric scores. The experimental results demonstrate the effectiveness of our proposed model in generating a broader range of perspectives on a contentious subject.

KEYWORDS

Perspective generation, diversity, quality, common-sense and deep learning.


Identifying Exoplanets with Machine Learning Methods: A Preliminary Study

Yucheng Jin, Lanyi Yang and Chia-En Chiang, EECS Department, University of California-Berkeley, Berkeley, CA, USA

ABSTRACT

The discovery of habitable exoplanets has long been a hot topic in astronomy. Traditional methods for exoplanet identification include the wobble method, direct imaging, gravitational microlensing, etc., which not only require a considerable investment of manpower, time, and money, but are also limited by the performance of astronomical telescopes. In this study, we propose using machine learning methods to identify exoplanets. We used the Kepler dataset collected by NASA from the Kepler Space Observatory to conduct supervised learning, which predicts the existence of exoplanet candidates as a three-class classification task, using decision tree, random forest, naïve Bayes, and neural network models; we used another NASA dataset consisting of confirmed exoplanet data to conduct unsupervised learning, which divides the confirmed exoplanets into different clusters using k-means clustering. As a result, our models achieved accuracies of 99.06%, 92.11%, 88.50%, and 99.79%, respectively, in the supervised learning task and obtained reasonable clusters in the unsupervised learning task.

KEYWORDS

Exoplanets Identification, Kepler Dataset, Classification Tree, Random Forest, Naïve Bayes, Multi-layer Perceptron, K-means Clustering.
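
As an illustration of the k-means clustering step mentioned in the abstract above, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not the authors' implementation: the function name `kmeans`, the explicit initial centroids, and the two-feature toy points are assumptions chosen to keep the example deterministic.

```python
def kmeans(points, initial_centroids, max_iterations=100):
    """Cluster points (tuples of floats) with Lloyd's k-means algorithm."""
    centroids = [tuple(c) for c in initial_centroids]

    def nearest(point):
        # Index of the centroid with the smallest squared Euclidean distance.
        distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
        return distances.index(min(distances))

    for _ in range(max_iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point)].append(point)

        # Update step: move each centroid to the mean of its members.
        updated = []
        for members, old in zip(clusters, centroids):
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place

        if updated == centroids:  # converged
            break
        centroids = updated

    labels = [nearest(point) for point in points]
    return centroids, labels


# Hypothetical two-feature data forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

In practice the result depends on the initial centroids and on feature scaling, so real features would usually be standardised first.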


Multi Language Translator

Hemanth Phani Srinivas, Rachna Raj, Mohammed Wazahath Hussain & Saurav Rameshmoorthy, Computer Science Department, PES University, Bangalore, India

ABSTRACT

This paper proposes a Multi Language Translator, a system that takes input in mixed language and converts it into a single preferred language. The project restricts its domain to communication about banks and their transactions. People often mix in other languages while communicating, so the purpose of this project is to help them communicate and to remove language barriers. This benefits people who have migrated and need to communicate in their workplaces. The project takes input in audio format, and the languages included are Hindi, English and Kannada only. Machine learning tools are used for the translation. A Neural Machine Translation model is deployed, which uses an LSTM-RNN model for training. The main aim of the project is to develop a product that can translate a mixed language into a user-specified language.

KEYWORDS

translator, mixed language, NMT.


Quran Topic Classification Using Mobile Phone Computing and Fuzzy Logic System

Mohamed Refaee, Pioneers for Educational Services & Consultancy Company, Newcastle, BT33 0BN, NI, UK

ABSTRACT

This study was conducted to provide a new methodology for Quran topic classification using mobile phone computing instead of personal computing. Mobile phone applications were used to preprocess the chapter Al-Kahf (The Cave) and quantify its words for the purpose of classification. A fuzzy logic simulator, itself a mobile application, was used to classify the words of the Cave chapter into four topics. The fuzzy system was able to classify related words into the right topics, and the performance of the classifier is 78%. Future work will focus on improving the performance of the classifier and applying it to all chapters of the Holy Quran.

KEYWORDS

Mobile phone computing, Topics classification, Fuzzy classifier system.


Research on White-box Counter-Attack Method based on Convolution Neural Network Face Recognition

Shuya Tian, Xiangwei Lai, College of Computer and Information Science, Southwest University, Chongqing, China

ABSTRACT

In recent years, deep neural networks have been widely used in face recognition, where the convolutional neural network models are mostly black-box models. Because the model structure and related parameters cannot be obtained, adversarial examples attack such models poorly. To improve the effect of the black-box attack, this paper uses a white-box attack to carry out the black-box attack. Targeting convolutional neural network face recognition models, it proposes an improved FGSM adversarial attack algorithm, which uses the cosine similarity between the clean sample and the adversarial sample as the loss function. A threshold of 0.8 is set as the condition for a successful attack. To avoid excessive changes to the image, a threshold hyperparameter limits the range and size of the perturbation, so that the adversarial samples are harder to detect and have better visual quality. The adversarial samples are evaluated in a black-box attack on the VGG16 model, and a good attack effect is obtained.

KEYWORDS

Face recognition, adversarial examples, White box attack.
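
The two constraints described in the abstract above, a cosine-similarity success condition and a bound on the perturbation, can be sketched as follows. This is a minimal illustration and not the authors' algorithm: it omits the gradient step entirely, the function names are hypothetical, and it assumes that an attack counts as successful when the similarity between the clean and adversarial face embeddings falls below the 0.8 threshold, which the abstract does not state explicitly.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def clip_perturbation(clean, adversarial, epsilon):
    """Keep each adversarial pixel within +/- epsilon of the clean pixel
    and inside the valid intensity range [0, 1]."""
    clipped = []
    for c, a in zip(clean, adversarial):
        bounded = min(max(a, c - epsilon), c + epsilon)
        clipped.append(min(max(bounded, 0.0), 1.0))
    return clipped


def attack_succeeded(clean_embedding, adversarial_embedding, threshold=0.8):
    """Assumed success rule: the embeddings are less similar than the threshold."""
    return cosine_similarity(clean_embedding, adversarial_embedding) < threshold


# Hypothetical pixel values and embeddings, for illustration only.
print(clip_perturbation([0.5, 0.5], [0.9, 0.45], epsilon=0.1))
print(attack_succeeded([1.0, 0.0], [0.6, 0.8]))  # similarity 0.6, below 0.8
```

In an FGSM-style attack, `clip_perturbation` would be applied after each signed-gradient step, and the loop would stop once `attack_succeeded` returns true.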


Application of Emotional Voice User Interface in Securities Industry

Ma Xueming and Chen Yao, Orient Securities Co., Ltd, Shanghai, China

ABSTRACT

Through the combination of a sentiment algorithm and emotional design, the application and prospects of intelligent service robots in the securities industry are explored. Based on user research methods, an emotional voice user interface system is constructed, which endows the robot with the ability of sentiment recognition and feedback, and enables the intelligent transformation of the securities business department. Comparative experimental results show that after optimization, customers' investment inclination and satisfaction with the robot have been significantly improved. This will help the securities business department to engage with more customers while optimizing the service experience.

KEYWORDS

voice user interface, intelligent service robot, sentiment algorithm, emotional design, securities industry.


Crime Prediction using Social Media Data: A Data Mining Approach

Fatai Jimoh¹, Mohamad Saraee¹, Azar Shahgholian², ¹School of Engineering Science and Environment, University of Salford, Manchester, UK, ²Liverpool Business School, Liverpool John Moores University, Liverpool, UK

ABSTRACT

The conventional data mining approach to crime prediction often depends on historical data. However, considering the current global crime trend, where offenders frequently register their criminal intent on social media and also invite others to witness and/or participate in various crimes, there is a need for an alternative and more dynamic approach. This paper therefore applies an ensemble of machine learning algorithms to help reduce the crime rate in Greater Manchester, UK, by combining historical crime data with tweet data. To overcome problems with data quality and bias in our prediction, we ensured that all the features used in this work were from the same year and the same month. The ensemble method achieved the highest accuracy in predicting different categories of crime when compared with the base models, and our study also revealed the contribution of the sentiment score to the overall performance of the model. Finally, we conclude that social media data, if properly mined, would contribute to improved prediction of the likelihood of crime occurrence.

KEYWORDS

Data Mining, Crime Prediction, Machine learning, Sentiment Analysis, Stacked Ensemble.