Identification of Acne Vulgaris Type in Facial Acne Images Using GLCM Feature Extraction and Extreme Learning Machine Algorithm

Acne vulgaris, or acne, is a common inflammatory pilosebaceous condition that affects up to 90% of teenagers, begins during adolescence, and often persists into adulthood. Acne vulgaris, especially on the face, has a major impact on the emotional, social and psychological health of patients. Treating acne requires identifying the exact type of acne. Manual identification is considered less effective, so an automatic, computer-based method using image processing techniques is proposed. This research was conducted to identify the types of acne in facial acne images. The methods used are K-Means Clustering for segmentation, Gray Level Co-occurrence Matrix (GLCM) for feature extraction, and Extreme Learning Machine (ELM) for classification. The dataset consists of 100 images in 3 classes, namely Nodules, Papules and Pustules, which are tested in stages to determine the accuracy of the ELM method in classifying acne types. Testing is done in two stages: testing 2 classes (Nodules and Papules), followed by testing 3 classes (Nodules, Papules and Pustules). Testing of 2 classes produces a highest accuracy of 95.24%, and testing of 3 classes produces a highest accuracy of 80%.


INTRODUCTION
Acne vulgaris, commonly called acne, is a skin problem that occurs when oil and dead skin cells clog the pores, due to hormonal changes that make the skin more oily (Balbin et al., 2017). Acne is a common inflammatory pilosebaceous condition that affects up to 90% of teenagers, begins during adolescence, and often persists into adulthood (Latter et al., 2019; Lee et al., 2019). The type and severity of the condition can have severe physical and psychological consequences for sufferers with chronic acne problems (Lucut & Smith, 2016). In adolescents and adults, acne is associated with depression, anxiety, attention deficit hyperactivity disorder, psychosis, or obsessive-compulsive disorder. These psychiatric impacts may be greater in adults, with up to 40% experiencing psychiatric comorbidities (Riahi & Jung, 2020).
Acne vulgaris, especially on the face, has a major impact on the emotional, social and psychological health of patients, as well as the quality of life for adolescents. Significant consequences include feelings of discomfort and self-distrust, emotional and psychosocial distress, consequences on work, and potential psychiatric disorders including depression and suicide (Amini et al., 2018). Acne is a benign evolutionary chronic skin disease characterized by inflammatory processes of hair follicles and sebaceous glands (Maroni et al., 2017). A schematic of the hair follicles and sebaceous glands is shown in Figure 1.
When the sebaceous glands produce a very large amount of sebum, the hair pores under the dermis (skin layer) become blocked. Propionibacterium bacteria then proliferate and infect the area. This infection causes acne to develop (Khan et al., 2016). The pleomorphic appearance of acne is defined by primary and secondary skin lesions.
Disease activity is seen in the primary lesions. The primary skin lesions in acne can be divided into two general categories: non-inflammatory and inflammatory lesions (Becker et al., 2016). Non-inflammatory lesions consist of comedones (open and closed comedones), while inflammatory lesions consist of papules, pustules and nodules (Lucut & Smith, 2016; Becker et al., 2016).

Figure 1. Schematic of Hair Follicles and Sebaceous Glands. Source: (Khan et al., 2016)
Identification of the type and number of lesions is necessary for the dermatologist to make an objective diagnosis and to detect changes in the condition while the patient is undergoing treatment (Lucut & Smith, 2016). Without a proper method for determining and documenting the severity of disease, the effects of treatment cannot be fully assessed and optimal treatment outcomes may not be achieved (Becker et al., 2016).
In the treatment of acne, the first process that is carried out is the counting and classification of acne based on skin lesions. Using the manual method, the dermatologist must count the pimples manually, marking the pimple spots where the pimples are visible (Kittigul & Uyyanonvara, 2016).
Manual observation and counting of acne are less effective traditional methods of diagnosis because they are time-consuming and subjective in nature, where the result of diagnosis depends on the experience and skill of the expert. Therefore, computerized methods are needed, including image processing techniques and machine learning theory (Shen et al., 2018).
Several previous studies have used image processing techniques for the identification of acne diseases, such as distinguishing acne lesions from normal skin (Balbin et al., 2017; Kittigul & Uyyanonvara, 2016), classification of acne according to type (Lucut & Smith, 2016; Amini et al., 2018; Shen et al., 2018; Junayed et al., 2019; Zaki et al., 2019), and classification of acne according to its severity (Zhao et al., 2019). The following are some related studies.
In (Shen et al., 2018), automatic diagnosis of facial acne was carried out with the classification divided into two stages. First, a binary classification differentiated skin from non-skin images using a Convolutional Neural Network (CNN). Then a seven-class classification was conducted to classify skin images by acne type using VGG16. The results show that the accuracy of each class is more than 81%.
Another study (Amini et al., 2018) identified acne papules and pustules from cellphone images. The methods applied are human face recognition, normalization, Region of Interest (ROI) detection, conversion of RGB images to L*a*b, Gaussian filters, Otsu thresholding, and identification. The results showed an accuracy of 92% for detection of ROI and 98% for identifying papule and pustule acne types.
Another study (Zhao et al., 2019) built a mobile application to detect the severity of acne from a selfie. A pre-trained ResNet152 model was used for feature extraction. The CNN-based regression model performs well at the mild acne level with a recall of 82%, but its ability to differentiate Almost Clear (2) from Mild (3), Moderate (4) from Mild (3), and Severe (5) from Moderate (4) was still unsatisfactory. Finally, (Zaki et al., 2019) performed blackhead detection using image processing techniques. The images were acquired using a microscope and focus on the nose and cheeks. Segmentation uses thresholding techniques, and morphological operations are applied using dilation and erosion. To make the blackhead segmentation results clearer and easier to identify, a bounding box is drawn using the 'regionprops' function. This function is applied to automatically count the number of blackheads, as well as to crop the area to be identified.
This research was conducted to identify the types of acne in facial acne images based on image processing techniques. Before classification, the image must be segmented to differentiate between acne areas and non-acne areas on the face; the method used is K-Means Clustering. Feature extraction is then carried out using the Gray Level Co-Occurrence Matrix (GLCM) method to obtain texture feature values. For classification, the authors use the Extreme Learning Machine (ELM) algorithm.
The scope of this research is limited to the segmentation of facial acne images using the K-Means Clustering method, feature extraction with GLCM texture analysis, and classification of acne types using the ELM algorithm. The data come from DermNet New Zealand, and the application is built with MATLAB. The types of acne studied consist of three classes, namely Nodules, Papules and Pustules.

Dataset
The images used in this study are images of acne on the face obtained from DermNet New Zealand (Dermnet NZ, 2020). Each original image is cropped manually to 150 × 150 pixels in JPG format. The collected image data are raw and unlabeled. To validate the data, the cropped images must be verified by a dermatologist, who diagnoses the type of acne in each image and labels it accordingly. This produces the ground truth. Examples of images that have been verified by a dermatologist are shown in Figure 2.
Figure 2. Image of Facial Acne with Different Types of Acne a) Nodules, b) Papules, c) Pustules
Three types of acne are identified in this study, namely Nodules, Papules and Pustules. The total image data to be processed is 100 images. The dataset is divided into training data and testing data, with the distribution shown in Table 2. The amount of data in each class is imbalanced, but classification can still be carried out. A previous study (Velasco et al., 2019) also used classes of unequal size and still obtained a fairly high classification accuracy of 93.6%. The imbalanced dataset is due to the limited data available for certain types of acne, for example Pustules.
In this study, the collected RGB images are first converted into L*a*b images. The image processing stage follows, namely segmentation to distinguish acne areas from non-acne areas in the image, carried out using the K-Means Clustering method. After segmentation, the segmented facial acne images in RGB format are converted to grayscale. The textural features of the grayscale image are then extracted using the GLCM method. GLCM describes second-order statistical features of image texture and is suitable for many species (Wang et al., 2019), where texture features themselves are important image characteristics that can be used for identification purposes (Kristianto et al., 2021). GLCM has the advantage of providing texture information from an image so that it can represent the texture of the actual object (Arbawa & Dewi, 2020), captures the relationships between adjacent pixels in various orientations so that the information obtained is more detailed (Wang et al., 2019; Dewi & Arbawa, 2019), and is not invariant to the gray level transformation (Wang et al., 2019; Zhang et al., 2019). The biggest advantage of GLCM is that it provides more refined and quantitative damage information compared to inspections based solely on conventional visual approaches (Zhu et al., 2019). With the GLCM model, the spatial space extracts the global region of the image (Deotale & Sarode, 2019). The proposed method is illustrated in Figure 3.
Figure 3. Research Method
Four features are extracted, namely Contrast, Correlation, Energy and Homogeneity. The feature values obtained are processed at the classification stage, which identifies the type of acne in each testing image. The types of acne vulgaris consist of three classes, namely Nodules, Papules and Pustules. The algorithm used for classification is ELM.
Once the entire image processing pipeline has been completed, the identified type of acne is obtained for each test image.
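The four GLCM features named above can be illustrated with a minimal NumPy sketch. It is not the authors' MATLAB code: the function names `glcm` and `glcm_features`, the single horizontal offset, the non-symmetric matrix and the number of gray levels are all assumptions made for illustration, and the formulas are the commonly used definitions of Contrast, Correlation, Energy and Homogeneity.

```python
import numpy as np

def glcm(gray, levels=8, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    `gray` must already be quantized to integers in [0, levels).
    The offset (dx, dy) = (1, 0) pairs each pixel with its right neighbor
    (an assumed choice; the paper does not state the offset used).
    """
    gray = np.asarray(gray)
    rows, cols = gray.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[gray[r, c], gray[r2, c2]] += 1
    return counts / counts.sum()

def glcm_features(p):
    """Contrast, Correlation, Energy and Homogeneity of a normalized GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    return {
        "contrast": (((i - j) ** 2) * p).sum(),
        # Undefined (division by zero) for a perfectly uniform image.
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j),
        "energy": (p ** 2).sum(),
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
    }

# Hypothetical 4x4 patch with 4 gray levels, used only to exercise the code.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = glcm_features(glcm(patch, levels=4))
feature_vector = [features[k] for k in
                  ("contrast", "correlation", "energy", "homogeneity")]
```

Each image then contributes one row of four values, which is the d = 4 feature dimension used at the classification stage.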

Data Collection
In this study, the authors tested the classification in two stages. The first test is carried out using two classes, namely the Nodules class and the Papules class. Furthermore, the second test was carried out using three classes, namely the Nodules class, the Papules class and the Pustules class. The results of these two tests will be compared to determine the performance of the algorithm used in the identification of acne vulgaris types.

Extreme Learning Machine (ELM) Classification
The ELM method, introduced in 2004 by Huang (Novitasari et al., 2020), solves the problem of determining the number of hidden layers in a neural network (Zhang et al., 2020). ELM is a feedforward neural network with three layers, only one of which is a hidden layer with an activation function (Novitasari et al., 2020). This gives ELM various advantages over traditional predictive models such as backpropagation neural networks and SVM (Zhang et al., 2020): fewer iterations, better results, very fast learning speed (Novitasari et al., 2020), a simple network structure (Li et al., 2020), good generalization capability, a unique optimal solution (Zhang et al., 2020), no need to set parameters such as stopping criteria or learning rate, partial mitigation of overfitting and local minima (Armi et al., 2021), unified multi-class classification, minimal human intervention, ease of implementation, and support for regression (Nagelli et al., 2019).
Extreme Learning Machine (ELM) is used in various applications such as object recognition, landmark recognition, refractive index identification for ionic liquids, EEG signal classification, protein fold recognition, intrusion detection systems, etc. (Nagelli et al., 2019). Implementing the ELM algorithm involves the following steps:
1. Determine the number of neurons in the hidden layer. The number of neurons in the hidden layer is symbolized by the letter h, whose value is chosen freely. In this study, the number of neurons used was h = 10, 30 and 50.
2. Entering the training process, prepare the training data matrix. The training data are presented as a matrix X of size N × d (N data by d features). In the first test, involving two classes of acne (Nodules and Papules), the size of the X matrix is 63 × 4 (63 training data and 4 GLCM features). In the second test, involving three classes of acne (Nodules, Papules and Pustules), the size of the X matrix is 75 × 4 (75 training data and 4 GLCM features).
3. Prepare the training target labels. The t matrix contains the target label of each training datum, represented by the numbers 1, 2 and 3. Number 1 is the Nodules class, number 2 is the Papules class and number 3 is the Pustules class. The size of the t matrix is 63 × 1 in the first test (63 training data) and 75 × 1 in the second test (75 training data).
4. Create the initial weight matrix (input weight). The initial weight matrix is symbolized by W and has size h × d, the number of neurons in the hidden layer by the number of features. For the three different numbers of neurons, the initial weight matrix measures 10 × 4, 30 × 4 and 50 × 4. The initial weights are drawn randomly, and the draw is repeated. When the classification results obtained have reached their maximum, the corresponding initial weights are stored so that they can be used as the initial weights in the system to be built.
5. Calculate the hidden layer initialization matrix. The initialization matrix is symbolized by H_init and is the product of the training data matrix X and the transposed initial weight matrix W:
$H_{init} = X W^{T}$ (1)
6. Calculate the hidden layer output matrix with the activation function. The activation function determines whether a neuron should be "active" or not based on the weighted sum of its inputs. The activation functions used in this study are sigmoid, sine and hardlim, each with its own equation. The hidden layer output matrix is symbolized by H.
7. Calculate the weight β (output weight). The β weight is calculated from the matrix H (hidden layer output) and t (training targets) with the following equation, where $H^{+}$ is the Moore-Penrose pseudoinverse of H:
$\beta = H^{+} t$ (2)
8. Entering the testing process, prepare the testing data matrix Y.
9. Calculate the hidden layer initialization matrix for the testing data. The method used at this stage is the same as in the fifth step, with the Y matrix used as the testing data matrix.
10. Calculate the hidden layer output matrix with the activation function. The method used at this stage is the same as in the sixth step.
11. Calculate the output value of the testing data classification results. The artificial neural network generated in the training process is tested in the testing process, which determines the effectiveness of the ELM method in identifying the types of acne vulgaris. The output value is calculated from H, the hidden layer output matrix of the testing data, and the β value obtained in the training process. The output value is symbolized by the letter y and is a decimal number; to determine the predicted class, the y value is rounded:
$y = H \beta$ (3)
After the steps above have been completed, the accuracy of each test is calculated.
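The training and testing steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names, the uniform range of the random weights, the fixed seed, the clipping of rounded outputs to valid labels, and the synthetic feature values are all hypothetical. The sketch uses the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`) for the output weights, as in standard ELM.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hardlim(z):
    # Hard-limit activation: 1 where the input is non-negative, else 0.
    return (z >= 0).astype(float)

ACTIVATIONS = {"sigmoid": sigmoid, "sine": np.sin, "hardlim": hardlim}

def elm_train(X, t, h, activation="sigmoid", seed=0):
    """Steps 1-7: random input weights W (h x d), then solve for beta."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(h, X.shape[1]))  # assumed range
    H_init = X @ W.T                                   # Eq. (1)
    H = ACTIVATIONS[activation](H_init)                # step 6
    beta = np.linalg.pinv(H) @ t                       # Eq. (2)
    return W, beta

def elm_predict(Y, W, beta, n_classes, activation="sigmoid"):
    """Steps 8-11: same hidden mapping on test data, then round y."""
    H = ACTIVATIONS[activation](Y @ W.T)
    y = H @ beta                                       # Eq. (3)
    # Rounding follows the text; clipping to [1, n_classes] is an added
    # safeguard (assumption) so every output maps to a valid label.
    return np.clip(np.rint(y), 1, n_classes).astype(int)

# Synthetic stand-in data with the shapes of the 2-class test:
# 63 training rows and 21 testing rows, 4 GLCM features each.
rng = np.random.default_rng(1)
X_train = rng.random((63, 4))
t_train = rng.integers(1, 3, size=63).astype(float)  # labels 1 or 2
Y_test = rng.random((21, 4))

W, beta = elm_train(X_train, t_train, h=10, activation="sigmoid")
predicted = elm_predict(Y_test, W, beta, n_classes=2)
```

Because the labels here are random, the predictions carry no meaning; the sketch only shows the matrix shapes and the order of operations. Repeating `elm_train` with different `seed` values and keeping the best-performing W mirrors the repeated random initialization described in step 4.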

RESULTS AND DISCUSSION
Image Processing Result
The processing results are divided into three parts, namely the results of image pre-processing, the results of segmentation, and the results of feature extraction. The image pre-processing stage converts the original RGB image to an L*a*b image. The acne image is then segmented using the K-Means Clustering method. This process has two outputs, namely a black and white binary image showing the acne lesions, and a segmented acne image. The feature extraction stage obtains feature values from the segmented image after it has been converted into a grayscale image. The feature extraction process carried out with GLCM produces the Contrast, Correlation, Energy and Homogeneity features, which are used at the classification stage. The results of the image processing are shown in Table 3.
Table 3. Image Processing Results (columns: Nodules, Papules, Pustules; rows: Original Image, L*a*b Image, Acne Lesions, Segmentation Results, Grayscale Image; the cells are images)
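The segmentation step described above, with its two outputs (a binary lesion mask and a segmented image), can be sketched as follows. This is an illustrative NumPy sketch rather than the authors' implementation: it assumes that the image has already been converted to L*a*b, that clustering is done on the a* and b* channels with k = 2, and that the cluster with the higher mean a* (redder) is the lesion. The function names and the synthetic 4 × 4 image are hypothetical.

```python
import numpy as np

def kmeans(points, k=2, iters=20):
    """Plain k-means with a deterministic start (for reproducibility)."""
    points = np.asarray(points, dtype=float)
    # Initial centers: points spread evenly along the sorted first coordinate.
    order = np.argsort(points[:, 0])
    centers = points[order[np.linspace(0, len(points) - 1, k).astype(int)]]
    for _ in range(iters):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def segment_lesion(ab_image, rgb_image, k=2):
    """Return (binary mask, segmented RGB image) from a*b* clustering."""
    rows, cols, _ = ab_image.shape
    labels, centers = kmeans(ab_image.reshape(-1, 2), k=k)
    lesion_cluster = int(np.argmax(centers[:, 0]))  # assumption: reddest cluster
    mask = labels.reshape(rows, cols) == lesion_cluster
    segmented = rgb_image * mask[..., None]          # zero out non-lesion pixels
    return mask, segmented

# Synthetic example: a 2x2 reddish patch inside a 4x4 skin-colored image.
ab = np.full((4, 4, 2), [10.0, 15.0])
ab[1:3, 1:3] = [40.0, 20.0]
rgb = np.full((4, 4, 3), 200, dtype=np.uint8)
mask, segmented = segment_lesion(ab, rgb)
```

With this rule, any inflamed area that is as red as the lesion itself ends up in the lesion cluster, so the mask covers the whole reddened region rather than a smaller sub-area within it.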
After all images have gone through the image processing stage, from pre-processing and segmentation to feature extraction, the next stage is the testing phase, which identifies the type of acne vulgaris in the testing images. The feature values of the training images are used as training data. The classification algorithm used in this study is ELM.
In the ELM classification, testing is carried out in two stages. The first stage tests two classes, namely Nodules and Papules. A total of 84 images were used, with 63 images as training data and 21 images as testing data. The second stage tests three classes (Nodules, Papules and Pustules) and uses all images in the dataset. Of the 100 images, 75 are training data and 25 are testing data.
Tests were carried out with different numbers of neurons in the hidden layer, namely 10, 30 and 50. The activation functions used were sigmoid, sine and hardlim. The first stage of testing was conducted to determine the performance of the algorithm in classifying acne vulgaris with 2 classes, namely Nodules and Papules. The results for the 2 classes are shown in Figure 4.
Based on the testing accuracy in Figure 4, the highest accuracy for the 2 classes was obtained at a hidden neuron value of 10, with both the sigmoid and the sine activation functions, with an overall accuracy of 95.24%. The confusion matrix is shown in Table 4.
Accuracy is the ratio of correct predictions (TP+TN) to the overall data. Accuracy answers the question "What percentage of acne images are correctly classified from all tested acne images?". Sensitivity or Recall is the ratio of true positive predictions (TP) to the total number of actual positive data (TP+FN). Sensitivity answers the question "What percentage of acne images are classified correctly according to their type, compared to all acne images of that type?". Precision is the ratio of true positive predictions (TP) to all positive predictions (TP+FP). Precision answers the question "What percentage of acne images are classified correctly according to their type, compared to all images predicted as acne of that type?".
Apart from the overall accuracy, the accuracy per class must also be calculated. The test results per class in the 2 class tests are shown in Figure 5, which gives the best test result for each hidden neuron value. From Figure 5, it can be concluded that in the 2 class test, the highest accuracy for the Nodules class was obtained at a hidden neuron value of 10 with the sigmoid and sine functions, with 100% accuracy. For the Papules class, the highest accuracy was obtained at a hidden neuron value of 30 with the hardlim function, with 100% accuracy. After testing 2 classes, 3 classes were tested by adding the Pustules class. The test results for 3 classes are shown in Figure 6.
Based on the testing accuracy in Figure 6, the highest accuracy for the 3 classes was obtained at a hidden neuron value of 10, with both the sigmoid and the sine activation functions, with an overall accuracy of 80%. The testing accuracy for 3 classes is lower than for 2 classes. However, the highest accuracy is obtained at the same hidden neuron value and activation functions as in the 2 class test. The confusion matrix is shown in Table 5. The test results per class in the 3 class tests are shown in Figure 7, which gives the best test result for each hidden neuron value. From Figure 7, it can be concluded that in the 3 class test, the highest accuracy for the Nodules class was obtained at a hidden neuron value of 10 with the sigmoid and sine functions, with 100% accuracy. For the Papules class, the highest accuracy was obtained at a hidden neuron value of 30 with the hardlim function, with 100% accuracy. For the Pustules class, the highest accuracy was obtained at a hidden neuron value of 50 with the sine function, with an accuracy of 50%.
In the previous discussion, two stages of testing were carried out for the classification of types of acne vulgaris, namely testing 2 classes and testing 3 classes. The test results give both the overall accuracy and the accuracy per class. In the 2 class test, for the Nodules and Papules classes, the highest overall accuracy is obtained at a hidden neuron value of 10 with the sigmoid and sine functions. The overall accuracy is fairly good, reaching 95.24%, with a sensitivity of 1 and a precision of 0.86.
This indicates that for classification with 2 classes, the proposed method can identify nodule and papule acne types with good results. It also indicates that GLCM extracts features from the acne images that are sufficient for a very good classification. Calculated per class, the accuracy is 100% for the Nodules class and 93.33% for the Papules class.
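As a worked check of the metric definitions against the 2 class figures, the short sketch below computes accuracy, sensitivity and precision from confusion-matrix counts. The counts are not taken from Table 4; they are an assumption, inferred as the combination of 21 test images that is consistent with the reported 95.24%, 100%, 93.33%, 1 and 0.86, with Nodules treated as the positive class.

```python
# Assumed counts for the 2 class test (21 testing images in total).
tp = 6    # Nodules predicted as Nodules
fn = 0    # Nodules predicted as Papules
fp = 1    # Papules predicted as Nodules
tn = 14   # Papules predicted as Papules

accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct predictions / all data
sensitivity = tp / (tp + fn)                 # recall for the positive class
precision = tp / (tp + fp)

# Accuracy per class: correct predictions of a class / images of that class.
nodules_accuracy = tp / (tp + fn)
papules_accuracy = tn / (tn + fp)

print(f"accuracy    = {accuracy:.2%}")
print(f"sensitivity = {sensitivity:.2f}")
print(f"precision   = {precision:.2f}")
print(f"Nodules     = {nodules_accuracy:.2%}")
print(f"Papules     = {papules_accuracy:.2%}")
```

Under these assumed counts, a single Papules image misclassified as Nodules accounts for both the 95.24% overall accuracy (20 of 21) and the precision of about 0.86 (6 of 7).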
In the 3 class test, for the Nodules, Papules and Pustules classes, the overall accuracy decreased compared to the 2 class test after the Pustules class was added. The highest overall accuracy is obtained at a hidden neuron value of 10 with the sigmoid and sine functions, with an accuracy of 80%. The sensitivity for the 3 class test is 0.64 and the precision is 0.55. Calculated per class, the accuracy for the Nodules and Papules classes is the same as in the 2 class test, namely 100% and 93.33%. However, for the Pustules class the accuracy is 0%, which indicates that the proposed method has not been able to correctly identify pustule acne.
The failure to identify pustule acne may be caused by poor segmentation results for this type of acne. Morphologically, pustules are characterized by the presence of pus at the tip of the pimple, which is the striking feature that distinguishes pustules from other types of acne. However, the segmentation carried out with K-Means Clustering has not produced the desired results for pustules. The segmentation results show that the segmented part is the whole pimple together with the inflamed area, whereas it should be only the acne area that has pus at the tip. When the inflamed area is also segmented, the morphology of pustules becomes close to the morphology of papules. This is why the test images of pustules were all detected as papules, so that the accuracy is 0%.
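The 3 class figures can be checked in the same way. The confusion matrix below is not Table 5; it is an assumed reconstruction from the text (25 testing images, all Pustules predicted as Papules, one Papules image predicted as Nodules). The sketch also assumes that the reported sensitivity and precision are macro-averages over the three classes, and that the precision of a class with no predictions is taken as 0.

```python
classes = ["Nodules", "Papules", "Pustules"]

# Rows are actual classes, columns are predicted classes (assumed counts).
confusion = [
    [6, 0, 0],    # actual Nodules
    [1, 14, 0],   # actual Papules
    [0, 4, 0],    # actual Pustules
]
n = len(classes)

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(n))
accuracy = correct / total

# Per-class recall (sensitivity): diagonal / row sum.
recalls = [confusion[i][i] / sum(confusion[i]) for i in range(n)]

# Per-class precision: diagonal / column sum, 0 if nothing was predicted.
column_sums = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
precisions = [confusion[c][c] / column_sums[c] if column_sums[c] else 0.0
              for c in range(n)]

macro_recall = sum(recalls) / n
macro_precision = sum(precisions) / n

for name, rec in zip(classes, recalls):
    print(f"{name}: class accuracy {rec:.2%}")
print(f"overall accuracy = {accuracy:.2%}")
print(f"macro recall     = {macro_recall:.3f}")
print(f"macro precision  = {macro_precision:.3f}")
```

Under these assumptions the overall accuracy is 80% and the macro-averaged sensitivity is about 0.64, matching the reported values, while the macro-averaged precision comes out at roughly 0.545, close to the reported 0.55.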

System Implementation
Implementation of the system as a GUI-based application is carried out so that the system can be easily understood and used by users. The application was developed with MATLAB R2015a. Figure 8 shows the application that was built. Its components include:
3. Axes to display the binary image resulting from K-Means Clustering.
4. Axes to display the image segmentation results.
5. Axes to display the original image with Region of Interest.
6. Push button to select training image. After clicking the button, a dialog box will appear that can be used to select an image file in .JPG format.
7. Edit text to display the previously selected image name.
8. Push button to return the application display to normal.
9. Pop-up menu to select the image to be saved. The options include: RGB Image, L*a*b Image, Acne Lesion, Segmentation Result, Region of Interest and Grayscale Image.
10. Push button to convert RGB image to L*a*b.
11. Push button to do segmentation with K-Means Clustering. The resulting image output is a black and white binary image.
12. Push button to apply segmentation result to original image. The resulting image output is a segmented image that displays the part with acne and removes the part that is not acne.
13. Push button to display Region of Interest on original image. The resulting image output is the original image with the location of the pimple marked by a green line.

CONCLUSIONS
Based on the research that has been done, it can be concluded that the segmentation results for the Nodules and Papules classes are quite good, but for the Pustules class the K-Means Clustering method has not produced the expected segmentation output. At the classification stage with the ELM algorithm, in both the 2 class and the 3 class tests, the best accuracy was obtained at a hidden neuron value of 10 with the sigmoid and sine activation functions, with an overall accuracy of 95.24% in the 2 class test and 80% in the 3 class test. The per-class accuracy is high for the Nodules and Papules classes in both the 2 class and the 3 class tests, namely 100% for Nodules and 93.33% for Papules. However, the accuracy of the Pustules class in the 3 class test is still very low, at 0%. The proposed method is able to identify nodule and papule acne types properly but has not been able to identify pustule acne.