Machine learning and deep learning approach for medical image analysis: diagnosis to detection


Computer-aided detection using Deep Learning (DL) and Machine Learning (ML) has grown rapidly in the medical field. Medical images are the primary source of the information required to diagnose disease. Detecting disease at an early stage, using various imaging modalities, is one of the most important factors in reducing mortality from cancer and tumors. Modalities help radiologists and doctors study the internal structure of the detected disease and retrieve the required features. ML has limitations with present modalities because of the large amounts of data involved, whereas DL works efficiently with any amount of data. Hence, DL is considered an enhancement of ML: ML uses learning techniques, and DL acquires details on how machines should react around people. DL uses a multilayered neural network to extract more information from the datasets used. This study presents a systematic literature review of applications of ML and DL for the detection and classification of multiple diseases. A detailed analysis was carried out of 40 primary studies drawn from well-known journals and conferences published between January 2014 and 2022. The review gives an overview of ML- and DL-based approaches for the detection and classification of multiple diseases, medical imaging modalities, the tools and techniques used for evaluation, and the datasets. Further, experiments are performed on an MRI dataset to provide a comparative analysis of ML classifiers and DL models. This study will assist the healthcare community by enabling medical practitioners and researchers to choose an appropriate diagnosis technique for a given disease with reduced time and high accuracy.


The significance of disease classification and prediction has been evident in recent years. The important properties and features in a dataset should be well understood in order to identify the exact cause and symptoms of a disease. Artificial Intelligence (AI) has shown promising results in classification and in assisting decision making. Machine Learning (ML), a subset of AI, has accelerated much research in the medical field. Deep Learning (DL), in turn, is a subset of ML that deals with neural network layers, analyzing the exact features required for disease detection [34, 71, 94]. The existing studies from 2014 to the present discuss many applications and algorithms developed to enhance the medical field by providing accurate results for a patient. Using data, ML has driven advanced technologies in many areas, including natural language processing, automatic speech recognition, and computer vision, to deliver robust systems such as driverless cars and automated translation. Despite these advances, the application of ML in medical care has remained fraught with hazards. Many of these issues stem from the goal in medical care of making accurate predictions using data collected and managed by the medical system.

AI examines a given dataset using various techniques to extract the required features from a huge amount of data, which makes it difficult to find an ideal set of significant features while excluding redundant ones. Including irrelevant features is counterproductive, and accuracy metrics become erroneous. Hence, choosing a small subset from a wide range of features improves the efficiency of the model. Excluding irrelevant and redundant features also reduces the dimensionality of the data and speeds up the learned model, similar to boosting [37]. Significant features are extracted from the existing ones using practical approaches such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). In particular, feature selection has two essential conflicting objectives: first, maximizing classification performance and, second, minimizing the number of features to overcome the dimensionality problem. Feature selection is therefore an essential task for these objectives. Later research on feature improvement used selection-based multi-objective strategies. Thus, this review focuses on strategies for choosing efficient features.
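As a hedged illustration of excluding redundant features, the sketch below drops any feature column whose absolute Pearson correlation with an already-kept column reaches a threshold. It is a simple correlation filter, not PCA or LDA, and it is not the method of any study reviewed here; the function names, the 0.95 threshold, and the toy values are assumptions made for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_redundant_features(rows, threshold=0.95):
    """Return the indices of the feature columns to keep.

    A column is kept only if its absolute correlation with every
    previously kept column is below `threshold` (an assumed cut-off).
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Hypothetical toy data: column 1 is column 0 doubled, so it is redundant.
samples = [
    [1.0, 2.0, 0.3],
    [2.0, 4.0, 0.9],
    [3.0, 6.0, 0.1],
    [4.0, 8.0, 0.7],
]
kept_columns = drop_redundant_features(samples)
reduced = [[row[j] for j in kept_columns] for row in samples]
```

A filter of this kind addresses only the second objective (fewer features); whether classification performance is preserved would still have to be checked on held-out data.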

Cancer was identified using multiple techniques of image segmentation, feature selection, and regression using Root Mean Square Error (RMSE), with parameters such as pattern recognition, object detection, and image classification [7]. Brain tumors were detected using six classifiers and Transfer Learning (TL) techniques for image segmentation of brain Magnetic Resonance Imaging (MRI) scans [28]. A TL approach was also implemented to identify lung cancer and brain disease in [55]; it analyzed MRI and Computed Tomography (CT) scan images using supervised Support Vector Machine (SVM) classifiers. The image analysis process is well understood in the existing studies. However, ML and DL techniques are continuously being updated. It is therefore difficult for researchers to identify an accurate method for analyzing images, since feature selection techniques vary with every method. The key contributions of this study include:

  (i) Classification of diseases after reviewing primary studies,
  (ii) Recognition of various image modalities provided by existing articles,
  (iii) Description of tools along with reliable ML and DL techniques for disease prediction,
  (iv) Dataset description to provide awareness of available sources,
  (v) Experimental results using MRI dataset to compare different ML and DL methods,
  (vi) Selection of suitable features and classifiers to get better accuracy, and
  (vii) Insights on classification as well as review of the techniques to infer future research.

The significance of this review is that it enables physicians and clinicians to use ML or DL techniques for precise and reliable detection, classification and diagnosis of disease. It will also help clinicians and researchers avoid misinterpreting datasets and derive efficient algorithms for disease diagnosis, and it provides information on the modern medical imaging modalities used with ML and DL.

The study consists of 11 sections, organized as follows. Section 2 discusses the background of the study. Section 3 discusses the review techniques, search criteria, source material and quality assessment. Section 4 summarizes the current techniques and the important parameters for achieving good accuracy. Section 5 gives an insight into medical image modalities. Section 6 sums up the tools and techniques used in ML and DL models. Section 7 discusses the datasets previously used by authors and gives an insight into the data. Section 8 presents the experiments using ML classifiers and DL models on a brain MRI dataset. Section 9 recaps the analytic discussion of the techniques, datasets, ML and DL tools, and journals studied for this article. The discussion, and the conclusion and future scope, are presented in Sections 10 and 11, respectively.


This section discusses the preliminary terms which are required to comprehend this review. Further, it also presents the statistical analysis of ML and DL techniques used for medical image diagnosis.

Machine learning

ML is a branch of AI in which a machine learns from data by identifying patterns and automates decision-making with minimal human intervention [96, 24, 12]. The most important characteristic of an ML model is that it adapts independently, learns from previous computations and produces reliable results when repeatedly exposed to new datasets. Two main aspects are that (i) ML techniques help physicians interpret medical images in a short time through Computer-Aided Diagnosis (CAD), and (ii) ML algorithms are used for challenging tasks such as segmentation with CT scans [81], breast cancer with mammography, and brain tumor segmentation with MRI. Traditional ML models worked on structured datasets where the techniques were predefined for every step, and the applied technique failed if any step was missed. Evaluating the quality of the data used by ML and DL algorithms is essential [1622, 61]. In contrast to traditional models, newer algorithms tolerate the omission of data as required for robustness. Figure 1 illustrates the process used by ML algorithms for the prediction and diagnosis of disease.
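The idea of learning patterns from data and then producing predictions for new inputs can be sketched with a nearest-centroid classifier. This is a minimal illustration chosen for brevity, not one of the classifiers evaluated in this study; the function names, class labels, and two-value feature vectors are hypothetical and do not come from any real image dataset.

```python
import math


def fit_centroids(features, labels):
    """Learn one mean vector (centroid) per class from labelled training data."""
    sums, counts = {}, {}
    for vector, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vector):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


def accuracy(centroids, features, labels):
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(predict(centroids, v) == y for v, y in zip(features, labels))
    return correct / len(labels)


# Hypothetical feature vectors standing in for values extracted from images.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_y = ["normal", "normal", "tumor", "tumor"]
test_x = [[0.15, 0.15], [0.85, 0.85]]
test_y = ["normal", "tumor"]

model = fit_centroids(train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
```

Keeping the test samples separate from the training samples mirrors the usual practice of measuring accuracy on data the model has not seen.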

Conclusions and future work

This study provides an overview of various ML and DL approaches for disease diagnosis and classification, along with imaging modalities, tools, techniques, datasets and challenges in the medical domain. MRI and X-ray scans are the most commonly used modalities for disease diagnosis. Among the tools and techniques studied, MATLAB and SVM dominated, respectively. MRI datasets were observed to be widely used by researchers. A series of experiments on an MRI dataset also provided a comparative analysis of ML classifiers and DL models, in which CNN (97.6%) and RF (96.93%) outperformed the other algorithms. This study indicates a need to include denoising techniques with DL models in the healthcare domain. It also concludes that various classical ML and DL techniques are extensively applied to deal with data uncertainty. Owing to their superior performance, DL approaches have recently become quite popular among researchers. This review will assist the healthcare community, physicians, clinicians and medical practitioners in choosing an appropriate ML or DL technique for the diagnosis of disease with reduced time and high accuracy.

Future work will incorporate DL approaches for the diagnosis of all diseases, considering noise removal from any given dataset. Additional aspects and properties of DL models for medical images can be explored. Increasing accuracy requires an enormous amount of data; therefore, the capacity of the model should be improved to deal with large datasets. Different data augmentation techniques, along with the required features of the dataset, can also be explored to attain better accuracy.

Authors: Meghavi Rana & Megha Bhushan

Journal Link: Multimedia Tools and Applications
