
Impact of Loss Functions and Explainable AI on Plant Disease Detection Using Transfer Learning

by Anam Fatima

Agriculture has always been the cornerstone of human survival, providing food, economic stability, and livelihoods. However, plant diseases pose a relentless threat to this vital sector, leading to significant crop losses, increased costs, and environmental harm.

A recent study by researchers Verma, Pantola, and Singh explores how cutting-edge artificial intelligence (AI) techniques, specifically deep learning (a subset of machine learning that uses multi-layered neural networks to analyze complex data) and transfer learning (a method where a pre-trained model is adapted to a new task, saving time and resources), can transform the way we detect and manage plant diseases.

The Role of AI Solutions in Early Crop Disease Prevention

Plant diseases are responsible for destroying up to 40% of global crop yields annually, costing the world economy over $220 billion each year. For instance, diseases like tomato late blight can wipe out 80% of a harvest if not caught early, while wheat rust has devastated farms in developing nations, pushing communities toward food insecurity.

Traditional approaches to managing disease, such as visual inspection by farmers or heavy preventive pesticide use, are not only inefficient but also harmful to ecosystems and human health. By using advanced algorithms to analyze images of plant leaves, AI can instead identify diseases with remarkable accuracy, enabling farmers to act before crops are irreversibly damaged.

  • The study highlights that early detection powered by AI could reduce pesticide use by 30–50%, cutting costs and minimizing environmental impact.

To achieve this, the researchers turned to a powerful combination of deep learning models (AI systems inspired by the human brain’s structure, capable of learning patterns from data) and transfer learning, a technique that allows AI systems to apply knowledge from one task to another.

Their work focuses on optimizing these models by testing different loss functions—a critical component of AI training that measures how well the model is performing by quantifying the difference between predicted and actual outcomes.

Leveraging the PlantVillage Dataset for Accurate AI Diagnoses

At the heart of this study lies the PlantVillage dataset, a widely recognized collection of 54,305 high-quality images spanning 38 categories of plants and their diseases.

These images cover 14 major crops, including apples, grapes, tomatoes, and potatoes, with examples of diseases like powdery mildew (a fungal disease creating white patches on leaves), bacterial spots (small, water-soaked lesions caused by pathogens), and blight (rapid plant death due to fungi or bacteria).

To mimic real-world conditions, the researchers applied data augmentation techniques (methods to artificially expand datasets by altering images) such as rotating images by 30 degrees, flipping them horizontally, and adjusting their size. These steps help the AI recognize diseases under varying angles, lighting, and orientations.
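
The three augmentations named above can be sketched in plain Python on a tiny grid of pixel values. This is only an illustration: the function names (`hflip`, `rotate`, `resize_nearest`) and the nearest-neighbour sampling are assumptions made here, not the study's actual pipeline, which would normally rely on an image library.

```python
import math

def hflip(img):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in img]

def resize_nearest(img, new_h, new_w):
    """Resize with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def rotate(img, degrees, fill=0):
    """Rotate about the image centre; pixels with no source get `fill`."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Inverse mapping: find the source pixel that lands on (r, c).
            dy, dx = r - cy, c - cx
            src_r = round(cy + dy * cos_t - dx * sin_t)
            src_c = round(cx + dy * sin_t + dx * cos_t)
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out

# A hypothetical 2x3 "leaf image" and three augmented variants of it.
leaf = [[1, 2, 3],
        [4, 5, 6]]
augmented = [hflip(leaf), rotate(leaf, 30), resize_nearest(leaf, 4, 6)]
```

Each variant keeps the original label, so the model sees the same disease under a different orientation or scale.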

The dataset was split into two parts: 70% for training the models (38,013 images) and 30% for testing their accuracy (16,292 images). This rigorous approach ensures that the models can generalize (perform well on new, unseen data) effectively—a crucial factor for real-world deployment.
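
The split sizes quoted above follow directly from the dataset size. A minimal sketch, assuming a simple shuffled split (the article does not say whether the study's split was stratified or which random seed was used; `train_test_split` here is a hypothetical helper, not a library call):

```python
import random

def train_test_split(items, train_percent=70, seed=0):
    """Shuffle reproducibly, then cut at the requested percentage."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_percent // 100  # integer arithmetic, rounds down
    return items[:n_train], items[n_train:]

# Image indices stand in for the 54,305 PlantVillage images.
train, test = train_test_split(range(54305))
print(len(train), len(test))
```

Rounding 70% of 54,305 down gives 38,013 training images and leaves 16,292 for testing, matching the figures reported.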

Evaluating Deep Learning Architectures for Optimal Disease Identification

The study evaluated eight pre-trained deep learning models, each with distinct architectures and strengths. These models were chosen for their proven success in image recognition tasks and their suitability for transfer learning.

  • VGG16 and VGG19 are known for their deep architectures, comprising 16 and 19 layers, respectively.

While they deliver high accuracy, their computational demands are significant, with VGG19 requiring nearly 20 billion FLOPs (floating-point operations, a measure of computational complexity).

In contrast, ResNet50 uses skip connections (shortcuts that allow data to bypass layers, preventing the “vanishing gradient” problem where deep networks struggle to learn) to address challenges in training deep networks, striking a balance between depth and efficiency. DenseNet121 takes a different approach by densely connecting layers, allowing features to be reused effectively across the network.
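
The difference between the two connection patterns can be sketched with plain lists standing in for feature vectors. The toy "layers" below are hypothetical placeholders for convolutional layers; only the wiring is the point.

```python
def residual_block(x, layer):
    """ResNet-style wiring: output = layer(x) + x (the skip connection)."""
    return [fx + xi for fx, xi in zip(layer(x), x)]

def dense_block(x, layers):
    """DenseNet-style wiring: each layer sees all earlier outputs concatenated."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)
    return features

# A layer that has learned nothing (outputs zeros) still passes x through.
zero_layer = lambda v: [0.0] * len(v)
passed_through = residual_block([1.0, 2.0, 3.0], zero_layer)

# Toy layers that each add one new feature computed from everything so far.
toy_layers = [lambda v: [sum(v)], lambda v: [max(v)]]
reused = dense_block([1, 2], toy_layers)
```

In the residual case the input survives even when the layer contributes nothing, which is why gradients have a direct path back through deep stacks. In the dense case the original features remain available to every later layer.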

For practical applications in resource-limited settings, MobileNetV2 and SqueezeNet stand out. MobileNetV2 is designed for mobile and edge devices, offering impressive accuracy with minimal computational requirements: just 326 million FLOPs. SqueezeNet, with only 0.75 million parameters, is even lighter, making it ideal for low-power environments.

GoogleNet, another model in the study, uses inception modules (layers that process multiple filter sizes simultaneously) to detect diseases at different scales, enhancing its versatility.

Key Loss Functions Enhancing AI Model Accuracy in Agriculture

A loss function acts as a guide for AI models during training, measuring how far their predictions are from the correct answers and steering improvements. The researchers tested five loss functions to determine which ones work best for plant disease detection.

Cross-entropy loss, the most commonly used function for classification tasks, performed well overall, especially with MobileNetV2 and DenseNet121. However, it struggles with class imbalance.

Dice loss, designed for image segmentation (outlining disease regions), focuses on overlapping areas between predictions and actual disease spots. While it worked moderately well with SqueezeNet (82.51% accuracy), most models scored poorly with this function, highlighting its limitations in classification tasks.

Focal loss addresses class imbalance by prioritizing hard-to-classify examples (cases where the model is less confident). MobileNetV2 achieved 96.54% accuracy with this function, showcasing its potential for real-world scenarios where certain diseases are underrepresented in the data.

Intersection over Union (IoU) loss, another segmentation-focused function, delivered mixed results, with SqueezeNet again being the top performer at 80.67% accuracy.

The standout performer was label smoothing cross-entropy loss, a modified version of cross-entropy that prevents models from becoming overconfident in their predictions by replacing rigid “0” or “1” labels with softer values (e.g., 0.9 for diseased).

ResNet50 achieved a remarkable 97.38% accuracy with this function, along with precision (ability to correctly identify diseased samples), recall, and F1 score (balance of precision and recall) all exceeding 97%. MobileNetV2 followed closely with 96.62% accuracy, proving that high performance doesn’t always require heavy computational resources.

Critical Performance Metrics for Agricultural AI Applications

The study evaluated models using seven key metrics: accuracy, precision, recall, F1 score, rank graduation accuracy (RGA) (a metric assessing how confidently models rank predictions), computational complexity (FLOPs), and training/testing times.

Accuracy measures the percentage of correct predictions. Precision captures how well the model avoids false alarms (the share of predicted positives that are correct), while recall captures how many of the actual positives it finds. The F1 score, the harmonic mean of the two, provides a single balanced view of performance.
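
These definitions reduce to a few ratios over prediction counts. The sketch below uses made-up counts for a single disease class, purely for illustration. The article does not say how the study averaged per-class scores across the 38 categories, so this shows only the per-class calculation.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts: 90 diseased leaves caught, 10 false alarms, 30 missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here precision is high (few false alarms) but recall is lower (many missed cases), and the F1 score lands between the two, closer to the weaker one.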

RGA, a newer metric, evaluates how confidently the model ranks its predictions, with ResNet50 scoring 99.63%—a near-perfect result.

Computational complexity, measured in FLOPs, revealed stark differences between models. VGG19, for example, required 19.6 billion FLOPs, while MobileNetV2 needed just 326 million. Training times also varied widely: VGG19 took over two hours (8,588 seconds), whereas MobileNetV2 completed training in under 20 minutes (1,153 seconds).

Testing times further emphasized the practicality of lightweight models, with MobileNetV2 processing images in 6.8 seconds compared to VGG19’s 39 seconds. These metrics underscore a critical trade-off: while larger models like ResNet50 deliver top-tier accuracy, smaller models like MobileNetV2 and SqueezeNet offer a viable balance of performance and efficiency for real-world use.
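
To put the reported figures side by side, the ratios can be computed directly. The numbers are those quoted in this article; the dictionary layout is just a convenience for the sketch.

```python
# Figures as reported above for the two models.
models = {
    "VGG19": {"flops": 19.6e9, "train_s": 8588, "test_s": 39.0},
    "MobileNetV2": {"flops": 326e6, "train_s": 1153, "test_s": 6.8},
}

def vgg_to_mobilenet_ratio(metric):
    """How many times larger the VGG19 figure is than MobileNetV2's."""
    return models["VGG19"][metric] / models["MobileNetV2"][metric]

flops_ratio = vgg_to_mobilenet_ratio("flops")
train_ratio = vgg_to_mobilenet_ratio("train_s")
test_ratio = vgg_to_mobilenet_ratio("test_s")
print(f"FLOPs x{flops_ratio:.1f}, training x{train_ratio:.1f}, testing x{test_ratio:.1f}")
```

On these figures VGG19 needs roughly 60 times the FLOPs, about 7.4 times the training time, and about 5.7 times the testing time of MobileNetV2.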

Building Trust in AI with Grad-CAM++ Visualizations

One of the biggest hurdles in adopting AI for agriculture is the “black box” problem—models provide answers without explaining how decisions are made. To tackle this, the researchers used Grad-CAM++ (Gradient-weighted Class Activation Mapping++), an explainable AI (XAI) technique that highlights the regions of an image most influential in the model’s prediction.


For example, when analyzing a cherry leaf with powdery mildew, Grad-CAM++ generated a heatmap (a visual overlay showing areas of focus) highlighting the model’s attention on white fungal patches. Similarly, for grape leaves affected by black rot, the heatmap emphasized brown lesions with yellow halos—key visual indicators of the disease.
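
The last step of producing such a heatmap can be sketched as a weighted sum of feature maps, followed by a ReLU and normalization. This is a simplified illustration only: in Grad-CAM++ the per-map weights are derived from gradients of the class score, whereas here they are simply assumed as given inputs, and the tiny 2x2 maps are invented.

```python
def class_activation_map(feature_maps, weights):
    """Combine feature maps into one heatmap scaled to the range [0, 1]."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, weights):
        for r in range(h):
            for c in range(w):
                cam[r][c] += wk * fmap[r][c]
    # ReLU: keep only regions that support the predicted class.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Two hypothetical 2x2 feature maps with assumed importance weights.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
heatmap = class_activation_map(maps, weights=[0.5, 1.0])
```

The resulting grid is what gets upsampled and overlaid on the leaf image, with the highest values marking the regions that most influenced the prediction.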

These visualizations not only validate the model’s decisions but also empower farmers and agronomists to trust and understand AI-driven diagnoses. This transparency is vital for encouraging adoption in communities skeptical of advanced technology.

Overcoming Challenges for Future AI-Driven Farming Solutions

Despite its successes, the study acknowledges limitations.

First, the PlantVillage dataset, while comprehensive, lacks diversity. Most images are from controlled environments in North America, potentially limiting the models’ effectiveness in regions with different climates or disease strains.

Second, real-world deployment poses challenges like variable lighting, dirt on leaves, and occlusions (obstructed views of leaves), which were not accounted for in the lab-based training data.

Future research must address these gaps. Collecting images from diverse regions, including Africa, Asia, and South America, would enhance the models’ global applicability. Integrating multispectral imaging (capturing data across multiple wavelengths) or thermal imaging could enable earlier disease detection, even before visible symptoms appear.

Additionally, optimizing models for edge devices (decentralized devices like smartphones) would make AI tools accessible to small-scale farmers in remote areas.

Conclusion

The research by Verma, Pantola, and Singh marks a significant step forward in AI-driven plant disease detection. By combining transfer learning with carefully chosen loss functions, they reached accuracy as high as 97.38% while showing that lightweight models can stay close behind at a fraction of the computational cost. ResNet50 and MobileNetV2 emerged as top performers, each excelling in different contexts: ResNet50 for maximum accuracy and MobileNetV2 for practicality in resource-limited settings.

The integration of Grad-CAM++ bridges the gap between AI developers and end-users, fostering trust through transparency. As climate change intensifies the frequency and severity of plant diseases, such tools will be indispensable for sustainable agriculture. Farmers gain a reliable ally in protecting their crops, while researchers have a robust framework to build upon.

Power Terms

Deep Learning: A branch of artificial intelligence that uses multi-layered neural networks to learn patterns from data. It is important because it automates complex tasks like image recognition, speech processing, and disease detection. For example, deep learning models like ResNet50 analyze plant leaf images to identify diseases. These networks use layers of mathematical operations (like matrix multiplications) and activation functions (e.g., ReLU) to process data.

Transfer Learning: A method where a pre-trained model is adapted for a new task instead of training from scratch. It saves time and resources, especially when data is limited. Farmers use transfer learning to apply models trained on general images (e.g., ImageNet) to detect plant diseases. In the study, models like VGG16 were fine-tuned using the PlantVillage dataset.
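
The idea can be illustrated with a toy example in which a fixed feature extractor stands in for the pre-trained network and only a small classifier on top is trained. Everything here is hypothetical (the `frozen_features` function, the data, the learning rate). The study fine-tuned real convolutional networks, which this sketch does not attempt to reproduce.

```python
import math

def frozen_features(x):
    """Stand-in for a pre-trained backbone: fixed, never updated."""
    return [x[0] + x[1], x[0] - x[1]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=200, lr=0.1):
    """Train only the new classification head (logistic regression)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = frozen_features(x)
            p = sigmoid(w[0] * f[0] + w[1] * f[1] + b)
            err = p - y  # gradient of cross-entropy w.r.t. the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    f = frozen_features(x)
    return 1 if sigmoid(w[0] * f[0] + w[1] * f[1] + b) >= 0.5 else 0

# Tiny invented task: label is 1 when the first value exceeds the second.
samples = [(2, 0), (3, 1), (0, 2), (1, 3)]
labels = [1, 1, 0, 0]
w, b = train_head(samples, labels)
predictions = [predict(x, w, b) for x in samples]
```

Because the feature extractor is reused as-is, only three numbers (two weights and a bias) have to be learned, which is the same reason transfer learning saves time and data at full scale.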

Loss Function: A mathematical formula that measures how well a model’s predictions match actual outcomes. It guides training by quantifying errors. For example, cross-entropy loss penalizes incorrect disease predictions. Common formulas include cross-entropy (for classification) and mean squared error (for regression).

Cross-Entropy Loss: A loss function for classification tasks. It calculates the difference between predicted probabilities and true labels. For binary cases (healthy vs. diseased), the formula is:
*Loss = – (y * log(p) + (1 – y) * log(1 – p))*,
where *y* is the true label (0 or 1) and *p* is the predicted probability. It is crucial for training accurate classifiers.
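
A minimal sketch of the binary formula above, using only the standard library. The clamping constant `eps` is an assumption added here to keep `log` away from zero.

```python
import math

def binary_cross_entropy(y, p, eps=1e-12):
    """Loss = -(y*log(p) + (1-y)*log(1-p)) for a single sample."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A diseased leaf (y = 1) predicted with high vs. low probability.
confident_right = binary_cross_entropy(1, 0.9)
confident_wrong = binary_cross_entropy(1, 0.1)
print(round(confident_right, 4), round(confident_wrong, 4))
```

The loss is small when the model assigns high probability to the correct label and grows sharply when it is confidently wrong.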

Dice Loss: A loss function for image segmentation, focusing on overlap between predictions and ground truth. The formula is:
*Dice Loss = 1 – (2 * intersection + ε) / (sum of predictions + sum of truths + ε)*,
where ε prevents division by zero. It helps segment small diseased regions in leaves.
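
The formula can be sketched on flattened binary masks, where 1 marks a diseased pixel. The masks and the default `eps` value are illustrative assumptions.

```python
def dice_loss(pred, truth, eps=1e-6):
    """1 - (2*intersection + eps) / (sum(pred) + sum(truth) + eps)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(truth) + eps)

# Prediction marks two pixels; only one of them is truly diseased.
partial = dice_loss([1, 1, 0, 0], [1, 0, 0, 0])
perfect = dice_loss([1, 0, 0, 0], [1, 0, 0, 0])
```

A perfect match gives a loss of 0, while the partial overlap above gives a loss of about one third.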

Focal Loss: Adjusts cross-entropy to prioritize hard-to-classify examples. The formula adds a weighting term:
*Loss = – (1 – p)^γ * log(p)*,
where *p* is the predicted probability of the true class and γ controls how strongly easy examples (those with *p* close to 1) are down-weighted. It is vital for imbalanced datasets where rare diseases are underrepresented.
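
A small sketch of the formula shows the effect of γ. Here `p_true` denotes the predicted probability of the correct class, and γ = 2 is used only as an illustrative default. The article does not state the value used in the study.

```python
import math

def focal_loss(p_true, gamma=2.0, eps=1e-12):
    """-(1 - p)^gamma * log(p), with p the probability of the true class."""
    p = min(max(p_true, eps), 1.0)
    return -((1.0 - p) ** gamma) * math.log(p)

easy = focal_loss(0.9)   # confidently correct example
hard = focal_loss(0.1)   # badly misclassified example
plain_easy = focal_loss(0.9, gamma=0.0)  # gamma = 0 is ordinary cross-entropy
```

With γ = 2 the easy example's loss shrinks by a factor of 100 relative to plain cross-entropy, while the hard example keeps most of its weight, so training effort shifts toward the difficult cases.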

IoU Loss (Intersection over Union): Measures overlap between predicted and actual regions. The underlying score is:
*IoU = (Area of Overlap) / (Area of Union)*,
and the loss is commonly taken as *1 – IoU*, so that better overlap gives a lower loss. Used in segmentation, it helps outline disease spots precisely.
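
A minimal sketch on flattened binary masks, assuming the loss is taken as 1 minus the IoU score (a common convention; the article does not give the exact form used in the study).

```python
def iou(pred, truth):
    """Area of overlap divided by area of union for binary masks."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return overlap / union if union else 1.0  # two empty masks agree fully

def iou_loss(pred, truth):
    return 1.0 - iou(pred, truth)

# Prediction marks two pixels; only one of them is truly diseased.
score = iou([1, 1, 0, 0], [1, 0, 0, 0])
loss = iou_loss([1, 1, 0, 0], [1, 0, 0, 0])
```

One overlapping pixel out of a union of two gives an IoU of 0.5 and therefore a loss of 0.5.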

Label Smoothing Cross-Entropy Loss: Modifies cross-entropy by replacing rigid labels (0 or 1) with smoothed values (e.g., 0.9 for diseased). This reduces overconfidence and improves generalization. ResNet50 achieved 97.38% accuracy using this in the study.
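
The smoothing step can be sketched as follows. The convention assumed here spreads a total of ε evenly over all classes and gives the true class the remainder. Other conventions exist, and the article does not state which one, or which ε, the study used.

```python
import math

def smooth_labels(true_index, num_classes, epsilon=0.1):
    """Replace a one-hot target with a softened distribution."""
    off = epsilon / num_classes
    return [1.0 - epsilon + off if i == true_index else off
            for i in range(num_classes)]

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))

# Two classes (healthy, diseased) with epsilon = 0.2 gives targets 0.1 and 0.9.
target = smooth_labels(true_index=1, num_classes=2, epsilon=0.2)
calibrated = cross_entropy(target, [0.1, 0.9])
overconfident = cross_entropy(target, [0.001, 0.999])
```

Against the smoothed target, the overconfident prediction scores a higher loss than the calibrated one, which is how label smoothing discourages extreme probabilities.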

PlantVillage Dataset: A public collection of 54,305 plant images across 38 disease categories. It is critical for training AI models to recognize diseases like tomato blight or apple scab. Researchers use it to benchmark performance.

Data Augmentation: Techniques to artificially expand datasets by altering images (e.g., rotating, flipping, resizing). It helps models generalize to real-world variations, such as different leaf angles or lighting.

VGG16/VGG19: Deep neural networks with 16 or 19 layers, known for high accuracy but high computational costs. They were used in the study to test performance but require significant resources (e.g., 15.4 billion FLOPs for VGG16).

ResNet50: A 50-layer model using “skip connections” to avoid vanishing gradients in deep networks. It achieved the highest accuracy (97.38%) in the study, making it ideal for precise disease detection.

DenseNet121: A model where each layer connects to all subsequent layers, improving feature reuse. With 6.99 million parameters, it balances efficiency and accuracy (97.07% in the study).

MobileNetV2: A lightweight model designed for mobile devices. It uses 326 million FLOPs and 2.27 million parameters, achieving 97.11% accuracy—ideal for field use.

SqueezeNet: An ultra-compact model with only 0.75 million parameters. While less accurate (82.51% with Dice loss), it suits low-power devices in rural areas.

GoogleNet: Uses “inception modules” to process multiple filter sizes simultaneously. It helps detect diseases at different scales (e.g., small spots vs. large lesions).

FLOPs (Floating-Point Operations): Measures computational complexity. For example, VGG19 requires 19.6 billion FLOPs, while MobileNetV2 uses 326 million. Lower FLOPs mean faster, energy-efficient models.

Parameters: Variables a model adjusts during training. More parameters (e.g., VGG19’s 139.7 million) often mean higher accuracy but greater resource demands.

Accuracy: The percentage of correct predictions. ResNet50 achieved 97.38% accuracy, meaning it correctly classified roughly 97 out of every 100 test images.

Precision: Measures how many predicted diseased cases are correct. High precision (e.g., 97.51% for ResNet50) reduces false alarms.

Recall: Measures how many actual diseased cases are found. High recall (97.38% for ResNet50) ensures fewer missed diagnoses.

F1 Score: Balances precision and recall. ResNet50’s F1 score of 97.39% shows consistent performance across both metrics.

Rank Graduation Accuracy (RGA): Evaluates confidence in predictions. ResNet50 scored 99.63% RGA, indicating near-perfect ranking of results.

Grad-CAM++: An explainable AI tool that highlights image regions influencing predictions. For example, it showed the model focusing on fungal spots in cherry leaves, building trust in AI decisions.

Class Imbalance: When some diseases are rare in the dataset. Focal loss addresses this by prioritizing harder examples, improving detection of underrepresented diseases.

Explainable AI (XAI): Techniques like Grad-CAM++ that make AI decisions transparent. Farmers can see why a leaf is labeled diseased, increasing adoption in agriculture.

Reference:

Verma, P.R., Pantola, D. & Singh, N.P. Plant Disease Detection with Transfer Learning: Evaluating the Impact of Various Loss Functions and Explainable AI. JABES (2025). https://doi.org/10.1007/s13253-025-00691-9

Text ©. The authors. Except where otherwise noted, content and images are subject to copyright. Any reuse without express permission from the copyright owner is prohibited.
