Plants are the lifeblood of our planet, shaping ecosystems, supporting food chains, and stabilizing the climate. Yet, understanding where they grow and why has long challenged scientists. A groundbreaking study published in New Phytologist in 2022 has transformed this field by combining vast datasets, evolutionary biology, and machine learning.
This research, led by Lirong Cai, Patrick Weigelt, and a global team of ecologists, offers the most detailed maps of plant diversity ever created. By analyzing nearly 300,000 species across 830 regions, the study reveals how climate, geography, and history interact to shape the botanical world.
Why Mapping Plant Diversity Matters for Global Ecosystems
Plants are essential to human survival, but their survival is threatened by deforestation, climate change, and habitat loss. Protecting them requires knowing where they live and what conditions they need to thrive.
Traditional methods, like field surveys and basic statistical models, have provided valuable insights but often miss complex patterns. For example, a simple model might link high rainfall to high plant diversity but fail to explain why some rainy regions have fewer species than others.
This study tackles such gaps by using machine learning, a branch of artificial intelligence that identifies patterns in large datasets through algorithms that improve automatically with experience. By doing so, it provides a clearer picture of global biodiversity and how to protect it.
How Scientists Use AI to Uncover Plant Diversity Hotspots
The research team gathered data from 830 regions worldwide, including islands, mountains, and continents. These regions, part of the Global Inventory of Floras and Traits (GIFT) database—a repository of plant checklists and traits—covered nearly 300,000 vascular plant species (plants with specialized tissues for transporting water and nutrients, such as trees, ferns, and flowering plants).
Together, the species in this dataset represent roughly 88% of all known vascular plants.
To understand evolutionary history, the team used a megaphylogeny, a comprehensive evolutionary tree of vascular plants constructed from genetic data and fossils. This allowed them to measure phylogenetic richness, which quantifies the total evolutionary history of species in a region.
For instance, a region with ancient, distantly related species (e.g., cycads or ginkgoes) has higher phylogenetic richness than one dominated by closely related, recently evolved plants.
The study analyzed 23 environmental factors grouped into four categories:
- Geography: Region size (larger areas typically host more species) and isolation (distance from other landmasses, influencing species migration).
- Current Climate: Temperature, rainfall, seasonality, and Gross Primary Productivity (GPP)—a measure of the total energy plants produce through photosynthesis, indicating ecosystem productivity.
- Environmental Heterogeneity: Variations in elevation (e.g., mountains vs. plains) and soil types (e.g., clay, sand, loam), which create diverse microhabitats.
- Past Environmental Conditions: Climate stability over millions of years and shifts in biome distribution (e.g., tropical forests expanding or contracting during ice ages).
To compare methods, the team tested traditional statistical models like Generalized Linear Models (GLMs)—which assume linear relationships between variables—against machine learning techniques such as Random Forests (an ensemble of decision trees) and XGBoost (a gradient-boosting algorithm that builds models sequentially to correct errors).
Machine learning excelled because it handles nonlinear relationships (e.g., plant diversity peaking at moderate elevations) and interactions between variables (e.g., high rainfall combined with stable temperatures boosting diversity more than either factor alone).
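To make the point about nonlinear relationships concrete, here is a minimal sketch using only the Python standard library. The elevation and richness values are invented for illustration, and the binned-mean model is only a simple stand-in for a shallow decision tree; it is not the study's actual method.

```python
from statistics import mean

# Hypothetical data: richness peaks at mid elevation (values are invented).
elevation = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]
richness = [200, 450, 700, 900, 950, 880, 650, 400, 180]

def r_squared(observed, predicted):
    """Share of the variance in `observed` that `predicted` explains."""
    avg = mean(observed)
    ss_tot = sum((y - avg) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def fit_bins(x, y, edges):
    """Piecewise-constant model: predict the mean richness of each elevation bin."""
    def bin_of(value):
        return sum(value >= edge for edge in edges)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(bin_of(xi), []).append(yi)
    bin_means = {key: mean(values) for key, values in groups.items()}
    return lambda value: bin_means[bin_of(value)]

a, b = fit_line(elevation, richness)
linear_r2 = r_squared(richness, [a + b * x for x in elevation])

predict = fit_bins(elevation, richness, edges=[1250, 2750])  # hypothetical split points
binned_r2 = r_squared(richness, [predict(x) for x in elevation])

print(f"linear R^2: {linear_r2:.2f}, binned R^2: {binned_r2:.2f}")
```

On this toy data the straight line explains almost none of the variation, while three elevation bins explain roughly two thirds of it, because the line cannot bend to follow the mid-elevation peak.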
What Drives Plant Diversity?
The study revealed several critical insights. First, machine learning models outperformed traditional methods, explaining up to 80.9% of the variation in global species richness (the number of species in a region) and 83.3% of the variation in phylogenetic richness.
XGBoost, a machine learning algorithm, stood out due to its ability to iteratively correct errors and handle missing data. For example, it accurately predicted species counts in data-sparse regions by extrapolating patterns from similar environments.
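The idea of iteratively correcting errors can be sketched in a few lines. This is plain gradient boosting with one-split trees ("stumps") and squared error, written from scratch; it is not XGBoost itself, which adds regularization and other refinements. The rainfall and richness values, the function names, and the learning rate are all assumptions made for illustration.

```python
from statistics import mean

def fit_stump(x, residuals):
    """Find the single split on x that best reduces the squared error of the residuals."""
    best = None
    for threshold in sorted(set(x))[1:]:  # both sides of every split are non-empty
        left = [r for xi, r in zip(x, residuals) if xi < threshold]
        right = [r for xi, r in zip(x, residuals) if xi >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda value: left_mean if value < threshold else right_mean

def boost(x, y, rounds, learning_rate=0.3):
    """Each round fits a stump to what the current ensemble still gets wrong."""
    base = mean(y)
    stumps = []
    predictions = [base] * len(y)
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi) for xi, pi in zip(x, predictions)]
    return lambda value: base + learning_rate * sum(s(value) for s in stumps)

def mean_squared_error(model, x, y):
    return mean((yi - model(xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical regions: annual rainfall (mm) and species counts (invented values).
rainfall_mm = [200, 400, 800, 1200, 1600, 2000, 2400, 2800]
richness = [120, 180, 420, 760, 1100, 1500, 1650, 1700]

baseline_error = mean_squared_error(lambda value: mean(richness), rainfall_mm, richness)
one_round_error = mean_squared_error(boost(rainfall_mm, richness, rounds=1), rainfall_mm, richness)
fifty_round_error = mean_squared_error(boost(rainfall_mm, richness, rounds=50), rainfall_mm, richness)

print(baseline_error, one_round_error, fifty_round_error)
```

The training error shrinks as rounds are added, since every new stump targets the remaining residuals. A real analysis would also check error on held-out regions, because training error alone says nothing about how well the model extrapolates.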
Second, current climate emerged as the dominant driver of plant diversity. Warm, wet regions with stable temperatures, such as tropical rainforests, supported the most species.
For instance, the Amazon Basin, with its consistent rainfall (over 2,000 mm annually) and average temperatures of 25°C, hosts over 40,000 plant species.
In contrast, areas with high temperature seasonality (large differences between summer and winter temperatures), like temperate zones, had lower diversity. Gross Primary Productivity (GPP) was the strongest single predictor of diversity. High GPP regions, such as tropical forests, act as “engines” of biodiversity by providing abundant energy for growth and reproduction.
Third, environmental heterogeneity—variations in elevation, soil types, and habitats—significantly boosted diversity. Mountainous regions like the Andes and Himalayas, with their wide elevation ranges (0–6,000 meters) and diverse microclimates, hosted thousands of species. A 1,000-meter increase in elevation range correlated with a 15% rise in species richness.
Similarly, regions with 10 or more soil types, such as the Mediterranean, had 20–30% more species than homogeneous areas. This heterogeneity creates “niches” where different plants can specialize, reducing competition and promoting coexistence.
Fourth, past environmental conditions left subtle but detectable imprints. Regions with stable climates since the Last Glacial Maximum (LGM)—a period ~20,000 years ago when ice sheets were at their peak—retained 5–10% more species than areas with volatile histories.
For example, Southeast Asia’s relatively stable climate allowed ancient plant lineages to persist. However, past factors explained less than 6% of the variation in diversity in most models, highlighting the dominance of modern climates.
Fifth, geography played a supporting role. Larger regions, such as continents, hosted more species than smaller islands, consistent with the species-area relationship: an ecological principle whereby larger areas support more habitats and larger populations, reducing extinction risks.
A 10-fold increase in area (e.g., from 7,774 km² to 77,740 km²) boosted species richness by 25–40%. Isolation had minimal impact on mainland regions but shaped diversity on islands like Madagascar, where 80% of plants are endemic (found nowhere else).
Global Predictions For Mapping Biodiversity
The study produced maps at four resolutions, the finest being 7,774 km²—smaller than Cyprus. These maps highlight biodiversity hotspots, unexpected high-diversity areas, and regions needing further study.
Tropical rainforests in the Amazon, Congo Basin, and Southeast Asia emerged as the most diverse regions. Southeast Asia’s rainforests alone harbor over 50,000 species, many unique to the area.
For example, the rainforests of Borneo and Sumatra host Rafflesia arnoldii, which bears the world’s largest individual flower and grows wild nowhere else.
Mountain ranges like the Andes and Himalayas also ranked high due to their environmental complexity. The Andes, with elevations from 0 to 6,000 meters, support over 30,000 plant species, including wild relatives of potatoes and quinoa that are critical for food security.
Unexpectedly, some arid and temperate regions showed high diversity. The Ethiopian Highlands, despite their dry climate, host 2,000+ species per 7,774 km², likely due to microhabitats created by volcanic soils and elevation changes. The Caucasus Mountains, a temperate zone, rival some tropics with 1,800 species per unit area, including ancient wheat varieties and medicinal plants.
In contrast, low-diversity areas included the Sahara Desert, with fewer than 500 species adapted to extreme aridity (e.g., date palms and acacias), and Antarctic tundra, with fewer than 100 species, mostly mosses and lichens surviving in freezing conditions.
Phylogenetic richness generally mirrored species patterns but with exceptions. The Mediterranean Basin, rich in species (e.g., olive trees, lavender), has lower phylogenetic diversity due to many recent plant lineages.
Conversely, New Caledonia, a Pacific island with only 3,500 species, scores high in phylogenetic richness due to ancient, unique species like Amborella trichopoda, a “living fossil” from one of the earliest-diverging lineages of flowering plants.
Challenges in Mapping Global Biodiversity with AI
Despite its breakthroughs, the study faced limitations.
1. Data gaps in extreme environments like deserts and the Arctic led to higher prediction errors (15–20% vs. 5–10% in the tropics). For example, remote parts of Siberia and the Sahara lack recent plant surveys, forcing models to extrapolate from limited data. Political barriers, such as conflicts in Central Africa, further hindered data collection.
2. Machine learning models struggled in environments absent from training data, such as hyper-arid deserts where plant survival strategies (e.g., deep roots, water storage) differ markedly from other regions. Additionally, while models detected complex interactions, like how soil diversity and rainfall jointly boost diversity, interpreting these relationships requires ecological expertise. For instance, high soil diversity might enhance nutrient availability, but only in regions with sufficient rainfall to support plant growth.
3. Evolutionary data gaps posed another challenge. About 32% of plant genera lack genetic data, forcing researchers to estimate their evolutionary relationships. For example, the study used “placeholder” branches in the megaphylogeny for poorly studied groups, which could affect phylogenetic richness estimates.
How This Can Shape Earth’s Green Heritage
The study’s maps and data, freely available online, offer actionable insights for conservation. Tropical rainforests, which store 25% of terrestrial carbon, are critical for climate regulation.
However, the Amazon is losing 10,000 km² of forest annually to deforestation, while Southeast Asia’s rainforests face threats from palm oil plantations. Protecting these regions requires banning illegal logging, expanding protected areas, and promoting sustainable agriculture.
Mountainous and heterogeneous regions, resilient to climate change due to their varied habitats, should be prioritized. For example, the Andes’ microclimates—ranging from humid valleys to arid highlands—could help species adapt to warming temperatures. Similarly, preserving soil diversity in regions like the Mediterranean ensures plants have access to nutrients under changing climates.
The study also highlights the importance of evolutionarily unique areas. New Caledonia’s ancient plant lineages, though few in number, represent irreplaceable genetic diversity. Protecting such regions ensures future species can adapt to environmental changes. For example, the Amborella’s unique genetics could offer insights into early flower evolution, aiding crop breeding programs.
What’s Next for Biodiversity Science and AI?
The research team advocates for integrating citizen science data from platforms like iNaturalist to fill gaps in remote regions. For instance, amateur botanists in understudied areas like Papua New Guinea could upload plant photos, helping scientists identify new species or refine distribution maps.
Satellite-based sensors, such as NASA’s MODIS and ESA’s Sentinel-2, could monitor vegetation changes in real time. These tools track deforestation, wildfire impacts, and plant health, validating predictions and informing conservation strategies. For example, detecting illegal logging in the Congo Basin using satellite imagery could trigger rapid enforcement actions.
Future studies might combine plant and animal diversity data for holistic conservation strategies. Protecting pollinator-rich areas, such as meadows with diverse bee populations, could indirectly safeguard plant species reliant on insects for reproduction. Similarly, preserving seed-dispersing birds in tropical forests ensures plants can colonize new areas as climates shift.
Conclusion
This study marks a turning point in understanding plant diversity. By harnessing machine learning, it reveals how climate, geography, and history interact to shape life on Earth. The detailed maps and open-access data empower scientists, policymakers, and communities to protect biodiversity with precision. As climate change accelerates, such tools are not just academic—they are vital for ensuring ecosystems survive and thrive.
Power Terms
Machine Learning: A type of artificial intelligence where computers learn patterns from data to make predictions without being directly programmed. It is important because it uncovers complex relationships in large datasets, like how climate and geography interact to influence plant diversity. In the study, machine learning algorithms like XGBoost predicted plant species counts more accurately than traditional methods. For example, XGBoost adjusted its predictions iteratively, improving accuracy even in regions with limited data. Unlike basic formulas, machine learning uses adaptive algorithms like decision trees to model nonlinear patterns.
Vascular Plants: Plants with specialized tissues (xylem and phloem) that transport water and nutrients. They include trees, grasses, and ferns. Vascular plants are vital because they form ecosystems, produce oxygen, and provide food. The study focused on them to map global diversity. Examples include oak trees and sunflowers. Their structural complexity supports habitats for other species, making them critical for biodiversity.
Species Richness: The number of different species in a specific area. It measures biodiversity and indicates ecosystem health. High richness, like the Amazon’s 40,000+ plant species, suggests abundant resources and stability. The study used this metric to identify biodiversity hotspots. While there’s no single formula, it is calculated by counting species in a region, such as tallying all plants in a rainforest plot.
Phylogenetic Richness: The total evolutionary history of species in an area, measured by summing branch lengths in a family tree of life. It highlights regions with ancient, unique species, like New Caledonia’s Amborella, a “living fossil.” The study used this to prioritize conservation areas with irreplaceable evolutionary heritage. For example, a region with distantly related species scores higher than one with closely related plants.
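The branch-length sum described above can be sketched on a toy tree. The node names and branch lengths (in millions of years) are invented for illustration and are not taken from the study's megaphylogeny. Each branch is counted once, however many species share it.

```python
# Toy rooted tree: each node maps to (parent, length of the branch leading to it).
# All names and lengths are hypothetical.
TREE = {
    "fern": ("root", 400),
    "seed_plants": ("root", 80),
    "cycad": ("seed_plants", 320),
    "flowering": ("seed_plants", 200),
    "oak": ("flowering", 120),
    "daisy_clade": ("flowering", 100),
    "daisy": ("daisy_clade", 20),
    "sunflower": ("daisy_clade", 20),
}

def phylogenetic_richness(species, tree=TREE):
    """Sum the lengths of all branches linking `species` to the root, each counted once."""
    visited = set()
    total = 0
    for node in species:
        # Walk toward the root, stopping where a path already counted begins.
        while node != "root" and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total

close_relatives = phylogenetic_richness(["daisy", "sunflower"])
distant_relatives = phylogenetic_richness(["fern", "cycad"])
print(close_relatives, distant_relatives)  # 420 800
```

Both hypothetical regions hold two species, yet the pair of distant relatives scores almost twice as high because the two lineages share very little of their evolutionary path.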
Megaphylogeny: A large evolutionary tree combining genetic data and fossils to show relationships among many species. It helps scientists study how biodiversity evolved. The study used a vascular plant megaphylogeny to calculate phylogenetic richness. For instance, the GBOTB_extended tree includes data from ferns and flowering plants. Building megaphylogenies involves comparing DNA sequences to map evolutionary splits over millions of years.
Gross Primary Productivity (GPP): The total energy plants capture via photosynthesis in an area over time. It is measured in units like grams of carbon per square meter per year. High GPP (e.g., in tropical forests) supports more species by providing energy for growth. The study found GPP was the strongest predictor of plant diversity. GPP is the gross uptake, before any losses are subtracted; subtracting the CO2 released by plant respiration gives Net Primary Productivity (NPP). Formula: NPP = GPP – CO2 released by plant respiration.
Environmental Heterogeneity: Variations in physical conditions like elevation, soil types, or microhabitats within a region. It boosts biodiversity by creating niches for different species. The study linked a 1,000-meter increase in elevation range, typical of mountainous areas such as the Andes, to about 15% more species. Examples include a mountain slope with sunny, shaded, and wet zones hosting distinct plants.
Last Glacial Maximum (LGM): A period ~20,000 years ago when ice sheets were largest. Studying past climates explains current species distributions. Regions with stable post-LGM climates (e.g., Southeast Asia) retained more species. The study used LGM temperature data to model historical impacts on diversity. For example, areas with minimal glacial-era temperature swings had higher modern diversity.
XGBoost: A machine learning algorithm that builds decision trees sequentially, correcting errors from prior trees. It handles large datasets efficiently and was the top-performing model in the study. For example, it predicted plant diversity in understudied regions by extrapolating patterns from similar climates. XGBoost uses “gradient boosting,” where each new tree reduces prediction errors.
Random Forests: A machine learning method combining many decision trees to improve accuracy. It reduces overfitting by averaging results from diverse trees. The study used it to rank drivers of diversity, showing climate’s dominance. For example, a Random Forest might split data into subsets to evaluate how rainfall and soil interact.
Generalized Linear Models (GLMs): Statistical models that extend linear regression to non-normal data (e.g., counts like species richness). They assume that the predictors combine linearly on the scale of a link function. The study found GLMs less accurate than machine learning. For example, a GLM might model species richness as rainfall + temperature. Formula: g(E(Y)) = β0 + β1X1 + … + βnXn, where g is the link function relating the expected response E(Y) to the linear combination of predictors.
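As a rough illustration of the GLM formula, the sketch below assumes a log link, a common choice for count data, so that log(E(Y)) = β0 + β1X1 + β2X2. The coefficients, units, and function name are hypothetical and are not estimates from the study.

```python
import math

def predict_richness(rainfall_m, temperature_c, b0=4.0, b_rain=0.8, b_temp=0.05):
    """Expected species count under a log-link GLM with made-up coefficients."""
    linear_predictor = b0 + b_rain * rainfall_m + b_temp * temperature_c
    return math.exp(linear_predictor)  # invert the log link

# Effect of one extra meter of annual rainfall in a cool and in a warm region.
effect_cool = predict_richness(2.0, 10) / predict_richness(1.0, 10)
effect_warm = predict_richness(2.0, 25) / predict_richness(1.0, 25)
print(effect_cool, effect_warm)
```

Without an explicit interaction term, the model multiplies expected richness by the same factor, exp(0.8), for each extra meter of rainfall whatever the temperature. This fixed structure is what tree-based methods relax when they let the effect of one variable depend on another.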
Temperature Seasonality: The variation in temperature between seasons, measured as the standard deviation of monthly averages. High seasonality (e.g., cold winters and hot summers) stresses plants, reducing diversity. The study found temperate zones with high seasonality had fewer species than stable tropical climates.
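The standard-deviation definition is easy to try out. The monthly mean temperatures below are invented for two hypothetical sites, and the use of the population standard deviation (rather than the sample version) is an assumption of this sketch.

```python
from statistics import pstdev

# Hypothetical monthly mean temperatures in degrees Celsius, January to December.
tropical_site = [26, 26, 27, 27, 27, 26, 26, 26, 27, 27, 27, 26]
temperate_site = [-3, -1, 4, 10, 16, 21, 23, 22, 17, 11, 4, -1]

def temperature_seasonality(monthly_means_c):
    """Standard deviation of the twelve monthly mean temperatures."""
    return pstdev(monthly_means_c)

tropical_seasonality = temperature_seasonality(tropical_site)
temperate_seasonality = temperature_seasonality(temperate_site)
print(tropical_seasonality, temperate_seasonality)
```

The tropical site varies by about half a degree around its annual mean, while the temperate site swings by several degrees, which is the contrast the seasonality measure is meant to capture.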
Endemic Species: Species found only in one geographic area. They are crucial for conservation due to their uniqueness. Madagascar’s lemurs and baobab trees are examples. The study noted islands like Hawaii have high endemism. Protecting endemics preserves irreplaceable biodiversity.
Species-Area Relationship: The ecological rule that larger areas host more species due to more habitats and lower extinction risks. The study found a 10-fold area increase boosted species richness by 25–40%. Formula: S = cA^z, where S is the number of species, A is the area, and c and z are fitted constants. For example, doubling a forest’s size might increase species by 10%.
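The two figures in this entry can be connected through the power-law formula. If a 10-fold increase in area raises richness by a given fraction, then 10^z equals one plus that fraction, which fixes z and in turn predicts what doubling the area would do. The sketch below works through that arithmetic; the function names are illustrative.

```python
import math

def exponent_from_tenfold_gain(gain):
    """Solve 10**z == 1 + gain for z in S = c * A**z."""
    return math.log10(1 + gain)

def richness_ratio(area_ratio, z):
    """Factor by which richness changes when area changes by `area_ratio`."""
    return area_ratio ** z

results = {}
for gain in (0.25, 0.40):  # the 25-40% range reported for a 10-fold area increase
    z = exponent_from_tenfold_gain(gain)
    doubling_gain = richness_ratio(2, z) - 1
    results[gain] = (z, doubling_gain)
    print(f"10x gain of {gain:.0%}: z = {z:.3f}, doubling area adds {doubling_gain:.1%}")
```

The implied exponent falls between roughly 0.10 and 0.15, and doubling the area then adds about 7% to 11% more species, in line with the rough 10% figure quoted for doubling a forest's size.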
Microhabitats: Small-scale environments with unique conditions, like a rotting log or rocky outcrop. They support specialized species, enhancing biodiversity. The Ethiopian Highlands’ volcanic soils create microhabitats for rare plants. The study linked microhabitat diversity to higher species counts in mountainous regions.
Nonlinear Relationships: Relationships in which a change in one variable does not produce a proportional, straight-line change in another. Machine learning captures these better than traditional models. For example, plant diversity might peak at mid-elevations, forming a curve. The study found soil diversity and rainfall combined nonlinearly to boost diversity.
Climate Stability: Consistent climatic conditions over time, allowing species to persist. Regions with stable post-LGM climates (e.g., Southeast Asia) had 5–10% more species. The study measured stability using temperature variability over millennia. Stable climates reduce extinction risks for ancient lineages.
Biome Shifts: Changes in large ecosystems (e.g., forests to grasslands) due to climate or human activity. The study examined shifts since the Pliocene (~5 million years ago). For example, ice age savanna expansions influenced plant distributions. Tracking shifts helps predict future biodiversity changes.
Evolutionary Tree: A diagram showing species’ evolutionary relationships. The study’s megaphylogeny was an evolutionary tree of vascular plants. Branches represent lineages; lengths show divergence time. For example, birds branch from dinosaurs. These trees guide conservation by highlighting evolutionarily unique species.
Citizen Science: Public participation in data collection, like reporting plant sightings via apps. Platforms like iNaturalist help fill data gaps. The study suggested using citizen data for remote areas. Examples include volunteers tracking invasive species, expanding scientific reach cost-effectively.
Satellite-Based Sensors: Instruments on satellites monitoring Earth’s surface. They track deforestation, wildfires, and vegetation health. The study recommended them for real-time biodiversity monitoring. For example, NASA’s MODIS detects forest loss, helping update diversity maps and inform conservation.
Biodiversity Hotspots: Regions with high species richness and endemism but facing threats. The Amazon and Madagascar are hotspots. The study’s maps identified these areas for urgent protection. Conserving hotspots maximizes impact due to their unique species concentration.
Holocene: The current geological epoch (~11,700 years ago to present), marked by human agriculture and settlements. The study referenced Holocene climate stability aiding biodiversity. Understanding this epoch helps contextualize human impacts on modern ecosystems.
Phylogenetic Diversity: A measure of evolutionary distinctness in a region. It prioritizes areas with ancient, unique species for conservation. The study found regions like New Caledonia, with old plant lineages, had high phylogenetic diversity despite lower species counts. This protects genetic resources for future adaptation.
Habitat Loss: The destruction of natural environments, the leading threat to biodiversity. The study’s maps highlight areas needing protection from deforestation. For example, Amazon habitat loss reduces species richness. Combating it requires policies like protected areas and sustainable land use.
Genetic Diversity: The variety of genes within a species, enabling adaptation to changes. The study linked phylogenetic richness to genetic diversity. For example, ancient species like Amborella have unique genes valuable for crop research. Conservation preserves this diversity, ensuring resilience against climate change.
Reference:
Cai, L., Kreft, H., Taylor, A., Denelle, P., Schrader, J., Essl, F., van Kleunen, M., Pergl, J., Pyšek, P., Stein, A., Winter, M., Barcelona, J. F., Fuentes, N., … (2022). Global models and predictions of plant diversity based on advanced machine learning techniques. New Phytologist, 237(1), 107–123. https://doi.org/10.1111/nph.18533