
Unmasking COVID-19 Vulnerability in Nigeria: Mapping Risks Beyond Urban Hotspots

Conference: NeurIPS 2025
arXiv: 2509.05398
Code: To be confirmed
Area: LLM Efficiency
Keywords: COVID-19, vulnerability assessment, composite risk score, GIS mapping, Nigeria, public health

TL;DR

This paper constructs a comprehensive COVID-19 vulnerability risk scoring system for Nigerian states, integrating four dimensions — population density, poverty, healthcare accessibility, and age risk — and visualizes hotspot regions via GIS mapping, providing a data-driven decision tool for public health resource allocation.

Background & Motivation

Real-world Problem: Nigeria is Africa's most populous country, and the COVID-19 pandemic exposed severe imbalances in its public health system: Lagos alone accounted for 35.4% of national confirmed cases, while rural areas suffered significant underreporting due to inadequate testing capacity.

Limitations of Prior Work: Previous COVID-19 studies in Nigeria mostly focused on single factors (e.g., population density or poverty rate) and lacked a framework that integrates multiple dimensions. Adams & Obaroni modeled density and socioeconomic factors but did not synthesize them into an actionable unified score.

Inspiration from International Frameworks: COVIRA (a COVID-19 Vulnerability and Risk Assessment framework developed in Nepal) provides a successful example of multidimensional integration, but requires adaptation to Nigeria's specific national conditions — such as higher poverty weighting.

Core Research Questions: What are the key factors affecting COVID-19 vulnerability across Nigerian states? How can a composite risk score quantify these factors to guide decision-making?

Methodological Motivation: Nigeria's demographic characteristics and socioeconomic diversity result in highly uneven risk distributions, necessitating targeted public health strategies rather than one-size-fits-all responses.

Policy Need: An actionable prioritization tool is required to direct limited testing, vaccine, and healthcare resources toward high-risk regions.

Method

Overall Architecture

This paper constructs a Composite Risk Score system that combines four normalized vulnerability factors via weighted summation, then multiplies by the normalized confirmed case rate per 100,000 population to yield the final risk score. The overall pipeline is: data collection and preprocessing → exploratory data analysis → composite risk score construction → GIS spatial visualization → statistical analysis and validation.

Module 1: Composite Risk Score Construction

Function: Integrates population density, poverty, healthcare accessibility, and age risk into a single score
Mechanism: The formula is \(\text{Risk Score} = (\alpha \cdot \text{Density} + \beta \cdot \text{Poverty} + \gamma \cdot \text{Healthcare} + \delta \cdot \text{Age}) \times \text{Cases\_per\_100k\_norm}\), where weights are \(\alpha=0.2\), \(\beta=0.4\), \(\gamma=0.3\), \(\delta=0.1\), and all factors are Min-Max normalized to \([0,1]\)
Design Motivation: Poverty receives the highest weight (0.4) because it directly limits healthcare access and living conditions, particularly in rural areas with severe underreporting. Healthcare accessibility (0.3) reflects the highly uneven distribution of facilities. Population density (0.2) drives urban transmission but has lesser impact in rural areas. Age risk (0.1) is lowest because Nigeria's young demographic structure reduces the proportion of severe cases. Multiplying by the normalized case rate ensures the score simultaneously reflects structural vulnerability and current epidemic conditions.
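
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the column names, the `min_max` and `composite_risk` helpers, and the three toy states are all hypothetical, and the healthcare factor is assumed to be oriented so that a larger value means worse access. Under the formula as stated, the result lies in \([0,1]\), so the sketch does not reproduce the magnitudes reported in Table 1.

```python
# Weights as stated in the text; the key names are illustrative assumptions.
WEIGHTS = {"density": 0.2, "poverty": 0.4, "healthcare": 0.3, "age": 0.1}


def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_risk(rows):
    """Return {state: score} for rows holding the four factors and 'cases_per_100k'.

    score = (0.2*D + 0.4*P + 0.3*H + 0.1*A) * normalized case rate,
    with every input Min-Max normalized across states first.
    """
    columns = list(WEIGHTS) + ["cases_per_100k"]
    norm = {c: min_max([r[c] for r in rows]) for c in columns}
    scores = {}
    for i, r in enumerate(rows):
        structural = sum(w * norm[c][i] for c, w in WEIGHTS.items())
        scores[r["state"]] = structural * norm["cases_per_100k"][i]
    return scores


# Made-up inputs for three hypothetical states (not data from the paper).
# 'healthcare' is assumed to mean "lack of access", so larger is worse.
toy = [
    {"state": "A", "density": 5000, "poverty": 10, "healthcare": 0.2, "age": 4, "cases_per_100k": 400},
    {"state": "B", "density": 100, "poverty": 80, "healthcare": 0.9, "age": 3, "cases_per_100k": 20},
    {"state": "C", "density": 300, "poverty": 50, "healthcare": 0.6, "age": 5, "cases_per_100k": 60},
]
scores = composite_risk(toy)
```

In this toy data, state B has the highest structural vulnerability but the lowest reported case rate, so its normalized multiplier is 0 and its final score is 0. That shows how strongly the multiplicative term depends on reported cases.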

Module 2: GIS Spatial Visualization and Hotspot Identification

Function: Uses Python GeoPandas and Matplotlib to generate five choropleth maps (risk score, population density, poverty, healthcare accessibility, age risk), classifying states into low/medium/high risk tiers
Mechanism: State-level shapefiles are reprojected from WGS84 to UTM coordinates to ensure accurate area calculations; risk scores are divided into three tiers by percentile; each map uses a distinct color scheme to differentiate factors
Design Motivation: Map visualization enables decision-makers to intuitively identify regions requiring priority resource investment, compensating for the limitations of purely numerical analysis in conveying spatial distribution
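
The tiering step can be illustrated without any GIS dependency. The sketch below assumes the three tiers are split at tertiles, which the text does not confirm (it only says "by percentile"); the `risk_tiers` helper and the example scores are hypothetical. Reprojection and map drawing are omitted because they depend on the actual shapefile.

```python
from statistics import quantiles


def risk_tiers(scores):
    """Map {state: score} to {state: 'low' | 'medium' | 'high'}.

    Assumption: tiers are separated at the 1/3 and 2/3 quantiles of the scores.
    """
    # quantiles(..., n=3) returns the two cut points that split the data in thirds.
    lower_cut, upper_cut = quantiles(list(scores.values()), n=3, method="inclusive")
    tiers = {}
    for state, score in scores.items():
        if score <= lower_cut:
            tiers[state] = "low"
        elif score <= upper_cut:
            tiers[state] = "medium"
        else:
            tiers[state] = "high"
    return tiers


# Hypothetical scores for six states (not values from the paper).
example = {"A": 0.25, "B": 0.0, "C": 0.05, "D": 0.10, "E": 0.60, "F": 0.02}
tiers = risk_tiers(example)
```

The resulting labels could then be joined to the state geometries and passed to a choropleth plot as a categorical column.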

Module 3: Statistical Analysis and Validation

Function: Validates the robustness of the risk score through Spearman rank correlation, OLS regression, and sensitivity analysis
Mechanism: Spearman correlation analyzes pairwise relationships among factors; OLS regression uses normalized case rate as the dependent variable to examine the explanatory power of the four factors (\(R^2=0.305\)); sensitivity analysis varies the poverty weight between 0.3 and 0.5 to test the stability of state rankings
Design Motivation: Rank correlation is appropriate for non-normally distributed data; regression analysis quantifies each factor's contribution to case rates; sensitivity analysis ensures the scoring system does not produce drastically different rankings under minor weight adjustments
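
The weight sensitivity check can be sketched as follows. The text only says the poverty weight is varied between 0.3 and 0.5; how the remaining weights are adjusted is not stated, so this sketch assumes they are rescaled proportionally to keep the total at 1. The `reweight` and `ranking` helpers and the toy values are hypothetical.

```python
BASE = {"density": 0.2, "poverty": 0.4, "healthcare": 0.3, "age": 0.1}


def reweight(poverty_weight, base=BASE):
    """Set the poverty weight and rescale the other weights so the total stays 1."""
    others = {k: v for k, v in base.items() if k != "poverty"}
    scale = (1.0 - poverty_weight) / sum(others.values())
    weights = {k: v * scale for k, v in others.items()}
    weights["poverty"] = poverty_weight
    return weights


def ranking(factors, case_rate, weights):
    """Return state names ordered from highest to lowest risk score."""
    score = {
        s: sum(weights[k] * f[k] for k in weights) * case_rate[s]
        for s, f in factors.items()
    }
    return sorted(score, key=score.get, reverse=True)


# Already-normalized toy values in [0, 1] for three hypothetical states.
factors = {
    "A": {"density": 1.0, "poverty": 0.0, "healthcare": 0.0, "age": 0.5},
    "B": {"density": 0.0, "poverty": 1.0, "healthcare": 1.0, "age": 0.0},
    "C": {"density": 0.1, "poverty": 0.6, "healthcare": 0.6, "age": 1.0},
}
case_rate = {"A": 1.0, "B": 0.2, "C": 0.3}

# Recompute the ranking for each poverty weight and check whether it changes.
rankings = {w: ranking(factors, case_rate, reweight(w)) for w in (0.3, 0.4, 0.5)}
stable = len({tuple(r) for r in rankings.values()}) == 1
```

If `stable` is true, the ordering of states is unchanged across the tested weights. With real data, a rank correlation between the rankings would give a more graded measure than this all-or-nothing check.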

Loss & Training

This paper involves no machine learning training loss functions. Framework quality is instead evaluated through: consistency validation against NCDC epidemiological reports, VIF multicollinearity testing, ranking stability under weight adjustment, and \(R^2\) explained variance ratio.
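
For reference, the two regression diagnostics named above have standard textbook definitions, restated here; how the paper computes them in detail is not described in this summary. In this notation, \(y_i\) is the normalized case rate of state \(i\), \(\hat{y}_i\) is the OLS fitted value, \(\bar{y}\) is the mean of the \(y_i\), and \(R_j^2\) is the \(R^2\) obtained by regressing factor \(j\) on the other three factors.

\[
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]

A VIF close to 1 means factor \(j\) is nearly uncorrelated with the other factors; larger values indicate stronger collinearity.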

Key Experimental Results

Table 1: Risk Scores and Key Metrics by State

State/Metric	Risk Score	Cases per 100k	Population Density (per km²)	Share of National Cases
Lagos	673.47	Highest	7,777	35.4%
FCT (Abuja)	Second highest	High	High	Significant
National average	28.16	—	—	—
Kogi	Among lowest	Extremely low (5 cases)	Low	Extremely low
Sokoto/Zamfara	Medium-high	Relatively low	Low	Low (poverty-driven)

Table 2: Peak Cases per 100k by Density Group

Density Group	Jan 2021 Peak (per 100k)	Jan 2022 Peak (per 100k)	Overall Trend
Low-density states	2.8	—	Lowest
Medium-density states	2.5	—	Moderate
High-density states	—	10.0	Highest

Table 3: Key Statistical Results

Analysis	Result
Density ↔ Case Rate Spearman \(r\)	0.37 (\(p<0.05\))
Poverty ↔ Density Spearman \(r\)	−0.77 (\(p<0.01\), strong negative)
Healthcare ↔ Case Rate Spearman \(r\)	−0.31 (\(p<0.05\))
Age Risk ↔ Case Rate Spearman \(r\)	0.26 (\(p<0.05\))
OLS Regression \(R^2\)	0.305
Condition Number	\(9.72 \times 10^3\) (multicollinearity present)
Google Trends ↔ Case Rate Correlation	0.0415 (extremely weak)
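
The Spearman coefficients in Table 3 are rank correlations: each variable is replaced by its rank and the Pearson correlation of the ranks is taken. A dependency-free sketch is shown below; the helper names and the five-state toy data are hypothetical, and p-values are not computed here.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for five hypothetical states: poverty falls as density rises.
poverty = [10, 80, 50, 65, 30]
density = [5000, 100, 300, 150, 900]
rho = spearman(poverty, density)
```

Because only ranks enter the calculation, the coefficient is unaffected by the skew of variables such as population density, which is why it suits non-normally distributed data.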

Key Findings

Pronounced Urban–Rural Divide: Lagos's risk score is 24 times the national average, yet northern high-poverty states (Sokoto, Zamfara) also exhibit elevated risk due to structural vulnerability despite low case counts.
Poverty as Core Driver: The strong negative correlation between poverty and density (\(r=-0.77\)) reveals an urban–rural dual structure: dense urban areas are wealthier but have more cases, while sparse rural areas are poorer with severe underreporting.
Model Robustness: Sensitivity analysis shows that adjusting the poverty weight (0.3–0.5) has no significant effect on state rankings.
Limited Utility of Google Trends: Public search interest is nearly uncorrelated with actual epidemic conditions (\(r=0.04\)), possibly because Nigeria relies more heavily on traditional media such as radio.
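
As a quick check of the first finding, the 24-fold figure follows from the Lagos and national-average risk scores in Table 1:

\[
\frac{673.47}{28.16} \approx 23.9
\]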

Highlights & Insights

Multidimensional Integration Outperforms Single-Factor Analysis: Aggregating multiple vulnerability dimensions into an actionable single score constitutes a more practically useful contribution than prior single-factor studies.
Multiplicative Rather Than Additive Incorporation of Case Rate: Using the case rate as a multiplier rather than an addend allows the score to simultaneously reflect structural vulnerability and current epidemic intensity, closely aligning with the priority logic of resource allocation.
Revealing Hidden High-Risk Areas: Northern poverty-stricken states show medium-to-high risk scores despite low case counts, suggesting that rural underreporting may be masking the true extent of the epidemic.
Transferable Framework: The authors note that the framework can be extended to other infectious diseases (e.g., dengue, malaria) and other low-resource countries.

Limitations & Future Work

Data Currency: Only 2020 static data are used, failing to capture dynamic changes such as vaccination rates and variant emergence.
Coarse Healthcare Metrics: Healthcare accessibility is measured by facility count rather than quality or capacity, potentially overestimating access in regions with many but low-quality facilities.
Rural Underreporting: Underestimation of case rates due to insufficient rural testing capacity systematically lowers the risk scores of these states, contradicting the paper's stated goal of revealing rural vulnerability.
Absence of Mobility Data: Inter-state population movement data are not incorporated, potentially missing important transmission pathways.
Subjectivity of Weights: Although sensitivity analysis confirms robustness, the four-factor weights are fundamentally subjective assignments rather than data-driven estimates.
Limited Model Explanatory Power: An OLS \(R^2\) of 0.305 implies that nearly 70% of variance is unexplained, suggesting the absence of important explanatory variables.

Related Work & Insights

COVIRA (Nepal): The framework directly adapted in this paper; a 0–100 risk assessment tool emphasizing risk visualization and communication.
India Vulnerability Index: Based on 5 major categories and 15 factors across 9 large states, successfully identifying high-risk regions.
Italy Risk Model: Focuses on three dimensions — disease hazard, regional exposure, and population vulnerability.
Kenya SVI/EVI/SEVI: A triple-indicator system comprising Social Vulnerability Index, Epidemiological Vulnerability Index, and a composite measure.
U.S. CDC SVI: The most globally influential social vulnerability index, widely adapted for COVID-19 resource allocation.
Insight: The paradigm of multidimensional composite scoring combined with GIS visualization is applicable to any public health problem requiring spatially informed resource allocation.

Rating

Novelty: ⭐⭐ (Framework adaptation rather than methodological innovation; core contribution lies in applying an existing framework to the Nigerian context)
Experimental Thoroughness: ⭐⭐⭐ (Covers temporal, spatial, statistical, and sensitivity analyses, but \(R^2\) is low and no comparison with alternative models is provided)
Writing Quality: ⭐⭐⭐ (Clear structure and rich figures, though some discussions are verbose)
Value: ⭐⭐⭐ (Direct practical value for Nigerian public health decision-making; additional credit for framework transferability)