Stop Wrestling with Rasters! mbg Makes Surface Estimation Effortless
What if every data point you've ever collected could tell you not just where something happened, but precisely how it spreads across entire landscapes—with mathematical certainty?
Here's the brutal truth that keeps spatial data scientists awake at night: you've got thousands of scattered point observations. Soil samples. Disease cases. Temperature readings. Pollution measurements. Each one a tiny window into reality. But your stakeholders don't want dots on a map. They need continuous surfaces. They need to know what happens between those points. And they need you to quantify how wrong you might be.
The old way? Cobble together half a dozen R packages. Fight with sp versus sf object conversions. Wrestle INLA's cryptic documentation into submission. Hand-code SPDE meshes until your fingers bleed. Spend weeks on something that should take hours.
There's a better way. Hidden in plain sight on CRAN, the mbg package—short for model-based geostatistics—is transforming how researchers turn point observations into publication-ready raster surfaces. Born from the same intellectual lineage as IHME's legendary Local Burden of Disease team, this tool wraps the formidable power of R-INLA's stochastic partial differential equation (SPDE) approach into something shockingly approachable. No more sacrificing weeks to spatial interpolation. No more guessing at uncertainty. Just clean, rigorous, beautiful surfaces from your scattered observations.
Ready to see what you've been missing?
What is mbg? The Spatial Secret Weapons Top Researchers Don't Talk About
The mbg package is an R package for model-based geostatistics developed by henryspatialanalysis and available on CRAN. It represents a sophisticated yet accessible approach to one of spatial statistics' most fundamental challenges: estimating continuous surfaces from discrete point observations.
But here's what makes it genuinely exciting.
Traditional geostatistical workflows force practitioners to become experts in half a dozen disjointed tools. You might use gstat for kriging, randomForest for machine learning, raster or terra for spatial manipulation, and sp or sf for coordinate handling—then somehow stitch these together while praying your projections match. mbg obliterates this fragmentation by unifying spatial data processing, machine learning prediction, Bayesian geostatistical modeling, and uncertainty-preserving aggregation under one coherent framework.
The package's intellectual DNA traces directly to the Institute for Health Metrics and Evaluation (IHME) and their Local Burden of Disease core code team. These are the same researchers who revolutionized global health mapping, producing iconic visualizations of everything from malaria prevalence to childhood malnutrition. The mbg package distills their hard-won methodological advances into reusable, extensible R code that anyone can leverage.
Why is it trending now? Three converging forces: the explosion of spatially-referenced data from GPS-enabled devices and remote sensing, the growing demand for uncertainty quantification in policy-relevant research, and the maturation of INLA's SPDE approach as a computationally feasible alternative to traditional MCMC methods. Where Markov Chain Monte Carlo might need days to converge, INLA's integrated nested Laplace approximation delivers essentially identical posterior distributions in minutes. mbg puts this speed within reach of practitioners who don't have PhDs in Bayesian computation.
The package currently boasts active development with automated documentation builds, steady CRAN downloads, and a growing community of users in epidemiology, environmental science, and public health.
Key Features That Make mbg Insanely Powerful
Let's dissect what makes this package tick—and why it deserves space in your spatial analysis toolkit.
Unified Spatial-Machine Learning-Geostatistical Pipeline
Most workflows force you to context-switch between incompatible tools. mbg seamlessly integrates sf for vector operations, terra for raster manipulation, and data.table for blazing-fast data processing. The result? You spend your energy on science, not data wrangling.
Dual Modeling Architecture
The package doesn't force you to choose between machine learning flexibility and geostatistical rigor. You can optionally run spatial ML models through caret—random forests, gradient boosting, neural networks, you name it—to generate predictive surfaces from covariates. Then feed these ML predictions (or raw covariates) into the Bayesian geostatistical model powered by R-INLA. This two-stage approach often outperforms either method alone.
SPDE-Based Gaussian Process Approximation
At the heart of mbg lies a computationally brilliant trick. Instead of working with dense covariance matrices that explode in size with data, it uses the Stochastic Partial Differential Equation (SPDE) approach to approximate Gaussian processes through sparse precision matrices on carefully constructed meshes. Translation: you can model spatial correlation across millions of locations without waiting for the heat death of the universe.
Uncertainty-Propagating Aggregation
Here's where mbg truly shines compared to naive alternatives. After generating 250 posterior predictive draws at each pixel, you don't just get a mean prediction. You get full uncertainty distributions. And when aggregating to administrative boundaries—say, estimating district-level disease prevalence from pixel-level predictions—the package preserves uncertainty through the aggregation process. This isn't a nice-to-have; it's essential for honest policy communication.
Production-Ready Outputs
Every step produces standard terra SpatRaster objects or sf features. No proprietary formats. No lock-in. Your results plug directly into existing visualization, reporting, and decision-support workflows.
Real-World Scenarios Where mbg Destroys the Competition
Epidemiological Mapping: Disease Prevalence Without the Guesswork
Imagine you're mapping malaria prevalence across rural Zambia. You've got 2,000 household survey points with RDT results, satellite-derived vegetation indices, rainfall surfaces, and temperature data. Stakeholders need district-level estimates for resource allocation—with confidence intervals that withstand scientific scrutiny. mbg handles the entire pipeline: ML-boosted covariate selection, SPDE-based spatial smoothing, pixel prediction with 250-draw uncertainty, and polygon aggregation that properly propagates uncertainty into district summaries.
Environmental Monitoring: Pollution Surfaces from Sparse Sensors
Air quality monitoring networks are expensive. You might have 50 PM2.5 sensors across a metropolitan area of 5 million people. Regulatory agencies need continuous exposure surfaces for health impact assessment. Traditional interpolation ignores prediction uncertainty. mbg gives you rigorous uncertainty bounds at every location, letting you honestly report where estimates are reliable versus where you're essentially guessing.
Agricultural Prediction: Soil Properties Between Sample Points
Precision agriculture lives and dies by spatial prediction. You've soil-sampled every hectare of a 10,000-hectare farm—wait, no you haven't, because that's prohibitively expensive. You've got 200 points. mbg lets you leverage terrain covariates through ML, capture residual spatial structure through the geostatistical model, and produce management-zone maps with honest uncertainty quantification. No more pretending your kriging variance captures all sources of error.
Climate Downscaling: Local Projections from Coarse Models
Global climate models operate at 100+ kilometer resolutions. You need kilometer-scale projections for impact assessment. mbg's machine learning stage can learn relationships between coarse climate outputs and fine-scale topography, while the geostatistical stage captures spatially-correlated residuals that pure ML would miss.
Step-by-Step Installation & Setup Guide
Getting mbg running requires attention to one critical dependency: R-INLA is not on CRAN. Here's the complete setup.
Step 1: Install the Core Package
The stable version installs directly from CRAN:
# Install mbg from CRAN
install.packages("mbg")
Step 2: Install R-INLA (Critical Prerequisite)
Several core functions depend on INLA. If you don't have it:
# Install INLA from its dedicated repository
# Option 1: Stable version
install.packages("INLA",
repos = c(getOption("repos"),
INLA = "https://inla.r-inla-download.org/R/stable"),
dep = TRUE)
# Option 2: Testing version (newer features, potentially less stable)
install.packages("INLA",
repos = c(getOption("repos"),
INLA = "https://inla.r-inla-download.org/R/testing"),
dep = TRUE)
For detailed platform-specific instructions, consult R-INLA's official download page.
Step 3: Load and Verify
# Load the package
library(mbg)
# Access the introductory vignette
help(mbg)
# Check documentation for the core model runner
help(MbgModelRunner)
Step 4: Optional but Recommended Dependencies
For full functionality, ensure these are current:
# Core spatial stack
install.packages(c("sf", "terra", "data.table"))
# Machine learning support (if using ML stage)
install.packages("caret")
# Visualization
install.packages(c("ggplot2", "tmap"))
Environment Considerations
Memory: SPDE models with fine meshes can consume substantial RAM. For regional analyses with meshes exceeding 10,000 nodes, ensure 16GB+ available memory.
Computation: INLA uses efficient sparse matrix algorithms, but model fitting still benefits from multicore systems. Set INLA:::inla.setOption(num.threads = parallel::detectCores() - 1) for automatic parallelization.
Projection: mbg expects projected coordinates (meters, not degrees). Reproject your data before modeling to avoid distance distortion.
REAL Code Examples: The mbg Workflow in Action
Let's walk through the actual workflow using patterns from the package documentation. These aren't toy examples—they're production-ready patterns you can adapt immediately.
Example 1: Loading and Preparing Spatial Data
library(mbg)
library(terra)
library(sf)
# Load point observations (e.g., disease prevalence at survey locations)
# Expects sf object with geometry and outcome column
points <- st_read("survey_data.shp")
# Load raster covariates (predictor surfaces)
# terra SpatRaster with layers like elevation, rainfall, NDVI
covariates <- rast("environmental_covariates.tif")
# Load population surface for aggregation weighting
population <- rast("population_density.tif")
# Verify alignment: covariates and population must share extent and resolution
# mbg will check this, but catching misalignment early saves debugging time
if (!compareGeom(covariates, population, stopOnError = FALSE)) {
stop("Covariate and population rasters have mismatched geometry!")
}
What's happening here? The workflow begins with three fundamental inputs: your point observations (the ground truth), raster covariates (the environmental predictors that explain spatial variation), and optionally a population surface (for weighted aggregation). The terra package handles these efficiently, and mbg expects standard SpatRaster and sf objects—no proprietary formats.
Example 2: The Optional Machine Learning Stage
library(caret)
# Extract covariate values at point locations for training
# This creates a data frame with outcome and predictor values
extracted_data <- extract(covariates, points, bind = TRUE)
# Configure caret for spatial cross-validation
# Critical: random CV leaks spatial information; use spatial folds
train_control <- trainControl(
method = "cv",
number = 5,
# For true spatial validation, replace with spatialsample or similar
index = createFolds(extracted_data$outcome, k = 5)
)
# Train ensemble of models
# mbg can leverage caret's 200+ methods
rf_model <- train(
outcome ~ .,
data = st_drop_geometry(extracted_data),
method = "rf",
trControl = train_control,
importance = TRUE
)
# Generate predictive surface from ML model
ml_prediction <- predict(covariates, rf_model)
# This ML surface now serves as input to the geostatistical stage
# The geostatistical model will capture spatial structure ML missed
The strategic insight: Machine learning excels at capturing complex, nonlinear relationships between covariates and outcomes. But it typically treats observations as independent, missing spatial correlation in residuals. By feeding ML predictions into the geostatistical model, you get the best of both worlds: ML's predictive power plus rigorous spatial uncertainty quantification.
Example 3: Running the Core Geostatistical Model
# Prepare the SPDE mesh—this is where the magic happens
# The mesh approximates the spatial Gaussian process
# Finer meshes = more accurate approximation but slower computation
mesh <- make_mesh(
loc = st_coordinates(points), # Observation coordinates
max.edge = c(0.5, 2) * 1000, # Max triangle edge lengths (inner, outer)
cutoff = 0.1 * 1000 # Minimum distance between points
)
# Configure model specification
# This defines: outcome ~ covariates + spatial random effect
model_spec <- MbgModelRunner(
data = points,
outcome = "prevalence", # Column name in points
covariates = covariates, # Raw covariates OR ml_prediction
mesh = mesh,
family = "binomial", # For prevalence data; "gaussian" for continuous
link = "logit", # Appropriate link for binomial
n_samples = 250 # Posterior predictive draws per pixel
)
# Fit the model—this calls INLA internally
# Computation time scales with mesh complexity and data size
model_fit <- run_mbg(model_spec)
# The fitted model contains posterior distributions for:
# - Fixed effects (covariate coefficients)
# - Hyperparameters (spatial range, variance)
# - Spatial random field (latent spatial surface)
Deep dive: The SPDE approach replaces the dense covariance matrix of traditional Gaussian processes with a sparse precision matrix derived from a finite element discretization. The make_mesh() call creates this discretization—triangles covering your study area, with finer resolution near data points. MbgModelRunner() configures the full model structure, and run_mbg() executes INLA's efficient approximation to full Bayesian inference.
Example 4: Generating and Summarizing Predictions
# Generate gridded predictions across the full study area
# Returns 250 draws per pixel as a SpatRaster with 250 layers
predictive_draws <- predict(model_fit, newdata = covariates)
# Summarize draws to standard uncertainty surfaces
# mean: posterior mean (best point estimate)
# median: robust central tendency
# lower, upper: 95% credible interval bounds
summary_rasters <- summarize_draws(
predictive_draws,
statistics = c("mean", "median", "lower", "upper"),
probs = c(0.025, 0.975) # For 95% interval
)
# Result: four SpatRasters you can map, export, or analyze
plot(summary_rasters$mean, main = "Predicted Prevalence (Posterior Mean)")
plot(summary_rasters$upper - summary_rasters$lower,
main = "Uncertainty Width (95% CI)")
Critical distinction: Most mapping tools give you a single predicted surface. mbg gives you 250 plausible surfaces, from which you derive honest uncertainty bounds. That uncertainty width map? It's often more scientifically important than the mean prediction, revealing where your data actually informs versus where you're extrapolating blindly.
Example 5: Uncertainty-Preserving Aggregation
# Load administrative boundaries for aggregation
admin_boundaries <- st_read("districts.shp")
# Aggregate pixel predictions to polygons
# CRITICAL: this preserves uncertainty through aggregation
# Not just means of means, but full posterior distributions
district_estimates <- aggregate_to_polygons(
draws = predictive_draws,
polygons = admin_boundaries,
polygon_id = "district_id",
weights = population, # Population-weighted aggregation
statistics = c("mean", "median", "lower", "upper")
)
# Result: sf object with uncertainty for each district
# Perfect for policy reports, resource allocation, epidemiological summaries
print(district_estimates[, c("district_id", "mean", "lower", "upper")])
Why this matters: Naive aggregation takes pixel means and averages them. But uncertainty doesn't average away—some pixels have tight distributions, others wide. mbg's aggregation properly combines these distributions, giving you correct uncertainty bounds at the district level. This is the difference between defensible science and statistical malpractice.
Advanced Usage & Best Practices: Pro Tips from the Trenches
Mesh Design is Everything
Your SPDE mesh quality dominates model accuracy and computation time. Too coarse: miss important spatial structure. Too fine: computational nightmare. Start with max.edge roughly 1/5 of your expected spatial range, and use cutoff to prevent excessive refinement near dense point clusters. Visualize your mesh with plot(mesh) before fitting—if it looks wrong, it is wrong.
Covariate Selection Strategy
Don't dump every raster you have into the model. Use the ML stage for initial variable importance via caret's varImp(), then include top predictors in the geostatistical model. This reduces multicollinearity and improves identifiability of the spatial random effect.
Computational Scaling
For very large domains, consider mesh splitting or stochastic partial likelihood approaches. INLA supports various approximations; consult the R-INLA FAQ for strategies when meshes exceed 50,000 vertices.
Validation That Actually Validates
Standard cross-validation fails for spatial data—nearby points are correlated, so holding out random points doesn't test true predictive performance. Implement spatial cross-validation using spatialsample or manual blocking by geographic regions. mbg predictions should be evaluated on held-out spatial blocks, not random pixels.
Reproducibility Checklist
Always record: INLA version (results can vary slightly), mesh parameters, exact covariate sources and versions, and random seeds for posterior sampling. The mbg workflow is deterministic given these inputs, but only if you document them.
Comparison with Alternatives: Why mbg Wins
| Feature | mbg |
gstat (kriging) |
stan/brms (full Bayes) |
mgcv (GAMs) |
|---|---|---|---|---|
| Computational Speed | Fast (INLA approximation) | Very fast | Slow (MCMC hours-days) | Fast |
| Uncertainty Quantification | Full posterior predictive | Kriging variance only | Gold standard | Approximate |
| ML Integration | Built-in via caret |
None | Manual only | Limited |
| Spatial Correlation Model | SPDE (flexible, scalable) | Matérn covariance | Flexible | Thin-plate splines |
| Polygon Aggregation | Native, uncertainty-preserving | Manual, no uncertainty propagation | Manual | Manual |
| Learning Curve | Moderate | Low | Steep | Moderate |
| Large Datasets (>10k points) | Excellent | Poor (dense matrices) | Poor | Good |
The verdict: gstat wins for quick, simple kriging without uncertainty demands. stan/brms wins when you need maximum flexibility and can afford computation time. But for production geostatistical mapping with full uncertainty quantification at scale, mbg occupies a sweet spot that nothing else matches. The native integration of ML preprocessing, SPDE-based spatial modeling, and uncertainty-preserving aggregation creates a workflow that would require hundreds of lines of custom code to replicate elsewhere.
FAQ: Your Burning Questions Answered
Q: Is mbg suitable for beginners in spatial statistics?
A: If you know basic R, sf, and regression concepts, you can run mbg successfully. However, interpreting spatial random effects and mesh diagnostics requires some geostatistical knowledge. Start with the vignette and simple datasets.
Q: How does mbg handle non-Gaussian outcomes like disease counts?
A: The package supports INLA's exponential family distributions including binomial (prevalence), Poisson (counts), and negative binomial (overdispersed counts). Specify via the family argument in MbgModelRunner().
Q: Can I use my own machine learning models outside caret?
A: Yes—generate predictions externally, stack them as a SpatRaster, and pass to the geostatistical stage as covariates. The ML integration is convenient but not mandatory.
Q: What about temporal or spatiotemporal modeling?
A: The current mbg release focuses on purely spatial models. For spatiotemporal extensions, you'll need to interface with INLA's st models directly or await future package development.
Q: How do I cite mbg in publications?
A: Check citation("mbg") after installation. Also cite the underlying INLA methodology (Rue et al., 2009) and SPDE approach (Lindgren et al., 2011).
Q: Is commercial use permitted?
A: mbg is open-source; check the specific license file in the repository. INLA itself has some commercial-use considerations—consult r-inla.org for current terms.
Q: Where do I get help when stuck?
A: The GitHub Issues page for bugs, Stack Overflow with [r] and [geostatistics] tags for usage questions, and the INLA discussion group for underlying methodology.
Conclusion: Your Spatial Analysis Just Leveled Up
The mbg package isn't just another R spatial tool—it's a production-ready pipeline forged in the fires of global health mapping and refined for broad accessibility. By unifying machine learning preprocessing, SPDE-based Bayesian geostatistics, and uncertainty-preserving aggregation, it solves the complete problem that researchers actually face: turning scattered point observations into rigorous, decision-relevant spatial products.
Here's my honest assessment: If you're currently hand-rolling kriging workflows, struggling to propagate uncertainty through aggregation, or avoiding Bayesian geostatistics because INLA seemed too intimidating, mbg is your escape hatch. The package won't make you an expert spatial statistician overnight—you still need to understand your mesh, validate your models, and interpret your posteriors. But it removes the friction that kills most projects before they start.
The spatial data explosion isn't slowing down. GPS-enabled sensors, citizen science, satellite constellations—point observations are everywhere. The researchers who thrive will be those who can transform these points into surfaces that decision-makers trust. mbg gives you that capability with scientific rigor and computational efficiency that was literally impossible a decade ago.
Stop wrestling with rasters. Start estimating surfaces.
👉 Install mbg from CRAN today — and explore the complete source code, examples, and development roadmap on GitHub.
Your future self—the one presenting publication-quality maps with defensible uncertainty bounds at next week's team meeting—will thank you.
Acknowledgments: Deep gratitude to IHME's Local Burden of Disease team and the Demographic and Health Surveys Geospatial Analysis team for the methodological foundations underlying this package.