Abandoned in German Farmland: Navigating Your Next Move
Author
Anna Ly
Published
December 1, 2024
Abstract
A fun trip to Europe with friends can take an unexpected turn, leaving you stranded in German farmland amidst unfamiliar predators. This study investigates the factors associated with the distance to the nearest settlement, providing a guide for navigating back to safety. Using an ecological dataset from farmland in Diemarden and Eichsfeld, originally collected to study predator counts, we analyzed variables such as distances to water, forests, roads, field edges, and vegetation types. A log-normal linear mixed model (LMM) was fitted using three R packages (lme4, blme, and brms), which produced similar results. Our analysis identified the distance to the nearest road as the most influential factor. While other variables showed limited association with settlement distance, they may still provide useful signals in similar scenarios.
1. Background on Data Source and the Collector’s Purpose
Prior research indicates that birds that frequently live in farmland have declined (Bro et al. 2000; Amelie Laux, Waltert, and Gottschalk 2022). One reason for their decline is predation, for example by red foxes (Bro et al. 2000). Red fox numbers have increased following rabies vaccination efforts, which allowed them to thrive in the wild (Chautan, Pontier, and Artois 2000). The other major reason is deforestation, which turns natural habitats into monocultures where farmers grow only one crop (Stanton, Morrissey, and Clark 2018).
Laux et al. investigated whether increasing the number of flower strips (sections of land set aside to grow wildflowers) decreases the number of predators in an area (Amelie Laux, Waltert, and Gottschalk 2022). Their original questions were:
Which are the main predators in farmland in Göttingen, Lower Saxony, Germany?
Are there differences in predator activity between vegetation types?
Which environmental parameters explain spatial variation in predator activity best?
How do predators deal with flower strips?
The two areas they analyzed were hilly cultural landscapes dominated by agriculture:
Diemarden, lying directly south of Göttingen and covering \(35 \text{ km}^{2}\)
Eichsfeld, located east of Göttingen and encompassing \(131 \text{ km}^{2}\).
To capture the data, they randomly distributed 120 camera traps (HDPX-5, Browning Trail Cameras), mounted on wooden stakes in the center flower strip, to record predators. Recording took place during the typical breeding season for farmland birds, May to July, in both 2019 and 2020.
The researchers analyzed differences between predator capture rates and vegetation types using Kruskal–Wallis rank sum tests. They also fitted generalized linear mixed models (GLMMs), in particular a negative binomial GLMM with the total number of predators as the response, landscape composition as fixed effects, and year and time block as grouping factors. The code they used can be found in A. Laux, Waltert, and Gottschalk (2022a), and the corresponding data can be found in A. Laux, Waltert, and Gottschalk (2022b).
2. Our Study
While predation counts were the primary focus for the data collectors, they also recorded additional interesting variables. In particular, they gathered data on distances to the nearest forest, running water, settlements, field edges, and roads.
No sane person plans to be stranded in a foreign landscape… but if we were stranded in German farmland, with various animal predators in the vicinity, which factors would most influence the ability to navigate back to a nearby settlement for assistance?
Rather than analyzing predator capture rates, we focus on the distance to the nearest settlement as the response variable. Distances to other features, such as forests, running water, field edges, and roads, along with the number of predators, will serve as explanatory variables in the analysis.
This report is structured as follows: Section 3 outlines the data processing steps and introduces the variables used in the analysis. Section 4 presents the exploratory plots. Section 5 describes the three models employed. Section 6 presents the results and compares the outcomes across models. Section 7 provides the conclusions. Section 8 discusses challenges encountered throughout the process.
3. Data Processing and Variables
During data preparation, one row in the dataset was found to have unusual values noticeably distinct from the rest and was excluded from the analysis.
Additionally, the different predator types were combined into broader categories. The original dataset included separate counts for badgers, boars, foxes, martens, mouse weasels, raccoons, stoats, dogs, and cats. However, many of these predators were rarely observed, except for foxes. From a practical standpoint, accurate species identification in the wilderness is challenging without specific domain knowledge. All non-domestic species were consolidated under the category “wild predators.”
Cats and dogs, on the other hand, are accustomed to human interaction. Their presence in the field could indicate proximity to human settlements. Since their counts were also sparse, they were grouped into a single “pet predators” category.
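The grouping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the species keys, the `active_days` field, and both function names are hypothetical and are not taken from the original dataset or code, and the example record is made up. It also assumes that "extrapolated to 100 active camera days" means the raw count divided by the number of active days and multiplied by 100.

```python
# Hypothetical species keys; the real dataset's column names may differ.
WILD_SPECIES = ["badger", "boar", "fox", "marten", "mouse_weasel", "raccoon", "stoat"]
PET_SPECIES = ["dog", "cat"]


def per_100_camera_days(count, active_days):
    """Scale a raw sighting count to a rate per 100 active camera days."""
    if active_days <= 0:
        raise ValueError("active_days must be positive")
    return 100.0 * count / active_days


def combine_predators(row):
    """Collapse per-species counts into wild and pet rates for one camera record."""
    wild = sum(row.get(name, 0) for name in WILD_SPECIES)
    pets = sum(row.get(name, 0) for name in PET_SPECIES)
    days = row["active_days"]
    return {
        "wild_predators": per_100_camera_days(wild, days),
        "pet_predators": per_100_camera_days(pets, days),
    }


# Made-up record for illustration only, not a row from the dataset.
example = {"fox": 3, "badger": 1, "cat": 2, "active_days": 40}
print(combine_predators(example))
```

Because every species in a record shares the same number of active days, summing counts first and then scaling gives the same result as summing per-species rates.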
Beyond these adjustments, the dataset was preserved in line with the original collectors’ methodology. Below is a table summarizing the variables selected for analysis.
| Variable | Description | Type |
| --- | --- | --- |
| Settl_Dist | Distance from the camera to the nearest settlement (m). | Quantitative, continuous |
| Wood_Dist | Distance from the camera to the nearest forest/woods/hedges (m). | Quantitative, continuous |
| Water_Dist | Distance from the camera to the nearest running or standing water (m). | Quantitative, continuous |
| Edge_Dist | Distance from the camera to the nearest field edge (m). | Quantitative, continuous |
| Road_Dist | Distance from the camera to the nearest road or railway outside of settlements (m). | Quantitative, continuous |
| Vegetation | Type of vegetation grown near the farmland. B = flower strip, E = field margin, H = hedge, G = winter cereal, R = rapeseed. | Qualitative, 5 levels |
| site | Unique identifier for each camera trap station. | Qualitative, 121 levels |
| wild_predators | Number of wild predators (badgers, boars, foxes, martens, mouse weasels, raccoons, stoats) captured on video, extrapolated to 100 active camera days. | Quantitative, continuous |
| pet_predators | Number of carnivorous animal pets captured on video, extrapolated to 100 active camera days. | Quantitative, continuous |
If you inspect the entire original dataset, you might initially suggest using camera instead of site as the grouping variable. However, the authors mention that cameras were frequently rotated and occasionally moved between locations, which makes site the more reliable grouping variable for analysis.
4. Exploratory Data Analysis
A limitation of this dataset is that there are only two observations per site. This makes it nearly impossible to create graphs that account for the grouping variable. Consequently, most exploratory plots simply depict the response variable against selected explanatory variables.
These scatter plots suggest no clear relationship between settlement distance and distances to the nearest field edge or forest. There appears to be a slight linear relationship between distance to the nearest road and distance to the nearest settlement, although the majority of points remain scattered.
Predator sightings were relatively rare, especially for species other than red foxes. Consequently, even after combining predator types, the majority of recorded counts remain at zero.
The last plot shows the distribution of settlement distance by vegetation type: flower strip, field margin, winter cereal, hedges, or rapeseed.
Here, settlement distance shows little visible difference across vegetation types. However, the effect plots in Section 6 will give a clearer picture once other variables are accounted for.
5. Model Fitting Methods
The response is the distance to the nearest settlement. The fixed effects include distances to the nearest forest, running water, field edge, and road; the number of wild predators and pet predators; and vegetation type as a categorical variable. Observations are grouped by site.
Since the distance to the nearest settlement is continuous and strictly positive, a log-normal conditional distribution is a natural choice. Fitting the model on log(Settl_Dist) with a Gaussian family and identity link makes this a log-normal linear mixed model (LMM).
However, with only two observations per site, fitting the maximal model is not feasible. It would require estimating an 8×8 random effects covariance matrix (36 parameters) with very little information per cluster. Instead, we fit three models with random intercepts only.
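The parameter count quoted above follows from the fact that a full \(q \times q\) covariance matrix has \(q\) variances and \(q(q-1)/2\) covariances. A minimal sketch of that arithmetic is below; the function name `n_cov_params` and its `structure` argument are hypothetical helpers introduced here, not part of lme4, blme, or brms.

```python
def n_cov_params(q, structure="full"):
    """Number of free parameters in a q x q random-effects covariance matrix."""
    if q < 1:
        raise ValueError("q must be at least 1")
    if structure == "full":
        # q variances plus q * (q - 1) / 2 covariances
        return q * (q + 1) // 2
    if structure == "diagonal":
        # variances only, with all covariances fixed at zero
        return q
    raise ValueError(f"unknown structure: {structure}")


print(n_cov_params(8))  # full 8 x 8 matrix: 36
print(n_cov_params(1))  # random intercept only: 1
```

With two observations per site, a single variance parameter is far easier to estimate than 36.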
Model 1 uses lmer from the lme4 package (Bates et al. 2015), estimating parameters by restricted maximum likelihood (REML).
Model 2 uses blmer from the blme package (Chung et al. 2013), a Bayesian analogue of lmer. Rather than maximizing the likelihood, it maximizes the posterior (MAP estimation). The default prior for the random effects covariance matrix is the Wishart distribution with 3.5 degrees of freedom.
Model 3 uses brm from the brms package (Bürkner 2021), performing full Bayesian estimation via Markov chain Monte Carlo (MCMC). The default prior for the standard deviations of the random effects is a half-Student-t distribution with 3 degrees of freedom, location 0, and scale 2.5. Fixed effects use flat priors, so they are estimated entirely from the data.
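Written out, the random-intercept structure shared by all three fits can be sketched as follows. The symbols (\(i\), \(j\), \(\mathbf{x}_{ij}\), \(\boldsymbol{\beta}\), \(b_j\), \(\sigma_b\), \(\sigma\)) are notation introduced here for illustration and do not come from the original analysis:

\[
\log(\text{Settl\_Dist}_{ij}) = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_j + \varepsilon_{ij},
\qquad b_j \sim \mathcal{N}(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]

Here \(i\) indexes the observations within site \(j\), and \(\mathbf{x}_{ij}\) collects the four distance predictors, the two predator counts, and the vegetation indicators. The three models differ in how the parameters are estimated (REML, MAP, or MCMC) and in the prior placed on the random-intercept variance, not in this structure.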
Using check_model from the performance package (Lüdecke et al. 2021), all three models produce nearly identical diagnostics with the same outliers. We show the results for Model 1 here; the other two are in the appendix.
Model assumption checks for Model 1. Some mild violation in homogeneity of variance and a few outliers, but no major issues.
The model assumptions hold reasonably well, with some mild violation in the homogeneity of variances and a couple of outliers. Splines were explored to improve variance homogeneity but did not improve the fit without introducing other issues (see Section 8). We proceed with these models.
6. Results
We first compare scaled coefficients using the method of Gelman (2008), which divides numeric predictors by 2 standard deviations, plotted with the dotwhisker package (Solt and Hu 2024). All three models yield similar estimates.
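As a rough illustration of the scaling step, the sketch below centers a numeric predictor and divides it by two sample standard deviations, which is the rescaling Gelman (2008) recommends so that coefficients of numeric and binary inputs are on comparable scales. The function name `scale_by_two_sd` and the road distances are made up for this example; whether the report centered the predictors as well is an assumption here.

```python
from statistics import mean, stdev


def scale_by_two_sd(values):
    """Center a numeric predictor and divide by two sample standard deviations."""
    centre = mean(values)
    spread = 2.0 * stdev(values)
    if spread == 0:
        raise ValueError("predictor is constant; cannot scale")
    return [(v - centre) / spread for v in values]


# Made-up road distances in metres, for illustration only.
road_dist = [50.0, 120.0, 300.0, 410.0, 620.0]
scaled = scale_by_two_sd(road_dist)
print([round(s, 3) for s in scaled])
```

After this transformation each numeric predictor has mean 0 and standard deviation 0.5, so a coefficient describes the change in log settlement distance for a two-standard-deviation change in that predictor.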
Settlement distance does not differ substantially across vegetation types. The interval estimates for distances to water, forest, field edge, number of wild predators, and number of pet predators all include 0. Distance to the nearest road stands out as the most clearly associated explanatory variable, consistent with the scatter plots.
The effect plots below were generated using effects::allEffects (Fox and Hong 2009) for Models 1 and 2, and conditional_effects from brms (Bürkner 2021) for Model 3. It is worth noting that the brms plots include additional uncertainty components not present in the lme4/blme plots. The lme4 and blme effects plots condition on the estimated random effects, whereas brms propagates uncertainty from the full posterior. As shown, all three models exhibit similar patterns.
Effects plots: distance to nearest road vs. log(settlement distance) for lmer, blmer, and brms.
Effects plots: distance to nearest water vs. log(settlement distance).
Effects plots: distance to nearest field edge vs. log(settlement distance).
Effects plots: distance to nearest forest vs. log(settlement distance).
The first four plots suggest the following trends:
As distance to the nearest road decreases, the log distance to the nearest settlement also decreases. This variable has the steepest slope and is consistent with both the EDA scatter plot and the coefficient plot.
Surprisingly, greater distance from the nearest water source is associated with shorter distance to the nearest settlement. This could reflect settlements being built away from standing water, such as small ponds.
Smaller distance to the nearest field edge is associated with larger distance from the nearest settlement. Field edges tend to be farther from human settlements.
Larger distance to the nearest forests and hedges is associated with larger distance to nearby settlements, suggesting German settlements are often located near forested areas.
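Because the response is modelled on the log scale, each slope in the trends above translates into a multiplicative change in settlement distance. The sketch below shows the conversion; `percent_change` is a hypothetical helper, and the slope value is made up for illustration and is not an estimate from this report.

```python
import math


def percent_change(beta, delta=1.0):
    """Percent change in the median response for a `delta` increase in a
    predictor, given slope `beta` on the log-response scale."""
    return (math.exp(beta * delta) - 1.0) * 100.0


# Hypothetical slope per metre of road distance, NOT an estimate from the report.
beta_road = 0.001
print(round(percent_change(beta_road, delta=100.0), 1))  # 10.5
```

With this made-up slope, being 100 m farther from a road would correspond to a median settlement distance about 10.5% larger, holding the other predictors and the site effect fixed.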
Effects plots: number of wild predators vs. log(settlement distance).
Effects plots: number of dogs & cats vs. log(settlement distance).
The number of wild predators is not clearly associated with settlement distance. Given that predator sightings were rare overall, the effect is uncertain rather than necessarily small. There is a slight trend where more dog and cat sightings are associated with greater settlement distance, which is counter-intuitive. The paper did not differentiate between wild and domestic animals, which may contribute to this result.
Effects plots: vegetation type vs. log(settlement distance). Level codes are defined in Section 3.
Unlike the raw boxplots, the effect plots show that flower strips (B) are on average farther from settlements, while field margins (E) are closer. One possible explanation is that field margins are cleared or maintained near residences to mitigate pests.
7. Conclusion
If you find yourself stranded in farmland in Germany (specifically Diemarden or Eichsfeld), the best strategy is to locate the nearest roads and railways.
If roads are not visible, other factors to consider include avoiding field edges, running or standing water, and flower strips. Wild predator counts are not clearly associated with settlement proximity and could be misleading. Forests, hedges, and field margins may help guide you toward a settlement.
Additionally, the three packages we used (lme4, blme, and brms) produced consistent results, which is reassuring given their different estimation methods. The brms model took considerably longer to run due to MCMC sampling, while the other two completed within a couple of seconds.
This analysis is limited by having only two observations per site and by certain ambiguities in the dataset, such as the lack of information about site codes. Mild violations of homogeneity of variance were also observed.
8. Notes on the Modelling Process
The first challenge was an error when trying to fit a gamma GLMM. While gamma models are standard for continuous positive data, fitting one produced the error “PIRLS step-halvings failed to reduce deviance”. In a GitHub thread on this error (lpitombo 2014), Bolker recommends fitting a log-normal LMM instead, which resolved the issue.
The second issue arose when initially considering grouping by vegetation rather than site. Despite having at least 47 observations per level, every model specification returned singular fit warnings. Bolker, when consulted, recommended grouping by site, which limited us to random intercepts only. A natural extension would be a model with a diagonal random effects covariance matrix, which would include random intercepts plus uncorrelated random slopes and allow more flexibility without requiring a full covariance structure.
The third challenge involved incorporating splines. Under Harrell’s rules of thumb (Harrell 2015), there was room for at most one spline term given the degrees of freedom already used. Any spline addition triggered maximum treedepth warnings from Stan (used by brms), and the spline did not meaningfully improve the homogeneity of variance. Splines were therefore excluded from the final model.
Rather than using ggeffects for effect plots, base R effects plots were combined with ggplot using ggplotify (Yu 2023). This approach worked and follows a Stack Overflow thread (“Combine Base and Ggplot Graphics in r Figure Window” 2013).
Finally, the analysis relied on default prior settings for both blme and brms. Given limited domain knowledge for this ecological application, this was a practical choice. The fact that results align with the frequentist model suggests the default priors were reasonable.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.
Bro, E, F Reitz, J Clobert, and P Mayot. 2000. “Nesting Success of Grey Partridges (Perdix Perdix) on Agricultural Land in North-Central France: Relation to Nesting Cover and Predator Abundance.”
Bürkner, Paul-Christian. 2021. “Bayesian Item Response Modeling in R with brms and Stan.” Journal of Statistical Software 100 (5): 1–54. https://doi.org/10.18637/jss.v100.i05.
Chautan, M, D Pontier, and Marc Artois. 2000. “Role of Rabies in Recent Demographic Changes in Red Fox (Vulpes Vulpes) Populations in Europe.” https://doi.org/10.1515/mamm.2000.64.4.391.
Chung, Yeojin, Sophia Rabe-Hesketh, Vincent Dorie, Andrew Gelman, and Jingchen Liu. 2013. “A Nondegenerate Penalized Likelihood Estimator for Variance Parameters in Multilevel Models.” Psychometrika 78 (4): 685–709. https://doi.org/10.1007/s11336-013-9328-2.
Donald, Paul F, RE Green, and MF Heath. 2001. “Agricultural Intensification and the Collapse of Europe’s Farmland Bird Populations.” Proceedings of the Royal Society of London. Series B: Biological Sciences 268 (1462): 25–29. https://doi.org/10.1098/rspb.2000.1325.
Fox, John, and Jangman Hong. 2009. “Effect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the effects Package.” Journal of Statistical Software 32 (1): 1–24. https://doi.org/10.18637/jss.v032.i01.
Gelman, Andrew. 2008. “Scaling Regression Inputs by Dividing by Two Standard Deviations.” Statistics in Medicine 27 (15): 2865–73.
Harrell, Frank E., Jr. 2015. “Multivariable Modeling Strategies.” Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, 63–102.
Laux, Amelie, Matthias Waltert, and Eckhard Gottschalk. 2022. “Camera Trap Data Suggest Uneven Predation Risk Across Vegetation Types in a Mixed Farmland Landscape.” Ecology and Evolution 12 (7): e9027. https://doi.org/10.1002/ece3.9027.
Laux, A., M. Waltert, and E. Gottschalk. 2022a. “Camera Trap Data Suggest Uneven Predation Risk Across Vegetation Types in a Mixed Farmland Landscape.” https://doi.org/10.5281/zenodo.6594690.
Lüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, Philip Waggoner, and Dominique Makowski. 2021. “performance: An R Package for Assessment, Comparison and Testing of Statistical Models.” Journal of Open Source Software 6 (60): 3139. https://doi.org/10.21105/joss.03139.
Newton, Ian. 2004. “The Recent Declines of Farmland Bird Populations in Britain: An Appraisal of Causal Factors and Conservation Actions.” Ibis 146 (4): 579–600. https://doi.org/10.1111/j.1474-919X.2004.00375.x.
Stanton, RL, CA Morrissey, and RG Clark. 2018. “Analysis of Trends and Agricultural Drivers of Farmland Bird Declines in North America: A Review.” Agriculture, Ecosystems & Environment 254: 244–54. https://doi.org/10.1016/j.agee.2017.11.028.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.