<!doctype html><html lang=en><head><title>Random forest models for the prediction of biome types and climate variables · Pim Nelissen</title><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=color-scheme content="light dark"><meta name=author content="Pim Nelissen"><meta name=description content="Tip The paper is available here. Reflection Link to heading This project was a big learning moment when it comes to selecting training and testing datasets appropriately in statistical learning. The model can only ever be as good as the data we use. It was one of the first times working with geographical, grid-based data, which was also interesting. Since all of the data worked with was directly fed from a model, it’s also important to know the limits of one’s original model which provided the data. Sometimes, the problem may not be our classifier or regression model, but simply that we did not have enough, or the right, information to properly distinguish data in the first place."><meta name=keywords content><meta name=twitter:card content="summary"><meta name=twitter:title content="Random forest models for the prediction of biome types and climate variables"><meta name=twitter:description content="Tip The paper is available here. Reflection Link to heading This project was a big learning moment when it comes to selecting training and testing datasets appropriately in statistical learning. The model can only ever be as good as the data we use. It was one of the first times working with geographical, grid-based data, which was also interesting. Since all of the data worked with was directly fed from a model, it’s also important to know the limits of one’s original model which provided the data. Sometimes, the problem may not be our classifier or regression model, but simply that we did not have enough, or the right, information to properly distinguish data in the first place."><meta property="og:url" content="/en/projects/biome-classification/"><meta property="og:site_name" content="Pim Nelissen"><meta property="og:title" content="Random forest models for the prediction of biome types and climate variables"><meta property="og:description" content="Tip The paper is available here.
Reflection Link to heading This project was a big learning moment when it comes to selecting training and testing datasets appropriately in statistical learning. The model can only ever be as good as the data we use. It was one of the first times working with geographical, grid-based data, which was also interesting. Since all of the data worked with was directly fed from a model, it’s also important to know the limits of one’s original model which provided the data. Sometimes, the problem may not be our classifier or regression model, but simply that we did not have enough, or the right, information to properly distinguish data in the first place."><meta property="og:locale" content="en"><meta property="og:type" content="article"><meta property="article:section" content="projects"><meta property="article:published_time" content="2024-11-14T00:00:00+00:00"><meta property="article:modified_time" content="2024-11-14T00:00:00+00:00"><link rel=canonical href=/en/projects/biome-classification/><link rel=preload href=/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin><link rel=preload href=/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin><link rel=stylesheet href=/css/coder.min.7763f8bc6341ecf82378e867c285e1549abb063a899be313ccd25dbfcd24fa7d.css integrity="sha256-d2P4vGNB7PgjeOhnwoXhVJq7BjqJm+MTzNJdv80k+n0=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/coder-dark.min.a00e6364bacbc8266ad1cc81230774a1397198f8cfb7bcba29b7d6fcb54ce57f.css integrity="sha256-oA5jZLrLyCZq0cyBIwd0oTlxmPjPt7y6KbfW/LVM5X8=" crossorigin=anonymous media=screen><link rel=stylesheet href=/css/timeline.css><link rel=icon type=image/svg+xml href=/images/favicon.svg sizes=any><link rel=icon type=image/png href=/images/favicon-32x32.png sizes=32x32><link rel=icon type=image/png href=/images/favicon-16x16.png sizes=16x16><link rel=apple-touch-icon
href=/images/apple-touch-icon.png><link rel=apple-touch-icon sizes=180x180 href=/images/apple-touch-icon.png><link rel=manifest href=/site.webmanifest><link rel=mask-icon href=/images/safari-pinned-tab.svg color=#5bbad5></head><body class="preload-transitions colorscheme-auto"><div class=float-container><a id=dark-mode-toggle class=colorscheme-toggle><i class="fa-solid fa-adjust fa-fw" aria-hidden=true></i></a></div><main class=wrapper><nav class=navigation><section class=container><a class=navigation-title href=/en/>Pim Nelissen
</a><input type=checkbox id=menu-toggle>
<label class="menu-button float-right" for=menu-toggle><i class="fa-solid fa-bars fa-fw" aria-hidden=true></i></label><ul class=navigation-list><li class=navigation-item><a class=navigation-link href=/en/cv/>CV</a></li><li class=navigation-item><a class=navigation-link href=/en/skills/>Skills</a></li><li class=navigation-item><a class=navigation-link href=/en/projects/>Projects</a></li><li class=navigation-item><a class=navigation-link href=/en/contact/>Contact</a></li></ul></section></nav><div class=content><section class="container page"><article><header><h1 class=title><a class=title-link href=/en/projects/biome-classification/>Random forest models for the prediction of biome types and climate variables</a></h1></header><div class="notice tip"><div class=notice-title><i class="fa-solid fa-lightbulb" aria-hidden=true></i>Tip</div><div class=notice-content>The paper is available <a href=/files/random-forests-biomes.pdf>here</a>.</div></div><h1 id=reflection>Reflection
<a class=heading-link href=#reflection><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h1><p>This project was a big learning moment when it comes to selecting training and testing datasets appropriately in statistical learning. A model can only ever be as good as the data it is trained on. It was also one of my first times working with geographical, grid-based data, which was interesting in its own right. Since all of the data I worked with came directly from another model, it’s also important to know the limits of that original model. Sometimes, the problem may not be our classifier or regression model, but simply that we did not have enough, or the right, information to properly distinguish the data in the first place.</p><p>This project strengthened my skills in building basic random forest pipelines, from data partitioning and preprocessing to hyperparameter tuning, performance evaluation, and model interpretation, all within the context of environmental and climate data.</p><h1 id=summary>Summary
<a class=heading-link href=#summary><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h1><p>In this project, I developed random forest models to predict biome classes (both binary and multi-class) as well as two continuous climate variables: vegetation carbon pool (VegC) and net primary productivity (NPP). To address class imbalances, I used SMOTE up-sampling, which notably improved recall for underrepresented classes. I used grid-search cross-validation to tune the hyperparameters of the models. Performance was evaluated using accuracy, precision, recall, and F₁ score for the classifiers, and RMSE for the regressors. Finally, I analyzed feature importance to better understand which climate variables, such as seasonal precipitation or extreme temperatures, were driving the model predictions.</p><h1 id=results>Results
<a class=heading-link href=#results><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h1><p>The binary biome classifier achieved up to 85.7% accuracy after SMOTE resampling, effectively distinguishing between temperate deciduous and mixed forests, with winter temperature and autumn precipitation emerging as key predictors. The multi-class classifier reached a weighted F₁ score of around 0.65, although it struggled to separate closely related biomes, which reflects how the continuity between biomes makes them hard for ML models to tell apart. The regression models performed well overall, but revealed spatial biases around coastal and desert areas, suggesting the need to account for additional local processes like soil variability or ocean influences.</p></article></section></div><footer class=footer><section class=container>©
2024 -
2025
Pim Nelissen
·
Powered by <a href=https://gohugo.io/ target=_blank rel=noopener>Hugo</a> & <a href=https://github.com/luizdepra/hugo-coder/ target=_blank rel=noopener>Coder</a>.</section></footer></main><script src=/js/coder.min.6ae284be93d2d19dad1f02b0039508d9aab3180a12a06dcc71b0b0ef7825a317.js integrity="sha256-auKEvpPS0Z2tHwKwA5UI2aqzGAoSoG3McbCw73gloxc="></script></body></html>
