<!DOCTYPE html>
<html lang="en">

<head>
  <title>
  Random forest models for the prediction of biome types and climate variables · Pim Nelissen
</title>
  <meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">


<meta name="author" content="Pim Nelissen">
<meta name="description" content="This project was a big learning moment when it comes to selecting training and testing datasets appropriately in statistical learning. A model can only ever be as good as the data it is trained on. It was also one of my first times working with geographical, grid-based data, which was interesting in its own right. Since all of the data came directly from a model, it&rsquo;s also important to know the limits of the original model that provided the data. Sometimes the problem is not the classifier or regression model, but simply that we did not have enough, or the right, information to distinguish the data in the first place.">
<meta name="keywords" content="">


  <meta name="twitter:card" content="summary">
  <meta name="twitter:title" content="Random forest models for the prediction of biome types and climate variables">
  <meta name="twitter:description" content="This project was a big learning moment when it comes to selecting training and testing datasets appropriately in statistical learning. A model can only ever be as good as the data it is trained on. It was also one of my first times working with geographical, grid-based data, which was interesting in its own right. Since all of the data came directly from a model, it’s also important to know the limits of the original model that provided the data. Sometimes the problem is not the classifier or regression model, but simply that we did not have enough, or the right, information to distinguish the data in the first place.">

<meta property="og:url" content="//localhost:1313/en/projects/biome-classification/">
  <meta property="og:site_name" content="Pim Nelissen">
  <meta property="og:title" content="Random forest models for the prediction of biome types and climate variables">
  <meta property="og:description" content="This project was a big learning moment when it comes to selecting training and testing datasets appropriately in statistical learning. A model can only ever be as good as the data it is trained on. It was also one of my first times working with geographical, grid-based data, which was interesting in its own right. Since all of the data came directly from a model, it’s also important to know the limits of the original model that provided the data. Sometimes the problem is not the classifier or regression model, but simply that we did not have enough, or the right, information to distinguish the data in the first place.">
  <meta property="og:locale" content="en">
  <meta property="og:type" content="article">
    <meta property="article:section" content="projects">
    <meta property="article:published_time" content="2024-11-14T00:00:00+00:00">
    <meta property="article:modified_time" content="2024-11-14T00:00:00+00:00">


<link rel="canonical" href="//localhost:1313/en/projects/biome-classification/">


<link rel="preload" href="/fonts/fa-brands-400.woff2" as="font" type="font/woff2" crossorigin>
<link rel="preload" href="/fonts/fa-regular-400.woff2" as="font" type="font/woff2" crossorigin>
<link rel="preload" href="/fonts/fa-solid-900.woff2" as="font" type="font/woff2" crossorigin>


  <link rel="stylesheet" href="/css/coder.css" media="screen">


    <link rel="stylesheet" href="/css/coder-dark.css" media="screen">


<link rel="stylesheet" href="/css/timeline.css">

<link rel="icon" type="image/svg+xml" href="/images/favicon.svg" sizes="any">
<link rel="icon" type="image/png" href="/images/favicon-32x32.png" sizes="32x32">
<link rel="icon" type="image/png" href="/images/favicon-16x16.png" sizes="16x16">

<link rel="apple-touch-icon" href="/images/apple-touch-icon.png">
<link rel="apple-touch-icon" sizes="180x180" href="/images/apple-touch-icon.png">

<link rel="manifest" href="/site.webmanifest">
<link rel="mask-icon" href="/images/safari-pinned-tab.svg" color="#5bbad5">


</head>


<body class="preload-transitions colorscheme-auto">

<div class="float-container">
    <a id="dark-mode-toggle" class="colorscheme-toggle">
        <i class="fa-solid fa-adjust fa-fw" aria-hidden="true"></i>
    </a>
</div>


  <main class="wrapper">
    <nav class="navigation">
  <section class="container">

    <a class="navigation-title" href="//localhost:1313/en/">
      Pim Nelissen
    </a>


      <input type="checkbox" id="menu-toggle" />
      <label class="menu-button float-right" for="menu-toggle">
        <i class="fa-solid fa-bars fa-fw" aria-hidden="true"></i>
      </label>
      <ul class="navigation-list">


            <li class="navigation-item">
              <a class="navigation-link " href="https://git.pimnel.com/pim/CV/raw/branch/main/cv.pdf">CV</a>
            </li>

            <li class="navigation-item">
              <a class="navigation-link " href="/en/skills/">Skills</a>
            </li>

            <li class="navigation-item">
              <a class="navigation-link " href="/en/projects/">Projects</a>
            </li>

            <li class="navigation-item">
              <a class="navigation-link " href="/en/contact/">Contact</a>
            </li>


      </ul>

  </section>
</nav>


    <div class="content">

  <section class="container page">
  <article>
    <header>
      <h1 class="title">
        <a class="title-link" href="//localhost:1313/en/projects/biome-classification/">
          Random forest models for the prediction of biome types and climate variables
        </a>
      </h1>
    </header>

    <div class="notice tip">
  <div class="notice-title">
    <i class="fa-solid fa-lightbulb" aria-hidden="true"></i>Tip
  </div>
  <div class="notice-content">The report is available <a href="/files/random-forests-biomes.pdf" >here</a>.</div>
</div>

<h1 id="reflection">
  Reflection
  <a class="heading-link" href="#reflection">
    <i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
    <span class="sr-only">Link to heading</span>
  </a>
</h1>
<p>This project was a big learning moment when it comes to selecting training and testing datasets appropriately in statistical learning. A model can only ever be as good as the data it is trained on. It was also one of my first times working with geographical, grid-based data, which was interesting in its own right. Since all of the data came directly from a model, it&rsquo;s also important to know the limits of the original model that provided the data. Sometimes the problem is not the classifier or regression model, but simply that we did not have enough, or the right, information to distinguish the data in the first place.</p>
<h1 id="summary">
  Summary
  <a class="heading-link" href="#summary">
    <i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
    <span class="sr-only">Link to heading</span>
  </a>
</h1>
<p>In this project, I developed random forest models to predict biome classes (both binary and multi-class) as well as two continuous climate variables: vegetation carbon pool (VegC) and net primary productivity (NPP). To address class imbalance, I used SMOTE up-sampling, which notably improved recall for underrepresented classes. I tuned the models&rsquo; hyperparameters using grid-search cross-validation. Performance was evaluated using accuracy, precision, recall, and F₁ score for the classifiers, and RMSE for the regressors. Finally, I analyzed feature importance to better understand which climate variables, such as seasonal precipitation or extreme temperatures, were driving the model predictions.</p>
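<p>The idea behind SMOTE can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used in the project: the function name <code>smote_oversample</code>, its parameters, and the sample points are all hypothetical, and a real analysis would more likely rely on an existing library implementation. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.</p>

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; needs at least two minority samples.
    """
    if len(minority) in (0, 1):
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolation weight in [0, 1): 0 gives the base sample itself.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Made-up two-feature samples for an underrepresented class.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_oversample(minority, n_new=6)
print(len(new_points), "synthetic samples created")
```

<p>Because every synthetic point is an interpolation of two real minority samples, the new points stay within the region the minority class already occupies. Resampling of this kind is normally applied to the training data only, so that the test set still reflects the original class balance.</p>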
<h1 id="results">
  Results
  <a class="heading-link" href="#results">
    <i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"></i>
    <span class="sr-only">Link to heading</span>
  </a>
</h1>
<p>The binary biome classifier achieved up to 85.7% accuracy after SMOTE resampling, effectively distinguishing between temperate deciduous and mixed forests, with winter temperature and autumn precipitation emerging as key predictors. The multi-class classifier reached a weighted F₁ score of around 0.65, but it struggled to separate closely related biomes, which reflects how difficult the continuous transitions between biomes are for a machine learning model to resolve. The regression models performed well overall but revealed spatial biases around coastal and desert areas, suggesting a need to account for additional local processes such as soil variability or ocean influences.</p>
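<p>For reference, the two headline metrics mentioned here, weighted F₁ and RMSE, can be computed as in the following plain-Python sketch. The label names and numbers are invented for illustration and are not results from the project; the helper names <code>weighted_f1</code> and <code>rmse</code> are likewise hypothetical. The weighted F₁ score is the per-class F₁ averaged with weights equal to each class&rsquo;s share of the true labels.</p>

```python
import math
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        # True positives: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * count
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up labels and values, purely for illustration.
true_biomes = ["forest", "forest", "forest", "tundra", "desert", "desert"]
pred_biomes = ["forest", "forest", "tundra", "tundra", "desert", "forest"]
score = weighted_f1(true_biomes, pred_biomes)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(round(score, 3), round(error, 3))
```

<p>Weighting by class support means that frequent classes dominate the score, so a weighted F₁ can hide poor performance on rare biomes; per-class precision and recall remain worth inspecting alongside it.</p>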

  </article>
</section>


    </div>

    <footer class="footer">
  <section class="container">
    ©

      2024 -

    2025
     Pim Nelissen
    ·

    Powered by <a href="https://gohugo.io/" target="_blank" rel="noopener">Hugo</a> &amp; <a href="https://github.com/luizdepra/hugo-coder/" target="_blank" rel="noopener">Coder</a>.

  </section>
</footer>

  </main>


  <script src="/js/coder.js"></script>


</body>
</html>