optRF

Optimising Random Forest (optRF)

The Research Project

Random forest is a prominent machine learning method used for predictions and prediction-based decision-making. Although random forest has many advantages, one aspect that is often overlooked is that it is non-deterministic: it can produce different models from the same input data. This can have severe consequences for decision-making processes. The R package optRF models the non-linear relationship between the number of trees and the prediction stability and uses this model to determine the optimal number of trees for any given data set.
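The non-determinism described above can be illustrated with a minimal sketch. It assumes the ranger package is installed and uses the built-in mtcars data purely for illustration; neither choice is taken from the project itself, and the correlation between two runs is only one simple way to express prediction stability.

```r
# Assumption: the ranger package is installed; mtcars is used only as toy data.
library(ranger)

# Fit two random forests on identical data; only the random seed differs.
rf_a <- ranger(mpg ~ ., data = mtcars, num.trees = 50, seed = 1)
rf_b <- ranger(mpg ~ ., data = mtcars, num.trees = 50, seed = 2)

# Predict the same observations with both models.
pred_a <- predict(rf_a, data = mtcars)$predictions
pred_b <- predict(rf_b, data = mtcars)$predictions

# Largest disagreement between the two runs (non-zero despite identical input).
max(abs(pred_a - pred_b))

# Agreement between the two runs as a simple stability measure.
cor(pred_a, pred_b)
```

Repeating the comparison with a larger value of num.trees typically brings the two sets of predictions closer together, which is the relationship between the number of trees and stability that optRF models.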

Software

The R package optRF is open source and provides tools to automatically optimise the prediction stability of random forest models. It can be installed and loaded in R with:
> install.packages("optRF")
> library("optRF")
> ?opt_prediction
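Once the package is loaded, a typical call might look like the following sketch. It assumes that the example data set SNPdata ships with the package, with the response in the first column and the predictors in the remaining columns, and that opt_prediction takes the arguments y, X and X_Test; the help page opened by ?opt_prediction is the authoritative reference for the actual interface.

```r
library(optRF)

# Assumption: SNPdata is the example data set provided by optRF,
# with the response in column 1 and the predictors in all other columns.
Training <- SNPdata[1:200, ]
Test     <- SNPdata[201:250, ]

Training_Y <- Training[, 1]   # response of the training data
Training_X <- Training[, -1]  # predictors of the training data
Test_X     <- Test[, -1]      # predictors of the data to be predicted

# Assumption: argument names y, X and X_Test as recalled from the package
# documentation. The call may take a while, since several forests are grown.
set.seed(123)
optRF_result <- opt_prediction(y = Training_Y, X = Training_X, X_Test = Test_X)

# Print the recommended number of trees and the expected stability.
summary(optRF_result)
```

The recommended number of trees can then be passed to the random forest implementation of choice when fitting the final model.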
Further material can be found at:

Publications

A detailed description of the method as well as a practical introduction to the problem of non-determinism and the workflow of the R package can be found in:
  • Link to the original publication: optRF: Optimising random forest stability by determining the optimal number of trees. BMC Bioinformatics (2025). DOI: 10.1186/s12859-025-06097-1
  • Link to the blog post: How to Set the Number of Trees in Random Forest - A practical introduction to the optRF package, towardsdatascience.com

Presentations

Presentation slides of selected conference contributions about the research project can be found here:
  • General presentation of optRF for all fields of biometry, presented at the 6th Central European Network (CEN) conference "Power of Data – Shaping the Future of Life Sciences" 2026 in Warsaw (Poland)
    Presentation slides
  • Presentation of the optRF method applied to genomic selection in wheat breeding, presented at the 8th Conference on Cereal Biotechnology and Breeding (CBB) 2025 in Budapest (Hungary)
    Presentation slides