
> what differences the trees within the random forest have - as I understand it, they are all slightly different, and this gives them greater accuracy?

I can give you an example from my own work. We have a random forest on an input with 400+ attributes (i.e., 400+ variables). All we want at the end is a probability from 0.0 to 1.0.

Our random forest model builds around 500 trees. Each tree randomly selects a small subset of those 400+ input attributes and asks "what's the best I can do using only these attributes?". Generally, a single tree does okay. But because the trees look at different attributes, their mistakes only partly overlap, so when you average the 500 trees, the accuracy is pretty darned good.

Edit later: To be clear, each new tree is generated using its own random subset of variables. The point is that each tree may glean some insight about that small combination of variables.
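
If it helps, here is a minimal sketch of that idea in plain Python. It is not our production model, and all the names (`build_tree`, `build_forest`, `predict_proba`, `n_sub`) and the toy data are made up for illustration. It follows the description above: each tree gets one random subset of attributes, and the forest's output is the average of the per-tree probabilities. Library implementations typically also resample the training rows and re-draw the attribute subset at every split; this sketch leaves both out to keep it short.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow a small tree that may only split on the given attribute indices."""
    prob = sum(labels) / len(labels)
    if depth == max_depth or prob in (0.0, 1.0):
        return prob  # leaf: fraction of positive labels that reached this node
    best = None
    for f in features:
        # Skipping the smallest value keeps both sides of every split non-empty.
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, labels) if r[f] < threshold]
            right = [y for r, y in zip(rows, labels) if r[f] >= threshold]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:
        return prob  # no usable split among the allowed attributes
    _, f, threshold = best
    lo = [i for i, r in enumerate(rows) if r[f] < threshold]
    hi = [i for i, r in enumerate(rows) if r[f] >= threshold]
    return (
        f,
        threshold,
        build_tree([rows[i] for i in lo], [labels[i] for i in lo],
                   features, depth + 1, max_depth),
        build_tree([rows[i] for i in hi], [labels[i] for i in hi],
                   features, depth + 1, max_depth),
    )

def predict_tree(node, row):
    """Walk down one tree until a leaf probability is reached."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node

def build_forest(rows, labels, n_trees=25, n_sub=3, seed=0):
    """Each tree is built from its own random subset of n_sub attributes."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        features = rng.sample(range(n_features), n_sub)
        forest.append(build_tree(rows, labels, features))
    return forest

def predict_proba(forest, row):
    """Average the per-tree probabilities into one number in [0.0, 1.0]."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

# Toy data (an assumption for the demo): 8 attributes, only the first two matter.
data_rng = random.Random(1)
demo_rows = [[data_rng.random() for _ in range(8)] for _ in range(60)]
demo_labels = [1 if r[0] + r[1] > 1.0 else 0 for r in demo_rows]
demo_forest = build_forest(demo_rows, demo_labels)
print(predict_proba(demo_forest, [0.9, 0.9, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))
```

Trees that happen to draw neither of the two useful attributes are close to guessing, and the ones that draw them do well; the average is what you actually use.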


