It looks like older projects have higher lines-written to lines-in-production ratios:
terraform-aws-couchbase (2018) - 5:1
Terratest (2016) - 8:1
Terraform (2014) - 9:1
Express.js (2010) - 14:1
jQuery (2006) - 15:1
MySQL (1995) - 16:1
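The comment doesn't say how these ratios were measured. Here is a minimal sketch under two assumptions. "Lines written" is taken to be the sum of the "added" column of `git log --numstat` over the whole history. "Lines in production" is a current code-line count from a tool such as cloc, passed in by hand. `total_lines_added`, `lines_written_in_repo` and `written_to_production_ratio` are hypothetical helper names.

```python
import subprocess


def total_lines_added(numstat_text: str) -> int:
    """Sum the 'added' column of `git log --numstat --format=` output.

    Each non-empty line looks like "<added>\t<deleted>\t<path>";
    binary files report "-" instead of a number and are skipped.
    """
    total = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # blank separator lines or binary files
        total += int(parts[0])
    return total


def written_to_production_ratio(lines_written: int, lines_in_production: int) -> float:
    """Ratio of lines ever added to lines present today (e.g. 9.0 for 9:1)."""
    if lines_in_production <= 0:
        raise ValueError("lines_in_production must be positive")
    return lines_written / lines_in_production


def lines_written_in_repo(repo_path: str) -> int:
    """Hypothetical helper: run git in a local clone and sum all added lines."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    return total_lines_added(out)
```

Usage would be something like `written_to_production_ratio(lines_written_in_repo("path/to/clone"), cloc_code_total)`. Note that `git log` skips diffs for merge commits by default, so lines introduced only while resolving merges aren't counted.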
This is a small sample, but it seems easy enough to run this on some popular open source projects and see if there's a statistically significant trend. It would also be cool to see the lowest and highest ratios among popular projects.
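One way to check for a trend is a rank correlation between project start year and ratio, with an exact permutation test. Below is a sketch using the six data points listed above. The choice of Spearman's rho and a one-sided permutation test is an assumption, not something from the thread. With n = 6 there are only 720 orderings, so brute force is cheap.

```python
from itertools import permutations

# (start year, lines-written : lines-in-production) from the list above
DATA = {
    "terraform-aws-couchbase": (2018, 5),
    "Terratest": (2016, 8),
    "Terraform": (2014, 9),
    "Express.js": (2010, 14),
    "jQuery": (2006, 15),
    "MySQL": (1995, 16),
}


def ranks(values):
    """Rank values 1..n; assumes no ties (true for this data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """Spearman correlation for distinct values: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def permutation_p_value(x, y):
    """Exact one-sided p-value: share of orderings of y with rho <= observed.

    A negative rho means older (smaller year) projects have higher ratios.
    """
    observed = spearman_rho(x, y)
    hits = total = 0
    for perm in permutations(y):
        total += 1
        if spearman_rho(x, list(perm)) <= observed + 1e-12:
            hits += 1
    return observed, hits / total


years = [year for year, _ in DATA.values()]
ratios = [ratio for _, ratio in DATA.values()]
rho, p = permutation_p_value(years, ratios)
```

On these six points the ordering is perfectly monotonic, so rho is -1 and p is 1/720. Six hand-picked projects say little about open source in general, though, which is exactly why running it on a larger, systematically chosen set would be worthwhile.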
This is very, very interesting, but I think it would be even better if it were somehow normalized: in the first N years (1, 2, 5, 10), what is the lines-written to lines-in-production ratio? Older, still-active projects will almost certainly have a higher ratio simply from years of bug fixes.
So if it were normalized, we could see whether newer projects are rushed to production faster or whether it's just a matter of time passing.
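A rough sketch of that normalization follows: only commits from a project's first N years count. It assumes each commit is available as a Unix timestamp plus lines added and deleted. The `Commit` type and `ratio_in_first_years` are hypothetical names. Net lines (added minus deleted) stand in for "lines in production"; they count every tracked line, not just code as cloc would.

```python
from dataclasses import dataclass
from typing import List, Optional

SECONDS_PER_YEAR = 365.25 * 24 * 3600


@dataclass
class Commit:
    timestamp: float  # Unix time of the commit
    added: int        # lines added (e.g. from git log --numstat)
    deleted: int      # lines deleted


def ratio_in_first_years(commits: List[Commit], years: float) -> Optional[float]:
    """Lines written / net surviving lines, using only the first `years` years.

    Returns None when the history is shorter than the window, so a young
    project is not compared against an old one on unequal footing.
    """
    if not commits:
        raise ValueError("no commits")
    start = min(c.timestamp for c in commits)
    cutoff = start + years * SECONDS_PER_YEAR
    if max(c.timestamp for c in commits) < cutoff:
        return None  # project hasn't lived through the full window yet
    window = [c for c in commits if c.timestamp < cutoff]
    written = sum(c.added for c in window)
    net = written - sum(c.deleted for c in window)
    if net <= 0:
        raise ValueError("no surviving lines in window")
    return written / net
```

Comparing `ratio_in_first_years(commits, n)` for n in (1, 2, 5, 10) across projects would separate "rushed to production" from "just older".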
+1 for computing some sort of normalization factors... here are some ideas:
N = number of API endpoints; then N/cloc would be something like efficiency (getting more done with less code)
M = number of lines of code on all code paths accessed during an average day in PROD; then (cloc-M)/cloc would represent the "dead code" ratio: how much of your code base is not used
X(c1,c2) = an arbitrary function computed on the diff between commits c1 and c2
And all of the above can be run using some sort of rolling window from git init to today.
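The factors above can be written down directly. Here is a sketch that assumes the inputs (endpoint count, cloc total, lines exercised in PROD) are already measured somehow; measuring M would need some form of production coverage, which isn't shown. The rolling window here steps over an oldest-first list of snapshots by index rather than by calendar time. That is a simplifying assumption, and `rolling` and the snapshot dict keys are hypothetical.

```python
from typing import Callable, Dict, List


def endpoint_efficiency(num_endpoints: int, cloc: int) -> float:
    """N / cloc: API endpoints delivered per line of code."""
    return num_endpoints / cloc


def dead_code_ratio(cloc: int, lines_exercised: int) -> float:
    """(cloc - M) / cloc: share of the code base not hit on an average day."""
    return (cloc - lines_exercised) / cloc


def rolling(snapshots: List[Dict[str, int]], window: int,
            metric: Callable[[Dict[str, int], Dict[str, int]], float]) -> List[float]:
    """Apply X(c1, c2) to every pair of snapshots `window` steps apart,
    walking from the first snapshot (git init) to the latest."""
    return [metric(snapshots[i], snapshots[i + window])
            for i in range(len(snapshots) - window)]


# Example X(c1, c2): growth in cloc between two snapshots.
def cloc_growth(a: Dict[str, int], b: Dict[str, int]) -> float:
    return b["cloc"] - a["cloc"]
```

Any other X, such as the lines added between the two commits, plugs into `rolling` the same way.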
I'm curious about the LOCs. I believe (but haven't bothered to verify) that code swells and then contracts in cycles. Expansion is due to the drunken sailor's walk through the solution space, as everything is tried. Contraction is due to identifying best-fit (good enough) paths, code deduplication, dead code (and feature) removal, and generalizations hard-earned through experience.
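Checking the swell-and-contract idea could start with collapsing a LOC time series into runs of growth and shrinkage. Below is a sketch assuming LOC has already been sampled at regular intervals (for example, cloc on the first commit of each month). Raw per-commit counts would be noisy and produce many tiny runs. `phases` is a hypothetical name.

```python
from typing import List, Tuple


def phases(loc_series: List[int]) -> List[Tuple[str, int, int]]:
    """Collapse a LOC series into (direction, start_index, end_index) runs.

    direction is "expand" or "contract". A flat step extends the current
    run; a flat step before any run exists starts an "expand" run.
    """
    runs: List[Tuple[str, int, int]] = []
    for i in range(1, len(loc_series)):
        delta = loc_series[i] - loc_series[i - 1]
        if delta == 0 and runs:
            runs[-1] = (runs[-1][0], runs[-1][1], i)
            continue
        direction = "expand" if delta >= 0 else "contract"
        if runs and runs[-1][0] == direction:
            runs[-1] = (direction, runs[-1][1], i)  # same direction: extend
        else:
            runs.append((direction, i - 1, i))      # direction flipped: new run
    return runs
```

Counting how often an "expand" run is followed by a "contract" run, and how long each lasts, would give a first answer to whether the cycles are real.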
I think 10:1 is a good rule of thumb, but it does feel like time is a variable that needs to be accounted for, too.
It would also be interesting to think through how many changes there are _between releases_ of a project. MySQL is 23 years old [1], but what's the effort/change between major releases at this point? That's where the rule sort of falls down for me: a book has a few editions (if it's lucky); software, on the other hand, has lots of releases if it's successful.
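Change between releases can be pulled from git. The sketch below parses `git diff --shortstat` output between two release tags of a local clone; the tag names you'd pass in are placeholders, and `release_churn` is a hypothetical helper. The parser handles the singular forms ("1 file changed", "1 insertion(+)") and a missing deletions clause.

```python
import re
import subprocess
from typing import Tuple

# Matches "12 files", "1 file", "340 insertions", "1 deletion", etc.
_SHORTSTAT = re.compile(r"(\d+) (file|insertion|deletion)s?")


def parse_shortstat(text: str) -> Tuple[int, int, int]:
    """Return (files_changed, insertions, deletions); absent parts count as 0."""
    counts = {"file": 0, "insertion": 0, "deletion": 0}
    for number, kind in _SHORTSTAT.findall(text):
        counts[kind] = int(number)
    return counts["file"], counts["insertion"], counts["deletion"]


def release_churn(repo_path: str, old_tag: str, new_tag: str) -> Tuple[int, int, int]:
    """Hypothetical helper: net change between two release tags."""
    out = subprocess.run(
        ["git", "-C", repo_path, "diff", "--shortstat", old_tag, new_tag],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_shortstat(out)
```

A diff between tags measures only the net change: code written and then rewritten between the two releases cancels out. To count all lines written between releases, the `git log --numstat` approach could be run over the range `old_tag..new_tag` instead.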