
Just a few days ago I found a serious security issue in SQLite: https://sqlite.org/forum/forumpost/07beac8056151b2f

It was also promptly fixed, but it makes me feel like the millions of tests sound better than they are in reality …



Tests usually cannot prove the absence of defects, but they can show their presence. So yes, in a way a test suite that runs without errors tells you “nothing”. But it's a very good safeguard to ensure that code changes don't inadvertently alter behavior.

For me it's also the case that I think much more thoroughly about which inputs are possible and potentially problematic, so there's often an extra set of test cases around the boundaries of input values that would never have been tried when just quickly throwing together a demo application to showcase and experiment with a new feature.
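To make the boundary-testing idea concrete, here is a minimal Python sketch. `parse_percentage` is a hypothetical validator, not taken from any real project; the tests probe the edges of the valid range (0 and 100), the values one step outside it (-1 and 101), and degenerate inputs that a quick demo would never try.

```python
"""Boundary-value tests for a hypothetical input validator."""
import unittest


def parse_percentage(text: str) -> int:
    """Parse a whole-number percentage in the inclusive range 0..100.

    Hypothetical function, used only to illustrate boundary testing.
    """
    value = int(text.strip())  # raises ValueError for "", "abc", "1.5"
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


class PercentageBoundaryTests(unittest.TestCase):
    def test_accepts_the_edges_and_one_step_inside(self):
        for text, expected in [("0", 0), ("1", 1), ("99", 99), ("100", 100)]:
            with self.subTest(text=text):
                self.assertEqual(parse_percentage(text), expected)

    def test_rejects_one_step_outside_the_edges(self):
        for text in ["-1", "101"]:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_percentage(text)

    def test_rejects_degenerate_inputs(self):
        # The kind of input a quick demo application never tries.
        for text in ["", "   ", "abc", "1.5"]:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_percentage(text)
```

Run it with `python -m unittest <file>`.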

But the fundamental problem, that lots of bugs don't show up in testing because those code paths are never exercised, also means that testing alone isn't sufficient. I guess we all know that by now, and teams combine different kinds of tests with other approaches like code reviews (actual proofs are probably beyond the scope of the vast majority of software projects) all the time, so as not to bet everything on a misguided 100% code coverage unit-test approach that's both expensive and fairly useless.


Additionally, bug fixes should pretty much always come with tests. If the software has a lot of tests, adding a few cases to cover the bug is trivial. But if tests are sparse, it may take much more time to figure out how to test that code at all.

Testing also influences code design. 100% coverage requires planning and forethought, and that will inevitably be reflected in code quality. Some bugs are inevitable, but not all of them.


> but it makes me feel like the millions of tests sound better than they are in reality

What it should make you wonder is this: if software as well tested as SQLite still has bugs like this, how much worse is the situation in software with fewer tests?


Over the weeks since they "optimized" my error handling into a one-liner, I've been watching engineers re-implement, line by line, the error handling code I wrote. As the old bugs crop up whenever API error responses are weirdly formed, they find themselves adding back the edge-case code I had added previously.

If I had been granted time to add unit tests, those would simply function as a source of truth: "sometimes the API returns this kinda weird error, so we handle it. Sometimes this one, so we handle it." Unit tests are nice for that: they record everything this particular program (the UI, in this case) has to worry about from the various things it talks to (it could talk to a couple of different APIs, each with its own quirks).

I wasn't granted time because the API quirks are considered bugs that are being fixed... one day... which is why the one-liner "refactor" was allowed. Regardless, it has become my go-to object lesson in why I finally find unit tests useful.
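A minimal sketch of what such "source of truth" tests might look like, in Python. The three JSON error shapes, the plain-text body and the empty body are hypothetical stand-ins for the quirks described above, and `extract_error_message` is an illustrative name. Each test row records one quirk, so a "one-liner" refactor that drops handling for any of them fails immediately instead of resurfacing weeks later.

```python
"""Unit tests as a record of API error-response quirks (hypothetical shapes)."""
import json
import unittest
from typing import Any


def extract_error_message(raw_body: str) -> str:
    """Return a human-readable message from an error response body."""
    if not raw_body.strip():
        return "Unknown error (empty response body)"
    try:
        body: Any = json.loads(raw_body)
    except json.JSONDecodeError:
        # Quirk: some endpoints answer with plain text instead of JSON.
        return raw_body.strip()
    if isinstance(body, dict):
        error = body.get("error")
        if isinstance(error, dict) and "message" in error:
            return str(error["message"])  # {"error": {"message": "..."}}
        if isinstance(error, str):
            return error  # {"error": "..."}
        errors = body.get("errors")
        if isinstance(errors, list) and errors and isinstance(errors[0], dict):
            # {"errors": [{"detail": "..."}, ...]}: report the first one
            return str(errors[0].get("detail", "Unknown error"))
    return "Unknown error"


class ApiErrorQuirks(unittest.TestCase):
    """Each row documents one quirk the UI has to cope with."""

    CASES = [
        ('{"error": {"message": "Token expired"}}', "Token expired"),
        ('{"error": "Token expired"}', "Token expired"),
        ('{"errors": [{"detail": "Token expired"}]}', "Token expired"),
        ("Service Unavailable\n", "Service Unavailable"),
        ("", "Unknown error (empty response body)"),
        ("[]", "Unknown error"),
    ]

    def test_known_quirks(self):
        for raw, expected in self.CASES:
            with self.subTest(raw=raw):
                self.assertEqual(extract_error_message(raw), expected)
```

When a new quirk shows up, the fix is one new row in `CASES` plus whatever branch handles it.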



The tests don't prevent bugs, but they do make fixes a lot faster to get out, since you can be confident your fix doesn't break any functionality.


Not if a simple fix requires rewriting hundreds of tests (aka "fragile tests").

Yes, one shouldn't be writing fragile tests, but from what I've seen on projects with great test coverage, it often slows down bugfix releases, and especially any bigger changes, because it's very wearisome to also update hundreds of tests.

So I believe there should be some balance in the test-to-code ratio, as well as attention paid to test brittleness.


Definitely. Some research on real-world cases shows that this balance sits around 80% for unit tests: you don't gain more quality by adding tests beyond that.


Last I heard, an 80% ratio is optimal for Java, but for dynamically typed languages (JS, Python, Ruby) it's around 100% test LOC / code LOC.


Any functionality that is tested :)


"if it's not tested, it's not functionality"


Hyrum would want to have a word...


If only there was a way to run all the tests of all the software that depend on your change...

Fwiw, that's what monorepos are good for.

Sadly it's hard to make a "world monorepo"


If you pin your dependency versions and the dependency maintainers run tests before releasing, you're already almost there (excluding bugs that might arise only in your specific environment). If dependencies don't have tests or test automation, you can always contribute them.

As for the platform-specific bugs, monorepos only help if the way to run tests in all the components is standardised. But this is something you can just as easily implement across repos. Could be as simple as having a testme.sh in the root.


> > "if it's not tested, it's not functionality"

This suggests you may not have 100% test coverage in your tests. But 100% coverage of what? What is the specification you're defining your behaviour against?

The comment above suggests that you could treat your tests as if they were the ones that actually define your contract.

> Hyrum would like to have a word.

This is a reference to "Hyrum's law" which says:

"With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."

This comment, in response to the previous one about treating tests as the source of truth for your contract, remarks that sadly you can't do that: no matter what contract you wish to define, ultimately the behaviour of your existing software becomes its effective contract.

> [My comment about monorepos]

Here I suggest that if you extend the notion of the test corpus to include the test corpus of all the software that depends on you (not your dependencies! The code for which your code is a dependency), then you could detect whether a (yet unmerged) change you're making is actually going to affect any existing code.


I totally misread "depend on you" as "you depend on", sorry.

Interesting idea, but monorepos still don't help on their own. You need a way to detect which modules depend on yours and a way to run and interpret their tests. Whether they're in an adjacent folder or another repo changes very little.

GitHub actually already has a dependency scanning thingy that builds cross-repo dependency graphs [0] and package managers have been doing that for years. The missing parts wouldn't be that much effort to build, but getting everyone to set up their repos to work with it would probably be prohibitively difficult.

[0] https://docs.github.com/en/code-security/supply-chain-securi...
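The "detect which modules depend on yours" half of the problem is a reverse-dependency walk. Here is a minimal Python sketch, assuming you already have a map from each module to the modules it depends on (the module names below are made up). It inverts that map and collects everything that transitively depends on the changed module, i.e. the modules whose tests you would want to run. Actually running and interpreting those tests (e.g. via a per-repo `testme.sh` as suggested earlier) is not shown.

```python
"""Find the modules whose tests should run when one module changes."""
from collections import deque


def reverse_dependents(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Return every module that depends on `changed`, directly or transitively.

    `deps` maps each module to the set of modules it depends on.
    """
    # Invert the graph: module -> modules that depend on it.
    dependents: dict[str, set[str]] = {}
    for module, requirements in deps.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(module)

    # Breadth-first search along the inverted edges.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, set()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    affected.discard(changed)  # a dependency cycle can lead back to the start
    return affected


if __name__ == "__main__":
    graph = {
        "app": {"http", "db"},
        "db": {"sqlite_wrapper"},
        "http": {"json_utils"},
        "cli": {"sqlite_wrapper"},
        "sqlite_wrapper": set(),
        "json_utils": set(),
    }
    print(sorted(reverse_dependents(graph, "sqlite_wrapper")))  # ['app', 'cli', 'db']
```

Note that `app` is included even though it only depends on `sqlite_wrapper` through `db`; whether the graph lives in one monorepo or comes from a package manager's metadata doesn't change this step.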


Monorepos are useless unless you have good tooling, including a build system that is aware of module dependencies inside the monorepo.

An example build system that can do that is https://bazel.build


Correct. Another interesting one:

https://github.com/just-buildsystem/justbuild


That's exactly what the Nixpkgs Hydra instance does—caveat being your library and its consumers all need to be in Nixpkgs. I'm hopeful that there will be a way to keep the tracking part of that going forward, as Flakes are adopted and the community decentralises.


There was an interesting comment a few months ago [1]

> Not mentioned is that the full test sqlite test suite is proprietary and you need a super expensive sqlite foundation membership to get access to it.

According to Dr Hipp [2], no one bought the test suite. So there are definitely deficiencies in the test suite that might have been better addressed if the full test suite were open.

[1] https://news.ycombinator.com/item?id=33346661

[2] https://corecursive.com/066-sqlite-with-richard-hipp/#billio...


From the second link:

> We still maintain the first one, the TCL tests. They’re still maintained. They’re still out there in the public. They’re part of the source tree. Anybody can download the source code and run my test and run all those. They don’t provide 100% test coverage but they do test all the features very thoroughly. The 100% MCD tests, that’s called TH3. That’s proprietary. I had the idea that we would sell those tests to avionics manufacturers and make money that way. We’ve sold exactly zero copies of that so that didn’t really work out. It did work out really well for us in that it keeps our product really solid and it enables us to turn around new features and new bug fixes very fast.
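For readers unfamiliar with the term: MC/DC (modified condition/decision coverage, what the quote calls "MCD") requires that every condition in a decision be shown to independently change the decision's outcome, which is stricter than branch coverage. It is the criterion required for the most critical avionics software (DO-178B/C Level A), hence the avionics angle. Below is a minimal Python sketch of checking it for a hypothetical three-condition decision `a and (b or c)`, using the "unique cause" form, where the two vectors in each pair differ in exactly one condition. It is an illustration of the criterion, not how TH3 measures coverage.

```python
"""Check MC/DC (modified condition/decision coverage) for one small decision."""
from itertools import product

Vector = tuple[bool, bool, bool]  # values of (a, b, c)


def decision(a: bool, b: bool, c: bool) -> bool:
    # Hypothetical decision with three conditions.
    return a and (b or c)


def achieves_mcdc(tests: list[Vector]) -> bool:
    """True if, for every condition, `tests` contains a pair of vectors that
    differ only in that condition and give different decision outcomes."""
    test_set = set(tests)
    for i in range(3):
        independent = False
        for v in test_set:
            w = tuple(not x if j == i else x for j, x in enumerate(v))
            if w in test_set and decision(*v) != decision(*w):
                independent = True
                break
        if not independent:
            return False
    return True


if __name__ == "__main__":
    T, F = True, False
    minimal = [(T, T, F), (F, T, F), (T, F, F), (T, F, T)]
    branch_only = [(T, T, F), (F, T, F)]
    exhaustive = list(product([F, T], repeat=3))
    print(achieves_mcdc(minimal))      # True
    print(achieves_mcdc(branch_only))  # False: b and c never shown to matter
    print(achieves_mcdc(exhaustive))   # True, but with 8 tests instead of 4
```

Four vectors (n + 1 for n = 3 conditions) are enough here, whereas the two `branch_only` vectors exercise both outcomes of the decision and so satisfy branch coverage, yet never show that `b` or `c` matters.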


It also makes forking SQLite infeasible since any new changes will be woefully under tested.


SQLite's free test suite is far in excess of most other projects'. What do you think the test coverage is in the Linux kernel, GCC, or Emacs?


Right, but it makes the fork strictly worse from a reliability perspective than SQLite given it will be less tested.

If it weren't a competitive advantage, given that it has no sales, wouldn't they have open-sourced it by now?


> wouldn't they have open sourced it by now?

If there's minimal value in it, why put in the work to open source an extremely complex test environment?


Just for completeness sake, they do offer a SQLite Consortium Membership for $120k, which I guess includes all their test suites as a selling point: https://www.sqlite.org/prosupport.html


An (embedded) database has stricter requirements than a text editor or a compiler.


Obviously not, if nobody was willing to buy the test suite for their internal forks.


That's a non-sequitur argument.


It follows fine, but you want to debate instead of think.

If the 100% MC/DC coverage was critical to forks, the companies that fork (there's lots of them!) would have bought it.

Nobody bought it, so it's not that important to maintaining a fork compared to the regular test suite even for such environments. A test suite which, to go back to my first comment, is still leagues ahead of the dozen other pieces of lynchpin software most companies have no problem depending on.

Meanwhile, for the 99.9% of us out here not building aircraft and merely shipping a billion browsers or phones...


> If the 100% MC/DC coverage was critical to forks, the companies that fork (there's lots of them!) would have bought it.

Again, that's a non-sequitur (or perhaps strawman, you can choose), because I wasn't addressing the proprietary test set, merely the comparison between a text editor and a database, which is completely absurd since the tolerance for failures is drastically different.


> I wasn't addressing the proprietary test set

Then perhaps you're in the wrong thread to be saying anything at all.


What was the incentive to buy these tests if they've already been used to improve the product?


You can then audit the tests to ensure they test some condition that you are concerned about.

In reality nobody audits source code like that (see Heartbleed for an unrelated example of critical code that didn't get proper audits from people who should have cared).


But the testsuite being proprietary is part of their business model.


At least the tests help avoid regressions, you can bet these bugs won't come back again.


I found this interesting (in another blog post of the same author):

> Keep your integration testing for smoke tests — to make sure your database actually starts and that you haven’t missed anything basic. Only when there is no way to exercise the code except when an actual full instance of the software is running should an end-to-end test be used.

This is the complete opposite of my experience. But I guess this is because he is developing library-like software, and I'm mostly working on application code. I found unit tests mostly useless and a waste of time. But I'm sure that for a database library they are absolutely key...


When I've heard people making similar claims, what I've usually found is they're testing "glue" code: controllers, routers, etc. Personally I find this a near total waste of time: it's hard to write the tests, almost never actually catches a bug, and failures in this code are totally obvious during a smoke test - automated or not.

I write a lot of "application" code (CLI, service and back-end) and a lot of tests. Parsing, calculations, file generation, regexes: testing those catches lots of bugs.

The value comes from keeping the complex code separate from the glue, and of course testing it. And you can easily test dozens of cases, which is usually not true of integration tests due to complexity and run time.
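A minimal Python sketch of that split, assuming a hypothetical duration format like `1h30m`. `parse_duration` holds the logic and gets a whole table of cases, while `main` is glue that only handles arguments and output and is left to a smoke test.

```python
"""Pure parsing logic with table-driven tests; the glue stays thin."""
import re
import sys
import unittest

# Hypothetical format: optional hours, minutes, seconds, in that order.
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_duration(text: str) -> int:
    """Parse strings such as '1h30m' or '45s' into a number of seconds."""
    stripped = text.strip()
    match = _DURATION.fullmatch(stripped)
    if not stripped or match is None:  # the pattern alone would accept ""
        raise ValueError(f"bad duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def main(argv: list[str]) -> int:
    """Glue: argument handling and output only, left to a smoke test."""
    try:
        print(parse_duration(argv[1]))
        return 0
    except (IndexError, ValueError) as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2


class ParseDurationTable(unittest.TestCase):
    GOOD = [("45s", 45), ("2m", 120), ("1h", 3600), ("1h30m", 5400),
            ("1h0m1s", 3601), (" 10m ", 600)]
    BAD = ["", "h", "1m1h", "1.5h", "abc"]

    def test_good_inputs(self):
        for text, seconds in self.GOOD:
            with self.subTest(text=text):
                self.assertEqual(parse_duration(text), seconds)

    def test_bad_inputs(self):
        for text in self.BAD:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_duration(text)
```

A real CLI entry point would just call `sys.exit(main(sys.argv))`. Adding a new case is one tuple, which is why dozens of cases stay cheap here, unlike in an integration test.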


> failures in this code are totally obvious during a smoke test - automated or not.

Yes, but if your codebase is large enough then a non-automated smoke test can be a very slow process, especially if things are configurable. It would have taken 3-4 days to smoke test all functionality manually at my last workplace.

Automated tests could make that 5-10 minutes.


Any time I make claims like this, people look at me like I'm insane. But here I am, year after year, meeting release targets with robust software while the teams that chase test coverage and other optics don't. I would assume I'm missing something if this industry hadn't given me a million reasons not to believe in the best practices typically put forward.


It really depends on the type of software. An ETL pipeline, for example, is obviously way easier to develop and maintain through tests (record real system inputs, compare with desired outputs). But that logic doesn't extend to all other types of software.


rqlite creator here.

I'm not developing libraries, I'm developing an entire RDBMS. In my experience -- and this is broader than rqlite -- integration and end-to-end tests seem like they are great - at the start. But as you rely more and more on them they become costly to maintain and really hard to debug. A single test failure (often due to a flaky, timing-sensitive issue) means wading through layers and layers of code to identify the issue.

Over-reliance on integration and end-to-end testing (note I said over-reliance; there is absolutely a need for them) becomes exponentially more costly over time, measured in development velocity and time to debug, as the test suite grows. If you find you're having difficulty identifying a place for them, it may be that you're not decomposing your software properly in the first place. All this is probably manageable if you're a solo developer, but when a team is building the software it can become really painful.

For more details see the talk I gave to CMU[1] on my testing strategy.

[1] https://youtu.be/JLlIAWjvHxM?t=2067
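One common way to keep timing-sensitive logic out of slow, flaky end-to-end tests is to inject the clock, so that the logic can be unit-tested deterministically. This is a general technique, not necessarily how rqlite does it; `HeartbeatMonitor` and `FakeClock` are hypothetical names in a minimal Python sketch.

```python
"""Make timing-dependent logic deterministic by injecting the clock."""
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical component: a peer counts as down if no heartbeat
    arrived within `timeout` seconds."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock  # production code uses the real monotonic clock
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, peer: str) -> None:
        self._last_seen[peer] = self._clock()

    def is_down(self, peer: str) -> bool:
        last = self._last_seen.get(peer)
        return last is None or self._clock() - last > self.timeout


class FakeClock:
    """Test double: time only moves when the test says so."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds
```

A test can then step straight to the interesting instants (just before, at, and just after the timeout) without sleeping, so the outcome never depends on scheduler jitter.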


That's exactly the point of a regression test suite, and it shows that millions of tests do work at giving a major SQL provider the confidence to release a fix to prod so quickly.


I feel frustrated that the first response was to tell you why it wasn't a bug and why it didn't need fixing.


Tests demonstrate the current behavior of the software. If you didn't have tests, all you could do is run through millions of test cases manually, which is hardly better. What you've done is be the first person to go through this particular test case. Once there's a test, that behavior will be documented, which I would much prefer to waiting yet again for someone to find this test case.


That reminds me of Caddy templates. Caddy templates cannot hold untrusted code; I think Go text templates can. When in doubt, treat templates as code that has access to private stuff. With Caddy, that includes files and environment variables.

I didn't get bitten by this because I read the docs, but I noticed how easy it would be to misconfigure.


What really bugs me is denialism as a recurring pattern. If you hadn't persisted, the maintainer would just have pretended there were no security issues.


If that's how it makes you feel, then why not go on a security-bug-hunting spree? If you found one, surely you can find at least 10 more.


Does that fall within the scope of the core library? It looks like a bug in the CLI program (shell.c) rather than the sqlite lib.


SQLite is probably very well tested, but this "millions of tests" argument is tiring. Most of these tests are algorithmically produced.



