Test Anything Protocol (testanything.org)
157 points by brianzelip on Oct 5, 2023 | 35 comments


TAP uses YAML as a format for supplying metadata from tests, which ... seems like a weird choice. It's not nice for humans to read, and a producer still needs to quote strings and whatnot. The rules are complex enough that it's not easy for producers to produce or consumers to consume, and it famously has interoperability challenges and some awkward security implications for consumers.

Honestly, I would prefer a custom line-delimited format. Why shouldn't the producer be able to produce 'message: assertion failure: foo > 0'? With some simple rules for escaping newlines, you'd have a format that doesn't require producers to embed a complete YAML library to produce output, and it would be nicer for humans to read.
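A minimal sketch of what such a line-delimited format could look like, in Python. Nothing here is part of TAP: the `key: value` layout, the backslash escaping rules, and all function names are assumptions made up for illustration.

```python
# Hypothetical "key: value" metadata format, one field per line.
# Not part of any TAP specification; names and rules are illustrative.

def escape_value(value: str) -> str:
    """Escape backslashes and line breaks so the value fits on one line."""
    return (
        value.replace("\\", "\\\\")  # must run first, or later escapes get doubled
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def unescape_value(value: str) -> str:
    """Reverse escape_value by reading one escape sequence at a time."""
    replacements = {"n": "\n", "r": "\r", "\\": "\\"}
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            # Unknown escapes are kept verbatim rather than rejected.
            out.append(replacements.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def emit_metadata(fields: dict[str, str]) -> str:
    """Producer side: needs no library beyond string replacement."""
    return "\n".join(f"{key}: {escape_value(val)}" for key, val in fields.items())


def parse_metadata(text: str) -> dict[str, str]:
    """Consumer side: split each line at the first ': ' (keys must not contain it)."""
    fields = {}
    for line in text.split("\n"):
        if not line:
            continue
        key, _, raw = line.partition(": ")
        fields[key] = unescape_value(raw)
    return fields
```

With these rules, `emit_metadata({"message": "assertion failure: foo > 0"})` produces exactly the line from the comment above, and multi-line values survive a round trip through `parse_metadata`.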

This is if you need structured metadata at all. It seems alright to me to just have free form metadata text associated with a test.

Otherwise, I kinda like the idea of a standard test output format... but in the end, 'exit code 0 == success, not 0 == failure' is enough for the vast majority of cases, such as CI integration. A human can figure out the implications of a test failure from almost any form of textual test output.


JSON seems like a more appropriate format for this, since it's an interchange format.


Except it's not simply an interchange format: TAP is human-readable enough that a lot of people are going to just look at the TAP output and not bother installing a TAP consumer.


JSON is also supposed to be human readable in a pinch.


Related:

Test Anything Protocol - https://news.ycombinator.com/item?id=23473370 - June 2020 (29 comments)

Test Anything Protocol specification (2006) - https://news.ycombinator.com/item?id=11915487 - June 2016 (10 comments)

Test Anything Protocol - https://news.ycombinator.com/item?id=10030889 - Aug 2015 (1 comment)

libtap - Testing library for C, implementing the Test Anything Protocol. - https://news.ycombinator.com/item?id=2936877 - Aug 2011 (6 comments)


TAP is tolerable for minimal testing, but fails to provide a lot of features you really want for anything nontrivial.

A while back I wrote out a list of what we really want out of a test framework (more than any existing framework supports, though many get much closer than TAP):

https://gist.github.com/o11c/ef8f0886d5967dfebc3d


All of those are supported by TAP, they're just not top-level keywords.

Just start your test's description with your "CUSTOM-KEYWORD", then aggregate on that, instead of "ok/not ok".

You can even easily plug into prove(1) to emit your own custom summary.

You can aggregate on anything, e.g. you could consider all tests taking longer than 500ms failures, regardless of top-level "ok" status.
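As a rough illustration of aggregating on a description keyword, here is a small Python sketch of a consumer. It is a simplification and not how prove(1) works: it only looks at unindented result lines, ignores directives such as `# TODO`, and the keyword names (`FLAKY`, `SLOW`) are hypothetical.

```python
import re
from collections import Counter

# Matches top-level result lines such as "ok 1 - description" or "not ok 2".
# Indented (subtest) lines, plans, and comments do not match.
RESULT_LINE = re.compile(r"^(not ok|ok)\b\s*(\d+)?\s*(?:-\s*)?(.*)$")


def aggregate(tap_text: str, keywords: set[str]) -> Counter:
    """Count results by custom keyword, falling back to 'ok' / 'not ok'.

    A test is filed under a keyword when its description starts with
    that keyword; otherwise it is filed under its plain TAP status.
    """
    counts: Counter = Counter()
    for line in tap_text.splitlines():
        match = RESULT_LINE.match(line)
        if match is None:
            continue  # plan line, comment, diagnostic, or subtest output
        status, _number, description = match.groups()
        first_word = description.split(" ", 1)[0] if description else ""
        counts[first_word if first_word in keywords else status] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "1..4",
        "ok 1 - FLAKY retries network call",
        "not ok 2 - parses header",
        "ok 3 - SLOW full rebuild",
        "ok 4 - opens file",
    ])
    print(aggregate(sample, {"FLAKY", "SLOW"}))
```

The point of the sketch is that the aggregation key is chosen by the consumer, so the top-level "ok/not ok" is just the default bucket.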

Your use-case is rather niche, so it's not a failure of the protocol that it provides a boolean "ok/not ok" at the top-level, and not "failed known flaky" or whatever (although that's usually marked as a "TODO" test).

It's intentionally minimal exactly to make it easy to support the sort of use cases you're describing here.


If all the standard-following tooling doesn't know about or support it automatically, it's not reasonable to claim it's "supported by TAP".

If the only controllable level of granularity is "run a whole executable file", you don't have control.

Actually running all tests is about the least interesting thing you can do when testing.


TAP can be emitted by anything, e.g. when you start modules. The reason you have 1..3 at the start is to be able to detect when tests fail to run, and you can have multiple of these.


Nice list, but I think it would be simpler to ask that frameworks support custom outcomes, with user-defined behavior. Most (all?) of your list of outcomes can be boiled down to a few behaviors:

1. Should this result be treated as a pass or a fail?
2. Should a warning of some sort be emitted? (Ex: for FAST and SLOW)
3. Should the result be counted as part of the main test corpus, or as part of some auxiliary grouping?
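One way to picture these three behaviors is as a small policy record per outcome. The Python sketch below is only an illustration: the `OutcomePolicy` type, the `summarize` function, and every outcome name except SLOW are hypothetical and do not come from any existing framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomePolicy:
    """User-defined behavior for one custom outcome (hypothetical type)."""
    passes: bool          # 1. treat the result as a pass or a fail?
    warn: bool = False    # 2. emit a warning?
    group: str = "main"   # 3. main corpus or an auxiliary grouping?


# Example policy table; the outcome names are illustrative assumptions.
POLICIES = {
    "PASS": OutcomePolicy(passes=True),
    "FAIL": OutcomePolicy(passes=False),
    "SLOW": OutcomePolicy(passes=True, warn=True),
    "KNOWN_FLAKY": OutcomePolicy(passes=False, warn=True, group="auxiliary"),
}


def summarize(results, policies):
    """Apply the policies to (test_name, outcome) pairs.

    Returns (main_ok, warnings, by_group), where main_ok is False only
    if some result in the "main" group has a failing outcome.
    """
    main_ok = True
    warnings = []
    by_group = {}
    for name, outcome in results:
        policy = policies[outcome]
        by_group.setdefault(policy.group, []).append((name, outcome))
        if policy.warn:
            warnings.append(f"{name}: {outcome}")
        if not policy.passes and policy.group == "main":
            main_ok = False
    return main_ok, warnings, by_group
```

Under this sketch a tool never needs to know what "KNOWN_FLAKY" means; it only needs the three answers attached to it.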


The reason enumerations are better than free text is so that tools can handle them in a uniform way. This is a major problem that is keeping TAP from being very useful.

For example, if you get MISSING outputs from tests under CI, you know something is wrong with your CI configuration. If you get either kind of WIP output, you shouldn't publish (FSVO publish - some projects might require this for every commit, some only for commits on the main branch, some only on tags).

It also makes it easier to diff test results meaningfully.


TAP is a protocol, though, not a framework?


Barely tolerable. At one point, the Linux selftests were contemplating using TAP. I read the main example:

    1..4
    ok 1 - Input file opened
    not ok 2 - First line of the input valid
    ok 3 - Read the rest of the file
    not ok 4 - Summarized correctly # TODO Not written yet

and I had two immediate objections.

a) Shouldn't tests have names?

b) (more serious) To generate valid TAP output, I need to know up-front how many tests there are. What if I can run the tests in sequence but I don't know how to count them? This can happen if there is a tree of tests, where each node knows what children it has but not what children they have. There's no credible way to generate TAP output streamily, even if there is no parallelization of tests.

No thanks.


> a) Shouldn't tests have names?

That's what the text following the test number is for.

> b) (more serious) To generate valid TAP output, I need to know up-front how many tests there are. What if I can run the tests in sequence but I don't know how to count them? This can happen if there is a tree of tests, where each node knows what children it has but not what children they have.

Typically you'd use subtests for this, e.g.

    1..1
    # Subtest: root
        1..2
        # Subtest: branch 1
            1..2
            ok 1 - leaf 1
            ok 2 - leaf 2
        ok 1 - branch 1
        ok 2 - leaf 3
    ok 1 - root

But if you really didn't know the number of tests up front for some reason, the test plan is also allowed to appear at the end of the file.
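For the trailing-plan case, a producer can stream results as it discovers them and print the plan last. Here is a minimal Python sketch under the assumption that the tree is flattened into a single numbered sequence instead of using subtests; the `Node` class and function names are made up for illustration.

```python
from typing import Iterator, Sequence


class Node:
    """A test tree node that only knows its direct children (hypothetical)."""

    def __init__(self, name: str, children: Sequence["Node"] = (), passed: bool = True):
        self.name = name
        self.children = list(children)
        self.passed = passed  # only meaningful for leaves


def walk_leaves(node: Node, prefix: str = "") -> Iterator[tuple[str, bool]]:
    """Depth-first walk yielding (path, passed) for each leaf, lazily."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    if not node.children:
        yield path, node.passed
        return
    for child in node.children:
        yield from walk_leaves(child, path)


def stream_tap(root: Node) -> Iterator[str]:
    """Yield TAP lines one at a time; the plan is emitted after the last result."""
    count = 0
    for path, passed in walk_leaves(root):
        count += 1
        yield f"{'ok' if passed else 'not ok'} {count} - {path}"
    yield f"1..{count}"


if __name__ == "__main__":
    tree = Node("root", [
        Node("branch 1", [Node("leaf 1"), Node("leaf 2")]),
        Node("leaf 3"),
    ])
    for line in stream_tap(tree):
        print(line)
```

For the tree from the subtest example, this prints three `ok` lines followed by `1..3`, and no node ever has to count its descendants in advance.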


I wrote a simple set of test-cases for my lisp implementation, and use the TAP format for the output. In my case I pipe to "tapview", by ESR, to visualize the results. The raw output is otherwise pretty basic:

      TAP version 14
      ok tst:zero:1
      ok tst:zero:2
      ..
      ok union:1
      ok union:2
      ok year:len
      1..215

Tapview is pretty portable, and works well as a simple output visualizer turning my view into:

      ./yal examples/lisp-tests.lisp | _misc/tapview
      ...........................................................
      215 tests, 0 failures.


a) They can have names (internally). At my previous job, our test suite required every test be a single file, and yet it still output TAP (you specified which tests you wanted via filenames).

b) No, you don't have to know prior how many tests will run. If you read the spec, you'll see there's an option to print the final count at the end.


I was recently trying out Node's newish test runner [1] but wanted better error reporting, and was grateful that Node provided an API that let me use any TAP-compatible error reporter. Unfortunately, most error reporters I tried seemed to have problems; the most common one was not reporting whether a test was skipped, which was a deal-breaker. In the end I settled for tap-mocha-reporter [2], since it worked well enough. I did end up having to patch it to not print extraneous newlines, which I suspect is an artifact of it using YAML for its internal representation. I do wish there was an error reporter that mimicked the format of Jest exactly, as it's what I'm most used to.

[1] https://nodejs.org/api/test.html

[2] https://github.com/tapjs/tap-mocha-reporter


We actually ship a tap reporter in Node and it used to be the default until recently :)


Also not to be confused with Google's Test Automation Platform (continuous integration system) [1].

[1] https://static.googleusercontent.com/media/research.google.c...


I always liked the TAP protocol for test reporting. I think part of it is because the protocol is so succinct and makes no assumptions about how the testing framework works under the hood.


I dislike the format itself (such is the nature of protocols/standards, that's fine) but I'm glad people managed to standardize somewhat on a format. I guess the XML format from JUnit is also a de facto standard.

I wish we had a unified format for reporting benchmark results too. Now that I think about it, I guess it should be possible to extend the TAP format to add arbitrary extra measures alongside pass/fail.

I also wish testing tools would do more to force test authors to differentiate between different types of failures (the test thinks it detected a bug / the test setup failed / the test code itself divided by zero). Once your project has a corpus of tests that just "fail", it quickly becomes impossible to fix that problem retroactively.


> the test setup failed

Most testing frameworks allow you to skip with a message if setup failed, which can help.

> the test code itself divided by zero

I think this is a bug you do want to know about, and it should be rare once code is merged into master (there's a bug in your code, it just so happens to be test code!). However, if for some reason you rely on a non-deterministic value that can make your test code fail in some way, I have used Python decorators in the past to mark a test as such and raise a specific message.



I used this over a decade ago when writing some tests involving Postgres stored procedures. It's nice to be able to output a standardized format and have other tools be able to read it, especially since at the time most of our company used Perl.

JUnit's XML format is similarly valuable as an interchange format, and was how I set up unit test reporting on some of my first Go CI pipelines in Jenkins/Hudson.


Not to be confused with the Telocator Alphanumeric Protocol (TAP) or TUN/TAP virtual network interfaces!


Or https://tapster.io/ (which is actually in the testing space - delta-geometry robots for interacting with touchscreen apps on actual hardware :-)


I use Bats (https://github.com/bats-core/bats-core), which is TAP-compliant, at work to test CIS Benchmarks on servers. It's amazing.


I attempted to make a small script library to get TAP functionality in Adventure Game Studio: https://github.com/ericoporto/tap


Looking at the examples, we have Perl and PHP, but not Python.

Not to criticise those languages, but they are hardly used in test automation anymore (if they ever were). This looks like a really out-of-date hobby project.


TAP is awesome when you've got big projects with tests in multiple languages. Got tests written in Bash, Perl and Node? No problem, just combine the results with a single test runner.


Is there a good shell test runner? I often use Perl's Test::More to do shell expect testing.

I’ve been looking for something to run test assertions that’s available on base distro installs.

Test::More is always there.


Let's say I already have mature testing tools in my language of choice, like Pytest in Python. What does TAP offer? It seems mostly like a standardized reporting format.


Basically nothing. It’s a format for things like small shell scripts to emit for consumption by some other reporting tool.


I don't get this in Ruby's case. What's the benefit of this over RSpec / Minitest?


The subreddit linked on the page has 3 posts in the last year.



