Posts Tagged “unit”
Feb 15, 2012
This is the second part of my unit testing advice. See the first part on this blog.
If you need any introduction you should really read the first part. I’ll just present the other three ideas I wanted to cover.
Focusing on common cases
This consists of testing only/mostly common cases. These tests rarely fail and give a false sense of security. Thus, tests are better when they also include less common cases, as they’re much more likely to break inadvertently. Common cases not only break far less often, but will probably be caught reasonably fast once someone tries to use the buggy code, so testing them has comparatively less value than testing less common cases.
The best example I found was in the
wrap_stringtests. The relevant example was adding the string “A test of string wrapping…”, which wraps not to two lines, but three (the wrapping is done only on spaces, so “wrapping…” is taken as a single unit; in this sense, my test case could have been clearer and use a very long word, instead of a word followed by ellipsis). Most of the cases we’ll deal with will simply wrap a given word in two lines, but wrapping in three must work, too, and it’s much more likely to break if we decide to refactor or rewrite the code in that function, with the intention to keep the functionality intact.
See other examples of this in aa20bce (no tests with more than one consecutive newline, no tests with lines of only non-printable characters), b248b3f (no tests with just dots, no valid cases with more than one consecutive slash, no invalid cases with content other than slashes), 5e771ab (no directories or hidden files), f8ecac5 (invalid hex characters don’t fail, but produce strange behaviour instead; this test actually discovered a bug), 7856643 (broken escaped content) and 87e9f89 (trailing garbage).
Not trying to make the tests fail
This is related to the previous one, but the emphasis is on trying to choose tests that we think will fail (either now or in the future). My impression is that people often fail to do this because they are trying to prove that the code works, which misses the point of testing. The point is trying to prove the code doesn’t work. And hope that you fail at it, if you will.
The only example I could find was in the
strcasecmpendtests. Note how there’s a test that checks that the last three characters of string “abcDEf” (ie. “DEf”) is less than “deg” when compared case-insensitively. That’s almost pointless, because if we made that same comparison case-sensitively (in other words, if the “case” part of the function breaks) the test still passes! Thus it’s much better to compare the strings ”abcdef” and “Deg”.
Addendum: trying to cover all cases in the tests
There’s another problem I wanted to mention. I have seen several times before, although not in the Tor tests. The problem is making complicated tests that try to cover many/all cases. This seems to stem from the idea that having more test cases is good by itself, when actually more tests are only useful when they increase the chances to catch bugs. For example, if you write tests for a “sum” function and you’re already testing
[5, 6, 3, 7], it’s probably pointless to add a test for
[1, 4, 6, 5]. A test that would increase the chances of catching bugs would probably look more like
[-4, 0, 4, 5.6]or
So what’s wrong with having more tests than necessary? The problem is they make the test suite slower, harder to understand at a glance and harder to review. If they don’t contribute anything to the chance of catching bugs anyway, why pay that price? But the biggest problem is when we try to cover so many test cases than the code produces the test data. In this cases, we have all the above problems, plus that the test suite becomes almost as complex as production code. Such tests become much easier to introduce bugs in, harder to follow the flow of, etc. The tests are our safety net, so we should be fairly sure that they work as expected.
And that’s the end of the tips. I hope they were useful :-)
Feb 14, 2012
When reviewing tests written by other people I see patterns in the improvements I would make. As I realise that these “mistakes” are also made by experienced hackers, I thought it would be useful to write about them. The extra push to write about this now was having concrete examples from my recent involvement in Tor, that will hopefully illustrate these ideas.
These ideas are presented in no particular order. Each of them has a brief explanation, a concrete example from the Tor tests, and, if applicable, pointers to other commits that illustrate the same idea. Before you read on, let me explicitly acknowledge that (1) I know that many people know these principles, but writing about them is a nice reminder; and (2) I’m fully aware that sometimes I need that reminder, too.
Edit: see the second part of this blog.
Tests as spec
Tests are more useful if they can show how the code is supposed to behave, including safeguarding against future misunderstandings. Thus, it doesn’t matter if you know the current implementation will pass those tests or that those test cases won’t add more or different “edge” cases. If those test cases show better how the code behaves (and/or could catch errors if you rewrite the code from scratch with a different design), they’re good to have around.
I think the clearest example were the tests for the
eat_whitespace*functions. Two of those functions end in
_no_nl, and they only eat initial whitespace (except newlines). The other two functions eat initial whitespace, including newlines… but also eat comments. The tests from line 2280 on are clearly targeted at the second group, as they don’t really represent an interesting use case for the first. However, without those tests, a future maintainer could have thought that the
_no_nlfunctions were supposed to eat whitespace too, and break the code. That produces confusing errors and bugs, which in turn make people fear touching the code.
See other examples in commits b7b3b99 (escaped ‘%’, negative numbers, %i format string), 618836b (should an empty string be found at the beginning, or not found at all? does “\n” count as beginning of a line? can “\n” be found by itself? what about a string that expands more than one line? what about a line including the “\n”, with and without the haystack having the “\n” at the end?), 63b018ee (how are errors handled? what happens when a %s gets part of a number?), 2210f18 (is a newline only \r\n or \n, or any combination or \r and \n?) and 46bbf6c (check that all non-printable characters are escaped in octal, even if they were originally in hex; check that characters in octal/hex, when they’re printable, appear directly and not in octal).
Boundaries of different kinds are a typical source of bugs, and thus are among the best points of testing we have. It’s also good to test both sides of the boundaries, both as an example and because bugs can appear on both sides (and not necessarily at once!).
The best example are the tor_strtok_r_impl tests (a function that is supposed to be compatible with
strtok_r, that is, it chops a given string into “tokens”, separated by one of the given separator characters). In fact, these extra tests discovered an actual bug in the implementation (ie. an incompatibility with
strtok_r). Those extra tests asked a couple of interesting questions, including “when a string ends in the token separator, is there an empty token in the end?” in the “howdy!” example. This test can also be considered valuable as in “tests as spec”, if you consider that the answer to be above question is not obvious and both answers could be considered correct.
See other examples in commits 5740e0f (checking if
tor_snprintfcorrectly counts the number of bytes, as opposed the characters, when calculating if something can fit in a string; also note my embarrassing mistake of testing
snprintf, and not
tor_snprintf, later in the same commit), 46bbf6c (check that character 21 doesn’t make a difference, but 20 does) and 725d6ef (testing 129 is very good, but even better with 128—or, in this case, 7 and 8).
Testing implementation details
Testing implementation details tends to be a bad idea. You can usually argue you’re testing implementation details if you’re not getting the test information from the APIs provided by whatever you’re testing. For example, if you test some API that inserts data in a database by checking the database directly, or if you test the result of a method call was correct by checking the object’s internals or calling protected/private methods. There are two reasons why this is a bad idea: first, the more implementation details you tests depend on, the less implementation details you can change without breaking your tests; second, your tests are typically less readable because they’re cluttered with details, instead of meaningful code.
The only example I encountered of this in Tor were the compression tests. In this case it wasn’t a big deal, really, but I have seen this before in much worse situations and I feel this illustrates the point well enough. The problem with that deleted line is that it’s not clear what’s it’s purpose (it needs a comment), plus it uses a magic number, meaning if someone ever changes that number by mistake, it’s not obvious if the problem is the code or the test. Besides, we are already checking that the magic number is correct, by calling the
detect_compression_method. Thus, the deleted
memcmpdoesn’t add any value, and makes our tests harder to read. Verdict: delete!
I hope you liked the examples so far. My next post will contain the second half of the tips.
Dec 6, 2009
The other day I was in a conversation with some developer that was complaining about some feature. He claimed that it was too complex and that it had led to tons of bugs. In the middle of the conversation, the developer said that the feature had been so buggy that he ended up writing a lot of unit tests for it. To be honest I don’t think there were a lot of bugs after those tests were written, so that made me think:
Maybe the testers in his team are doing too good of a job?
Because, you know, if testers are finding enough of “those bugs” (the ones that should be caught and controlled by unit tests and not by testers weeks after the code was originally written), maybe some developers are just not “feeling the pressure” and can’t really get that they should be writing tests for their code. If testers are very good, things just work out fine in the end… sort of. And of course, the problem here is the trailing “sort of”.
I know I’m biased, but in my view there is a ton of bugs that should never be seen by someone that is not the developer itself. Testers should deal with more complex, interesting, user-centred bugs. Non-trivial cases. Suboptimal UIs. Implementation disagreements between developers and stakeholders. That kind of thing. It’s simply a waste of time and resources that testers have to deal with silly, easy-to-avoid bugs on a daily basis. Not to mention that teams shouldn’t have to wait for days or weeks until they find basic bugs via exploratory testing. Or that a lot of those are much, much quicker to test with unit tests than having to create the whole fixture/environment for them to be found with exploratory testing.
My current conclusion is that pushing on the UI/usability side is not only good for the user, but it’s likely to produce better code as it will be, on average, more complex and will have to be better controlled by QA (code review, less ad-hoc design, …) and automated tests. Maybe developers will start hating me for that, but hopefully users will thank me.