HCoder.org
Posts Tagged “automated tests”
-
Unit testing advice for seasoned hackers (2/2)
Feb 15, 2012

This is the second part of my unit testing advice. See the first part on this blog.
If you need any introduction you should really read the first part. I’ll just present the other three ideas I wanted to cover.
Focusing on common cases
This consists of testing only, or mostly, common cases. These tests rarely fail, and so give a false sense of security. Tests are better when they also include less common cases, which are much more likely to break inadvertently. Common cases not only break far less often, but will probably be caught reasonably fast once someone tries to use the buggy code, so testing them has comparatively less value than testing less common cases.
The best example I found was in the wrap_string tests. The relevant case was the string “A test of string wrapping…”, which wraps not to two lines, but three (the wrapping is done only on spaces, so “wrapping…” is taken as a single unit; in that sense, my test case could have been clearer and used a very long word instead of a word followed by an ellipsis). Most of the strings we deal with will simply wrap onto two lines, but wrapping onto three must work too, and it’s much more likely to break if we refactor or rewrite the code in that function while intending to keep the functionality intact.

See other examples of this in aa20bce (no tests with more than one consecutive newline, no tests with lines of only non-printable characters), b248b3f (no tests with just dots, no valid cases with more than one consecutive slash, no invalid cases with content other than slashes), 5e771ab (no directories or hidden files), f8ecac5 (invalid hex characters don’t fail, but produce strange behaviour instead; this test actually discovered a bug), 7856643 (broken escaped content) and 87e9f89 (trailing garbage).
Not trying to make the tests fail
This is related to the previous one, but the emphasis is on trying to choose tests that we think will fail (either now or in the future). My impression is that people often fail to do this because they are trying to prove that the code works, which misses the point of testing. The point is trying to prove the code doesn’t work. And hope that you fail at it, if you will.
The only example I could find was in the strcasecmpend tests. Note the test that checks that the last three characters of the string “abcDEf” (ie. “DEf”) compare as less than “deg” case-insensitively. That test is almost pointless, because if we made the same comparison case-sensitively (in other words, if the “case” part of the function broke), the test would still pass! It’s much better to compare the strings “abcdef” and “Deg”.

Addendum: trying to cover all cases in the tests
There’s another problem I wanted to mention, which I have seen several times before, although not in the Tor tests: making complicated tests that try to cover many or all cases. This seems to stem from the idea that having more test cases is good in itself, when actually more tests are only useful if they increase the chances of catching bugs. For example, if you write tests for a “sum” function and you’re already testing [5, 6, 3, 7], it’s probably pointless to add a test for [1, 4, 6, 5]. A test that would increase the chances of catching bugs would look more like [-4, 0, 4, 5.6] or [].

So what’s wrong with having more tests than necessary? They make the test suite slower, harder to understand at a glance and harder to review. If they don’t improve the chances of catching bugs, why pay that price? But the biggest problem is when we try to cover so many cases that the code produces the test data. Then we have all of the above problems, plus the test suite becomes almost as complex as production code: much easier to introduce bugs into, harder to follow, and so on. The tests are our safety net, so we should be fairly sure they work as expected.
And that’s the end of the tips. I hope they were useful :-)
-
Unit testing advice for seasoned hackers (1/2)
Feb 14, 2012

When reviewing tests written by other people, I see patterns in the improvements I would make. As I realise that these “mistakes” are also made by experienced hackers, I thought it would be useful to write about them. The extra push to write about this now was having concrete examples from my recent involvement in Tor, which will hopefully illustrate these ideas.
These ideas are presented in no particular order. Each of them has a brief explanation, a concrete example from the Tor tests, and, if applicable, pointers to other commits that illustrate the same idea. Before you read on, let me explicitly acknowledge that (1) I know that many people know these principles, but writing about them is a nice reminder; and (2) I’m fully aware that sometimes I need that reminder, too.
Edit: see the second part of this blog.
Tests as spec
Tests are more useful if they can show how the code is supposed to behave, including safeguarding against future misunderstandings. Thus, it doesn’t matter if you know the current implementation will pass those tests or that those test cases won’t add more or different “edge” cases. If those test cases show better how the code behaves (and/or could catch errors if you rewrite the code from scratch with a different design), they’re good to have around.
I think the clearest example was the tests for the eat_whitespace* functions. Two of those functions end in _no_nl, and they only eat initial whitespace (except newlines). The other two eat initial whitespace, including newlines… but also eat comments. The tests from line 2280 on are clearly targeted at the second group, as they don’t really represent an interesting use case for the first. However, without those tests, a future maintainer could have thought that the _no_nl functions were supposed to eat that whitespace too, and broken the code. That produces confusing errors and bugs, which in turn make people fear touching the code.

See other examples in commits b7b3b99 (escaped ‘%’, negative numbers, %i format string), 618836b (should an empty string be found at the beginning, or not found at all? does “\n” count as the beginning of a line? can “\n” be found by itself? what about a string that spans more than one line? what about a line including the “\n”, with and without the haystack having the “\n” at the end?), 63b018ee (how are errors handled? what happens when a %s gets part of a number?), 2210f18 (is a newline only \r\n or \n, or any combination of \r and \n?) and 46bbf6c (check that all non-printable characters are escaped in octal, even if they were originally in hex; check that characters in octal/hex, when they’re printable, appear directly and not in octal).
Testing boundaries
Boundaries of all kinds are a typical source of bugs, and thus are among the best places to test. It’s also good to test both sides of each boundary, both as an example and because bugs can appear on either side (and not necessarily on both at once!).
The best example is the tor_strtok_r_impl tests (a function that is supposed to be compatible with strtok_r, that is, it chops a given string into “tokens” separated by any of the given separator characters). In fact, these extra tests discovered an actual bug in the implementation (ie. an incompatibility with strtok_r). Those extra tests asked a couple of interesting questions, including “when a string ends in the token separator, is there an empty token at the end?” in the “howdy!” example. This test is also valuable as a “test as spec”, if you consider that the answer to the above question is not obvious and both answers could be considered correct.

See other examples in commits 5740e0f (checking that tor_snprintf correctly counts the number of bytes, as opposed to characters, when calculating whether something can fit in a string; also note my embarrassing mistake of testing snprintf, and not tor_snprintf, later in the same commit), 46bbf6c (check that character 21 doesn’t make a difference, but 20 does) and 725d6ef (testing 129 is very good, but even better with 128—or, in this case, 7 and 8).

Testing implementation details
Testing implementation details tends to be a bad idea. You are usually testing implementation details if you’re not getting the test information from the APIs provided by whatever you’re testing: for example, if you test an API that inserts data into a database by checking the database directly, or if you verify the result of a method call by inspecting the object’s internals or calling protected/private methods. There are two reasons why this is a bad idea: first, the more implementation details your tests depend on, the fewer implementation details you can change without breaking your tests; second, your tests become less readable, because they’re cluttered with details instead of meaningful code.
The only example of this I encountered in Tor was the compression tests. In this case it wasn’t a big deal, really, but I have seen this before in much worse situations, and I feel it illustrates the point well enough. The problem with the deleted line is that it’s not clear what its purpose is (it needs a comment), plus it uses a magic number, meaning that if someone ever changes that number by mistake, it’s not obvious whether the problem is in the code or in the test. Besides, we are already checking that the magic number is correct by calling detect_compression_method. Thus, the deleted memcmp doesn’t add any value, and makes our tests harder to read. Verdict: delete!

I hope you liked the examples so far. My next post will contain the second half of the tips.
-
LeakFeed and LeakyLeaks
Sep 15, 2011

A couple of weeks ago I discovered LeakFeed, an API to fetch cables from Wikileaks. I immediately thought it would be cool to play a bit with it and create some kind of application. After a couple of failed ideas that didn’t really take off, I decided to exploit my current enthusiasm for Javascript and build something without a server. Another advantage was that I already knew Angular, an HTML “power up” written in Javascript (what else?), which I knew would ease the whole process a lot, and I even got the chance to learn how to use Twitter’s excellent Bootstrap HTML and CSS toolkit.
What I decided to build is a very simple interface to search for leaked cables. I called it LeakyLeaks (see the code on GitHub). Unfortunately the LeakFeed API is quite limited, so I had to limit my original idea. However, I think the result is kind of neat, especially considering the little effort. To build it, I started writing support classes and functions using Test-Driven Development with Jasmine. Once I had that basic functionality up and running I started building the interface with Bootstrap and, at that point, integrating the data from LeakFeed with Angular was so easy it’s almost ridiculous. And as LeakFeed can return the data using JSONP, I didn’t even need a server: all my application is simply a static HTML file with some Javascript code.
All this get-data-from-somewhere-and-display-it is astonishingly simple in Angular. There’s a feature (“resources”) for declaring sources of external data: you define the URLs and methods to get the data from those resources, and then simply call some methods to fetch it. E.g. you can get the list of all tags in LeakFeed from http://api.leakfeed.com/v1/cables/tags.json (adding a GET parameter callback with a function name if you want JSONP). Similarly, you can get the list of all offices from http://api.leakfeed.com/v1/cables/offices.json. In Angular, you can declare a resource to get all this information like this:

    this.LeakFeedResourceProxy = $resource(
        'http://api.leakfeed.com/v1/cables/:id.json',
        { callback: 'JSON_CALLBACK' },
        { getTags:    {method: 'JSON', isArray: true, params: {id: 'tags'}},
          getOffices: {method: 'JSON', isArray: true, params: {id: 'offices'}} }
    );
Once you have declared it, using it is as simple as calling the appropriate method on the object. That is, you can get the tags by calling this.LeakFeedResourceProxy.getTags(), and the offices by calling this.LeakFeedResourceProxy.getOffices(). And when I say “get the tags”, I mean get a list of Javascript objects: no JSON, text or processing involved. If you assign the result of those calls to a property (say, this.availableOffices), you’ll be able to show that information like so (|officename is a custom filter to show the office names in a special format):

    <select id="office" name="office">
      <option value=""> ()</option>
    </select>
The cool thing is that, thanks to Angular’s data binding, any time the value of that variable changes, its representation in the HTML changes too. That is, if you assign another value to this.availableOffices, the select box will automatically be updated to show the new set of options! But the data binding is two-way, so any changes you make in the UI are also reflected back in the Javascript variables. This further simplifies many tasks and makes programming with Angular a breeze. There are other nice things about Angular (and many nice things about Bootstrap and Jasmine, of course!), but I think that’s enough for today :-)

-
From pseudo-code to code
Aug 10, 2010

This post is probably not about what you’re thinking. It’s actually about automated testing.
Different things I’ve been reading, or have otherwise been exposed to, over the last few weeks have led me to a somewhat funny comparison: code is (or can be) like science. You come up with some “theory” (your code) that explains something (solves a problem)… and you make sure you can measure it and test it, so that people will believe your theory and build on top of it.
I mean, something claiming to be science that can’t be easily measured, compared or peer-reviewed would be ridiculous. Scientists wouldn’t believe in it and would certainly not build anything on top of it because the foundation is not reliable.
I claim that software should be the same way, and thus it’s ridiculous to trust software that doesn’t have a good test suite, or even worse, that may not even be particularly testable. Trusting software without a test suite is not that different from taking the word of the developer that it “works on my machine”. Scientists would call untested science pseudo-science, so I am tempted to call code without tests pseudo-code.
Don’t get me wrong: sure you can test by hand, and hand-made tests are useful and necessary, but that only proves that the exact code you tested, without any changes, works as expected. But you know what? Software changes all the time, so that’s not a great help. If you don’t have a way to quickly and reliably measure how your code behaves, every time you make a change you are taking a leap of faith. And the more leaps of faith you take, the less credible your code is.