
Michael loves building software; he's been building search engines for more than a decade, and has been working on Lucene as a committer, PMC member and Apache member for the past few years. He's co-author of the recently published Lucene in Action, 2nd edition. In his spare time Michael enjoys building his own computers, writing software to control his house (mostly in Python), encoding videos and tinkering with all sorts of other things.

Your test cases should sometimes fail!

08.18.2011
I'm an avid subscriber of the delightful weekly (sometimes) Python-URL! email, highlighting the past week's interesting discussions across the numerous Python lists. Each summary starts with the best quote from the week; here's last week's quote:

"So far as I know, that actually just means that the test suite is insufficient." - Peter Seebach, when an application passes all its tests.

I wholeheartedly agree: if your build always passes its tests, that means your tests are not tough enough! Ideally the tests should stay ahead of the software, constantly pulling you forwards to improve its quality. If the tests keep passing, write new ones that fail! Or make existing ones evil-er.

You'll be glad to know that Lucene/Solr's tests do sometimes fail, as you can see in the Jenkins (formerly Hudson) automated trunk builds.


Randomized testing

Our test infrastructure has gotten much better, just over the past 6 months or so, through heavy use of randomization.

When a test needs a Directory instance, but doesn't care which, it uses the newDirectory method. This method picks one of Lucene's Directory implementations (RAMDirectory, NIOFSDirectory, MMapDirectory, etc.) and then wraps it with MockDirectoryWrapper, a nice little class that does all sorts of fun things like: occasionally calling Thread.yield; preventing still-open files from being overwritten or deleted (acts-like-Windows); refusing to write to the same file twice (verifying Lucene is in fact write-once); breaking up a single writeBytes into multiple calls; optionally throwing IOException on disk full, or simply throwing exceptions at random times; simulating an OS/hardware crash by randomly corrupting un-sync'd files in devilish ways; etc. We also randomly pick a timezone and locale.
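To make two of those tricks concrete (splitting one write into several calls, and throwing exceptions at random times), here is a minimal sketch built on a plain java.io.OutputStream. It is not Lucene's MockDirectoryWrapper; the class name ChaosOutputStream, the failureRate parameter and the chunksWritten counter are all made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;

// Hypothetical illustration only; this is NOT Lucene's MockDirectoryWrapper.
class ChaosOutputStream extends OutputStream {
    private final OutputStream delegate;
    private final Random random;
    private final double failureRate; // chance that any single chunk write throws
    int chunksWritten = 0;            // exposed so a test can inspect the splitting

    ChaosOutputStream(OutputStream delegate, Random random, double failureRate) {
        this.delegate = delegate;
        this.random = random;
        this.failureRate = failureRate;
    }

    @Override
    public void write(int b) throws IOException {
        maybeFail();
        delegate.write(b);
        chunksWritten++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Break one logical write into several smaller writes of random size.
        while (len > 0) {
            maybeFail();
            int chunk = 1 + random.nextInt(len); // between 1 and len, inclusive
            delegate.write(buf, off, chunk);
            chunksWritten++;
            off += chunk;
            len -= chunk;
        }
    }

    // Simulate an I/O error at a random moment.
    private void maybeFail() throws IOException {
        if (random.nextDouble() < failureRate) {
            throw new IOException("simulated random I/O failure");
        }
    }
}
```

With failureRate set to 0.0 the bytes must arrive unchanged no matter how they were chunked; with a small positive rate, the code under test has to cope with a write that dies partway through.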

To randomize indexing, we create an IndexWriterConfig, tweaking all sorts of settings, and use RandomIndexWriter (like IndexWriter, except it sometimes optimizes, commits, yields, etc.). The newField method randomly enables or disables stored fields and term vectors. We create random codecs, per field, by combining a terms dictionary with randomly chosen terms index and postings implementations. MockAnalyzer randomly injects payloads into its tokens.

Sometimes we use the PreFlex codec, to write all indices in the 3.x format (so that we test index backwards compatibility), and sometimes the nifty SimpleText codec. We have exotic methods for creating random yet somewhat realistic full Unicode strings. When creating an IndexSearcher, we might use threads (pass an ExecutorService), or not. We catch tests that leave threads running, or that cause insanity in the FieldCache (for example by loading both parent and sub readers).
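As a rough idea of what generating random full-Unicode strings involves, the sketch below draws code points uniformly from the whole Unicode range and skips the surrogate block so the result is always well-formed UTF-16. It is much simpler than what the post describes as "somewhat realistic"; the class and method names are hypothetical and are not Lucene's.

```java
import java.util.Random;

// Hypothetical helper for illustration; not Lucene's own string generator.
class RandomUnicode {
    // Returns a well-formed string of at most maxCodePoints random code points.
    static String randomUnicodeString(Random random, int maxCodePoints) {
        int length = random.nextInt(maxCodePoints + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int cp;
            do {
                cp = random.nextInt(Character.MAX_CODE_POINT + 1);
                // Lone surrogates are not valid characters, so draw again.
            } while (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE);
            sb.appendCodePoint(cp); // supplementary code points become surrogate pairs
        }
        return sb.toString();
    }
}
```

Because code points above U+FFFF occupy two Java chars, strings like these tend to flush out code that confuses char counts with character counts.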


Reproducibility

To ensure a failure is reproducible, we save the random seeds and on a failure print out a nice line like this:

NOTE: reproduce with: ant test -Dtestcase=TestFieldCacheTermsFilter -Dtestmethod=testMissingTerms -Dtests.seed=-1046382732738729184:5855929314778232889

This fixes the seed so that the test runs deterministically. Sometimes, horribly, we have bugs in this seed logic that cause tests to run non-deterministically, and we scramble to fix those bugs first!
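The general pattern is easy to sketch: pick a seed (or accept a fixed one), drive all randomness in the test from it, and print it when the test fails. The code below is an assumption-laden illustration, not Lucene's implementation: the property name demo.seed and the SeededRunner class are invented, and it uses a single long rather than the two-part seed shown in the line above.

```java
import java.util.Random;
import java.util.function.Consumer;

// Hypothetical sketch of the save-and-print-the-seed pattern.
class SeededRunner {
    // Use -Ddemo.seed=... (an invented property name) if given, else a fresh seed.
    static long pickSeed() {
        String fixed = System.getProperty("demo.seed");
        return fixed != null ? Long.parseLong(fixed) : new Random().nextLong();
    }

    // Runs the test body with a Random built from the seed; on failure,
    // prints the seed so the exact same run can be replayed, then rethrows.
    static void runWithSeed(long seed, Consumer<Random> body) {
        try {
            body.accept(new Random(seed));
        } catch (AssertionError | RuntimeException e) {
            System.err.println("NOTE: reproduce with: -Ddemo.seed=" + seed);
            throw e;
        }
    }
}
```

A test would call SeededRunner.runWithSeed(SeededRunner.pickSeed(), random -> { ... }) and take every random decision from the Random it is handed. Any randomness that sneaks in from elsewhere (wall-clock time, thread scheduling, a stray new Random()) breaks reproducibility.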

If you happen to hit a test failure, please send that precious line to the dev list! This is like the Search for Extraterrestrial Intelligence (SETI): there are some number of random seeds out there (hopefully not too many!) that will lead to a failure, and if your computer is lucky enough to discover one of these golden seeds, please share the discovery!

The merging of Lucene and Solr's development was also a big step forward for test coverage, since every change in Lucene is now tested against all of Solr's test cases as well.

Tests accept a multiplier to crank things up, causing them to use more test documents or iterations, run for a longer time, etc. We now have perpetual jobs on Jenkins, for both 3.x and trunk, launching every 15 minutes with multiplier 5. We know quickly when someone breaks the build!
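A multiplier like this can be as simple as a system property that scales each test's base workload. The sketch below assumes an invented property name, demo.multiplier, and an invented TestScale helper; it shows the idea only and does not reflect how Lucene's test framework reads or applies its setting.

```java
// Hypothetical sketch of scaling test workloads by a multiplier.
class TestScale {
    // Reads -Ddemo.multiplier=N (an invented property name); defaults to 1
    // and never goes below 1, so a bad value cannot turn a test into a no-op.
    static int multiplier() {
        return Math.max(1, Integer.getInteger("demo.multiplier", 1));
    }

    // Scales a base document or iteration count; fails loudly on int overflow.
    static int scaled(int baseIterations) {
        return Math.multiplyExact(baseIterations, multiplier());
    }
}
```

A test that normally indexes 100 documents would ask for TestScale.scaled(100) instead, so a quick local run stays fast while a continuous build launched with a higher multiplier does several times the work.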

This added test coverage has already caught a number of sneaky bugs (including a rare index corruption case on disk-full and a chunking bug in MMapDirectory) that we otherwise would not have discovered for some time.

The test infrastructure itself is so useful that it's now been factored out as a standalone JAR so apps using Lucene can tap into it to create their own fun randomized tests.

Published at DZone with permission of Michael McCandless, author and DZone MVB.


Comments

David Grant replied on Thu, 2011/08/18 - 7:17am

I'll admit that when I read the title of this post, I thought it was a daft proposition, but I'm glad I read further! Good post.
