"So far as I know, that actually just means that the test suite is insufficient." - Peter Seebach, when an application passes all its tests.
I wholeheartedly agree: if your build always passes its tests, that means your tests are not tough enough! Ideally the tests should stay ahead of the software, constantly pulling you forwards to improve its quality. If the tests keep passing, write new ones that fail! Or make existing ones evil-er.
You'll be glad to know that Lucene/Solr's tests do sometimes fail, as you can see in the Hudson Jenkins automated trunk builds.
Our test infrastructure has gotten much better, just over the past 6 months or so, through heavy use of randomization.
When a test needs a Directory instance, but doesn't care which, it uses the newDirectory method. This method picks one of Lucene's Directory implementations (RAMDirectory, NIOFSDirectory, MMapDirectory, etc.) and then wraps it with MockDirectoryWrapper, a nice little class that does all sorts of fun things like: occasionally calling Thread.yield; preventing still-open files from being overwritten or deleted (acts-like-Windows); refusing to write to the same file twice (verifying Lucene is in fact write-once); breaking up a single writeBytes into multiple calls; optionally throwing IOException on disk full, or simply throwing exceptions at random times; simulating an OS/hardware crash by randomly corrupting un-sync'd files in devilish ways; etc. We pick a timezone and locale.
To randomize indexing, we create a IndexWriterConfig, tweaking all sorts of settings, and use RandomIndexWriter (like IndexWriter, except it sometimes optimizes, commits, yields, etc.). The newField method enables or disables stored fields and term vectors. We create random codecs, per field, by combining a terms dictionary with a random terms index and postings implementations. MockAnalyzer injects payloads into its tokens.
Sometimes we use the PreFlex codec, to writes all indices in the 3.x format (so that we test index backwards compatibility), and sometimes the nifty SimpleText codec. We have exotic methods for creating random yet somewhat realistic full Unicode strings. When creating an IndexSearcher, we might use threads (pass an ExecutorService), or not. We catch tests that leave threads running, or that cause insanity in the FieldCache (for example by loading both parent and sub readers).
To ensure a failure is reproducible, we save the random seeds and on a failure print out a nice line like this:
NOTE: reproduce with: ant test -Dtestcase=TestFieldCacheTermsFilter -Dtestmethod=testMissingTerms -Dtests.seed=-1046382732738729184:5855929314778232889
This fixes the seed so that the test runs deterministically. Sometimes, horribly, we have bugs in this seed logic, thus causing tests to not run deterministically and we scramble to fix those bugs first!
If you happen to hit a test failure, please send that precious line to the dev list! This is like the Search for Extraterrestrial Intelligence (SETI): there are some number of random seeds out there (hopefully, not too many!), that will lead to a failure, and if your computer is lucky enough to discover one of these golden seeds, please share the discovery!
The merging of Lucene and Solr's development was also a big step forward for test coverage, since every change in Lucene is now tested against all of Solr's test cases as well.
Tests accept a multiplier to crank things up, causing them to use more test documents or iterations, run for longer time, etc. We now have perpetual jobs on Jenkins, for both 3.x and trunk, launching every 15 minutes with multiplier 5. We know quickly when someone breaks the build!
This added test coverage has already caught a number of sneaky bugs (including a rare index corruption case on disk-full and a chunking bug in MMapDirectory) that we otherwise would not have discovered for some time.
The test infrastructure itself is so useful that it's now been factored out as a standalone JAR so apps using Lucene can tap into it to create their own fun randomized tests.