Of communities, companies, and bugs (Or, “Dr Dobbs Journal is a slut!”)
Andrew Binstock (Editor-in-Chief at DDJ) has taken a shot at Oracle’s Java7 release, and I found myself feeling a need to respond.
In his article, Andrew notes that
… what really turned up the heat was Oracle's decision to ship the compiler aware that the known defects would cause one of two types of errors: hang the program or silently generate incorrect results. Given that Java 7 took five years to see light, it seems to me and many others that Oracle could have waited a bit longer to fix the bug before releasing the software. To a large extent, there is a feeling in the Java community that Oracle does not understand Java (despite the company's earlier acquisition of BEA). That may or may not be, but I would have expected it to understand enterprise software enough not to ship a compiler with defects that hang a valid program.
There are so many things in this paragraph alone I want to respond to that I feel it
necessary to deconstruct it and respond to each individually:
- “Oracle’s decision to ship the compiler aware that the known defects…” According to the post that went out to the Apache Solr mailing list
(seen quoted in a blog post), “These problems were detected only 5 days
before the official Java 7 release, so Oracle had no time to fix those
bugs… .” I’m sorry, folks, but five days before the release is not a
“known defect”. It’s a late-breaking bug. This is yellow journalism, if
you ask me.
- “Given that Java 7 took five years to see
light…” Much of that time being the open-sourcing of the JDK itself (1.5
years) and the Oracle acquisition (1.5 years), plus the community’s
wrangling over closures that Sun couldn’t find a way to bring consensus
around. Remember when they stood on the stage at Devoxx one year and
promised “no closures” only to turn around the following year at the
same conference and say, “Yes closures”? Sun had a history of
flip-flopping on commitments worse than a room full of politicians.
Slapping Oracle with the implicit “you had all this time and you wasted
it” argument is just unfair.
- “… it seems to me and many
others that Oracle could have waited a bit longer to fix the bug before
releasing the software.” First of all, what “many others”? Remember when
Sun proposed the “Java7 now with less features vs Java7 later with more
features” question? Overwhelmingly, everybody voted for now, citing
“It’s been so long already, just ship *something*” as a reason. If
Oracle slipped the date, the howls would still be echoing across the
hills and valleys, and Andrew would be writing, “If Oracle commits to a
date, they really should stick with this date…” But secondly, remember,
the bug was noticed five days before the release. Those of you who’ve
never seen a bug show up during a production deployment roll out, please
cover your eyes. The rest of you know good and well that sometimes
trying to abort a rollout like that mid-stream causes far more damage
than just leaving the bug in place. Particularly if there’s a
workaround. (Which there is, by the way.)
- “To a large
extent, there is a feeling in the Java community that Oracle does not
understand Java.” Hmm. Not surprising, really, when pundits continually
hammer away at how Oracle doesn’t get Java and doesn’t understand that
everything should be given away for free and when people bitch and
complain you should immediately buy them all ponies and promise that
they’ll never do anything wrong again…. Seriously? Oracle doesn’t
understand Java? Or is it that Oracle refuses to play the same bullshit
game that Sun played? Let’s see, what is Sun’s stock price these days?
Oh, right.
- “I would have expected it to understand
enterprise software enough…” And frankly, I would have expected an
editor to understand journalism enough to at least attempt a fair and
unbiased story. It’s disappointing, really. Andrew has struck me as a
pretty nice and intelligent guy (we’ve chatted over email), but this
piece clearly falls way short on a number of levels.
- “… not to ship a compiler with defects that hang a valid program.” Let’s get to the next paragraph to get into this one.
Andrew’s next paragraph reveals some disturbing analysis:
The problem, from what is known so far, derives from a command-line optimization switch on the Java compiler. This switch incorrectly optimized loops, resulting in the various reported errors. In Java 7, this switch is on by default, while it was off by default in previous releases. Regardless of the state of the switch, the resulting optimizations were not tested sufficiently.
This is a curious problem, because compilers are one of the most demonstrably easy products to test. Text file in, easily parsed binary file out. Or earlier in the compilation process: text file in, AST out. The easy generation of input and the simple validation of output make it possible to create literally tens of thousands of regression tests that can explore every detail of the generated code in an automated fashion. These tests are known to be especially important in the case of optimizations because defects in optimized code are far more difficult for developers to locate and identify. The implicit contract by the compiler is that going from debug code during development to optimized code for release does not change functionality. Consequently, optimizations must be tested extra carefully.
Actually, no. The problem, according once again to the Solr mailing list entry,
is with the HotSpot JIT compiler, not with the Java compiler (javac) itself. Andrew
demonstrates a shocking lack of comprehension with this explanation: JIT
compilation is nothing like traditional compilation (unless you
hyperfocus on the optimization phases of the traditional compiler
toolchain), and often has nothing to do with ASTs and so forth. In
short, Andrew saw “compiler” and basically leapt to conclusions. It’s a
sin of which I’m guilty as well, but damn, somebody should have
caught this somewhere along the way, including Andrew himself, perhaps by
contacting Oracle and asking them to explain the problem.
Nah, it’s much better (and gets DDJ a lot more hits) if we leave it the way it’s written. Sensationalism sells. Hence my title.
And, it turns out, if they’re optimizations in the JITter, they can be disabled:
At least disable loop optimizations using the -XX:-UseLoopPredicate JVM option to not risk index corruptions.
Please note: Also Java 6 users are affected, if they use one of those JVM options, which are not enabled by default: -XX:+OptimizeStringConcat or -XX:+AggressiveOpts
Oh, did we mention? It turns
out these optimizations have been there in Java 6 as well, so apparently
not only is Oracle an idiot for not finding these bugs before now, but
so is the entire Java ecosystem. (It seems these bugs only appear now
because the optimizations are turned on by default now, instead of
turned off.)
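If you want to know whether a given JVM was started with the workaround, or with one of the opt-in flags the Solr post warns Java 6 users about, something like the following rough sketch can tell you. The class name `JitFlagCheck` and its helper methods are my own invention for illustration; they are not part of Solr, Lucene, or the JDK. The one real API it leans on is `RuntimeMXBean.getInputArguments()`, and the big assumption is that looking at explicitly passed arguments is good enough: it says nothing about what a particular JVM build enables by default.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration only; not part of Solr, Lucene, or the JDK.
final class JitFlagCheck {
    // The workaround flag quoted from the Solr mailing list post.
    static final String WORKAROUND = "-XX:-UseLoopPredicate";

    // The opt-in flags the same post says put Java 6 users at risk.
    static final String[] RISKY_OPT_INS = {
        "-XX:+OptimizeStringConcat",
        "-XX:+AggressiveOpts"
    };

    /** True if the loop-predicate workaround was passed explicitly to the JVM. */
    static boolean hasWorkaround(List<String> jvmArgs) {
        return jvmArgs.contains(WORKAROUND);
    }

    /** True if any of the opt-in flags named in the post was passed explicitly. */
    static boolean hasRiskyOptIn(List<String> jvmArgs) {
        for (String flag : RISKY_OPT_INS) {
            if (jvmArgs.contains(flag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Only reports arguments given to this JVM at startup, not built-in defaults.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Workaround passed: " + hasWorkaround(jvmArgs));
        System.out.println("Risky opt-in flag passed: " + hasRiskyOptIn(jvmArgs));
    }
}
```

Starting it as `java -XX:-UseLoopPredicate JitFlagCheck` should report the workaround as passed, assuming the JVM in question recognizes that flag. The check is a plain string match on purpose: it answers "what did somebody put on the command line", which is the only question the mailing list advice actually asks of you.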
Andrew continues:
But even if Oracle's in-house testing was not complete, I have to wonder why they were not testing the code on some of the large open-source codebases currently available. One program that reported the fatal bug was Apache Solr, which most developers would agree is a high profile, open source project. Projects such as Solr provide almost ideal test beds: a large code base that is widely used. Certainly, Oracle might not cotton to writing UATs and other tests to validate what the compiler did with the Solr code. But, in fact, it didn’t have to write a test at all. It simply needed to run the package and the SIGSEGV segmentation fault would occur.
Oh, right. With the acquisition of
Sun, Oracle also inherited a responsibility to test their software
against every open-source software package known to man. Those people
working on those projects have no responsibility to test it themselves,
it’s all Oracle’s fault if it all doesn’t work right out of the box.
Particularly with fast-moving source bases like those seen in
open-source projects. Hmm.
I have to hope that this event will be a sharp lesson to Oracle to begin using the large codebases at its disposal as a fruitful proving ground for its tools. While the sloppiness I've discussed is disturbing, it's made worse by the fact that the same defects can be found in Java 6. The reason they suddenly show up now is that the optimization switch is off by default on Java 6, while on in Java 7. This suggests that Sun's testing was no better than Oracle's. (And given that much of the JDK team at Oracle is the same team that was at Sun, this is no surprise.) The crucial difference is that Oracle knew about the bugs prior to release and went ahead with the release anyway, while there is no evidence Sun was aware of the problems.
I have to hope that this event
won’t be a sharp lesson to Oracle that the community is basically made
up of a bunch of whiny bitches who complain when a workaroundable bug
shows up in their products. Frankly, I would.
Did we mention that all of this was done on an open-source project? At any point anyone can grab the source, build it, and test it for themselves. So, Andrew, are you volunteering to run every build against every open-source project out there? After all, if this is a “community”, then you should be willing to donate all of your time for the community’s benefit, right? Where are the hordes of developers willing to volunteer and donate their time to working on the JDK itself? You’re all quite ready to throw rocks at Oracle (and before that, Sun), but how many of you are willing to put down the rock, pick up a hammer, and start working to build it better?
Yeah, I kind of thought so.
Oracle's decision was political, not technical. And here Oracle needs to really reassess its commitment to its users. Is Java a sufficiently important enterprise technology that shipping showstopper bugs will no longer be permitted? The long-term future of Java, the language, hangs in the balance.
Unless you were in the room when they
made the decision, Andrew, you’re basically blowing hot air out your
ass, and it smells about as good as when anyone else does. This is a
blatantly stupid thing to say, and quite frankly, if Oracle refuses to
talk to you ever again, I’d say they were back to making good decisions.
You can’t responsibly declare what the rationale for a decision was
unless you were in the room when it was made, and sometimes not even
then.
Worse than that, the Solr mailing list entry even points out that Oracle acknowledged the bugs, and discussed with the community (the Solr maintainers, in this case, it seems) when and how the fix could come out:
In response to our questions, they proposed to include the fixes into service release u2 (eventually into service release u1, see [6]).
Wow. Oracle
actually responded to the bug and discussed when the fix would come out.
Clearly they are unengaged with the community and don’t “get” Java.
Maybe I should change this post’s title to “Sloppy Work at Dr Dobb’s Journal”.
Nah. Sensationalism sells better. Even when it turns out to be completely unfounded.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)





Comments
Guido Amabili replied on Sat, 2011/08/06 - 12:52pm
I didn't read the article fully, but aren't five days enough to make the decision, and stand by it, to delay or not delay a software delivery?
I am rather tired of Oracle because we have found a bug in the system we are working on, due to a bug in their stored procedure parser and compiler (Oracle 11g), which is not able to compile and execute nested loops properly...
Mike James replied on Sat, 2011/08/06 - 1:19pm
in response to:
Guido Amabili
I agree and would add that the entire article is bad tempered and nit picking.
It borders on libel.
David Whatever replied on Sat, 2011/08/06 - 2:49pm
Five days is not enough time to fix the issue. It is more than enough time to make a decision about whether a release candidate should become final. At the very least, it is enough time to document these as known issues with workarounds in the release notes.
Oracle has also given lower priority to fixing the issues from what I understand; they may not be fixed in the first update either.
These valid concerns about code breaking under default optimizations, as well as the lack of any ETA, make this an extremely anticlimactic and lackluster release. I expect this to lead people away from evaluating Java 7 support in their products, as now people will not be sure when the "released" version of Java will be truly ready.
And stop giving Oracle a pass because developers weren't testing apps with internal (and marked as unsupported) optimization flags or pre-release-candidate versions of OpenJDK. No expectation was ever established that developers needed to do that. The projects which reported this issue were testing once the JRE was in release-candidate status, which is the intended purpose of release candidates.
Reza Rahman replied on Sat, 2011/08/06 - 9:31pm
Ted,
Thanks so much for having the courage to inject some sanity into the needless histrionics! I agree with you a 100%...
Cheers,
Reza
Fab Mars replied on Sun, 2011/08/07 - 2:35am
Totally my point.
All of this is blahblahblahblah.
Igor Laera replied on Sun, 2011/08/07 - 9:05am
But it isn't. Oracle owns it, Oracle controls it, Oracle sues people with it. Then it's their job. Microsoft does that. They have whole datacenters filled with software to test before they release a larger Service Pack. That's the reason that company has made so much dough for decades: they see it as part of their JOB.
If Oracle thinks it's too much pain to do that up front, they should offer the servers to the teams to set up their own testing environments. They have something like 100 datacenters around the world. They have an 'unbreakable' Ripoff Linux distribution. It's not like they don't know how. They simply don't want to.
It's funny that there are people who think that Oracle can have all the good stuff around Java and simply relay/outsource/drop the "challenging" stuff on the people who use it.
What's next? Asking the community to write the API docs, because it's too much work?
Reza Rahman replied on Sun, 2011/08/07 - 10:30am
in response to:
Igor Laera
Anthony Ve replied on Sun, 2011/08/07 - 12:10pm
in response to:
Reza Rahman
I admire the way you keep trying to talk some sense into the followers of the "Oracle is evil" cult :-)
Igor Laera replied on Sun, 2011/08/07 - 2:59pm
in response to:
Anthony Ve
Reza Rahman replied on Sun, 2011/08/07 - 10:08pm
in response to:
Anthony Ve
Thanks for the very kind words.
It's nothing specific to Oracle (I probably know better than many others the areas that Oracle really does need to do better).
I try to speak my mind openly on what I think is right (provided I know enough about the topic at hand).
I believe a lot of the anti-Oracle sentiment is rooted in unfamiliarity, ignorance, over-generalized stereotyping and an unfounded distrust/fear (perhaps added with some agenda pushing and good old fashioned jealousy). The difference I guess is that I do regularly interact with good people working inside Oracle that do genuinely mean well.
In case of LibreOffice/OpenOffice and Hudson/Jenkins - my take on it has been that both of those situations are very murky where separating the "good guys" from the "bad guys" isn't really that cut and dry -- as is the Java/Android situation...
Cay Horstmann replied on Sun, 2011/08/07 - 11:49pm
I am not sure that the "everyone is beating on poor Oracle" meme captures the nuances of this issue.
When you look at the release notes at http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#knownissues, one issue does stand out over the usual "grief with Java plugin" or "grief with CJK input" or "grief with weird X11 issues", namely that an optimization bug can sometimes silently deliver the wrong result.
People don't like to get the wrong result, even if the probability is less than them getting struck by lightning. We all know that because it's happened before--remember the Pentium bug?
So, if Oracle had said "Whoa, we just found out this issue, and we'll fix it immediately, but it'll take ten days to re-run all the acceptance tests", nobody would have batted an eye. Or if they had said "Look, we've got this problem, but you've got to ship sometime, and here is how you work around this vexing issue", people would have been ok. Their problem was to say nothing at all. If you say nothing at all, you open the floodgates to "Don't use Java 7 if you use loops", "Java 7 unsafe at any speed", "Sloppy work at Oracle", and all the other hyperbole. That's a lesson well learned for anyone who needs to make a similar decision.
Andrew McVeigh replied on Mon, 2011/08/08 - 11:36am
in response to:
Cay Horstmann
I think you've summed it up very neatly and concisely. Putting the name-calling aside, getting the wrong result silently needs to be handled differently from other bugs which can be clearly/easily detected. I felt the same way about getting my Pentium replaced at the time, regardless of the probability of occurrence (I was doing numerical work at the time).
I also think that it is sad that the tone of the wider debate seems to have degenerated into 2 camps: (a) oracle is bad or (b) oracle is good. Surely this is just a process improvement that needs to happen.
Reza Rahman replied on Mon, 2011/08/08 - 12:58pm
in response to:
Andrew McVeigh
Andrew Binstock replied on Mon, 2011/08/08 - 7:19pm
Andrew McVeigh replied on Tue, 2011/08/09 - 4:38am
in response to:
Andrew Binstock
Agree that usage of the word slut is completely ridiculous (and offensive) in this context.
Steve Mcjones replied on Tue, 2011/08/09 - 10:32am
What a douche. The problem is not that there are bugs, the problem is that Oracle knew about them and decided to ignore them.
They could have:
They have done nothing, because marketing and politics were just more important.
And now they get the deserved beating. Nothing wrong with it!
Reza Rahman replied on Tue, 2011/08/09 - 12:36pm
in response to:
Steve Mcjones
Take a look here: http://weblogs.java.net/blog/fabriziogiudici/archive/2011/08/02/worried-about-java-7-go-hudson-or-jeskins. One of the bugs was filed as low-priority and hence presumably worth the risk since it happens very infrequently. The other two were discovered just a few days before the release. All are assigned high priority now and are getting fixed. The only real question here is whether the most sound judgement call was made under the circumstances.
I do hope the OpenJDK team realizes things have gone too far for them not to present their side of the story themselves soon...