rc3.org

Strong opinions, weakly held

Tag: scalability

The argument for the Java ecosystem

There’s a lot to hate about Cade Metz’s article on server-side Java from last month’s Wired magazine. For one thing, Java has been more successful on the server than on the client for at least 15 years. The fact that you can use it to build large-scale Web applications is not news, nor is the fact that the JVM is a great host for languages other than Java.

That said, the tradeoffs that go into choosing a software development platform are interesting. The Wired article is mostly concerned with scaling at levels that are unrealistic for nearly anyone to plan for. And indeed, platform is only a small part of the scaling discussion. I’m sure that most of the software written for Healthcare.gov was written in Java, and it didn’t solve any of the problems that are endemic to that sort of project (about which more some other time).

In the early stages, any Web company should focus almost solely on developer productivity. The hard part is building software people want to use, and iterating rapidly on your ideas. (I think everybody already knows this.) You’ll probably have rewritten everything by the time you even start to approach scaling issues of the kind face by companies like Pinterest or Twitter, to say nothing of the really big companies like Facebook and Google. Just choose whatever helps you build software as efficiently as possible.

If it were up to me, though, I’d build software using a JVM-based language, for reasons that the Wired article doesn’t really get into. The fact that Java and the JVM are heavily used at big companies like Google and a lot of others that are significantly more boring means that the platform will continue to grow and evolve. In that sense, it’s an incredibly safe choice.

Java also has an incredibly robust open source ecosystem, and mature libraries for almost everything. Non-Java languages that run on the JVM can make use of these libraries as well. One of the first things Java developers notice when they move to non-Java platforms is that the key libraries that you take for granted in the Java ecosystem tend to be immature or completely missing.

Another key advantage is that the world of Big Data is built to a huge degree on Java. Hadoop is written in Java, as are most Hadoop jobs. Storm, Hadoop’s realtime processing cousin, is also written in Java. One key advantage here is that if you’re already writing your software in Java or some other JVM-based language, you can transfer those skills to start writing Hadoop jobs. More importantly, you can share library code between your standard software and your Big Data jobs. This can be a huge advantage. Again, in the early stages of a company, your “Big Data” can easily be processed using R or Python and you won’t need Hadoop at all, but it’s likely you will eventually.

Finally, all businesses eventually have to integrate with “enterprise” software to some degree. Maybe you’re experimenting with the Community Edition of Vertica for data warehousing, or your HR department has purchased some special software for managing benefits. The bottom line here is that every enterprise software package integrates Java (and Microsoft) first. In many cases, those are the only options for integration. If you’re not on one of those platforms, you’re stuck with awful options like the Unix ODBC implementations. This isn’t reason enough to choose Java on its own, but it’s something you get for free.

Every company of significant size is eventually going to find its way to supporting some JVM-based software. Maybe they’re running Hadoop, or they use Solr for search, or Cassandra as a data store. Why resist?

There are a lot of other advantages to choosing the JVM. There are lots of developers out there who know Java. The Java development tools are very mature. Write once/run anywhere was always overstated, but it’s still very easy to write your code on OS X or Windows, and then deploy and run it on Unix servers without any tweaks or even making a new build. Because Java is so heavily used by big companies, and API stability over time is valued, so you don’t often have to do much work to upgrade to newer versions of the JVM when they arrive. Indeed, I’ve worked on projects where my development JVM was different from the production JVM and never ran into problems.

There are great arguments to be made for all of the other popular platforms. I’ve built software using PHP, Ruby on Rails, Python and Django, and plenty of other languages and libraries, and they all have real strengths. However, I think that the advantages of betting on the JVM outweigh the alternatives in most cases.

How Twitter uses Hadoop and Pig

I confess that I have never fully immersed myself into the world of Hadoop and map/reduce, but I am very interested in them. Tony Bain does a good job of explaining how Twitter is using Hadoop (and Pig, a scripting language for Hadoop) for analytics. It’s a great explanation of a practical business application for map/reduce for those of us who are still trying to get it.

Links from March 30th

Links for April 9

Links for April 8

© 2024 rc3.org

Theme by Anders NorenUp ↑