If you develop Web applications, you should definitely check out this presentation from Randy Shoup and Dan Pritchett on how eBay scales, not because you should do things the eBay way, but rather because it is a great illustration of how there’s an exception to every rule. When you deal with the traffic and transaction volume that eBay does, the rules that we all play by change radically.
For most applications, performance problems arise when your Web pages execute too many SQL statements, and the solution is to index your database properly and use joins wherever possible rather than running additional queries. For example, one common misuse of object-relational mapping libraries results in the “n + 1 selects” problem. When a page displays many instances of an object, the library runs one query to fetch the list and then a separate query for each instance’s dependent objects, so a page showing n objects issues n + 1 queries. That’s why any decent ORM library has a way to include dependent objects in the original query. While most developers spend lots of time trying to retrieve more data using fewer queries, eBay eschews joins. I’d assume they compensate with a lot of caching, but I’d love to hear more of the details.
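To make the “n + 1 selects” problem concrete, here is a minimal sketch using Python’s standard-library sqlite3 module. The authors/posts schema, the CountingDB wrapper, and the function names are hypothetical, chosen only for illustration. A real ORM would generate the joined query for you through its own eager-loading option.

```python
import sqlite3


def make_db():
    """Hypothetical two-table schema: each post belongs to one author."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                            author_id INTEGER NOT NULL REFERENCES authors(id));
        -- Index the foreign key so lookups by author stay cheap.
        CREATE INDEX posts_author_id ON posts(author_id);
        INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO posts VALUES (1, 'first', 1), (2, 'second', 2), (3, 'third', 1);
    """)
    return conn


class CountingDB:
    """Wraps a connection and counts how many queries a page issues."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def load_page_n_plus_1(db):
    # One query for the list of posts...
    posts = db.query("SELECT id, title, author_id FROM posts ORDER BY id")
    page = []
    for _post_id, title, author_id in posts:
        # ...plus one more query per post to fetch its author: n + 1 in total.
        (name,) = db.query("SELECT name FROM authors WHERE id = ?", (author_id,))[0]
        page.append((title, name))
    return page


def load_page_with_join(db):
    # A single joined query returns each post together with its author.
    return db.query(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )


if __name__ == "__main__":
    slow_db, fast_db = CountingDB(make_db()), CountingDB(make_db())
    # Both strategies produce the same page content.
    assert load_page_n_plus_1(slow_db) == load_page_with_join(fast_db)
    print(slow_db.queries, fast_db.queries)  # prints "4 1"
```

With three posts the naive loader issues four queries while the join issues one; the gap grows linearly with the number of objects on the page.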
eBay runs almost everything with auto-commit turned on, meaning each SQL statement is committed as soon as it executes rather than being grouped with other statements into a transaction. Many Web developers won’t find this surprising, but anybody who builds transaction processing systems will probably be thrown by it. Perhaps even more astounding, eBay doesn’t compensate by rolling its own transactions in the application layer. They just work around it. I’d love to see a whole presentation on how they work that out as well.
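The presentation’s workarounds aren’t described here, so the following sketch does not show how eBay copes; it only illustrates what auto-commit gives up. It uses Python’s sqlite3 module, where `isolation_level=None` means each statement commits immediately unless you issue `BEGIN` yourself. The accounts table and the `transfer` function are hypothetical, and the credit is deliberately applied before the debit so that a failed debit leaves a visible partial write.

```python
import sqlite3


def make_accounts():
    # isolation_level=None puts Python's sqlite3 module in auto-commit mode:
    # every statement is committed as soon as it runs unless we issue BEGIN.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 50), (2, 0)")
    return conn


def transfer(conn, src, dst, amount, use_transaction):
    """Move `amount` from src to dst; returns False if the debit is rejected."""
    if use_transaction:
        conn.execute("BEGIN")
    try:
        # Credit first: under auto-commit this is durable the moment it runs.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        # Violates the CHECK constraint if src lacks the funds.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
    except sqlite3.IntegrityError:
        if use_transaction:
            conn.execute("ROLLBACK")  # undoes the credit as well
        return False
    if use_transaction:
        conn.execute("COMMIT")
    return True


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())


if __name__ == "__main__":
    for use_tx in (False, True):
        conn = make_accounts()
        transfer(conn, src=1, dst=2, amount=80, use_transaction=use_tx)
        # auto-commit: {1: 50, 2: 80} (money appeared from nowhere)
        # transaction: {1: 50, 2: 0}  (both updates rolled back)
        print(use_tx, balances(conn))
```

Without a transaction, the application itself has to prevent or repair that kind of half-finished write, for example by ordering operations carefully or reconciling later.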
One takeaway from the presentation is that object-relational mapping is becoming a requirement for any serious Web application. eBay’s ORM layer is what enables many of the crazy database tricks they use to scale: it isolates the SQL inside that layer and lets the rest of the application deal only with objects. Once data has left the ORM layer, it doesn’t matter whether it was fetched using joins or the object graph was assembled from data cached at the application layer.
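Here is a minimal sketch of that idea in Python, assuming a hypothetical posts/authors schema. Two interchangeable loaders return identical `Post` objects: one builds the graph with a join, the other avoids joins by running single-table queries and caching authors in the application. The class and function names are illustrative only and are not eBay’s actual design.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    id: int
    name: str


@dataclass(frozen=True)
class Post:
    id: int
    title: str
    author: Author


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                            author_id INTEGER NOT NULL REFERENCES authors(id));
        INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO posts VALUES (1, 'first', 1), (2, 'second', 2), (3, 'third', 1);
    """)
    return conn


class JoinLoader:
    """Builds the object graph with a single SQL join."""

    def __init__(self, conn):
        self.conn = conn

    def posts(self):
        rows = self.conn.execute(
            "SELECT p.id, p.title, a.id, a.name FROM posts p "
            "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
        )
        return [Post(pid, title, Author(aid, name)) for pid, title, aid, name in rows]


class CachedLoader:
    """Avoids joins: single-table queries plus an application-level author cache."""

    def __init__(self, conn):
        self.conn = conn
        self.author_cache = {}

    def _author(self, author_id):
        if author_id not in self.author_cache:
            (name,) = self.conn.execute(
                "SELECT name FROM authors WHERE id = ?", (author_id,)
            ).fetchone()
            self.author_cache[author_id] = Author(author_id, name)
        return self.author_cache[author_id]

    def posts(self):
        rows = self.conn.execute("SELECT id, title, author_id FROM posts ORDER BY id").fetchall()
        return [Post(pid, title, self._author(aid)) for pid, title, aid in rows]


def render(loader):
    # Application code sees only objects; it cannot tell which loader ran.
    return [f"{p.title} by {p.author.name}" for p in loader.posts()]
```

Swapping one loader for the other requires no change to `render`. The cache here never expires or invalidates entries, which a real system would have to handle.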
Even if you don’t use an ORM, the Data Access Object pattern is worth a look. Keeping SQL in its own layer is a rule few people ought to be breaking these days.
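A Data Access Object can be as small as an interface plus one class that owns all the SQL for an entity. The sketch below is a hypothetical Python example: `User`, `UserDao`, `SqliteUserDao`, and `change_email` are illustrative names. The in-memory implementation shows a side benefit of the pattern, that business logic can run against a storage stand-in with no SQL at all.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    email: str


class UserDao(ABC):
    """The only thing the rest of the application knows about user storage."""

    @abstractmethod
    def find_by_id(self, user_id: int) -> Optional[User]: ...

    @abstractmethod
    def save(self, user: User) -> None: ...


class SqliteUserDao(UserDao):
    """All user-related SQL lives here and nowhere else."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )

    def find_by_id(self, user_id):
        row = self.conn.execute("SELECT id, email FROM users WHERE id = ?", (user_id,)).fetchone()
        return User(*row) if row else None

    def save(self, user):
        self.conn.execute(
            "INSERT OR REPLACE INTO users (id, email) VALUES (?, ?)", (user.id, user.email)
        )
        self.conn.commit()


class InMemoryUserDao(UserDao):
    """A stand-in with no SQL at all, e.g. for unit tests."""

    def __init__(self):
        self.rows = {}

    def find_by_id(self, user_id):
        return self.rows.get(user_id)

    def save(self, user):
        self.rows[user.id] = user


def change_email(dao: UserDao, user_id: int, new_email: str) -> bool:
    # Business logic: no SQL here, only calls through the DAO interface.
    user = dao.find_by_id(user_id)
    if user is None:
        return False
    dao.save(User(user.id, new_email))
    return True
```

Because `change_email` depends only on `UserDao`, the SQL behind it can be rewritten, split across databases, or replaced by a cache without touching the calling code.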
There’s a lot more provocative stuff in the presentation, so check it out.