Hash-Bang URLs and overuse of AJAX

Scott Gilbertson looks into the use of JavaScript in Gawker’s redesign:

The problem with Gawker’s redesign is that it uses JavaScript to load everything. That means that, not only is there no chance for the site to degrade gracefully in browsers that don’t have JavaScript enabled, the smallest JavaScript typo can crash the entire website.

I really don’t get this new trend (as Gilbertson notes, Gawker is following Twitter’s lead) toward sites that require JavaScript to load all the content on the pages. If a developer suggested such a scheme to me, I would be more inclined to fire them than to take their suggestion.

People were, at one time, hesitant to use AJAX to load all the content on their pages, because it made it difficult for search engines to index the content on those pages. Google offered a solution to that by way of hash-bang URLs. And now, because Google allows it, it seems like developers are rushing headlong to adopt what sure looks to me like an anti-pattern.
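
For readers who have not seen the scheme, here is a minimal sketch of the rewrite a crawler performs under Google's AJAX crawling proposal, as I understand it: everything after `#!` is moved into an `_escaped_fragment_` query parameter so the server can actually see it. The function names are mine, and the escaping is an approximation that only handles ASCII input.

```typescript
// Sketch only: function names are hypothetical, not part of any library.
const HASH_BANG = "#!";
const CRAWLER_PARAM = "_escaped_fragment_";

// Percent-escape the characters that would be ambiguous inside a query
// string: control characters, space, '#', '%', '&', '+', and DEL.
// Assumption: input is ASCII; non-ASCII characters are passed through.
function escapeFragment(fragment: string): string {
  return fragment.replace(/[\x00-\x20#%&+\x7f]/g, function (ch) {
    const hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "%" + (hex.length < 2 ? "0" + hex : hex);
  });
}

// Map the URL a person sees to the URL a crawler would request.
function toCrawlerUrl(prettyUrl: string): string {
  const bang = prettyUrl.indexOf(HASH_BANG);
  if (bang === -1) {
    return prettyUrl; // no hash-bang, nothing to rewrite
  }
  const base = prettyUrl.slice(0, bang);
  const fragment = prettyUrl.slice(bang + HASH_BANG.length);
  // Append to an existing query string if there is one.
  const separator = base.indexOf("?") === -1 ? "?" : "&";
  return base + separator + CRAWLER_PARAM + "=" + escapeFragment(fragment);
}

console.log(toCrawlerUrl("http://twitter.com/#!/username"));
```

The point of the exercise is that the server then has to answer that second URL with a full HTML snapshot, which means the site ends up needing server-side rendering anyway.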

I love AJAX and I think there’s a place for content that is loaded through AJAX but that should still be indexed by search engines. But generally speaking, if content should be indexed, then it should live at a static URL and be loaded through normal HTML rather than being loaded onto the page via AJAX.

Gilbertson’s source for his post is this lengthy explanation of why things went wrong for Gawker by Mike Davies.

What I wasn’t able to find is an argument in favor of building Web sites in this fashion — that is to say, loading everything via AJAX. Anyone have a pointer or want to make the case?

12 thoughts on “Hash-Bang URLs and overuse of AJAX”

  1. My only arguments for building sites by loading everything with AJAX revolve around content protection: It requires a sophisticated enough browsing environment that it’s no longer simple to spider the site, to use wget or “lynx -dump” or what-have-you to retrieve the data at that URL.

    I first ran into this when I had an RSS reader crash that blew away the OPML that described my subscriptions, but left me with subdirectory names of the feeds. My first reaction was to automate a script to find the first Google result for each of those names, but, of course, Google’s pages are now spider-resistant enough that I ended up having to find all those URLs again manually.

    However, this also lets the server offer up a version of the page that’s different from what the user sees, which seems to me like an invitation to search engine gaming.

    So I guess I got nothin’ too.

    And, yeah, I hate the everything-in-AJAX bits. My browser’s never sure when the page has fully loaded, and I’m constantly getting partial pages or just framed pages on Twitter and Facebook (especially on the iPad). It just seems completely broken.

  2. FWIW, I led the design and implementation of an “extreme Ajax” architecture and framework to support an IBM product family (Rational Jazz). In hindsight, I wish I had chosen more of a hybrid approach.

    Why did we go the Extreme Ajax route? First of all, we started in 2006 so didn’t have all of these blog entries to warn us. :-) But basically because we could, liked the user experience, and didn’t understand some of the consequences until much later. That being said, it’s still viewed as a successful architecture and there are some good things to say about it.

    I am planning to write a long blog entry on this at some point.

  3. I can see getting to a JavaScript loading environment if you intend the site to be an AJAX-ey experience, then need to cut costs and sacrifice the HTML-generating server-side code. You’d generally have the client-side code to expand your content into the DOM already written for the AJAX experience.

    Inject politics and there are more ways to get there, for instance if “server code” needs to go through hoops at the “department of indeterminate approval”. You’d want to just build a simple API and do the rest in the clients.

  4. There are two reasons I’ve seen:

    1. Extreme caching: if you have a complex page with a mix of staleness thresholds, AJAX-loading small fragments which are personalized or frequently updated can be a fair win. I’m not sure this makes that much sense any more unless you can completely ignore spiders and anything else which doesn’t support JavaScript, since it’s not hard to get CDN capacity these days and something as simple as cache-level SSI (e.g. Varnish ESI) can avoid expensive backend hits while still giving each client custom HTML.

    2. Maintaining state in old browsers: I’d bet this is why Twitter did it. It allows non-WebKit browsers to use AJAX to load everything and update the URL so the browser’s location bar has something which can be bookmarked, shared, etc. Once HTML5 pushState is widely deployed (Firefox 4, maybe IE9 but there’s no sign of it yet), this will be completely gratuitous. I’d have preferred that they either simply fall back for older browsers or rely on in-page permalinks rather than breaking all of their old links.

  5. I believe that Twitter did it to help avoid the whale of fail, by decoupling a page into parts that were then loaded via AJAX. Maybe they figured they needed to do this while they addressed the underlying architectural problems that caused so many whales in the first place.

  6. “I really don’t get this new trend (as Gilbertson notes, Gawker is following Twitter’s lead) toward sites that require JavaScript to load all the content on the pages. If a developer suggested such a scheme to me, I would be more inclined to fire them than to take their suggestion.”

    For a consumer-facing content app I agree. However, we had an internal web app that was almost all JavaScript and JSON. In fact I would call it more “an app that happened to be written in JavaScript” than a “web app”. Almost no HTML was generated on the server side. Part of the reasoning was that both the web app and multiple other automation processes (possibly in other departments) needed to interact with the data from the server. By only exposing it via JSON, we only had to provide one interface API on the server side and everyone could use it.

  7. I wrote something similar as well for the same reason not long ago. It was an internal app, and the HTML was the user interface for an application, not a medium for publishing content. I think that’s the big disconnect. If you’re talking about UI widgets, using AJAX is fine. If you’re talking about Web publishing, it’s probably really bad architecture.

  8. I’m guessing in this day and age it is easier to find people who can cobble together JavaScript than people who can work on backend technologies. So long PHP!

    Joking aside, using even the simplest piece of AJAX makes it exceptionally difficult to perform automated testing. As of yet I still haven’t found a suite I am happy with.

  9. For the same reason Gmail and Google Maps and Flickr did it in the first place: because it allows for a responsive user experience when only small parts of the UI need to change.

    I’m no fan of the big heavy all-AJAX sites that are actually slower to re-render than page reloads would be, and doing everything in AJAX has big problems with memory leaks and sloppy programming (kind of the way long-running GUI apps have more problems than command-line programs that run and exit). But there are certainly legitimate uses.

    I favor a hybrid approach, which is to say that you do page reloads on “change of major resource”, that is, when the most important thing on the page changes – the thing the URL really refers to – but not for peripheral stuff where you still want history. When you don’t want history for a client state change you don’t change the URL at all.

  10. Oh, the other thing you can do is make your links real links to real URLs (<a href="/foo">) but then put an event listener in that intercepts that call and replaces it with a state change to #!/foo and a client-side partial page update. But crawlers, and non-JS browsers, will find the static link in the href and use that. Doesn’t help when people cut and paste URLs after using client-side navigation though.

  11. I don’t really see the objection to AJAX, JSON and making content “dynamic”. I remember (a long time back) people were sceptical about why you would need a database behind a site. People were sceptical about sharing their data with other websites, and then RSS came along and changed that for many. The word “dynamic” eventually gained ground, people eventually started using credit cards and came to believe that, despite some security exceptions, it’s largely safe “enough”. We are using the web, and it’s better for it.

    We come from an internet where an animated 48×48 GIF was “cool” because it moved, to a place where Twitter’s live data can be searched in real time. Now that’s cool.

    The term “dynamic”, though, has been misused. What usually gets called dynamic should be called database-driven or generated content.

    Dynamic means “reactive” and “progressive”. What is progressive about the server reading a DB and spitting out some data?

    I think the web must become more dynamic, real-time, reactive and progressive. If a search engine cannot keep up, replace the search engine.

    How many “apps” on phones, browser plugins and data sets are there in the world that search engines can’t read and do not have in their databases? I would say more data is outside search engines than in them. (I can’t prove that, but I think it’s a logical assumption.)

    How much of the electricity and bandwidth of the world’s networks is consumed by crawlers and their activity? How many pings are there asking “Is your data different?”, “Is your data different?”, “No? I will come back”, and so on? The internet is inefficient in so many ways.

    What about repeated data: the header, logos and menus on a site sent again and again (even if there is a cache), with the performance and bandwidth issues that brings?

    The future is to load content when it changes; don’t reload the header and footer if you don’t need to.

    It makes sense; crawlers and browsers must keep up.

    Why save the search engine? I mean that in this sense: a search engine spits out a results page with the meta description or some content from the site. Three lines of text represent the entire page. Some keywords and a few links in and out determine the life and times of a website? I would prefer a website without the hundreds of extra links to nothing, added just for SEO. Fat footers are really a pattern to duplicate links and push more SEO. Sites are full of links and rubbish just to satisfy our need to be on page 1 rather than page 2. I think that aspect is more ridiculous and has caused more harm to the web (think spyware, clickbots, link farm sites, ad sites, unscrupulous backlinks, cloaking, and irrelevant wording on websites written in “senglish”, that is, SEO English).

    The hash-bang needs to be an intermediate step toward a real solution that understands live, real-time data. Let it happen.

  12. I agree with you here. When Twitter changed over to hash-bang URLs, I thought it made it look more arcane and unprofessional. There’s something clean and elegant about twitter.com/username; having it be twitter.com/#!/username just seems unnecessary and weird.

    I also do not like Twitter’s behavior since switching to the more heavily AJAX-reliant setup. It seems to be just as buggy as before, and I don’t notice any improvement in speed. And I find the unintuitive behavior of right-clicking and copy-and-pasting URLs to be extremely inconvenient. I make heavy use of right-click and copying URLs for links, and with AJAX, especially if the developers didn’t actively think of this sort of functionality, it often doesn’t work at all.
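
The technique described in comment 10 (real links that a script intercepts and turns into hash-bang state changes) can be sketched roughly as follows. This is only an illustration under stated assumptions: all names are hypothetical, the `ClickLike` and `LocationLike` interfaces stand in for the browser's click event and `window.location`, and hash-bang routes are assumed to be rooted at the site root. The last function goes the other way, recovering the static URL from a pasted hash-bang URL, which is the gap that comment points out.

```typescript
// Sketch only: every name here is hypothetical.
const BANG_PREFIX = "#!";

// Minimal stand-ins for the browser's click event and window.location.
interface ClickLike { preventDefault(): void; }
interface LocationLike { hash: string; }

// "/foo" -> "#!/foo"
function toHashBang(path: string): string {
  return BANG_PREFIX + path;
}

// Called from a click listener on <a href="/foo">. Crawlers and non-JS
// browsers never run this and simply follow the real href.
function interceptClick(
  event: ClickLike,
  href: string,
  location: LocationLike,
  render: (path: string) => void
): void {
  event.preventDefault();            // cancel the full page load
  location.hash = toHashBang(href);  // address bar now shows #!/foo
  render(href);                      // client-side partial page update
}

// "http://twitter.com/#!/username" -> "http://twitter.com/username"
// Assumes the path after "#!" is relative to the site root.
function toStaticUrl(url: string): string {
  const bang = url.indexOf(BANG_PREFIX);
  if (bang === -1) {
    return url; // already a static URL
  }
  const origin = /^[a-z]+:\/\/[^\/?#]+/i.exec(url);
  const path = url.slice(bang + BANG_PREFIX.length);
  return (origin ? origin[0] : "") + path;
}

console.log(toStaticUrl("http://twitter.com/#!/username"));
```

Because browsers never send the fragment to the server, a conversion like `toStaticUrl` can only run in the client. That is why hash-bang links keep depending on JavaScript even after the static URLs exist.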
