rc3.org

Strong opinions, weakly held


The design of the index for Facebook’s Graph Search

Under the Hood: Building out the infrastructure for Graph Search

Really interesting post on the design of the search index used by Facebook’s Graph Search feature. How challenging was it to build? Facebook started the project in 2009 and migrated all their other search systems to it before building Graph Search. All three of the engineers listed as working on the project were at Google prior to working at Facebook.

How hard is it to build a lyrics site?

Song lyrics sites are universally terrible: the markup is bad, there are ads everywhere, and the usability generally sucks. Why? I understand that running such a site exposes you to legal risk, and I’ve always assumed that the outlaw foundation of such sites explains their awfulness. Even so, I wonder how difficult it would be to make a better attempt.

Any high quality site in this vein has to have some form of revenue, because it will attract a lot of traffic. The secret is to spend as little as possible on infrastructure. The other day, the NPR News Apps Blog had a post about building a high capacity, low cost site. That seems like a good starting point.

The other requirement is a big catalog of songs, organized by artist and album, and then the lyrics for all of them. I think building such a database with absolutely minimal human intervention would be fun.

Right now I’m just kicking this idea around. If I start working on it, I’ll post about my progress.

How developers use API documentation

Chris Parnin writes about problems with API documentation, as evidenced by developer migration to Stack Overflow. The whole thing is incredibly interesting, and points to a need for a major reconsideration of what makes for good documentation.

Why I don’t talk about learning curves

I don’t want to pick on this person, so I won’t use their name, but I saw this in a blog post today:

The most common complaint people have when learning Haskell is the steep learning curve.

It’s a very typical example of a mistake I see all the time, which is that when people say something has a steep learning curve, they mean that it’s difficult to learn. It’s understandable why people would think that way — steep things are difficult to climb.

However, the X axis on the plot of a learning curve is the resources invested, and the Y axis represents the level of mastery attained. You can look it up. So a steep curve means that initial progress in learning is very rapid. The fuller definition of a steep learning curve is that initial progress is rapid but that the curve plateaus and progress becomes difficult.
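To make the axes concrete, here’s a toy model of my own (not from any standard text): treat mastery as a saturating function of invested resources, 1 − e^(−kx). A large k gives a genuinely steep curve in the correct sense: rapid early gains that quickly plateau.

```python
import math

def mastery(x, k):
    # Toy learning-curve model (invented for illustration): mastery
    # attained after investing x units of effort, with "steepness" k.
    # Starts at 0 and saturates toward 1.
    return 1 - math.exp(-k * x)

# A steep curve (large k): most of the mastery arrives early, then
# progress plateaus -- the correct sense of "steep learning curve".
steep = [mastery(x, 3.0) for x in (0.25, 0.5, 1.0, 2.0)]

# A shallow curve (small k): slow, even progress instead.
shallow = [mastery(x, 0.5) for x in (0.25, 0.5, 1.0, 2.0)]
```

With k = 3, more than half of the eventual mastery shows up in the first quarter-unit of effort; with k = 0.5, the same effort yields roughly 12 percent.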

Unfortunately, the rampant misuse of “steep learning curve” means that if I use it correctly, nobody will actually get what I’m talking about. If I use it incorrectly, then I’m part of the problem. The end result has been to discourage discussion of learning curves using that terminology at all. Nobody seems to mind.

Don’t get stuck

At Etsy, our engineering team is well known for practicing continuous deployment. For all of the talk in the industry about continuous deployment, I don’t think that its impact on personal productivity is fully understood.

If you don’t work in a shop that does continuous deployment, you may assume that its core is that releases aren’t really planned: code is pushed as soon as it’s finished, not according to some schedule. That’s true, but there’s a much deeper truth at work. The real secret to the success of continuous deployment is that code is pushed before it’s done.

When people are practicing continuous deployment the Etsy way, they start out by adding a config flag to the code base behind which they can hide all of the code for their feature. As soon as the flag has been added, they add some conditional code to the application that creates a space that they can fill with the code for their new feature. At that point, they should be pushing code as frequently as is practical.
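A minimal sketch of that pattern, with a made-up flag name and a plain dictionary standing in for Etsy’s actual config system:

```python
# Hypothetical config: flags default to off in production and can be
# flipped per-environment without a deploy.
FEATURE_FLAGS = {
    "new_checkout_flow": False,
}

def feature_enabled(name):
    return FEATURE_FLAGS.get(name, False)

def legacy_checkout(cart):
    return sum(item["price"] for item in cart)

def new_checkout(cart):
    # Work in progress: pushed in small chunks, reviewed and tested,
    # but dark in production until the flag is flipped.
    raise NotImplementedError

def checkout(cart):
    # The conditional creates the space the new feature's code fills.
    # Half-finished code ships safely because it runs for no one.
    if feature_enabled("new_checkout_flow"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

The flag check is the entire integration point: everything behind it can land in tiny, reviewable pushes while the live code path stays untouched.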

This is the core principle of continuous deployment. It doesn’t matter if the feature doesn’t work at all or there’s nothing really to show, you should be pushing code in small, digestible chunks. Ideally, you’ve written tests that are then part of the continuous integration suite, and you’re having people review that code before it goes out. So even though you don’t have a working feature, you’re confident the code you’re producing is robust because other people have looked at it, and it’s being tested every time anyone deploys or runs the suite of automated tests. You’re also reducing the chances of having to spend hours working through a painful merge scenario.

Many engineers are not prepared to work this way. There’s a strong urge to hold onto your code until you’ve made significant progress. On many teams, working on a feature for a week or two to build something real before you push it is completely normal. At Etsy, we see that as a risky thing to do. At the end of those two weeks you’re pushing a sizable chunk of code that has never been tested and has never run on the production servers out into the world. That chunk of code very well may be too big for another engineer to review carefully in a reasonable amount of time. It should have been broken up.

Pushing code frequently is the main factor that mitigates the risk of abandoning the traditional software release cycle. If you deploy continuously but the developers all treat the project like they’re developing in a more traditional fashion, it won’t work.

That’s the systems-based argument for pushing code at a rate that tends to make people uncomfortable, but what I want to talk about is how taking this approach improves personal productivity. I’m convinced that one thing that separates great developers from good developers is that great developers don’t allow themselves to get stuck. And if they do get stuck, they get stuck on design issues, and not on problem solving or debugging.

Thinking in terms of what code you can write that you can push immediately is one way to help keep from getting stuck. In fact, a mental exercise I use frequently when I’m blocked on solving a problem is to try to come up with the smallest thing I can do that represents progress. Focusing on deploying code frequently helps me stay in that mindset.

Banks really are evil

Here’s a selection of stories involving banks that appeared just this weekend in the New York Times:

Major Banks Aid in Payday Loans Banned by States

For the banks, it can be a lucrative partnership. At first blush, processing automatic withdrawals hardly seems like a source of profit. But many customers are already on shaky financial footing. The withdrawals often set off a cascade of fees from problems like overdrafts. Roughly 27 percent of payday loan borrowers say that the loans caused them to overdraw their accounts, according to a report released this month by the Pew Charitable Trusts. That fee income is coveted, given that financial regulations limiting fees on debit and credit cards have cost banks billions of dollars.

Ahead of Election in Cyprus, Gloom and Voter Apathy Tied to Financial Woes

What many Cypriots find most frustrating is that their crisis, like those in Ireland and Iceland before them, was concentrated in the banks. There is no sovereign debt crisis and, before the banking collapse, their economy was relatively healthy. Why, they wonder, should they suffer for the misdeeds of a few bankers? Why cover losses that should be borne, at least in part, by private investors?

Patron of Siena Stumbles

There is little question, though, that JPMorgan helped enable an acquisition that was regarded as foolish by many people and that severely weakened Monte dei Paschi. Later, Deutsche Bank of Germany and Nomura of Japan undertook transactions with previous management at Monte dei Paschi that helped it conceal losses of 730 million euros, raising further questions about the conduct of investment banks.

It’s like this pretty much every week.

What to do with data scientists

I’ve been thinking a lot lately about where data scientists should reside on an engineering team. You can often find them on the analytics team or on a dedicated data science team, but I think that the best place for them to be is working as closely with product teams as possible, especially if they have software engineering skills.

Data scientists are, to me, essentially engineers with additional tools in the toolbox. When engineers work on a problem, they come up with engineering solutions that non-engineers may not see. They think of ways to make work more efficient with software. Data scientists do the same thing as engineers, but with data and mathematics. For example, they may see an opportunity to use a classifier where a regular software engineer may not. Or they may see a way to apply graph theory to efficiently solve a problem.

This is what the Javier Tordable presentation on Mathematics at Google, which I’ve linked to before, is about. The problem with a dedicated data science team is a lack of exposure to inspiring problems. The best way to enable people to use their specialized skills to solve problems is to let them feel the pain of those problems firsthand. As they say, necessity is the mother of invention.

The risk, of course, is that if a data scientist is on one team, they may not have any exposure at all to problems that they could solve that are faced by other teams. In theory, putting data scientists on their own team and enabling them to consult where they’re most needed enables them to engage with problems where they are most needed, but in practice I think it often keeps them too far from the front lines to be maximally useful.

It makes sense to have data scientists meet up regularly so that they can talk about what they’re doing and share ideas, but I think that most of the time, they’re better off collaborating with members of a product team.

Interviewing is just a model of employment

Etsy has gotten a lot of attention for its efforts to increase the gender diversity of its engineering team in the past year. The short version of the story is that Etsy made a concerted effort to recruit more female engineers, and made some changes to its hiring model that led to positive changes there. For more details, see First Round Capital’s coverage of a talk on how we did it by Kellan Elliott-McCrea, our CTO.

Unsurprisingly, this effort has drawn criticism. The most visible example is a blog post from Meghan Casserly from Forbes, which accuses Etsy of instituting a double standard, undermining female engineers even as it attempts to add more to its staff. Here’s the crux of her post:

In other words, hiring women engineers is hard. Especially if you hire them like men. “Don’t lower standards,” Elliott-McCrea says, but isn’t exempting women from the same brutal challenge-based interviews their male colleagues undergo doing just that? While I applaud Etsy for its single-minded dedication to increasing gender diversity in its ranks, instead of feeling uplifted by Elliott-McCrea’s presentation I find myself stuck on the question: Is hiring women as women just PC pandering?

There’s a lot that’s wrong with this blog post, starting with the assertion that Etsy has exempted women from anything. For example, “brutal challenge-based interviews” were never a standard part of the interviewing process at Etsy.

The post serves to illustrate a larger point that I want to make. The reason companies interview engineers at all is that they need to assess what kind of team member a candidate will be. Can they write code? Can they deliver results in a timely fashion? Will they drive everyone nuts? Are they capable of learning? Interviewing is one way to get answers to those kinds of questions.

The hiring process is about creating a model of a potential employee that the employer hopes accurately represents what kind of employee they will be. These models are not terribly accurate, and there is a huge amount of space within which a company can experiment in order to refine that model.

Nobody has convinced me that stressful “challenge” style interviews accurately model the work of software developers. People who do well at them are not necessarily more qualified than people who do poorly at them. Answering interview questions is itself a skill, and being good at it doesn’t mean you’ll necessarily be good at the job. Etsy is iterating on how it builds a model of software engineers through the hiring process. Every company should be.

Fetishizing interviewing, or a specific style of interviewing, betrays the same sort of lazy thinking that I wrote about the other day. I’ve seen great engineers turned down for jobs due solely to the fact that the entire panel of interviewers all took the same approach. The model was broken, but the interviewers thought that the problem was that the candidate wasn’t up to the challenge. The Forbes blog post falls prey to the same problem.

Lauren Bacon makes a related point in her response to the same Forbes blog post.

The wrong way to put up a maintenance page

Everybody with a Web site or service occasionally has to put up a maintenance page. Maybe you’re doing database maintenance that requires downtime, maybe you had a hardware failure, maybe Amazon Web Services is having some kind of outage.

You can set up your application so that the maintenance page is displayed instead of what customers expected to see, you can have a status page that shows whether the service is running, or you can redirect users to a permanent URL where the maintenance page always lives. That last approach is the wrong one. Here’s why.

Let’s say your service has a scheduled maintenance window that starts at midnight and ends at four in the morning. Your customers are most likely going to have to put up a maintenance page of their own and turn off the code that connects to your service while it’s down. Somebody is going to have to set their alarm, make sure your service is back up, and then reenable the integration, update their own maintenance page, and so forth.

When said person gets up early in the morning, there’s a good chance they’ll go to the browser tab that’s displaying the state of your service and hit reload to see whether it’s back. If you have redirected them to a maintenance page that’s always around, they’re going to see that your site is down for maintenance even if it’s back up. At worst, they’ll assume your service is still down and not turn things back on at their end. At best, they’ll still have to finish waking up and actually investigate further.

Sure, putting a static page up somewhere and redirecting to it is easy, but it’s lazy and wrong. Don’t do it.

This message is brought to you by someone who had to set their alarm on a Sunday morning so that they could verify that a third party vendor had successfully completed scheduled maintenance and got confused after reloading the maintenance page in their browser.
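For what it’s worth, the better approach is easy too. Here’s a tiny WSGI sketch (the flag-file path and markup are invented) that serves the maintenance page in place with a 503 status instead of redirecting, so reloading the original URL always reflects the service’s real state:

```python
import os

MAINTENANCE_FLAG = "/tmp/maintenance.flag"  # hypothetical flag file
MAINTENANCE_BODY = b"<h1>Down for maintenance</h1>"

def app(environ, start_response):
    # Serve the maintenance page *at the requested URL* with a 503,
    # rather than redirecting to a permanent maintenance URL. When
    # the flag goes away, reloading the same URL shows the live site.
    if os.path.exists(MAINTENANCE_FLAG):
        start_response("503 Service Unavailable",
                       [("Content-Type", "text/html"),
                        ("Retry-After", "3600")])
        return [MAINTENANCE_BODY]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<h1>Welcome back</h1>"]
```

The 503 status also tells monitoring tools and crawlers that the outage is temporary, which a 302 to a static page does not.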

Tim Bray wishes XML a happy 15th birthday:

When XML was invented, it was the world’s only useful cross-platform cross-language cross-character-set cross-database data format. Where by “useful” I mean, “came with a pretty good suite of free open-source tools to do the basic things you needed.”

That’s why it ended up being used for all sorts of wildly-inappropriate things.

