rc3.org

Strong opinions, weakly held


Monoculture and consensus

In response to my previous post on engineering culture, Bill Higgins asked in the comments:

Recently I worked on a project with multiple sites, and one of our toughest problems was the vast cultural differences between the sites. As a trivial example, one of the sites was militant about test automation and another site barely paid lip service to it.

So it seems like there is some happy medium between “multiple, incompatible cultures” and “monoculture”. I would be interested to hear your thoughts on where cultural homogeneity is helpful and where it is harmful.

It seems to me that the problem here is not diversity but rather failure to reach consensus. One big challenge when it comes to building engineering teams is figuring out which things everyone has to agree on and which things everyone can do their own way. For example, if you’re a Python shop, everyone can use the text editor of their choosing, but everyone has to use either spaces or tabs for indentation; mixing the two is not an option.
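To make the spaces-or-tabs point concrete, here is a minimal sketch of the kind of check a team could run once it has agreed on a convention. The function names are mine, made up for illustration, and this is not how any particular linter works; it just reports which indentation characters show up at the start of lines.

```python
def indentation_styles(source):
    """Return the set of indentation styles ("spaces", "tabs") found in source."""
    styles = set()
    for line in source.splitlines():
        # The indent is whatever spaces and tabs precede the first other character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent:
            styles.add("spaces")
        if "\t" in indent:
            styles.add("tabs")
    return styles


def mixes_tabs_and_spaces(source):
    """True when a file indents with both tabs and spaces."""
    return len(indentation_styles(source)) > 1
```

The check itself is trivial. The hard part is the consensus: the team has to agree which answer is the right one and that everybody runs the check.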

Similarly, test driven development only works if everyone on the team practices it. If you’re not writing tests, it’s really easy to write code that is nearly impossible to test. Similarly, if there’s no continuous integration infrastructure, the people who aren’t writing tests will regularly check in code that breaks the existing tests. Teams have to reach some kind of consensus about these issues with team-wide implications.

To me this is a separate issue from avoiding monoculture. Teams have to reach consensus in order to function well. I read someone who said that in interviews, they look for people who are willing to “disagree and commit” when they can’t reach consensus. Oddly, I find that these situations arise as often on teams with no demographic diversity (read: composed entirely of white guys) as they do on more diverse teams.

Seven signs of dysfunctional engineering teams

I’ve been listening to the audiobook of Heart of Darkness this week, read by Kenneth Branagh. It’s fantastic. It also reminds me of some jobs I’ve had in the past.

There’s a great passage in which Marlow requires rivets to repair a ship, but finds that none are available. This, in spite of the fact that the camp he left further upriver is drowning in them. That felt familiar. There’s also a famous passage involving a French warship that’s blindly firing its cannons into the jungles of Africa in hopes of hitting a native camp situated within. I’ve had that job as well. Hopefully I can help you avoid getting yourself into those situations.

There are several really good lists of common traits seen in well-functioning engineering organizations. Most recently, there’s Pamela Fox’s list of What to look for in a software engineering culture. More famous, but somewhat dated at this point, is Joel Spolsky’s Joel Test. I want to talk about signs of teams that you should avoid.

This list is partially inspired by Ralph Peters’ Spotting the Losers: Seven Signs of Non-Competitive States. Of course, such a list is useless if you can’t apply it at the crucial point, when you’re interviewing. I’ve tried to include questions to ask and clues to look for that reveal dysfunction that is deeply baked into an engineering culture.

Preference for process over tools. As engineering teams grow, there are many approaches to coordinating people’s work. Most of them are some combination of process and tools. Git is a tool that enables multiple people to work on the same code base efficiently (most of the time). A team may also design a process around Git: avoiding the use of remote branches, only pushing code that’s ready to deploy to the master branch, or requiring people to use local branches for all of their development. Healthy teams generally try to address their scaling problems with tools, not additional process. Processes are hard to turn into habits, hard to teach to new team members, and often evolve too slowly to keep pace with changing circumstances. Ask your interviewers what their release cycle is like. Ask them how many standing meetings they attend. Look at the company’s job listings. Are they hiring a scrum master?

Excessive deference to the leader or, worse, the founder. Does the group rely on one person to make all of the decisions? Are people afraid to change code the founder wrote? Has the company seen a lot of turnover among the engineering leader’s direct reports? Ask your interviewers how often the company’s coding conventions change. Ask them how much code in the code base has never been rewritten. Ask them what the process is for proposing a change to the technology stack. I have a friend who worked at a growing company where nobody was allowed to introduce coding conventions or libraries that the founding VP of Engineering didn’t understand, even though he hardly wrote any code any more.

Unwillingness to confront technical debt. Do you want to walk into a situation where the team struggles to make progress because they’re coding around all of the hacks they haven’t had time to address? Worse, does the team see you as the person who’s going to clean up all of the messes they’ve been leaving behind? You need to find out whether the team cares about building a sustainable code base. Ask the team how they manage their backlog of bugs. Ask them to tell you about something they’d love to automate if they had time. Is it something that any sensible person would have automated years ago? That’s a bad sign.

Not invented this week syndrome. We talk a lot about “not invented here” syndrome and how it affects the competitiveness of companies. I also worry about companies that lurch from one new technology to the next. Teams should make deliberate decisions about their stack, with an eye on the long term. More importantly, any such decisions should be made in a collaborative fashion, with both developer productivity and operability in mind. Finding out about this is easy. Everybody loves to talk about the latest thing they’re working with.

Disinterest in sustaining a Just Culture. What’s Just Culture? This post by my colleague John Allspaw on blameless post mortems describes it pretty well. Maybe you want to work at a company where people get fired on the spot for screwing up, or yelled at when things go wrong, but I don’t. How do you find out whether a company is like that? Ask about recent outages and gauge whether the person you ask is willing to talk about them openly. Do the people you talk to seem ashamed of their mistakes?

Monoculture. Diversity counts. Gender diversity is really important, but it’s not the only kind of diversity that matters. There’s ethnic diversity, there’s age diversity, and there’s simply the matter of people acting differently, or dressing differently. How homogeneous is the group you’ve met? Do they all remind you of you? That’s almost certainly a serious danger sign. You may think it sounds like fun to work with a group of people who you’d happily have as roommates, but monocultures do a great job of masking other types of dysfunction.

Lack of a service-oriented mindset. The biggest professional mistakes I ever made were the result of failing to see that my job was ultimately to serve other people. I was obsessed with building what I thought was great software, and failed to see that what I should have been doing was paying attention to what other people needed from me in order to succeed in their jobs. You can almost never fail when you look for opportunities to be of service and avail yourself of them. Be on the lookout for companies where people get ahead by looking out for themselves. Don’t take those jobs.

There are a lot of ways that a team’s culture can be screwed up, but those are my top seven.

The risks of a dead man’s switch

Bruce Schneier considers Edward Snowden’s dead man’s switch, which will trigger the wide release of his trove of documents if he’s killed:

I would be more worried that someone would kill me in order to get the documents released than I would be that someone would kill me to prevent the documents from being released.

That’s the security mindset.

Defining data engineering

Last year I started working in the world of Big Data, and at the time, I didn’t know that “data science” and “data engineering” were separate things. At some point, I looked at what my team is working on and realized that the distinction between the two is important, and that the team is firmly entrenched in the data engineering camp.

Data scientists get all the glory and attention, but without data engineering, there’s no way for data scientists to practice real science. I’ll talk more about this in another post.

In this first post, I want to talk about the four basic layers of the data engineering stack. These apply whether you’re working to enable people to collect analytic data for a Web-based business, or building the infrastructure for scientists to analyze rainfall patterns. The layers are:

  1. Instrumentation
  2. Data crunching
  3. Data warehousing
  4. End-user tools

Let’s look at an example from Web analytics, because that’s what I understand the best. A tool like Google Analytics spans all four layers but end users only have a measure of control over two of them. When you add the Google Analytics JavaScript to your Web site, you’re setting up the instrumentation. Google crunches the data they collect, and they warehouse it for you. You can then view reports using the Web interface. Google Analytics is a great general purpose tool, but the lack of control and visibility is what limits its potential.

At Etsy, we have our own custom instrumentation, our own Hadoop jobs to crunch the logs the instruments write to, our own data warehouse, and, for the most part, end-user tools for exploring that data that we wrote ourselves.

All of the data engineering team’s projects involve at least one layer of the stack. For example, we worked with our mobile team to add instrumentation to our native iOS and Android apps, and then we made changes to our Hadoop jobs to make sure that the new incoming data was handled correctly. The new mobile data also has implications for our end-user tools.

Along with building up the data infrastructure, managing data quality is the other priority of data engineering. It’s possible to lose data at every layer of the stack. If your instrumentation is built using JavaScript, you lose data from browsers that don’t have it enabled. Your instruments usually log through calls to some kind of endpoint, and if that endpoint is down or the connection is unreliable, you lose data. If people close the browser window before the instruments load, you lose data. If your data crunching layer can’t properly process some of the data from the instruments (often due to corruption that’s beyond your control), it’s lost. Data can be lost between the data crunching layer and the data warehouse layer, and of course bugs in your end-user tools can give the appearance of data loss as well.
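One way to think about managing that loss is to count events as they pass through each layer and compare neighbors. What follows is a rough sketch under stated assumptions, not a description of our actual tooling: the function name and the idea of a per-layer event count are hypothetical, and it can only see loss between layers that actually record something. Events that never fire at all (JavaScript disabled, window closed early) are invisible to it.

```python
def loss_by_layer(counts):
    """Compare event counts between adjacent layers of the stack.

    counts is an ordered list of (layer_name, events_seen) pairs.
    Returns a list of (from_layer, to_layer, events_lost, fraction_lost).
    """
    report = []
    for (prev_name, prev_count), (name, count) in zip(counts, counts[1:]):
        lost = prev_count - count
        # Guard against dividing by zero when an upstream layer saw nothing.
        fraction = lost / prev_count if prev_count else 0.0
        report.append((prev_name, name, lost, fraction))
    return report


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    example = [
        ("instrumentation", 1000),
        ("data crunching", 950),
        ("data warehousing", 950),
        ("end-user tools", 940),
    ]
    for src, dst, lost, fraction in loss_by_layer(example):
        print(f"{src} -> {dst}: lost {lost} events ({fraction:.1%})")
```

A drop at the last step may be a bug in the reporting tool rather than real loss, which is why it helps to see the numbers layer by layer instead of as one total.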

In terms of skills and daily work, data engineering is not much different from other areas of software development. There are cases where having a background in math or quantitative analysis can be hugely helpful, but many of the problems are straightforward programming or operations problems. The big problems tend to be scaling each of the layers of the stack to accommodate the volume of data being collected, and doing the hardcore debugging and analysis required to manage data loss effectively.

That’s a quick description of what life in data engineering is like. I am planning on writing a lot more about this topic. If you have questions, please comment.

The criminalization of binary downloads

Why code sharing sites are eliminating binary downloads

Simon Phipps reports on a trend I wasn’t aware of — sites for code sharing like GitHub and Google Code eliminating binary downloads. The risk of malware distribution and hassles from the copyright industry are making it more trouble than it’s worth.

Worse places to live than where you live

Today’s New York Times had an amazing slate of stories that made it clear that there are worse places to live:

The International section of the paper is always somewhat grim, but this selection stood out.

Data science by example

What does data science look like in practice? You won’t find a better example than Coding in the Rain by my old boss, Jason Davis. The post is a clinic on extracting meaning from a noisy data set.

The Snowden Effect

Journalism prof/media critic Jay Rosen made the most important point yet about Edward Snowden in his post, The Snowden Effect. In it, he describes how Snowden’s disclosures have motivated reporters to dig deeply into secret government programs that violate citizens’ privacy, and not just in America. All over the world people are asking what their governments are up to. Snowden’s specific disclosures aside, we’ve learned a ton about how governments collect data about us thanks to the Snowden Effect that Rosen describes. Had Glenn Greenwald not reported on Snowden’s leak, none of this would be happening. This effect alone (which is generalizable to many whistle blowers) makes whistle blowing hugely important.

The best example yet is this article from Saturday’s New York Times that explains how the FISA court is used to establish secret precedents that provide a legal fig leaf for the NSA’s activities, thus illustrating the dangers of secrecy itself. It’s fascinating to see the Judicial Branch step up and provide the same function that the Office of Legal Counsel did in the Bush era.

Farewell to jeffreyp and Douglas Engelbart

This week we lost two people who are much on my mind. The first is Douglas Engelbart, one of the great visionaries in the history of the computer industry. You can read the New York Times obituary, but my favorite piece on Engelbart’s impact was by Bret Victor.

Engelbart was, of course, one of the fathers of the Silicon Valley as we know it today. As the obituary mentions, with regard to “The Mother of All Demos,” given by Engelbart in 1968:

The conference attendees were awe-struck. In one presentation, Dr. Engelbart demonstrated the power and the potential of the computer in the information age. The technology would eventually be refined at Xerox’s Palo Alto Research Center and at the Stanford Artificial Intelligence Laboratory. Apple and Microsoft would transform it for commercial use in the 1980s and change the course of modern life.

This week we also lost Jeffrey McManus, who in many ways epitomized the Silicon Valley to me, in a good way.

He passed away in his sleep at age 46. He joins the list of people I knew online for what seems like forever and never got the chance to meet in person, much to my regret. I always knew him as jeffreyp — that was his login on The Well. He was always the life of the online party, and I get the sense that he was the life of the party in the real world as well.

I was an interested spectator as he married and created a family with his wife Carole, progressed through his career, and leveled up his half-elf ranger Cocteaustin in Everquest.

Jeffrey was a huge beneficiary of the world Engelbart helped to create, and he did his part to build on Engelbart’s work. He wrote computer books, worked on developer platforms for companies like eBay and Yahoo, and wound up founding CodeLessons, an online education company dedicated to teaching people to program. He’s gone way, way too soon.

I feel especially reminded this week to focus on the things that really matter.

Confusing power and morality

Ta-Nehisi Coates has an amazing post today, musing on the dynamics of power and ways that powerlessness is exhibited. He distills the post to a single sentence in the comment section:

My point is we often mistake the display of power for a display of morality.

Read the whole thing.
