Retailers fight to control customer data

John Gruber has a piece up about retailers disabling NFC at checkout to prevent customers from checking out using Apple Pay. Retailers are intentionally degrading the customer experience in order to retain the ability to collect data about their customers’ habits. This tradeoff is near and dear to me, as analytics is currently a huge part of my job.

What I’d like to know is, what’s the return these companies are getting from tracking the behavior of specific users? For one thing, the work to build systems to exploit this data is resource intensive, and often results in failure. Companies are risking hurting their business by inconveniencing customers in exchange for the opportunity to make more money by exploiting the purchase history of their customers. I’d be really, really surprised if the economics actually work.

Customers, not extortionists

These days, if you want attentive customer service from most companies, the most direct route is to complain about the company on Twitter in such a way that your tweet shows up in the company’s mentions. It doesn’t matter whether your cable is working poorly, you had problems rebooking a flight, or your iPhone app didn’t work as well as you expected: Twitter is the place to seek relief.

This is a problem, mostly for the companies people are complaining about. They’re teaching their customers that the only way to get responsive customer service is to embarrass them publicly. What these companies fear most is that a complaint on Twitter will inspire an avalanche of “me too” retweets and responses that ultimately has a measurable negative impact on their business. That gives every customer who happens to be on Twitter the opportunity to be an amateur extortionist.

Here’s the thing, though. I don’t want to have to threaten a company to get decent customer service. If that’s what it takes, I don’t want to do business with the company at all. This is on my mind, of course, because there’s a company out there that I am having a bad customer service experience with, and I’m frustrated by the fact that griping about it on Twitter would almost certainly make it better.

What I’ve done instead is look at the company’s replies on Twitter to see what they suggest to other people who go to Twitter with their complaints, and follow those instructions. We’ll see how it works out.

We should be allowed to encrypt our data

There’s a debate raging over the news that mobile devices will soon be encrypted using keys that are fully under the owner’s control. The government, of course, hates this idea. Law enforcement feels like they should be able to decrypt anything if they want to. This is not a new debate — there was a huge fight over key escrow in the ’90s.

Here are a few pointers to pieces that explain why it’s important that users should be able to use encryption in the way that they see fit. Tim Bray concisely answers the question Is Encrypting Phones OK? Bruce Schneier explains that any back door put in place for law enforcement will inevitably be exploited by others (and links to a number of other good pieces on this topic). Cryptographer Matthew Green speculates on how Apple’s new security measures work.

Kathy Sierra opens up about online harassment

In Trouble at the Koolaid Point Kathy Sierra talks about her experience with online harassment, and more recently, the degree to which people are willing to forgive and forget the past misdeeds of her harassers. She also talks about the wide variety of pernicious lies that have been told about her that have reached such wide circulation that she can’t really shake them, and how those lies have been used to justify her harassment. This stuff happens all the time, to all kinds of people, especially women and other members of underrepresented groups online.

Management is not about sorting apples

Blameless post-mortems are one of the most notable (and perhaps most misunderstood) features of Etsy’s engineering culture. John Allspaw wrote about them on Code as Craft back in May 2012. In that post, he talks about human error and the “Bad Apple theory”: the idea that the best way to eliminate error is to eliminate the “bad apples” who introduce it.

Most of the time, when we talk about blameless post-mortems, it’s in the context of outages. What I think though is that once you accept the reasoning behind building a culture of learning around outages (as opposed to a culture of blame), it also changes, or at least should change, how you think about management in general.

Etsy’s practices around post-mortems are drawn largely from the field of accident investigation. One of the key concepts taken from that field is that of local rationality. You can read about it in this rather dry paper, Perspectives on Human Error: Hindsight Biases and Local Rationality, by David Woods and Richard Cook. To oversimplify, in the moment, people take actions that seem sensible to them in that context. Even when people take what seem to be negligent shortcuts, they do so confident that what they’re doing is going to work — they just happen to be wrong.

The challenge is in building resilient systems that enable the humans interacting with them to exercise local rationality safely. Disasters occur when the expected outcomes of actions differ from the actual outcomes. Maybe I push a code change that is supposed to make error messages more readable, but instead prevents the application from connecting to the database. The systems thinker asks what gave me the confidence to make that change, given the actual results. Did differences between the development and production environments make it impossible to test? Did a long string of successful changes give me the confidence to push the change without testing? Did I successfully test the change, only to find out that the results differed in production? A poor investigation would conclude that I am a bad apple who didn’t test his code properly and stop before asking any of those questions. That’s unlikely to lead to building a safer system in the long run. Only in an organization where I feel safe from reprisal will I answer questions like the ones above honestly enough to create the opportunity to learn.

I mention all of this to provide the background for the real point I want to make, which is that once you start looking at accidents this way, it necessarily changes the way you think of managing other people in general. When it comes to the bad apple theory in accident investigation, the case is closed, it’s a failure. Internalizing this insight has led me to also reject the bad apple theory when it comes to managing people in general.

Poor individual performance is almost always the result of a systems failure that is causing local rationality to break down. All too often the employee who is ostensibly performing poorly doesn’t even know that they’re not meeting the expectations of their manager. In the meantime, they may be working on projects that don’t have clear goals, or that they don’t see as important. They may be confronted with obstacles that are difficult to surmount, often as a result of conflicting incentives.

There are a million things that can lead to poor outcomes, only a few of which are due to the personal failings of any given person working on the project. If you accept that local rationality exists, then you accept that people are doing what they believe is expected of them. If they knew better, they would do better.

All this is not to say that there are never cases where an employment relationship should end. Sometimes people are on the wrong team, or at the wrong company. What I would say though is that the humane manager works to construct a system in which people can thrive, rather than getting rid of people who aren’t succeeding within a system that could quite possibly be unfit for humans. Even in the case where a person simply lacks the skills to succeed at the task at hand, someone else almost certainly assigned them the task or agreed to let them work on it. Their being in the position to fail reflects as poorly on the system as it does on the individual.

These principles are easier to apply within the limited context of investigating an incident than the general context of managing an organization, or the highly personal relationship between a manager and the person who reports to them. Focusing on the system and how to optimize it for the people who are part of it is the bedrock of building a just culture. As managers, it’s up to us to create a safe place for employees to explain the choices they make, and then use what we learn from those explanations to shore up the system overall. Simply tossing out the bad apples is a commitment to building a team that is unable to look back honestly and improve.

The strengths of low variance political configurations

Today I got around to listening to the A Not So Simple Majority episode of This American Life. The story is about the school board in East Ramapo in New York, where a group of people who all send their children to private religious schools took over the public school board, and have since been gutting the public school system and funneling the money to their private schools. The religion of the group in question isn’t really important. The story is infuriating on many levels, and at the end it left me thinking about how to prevent this sort of thing from happening. I think that the lesson is in the dangers of small-scale democracy. The school board in East Ramapo has a lot of power, not just to manage schools but also to set local property tax rates, and was subject to capture by a relatively small group of people.

At the other end of the spectrum we have the US Presidency. It’s a nationwide election, and separation of powers ensures that the President can’t do that much anyway. The election cycle is long and painful. This all leads to low variance outcomes — President Bush and President Obama may not personally have that much in common, but America has not been a radically different place under one than under the other. The entire system is built to reduce the variance between Presidents. Generally speaking, the smaller the electorate, the higher the variance. That’s why the House of Representatives features a much broader ideological spectrum than the Senate, for example.

Getting back to East Ramapo, I was reminded of the article about the small municipalities around St. Louis, Missouri that I recently linked to. They’re really too small to be well-governed or even governable. Similarly, there was the recent case of Bell, California, where the elected officials in a town of 38,000 made themselves the highest-paid municipal officials in the country. I wonder whether the problems in the East Ramapo school district would have occurred at all if the entire county had a single school district, rather than the nine it currently has.

People seem to reflexively romanticize small-scale democracy, but it’s exploitable and breakable in many ways. We should be warier of it.

The system that created Ferguson, Missouri

When I opened a tab with Radley Balko’s lengthy Washington Post article, How municipalities in St. Louis County, Mo., profit from poverty, I had some preconceived notions. It’s become increasingly difficult to raise taxes to pay for government programs, so governments are increasingly relying on alternative means to raise funds. In many cases, the burden winds up falling largely on the poor. The classic example here is state lotteries.

The article does cover that territory — it explains how small municipalities around St. Louis fund their governments through fines, court fees, traffic tickets, and so forth. What I didn’t expect was how intimately tied to racism this is. In the present, this manifests itself in white elected officials presiding over white cops squeezing black populations for as much money as they can get, leading to harassment and ultimately alienation. The current circumstances are the result of prior racism. When African Americans started moving to the St. Louis suburbs, whites responded by trying to zone them out:

Instead, developers would create new subdivisions outside a city. White people would move in. As black families moved north and west of the city, these subdivisions would try to keep them out by zoning themselves as single-family housing only. That barred the construction of public and low-income housing.

Because of the way Missouri laws work, the subdivisions incorporated and created tiny towns, towns that were too small to have a self-sustaining tax base. Instead, they use their independent police forces to wring the money out of residents who don’t have the political power to prevent this from happening.

We all watched what happened in Ferguson, Missouri with horror. Balko explains the system in which Ferguson exists. It’s a must-read.

Surprisingly, Perl outperforms sed and Awk

Normally the performance of utilities and scripting languages isn’t an issue; they’re all fast enough for the task at hand. Sometimes, though, that isn’t the case. For example, my team has built a database replication system that copies many millions of records from a set of sharded databases to a data warehouse every day. When it exports data from the originating databases, it needs to add a database identifier to every record, each of which is represented by a line in a TSV file.

The easiest way to do this is to pipe the output of the mysql command to a script that simply appends a value to each line. I started by using sed, for reasons of simplicity. This command appends a tab and the number 1 to every line of input:

sed 's/$/\t1/'
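A quick sanity check of what that command does (shown here with GNU sed; stock OS X sed handles \t in the replacement differently, so your output may vary):

```shell
# Append a tab and the database id "1" to each input line.
printf 'a\nb\n' | sed 's/$/\t1/'
```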

Unfortunately, as the amount of data being replicated increased, we found that the CPU on the box running the replication script was pegged at 100%, and sed was using most of it, so I started experimenting with alternatives.

To test, I used a 50 million line text file. The table the sample was taken from has over 3 billion rows, so you can see why the performance of this simple piece of code becomes important.
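If you want to reproduce a benchmark like this, you can generate a file of roughly the same shape yourself (a sketch; the field contents here are made up, and the real test data came from the production table):

```shell
# Generate a TSV test file; raise the count toward 50000000 to
# approximate the benchmark described here.
seq 1 1000000 | awk -v OFS='\t' '{print $1, "some", "sample", "fields"}' > test.tsv
wc -l test.tsv
```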

My approach to testing is simple: cat the file through the transformation and redirect the output to /dev/null. Here’s the baseline (the test was run on my MacBook Pro):

$ time cat test.tsv > /dev/null

real    0m0.615s
user    0m0.006s
sys 0m0.608s

Here’s how sed performs:

$ time cat test.tsv | sed 's/$/\t1/' > /dev/null

real    0m57.405s
user    0m56.845s
sys 0m1.970s

I read on one of the Stack Exchange sites that Awk might be faster, so I tried that:

$ time cat test.tsv | awk '{print $0 "\t2"}' > /dev/null

real    3m51.618s
user    3m50.367s
sys 0m3.676s

As you can see, Awk is a lot slower than sed, and it doesn’t even use a regular expression. I also read that using Bash with no external commands might be faster, so I tried this out:

$ time cat test_5m.tsv | while read line; do echo "$line\t2"; done > /dev/null

real    7m24.761s
user    3m16.709s
sys 5m54.428s

Those results are from a test file with 5 million lines, 1/10 the size of the other tests; scaled up, the Bash solution is roughly 20 times slower than the Awk solution. (It’s also not strictly equivalent: Bash’s echo doesn’t expand \t by default, and a more robust loop would use “while IFS= read -r line”.) At this point, I felt a little stuck. Nothing I tried outperformed the sed approach that we were already using. For some reason, I thought it might be worth it to try a Perl one-liner. Perl is known for having good performance for a scripting language, but at the same time, I couldn’t imagine that Perl could outperform sed, which is much simpler. First, I tried a direct translation of the sed solution:

$ time cat test.tsv | perl -pe 's/$/\t2/'  > /dev/null

real    0m42.030s
user    0m41.296s
sys 0m2.805s

I was surprised by this result: Perl beat sed handily. I’ve run these tests a number of times, and I’ve found that Perl reliably outperforms the sed equivalent in an apples-to-apples comparison. Of course, I didn’t have to use a regular expression here; I was just matching the end of the line. What happens when I leave the regex out?

$ time cat test.tsv | perl -ne 'chomp; print "$_\t2\n"' > /dev/null

real    0m12.938s
user    0m12.344s
sys 0m2.280s  

This time I just strip the line ending, print out the line with the text I want to append, and then add the line ending back in. The original sed command is more than four times slower than this Perl one-liner; that’s a massive improvement.

There are a couple of lessons here. The first is that when you’re doing simple text processing, you may as well just use Perl one-liners. The idea that sed and Awk are superior because they are smaller and simpler is not borne out by real-world results. (They may be faster for some things, but it’s clearly no sure thing.) Perl is mature and is obviously highly optimized.

The second is that while premature optimization may be the root of all evil, when you’re performing the same operation billions of times, even very small gains in efficiency can have huge impact. When the CPU on the server was pegged at 100% for a few days and the load average spiked at over 200, every gain in efficiency became hugely important.

If you want to dig into Perl one liners, the perlrun man page is one place to start.

Update: For the tests above, I used the default OS X versions of these tools. The versions were Perl 5.16.2, Awk 20070501, and some version of BSD sed from 2005.

Here are some other numbers, using GNU sed (4.2.2) and Awk (4.1.1), installed via Homebrew (rather than the old default versions that ship with OS X). Perl still wins against Awk, but it’s a lot closer:

$ time cat test.tsv | gawk '{print $0 "\t2"}' > /dev/null

real    0m23.503s
user    0m23.234s
sys 0m1.596s

$ time cat test.tsv | gsed 's/$/\t1/' > /dev/null

real    2m32.154s
user    2m31.332s
sys 0m2.014s

On the other hand, the latest GNU sed takes it on the chin. It’s slower than Perl, Awk, and the old OS X default version of sed.