I’ve been thinking a lot lately about where data scientists should reside on an engineering team. You can often find them on the analytics team or on a dedicated data science team, but I think that the best place for them to be is working as closely with product teams as possible, especially if they have software engineering skills.
Data scientists are, to me, essentially engineers with additional tools in the toolbox. When engineers work on a problem, they come up with engineering solutions that non-engineers may not see. They think of ways to make work more efficient with software. Data scientists do the same thing as engineers, but with data and mathematics. For example, they may see an opportunity to use a classifier where a regular software engineer may not. Or they may see a way to apply graph theory to efficiently solve a problem.
This is what the Javier Tordable presentation on Mathematics at Google that I’ve linked to before is about. The problem with having a data science team is lack of exposure to inspiring problems. The best way to enable people to use their specialized skills to solve problems is to allow them to suffer the pain of solving the problem. As they say, necessity is the mother of invention.
The risk, of course, is that if a data scientist is on one team, they may not have any exposure at all to problems that they could solve that are faced by other teams. In theory, putting data scientists on their own team and enabling them to consult where they’re most needed enables them to engage with problems where they are most needed, but in practice I think it often keeps them too far from the front lines to be maximally useful.
It makes sense to have data scientists meet up regularly so that they can talk about what they’re doing and share ideas, but I think that most of the time, they’re better off collaborating with members of a product team.
Data science by example
What does data science look like in practice? You won’t find a better example than Coding in the Rain by my old boss, Jason Davis. The post is a clinic on extracting meaning from a noisy data set.