Manchester R User Group Meetup – May 2017

At the latest Manchester R User Group meeting (organized by Mango Solutions) Leanne Fitzpatrick from HelloSoda gave a talk on Deploying Models in a Machine Learning Environment.

Leanne spoke about how the use of Docker had speeded up the deployment of machine learning models into the production environment, and had also enabled easier monitoring and updating of the models.

One of the additional benefits, and Leanne alluded that this may even have been the original motivation, was that of reducing the barriers between the data scientists and software engineers in the company. Data Science is an extremely broad church, encompassing a wide range of skill-sets and disciplines. Inevitably, there can be culture-clashes between those who consider themselves to be from the ‘science’ side of Data Science, and those from the engineering side of Data Science. Scientists are people who like to explore data, develop proof-of-concept projects, but who are often not the most disciplined in code writing and organization, and for whom operational deployment of a model is the last stage in their thinking. Scientists break things. Scientists like to break things. Scientists learn by breaking things.

xkcd_the_difference
Scientists are different (taken from xkcd.com)

Data Scientists who break things can be seen as an annoyance to those responsible for maintaining the operational infrastructure.

Obviously, in a commercial environment the data scientists and software engineers/developers need to work as efficiently together as possible. The conclusion that Leanne presented in her talk suggested that HelloSoda have taken some steps towards solving this problem through their use of containerization of the models.  I say, ‘some steps’, as I can’t believe that any organization can completely remove all such barriers. Having worked in inter-disciplinary teams in both the commercial world and in academic research I’ve seen some teams work well together and others not. What tools and protocols an organization can use to generally reduce the barriers between investigative Data Science and operational Data Science is something that intrigues me – something for a longer post maybe.