R – Hoyle Analytics

I’ve used the pracma package in R for a while now. My main use, I’ll confess, is because it provides a convenient method for sampling orthonormal matrices. The first time I was faced with this task (16 years ago) I had to code up my own version of the Stewart algorithm in Java. The algorithm works by iteratively applying a series of Householder transformations – see for example the blog post by Rick Wicklin for a description and implementation of the algorithm. However, now that I predominantly use R and python the implementation of the Stewart algorithm in pracma is extremely handy and follows a similar form to the other random variate sampling functions in base R – see below,

library( pracma )

nDim <- 100 # set dimension of matrix
orthoMatrix <- rortho( nDim )

More recently I've been exploring the pracma package further, and I've been continually amazed how many useful little utility methods are available in the package – all the little methods for doing numerical scientific calculations that I was taught as an undergraduate – methods that I have ended up coding from scratch multiple times. Ok, the package describes itself as 'practical numerical math routines' so I guess I shouldn't be surprised.

At the latest Manchester R User Group meeting (organized by Mango Solutions) Leanne Fitzpatrick from HelloSoda gave a talk on Deploying Models in a Machine Learning Environment.

Leanne spoke about how the use of Docker had speeded up the deployment of machine learning models into the production environment, and had also enabled easier monitoring and updating of the models.

One of the additional benefits, and Leanne alluded that this may even have been the original motivation, was that of reducing the barriers between the data scientists and software engineers in the company. Data Science is an extremely broad church, encompassing a wide range of skill-sets and disciplines. Inevitably, there can be culture-clashes between those who consider themselves to be from the ‘science’ side of Data Science, and those from the engineering side of Data Science. Scientists are people who like to explore data, develop proof-of-concept projects, but who are often not the most disciplined in code writing and organization, and for whom operational deployment of a model is the last stage in their thinking. Scientists break things. Scientists like to break things. Scientists learn by breaking things.

xkcd_the_difference — Scientists are different (taken from xkcd.com)

Data Scientists who break things can be seen as an annoyance to those responsible for maintaining the operational infrastructure.

Obviously, in a commercial environment the data scientists and software engineers/developers need to work as efficiently together as possible. The conclusion that Leanne presented in her talk suggested that HelloSoda have taken some steps towards solving this problem through their use of containerization of the models. I say, ‘some steps’, as I can’t believe that any organization can completely remove all such barriers. Having worked in inter-disciplinary teams in both the commercial world and in academic research I’ve seen some teams work well together and others not. What tools and protocols an organization can use to generally reduce the barriers between investigative Data Science and operational Data Science is something that intrigues me – something for a longer post maybe.

Hoyle Analytics

Category: R

pracma R package

Manchester R User Group Meetup – May 2017

Share this:

Share this: