Hoyle Analytics

I have recently needed to do some work evaluating high-order derivatives of composite functions. Namely, given a function $f(t)$ , evaluate the $n^{th}$ derivative of the composite function $\underbrace{\left (f\circ f\circ f \circ \ldots\circ f \right )}_{l\text{ terms}}(t)$ . That is, we define $f_{l}(t)$ to be the function obtained by iterating the base function $f(t)=f_{1}(t)$ $l-1$ times. One approach is to make recursive use of the Faa di Bruno formula,

$\displaystyle \frac{d^{n}}{dx^{n}}f(g(x))\;=\;\sum_{k=1}^{n}f^{(k)}(g(x))B_{n,k}\left (g'(x), g''(x), \ldots, g^{(n-k+1)}(x) \right )$ Eq.(1)

The fact that the exponential partial Bell polynomials $B_{n,k}\left (x_{1}, x_{2},\ldots, x_{n-k+1} \right )$ are available within the ${\tt sympy}$ symbolic algebra Python package, makes this initially an attractive route to evaluating the required derivatives. In particular, I am interested in evaluating the derivatives at $t=0$ and I am focusing on odd functions of $t$ , for which $t=0$ is then obviously a fixed-point. This means I only have to supply numerical values for the derivatives of my base function $f(t)$ evaluated at $t=0$ , rather than supplying a function that evaluates derivatives of $f(t)$ at any point $t$ .

Given the Taylor expansion of $f(t)$ about $t=0$ we can easily write code to implement the Faa di Bruno formula using sympy. A simple bit of pseudo-code to represent an implementation might look like,

Generate symbols.
Generate and store partial Bell polynomials up to known required order using the symbols from step 1.
Initialize coefficients of Taylor expansion of the base function.
Substitute numerical values of derivatives from previous iteration into symbolic representation of polynomial.
Sum required terms to get numerical values of all derivatives of current iteration.
Repeat steps 4 & 5.

I show python code snippets below implementing the idea. First we generate and cache the Bell polynomials,

 # generate and cache Bell polynomials
 bellPolynomials = {}
 for n in range(1, nMax+1):
      for k in range(1, n+1):
         bp_tmp = sympy.bell(n, k, symbols_tmp)
         bellPolynomials[str(n) + '_' + str(k)] = bp_tmp

Then we iterate over the levels of function composition, substituting the numerical values of the derivatives of the base function into the Bell polynomials,

for iteration in range(nIterations):
    if( verbose ):
        print( "Evaluating derivatives for function iteration " + str(iteration+1) )

    for n in range(1, nMax+1):
        sum_tmp = 0.0
        for k in range(1, n+1):
            # retrieve kth derivative of base function at previous iteration
            f_k_tmp = derivatives_atFixedPoint_tmp[0, k-1]

            #evaluate arguments of Bell polynomials
            bellPolynomials_key = str( n ) + '_' + str( k )
            bp_tmp = bellPolynomials[bellPolynomials_key]
            replacements = [( symbols_tmp[i],
            derivatives_atFixedPoint_tmp[iteration, i] ) for i in range(n-k+1) ]
            sum_tmp = sum_tmp + ( f_k_tmp * bp_tmp.subs(replacements) )

        derivatives_atFixedPoint_tmp[iteration+1, n-1] = sum_tmp

Okay – this isn’t really making use true recursion, merely looping, but the principle is the same. The problem one encounters is that the manipulation of the symbolic representation of the polynomials is slow, and run-times slow significantly for $n > 15$ .

However, the $n^{th}$ derivative can alternatively be expressed as a sum over partitions of n as,

$\displaystyle \frac{d^{n}}{dx^{n}}f(g(x))\;=\;\sum \frac{n!}{m_{1}!m_{2}!\ldots m_{n}!} f^{(m_{1}+m_{2}+\ldots+m_{n})}\left ( g(x)\right )\prod_{j=1}^{n}\left ( \frac{g^{(j)}(x)}{j!}\right )^{m_{j}}$ Eq.(2)

where the sum is taken over all non-negative integers tuples $m_{1}, m_{2},\ldots, m_{n}$ that satisfy $1\cdot m_{1}+ 2\cdot m_{2}+\ldots+ n\cdot m_{n}\;=\; n$ . That is, the sum is taken over all partitions of n. Fairly obviously the Faa di Bruno formula is just a re-arrangement of the above equation, made by collecting terms involving $f^{(k)}(g(x))$ together, and as such that rearrangement gives the fundamental definition of the partial Bell polynomial.

I’d shied away from the more fundamental form of Eq.(2) in favour of Eq.(1) as I believed the fact that a number of terms had already been collected together in the form of the Bell polynomial would make any implementation that used them quicker. However, the advantage of the form in Eq.(2) is that the summation can be entirely numeric, provided an efficient generator of partitions of n is available to us. Fortunately, sympy also contains a method for iterating over partitions. Below are code snippets that implement the evaluation of $f_{l}^{(n)}(0)$ using Eq.(2). First we generate and store the partitions,

# store partitions
pStore = {}
for k in range( n ):
    # get partition iterator
    pIterator = partitions(k+1)
    pStore[k] = [p.copy() for p in pIterator]

After initializing arrays to hold the derivatives of the current function iteration we then loop over each iteration, retrieving each partition and evaluating the product in the summand of Eq.(2). Obviously, it is relatively easy working on the log scale, as shown in the code snippet below,

# loop over function iterations
for iteration in range( nIterations ):

    if( verbose==True ):
        print( "Evaluating derivatives for function iteration " + str(iteration+1)  )

    for k in range( n ):
        faaSumLog = float( '-Inf' )
        faaSumSign = 1

        # get partitions
        partitionsK = pStore[k]
        for pidx in range( len(partitionsK) ):
            p = partitionsK[pidx]
            sumTmp = 0.0
            sumMultiplicty = 0
            parityTmp = 1
            for i in p.keys():
                value = float(i)
                multiplicity = float( p[i] )
                sumMultiplicty += p[i]
                sumTmp += multiplicity * currentDerivativesLog[i-1]
                sumTmp -= gammaln( multiplicity + 1.0 )
                sumTmp -= multiplicity * gammaln( value + 1.0 )
                parityTmp *= np.power( currentDerivativesSign[i-1], multiplicity )	

            sumTmp += baseDerivativesLog[sumMultiplicty-1]
            parityTmp *= baseDerivativesSign[sumMultiplicty-1]

            # now update faaSum on log scale
            if( sumTmp > float( '-Inf' ) ):
                if( faaSumLog > float( '-Inf' ) ):
                    diffLog = sumTmp - faaSumLog
                    if( np.abs(diffLog) = 0.0 ):
                            faaSumLog = sumTmp
                            faaSumLog += np.log( 1.0 + (float(parityTmp*faaSumSign) * np.exp( -diffLog )) )
                            faaSumSign = parityTmp
                        else:
                            faaSumLog += np.log( 1.0 + (float(parityTmp*faaSumSign) * np.exp( diffLog )) )
                    else:
                        if( diffLog > thresholdForExp ):
                            faaSumLog = sumTmp
                            faaSumSign = parityTmp
                else:
                    faaSumLog = sumTmp
                    faaSumSign = parityTmp

        nextDerivativesLog[k] = faaSumLog + gammaln( float(k+2) )
        nextDerivativesSign[k] = faaSumSign

Now let’s run both implementations, evaluating up to the 15th derivative for 4 function iterations. Here my base function is $f(t) =1 -\frac{2}{\pi}\arccos t$ . A plot of the base function is shown below in Figure 1.

FaaDiBrunoExample_BaseFunction — Figure 1: Plot of the base function $f(t)\;=\;1\;-\;\frac{2}{\pi}\arccos(t)$

The base function has a relatively straight forward Taylor expansion about $t=0$ ,

$\displaystyle f(t)\;=\;\frac{2}{\pi}\sum_{k=0}^{\infty}\frac{\binom{2k}{k}t^{2k+1}}{4^{k}\left ( 2k+1 \right )}\;\;\;,\;\;\;|t| \leq 1 \;\;,$ Eq.(3)

and so supplying the derivatives, $f^{(k)}(0)$ , of the base function is easy. The screenshot below shows a comparison of $f_{l}^{(15)}(0)$ for $l\in \{2, 3, 4, 5\}$ . As you can see we obtain identical output whether we use sympy’s Bell polynomials or sympy’s partition iterator.

The comparison of the implementations is not really a fair one. One implementation is generating a lot of symbolic representations that aren’t really needed, whilst the other is keeping to entirely numeric operations. However, it did highlight several points to me,

Directly working with partitions, even up to moderate values of $n$ , e.g. $n=50$ , can be tractable using the sympy package in python.
Sometimes the implementation of the more concisely expressed representation (in this case in terms of Bell polynomials) can lead to an implementation with significantly longer run-times, even if the more concise representation can be implemented concisely (less lines of code).
The history of the Faa di Bruno formula, and the various associated polynomials and equivalent formalisms (such as the Jabotinksy matrix formalism) is a fascinating one.

I’ve put the code for both methods of evaluating the derivatives of an iterated function as a gist on github.

At the moment the functions take an array of Taylor expansion coefficients, i.e. they assume the point at which derivatives are requested is a fixed point of the base function. At some point I will add methods that take a user-supplied function for evaluating the $k^{th}$ derivative, $f^{(k)}(t)$ , of the base function at any point t and will return the derivatives, $f_{l}^{(k)}(t)$ of the iterated function.

I haven’t yet explored whether, for reasonable values of n (say $n \leq 50$ ), I need to work on the log scale, or whether direct evaluation of the summand will be sufficiently accurate and not result in overflow error.

At the latest Manchester R User Group meeting (organized by Mango Solutions) Leanne Fitzpatrick from HelloSoda gave a talk on Deploying Models in a Machine Learning Environment.

Leanne spoke about how the use of Docker had speeded up the deployment of machine learning models into the production environment, and had also enabled easier monitoring and updating of the models.

One of the additional benefits, and Leanne alluded that this may even have been the original motivation, was that of reducing the barriers between the data scientists and software engineers in the company. Data Science is an extremely broad church, encompassing a wide range of skill-sets and disciplines. Inevitably, there can be culture-clashes between those who consider themselves to be from the ‘science’ side of Data Science, and those from the engineering side of Data Science. Scientists are people who like to explore data, develop proof-of-concept projects, but who are often not the most disciplined in code writing and organization, and for whom operational deployment of a model is the last stage in their thinking. Scientists break things. Scientists like to break things. Scientists learn by breaking things.

xkcd_the_difference — Scientists are different (taken from xkcd.com)

Data Scientists who break things can be seen as an annoyance to those responsible for maintaining the operational infrastructure.

Obviously, in a commercial environment the data scientists and software engineers/developers need to work as efficiently together as possible. The conclusion that Leanne presented in her talk suggested that HelloSoda have taken some steps towards solving this problem through their use of containerization of the models. I say, ‘some steps’, as I can’t believe that any organization can completely remove all such barriers. Having worked in inter-disciplinary teams in both the commercial world and in academic research I’ve seen some teams work well together and others not. What tools and protocols an organization can use to generally reduce the barriers between investigative Data Science and operational Data Science is something that intrigues me – something for a longer post maybe.