# China invests big in AI

This week saw interesting news and career-guide articles in Nature highlighting the Chinese government's plans for its AI industry. The goal of the Chinese government is to become a world leader in AI by 2030, and China forecasts that the value of its core AI industries will be US\$157.7 billion by 2030 (based on the exchange rate on 2018/01/19). How realistic that goal is will obviously depend upon what momentum there already is within China's AI sector, but even so I was struck and impressed by the ambition of the goal: 2030 is only 12 years away, which is not long in research and innovation terms. The Nature articles are worth a read (and are not behind a paywall).

What will be the effect of China's investment in AI? Attempting to make technology-based predictions about the future can be ill-advised, but I will speculate anyway, as the articles prompted three immediate questions for me:

• How likely is China to be successful in achieving its goal?
• What sectors will it achieve most influence in?
• What are competitor countries doing?

How successful will China be?

Whatever your opinions on the current hype surrounding AI, Machine Learning, and Data Science, there tends to be a consensus that Machine Learning will emerge from its current hype-cycle with some genuine gains and progress: this time it is different. The fact that serious investment in AI is being made not just by corporations but by governments (including the UK) could be taken as an indicator that we are looking beyond the hype. Data volumes, compute power, and credible business models are all present simultaneously in this current AI/Machine Learning hype-cycle, in ways that they weren't in the 1980s neural-network boom-and-bust and other AI Winters. Machine Learning and Data Science are becoming genuinely commoditized. Consequently, the goal China has set itself is about building capacity, i.e. about the transfer of knowledge from a smaller innovation ecosystem (such as the academic community and a handful of large corporate labs) to produce a larger but highly skilled bulk of practitioners. A capacity-building exercise such as this should be a known quantity, and so investments will scale, i.e. you will see proportional returns on those investments. The Nature news article does comment that China may face some challenges in strengthening its initial research base in AI, but this may be helped by the presence of large corporate players such as Microsoft and Google, who have established AI research labs within the country.

What sectors will be influenced most?

One prominent area for applications of AI and Machine Learning is commerce, and China provides a large potential marketplace. However, access to that market can be difficult for Western companies, and so Chinese data-science solution providers may face limited external competition on their home soil. Equally, Chinese firms wishing to compete in Western markets, using expertise of the AI-commerce interface gained from their home market, may face tough challenges from the mature and experienced incumbents in those Western markets. Much may also depend on precisely which organizations in China develop the beneficial experience in the sector. The large US corporates (Microsoft, Google) that have a presence in China are already main players in AI and commerce in the West, and so may not see extra dividends beyond the obvious ones of access to the Chinese market and to emerging Chinese talent. Overall, it feels that whilst China's investment in this sector will undoubtedly be a success, and Chinese commerce firms will be a success, China's AI investment may not significantly change the direction the global commerce sector would have taken anyway with regard to its use and adoption of AI.

Perhaps more intriguing will be newer, younger sectors in which China has already made significant investment. Obvious examples, such as genomics, spring to mind, given the scale of activity by organizations such as BGI (including the AI-based genomic initiative of the BGI founder Jun Wang). Similarly, robotics is another field highlighted within the Nature articles.

What are China’s competitors doing in this area?

I will restrict my comments to the UK, which, being my home country, I am most familiar with. Like China, the UK has picked out AI, Robotics, and a Data-Driven Economy as areas that will help enable a productive economy. Specifically, the UK Industrial Strategy announced last year identifies AI as one of its first ‘Sector Deals’ and also as one of four Grand Challenges. The benefits of AI are even called out in other Sector Deals, for example in the Sector Deal for the Life Sciences. This is on top of existing UK investment in Data Science, such as the Alan Turing Institute (ATI) and last year’s announcement by the ATI that it is adding four additional universities as partners. In addition, we have capacity-building calls from research councils, such as the EPSRC call for proposals for Centres for Doctoral Training (CDTs). From my quick reading, 4 of the 30 priority areas that the EPSRC has highlighted for CDTs make explicit reference to AI, Data Science, or Autonomous Systems; the number of priority areas with some implicit dependence on AI or Data Science will be greater. Overall, the scale of the UK investment is, naturally, unlikely to match that of China, although the original Nature report on the Chinese plans notes that no level of funding is mentioned. However, the likely scale of the Chinese governmental investment in AI will ultimately give that country an edge, or at least a higher probability of success. Does that mean the UK needs to re-think and up its investment?


# Faa di Bruno and derivatives of an iterated function

I have recently needed to do some work evaluating high-order derivatives of composite functions. Namely, given a function $f(t)$, evaluate the $n^{th}$ derivative of the composite function $\underbrace{\left (f\circ f\circ f \circ \ldots\circ f \right )}_{l\text{ terms}}(t)$. That is, we define $f_{l}(t)$ to be the function obtained by iterating the base function $f(t)=f_{1}(t)$ $l-1$ times. One approach is to make recursive use of the Faa di Bruno formula,

$\displaystyle \frac{d^{n}}{dx^{n}}f(g(x))\;=\;\sum_{k=1}^{n}f^{(k)}(g(x))B_{n,k}\left (g'(x), g''(x), \ldots, g^{(n-k+1)}(x) \right )$      Eq.(1)

The fact that the exponential partial Bell polynomials $B_{n,k}\left (x_{1}, x_{2},\ldots, x_{n-k+1} \right )$ are available within the ${\tt sympy}$ symbolic algebra Python package, makes this initially an attractive route to evaluating the required derivatives. In particular, I am interested in evaluating the derivatives at $t=0$ and I am focusing on odd functions of $t$, for which $t=0$ is then obviously a fixed-point. This means I only have to supply numerical values for the derivatives of my base function $f(t)$ evaluated at $t=0$, rather than supplying a function that evaluates derivatives of $f(t)$ at any point $t$.
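As a quick sanity check (my own illustration, not from the original post), Eq.(1) as implemented with ${\tt sympy.bell}$ can be verified against sympy's own differentiation on a small example, here the third derivative of $\sin(\sin x)$:

```python
import sympy

x, y = sympy.symbols('x y')
n = 3
g = sympy.sin(x)

# Faa di Bruno, Eq.(1): sum_k f^(k)(g(x)) * B_{n,k}(g'(x), ..., g^(n-k+1)(x))
faa = sum(
    sympy.diff(sympy.sin(y), y, k).subs(y, g) *
    sympy.bell(n, k, [sympy.diff(g, x, j) for j in range(1, n - k + 2)])
    for k in range(1, n + 1)
)

direct = sympy.diff(sympy.sin(g), x, n)
print(sympy.simplify(faa - direct))  # 0
```

The same pattern, with $g$ replaced by the previous iterate $f_{l-1}$, is what the recursion over composition levels uses below.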

Given the Taylor expansion of $f(t)$ about $t=0$ we can easily write code to implement the Faa di Bruno formula using sympy. A simple bit of pseudo-code to represent an implementation might look like,

1. Generate symbols.
2. Generate and store partial Bell polynomials up to known required order using the symbols from step 1.
3. Initialize coefficients of Taylor expansion of the base function.
4. Substitute numerical values of derivatives from previous iteration into symbolic representation of polynomial.
5. Sum required terms to get numerical values of all derivatives of current iteration.
6. Repeat steps 4 & 5 for each function iteration.

I show Python code snippets below implementing the idea. First we generate and cache the Bell polynomials (the imports and the symbol generation of step 1 are included here for completeness):

```python
import sympy

nMax = 15  # maximum derivative order required (example value)
symbols_tmp = sympy.symbols('x1:%d' % (nMax + 1))  # step 1: symbols x1, ..., x15

# generate and cache Bell polynomials
bellPolynomials = {}
for n in range(1, nMax + 1):
    for k in range(1, n + 1):
        bellPolynomials[str(n) + '_' + str(k)] = sympy.bell(n, k, symbols_tmp)
```


Then we iterate over the levels of function composition, substituting the numerical values of the derivatives from the previous iteration into the Bell polynomials:

```python
for iteration in range(nIterations):
    if verbose:
        print("Evaluating derivatives for function iteration " + str(iteration + 1))

    for n in range(1, nMax + 1):
        sum_tmp = 0.0
        for k in range(1, n + 1):
            # kth derivative of the base function f, evaluated at the fixed point
            f_k_tmp = derivatives_atFixedPoint_tmp[0, k - 1]

            # substitute derivatives of the previous iterate into B_{n,k}
            bellPolynomials_key = str(n) + '_' + str(k)
            bp_tmp = bellPolynomials[bellPolynomials_key]
            replacements = [(symbols_tmp[i],
                             derivatives_atFixedPoint_tmp[iteration, i])
                            for i in range(n - k + 1)]
            sum_tmp = sum_tmp + (f_k_tmp * bp_tmp.subs(replacements))

        derivatives_atFixedPoint_tmp[iteration + 1, n - 1] = sum_tmp
```


Okay, this isn’t really using true recursion, merely looping, but the principle is the same. The problem one encounters is that manipulation of the symbolic representation of the polynomials is slow, and run-times grow significantly for $n > 15$.

However, the $n^{th}$ derivative can alternatively be expressed as a sum over partitions of $n$,

$\displaystyle \frac{d^{n}}{dx^{n}}f(g(x))\;=\;\sum \frac{n!}{m_{1}!m_{2}!\ldots m_{n}!} f^{(m_{1}+m_{2}+\ldots+m_{n})}\left ( g(x)\right )\prod_{j=1}^{n}\left ( \frac{g^{(j)}(x)}{j!}\right )^{m_{j}}$   Eq.(2)

where the sum is taken over all tuples of non-negative integers $m_{1}, m_{2},\ldots, m_{n}$ that satisfy $1\cdot m_{1}+ 2\cdot m_{2}+\ldots+ n\cdot m_{n}\;=\; n$. That is, the sum is taken over all partitions of $n$. Fairly obviously, the Faa di Bruno formula is just a re-arrangement of the above equation, obtained by collecting together the terms involving $f^{(k)}(g(x))$, and that rearrangement gives the fundamental definition of the partial Bell polynomial.
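For concreteness (my own illustration, not from the original post): sympy's partition iterator yields each partition as a dictionary mapping a part $j$ to its multiplicity $m_{j}$, which is exactly the form needed to evaluate the summand of Eq.(2). Note that the iterator reuses a single dictionary internally, so copies must be taken:

```python
from sympy.utilities.iterables import partitions

# partitions(4) yields dicts of the form {part: multiplicity};
# copy each one because the iterator reuses the same dict object
parts = [p.copy() for p in partitions(4)]
print(parts)  # e.g. [{4: 1}, {3: 1, 1: 1}, {2: 2}, {2: 1, 1: 2}, {1: 4}]

# every partition satisfies sum_j j * m_j = 4
assert all(sum(j * m for j, m in p.items()) == 4 for p in parts)
```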

I’d shied away from the more fundamental form of Eq.(2) in favour of Eq.(1), believing that because a number of terms had already been collected together in the Bell polynomials, an implementation that used them would be quicker. However, the advantage of the form in Eq.(2) is that the summation can be entirely numeric, provided an efficient generator of partitions of $n$ is available. Fortunately, sympy also contains a method for iterating over partitions. Below are code snippets that implement the evaluation of $f_{l}^{(n)}(0)$ using Eq.(2). First we generate and store the partitions:

```python
from sympy.utilities.iterables import partitions

# store partitions of 1..n
pStore = {}
for k in range(n):
    # the partition iterator reuses its dict, so take copies
    pStore[k] = [p.copy() for p in partitions(k + 1)]
```


After initializing arrays to hold the derivatives of the current function iteration, we then loop over each iteration, retrieving each partition and evaluating the product in the summand of Eq.(2). It is relatively easy to work on the log scale, as shown in the code snippet below:

```python
import numpy as np
from scipy.special import gammaln

# loop over function iterations
for iteration in range(nIterations):

    if verbose:
        print("Evaluating derivatives for function iteration " + str(iteration + 1))

    for k in range(n):
        faaSumLog = float('-Inf')
        faaSumSign = 1

        # sum over all partitions of k+1
        partitionsK = pStore[k]
        for p in partitionsK:
            sumTmp = 0.0
            sumMultiplicity = 0
            parityTmp = 1
            for i in p.keys():
                value = float(i)
                multiplicity = float(p[i])
                sumMultiplicity += p[i]
                # log of (g^(i)(0)/i!)^{m_i} / m_i!, all on the log scale
                sumTmp += multiplicity * currentDerivativesLog[i - 1]
                sumTmp -= gammaln(multiplicity + 1.0)
                sumTmp -= multiplicity * gammaln(value + 1.0)
                parityTmp *= np.power(currentDerivativesSign[i - 1], multiplicity)

            # outer derivative f^(m_1 + ... + m_n) of the base function
            sumTmp += baseDerivativesLog[sumMultiplicity - 1]
            parityTmp *= baseDerivativesSign[sumMultiplicity - 1]

            # now update faaSum on the log scale
            if sumTmp > float('-Inf'):
                if faaSumLog > float('-Inf'):
                    diffLog = sumTmp - faaSumLog
                    if diffLog >= 0.0:
                        # new term dominates the running sum
                        faaSumLog = sumTmp + np.log(1.0 + (float(parityTmp * faaSumSign) * np.exp(-diffLog)))
                        faaSumSign = parityTmp
                    elif diffLog > thresholdForExp:
                        faaSumLog += np.log(1.0 + (float(parityTmp * faaSumSign) * np.exp(diffLog)))
                    # otherwise the new term is negligible and is dropped
                else:
                    faaSumLog = sumTmp
                    faaSumSign = parityTmp

        # multiply by n! (here n = k+1) to undo the factorial normalization in Eq.(2)
        nextDerivativesLog[k] = faaSumLog + gammaln(float(k + 2))
        nextDerivativesSign[k] = faaSumSign
```


Now let’s run both implementations, evaluating up to the 15th derivative for 4 function iterations. Here my base function is $f(t) =1 -\frac{2}{\pi}\arccos t$. A plot of the base function is shown below in Figure 1.

The base function has a relatively straightforward Taylor expansion about $t=0$,

$\displaystyle f(t)\;=\;\frac{2}{\pi}\sum_{k=0}^{\infty}\frac{\binom{2k}{k}t^{2k+1}}{4^{k}\left ( 2k+1 \right )}\;\;\;,\;\;\;|t| \leq 1 \;\;,$    Eq.(3)

and so supplying the derivatives, $f^{(k)}(0)$, of the base function is easy. The screenshot below shows a comparison of $f_{l}^{(15)}(0)$ for $l\in \{2, 3, 4, 5\}$: we obtain identical output whether we use sympy’s Bell polynomials or sympy’s partition iterator.
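From Eq.(3), the non-zero derivatives at zero are $f^{(2k+1)}(0) = \frac{2}{\pi}\binom{2k}{k}\frac{(2k)!}{4^{k}}$ (the even derivatives vanish, since $f$ is odd). A minimal sketch of supplying them (my own snippet, not from the original post):

```python
import math

def base_derivatives_at_zero(n_max):
    """Derivatives f^(n)(0), n = 1..n_max, of f(t) = 1 - (2/pi)*arccos(t)."""
    derivs = [0.0] * n_max  # even-order derivatives stay zero
    for k in range(n_max):
        n = 2 * k + 1
        if n > n_max:
            break
        # n! times the Taylor coefficient of t^(2k+1) in Eq.(3)
        derivs[n - 1] = (2.0 / math.pi) * math.comb(2 * k, k) * math.factorial(2 * k) / 4**k
    return derivs

d = base_derivatives_at_zero(5)
print(d[0])  # f'(0) = 2/pi ≈ 0.63662
```

As a check, $f(t) = \frac{2}{\pi}\arcsin t$, so $f'(0)=2/\pi$, matching the first entry.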

The comparison of the implementations is not really a fair one: one implementation generates a lot of symbolic representations that aren’t really needed, whilst the other keeps to entirely numeric operations. However, it did highlight several points to me:

• Directly working with partitions, even up to moderate values of $n$, e.g. $n=50$, can be tractable using the sympy package in python.
• Sometimes a more concisely expressed representation (in this case, in terms of Bell polynomials) can lead to an implementation with significantly longer run-times, even if that representation can be implemented concisely (fewer lines of code).
• The history of the Faa di Bruno formula, and the various associated polynomials and equivalent formalisms (such as the Jabotinsky matrix formalism), is a fascinating one.

I’ve put the code for both methods of evaluating the derivatives of an iterated function in a gist on GitHub.

At the moment the functions take an array of Taylor expansion coefficients, i.e. they assume the point at which derivatives are requested is a fixed point of the base function. At some point I will add methods that take a user-supplied function for evaluating the $k^{th}$ derivative, $f^{(k)}(t)$, of the base function at any point $t$ and that return the derivatives, $f_{l}^{(k)}(t)$, of the iterated function.

I haven’t yet explored whether, for reasonable values of $n$ (say $n \leq 50$), I need to work on the log scale, or whether direct evaluation of the summand will be sufficiently accurate and not result in overflow errors.

# Manchester R User Group Meetup – May 2017

At the latest Manchester R User Group meeting (organized by Mango Solutions) Leanne Fitzpatrick from HelloSoda gave a talk on Deploying Models in a Machine Learning Environment.

Leanne spoke about how the use of Docker had speeded up the deployment of machine learning models into the production environment, and had also enabled easier monitoring and updating of the models.

One of the additional benefits, and Leanne hinted that this may even have been the original motivation, was that of reducing the barriers between the data scientists and software engineers in the company. Data Science is an extremely broad church, encompassing a wide range of skill-sets and disciplines. Inevitably, there can be culture-clashes between those who consider themselves to be from the ‘science’ side of Data Science and those from the engineering side. Scientists are people who like to explore data and develop proof-of-concept projects, but who are often not the most disciplined in code writing and organization, and for whom operational deployment of a model is the last stage in their thinking. Scientists break things. Scientists like to break things. Scientists learn by breaking things.

Data Scientists who break things can be seen as an annoyance to those responsible for maintaining the operational infrastructure.

Obviously, in a commercial environment the data scientists and software engineers/developers need to work as efficiently together as possible. The conclusion that Leanne presented in her talk suggested that HelloSoda have taken some steps towards solving this problem through their use of containerization of the models.  I say, ‘some steps’, as I can’t believe that any organization can completely remove all such barriers. Having worked in inter-disciplinary teams in both the commercial world and in academic research I’ve seen some teams work well together and others not. What tools and protocols an organization can use to generally reduce the barriers between investigative Data Science and operational Data Science is something that intrigues me – something for a longer post maybe.

# SciPy incomplete gamma function

I got tripped up by this recently when doing eigenvalue calculations in Python. I wanted to evaluate the lower incomplete gamma function $\gamma (a, x)\;=\;\int_{0}^{x}t^{a-1}e^{-t}dt$. After using the SciPy ${\tt gammainc}$ function I was scratching my head as to why I was seeing a discrepancy between my numerical calculations for the eigenvalues and my theoretical calculation. Then I came across this post by John D. Cook that helped clarify things: the SciPy function ${\tt gammainc}$ actually calculates the regularized form $\gamma(a,x)/\Gamma(a)$.
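So to recover the unregularized $\gamma(a,x)$, multiply back by $\Gamma(a)$. A minimal check (my own snippet), using a case with a closed form:

```python
import math
from scipy.special import gamma, gammainc

a, x = 3.0, 1.0
# SciPy's gammainc is the regularized lower incomplete gamma, P(a, x) = γ(a, x) / Γ(a),
# so multiplying by Γ(a) recovers the unregularized γ(a, x)
lower_gamma = gammainc(a, x) * gamma(a)

# closed form for a = 3: γ(3, 1) = 2 - 5/e (by integrating t^2 e^{-t} by parts)
print(abs(lower_gamma - (2.0 - 5.0 / math.e)) < 1e-12)  # True
```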