1 Introduction

Stan, a Bayesian modeling language, was released in 2012 to considerable fanfare. A shiny new inference algorithm, HMC with NUTS (1), promised and delivered fits that could not be fit before. And then deep learning happened. The release of TensorFlow (2) opened the doors of deep learning to all, and by 2016 it was gaining significant traction. One result was that people often assumed deep learning’s successes had usurped Bayesian modeling’s domain. This is not just our collective imagination (Bayesians like to ‘believe things’ after all): NSF reviews came back dismissing Stan funding because all the interesting work was assumed to be happening in deep learning. Recently in the UK, open skepticism was expressed about impact claims for Bayesian software in response to a research grant1. Mind you, deep learning and Bayesian modeling are conceptual cousins, but in the end they are very different from each other. They are better thought of as complementary than as antagonistic. Yet somehow Bayesians found themselves in deep learning’s shadow.

It is 2021, and this document presents some very simple analysis of the use of Bayesian and deep learning packages, as evidenced in the research literature, to get a perspective on what actually happened and is happening. The comparison aims to address the following goals with very simple research citation metrics:

  1. How does Bayesian modeling software stack up against deep learning without appeal to feature comparisons, performance arguments on suspect data sets, or achieved closeness to Platonic ideals? My response is to just go out and count how many citations the respective approaches have in the research literature. Counting and categorization, that’s it.
  2. Assess the impact of Bayesian modeling software using deep learning metrics as a yardstick: how big a fraction of a huge thing are we?
  3. Contextualize the role each approach plays by looking at subject-area distributions. The technologies have very different use cases, so one would expect variation.

Citation counting is a crude metric, but it has the advantage of simplicity. In compiling these metrics I came away with a very different opinion than the one I started with, so I thought it worth sharing. My prior assumption was that Bayesian modeling was very niche: scurrying around doing very useful and important science, but niche nonetheless. This analysis led me to revise that opinion considerably.

1.1 One of these things is not like the other: Characterizing the difference between Bayesian modeling and deep learning

What, if anything, are the hard differences between Bayesian modeling and deep learning? Deep learning can be implemented in Bayesian models, and lots of Bayesian concepts get used in the deep learning world, as one reviewer noted:

“There is a huge rise in the use of Bayesian Deep learning. Please do not fall into the trap of making DL look like some ‘other’ thing. It’s just a non/hyper-parametric model and as such we can do all the full Bayesian stuff - and many do! There are well-documented works on semi-structured HMC, Riemann MC and good old fashioned MCMC for use in big deep nets, let alone all the neat work on approximate Bayes, mainly using variational learning. Look at work on things like loss-calibrated Bayesian deep nets, Bayes DL, Bayesian autoencoding, ‘HamilTorch’ MCMC package for TensorFlow (https://adamcobb.github.io/journal/hamiltorch.html) etc.”

Perhaps Stan and PyMC3 are really part of the same discipline as deep learning, and this article merely catalogs usage differences between siblings from the same parents, sort of like the difference between R and Python?

But no. Another reviewer comment contains the key insight: they said it was no surprise that Bayesian software had high usage outside of computer science, since other subjects are “dominated by Stats & Applied Math.” I took this to mean that deep learning is not a natural candidate for use in those fields, while software like Stan and PyMC3 is. Unpacking that a bit, I’ll observe:

  • Unless the phenomenon under study is a physical neural net, deep learning only offers prediction services. While outstanding progress has been made and further progress may well involve Bayesian concepts, it is prediction in the end.
  • The basis of prediction in deep learning is opaque to human comprehension even if Bayesian techniques are used. This applies to generative neural nets as well.
  • Opacity blocks use in fields “dominated by Stats & Applied Math,” where the goal, for the most part, involves developing and fitting mechanistic models. The science is in the model description; the quality of fit validates the model. A high-quality fit in the absence of an understandable model does not help.

Deep learning clearly is used in mechanistic models, but typically as a sensor or classifier, e.g., classifying x-ray images for evidence of COVID pneumonia in an epidemiological study. That study itself will likely be a state-based model where transition rates are explicitly estimated statistically and are human interpretable.

Another case is hybrid models, where a deep learning component replaces some or all of the likelihood of a Bayesian model. While the overall model may be interpretable, the individual deep learning components remain black boxes. Interpretability for the overall model comes from having well-understood properties for the deep learning components, e.g., how they were trained, or from other ways of characterizing their role. For example, Maggie Lieu’s presentation at StanCon used a deep learning component to substitute for a numerically unstable halo mass function (3).

Bayesian models, too, can be utterly opaque if not authored with an eye to understandability, but the option for understandability exists and is generally the expectation. Deep learning systems cannot match this because a big portion of ‘authoring’ (hyperparameter setting) in those frameworks is beyond human understanding.

In the end I believe model interpretability cleanly differentiates the deep learners and Bayesian modelers2, at least in practice.

1.2 Defining what counts as a citation

For the purposes of this document we take Bayesian software to be Stan (4) and PyMC3 (5), with ecosystem components included that are likely to be cited. That includes the simplified-syntax interfaces to Stan, brms (6) and RStanArm (7), and the interface packages to Stan, RStan (8) and PyStan (9)3. The analysis applies only to packages that are in current development, so the venerated, and very high-impact, packages like BUGS (10) and JAGS (11) are not considered, although including them would double our counts. There is also the best-named Bayesian package of all time, “emcee: The MCMC Hammer” (12), which enjoys tremendous use in the astrophysics community but is too specialized to be considered a general Bayesian package, so it is not included. Also, the ecosystems are actually much larger, but the remaining components are unlikely to be cited in the research literature, and we have to stop somewhere. For deep learning we take TensorFlow, its interface Keras (13), and PyTorch (14) as roughly equivalent entities for the comparison. Theano (15) is no longer in development. It should be noted that both PyTorch and TensorFlow have implemented HMC with NUTS for Bayesian inference, but those are recent developments that have not made much of an impact yet.

The key resource behind this document is Elsevier’s https://scopus.com research search engine, which provides a tightly curated4 search index that includes sources outside of the academic behemoth’s own journals. It also provides a solid API (Application Programming Interface) and a classification of journals into subject areas. The actual form of the Scopus queries is discussed below, which should allow the questioning reader to verify and update the counts for the various packages discussed.
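For the curious reader, a minimal sketch of pulling a document count from the Scopus Search API in Python might look like the following. The endpoint, header names, and the placeholder query string ALL("PyMC3") are illustrative assumptions on my part, not the actual queries used for the counts in this document (those are discussed below); an Elsevier API key is assumed.

# Minimal sketch: count Scopus documents matching a query string.
# The query below is a placeholder, not one of the queries used in this analysis.
import requests

SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"

def count_scopus_hits(query: str, api_key: str) -> int:
    """Return the total number of Scopus documents matching a query."""
    response = requests.get(
        SCOPUS_SEARCH_URL,
        params={"query": query, "count": 1},  # only the total is needed, not the records
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )
    response.raise_for_status()
    results = response.json()["search-results"]
    return int(results["opensearch:totalResults"])

# Illustrative usage with a naive placeholder query:
# print(count_scopus_hits('ALL("PyMC3")', api_key="YOUR_KEY"))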

2 High-level prevalence of Deep Learning vs Bayesian Modeling

In the process of grant writing I create metrics to help justify projects, lately in partnership with PyMC and ArviZ through the scientific fiscal sponsor NumFOCUS, of which Stan is a member as well. Since Bayesian modeling always seems to be in the shadow of deep learning, I started tracking deep learning software packages as well for comparison. Below we see the relative citations of the top deep learning packages, TensorFlow, PyTorch, and the support package Keras, compared to the Bayesian packages PyMC3 and Stan with the supporting/derivative packages RStan, RStanArm, PyStan, and brms.
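To make the tallying concrete, here is a rough sketch of how per-package counts could be tabulated, reusing the hypothetical count_scopus_hits() helper sketched above. The query strings in the dictionary are again illustrative placeholders rather than the actual Scopus queries, which are given later.

# Sketch: tabulate and compare hit counts per package.
# Query strings are placeholders; substitute the actual Scopus queries.
packages = {
    "TensorFlow": 'ALL("TensorFlow")',
    "PyTorch":    'ALL("PyTorch")',
    "Keras":      'ALL("Keras")',
    "Stan":       'ALL("Stan") AND ALL("Bayesian")',  # placeholder to reduce false hits
    "PyMC3":      'ALL("PyMC3")',
}

counts = {name: count_scopus_hits(q, api_key="YOUR_KEY") for name, q in packages.items()}
total = sum(counts.values())
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {n:8d} {n / total:6.1%}")  # package, hit count, share of total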