Monday, June 12, 2017

A brainstorm with Bayes, Laplace, Pearson and Fisher

In our last post we discussed statistical modelling. Briefly, we said that the main idea of statistical models lies in the quali-quantitative description of reality. If one accepts that data sampling and the modelling process are surrounded by several sources of uncertainty, then summarizing these uncertainties as probabilities is the most practical way to quantify them.
Indeed, within Statistics (which may be described as the mathematics of uncertainty) probability theory is a keystone, because through it one can carry out all inference about the parameters of a tested model quantitatively. Nevertheless, adopting a particular concept of probability has not always been so straightforward. Yes, there are different ways to define probability, and this has led to an endless debate about whether one is better than another.
Currently, statistical inference can be divided into two main strands: frequentist (or classical) and Bayesian inference. It is not our intention here to go deep into the philosophical discussion of both approaches. We will, however, briefly outline the main differences between them, and also give some reasons why we are inclined to use Bayesian inference in most of our studies. So, for the sake of simplicity, let’s introduce a quick way to contrast the two kinds of inference by drawing on the impressive archery skills of Merida (from the Disney animation).





Figure 1: Comics contrasting frequentist and Bayesian inference. Adapted from http://faculty.washington.edu/kenrice/BayesIntroClassEpi515.pdf 


The frequentist way of thinking
What can we learn from this short comic? Well, firstly, within frequentist inference, probability is defined as the long-run relative frequency of events, and for that one has to assume a purely random and well-defined experiment. For instance, in the comic the experiment would consist of Merida randomly shooting arrows at the target point (bullseye), and of you, sitting behind the target and trying to work out the exact position of the bullseye.
After every shot, you draw a 10 cm circle around the arrow and take this circle to represent your 95% confidence that it includes the bullseye. If Merida keeps shooting arrows at random (e.g. 100 arrows) and you keep drawing a 10 cm circle around each one, then at the end you would see that 95 of your circles overlap to some degree and the remaining 5 fell completely away from the target point. What would your conclusions be? Well, you would assume that the bullseye lies somewhere within the 95 overlapping circles, and you would therefore be 95% confident in your estimate, since 95 of the 100 shots fell very close to each other.
Note, however, that to arrive at such an outcome, Merida and you would have to repeat this experiment over and over, each repetition being a new experiment independent of the previous one. In other words, the 95% describes the long-run success rate of the circle-drawing procedure, not any single circle. In addition, although you would have 95% confidence in your estimate, it does not tell you the exact position of the bullseye, i.e., is it truly in the 95% region or could it also be within the remaining 5% region? So, what about Bayesian inference?
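To make the long-run reading of "95% confidence" concrete, here is a minimal simulation sketch in Python. It is a simplification of the comic, not a reproduction of it: we work in one dimension, assume the arrows scatter around the bullseye following a normal distribution with a known spread, and build a 95% interval around the mean of each round of shots. The function name and all the numbers are our own illustrative assumptions.

```python
import random
import statistics


def coverage_simulation(true_position=0.0, spread=10.0, shots_per_round=20,
                        n_rounds=2000, seed=1):
    """Share of 95% confidence intervals that contain the true bullseye position.

    Illustrative assumptions: one dimension, normally distributed shots,
    and a known spread (standard deviation) of the archer.
    """
    rng = random.Random(seed)
    z = 1.96  # 97.5% quantile of the standard normal distribution
    half_width = z * spread / shots_per_round ** 0.5
    hits = 0
    for _ in range(n_rounds):
        # One independent repetition of the whole experiment.
        shots = [rng.gauss(true_position, spread) for _ in range(shots_per_round)]
        centre = statistics.fmean(shots)
        if centre - half_width <= true_position <= centre + half_width:
            hits += 1
    return hits / n_rounds


coverage = coverage_simulation()
print(f"share of intervals covering the bullseye: {coverage:.3f}")
```

Under these assumptions the printed share should sit close to 0.95. Any single interval, however, either contains the bullseye or it does not, which is exactly the limitation raised above.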

The Bayesian way of thinking 
Well, under the Bayesian prism, probability is defined as an individual’s degree of belief in a particular event. Specifically, probability quantifies the plausibility attributed to a proposition whose truth is uncertain in the light of the available knowledge (Kinas & Andrade, 2010).
Let’s go back to the comic example: although you are sitting behind the target point, you may have some knowledge about the bullseye location from previous experience. This experience may come, for example, from having already done the same experiment with other archers, so that you have an overall idea of where the target point most probably is, or from knowing Merida’s extraordinary abilities so well that you expect her first shot to land very close to the bullseye (if not right on target!).
The point is that within Bayesian inference you can use your previous experience and update your estimate as more and more arrows are shot. A new experiment (i.e., a new shot) is thus conditioned on the results of the previous one, and you are able to update your estimates continuously until you reach the most probable outcome. As Kruschke (2014) put it, “Bayesian inference is reallocation of credibility across possibilities.”

So… why Bayesianism instead of Frequentism?
Most of the criticism of Bayesian inference targets its subjective definition of probability. Such critiques mainly grew after the 1920s, when Karl Pearson and Ronald Fisher first developed statistics as an information science. Nevertheless, this subjectivity is the cornerstone of Bayesian inference and reflects the available information, rather than being a merely arbitrary quantification, as is often thought. Moreover, scientific judgment about the best choice of these probabilities is as necessary as the decision about which model is most appropriate for the data at hand (Kinas & Andrade, 2010).
In conceptual terms, Bayesian inference is much simpler than frequentist inference if we consider that all questions can be answered through the analysis of the posterior distribution, which is obtained by means of Bayes’ Theorem (also known as the inverse probability theorem)[1]. In this light, the posterior distribution is the most complete way to express the state of knowledge about an investigated phenomenon.
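For readers who like to see the formula, Bayes’ Theorem can be written as follows. The notation is our own choice for this post: \theta stands for the unknown parameters (e.g. the bullseye position) and y for the observed data (e.g. where the arrows landed).

```latex
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)}
\;\propto\; p(y \mid \theta)\, p(\theta),
\qquad
p(y) \;=\; \int p(y \mid \theta)\, p(\theta)\, \mathrm{d}\theta
```

Here p(\theta) is the prior, p(y \mid \theta) is the likelihood, and p(\theta \mid y) is the posterior distribution. The denominator p(y) does not depend on \theta and only rescales the result so that the posterior integrates to one.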




Figure 2: Thomas Bayes (upper left panel), Pierre S. Laplace (upper right panel), Karl Pearson (lower left panel), and Ronald Fisher (lower right panel).


The essence of Bayesian inference lies precisely in the interplay between previous experience (known as the prior in Bayesian terminology) and the current experiment (known as the likelihood), which jointly reallocate credibility into the posterior distribution (from which all necessary inferences are drawn). This also shows that today’s posterior distribution can become tomorrow’s prior distribution (remember the comic), something that can never be assumed in the context of classical inference, where all experiments are independent of each other.
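The idea that today’s posterior becomes tomorrow’s prior can be sketched with one of the simplest Bayesian models, the Beta-Binomial. This is only an illustration under assumptions of our own: we pretend that each of Merida’s shots is either a hit or a miss, that we start from a flat Beta(1, 1) prior on her hit probability, and the counts for the two days are made up.

```python
def update_beta(prior, hits, misses):
    """Conjugate update: a Beta(a, b) prior combined with binomial data
    gives a Beta(a + hits, b + misses) posterior."""
    a, b = prior
    return (a + hits, b + misses)


def beta_mean(params):
    """Mean of a Beta(a, b) distribution."""
    a, b = params
    return a / (a + b)


flat_prior = (1.0, 1.0)                             # no preference before any shot
after_day1 = update_beta(flat_prior, hits=7, misses=3)
after_day2 = update_beta(after_day1, hits=9, misses=1)  # day-1 posterior used as prior

# Analysing both days in one go leads to the same posterior.
all_at_once = update_beta(flat_prior, hits=16, misses=4)

print("posterior after day 1:", after_day1, "mean:", round(beta_mean(after_day1), 3))
print("posterior after day 2:", after_day2, "mean:", round(beta_mean(after_day2), 3))
print("same as single analysis:", after_day2 == all_at_once)
```

Updating shot by shot, day by day, or all at once gives the same posterior in this model, which is why knowledge can be accumulated across experiments instead of starting from scratch each time.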
Furthermore, according to Jaynes (2003), this is also where one of the major differences between the two inferential branches arises, because probabilities change as we change our state of knowledge; frequencies, by contrast, do not. This also explains why particular questions cannot be answered under the frequency definition of probability. In environmental sciences, for example, questions such as “what is the probability that the current water use will remain sustainable?” or “what is the probability that area A has greater potential for conservation than area B?” can only be answered under the Bayesian prism.
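As a rough illustration of how a question like the "area A versus area B" one can be answered from posterior distributions, consider the sketch below. Everything in it is hypothetical: we invent survey data in which a species of interest was found in 18 of 30 plots in area A and in 12 of 30 plots in area B, take the occupancy rate as a stand-in for "conservation potential", and use flat Beta(1, 1) priors.

```python
import random


def prob_a_greater_than_b(successes_a, trials_a, successes_b, trials_b,
                          n_draws=20000, seed=7):
    """Monte Carlo estimate of P(theta_A > theta_B | data) under flat priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        # Beta(1 + successes, 1 + failures) is the posterior under a Beta(1, 1) prior.
        theta_a = rng.betavariate(1 + successes_a, 1 + trials_a - successes_a)
        theta_b = rng.betavariate(1 + successes_b, 1 + trials_b - successes_b)
        if theta_a > theta_b:
            wins += 1
    return wins / n_draws


# Hypothetical survey counts, invented for this example only.
p_a_better = prob_a_greater_than_b(18, 30, 12, 30)
print(f"P(area A beats area B | data) is roughly {p_a_better:.2f}")
```

The result is a direct probability statement about the two areas given the data and the priors, which is the kind of answer a manager can actually use when deciding where to invest conservation effort.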
Thus, since uncertainty is inherent to all scientific realms, its inclusion in the decision-making process is not only desirable but essential. Indeed, Bayesian inference has been successfully applied in a wide range of situations, such as cracking the German Enigma code during World War II, searching for the black box of the Air France aircraft that fell into the middle of the Atlantic Ocean in 2009, artificial intelligence, courtrooms, and medicine[2]. Bayesian approaches have also proven to be a powerful tool in environmental sciences, as they make it possible to incorporate all types of uncertainty into conservation and management decisions, and hence to help prevent catastrophes that might be irreversible.
As mentioned at the beginning of this post, the intention was not to discuss exhaustively every topic in the debate between frequentist and Bayesian inference. In fact, a wide range of interesting books and articles offer an in-depth discussion of this issue, highlighting pros and cons. If you are interested in further details on the use of Bayesian inference in ecology, we recommend the articles by Dennis (1996), Ellison (2004), Clark (2005), and Cressie et al. (2009). If you are interested in more theoretical issues, you may refer to Jaynes (2003), McCarthy (2007), Gelman et al. (2013), and Kruschke (2014).



[1] Though this Theorem is named after the English Reverend Thomas Bayes, its practical development and application to scientific problems was mainly carried out by the French scientist Pierre S. Laplace, which is also why some claim that the Theorem should rather be named the Bayes-Laplace Theorem!

[2] For an easy read about the historical application of Bayesian inference, the book The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines & Emerged Triumphant from Two Centuries of Controversy, written by Sharon B. McGrayne, is highly recommended.


References
Clark, J.S. 2005. Why environmental scientists are becoming Bayesians. Ecol. Lett., 8: 2-14.
Cressie, N.; Calder, C.A.; Clark, J.S.; Ver Hoef, J.M. & Wikle, C.K. 2009. Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling. Ecol. Appl., 19: 553-570.
Dennis, B. 1996. Should ecologists become Bayesians? Ecol. Appl., 6: 1095-1103.
Ellison, A.M. 2004. Bayesian inference in ecology. Ecol. Lett., 7: 509-520.
Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A. & Rubin, D.B. 2013. Bayesian data analysis. Chapman & Hall/CRC Press, 675p.
Jaynes, E.T. 2003. Probability Theory – The Logic of Science. Cambridge University Press, 727p.
Kinas, P.G. & Andrade, H.A. 2010. Introdução à Análise Bayesiana (com R). maisQnada, 240p.
Kruschke, J.K. 2014. Doing Bayesian data analysis: a tutorial with R, JAGS, and Stan. Elsevier, 759p.
McCarthy, M.A. 2007. Bayesian methods for Ecology. Cambridge University Press, 306p.



By Marie-Christine Rufener

Tuesday, April 11, 2017

After all, what is statistical modelling?

To inaugurate our posts, nothing could be more appropriate than to start by talking about statistical modelling. Every day we hear more and more about researchers developing mathematical models and applying them to issues such as predicting climate change, the effect of a company’s collapse on the stock exchange, an organism’s reaction to a new drug, and so on. Although a few people have some familiarity with the concept and its applicability, most struggle with what exactly a statistical model is and where it came from.
Since ancient times, humanity has been trying to understand the environment in which it lives. Without properly realizing it, humans have actually been using statistical models in their daily tasks for much longer than we can imagine. For instance, how did the ancient Egyptians know the right time to plant their wheat fields? Or, how did the ancient navigators know which wood was most appropriate for their ships, offering both greater resistance to damage and lighter weight for higher speed? A very simple answer to these questions could be: adaptive knowledge. But in what sense?
For thousands of years humans have collected information from their daily tasks and, through this acquired knowledge, the information has been passed on to successive generations which, in turn, steadily improved these tasks. So, in essence, what they were doing was nothing more than summarizing their knowledge (“data”) and speculating about a particular system. And that is exactly what statistical modelling is about!
In a very broad sense, models may be understood as a simplified way to summarize and describe a complex reality quali-quantitatively. They are usually used to classify particular events, untangle multiple influences, and identify and quantify patterns in the data. This latter aspect is of particular interest, since patterns can be used to forecast an outcome of interest, such as the weather, the economy or presidential elections.
Models can usually be decomposed into two parts: a deterministic component that summarizes a mathematical model, and a stochastic (random) component that constitutes the statistical part of the model (Kéry, 2010). Hilborn & Mangel (1997), inspired by Schnute’s (1987) expression, stated that in environmental sciences such as Ecology, the modelling process involves the work of a true ecological detective, since its goal is to seek the model that best approximates reality.
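The split into a deterministic and a stochastic component can be made concrete with a small simulation. The sketch below is only an illustration under assumptions of our own: the deterministic part is a straight line, the stochastic part is normally distributed noise, and the parameter values and names are invented for the example.

```python
import random


def simulate(x_values, intercept=2.0, slope=0.5, noise_sd=1.0, seed=42):
    """Simulate observations as deterministic part plus random part."""
    rng = random.Random(seed)
    # Deterministic component: the mathematical model (here a straight line).
    expected = [intercept + slope * x for x in x_values]
    # Stochastic component: random scatter of the observations around it.
    observed = [mu + rng.gauss(0.0, noise_sd) for mu in expected]
    return expected, observed


x = [float(i) for i in range(50)]
expected, observed = simulate(x)
residuals = [obs - mu for obs, mu in zip(observed, expected)]

print("first expected values:", [round(v, 2) for v in expected[:3]])
print("first observed values:", [round(v, 2) for v in observed[:3]])
print("mean residual:", round(sum(residuals) / len(residuals), 3))
```

In real applications we only see the observed values, and the modelling task is to work backwards to the deterministic part while describing the remaining scatter honestly.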
Nonetheless, sad but true, an ecological detective often faces the frustrating fact that environmental data can hardly be incorporated in full into a single model, because they are not only extremely complex but also very dynamic in space and time. In this sense, any modeler has to simplify the whole process being evaluated, which can often lead to misrepresentation of the data and, consequently, to biased interpretation of the results. That is also why an old environmental paradox is commonly paraphrased as: “everyone believes in data, except the collector, and no one believes in models, except the modeler.”[1]


Figure 1: Illustrative scheme highlighting the modelling steps (adapted from Lemos (2010) and inspired by Hilborn & Mangel (1997)).

Despite this, there is no reason to panic, because there is still a way to overcome this issue. Indeed, if one accepts that a certain degree of uncertainty has to be accounted for throughout the modelling process, then we can resolve this paradox and in fact reach an acceptable approximation of the investigated phenomenon. Figure 1 illustrates the entire modelling procedure, which may be roughly represented by four consecutive steps: data, knowledge (such as summaries provided by empirical models), understanding (i.e., conceptual inference about the underlying process), and model processing (the translation of environmental processes into mathematical equations).
Within the unifying framework that links the modelling steps, the exchange of information between data and models provides a general overview of the entire process being evaluated. Once the cycle is closed, discrepancies between expectations and observations can be traced to their sources. What’s more, this also makes it possible to make probabilistic statements about the data, the parameters and the complexity of the models, in order to minimize the uncertainties and, consequently, approach the desired realism.
Though we will probably never reach the perfect model, our job as environmental researchers is to gather every type of information and assemble it in the same way that a detective assembles clues into a coherent picture. As George Box (1979) said in one of his famous aphorisms: "the most that can be expected from any model is that it can supply a useful approximation to reality; all models are wrong, but some are useful". Thus, our duty as modelers is to create and choose the most parsimonious model in light of our own knowledge. As our knowledge is adaptive, so are our models. Models might be all that we have, and framing them as either "right" or "wrong" is just a matter of perspective. However, if we assume that models are approximations to the truth, then we have to admit that they are all right (at least to some degree).


1. This paradox refers to the one first posed by Beveridge (1957), who said: "No one believes an hypothesis except its originator, but everyone believes an experiment except the experimenter. Most people are ready to believe something based on experiment but the experimenter knows the many little things that could have gone wrong in the experiment. For this reason the discoverer of a new fact seldom feels quite so confident of it as others do. On the other hand other people are usually critical of an hypothesis, whereas the originator identifies himself with it and is liable to become devoted to it."


by Marie-Christine Rufener

References:

Box, G. E. P. 1979. Robustness in the strategy of scientific model building. Robustness in Statistics: 201-236.
Hilborn, R. & Mangel, M. 1997. The ecological detective: confronting models with data. New Jersey, Princeton University Press, 315p.
Kéry, M. 2010. Introduction to WinBUGS for ecologists: a Bayesian approach to regression, ANOVA, mixed models and related analyses. Burlington, Academic Press, 302p.
Lemos, R. S. T. 2010. Hierarchical Bayesian methods for marine sciences: analysis of climate variability and fish abundance. Doctoral Thesis, Universidade de Lisboa, Portugal, 182p.
Schnute, J. T. 1987. Data, uncertainty, model ambiguity, and model identification. Natural Resource Modeling 2: 159-212.