See the discussion of Bayesian Calibration in the Dakota User's Manual
\cite UsersMan.

<!--
The following information is outdated; the User's Manual has more up-to-date information.

We have three preliminary implementations of Bayesian calibration methods 
in Dakota, where a ``prior distribution'' on a parameter is 
updated through a Bayesian framework involving experimental data and 
a likelihood function. 
The theory behind Bayesian methods is best described in other sources 
\cite Kenn01 and only a brief summary is given here. 
In Bayesian methods, uncertain parameters are characterized by probability 
density functions. These probability density functions define the 
permissible parameter values (the support) as well as the relative
plausibility of each permissible parameter value. In the context of 
calibration or any inference step, the probability density function 
that describes knowledge before the incorporation of data is called 
the prior, \f$f_\Theta\left( \theta  \right)\f$.
 
When data is available, the likelihood function describes how well 
each parameter value is supported by the data. Bayes' Theorem \cite Jaynes, 
shown in Equation \ref eqnBayesThm, is used for inference: it 
derives the plausible parameter values from the prior probability 
density and the data \f$d\f$. The result is the posterior density 
of the parameters, \f$f_{\Theta |D}\left( {\theta |d} \right)\f$. It is 
interpreted the same way as the prior, but includes the information 
derived from the data.
 
\anchor eqnBayesThm
\f[
  {f_{\Theta |D}}\left( {\theta |d} \right) = \frac{{{f_\Theta }\left( \theta  \right)\mathcal{L}\left( {\theta ;d} \right)}}{{{f_D}\left( d \right)}}
\f]

The likelihood function is used to describe how well a model's 
predictions are supported by the data. 
The likelihood function can be written generally as:
\f[
  \mathcal{L}\left( {\theta ;d} \right) = f\left( {\mathcal{M}\left( \theta  \right) - d} \right)
\f]
where \f$\theta\f$ are the parameters of model \f$\mathcal{M}\f$. 
The function \f$f\f$ can greatly influence the results. 
The specific likelihood functions used in this example were based on 
Gaussian probability density functions. This means that we assume 
the difference between the model (e.g. a computer simulation)
and the experimental observations is Gaussian: 

\f[
d_i = \mathcal{M}(\theta) + \epsilon_i,
\f]
where \f$\epsilon_i\f$ is a random variable that can encompass both
measurement errors on \f$d_i\f$ and modeling errors associated with the
simulation \f$\mathcal{M}(\theta)\f$. We further 
assume that all experiments and observations are independent. 
If we have \f$n\f$ observations, 
the probabilistic model defined above results in a
likelihood function for \f$\theta\f$ that is the product of \f$n\f$ normal
probability density functions with common standard deviation \f$\sigma\f$,
as shown in Equation \ref eqnLikelihood.

\anchor eqnLikelihood
\f[
\mathcal{L}({\theta};d) = \prod_{i=1}^n
\frac{1}{\sigma \sqrt{2\pi}} \exp
\left[ - \frac{\left(d_i-\mathcal{M}({\theta})\right)^2}{2\sigma^2} \right]
\f]
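
As a worked step, taking the logarithm of Equation \ref eqnLikelihood turns
the product into a sum. The second term is proportional to the sum of squared
differences between the data and the model (the SSE); the first term does not
depend on \f$\theta\f$:

\f[
\ln \mathcal{L}({\theta};d) = - n \ln\left( \sigma \sqrt{2\pi} \right)
- \frac{1}{2\sigma^2} \sum_{i=1}^n \left(d_i-\mathcal{M}({\theta})\right)^2
\f]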

Markov Chain Monte Carlo (MCMC) is the standard method used to compute 
posterior parameter densities, given the observational data 
and the priors. There are many references that 
describe the basic algorithm \cite Gilks, and in addition, the algorithms 
are an active research area.  One variation used in Dakota is DRAM: 
Delayed Rejection and Adaptive Metropolis \cite Haario. Note that 
MCMC algorithms can take tens or hundreds of thousands of steps to converge. 
Because each step requires an evaluation of the model 
\f$\mathcal{M}(\theta)\f$, surrogate models of the simulation 
are often employed.
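
The basic idea can be sketched in a few lines of Python. This is a minimal
illustration only: it is plain random-walk Metropolis for one scalar
parameter, not DRAM and not the code used by Dakota or QUESO. The identity
model, the five data values, the uniform prior on [0, 3], the step size, and
all function names are assumptions made for this example.

```python
import math
import random


def log_likelihood(theta, data, model, sigma):
    """Log of the Gaussian likelihood for independent errors with std. dev. sigma."""
    n = len(data)
    sse = sum((d - model(theta)) ** 2 for d in data)
    return -n * math.log(sigma * math.sqrt(2.0 * math.pi)) - sse / (2.0 * sigma ** 2)


def metropolis(log_post, theta0, n_steps, step, rng):
    """Random-walk Metropolis sampler; returns the chain as a list."""
    chain = [theta0]
    current = log_post(theta0)
    for _ in range(n_steps):
        proposal = chain[-1] + rng.gauss(0.0, step)
        candidate = log_post(proposal)
        # Accept with probability min(1, posterior ratio), compared in log space.
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            chain.append(proposal)
            current = candidate
        else:
            chain.append(chain[-1])
    return chain


# Hypothetical setup: identity model, five observations, uniform prior on [0, 3].
data = [1.9, 2.1, 2.0, 2.2, 1.8]


def log_posterior(theta):
    if not 0.0 <= theta <= 3.0:
        return float("-inf")  # zero prior density outside the bounds
    return log_likelihood(theta, data, lambda t: t, 0.1)


rng = random.Random(348)
chain = metropolis(log_posterior, 1.0, 5000, 0.1, rng)
kept = chain[1000:]  # discard the first 1000 steps as burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

Every step calls the model once, which is why an expensive simulation makes
a long chain costly.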
 
As mentioned above, we have three implementations of Bayesian 
calibration:  one called QUESO, one called DREAM, and one called GPMSA. 
They are specified with the \c bayes_calibration \c queso
or \c bayes_calibration \c dream or 
\c bayes_calibration \c gpmsa, respectively.
The QUESO method uses components from the QUESO library
(Quantification of Uncertainty for Estimation, Simulation, and
Optimization) developed at The University of Texas at Austin.
DREAM uses the DREAM code developed by John Burkardt. 
It is based on the DiffeRential Evolution Adaptive
Metropolis approach which runs multiple different chains simultaneously
for global exploration, and automatically tunes the proposal covariance
during the process by a self-adaptive randomized subspace sampling
\cite Vrugt.
The GPMSA calibration capability uses the GPMSA code developed at 
Los Alamos National Laboratory.

In the QUESO method, the user can run the MCMC sampling with 
the simulation model \f$\mathcal{M}(\theta)\f$ directly. However, 
if the model is expensive, we recommend that the user employ 
a surrogate model (an emulator) because the Markov chain Monte Carlo sampling 
will be much faster:  the MCMC can generate thousands of samples 
on the emulator far more quickly than on the simulation. One can specify a Gaussian process,
a polynomial chaos expansion, or a stochastic collocation expansion 
as the emulator for the \c queso method. The specification 
details for these are listed in the Reference Manual. 
One can also specify various settings for the MCMC DRAM sampling: 
the sampling can use a standard Metropolis-Hastings algorithm 
or the adaptive Metropolis in which the covariance of the proposal 
density is updated adaptively. There is also a setting to control 
the delayed rejection. Finally, there are two scale factors 
which control the scaling of the problem. The 
\c likelihood_scale is a number by which the log of the 
likelihood is divided
(e.g. dividing the sum of squared differences
between the experimental data and simulation data, or SSE).  This
is useful for situations with very small likelihoods (e.g. the model is either
very far away from the data or there is a lot of data so the likelihood function
involves multiplying many likelihoods together, where the SSE term is large
and the likelihood becomes very small).
In some respects, the \c likelihood_scale can be seen as a normalizing factor
for the SSE.  If the SSE is large, the likelihood scale should be large.
The second factor is a \c proposal_covariance_scale 
which is a vector that controls the scaling of the proposal covariance
in the different input directions. This may be useful when the 
input variables being calibrated are of different magnitudes:  
one may want to take a larger step in a direction
with a larger magnitude, for example.
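
The effect of \c likelihood_scale described above can be illustrated with a
short Python sketch. It only demonstrates the arithmetic of dividing the log
of the likelihood by a scale factor; it is not Dakota's implementation, and
the function name and the numerical values are assumptions chosen for the
example.

```python
import math


def scaled_log_likelihood(sse, sigma, likelihood_scale):
    """SSE term of the Gaussian log-likelihood, divided by a scale factor."""
    return -sse / (2.0 * sigma ** 2) / likelihood_scale


sse, sigma = 10000.0, 1.0  # hypothetical large SSE
# Without scaling, exp(-5000) underflows to zero in double precision.
unscaled = math.exp(scaled_log_likelihood(sse, sigma, 1.0))
# With a scale of 100, exp(-50) is still representable.
scaled = math.exp(scaled_log_likelihood(sse, sigma, 100.0))
print(unscaled, scaled)
```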
    
For the DREAM method, one can define the number of chains used with
\c chains.  The total number of generations per chain in DREAM is
the number of samples divided by the number of chains.
The minimum number of chains is three.
The number of chains randomly selected to be used in the crossover
each time a crossover occurs is \c crossover_chain_pairs.
There is an extra adaptation during burn-in, in which DREAM estimates a
distribution of crossover probabilities that favors large jumps over
smaller ones in each of the chains.
Normalization is required to ensure that all of the input dimensions contribute
equally.  In this process, a discrete number of candidate points for
each crossover value is generated.  This parameter is \c num_cr.
The \c gr_threshold is the convergence tolerance for the Gelman-Rubin
statistic which will govern the convergence of the multiple chain
process.  The integer \c jump_step forces a long jump every 
\c jump_step generations.
For more details about these parameters, see \cite Vrugt. 
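
For readers unfamiliar with the Gelman-Rubin statistic, the following Python
sketch computes the textbook potential scale reduction factor for several
equal-length chains of a scalar parameter. It is an illustration under
stated assumptions: the DREAM code may compute the statistic differently,
and the function name, the chains, and the threshold value of 1.2 are
hypothetical.

```python
import math
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length scalar chains."""
    n = len(chains[0])
    # Mean of the within-chain sample variances.
    within = statistics.mean(statistics.variance(c) for c in chains)
    # Between-chain variance: n times the sample variance of the chain means.
    between = n * statistics.variance([statistics.mean(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Three chains (the minimum for DREAM) that overlap, then three that do not.
mixed = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5], [0.5, 1.5, 2.5, 3.5]])
apart = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0], [21.0, 22.0, 23.0, 24.0]])
threshold = 1.2  # hypothetical value, playing the role of gr_threshold
print(mixed < threshold, apart < threshold)
```

Values close to one indicate that the chains are sampling the same
distribution; values well above the threshold indicate that the chains have
not yet converged.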

GPMSA is another code that provides the capability for Bayesian 
calibration.
A key part of GPMSA is the construction of an emulator from simulation runs 
collected at various settings of input parameters. The emulator is a 
statistical model of the system response, and it is used to incorporate 
the observational data to improve system predictions and constrain or 
calibrate the unknown parameters. The GPMSA code draws heavily 
on the theory developed in the seminal Bayesian calibration paper 
by Kennedy and O'Hagan \cite Kenn01. The particular approach developed 
by the Los Alamos group is provided in \cite Hig08. GPMSA uses 
Gaussian process models in the emulation, but the emulator is 
actually a set of basis functions (e.g. from a singular value 
decomposition) which have GPs as the coefficients. One major 
difference between GPMSA and the QUESO implementation in Dakota 
is that the QUESO implementation does not have an explicit 
``discrepancy'' function \f$\delta\f$ which models the difference between 
the simulation and the observational data results in addition 
to the error term \f$\epsilon\f$, but GPMSA has a sophisticated 
model for the discrepancy term. 
At this point, the GPMSA implementation in Dakota is an early 
prototype. 
 
%At this point, the GPMSA library is a standalone C++ library which has 
%its own methods to create Gaussian process models, perform MCMC updating, 
%etc. The GPMSA C++ library is an alpha-version and undergoing development, 
%so a user is cautioned to obtain the latest version (e.g. Version-of-the-Day)
%to have the latest updates. We expect that future Dakota-GPMSA integration 
%will involve more interoperability and sharing of optimization 
%algorithms and surrogate models between Dakota and GPMSA, for example. 

We briefly describe the process of running QUESO from Dakota.
The user will create a Dakota input file such as the one shown 
in \ref figUQ18. 
Note that the method is \c bayes_calibration \c queso, 
specifying the QUESO algorithm. The number of 
samples indicates the number of samples that the MCMC algorithm 
will take, in this case 5000 (this usually will need to be larger). 
For this example, we are using the \c text_book analytic 
example, so we do not need to specify an emulator, but the lines 
commented out give an idea of the options if the user wanted to 
specify an emulator. This example uses the full DRAM algorithm (Delayed
Rejection Adaptive Metropolis). The likelihood is scaled, but the 
proposal covariance is not unless the user uncomments that line. 
The \c calibration_terms keyword in the responses section refers to the 
number of outputs that will be used in the calibration process:  
in this case, it is just one. The calibration data file 
has the observational data:  in this case, it is a freeform file 
(i.e. no header or annotation) with ten experiments. For each 
experiment, there is one standard deviation value indicating the 
error associated with that experiment. 

\anchor figUQ18
\verbatim
strategy,
        single_method
        tabular_data

method,
        bayes_calibration queso
#       emulator
#       gp
#        emulator_samples = 50
#       points_file = 'ros.txt' freeform
#        pce 
#        sparse_grid_level = 3 
          samples = 5000 #seed = 348                                    
          rejection delayed
          metropolis adaptive
          likelihood_scale = 100.0 
          output verbose
          #proposal_covariance_scale = 0.1 0.3
 
variables,
       continuous_design = 2
         lower_bounds = 0. 0.
         upper_bounds = 3. 3.
         initial_point = 1. 1.

interface,
         direct 
           analysis_driver = 'text_book'

responses,
        calibration_terms = 1
        calibration_data_file = 'dakota_queso.test10.txt'
          freeform
          num_experiments = 1
          num_replicates = 10 
          num_std_deviations = 1
        no_gradients
        no_hessians
\endverbatim
-->
<!--
\caption{Dakota input file for UQ example using Bayesian Calibration}
-->
<!--
When the input file shown in \ref figUQ18 is run, 
Dakota will run the MCMC algorithm and generate a posterior sample of 
\f$\theta\f$ in accordance with Bayes' Theorem (Equation \ref eqnBayesThm) and 
the likelihood function (Equation \ref eqnLikelihood). The MCMC sample output 
is put into a directory called \c outputData in the directory 
from which Dakota is run. In addition, the MCMC sample chain is 
written to a file in the run directory called \c QuesoOutput.txt. 
The first columns of this file are the sample inputs, the next columns 
are the responses, and the final column is the log of the 
likelihood.

We expect to continue development of Bayesian calibration methods, 
so check for updates to this capability. The QUESO and GPMSA capabilities
in Dakota currently rely on the QUESO library developed by The University of 
Texas at Austin. This integrated capability is still in prototype form and 
available to close collaborators of the Dakota team.  DREAM is distributed 
with Dakota.
If you are interested in these capabilities, contact the Dakota developers at  
dakota-developers@development.sandia.gov.
-->