`bamlss.Rd`

This is the main model fitting function of the package. Function `bamlss()`

is a wrapper function that parses the `data`

and the model `formula`

, or
extended `bamlss.formula`

, as well as the `bamlss.family`

into a `bamlss.frame`

. The `bamlss.frame`

then holds all model
matrices and information that is needed for setting up estimation engines.
The model matrices are based on `mgcv`

infrastructures, i.e.,
smooth terms are constructed using `smooth.construct`

and
`smoothCon`

. Therefore, all `mgcv`

model term constructors like
`s`

, `te`

, `t2`

and `ti`

can be used. Identifiability conditions are imposed using function `gam.side`

.

After the `bamlss.frame`

is set up function `bamlss()`

applies optimizer
and/or sampling functions. These functions can also be provided by the user. See the details
below on how to create new engines to be used with function `bamlss()`

.

Finally, the estimated parameters and/or samples are used to create model output results like summary statistics or effect plots. The computation of results may also be controlled by the user.

```
bamlss(formula, family = "gaussian", data = NULL,
start = NULL, knots = NULL, weights = NULL,
subset = NULL, offset = NULL, na.action = na.omit,
contrasts = NULL, reference = NULL, transform = NULL,
optimizer = NULL, sampler = NULL, samplestats = NULL,
results = NULL, cores = NULL, sleep = NULL,
combine = TRUE, model = TRUE, x = TRUE,
light = FALSE, ...)
```

- formula
A formula or extended formula, i.e., the

`formula`

can be a`list`

of formulas where each list entry specifies the details of one parameter of the modeled response distribution, see`bamlss.formula`

. For incorporating smooth terms, all model term constructors implemented in`mgcv`

such as`s`

,`te`

and`ti`

can be used, amongst others.- family
A

`bamlss.family`

object, specifying the details of the modeled distribution such as the parameter names, the density function, link functions, etc. Can be a character without the`"_bamlss"`

extension of the`bamlss.family`

name.- data
A

`data.frame`

or`list`

containing the model response variable(s) and covariates specified in the`formula`

. By default the variables are taken from`environment(formula)`

: typically the environment from which`bamlss`

is called.- start
A named numeric vector containing starting values to be send to the

`optimizer`

and/or`sampler`

function. For a possible naming convention for the parameters see function`parameters`

, but this is not restrictive and engine specific.- knots
An optional list containing user specified knots, see the documentation of function

`gam`

.- weights
Prior weights on the data.

- subset
An optional vector specifying a subset of observations to be used in the fitting process.

- offset
Can be used to supply model offsets for use in fitting.

- na.action
A function which indicates what should happen when the data contain

`NA`

's. The default is set by the`na.action`

setting of`options`

, and is`na.omit`

if set to`NULL`

.- contrasts
An optional list. See the

`contrasts.arg`

of`model.matrix.default`

.- reference
A

`character`

specifying a reference category, e.g., when fitting a multinomial model.- transform
A transformer function that is applied on the

`bamlss.frame`

. See, e.g., function`randomize`

and`bamlss.engine.setup`

.- optimizer
An optimizer function that returns, e.g., posterior mode estimates of the parameters as a named numeric vector. The default optimizer function is

`opt_bfit`

. If set to`FALSE`

, no optimizer function will be used.- sampler
A sampler function that returns a matrix of samples, the columns represent the parameters, the rows the iterations. The returned matrix must be coerced to an object of class

`"mcmc"`

, see`as.mcmc`

. The default sampler function is`sam_GMCMC`

. If set to`FALSE`

, no sampler function will be used.- samplestats
A function computing statistics from samples, per default function

`samplestats`

is used. If set to`FALSE`

, no`samplestats`

function will be used. Note that this option is crucial for very large datasets, as computing statistics from samples this way may be very time consuming!- results
A function computing results from the parameters and/or samples, e.g., for creating effect plots, see function

`link{results.bamlss.default}`

. If set`FALSE`

no`results`

function will be used.- cores
An integer specifying the number of cores that should be used for the sampler function. This is based on function

`mclapply`

of the parallel package.- sleep
Time the system should sleep before the next core is started.

- combine
If samples are computed on multiple cores, should the samples be combined into one

`mcmc`

matrix?- model
If set to

`FALSE`

the model frame used for modeling is not part of the return value.- x
If set to

`FALSE`

the model matrices are not part of the return value.- light
Should the returned object be lighter, i.e., if

`light = TRUE`

the returned object will not contain the model.frame and design and penalty matrices are deleted.- ...
Arguments passed to the

`transformer`

,`optimizer`

,`sampler`

,`results`

and`samplestats`

function.

The main idea of this function is to provide infrastructures that make it relatively easy to create estimation engines for new problems, or write interfaces to existing software packages.

The steps that are performed within the function are:

First, the function parses the

`data`

, the`formula`

or the extended`bamlss.formula`

as well as the`bamlss.family`

into a model frame like object, the`bamlss.frame`

. This object holds all necessary model matrices and information that is needed for subsequent model fitting engines. Per default, all package`mgcv`

smooth term constructor functions like`s`

,`te`

,`t2`

and`ti`

can be used (see also function`smooth.construct`

), however, even special user defined constructors can be included, see the manual of`bamlss.frame`

.In a second step, the

`bamlss.frame`

can be transformed, e.g., if a mixed model representation of smooth terms is needed, see function`randomize`

.Then an optimizer function is started, e.g., a function that finds posterior mode estimates of the parameters. A convention for model fitting engines is that such functions should have the following arguments:

`optimizer(x, y, family, start, weights, offset, ...)`

Internally, function

`bamlss()`

will send the`x`

object that holds all model matrices, the response`y`

object, the`family`

object,`start`

ing values for the parameters, possible`weights`

and`offset`

s of the created`bamlss.frame`

to the optimizer function (see the manual of`bamlss.frame`

for more details on the`x`

,`y`

and other objects). The job of the optimizer is to return a named numeric vector of optimum parameters. The names of the parameters should be such that they can be uniquely mapped to the corresponding model matrices in`x`

. See function`parameters`

for more details on parameter names. The default optimizer function is`bfit`

. The optimizer can return more information than only the optimum parameters. It is possible to return a list, the convention here is that an element named`"parameters"`

then holds the named vector of estimated parameters. Possible other return values could be fitted values, the Hessian matrix, information criteria or information about convergence of the algorithm, etc. Note that the parameters are simply added to the`bamlss.frame`

in an (list) entry named`parameters`

.After the optimization step, a

`sampler`

function is started. The arguments of such sampler functions are the same as for the`optimizer`

functions`sampler(x, y, family, start, weights, offset, ...)`

Sampler functions must return a matrix of samples, each row represents one iteration and the matrix can be coerced to

`mcmc`

objects. The function may return a list of samples, e.g., if multiple chains are returned each list entry then holds one sample matrix of one chain. The column names of the sample matrix should be the same as the names of estimated parameters. For a possible naming convention see function`parameters`

, which ensures unique mapping of samples with the model matrices in the`x`

object of the`bamlss.frame`

. The samples are added to the`bamlss.frame`

in an (list) entry named`samples`

.Next, the

`samplestats`

function is applied. This function can compute any quantity from the samples and the`x`

object, the arguments of such functions are`samplestats(samples, x, y, family, ...)`

where argument

`samples`

are the samples returned from the`sampler`

function, and`x`

,`y`

and`family`

are the same objects as passed to the optimizer and or sampler functions. For example, the default function in`bamlss()`

for this task is also called`samplestats`

and returns the mean of the log-likelihood and the log-posterior computed of all samples, as well as the DIC.The last step is to compute more complex information about the model using the

`results`

function. The arguments of such`results`

functions are`results(bamlss.frame, ...)`

here, the full

`bamlss.frame`

including possible`parameters`

and`samples`

is passed to the function within`bamlss()`

. The default function for this task is`results.bamlss.default`

which returns an object of class`"bamlss.results"`

for which generic plotting functions are and a`summary`

function is provided. Hence, the user can control the output of the model, the plotting and summary statistics, too.

Note that function `transform()`

, `optimizer()`

, `sampler()`

, `samplestats()`

and `results()`

can be provided from the `bamlss.family`

object, e.g.,
if a `bamlss.family`

object has an element named `"optimizer"`

, which
represents a valid optimizer function such as `bfit`

, exactly this optimizer
function will be used as a default when using the family.

An object of class `"bamlss"`

. The object is in principle only a slight extension
of a `bamlss.frame`

, i.e., if an `optimizer`

is applied it will hold the
estimated parameters in an additional element named `"parameters"`

. If a sampler function
is applied it will additionally hold the samples in an element named `"samples"`

.
The same mechanism is used for `results`

function.

If the `optimizer`

function computes additional output next to the parameters, this will
be saved in an element named `"model.stats"`

. If a `samplestats`

function is applied,
the output will also be saved in the `"model.stats"`

element.

Additionally, all functions that are called are saved as attribute `"functions"`

in the
returned object.

Umlauf N, Klein N, Zeileis A (2019). BAMLSS: Bayesian Additive Models for Location,
Scale and Shape (and Beyond). *Journal of Computational and Graphical Statistics*, **27**(3), 612--627.
doi: 10.1080/10618600.2017.1407325

Umlauf N, Klein N, Simon T, Zeileis A (2021).
bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond).
*Journal of Statistical Software*,
**100**(4), 1--53. doi: 10.18637/jss.v100.i04

```
if (FALSE) ## Simulated data example.
d <- GAMart()
f <- num ~ s(x1) + s(x2) + s(x3) + te(lon, lat)
b <- bamlss(f, data = d)
#> Error in bamlss(f, data = d): object 'd' not found
summary(b)
#> Error in summary(b): object 'b' not found
plot(b)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b' not found
plot(b, which = 3:4)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b' not found
plot(b, which = "samples")
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b' not found
## Use of optimizer and sampler functions:
## * first run optimizer,
b1 <- bamlss(f, data = d, optimizer = opt_bfit, sampler = FALSE)
#> Error in bamlss(f, data = d, optimizer = opt_bfit, sampler = FALSE): object 'd' not found
print(b1)
#> Error in print(b1): object 'b1' not found
summary(b1)
#> Error in summary(b1): object 'b1' not found
## * afterwards, start sampler with staring values,
b2 <- bamlss(f, data = d, start = coef(b1), optimizer = FALSE, sampler = sam_GMCMC)
#> Error in bamlss(f, data = d, start = coef(b1), optimizer = FALSE, sampler = sam_GMCMC): object 'd' not found
print(b2)
#> Error in print(b2): object 'b2' not found
summary(b2)
#> Error in summary(b2): object 'b2' not found
## Continue sampling.
b3 <- continue(b2, n.iter = 12000, burnin = 0, thin = 10)
#> Error in continue(b2, n.iter = 12000, burnin = 0, thin = 10): object 'b2' not found
plot(b3, which = "samples")
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b3' not found
plot(b3, which = "max-acf")
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b3' not found
plot(b3, which = "max-acf", burnin = 500, thin = 4)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b3' not found
```