`simJM.Rd`

Simulates longitudinal data with normal error and (Cox-type) survival times
using the inversion method. The function `simJM()`

is a wrapper specifying
all predictors and the resulting data sets. The wrapper calls `rJM()`

to sample
the survival times, a modified version of `rSurvtime()`

from the R package
CoxFlexBoost.

```
simJM(nsub = 300, times = seq(0, 120, 1), probmiss = 0.75,
long_setting = "functional",
alpha_setting = if(nonlinear) "linear" else "nonlinear",
dalpha_setting = "zero", sigma = 0.3, long_df = 6, tmax = NULL,
seed = NULL, full = FALSE, file = NULL, nonlinear = FALSE,
fac = FALSE)
rJM(hazard, censoring, x, r,
subdivisions = 1000, tmin = 0, tmax,
file = NULL, ...)
```

- nsub
number of individuals for which longitudinal data and survival times should be simulated.

- times
vector of time points at which longitudinal measurements are "sampled".

- probmiss
proportion of longitudinal measurements to be set to missing. Used to induce sparsity in the longitudinal measurements.

- long_setting
Specification of the longitudinal trajectories of the sampled subjects. Preset specifications are

`"linear"`

,`"nonlinear"`

and`"functional"`

. See Details.- alpha_setting
specification of the association between survival and longitudinal. Preset specifications are

`"simple"`

,`"linear"`

,`"nonlinear"`

and`"nonlinear2"`

. See Details.- dalpha_setting
specification of the association between survival and the derivative of the longitudinal. Work in progress.

- sigma
standard deviation of the normal error around the true longitudinal measurements.

- long_df
number of basis functions from which functional random intercepts are sampled.

- tmax
For function

`simJM()`

, longest possible survival time, observations are censored after that timepoint. Defaults to`max(times)`

and should not be specified longer than`max(times)`

for longitudinal setting "functional". For function`rJM()`

, latest time point to sample a survival time.- seed
numeric scalar setting the random seed.

- full
logical indicating if only the longitudinal data set should be returned (

`FALSE`

) or additionally also the data for the survival part evaluated on a regular time grid and the longitudinal data set without longitudinal missings (`TRUE`

).- file
name of the data file the generated data set should be stored into (e.g., "simdata.RData") or NULL if the dataset should directly be returned in R.

- nonlinear
If set to

`TRUE`

, a nonlinear interaction between`alpha`

and`mu`

is simulated.- fac
If set to

`TRUE`

, a smooth interaction that varies by a factor is simulated.- hazard
complete hazard function to specify the joint model. Time must be the first argument.

- censoring
function to compute (random) censoring.

- x
matrix of sampled covariate values.

- r
matrix of sampled random coefficients.

- subdivisions
the maximum number of subintervals for the integration.

- tmin
earliest time point to sample a survival time.

- ...
further arguments to be passed to

`hazard`

or`censoring`

.

The function simulates longitudinal data basing on the given specification at given `times`

.
The full hazard is built from all joint model predictors \(\eta_{\mu}\), \(\eta_{\sigma}\),
\(\eta_{\lambda}\), \(\eta_{\gamma}\), \(\eta_{\alpha}\) as presented in
Koehler, Umlauf, and Greven (2016), see also `jm_bamlss`

. Survival times are sampled using the inversion
method (cf. Bender, Augustin, & Blettner, 2005). Additional censoring and missingness is
introduced. The longitudinal information is censored according to the survival information. The
user can also specify own predictors and use only `rJM`

to simulate survival times
accordingly.

Pre-specified functions for \(\eta_{\mu}\) in `long_setting`

are for `linear`

$$\eta_{\mu i}(t) = 1.25 + r_{1i} + 0.6 \sin(x_{2i}) + (-0.01) t + 0.02 r_{2i} t$$,
for `nonlinear`

$$\eta_{\mu i}(t) = 0.5 + r_{1i} + 0.6 \sin(x_{2i}) + 0.1 (t+1) \exp(-0.075 t)$$
and for `functional`

$$\eta_{\mu i}(t) = 0.5 + r_{1i} + 0.6 \sin(x_{2i}) + 0.1 (t+1) \exp(-0.075 t) + \sum_k \beta_{ki} B(t)$$,
where \(B(.)\) denotes a B-spline basis function and \(\beta_{ki}\) are the sampled penalized
coefficients from `gen_b`

per person.

Prespecified functions for \(\eta_{\alpha}\) in `alpha_setting`

are for `constant`

$$\eta_{\alpha}(t) = 1$$, for `linear`

$$\eta_{\alpha}(t) = 1 - 0.015 t$$, for
`nonlinear`

$$\eta_{\alpha}(t) = \cos((time-20)/20)$$, and for `nonlinear`

$$\eta_{\alpha}(t) = \cos((time-33)/33)$$.

Additionally the fixed functions for \(\eta_{\lambda} = 0.1(t+2)\exp(-0.075t)\) and \(\eta_{\lambda} = 0.1(t+2)\exp(-0.075t)\) are employed.

For `full = TRUE`

a list of the three `data.frame`

s is returned:

- data
Simulated dataset in long format including all longitudinal and survival covariates.

- data_grid
Dataset of the time-varying survival predictors which are not subject specific, evaluated at a grid of fixed time points.

- data_full
Simulated data set prior to generating longitudinal missings. Useful to assess the longitudinal fit.

For `full = FALSE`

only the first dataset is returned.

Covariates within these datasets include a subject identifier `id`

, the sampled survival
times `survtime`

, the event indicator `event`

, the time points of longitudinally
"observed" measurements `obstime`

, the longitudinal response `y`

, the cumulative
hazard at the survival time `cumhaz`

, as well as covariates `x1, x2`

, random effects

`r1, r2, b1, ...`

, and the true predictors `alpha, lambda, gamma, mu, sigma`

.

Hofner, B (2016). CoxFlexBoost: Boosting Flexible Cox Models (with Time-Varying Effects). R package version 0.7-0.

Bender, R., Augustin, T., and Blettner, M. (2005).
Generating Survival Times to Simulate Cox Proportional Hazards Models.
*Statistics in Medicine*, **24**, 1713-1723.

Koehler N, Umlauf N, Beyerlein, A., Winkler, C., Ziegler, A., and Greven S (2016). Flexible Bayesian Additive Joint Models with an
Application to Type 1 Diabetes Research. *(submitted)*

```
if (FALSE) ## Simulate survival data
## with functional random intercepts and a nonlinear effect
## of time, time-varying association alpha.
d <- simJM(nsub = 300)
head(d)
#> Error in head(d): object 'd' not found
## Simulate survival data
## with random intercepts/slopes and a linear effect of time,
## constant association alpha.
d <- simJM(nsub = 200, long_setting = "linear",
alpha_setting = "constant")
head(d)
#> id survtime event x1 x2 x3 r1 r2 b1
#> 1 1 67.63766 0 0.5462693 -2.379859 1 0.04729362 0.318103 -1.298252
#> 1.8 1 67.63766 0 0.5462693 -2.379859 1 0.04729362 0.318103 -1.298252
#> 1.12 1 67.63766 0 0.5462693 -2.379859 1 0.04729362 0.318103 -1.298252
#> 1.13 1 67.63766 0 0.5462693 -2.379859 1 0.04729362 0.318103 -1.298252
#> 1.24 1 67.63766 0 0.5462693 -2.379859 1 0.04729362 0.318103 -1.298252
#> 1.34 1 67.63766 0 0.5462693 -2.379859 1 0.04729362 0.318103 -1.298252
#> b2 b3 b4 b5 b6 cumhaz obstime dalpha
#> 1 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407 0 0
#> 1.8 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407 8 0
#> 1.12 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407 12 0
#> 1.13 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407 13 0
#> 1.24 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407 24 0
#> 1.34 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407 34 0
#> mu lambda alpha gamma dmu sigma y
#> 1 0.8831874 0.3802428 1 -4.938723 -0.003637939 -1.203973 0.5016701
#> 1.8 0.8540839 0.3802428 1 -4.938723 -0.003637939 -1.203973 1.1900108
#> 1.12 0.8395321 0.3802428 1 -4.938723 -0.003637939 -1.203973 1.3382066
#> 1.13 0.8358942 0.3802428 1 -4.938723 -0.003637939 -1.203973 0.6504030
#> 1.24 0.7958768 0.3802428 1 -4.938723 -0.003637939 -1.203973 0.6490512
#> 1.34 0.7594974 0.3802428 1 -4.938723 -0.003637939 -1.203973 0.8872736
```