Understanding the R Equivalent of JAGS’ “is Distributed As” Syntax
=====================================================
In this article, we’ll look at how to reproduce in R what JAGS/BUGS expresses with the ~ (“is distributed as”) operator when specifying distributions and estimating model parameters. We’ll focus on the dmvnorm() function from the mvtnorm package, which evaluates multivariate normal densities.
Background: Multivariate Normal Distribution
In probability theory, a multivariate normal distribution is the generalization of the one-dimensional normal distribution to higher dimensions. It’s commonly used in statistical modeling to represent correlated relationships between multiple variables, and it has several convenient properties: it is the conjugate prior for the mean of a Gaussian likelihood, and its maximum likelihood estimates have closed-form solutions.
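For reference, the density that everything below evaluates is the k-dimensional normal PDF with mean vector \mu and covariance matrix \Sigma:

f(x) = (2\pi)^{-k/2} \, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)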
R Equivalent: dmvnorm()
In R, the mvtnorm package implements the multivariate normal distribution. Its dmvnorm() function evaluates the density of a multivariate normal with a given mean vector and covariance matrix. Unlike the ~ statement in JAGS/BUGS, which declares a stochastic relationship inside a model, dmvnorm() simply returns density values, so it serves as the building block for likelihoods we construct ourselves.
## Specify a multivariate normal distribution
library(mvtnorm)
p <- dmvnorm(c(0, 0), mean = c(0, 0), sigma = diag(c(1, 2)))
In this example, sigma is a diagonal covariance matrix with the variance of each variable on the main diagonal, and mean is the mean vector. The first argument is the point at which the density is evaluated; its length must match the dimension of sigma, so a 2 x 2 covariance matrix requires a two-element point such as c(0, 0).
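As a quick sanity check (a sketch, not from the original), we can reproduce this value directly from the density formula above:

x <- c(0, 0); mu <- c(0, 0); S <- diag(c(1, 2))
k <- length(x)
dens <- (2 * pi)^(-k / 2) * det(S)^(-1 / 2) *
  exp(-0.5 * t(x - mu) %*% solve(S) %*% (x - mu))
all.equal(as.numeric(dens), p)  # TRUE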
Likelihood Calculation
When working with Bayesian methods or maximum likelihood estimation, it’s essential to be able to evaluate the likelihood function. For independent observations from a multivariate normal distribution, the likelihood is the product of the density evaluated at each observation:
Like <- function(p){
  # treat p as a single mean shared by all dimensions, so the
  # one-dimensional grid search over p below works
  prod(dmvnorm(y, mean = rep(p, ncol(y)), sigma = sigma))
}
This function computes the likelihood of observing the data y given the parameter p, here taken as a single mean shared by every dimension. dmvnorm() evaluates the multivariate normal probability density function (PDF) at each row of y, and prod() multiplies those densities together.
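The function reads y and sigma from the enclosing environment, so they must exist before Like() is called. The original does not show how they are created; a minimal simulated setup might look like this (the true mean of 0.5 and the sample size are arbitrary choices):

library(mvtnorm)
set.seed(1)
sigma <- diag(c(1, 2))  # known covariance matrix
y <- rmvnorm(100, mean = c(0.5, 0.5), sigma = sigma)  # data with a common mean of 0.5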
MLE Estimation
To find the maximum likelihood estimate (MLE), we can use numerical optimization or analytical results. For simplicity, let’s start with a grid search in R.
p <- seq(0, 1, by = 0.01)
l <- sapply(p, Like)  # Like() is not vectorized, so evaluate point by point
plot(p, l, type = "l")
This code evaluates the likelihood over a grid of candidate values for p and plots the result. The MLE is the value of p at the peak of the curve, p[which.max(l)].
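A grid search is fine in one dimension, but R’s built-in optimizers scale better and avoid underflow by working on the log scale. A sketch using optimize(), assuming the simulated y and sigma from above:

logLike <- function(p) sum(dmvnorm(y, mean = rep(p, ncol(y)), sigma = sigma, log = TRUE))
opt <- optimize(logLike, interval = c(0, 1), maximum = TRUE)
opt$maximum        # numerical MLE of the common mean
p[which.max(l)]    # grid-search answer, for comparison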
Bayesian Estimation using Metropolis-Hastings
For Bayesian methods, we can use the Metropolis-Hastings algorithm to sample from the posterior distribution. The implementation below is an independence sampler: it proposes values uniformly on [0, 1], which, combined with an implicit uniform prior on the same interval, reduces the acceptance ratio to a simple likelihood ratio:
MH <- function(N = 1000, p0 = runif(1)){
  log.like <- function(p){
    # log-likelihood of a common mean p; note log = TRUE, since summing
    # raw densities would not give the log-likelihood
    sum(dmvnorm(y, mean = rep(p, ncol(y)), sigma = sigma, log = TRUE))
  }
  ll0 <- log.like(p0)
  r <- c(p0, rep(0, N))
  for(i in 1:N){
    p1 <- runif(1)  # independence proposal, uniform on [0, 1]
    ll1 <- log.like(p1)
    # accept with probability min(1, exp(ll1 - ll0))
    if(ll1 > ll0 || log(runif(1)) < ll1 - ll0){
      p0 <- p1
      ll0 <- ll1
    }
    r[i + 1] <- p0
  }
  return(r)
}
set.seed(123)
p <- MH(10000)
plot(density(p))
abline(v = c(mean(p), mean(p) + c(-1, 1) * qnorm(0.975) * sd(p)))
This code runs the sampler for 10,000 iterations and plots a kernel density estimate (KDE) of the draws. The vertical lines mark the posterior mean and an approximate 95% credible interval based on a normal approximation (mean ± 1.96 × posterior standard deviation); quantile(p, c(0.025, 0.975)) would give a quantile-based interval instead.
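Before reading too much into the density plot, it is worth checking how well the chain mixed. A minimal diagnostic sketch (not in the original):

plot(p, type = "l", ylab = "p")          # trace plot of the chain
mean(diff(p) != 0)                       # rough acceptance rate
quantile(p[-(1:1000)], c(0.025, 0.975))  # quantile interval after discarding burn-in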
By following these steps, we’ve reproduced in R what JAGS/BUGS expresses with its ~ (“is distributed as”) syntax: specifying a multivariate normal distribution with dmvnorm(), building a likelihood from it, and estimating parameters both by maximum likelihood and by Metropolis-Hastings sampling.
Last modified on 2024-02-04