May 2024
This project explores the effects of misspecified likelihood functions in Bayesian statistical analysis. Through simulation studies, I compare how different specifications of the likelihood function affect posterior distributions and survival curve estimates.
The simulation includes implementations of several posterior sampling techniques, including Gibbs sampling and the Metropolis-Hastings algorithm, to demonstrate their performance under various conditions.
For this simulation study, I used the following approach:
The primary metrics for comparison included posterior mean estimates, credible intervals, and mean squared error between the estimated and true survival curves.
Below is a sample of the R code used to implement the Metropolis-Hastings algorithm for one of the model specifications:
# Metropolis-Hastings implementation for exponential likelihood
metropolis_hastings <- function(data, n_iter, proposal_sd, prior_mean, prior_sd) {
# Initialize
theta_current <- mean(data) # Starting value
theta_samples <- numeric(n_iter)
# Metropolis-Hastings algorithm
for (i in 1:n_iter) {
# Propose new value
theta_proposal <- rnorm(1, mean = theta_current, sd = proposal_sd)
# Skip negative proposals for rate parameter
if (theta_proposal <= 0) {
theta_samples[i] <- theta_current
next
}
# Calculate log posterior ratios
log_post_current <- sum(dexp(data, rate = theta_current, log = TRUE)) +
dnorm(theta_current, mean = prior_mean, sd = prior_sd, log = TRUE)
log_post_proposal <- sum(dexp(data, rate = theta_proposal, log = TRUE)) +
dnorm(theta_proposal, mean = prior_mean, sd = prior_sd, log = TRUE)
# Calculate acceptance probability
log_accept_prob <- log_post_proposal - log_post_current
# Accept or reject
if (log(runif(1)) < log_accept_prob) {
theta_current <- theta_proposal # Accept
}
# Store current value
theta_samples[i] <- theta_current
}
return(theta_samples)
}
The code for survival curve estimation:
# Calculate survival curves based on posterior samples
calculate_survival_curves <- function(theta_samples, time_points) {
n_samples <- length(theta_samples)
n_times <- length(time_points)
survival_curves <- matrix(0, nrow = n_samples, ncol = n_times)
# Calculate survival probability for each time point and each posterior sample
for (i in 1:n_samples) {
theta <- theta_samples[i]
survival_curves[i, ] <- exp(-theta * time_points)
}
# Calculate mean and credible intervals
mean_curve <- colMeans(survival_curves)
lower_ci <- apply(survival_curves, 2, quantile, probs = 0.025)
upper_ci <- apply(survival_curves, 2, quantile, probs = 0.975)
return(list(
time = time_points,
mean = mean_curve,
lower = lower_ci,
upper = upper_ci
))
}
The simulation study revealed several key findings:
Below is a visualization of posterior distributions under different likelihood specifications:
[Posterior distribution comparison plot would appear here]
Comparison of estimated survival curves:
[Survival curves comparison plot would appear here]
This simulation study demonstrates the importance of correctly specifying likelihood functions in Bayesian analysis. Key conclusions include:
Future work could explore more robust approaches to handling model misspecification in Bayesian analysis, including the use of non-parametric priors or mixture models to accommodate a wider range of data-generating processes.