The ATE.ERROR.XY function estimates the Average Treatment Effect (ATE) when the covariates are measured with error and the binary outcome variable Y is subject to misclassification. It combines simulation, bootstrap sampling, and extrapolation to correct the ATE estimate for both sources of error.
First, we load the simulated data set provided with the package. It contains the observed outcome variable Y_star, which may be misclassified, and the observed covariate X_star, which is subject to measurement error.
library(ATE.ERROR)
set.seed(1)
data(Simulated_data)
Y_star <- Simulated_data$Y_star
Y <- Simulated_data$Y
A <- Simulated_data$T
Z <- Simulated_data$Z
X_star <- Simulated_data$X_star
X <- Simulated_data$X
p11 <- 0.8
p10 <- 0.2
sigma_epsilon <- 0.1
B <- 100
Lambda <- seq(0, 2, by = 0.5)
bootstrap_number <- 10
In this section, we load the package and the data. The probabilities p11 and p10 are set to 0.8 and 0.2, respectively. We define the measurement error variance sigma_epsilon and set the number of simulation steps (B), the sequence of lambda values (Lambda), and the number of bootstrap samples (bootstrap_number).
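To make the roles of p11, p10, and sigma_epsilon concrete, the sketch below generates error-prone data from clean toy data. It rests on assumptions that the text above does not state: that p11 means P(Y_star = 1 | Y = 1), that p10 means P(Y_star = 1 | Y = 0), and that the covariate follows a classical additive error model X_star = X + epsilon with Var(epsilon) = sigma_epsilon. All variable names here are hypothetical and independent of Simulated_data.

```r
# Toy illustration only; these vectors are NOT the package's Simulated_data.
set.seed(42)
n <- 1000
p11_sim <- 0.8   # assumed meaning: P(Y_star = 1 | Y = 1)
p10_sim <- 0.2   # assumed meaning: P(Y_star = 1 | Y = 0)
var_eps <- 0.1   # assumed meaning: variance of the additive covariate error

# Error-free covariate and binary outcome (hypothetical data-generating model)
X_true <- rnorm(n)
Y_true <- rbinom(n, size = 1, prob = plogis(0.5 * X_true))

# Classical measurement error: observed covariate = truth + independent noise
X_obs <- X_true + rnorm(n, mean = 0, sd = sqrt(var_eps))

# Misclassification: observe 1 with probability p11_sim if Y = 1, p10_sim if Y = 0
Y_obs <- rbinom(n, size = 1, prob = ifelse(Y_true == 1, p11_sim, p10_sim))

# Empirical versions of the two conditional probabilities
c(p11_hat = mean(Y_obs[Y_true == 1]), p10_hat = mean(Y_obs[Y_true == 0]))
```

Under these assumptions, the empirical proportions should land near 0.8 and 0.2, and X_obs should be more dispersed than X_true.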
We apply the ATE.ERROR.XY function using different types
of extrapolation: linear, quadratic, and nonlinear. The results from
each extrapolation are stored in separate variables.
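The effect of the extrapolation choice can be illustrated with a small sketch in the spirit of simulation-extrapolation (SIMEX): estimates are obtained at increasing levels of added error (lambda), a curve is fitted through them, and the curve is evaluated at lambda = -1, which corresponds to no error. Whether ATE.ERROR.XY follows exactly this scheme internally is an assumption; the ate_lambda values below are made up for illustration and are not output from the package. Only the linear and quadratic cases are sketched.

```r
# Hypothetical estimates at each lambda; NOT produced by ATE.ERROR.XY.
lambda <- seq(0, 2, by = 0.5)
ate_lambda <- c(0.120, 0.105, 0.093, 0.084, 0.077)

# Fit the two polynomial extrapolation functions
fit_linear    <- lm(ate_lambda ~ lambda)
fit_quadratic <- lm(ate_lambda ~ lambda + I(lambda^2))

# Evaluate each fitted curve at lambda = -1 (the SIMEX "no error" point)
target <- data.frame(lambda = -1)
extrapolated <- c(
  linear    = unname(predict(fit_linear, newdata = target)),
  quadratic = unname(predict(fit_quadratic, newdata = target))
)
round(extrapolated, 3)
```

Because the toy estimates flatten out as lambda grows, the quadratic curve extrapolates further from the lambda = 0 value than the straight line does. This is why different extrapolation functions can give different corrected estimates. In the package call, the choice is made through the extrapolation argument.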
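ATE.ERROR.XY_results_linear <- ATE.ERROR.XY(Y_star, A, Z, X_star, p11, p10, sigma_epsilon,
                                            B, Lambda, extrapolation = "linear", bootstrap_number)
ATE.ERROR.XY_results_quadratic <- ATE.ERROR.XY(Y_star, A, Z, X_star, p11, p10, sigma_epsilon,
                                               B, Lambda, extrapolation = "quadratic", bootstrap_number)
ATE.ERROR.XY_results_nonlinear <- ATE.ERROR.XY(Y_star, A, Z, X_star, p11, p10, sigma_epsilon,
                                               B, Lambda, extrapolation = "nonlinear", bootstrap_number)
The true ATE is added to the result summary, and the columns are ordered so that the true ATE and the naive ATE estimate appear first: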
combined_summary <- data.frame(True_ATE = round(True_ATE, 3), combined_summary)
print(combined_summary)
#>   True_ATE Naive_ATE_XY Sigma_epsilon p10 p11 Extrapolation   ATE    SE
#> 1    0.162        0.092           0.1 0.2 0.8        linear 0.149 0.004
#> 2    0.162        0.092           0.1 0.2 0.8     quadratic 0.153 0.012
#> 3    0.162        0.092           0.1 0.2 0.8     nonlinear 0.149 0.002
#>               CI
#> 1 (0.144, 0.157)
#> 2 (0.135, 0.168)
#> 3 (0.147, 0.153)
This table summarizes the results from the ATE.ERROR.XY function with different extrapolation methods. It includes the true ATE, the naive ATE estimate, the measurement error variance sigma_epsilon, the probabilities p10 and p11, the type of extrapolation, the corrected ATE estimate (ATE), its standard error (SE), and a 95% confidence interval (CI).
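As background for the SE and CI columns, the following sketch shows one common way to obtain a bootstrap standard error and a 95% percentile interval. It is a generic illustration on toy data with a simple difference in means; the estimator, the helper diff_in_means, and the use of a percentile interval are assumptions, not a description of how ATE.ERROR.XY computes its summary.

```r
# Generic bootstrap sketch on toy data; not the package's internal procedure.
set.seed(123)
n <- 500
treat <- rbinom(n, size = 1, prob = 0.5)
y <- rbinom(n, size = 1, prob = 0.3 + 0.15 * treat)

# Hypothetical estimator: difference in outcome means between the two arms
diff_in_means <- function(y, treat) mean(y[treat == 1]) - mean(y[treat == 0])

# Resample rows with replacement and re-estimate each time
n_boot <- 200
boot_est <- replicate(n_boot, {
  idx <- sample.int(n, size = n, replace = TRUE)
  diff_in_means(y[idx], treat[idx])
})

se_boot <- sd(boot_est)                                  # bootstrap standard error
ci_boot <- quantile(boot_est, probs = c(0.025, 0.975))   # 95% percentile interval
```

With few bootstrap replicates, such as the bootstrap_number of 10 used in this vignette, the standard error and interval endpoints are themselves quite variable, so a larger number of replicates is generally preferable when run time allows.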
We create a boxplot for the N estimates of ATE obtained from the
ATE.ERROR.XY function with different extrapolation
methods.
library(ggplot2)
combined_data <- rbind(
  ATE.ERROR.XY_results_linear$boxplot$data,
  ATE.ERROR.XY_results_quadratic$boxplot$data,
  ATE.ERROR.XY_results_nonlinear$boxplot$data
)
combined_plot <- ggplot(combined_data, aes(x = Method, y = ATE, fill = Method)) +
  geom_boxplot() +
  geom_hline(aes(yintercept = Naive_ATE_XY, color = "naive estimate"), 
             linetype = "dashed") +
  geom_hline(aes(yintercept = True_ATE, color = "true estimate"), 
             linetype = "dashed") +
  scale_color_manual(name = NULL, values = c("naive estimate" = "red", 
                                             "true estimate" = "blue")) +
  labs(title = paste("ATE Estimates from the ATE.ERROR.XY Method with different",
                     "Approximations of the Extrapolation Function", sep = "\n"),
       y = "ATE Estimate") +
  theme_minimal() +
  theme(legend.position = "right") +
  guides(fill = guide_legend(title = NULL, order = 1),
         color = guide_legend(title = NULL, override.aes = list(linetype = "dashed"),
                              order = 2))
print(combined_plot)
The Naive_Estimation function calculates the naive estimate of the ATE without correcting for measurement error or misclassification.
The ATE.ERROR.XY function corrects for measurement error and misclassification. In this example, its estimates (0.149 to 0.153) are much closer to the true ATE of 0.162 than the naive estimate of 0.092.
This vignette gives an overview of the ATE.ERROR.XY function, demonstrating how to apply it, interpret the results, and visualize the ATE estimates under the three extrapolation methods.