---
title: "aggregate-estimands"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{aggregate-estimands}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width = 8,
  fig.height = 5,
  message = FALSE,
  warning = FALSE
)
library(ggplot2); library(dplyr); library(tidyr)
data.table::setDTthreads(1L)
options(dplyr.summarise.inform = FALSE, scipen = 999, digits = 5)
theme_set(theme_bw(base_size = 12))
```

> **Notation.** For symbol definitions, see the [notation vignette](notation.html).

## Overview

`multiple_treatment_group_analysis()` estimates child-penalty effects separately for each treatment group $d$ (age at first birth). Because groups differ in their baseline earnings, age composition, and selection, a single group's estimate is not directly comparable to another's. `aggregate_estimands()` collapses the group-specific estimates into a single number per event time, answering the question: *what is the average effect across the distribution of first-birth ages?*

Three aggregate estimands are available:

- **avg_of_ratios** ($\theta_{\text{Agg},1}$): weighted average of each group's normalized effect $\theta(g,d,d+e)$. Formally, $\theta_{\text{Agg},1}(g,e) = \sum_d w_d \cdot \theta(g,d,d+e)$, where $w_d$ are population weights normalised to sum to one. Preferred because normalization happens within each group before aggregation, so baseline differences do not inflate the average.
- **ratio_of_avgs** ($\theta_{\text{Agg},2}$): ratio of the weighted-average ATE to the weighted-average APO. Formally, $\theta_{\text{Agg},2}(g,e) = \sum_d w_d \cdot \text{ATE}(g,d,d+e) \;/\; \sum_d w_d \cdot \text{APO}(g,d,d+e)$. This implicitly up-weights high-earning groups (their larger APO contributes proportionally more to the pooled denominator), so it differs from $\theta_{\text{Agg},1}$ when earnings vary across groups.
- **gender_ineq** ($\Delta\rho_{\text{Agg}}$): weighted average of `NTD_New` $\Delta\rho$ estimates — the aggregate gender-inequality penalty. Formally, $\Delta\rho_{\text{Agg}}(e) = \sum_d w_d \cdot \Delta\rho(d,d+e)$.

## Simulate and estimate

We simulate 2 000 individuals spanning treatment groups 24–30 (the extra groups serve as clean controls for the youngest treated cohorts), then analyse groups 24–27 with two post-treatment periods.

```{r simulate_and_estimate}
library(childpen)

data <- simulate_data(n_individuals = 2000, treatment_groups = 24:30)

res <- multiple_treatment_group_analysis(
  data             = data,
  treatment_groups = 24:27,
  periods_post     = 2,
  periods_pre      = NULL,
  verbose          = FALSE
)
```

## Basic aggregation

Calling `aggregate_estimands()` with default arguments applies sample-proportion weights (`weights = "sample"`), where each treatment group is weighted by its share of the sample, as in Leventer (2025). Standard errors account for the estimation of these weights via the influence-function formula in the paper's Appendix G. Pass `weights = NULL` for uniform weights.

```{r basic_agg}
agg <- aggregate_estimands(res)
head(agg)
```

The output has one row per `event_time × estimand × method × agg_type` combination. The `n_groups` column records how many treatment groups contributed — useful for spotting cells where edge-of-sample groups drop out due to `max_age`/`min_age` bounds.

## Plot aggregate results

Plot the `avg_of_ratios` aggregate for the `NTD_Conv` method across event times, with 95% confidence ribbons.

```{r plot_avg_of_ratios}
agg |>
  filter(agg_type == "avg_of_ratios", method == "NTD_Conv", estimand == "theta") |>
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h)) +
  geom_ribbon(fill = "steelblue", alpha = 0.25, color = NA) +
  geom_line(color = "steelblue") +
  geom_point(color = "steelblue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey40") +
  labs(
    x     = "Event time",
    y     = expression(theta[Agg * "," * 1] ~ "(avg of ratios)"),
    title = "NTD_Conv aggregate: avg_of_ratios"
  )
```

## Compare aggregation types

`avg_of_ratios` and `ratio_of_avgs` answer the same underlying question but weight groups differently. When high first-birth-age groups have higher counterfactual earnings than low first-birth-age groups, `ratio_of_avgs` assigns them more weight (the APO enters the denominator for each group before the ratio is formed across groups). The two estimands therefore diverge whenever earnings are heterogeneous across groups.

```{r compare_agg_types}
agg |>
  filter(
    agg_type %in% c("avg_of_ratios", "ratio_of_avgs"),
    method   == "DID_Female",
    estimand == "theta"
  ) |>
  ggplot(aes(
    x     = event_time,
    y     = est,
    ymin  = ci_l,
    ymax  = ci_h,
    color = agg_type,
    fill  = agg_type
  )) +
  geom_ribbon(alpha = 0.2, color = NA) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey40") +
  labs(
    x     = "Event time",
    y     = expression(theta ~ "estimate"),
    title = "DID_Female: avg_of_ratios vs ratio_of_avgs",
    color = "Aggregation type",
    fill  = "Aggregation type"
  ) +
  theme(legend.position = "bottom")
```

If the two lines are similar, earnings are approximately homogeneous across groups. Divergence is a sign that the treatment-group distribution interacts with baseline earnings levels, and the researcher must decide which weighting regime is more policy-relevant.

## Custom weights

By default treatment groups are weighted by their sample proportions. Researchers may instead want to weight groups by their empirical share in a target population. Below we contrast two stylized distributions:

- **Right-skewed** (mimicking the US): most first births concentrate at younger ages.
- **Left-skewed** (mimicking Southern Europe): most first births concentrate at older ages.

```{r custom_weights}
groups <- as.character(24:27)

# Right-skewed: more weight on younger groups
w_us <- setNames(c(0.40, 0.30, 0.20, 0.10), groups)

# Left-skewed: more weight on older groups
w_se <- setNames(c(0.10, 0.20, 0.30, 0.40), groups)

agg_us <- aggregate_estimands(res, weights = w_us)
agg_se <- aggregate_estimands(res, weights = w_se)

bind_rows(
  mutate(agg_us, population = "US (right-skewed)"),
  mutate(agg_se, population = "Southern Europe (left-skewed)")
) |>
  filter(
    agg_type == "avg_of_ratios",
    method   == "NTD_Conv",
    estimand == "theta"
  ) |>
  ggplot(aes(
    x     = event_time,
    y     = est,
    ymin  = ci_l,
    ymax  = ci_h,
    color = population,
    fill  = population
  )) +
  geom_ribbon(alpha = 0.2, color = NA) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey40") +
  labs(
    x     = "Event time",
    y     = expression(theta[Agg * "," * 1]),
    title = "Aggregate NTD_Conv estimate under different first-birth distributions",
    color = "Population weights",
    fill  = "Population weights"
  ) +
  theme(legend.position = "bottom")
```

When younger groups bear larger penalties than older groups (or vice versa), the choice of weight vector shifts the aggregate estimate. Reporting both provides a sensitivity check on how much the aggregate depends on the first-birth distribution.

## Gender inequality aggregate

The `gender_ineq` aggregate ($\Delta\rho_{\text{Agg}}$) is the weighted average of `NTD_New` $\Delta\rho$ estimates across treatment groups. It measures how parenthood affects the female-to-male earnings ratio: negative values indicate that mothers bear a larger proportional penalty — the female-to-male earnings ratio is lower post-birth than the counterfactual ratio.

```{r gender_ineq}
agg |>
  filter(agg_type == "gender_ineq", estimand == "Delta_rho", method == "NTD_New") |>
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h)) +
  geom_ribbon(fill = "tomato", alpha = 0.25, color = NA) +
  geom_line(color = "tomato") +
  geom_point(color = "tomato") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey40") +
  labs(
    x     = "Event time",
    y     = expression(Delta * rho[Agg]),
    title = "Gender inequality aggregate (NTD_New, avg of ratios)"
  )
```

A flat, negative profile indicates a persistent parenthood-driven gender gap; values closer to zero at later event times suggest the gap narrows as parents age away from the birth event.