[^updated]: Last updated: r format(Sys.time(), '%d %B, %Y')

\captionsetup[table]{labelformat=empty}

| Page| Variable | Label | |----:|:-------------|:---------------------------------------------| | \hyperlink{page.2}{2} | \hyperlink{page.2}{HISTID} |Historical unique identifier | | \hyperlink{page.3}{3} | \hyperlink{page.3}{byear} |Year of birth | | \hyperlink{page.4}{4} | \hyperlink{page.4}{bmonth} |Month of birth | | \hyperlink{page.5}{5} | \hyperlink{page.5}{dyear} |Year of death | | \hyperlink{page.6}{6} | \hyperlink{page.6}{dmonth} |Month of death | | \hyperlink{page.7}{7} | \hyperlink{page.7}{death_age} |Age at death (years) | | \hyperlink{page.8}{8} |\hyperlink{page.8}{sex} |Sex | | \hyperlink{page.9}{9} |\hyperlink{page.9}{race_first} |Race on First Application | | \hyperlink{page.10}{10} |\hyperlink{page.10}{race_first_cyear} |First Race: Application Year | | \hyperlink{page.11}{11} |\hyperlink{page.11}{race_first_cmonth}|First Race: Application Month | | \hyperlink{page.12}{12} |\hyperlink{page.12}{race_last} |Race on Last Application | | \hyperlink{page.13}{13} |\hyperlink{page.13}{race_last_cyear} |Last Race: Application Year | | \hyperlink{page.14}{14} |\hyperlink{page.14}{race_last_cmonth} |Last Race: Application Month | | \hyperlink{page.15}{15} |\hyperlink{page.15}{bpl} |Place of Birth | | \hyperlink{page.16}{16} |\hyperlink{page.16}{zip_residence} |ZIP Code of Residence at Time of Death | | \hyperlink{page.17}{17} |\hyperlink{page.17}{socstate} |State where Social Security Number Issued | | \hyperlink{page.18}{18} |\hyperlink{page.18}{age_first_application} |Age at First Social Security Application | | \hyperlink{page.19}{19} | \hyperlink{page.19}{link_abe_exact_conservative} |Flag for conservative ABE match | | \hyperlink{page.20}{20} | \hyperlink{page.20}{weight} |CenSoc weight | | \hyperlink{page.21}{21} | \hyperlink{page.21}{weight_conservative} |CenSoc weight (Conservative Sample) |

\vspace{150pt}

Summary: The CenSoc-Numident file (N = 9,398,581) links the full-count 1940 Census to the National Archives’ public release of the Social Security Numident file. Records were linked using the standard and conservative variants of the ABE method developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017). To merge 1940 Census variables with this dataset, researchers can obtain a copy of the 1940 Census from IPUMS-USA and link on the individual-level, unique identifier HISTID variable.

\newpage

\huge HISTID \normalsize \vspace{12pt}

Label: Historical Unique Identifier

Description: HISTID is a unique individual-level identifier. It can be used to merge the CenSoc-Numident file with the 1940 Full-Count Census from IPUMS.

\newpage

\huge byear \normalsize \vspace{12pt}

Label: Birth Year

Description: byear reports a person's year of birth, as recorded in the Numident death records.

## Library Packages
library(tidyverse)
library(data.table)
library(kableExtra)

## read in censoc_numident_v2.1 data file
censoc_numident_v2.1 <- read_csv("/data/censoc/censoc_data_releases/censoc_numident/censoc_numident_v2.1/censoc_numident_v2.1.csv")
byear_plot <- censoc_numident_v2.1 %>%
    group_by(byear) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = byear, y = n)) + 
    geom_line() + 
    geom_point() +
    theme_minimal(base_size = 15) +
  ggtitle("Year of Birth") + 
  theme(legend.position="bottom") +
  xlab("title") + 
  labs(x = "Year", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma, limits = c(0, 500000)) +
  scale_x_continuous(breaks = scales::pretty_breaks(n=5)) 

\vspace{75pt}

byear_plot

\newpage \huge bmonth \normalsize \vspace{12pt}

Label: Birth Month

Description: bmonth reports a person's month of birth, as recorded in the Numident death records.

## run in the console and copy and paste into documentation
bmonth_tabulated <- censoc_numident_v2.1 %>% 
    group_by(bmonth) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 2)) %>%
    mutate(label = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")) %>%
    select(bmonth, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{30pt}

bmonth_tabulated

\newpage

\huge dyear \normalsize \vspace{12pt}

Label: Death Year

Description: dyear reports a person's year of death, as recorded in the Numident death records.

dyear_plot <- censoc_numident_v2.1 %>%
    group_by(dyear) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = dyear, y = n)) + 
    geom_line() + 
    geom_point() +
    theme_minimal(base_size = 15) +  
  ggtitle("Year of Death") + 
  theme(legend.position="bottom") +
  xlab("title") + 
  labs(x = "Year", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma) + 
  scale_y_continuous(labels = scales::comma, limits = c(0, 800000)) + 
  scale_x_continuous(breaks = scales::pretty_breaks(n=5)) 

\vspace{75pt}

dyear_plot

\newpage

\huge dmonth \normalsize \vspace{12pt}

Label: Death Month

Description: dmonth reports a person's month of death, as recorded in the Numident death records.

dmonth_tabulated <- censoc_numident_v2.1 %>%
    group_by(dmonth) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 2)) %>%
    mutate(label = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")) %>%
    select(dmonth, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{30pt}

dmonth_tabulated

\newpage

\huge death_age \normalsize \vspace{12pt}

Label: Age at Death (Years)

Description: death_age reports a person's age at death in years, calculated using the birth and death information recorded in the Numident death records.

death_age_plot <- censoc_numident_v2.1 %>%
    group_by(death_age) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = death_age, y = n)) + 
    geom_line() + 
    geom_point() +
    theme_minimal(base_size = 15) + 
  ggtitle("Age at Death") + 
  theme(legend.position="bottom") +
  xlab("title") + 
  labs(x = "Age at Death", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma) + 
  scale_x_continuous(breaks = scales::pretty_breaks(n=5)) 

\vspace{75pt}

death_age_plot

\newpage

\huge sex \normalsize \vspace{12pt}

Label: Sex

Description: sex reports a person's sex, as recorded in the Numident death, application, or claim records.

sex_tabulated <- censoc_numident_v2.1 %>%
    group_by(sex) %>%
     tally() %>%
     mutate(freq = signif(n*100 / sum(n), 3)) %>%
     mutate(label = c("Men", "Women")) %>%
     select(sex, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{30pt}

sex_tabulated

\newpage

\newpage

\huge race_first \normalsize \vspace{12pt}

Label: race first

Description: race_first reports a person's race, as recorded on their first application entry.

Note: Before 1980, the race schema in the Social Security application form contained three categories: White, Black, and Other. In 1980, the SSA added three categories: (1) Asian, Asian American, or Pacific Islander, (2) Hispanic, and (3) North American Indian or Alaskan Native. The Other category was also removed.

race_first_tabulated <- censoc_numident_v2.1 %>%
    group_by(race_first) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 3)) %>%
    mutate(label = c("White", "Black", "Other", "Asian", "Hispanic", "North American Native", "Missing")) %>%
    select(race_first, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{30pt}

race_first_tabulated

\newpage

\huge race_first_cyear \normalsize \vspace{12pt}

Label: First Race: Application Year

Description: race_first_cyear is a numeric variable reporting the year of the application on which a person reported their first race.

\vspace{12pt}

\newpage

\huge race_first_cmonth \normalsize \vspace{12pt}

Label: First Race: Application Month

Description: race_first_cmonth is a numeric variable reporting the month of the application on which a person reported their first race.

\newpage

\huge race_last \normalsize \vspace{12pt}

Label: race last

Description: race_last reports a person's race, as recorded on their most recent application entry.

Note: Before 1980, the race schema in the Social Security application form contained three categories: White, Black, and Other. In 1980, the SSA added three categories: (1) Asian, Asian American, or Pacific Islander, (2) Hispanic, and (3) North American Indian or Alaskan Native. They also removed the Other category.

race_last_tabulated <- knitr::kable(censoc_numident_v2.1 %>%
    group_by(race_last) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 3)) %>%
    mutate(label = c("White", "Black", "Other", "Asian", "Hispanic", "North American Native", "Missing")) %>%
    select(race_last, label, n, `freq %` = freq))

\vspace{30pt}

race_last_tabulated

\newpage

\huge race_last_cyear \normalsize \vspace{12pt}

Label: First Race: Application Year

Description: race_last_cyear reports the year of the application on which a person reported their last race.

\newpage

\huge race_last_cmonth \normalsize \vspace{12pt}

Label: Last Race: Application Month

Description: race_last_cmonth is a numeric variable reporting the month of the application on which a person reported their last race.

\newpage

\huge bpl \normalsize \vspace{12pt}

Label: Birthplace

Description: bpl is a numeric variable reporting a person's place of birth, as recorded in the Numident application or claims records. The accompanying bpl_string variable reports the person's place of birth as a character string. The coding schema matches the detailed IPUMS-USA Birthplace coding schema.

For a complete list of IPUMS Birthplace codes, please see: https://usa.ipums.org/usa-action/variables/BPL

bpl_tabulation <- censoc_numident_v2.1 %>%
    filter(bpl < 10000 | is.na(bpl)) %>%
    group_by(bpl, bpl_string) %>%
    tally() %>%
    ungroup() %>%
    mutate(freq = round(n*100 / sum(n), 2)) %>%
    select(bpl, bpl_string, n, `freq %` = freq)

rows <- seq_len(nrow(bpl_tabulation) %/% 2)

knitr::kable(list(bpl_tabulation[rows,1:4],
           matrix(numeric(), nrow=0, ncol=1),
           bpl_tabulation[-rows, 1:4]),
      caption = "BPL Tabulation (Native born only)",
      label = "tables", format = "latex", booktabs = TRUE)  %>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage

\huge zip_residence

\normalsize

\vspace{12pt}

Label: ZIP Code of Residence at Time of Death

Description: zip_residence is a string variable (9-characters) reporting a person's ZIP Code of residence at time of death, as recorded in the Numident death records.

\newpage \huge socstate

\normalsize

\vspace{12pt}

Label: State where Social Security Number Issued

Description: The state in which a person's social security card was issued. Determined by first three (3) digits of Social Security number, as recorded in Numident death records. The accompanying socstate_string variable reports the state in which a person's social security card was issued as a character string. The coding schema matches the detailed IPUMS-USA Birthplace coding schema.

\vspace{30pt}

socstate_tabulation <- censoc_numident_v2.1 %>%
    filter(socstate < 10000 | is.na(socstate)) %>%
    group_by(socstate, socstate_string) %>%
    tally() %>%
    ungroup() %>%
    mutate(freq = round(n*100 / sum(n), 2)) %>%
    select(socstate, socstate_string, n, `freq %` = freq)

rows <- seq_len(nrow(socstate_tabulation) %/% 2)

knitr::kable(list(socstate_tabulation[rows,1:4],
           matrix(numeric(), nrow=0, ncol=1),
           socstate_tabulation[-rows, 1:4]),
      caption = "Tabulation of socstate (Native born only)",
      label = "tables", format = "latex", booktabs = TRUE)  %>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage

\huge age_first_app

\normalsize

\vspace{12pt}

Label: Age at First Social Security Application

Description: age_first_application reports the age at which a person submitted their first Social Security Application.

age_first_app_plot <- censoc_numident_v2.1 %>%
  group_by(age_first_application) %>%
  filter(age_first_application %in% c(0:110)) %>% 
  summarise(n = n()) %>%
  ggplot(aes(x = age_first_application, y = n)) + 
  geom_point() +
  geom_line() + 
  theme_minimal(15) + 
  theme(legend.position="bottom") +
  labs(title = "Age of First Application",
       x = "Age of First Application", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma) + 
  scale_x_continuous(breaks = scales::pretty_breaks(n=5)) 

\vspace{75pt}

age_first_app_plot

\newpage

\huge link_abe_exact_conservative

\normalsize

\vspace{12pt}

Label: Flag for conservative ABE match

Description: A flag variable reporting whether a match was established with the ABE conservative match with exact names.

Note: All matches were established with the standard ABE match. A subset of these records were also matched with the conservative ABE match.

link_abe_exact_conservative_tabulated <- censoc_numident_v2.1 %>%
  group_by(link_abe_exact_conservative) %>%
  tally() %>%
  mutate(freq = signif(n*100 / sum(n), 3)) %>%
  arrange(desc(link_abe_exact_conservative)) %>% 
  mutate(label = c("Match Established with Conservative ABE Algorithm", "Not Established")) %>%
  select(link_abe_exact_conservative, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{50pt}

link_abe_exact_conservative_tabulated

\newpage

\huge weight \normalsize

Label: Sample Weight

\vspace{12pt}

Description: A post-stratification person-weight to Human Mortality Database (HMD) totals for persons (1) born between 1895-1939 (2) dying between 1988-2005 (3) dying between ages 65-100. Please see the CenSoc Methods Protocol for more details on weighting procedure.

weights_tabulated <- censoc_numident_v2.1 %>%
  filter(!is.na(weight)) %>% 
  summarize('Min Weight' = round(min(weight),2), 'Max Weight' = round(max(weight), 2)) %>%
  mutate(id = 1:n()) %>% 
  pivot_longer(-id, names_to = "Label", values_to = "Value") %>% 
  select(Value, Label) %>% 
  add_row(Label = "No Weight Assigned", Value = NA) %>% 
  knitr::kable(format = "markdown")

weights <- censoc_numident_v2.1 %>% 
  filter(!is.na(weight)) %>% 
  group_by(death_age, dyear) %>% 
  summarize(weight = mean(weight))

## plot mortality sex ratio Lexis surface
weights_lexis <- weights %>% 
  ggplot() +
  geom_raster(aes(x = dyear, y = death_age,
                  fill = weight)) +
  ## Lexis grid
  geom_hline(yintercept = seq(65, 100, 10),
             alpha = 0.2, lty = "dotted") +
  geom_vline(xintercept = seq(1985, 2005, 10),
             alpha = 0.2, lty = "dotted") +
  geom_abline(intercept = seq(-100, 100, 10)-1910,
              alpha = 0.2, lty = "dotted") +
  scale_fill_viridis_c(option = "magma") +
  scale_x_continuous("Year", expand = c(0.02, 0),
                     breaks = seq(1988, 2005, 5)) +
  scale_y_continuous("Age", expand = c(0, 0),
                     breaks = seq(65, 100, 10)) +
  guides(fill = guide_legend(reverse = TRUE)) +
  # coord
  coord_equal() +
  # theme
  theme_void() +
  theme(
    axis.text = element_text(colour = "black"),
    axis.text.y = element_text(size = 10),
    axis.text.x = element_text(size = 10, angle = 45, hjust = .5), 
    plot.title = element_text(size = 10, vjust = 2),
    legend.text = element_text(size = 10), 
    axis.title=element_text(size = 10,face="bold")
  ) + 
  labs(X = "Year",
       Y = "Age",
       title = "Average weight by age at death and year of death") 

\vspace{50pt}

weights_tabulated

\vspace{50pt}

weights_lexis

\newpage

\huge weight_conservative \normalsize

Label: Sample Weights (Conservative Sample)

\vspace{12pt}

Description: A post-stratification person-weight to Human Mortality Database (HMD) totals for persons (1) born between 1895-1939 (2) dying between 1988-2005 (3) dying between ages 65-100. Please see the CenSoc Methods Protocol for more details on weighting procedure.

weights_conservative_tabulated <- censoc_numident_v2.1 %>%
  filter(!is.na(weight_conservative)) %>% 
  summarize('Min Weight' = round(min(weight_conservative),2), 'Max Weight' = round(max(weight_conservative), 2)) %>%
  mutate(id = 1:n()) %>% 
  pivot_longer(-id, names_to = "Label", values_to = "Value") %>% 
  select(Value, Label) %>% 
  add_row(Label = "No Weight Assigned", Value = NA) %>% 
  knitr::kable(format = "pipe")

weights_conservative <- censoc_numident_v2.1 %>% 
  filter(!is.na(weight_conservative)) %>% 
  group_by(death_age, dyear) %>% 
  summarize(weight_conservative = mean(weight_conservative))

## plot mortality sex ratio Lexis surface
weights_lexis_conservative <- weights_conservative %>% 
  ggplot() +
  geom_raster(aes(x = dyear, y = death_age,
                  fill = weight_conservative)) +
  ## Lexis grid
  geom_hline(yintercept = seq(65, 100, 10),
             alpha = 0.2, lty = "dotted") +
  geom_vline(xintercept = seq(1985, 2005, 10),
             alpha = 0.2, lty = "dotted") +
  geom_abline(intercept = seq(-100, 100, 10)-1910,
              alpha = 0.2, lty = "dotted") +
  scale_fill_viridis_c(option = "magma") +
  scale_x_continuous("Year", expand = c(0.02, 0),
                     breaks = seq(1988, 2005, 5)) +
  scale_y_continuous("Age", expand = c(0, 0),
                     breaks = seq(65, 100, 10)) +
  guides(fill = guide_legend(reverse = TRUE)) +
  # coord
  coord_equal() +
  # theme
  theme_void() +
  theme(
    axis.text = element_text(colour = "black"),
    axis.text.y = element_text(size=10),
    axis.text.x = element_text(size = 10, angle = 45, hjust = .5), 
    plot.title = element_text(size=10, vjust = 2),
    legend.text = element_text(size = 10), 
    axis.title=element_text(size=10,face="bold")
  ) + 
  labs(X = "Year",
       Y = "Age",
       title = "Average weight by age at death and year of death") 

\vspace{50pt}

weights_conservative_tabulated

\vspace{50pt}

weights_lexis_conservative


caseybreen/wcensoc documentation built on Nov. 21, 2024, 5:15 a.m.