In caseybreen/wcensoc:

[^updated]: Last updated: `r format(Sys.time(), '%d %B, %Y')`

\captionsetup[table]{labelformat=empty}

| Page | Variable | Label |
|----:|:-----------------------|:---------------------------------------------|
| \hyperlink{page.2}{2} | \hyperlink{page.2}{HISTID} | Historical unique identifier |
| \hyperlink{page.3}{3} | \hyperlink{page.3}{byear} | Year of birth |
| \hyperlink{page.4}{4} | \hyperlink{page.4}{bmonth} | Month of birth |
| \hyperlink{page.5}{5} | \hyperlink{page.5}{dyear} | Year of death |
| \hyperlink{page.6}{6} | \hyperlink{page.6}{dmonth} | Month of death |
| \hyperlink{page.7}{7} | \hyperlink{page.7}{death_age} | Age at death (years) |
| \hyperlink{page.8}{8} | \hyperlink{page.8}{sex} | Sex |
| \hyperlink{page.9}{9} | \hyperlink{page.9}{race_first} | Race on First Application |
| \hyperlink{page.10}{10} | \hyperlink{page.10}{race_first_cyear} | First Race: Application Year |
| \hyperlink{page.11}{11} | \hyperlink{page.11}{race_first_cmonth} | First Race: Application Month |
| \hyperlink{page.12}{12} | \hyperlink{page.12}{race_last} | Race on Last Application |
| \hyperlink{page.13}{13} | \hyperlink{page.13}{race_last_cyear} | Last Race: Application Year |
| \hyperlink{page.14}{14} | \hyperlink{page.14}{race_last_cmonth} | Last Race: Application Month |
| \hyperlink{page.15}{15} | \hyperlink{page.15}{bpl} | Place of Birth |
| \hyperlink{page.16}{16} | \hyperlink{page.16}{zip_residence} | ZIP Code of Residence at Time of Death |
| \hyperlink{page.17}{17} | \hyperlink{page.17}{socstate} | State where Social Security Number Issued |
| \hyperlink{page.18}{18} | \hyperlink{page.18}{age_first_application} | Age at First Social Security Application |
| \hyperlink{page.19}{19} | \hyperlink{page.19}{link_abe_exact_conservative} | Flag for conservative ABE match |
| \hyperlink{page.20}{20} | \hyperlink{page.20}{weight} | CenSoc weight |
| \hyperlink{page.21}{21} | \hyperlink{page.21}{weight_conservative} | CenSoc weight (Conservative Sample) |
| \hyperlink{page.22}{22} | \hyperlink{page.22}{Additional IPUMS variables} | Additional variables include pernum, perwt, age, mbpl, fbpl, educd, educ_yrs, empstatd, hispan, inconwg, marst, nativity, occ, occscore, ownership, race, rent, serial, statefip, urban |

\vspace{100pt}

Summary: The CenSoc-Numident Version 2.1 Demo dataset (N = 85,865) links the IPUMS 1940 1% census sample to the National Archives' public release of the Social Security Numident file. Records were linked using the standard and conservative variants of the ABE method developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017). Note that the demo file is not conducive to high-resolution mortality research. For that purpose, we recommend that researchers obtain a copy of the 1940 Census from IPUMS-USA and link it to the CenSoc-Numident file on the unique individual-level identifier HISTID.

\newpage

\huge HISTID \normalsize \vspace{12pt}

Label: Historical Unique Identifier

Description: HISTID is a unique individual-level identifier. It can be used to merge the CenSoc-Numident file with the 1940 Full-Count Census from IPUMS.
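
A minimal sketch of such a merge, using two small made-up data frames in place of the real files. The object names and all values below are hypothetical; only the column names come from this codebook.

```r
# Toy stand-ins for the CenSoc-Numident file and an IPUMS 1940 extract.
# All values are invented for illustration.
censoc <- data.frame(
  HISTID = c("A1", "B2", "C3"),
  byear  = c(1910, 1915, 1920),
  dyear  = c(1990, 1995, 2000),
  stringsAsFactors = FALSE
)
ipums_1940 <- data.frame(
  HISTID = c("A1", "B2", "C3", "D4"),
  age    = c(30, 25, 20, 40),
  stringsAsFactors = FALSE
)

# Inner join on HISTID: keeps only census records with a CenSoc match.
linked <- merge(censoc, ipums_1940, by = "HISTID")
print(linked)
```

An inner join is used here because census records without a Numident match carry no mortality information; a left join from the census side would keep them with missing death variables.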

\newpage

\huge byear \normalsize \vspace{12pt}

Label: Birth Year

Description: byear reports a person's year of birth, as recorded in the Numident death records.

## Library Packages
library(tidyverse)
library(data.table)
library(kableExtra)

## read in censoc_numident_v2.1 data file
censoc_numident_v2.1<-read_csv("/data/censoc/censoc_data_releases/censoc_numident_demo/censoc_numident_demo_v2.1/censoc_numident_demo_v2.1.csv")

byear_plot <- censoc_numident_v2.1 %>%
    group_by(byear) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = byear, y = n)) + 
    geom_line() + 
    geom_point() +
    theme_minimal(base_size = 15) +
  ggtitle("Year of Birth") + 
  theme(legend.position="bottom") +
  labs(x = "Year", 
       y = "Count") + 
  # scale_y_continuous(labels = scales::comma, limits = c(0, 5000)) +
  scale_x_continuous(breaks = scales::pretty_breaks(n=5))

\vspace{75pt}

byear_plot

\newpage \huge bmonth \normalsize \vspace{12pt}

Label: Birth Month

Description: bmonth reports a person's month of birth, as recorded in the Numident death records.

## run in the console and copy and paste into documentation
bmonth_tabulated <- censoc_numident_v2.1 %>% 
    group_by(bmonth) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 2)) %>%
    mutate(label = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")) %>%
    select(bmonth, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{30pt}

bmonth_tabulated

\newpage

\huge dyear \normalsize \vspace{12pt}

Label: Death Year

Description: dyear reports a person's year of death, as recorded in the Numident death records.

dyear_plot <- censoc_numident_v2.1 %>%
    group_by(dyear) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = dyear, y = n)) + 
    geom_line() + 
    geom_point() +
    theme_minimal(base_size = 15) +  
  ggtitle("Year of Death") + 
  theme(legend.position="bottom") +
  labs(x = "Year", 
       y = "Count") + 
  # Fix the y-axis at 0 to 7,000; without explicit limits the axis would start near 3,000 deaths a year
  scale_y_continuous(labels = scales::comma, limits = c(0, 7000)) +
  scale_x_continuous(breaks = scales::pretty_breaks(n=5)) 

\vspace{75pt}

dyear_plot

\newpage

\huge dmonth \normalsize \vspace{12pt}

Label: Death Month

Description: dmonth reports a person's month of death, as recorded in the Numident death records.

dmonth_tabulated <- censoc_numident_v2.1 %>%
    group_by(dmonth) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 2)) %>%
    mutate(label = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")) %>%
    select(dmonth, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{30pt}

dmonth_tabulated

\newpage

\huge death_age \normalsize \vspace{12pt}

Label: Age at Death (Years)

Description: death_age reports a person's age at death in years, calculated using the birth and death information recorded in the Numident death records.
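
One way such a calculation could look is sketched below. The exact rule used to construct death_age is not stated in this codebook, so the function name and the handling of deaths in the birth month (counted here as having reached the birthday) are assumptions.

```r
# Hypothetical helper: age at death in completed years from year and month
# of birth and death. Deaths in the birth month are assumed to fall on or
# after the birthday, because day of month is not used here.
age_at_death <- function(byear, bmonth, dyear, dmonth) {
  dyear - byear - as.integer(dmonth < bmonth)
}

# Born June 1910, died March 1990: birthday not yet reached in 1990.
age_before_birthday <- age_at_death(1910, 6, 1990, 3)
# Born June 1910, died June 1990: treated as birthday reached.
age_same_month <- age_at_death(1910, 6, 1990, 6)

print(c(age_before_birthday, age_same_month))
```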

death_age_plot <- censoc_numident_v2.1 %>%
    group_by(death_age) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = death_age, y = n)) + 
    geom_line() + 
    geom_point() +
    theme_minimal(base_size = 15) + 
  ggtitle("Age at Death") + 
  theme(legend.position="bottom") +
  labs(x = "Age at Death", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma) + 
  scale_x_continuous(breaks = scales::pretty_breaks(n=5))

\vspace{75pt}

death_age_plot

\newpage

\huge sex \normalsize \vspace{12pt}

Label: Sex

Description: sex reports a person's sex, as recorded in the Numident death, application, or claim records.

sex_tabulated <- censoc_numident_v2.1 %>%
    group_by(sex) %>%
     tally() %>%
     mutate(freq = signif(n*100 / sum(n), 3)) %>%
     mutate(label = c("Men", "Women")) %>%
     select(sex, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{30pt}

sex_tabulated

\newpage

\huge race_first \normalsize \vspace{12pt}

Label: Race on First Application

Description: race_first reports a person's race, as recorded on their first application entry.

Note: Before 1980, the race schema in the Social Security application form contained three categories: White, Black, and Other. In 1980, the SSA added three categories: (1) Asian, Asian American, or Pacific Islander, (2) Hispanic, and (3) North American Indian or Alaskan Native. The Other category was also removed.

race_first_tabulated <- censoc_numident_v2.1 %>%
    group_by(race_first) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 3)) %>%
    mutate(label = c("White", "Black", "Other", "Asian", "Hispanic", "North American Native", "Missing")) %>%
    select(race_first, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{30pt}

race_first_tabulated

\newpage

\huge race_first_cyear \normalsize \vspace{12pt}

Label: First Race: Application Year

Description: race_first_cyear is a numeric variable reporting the year of the application on which a person reported their first race.

\vspace{12pt}

\newpage

\huge race_first_cmonth \normalsize \vspace{12pt}

Label: First Race: Application Month

Description: race_first_cmonth is a numeric variable reporting the month of the application on which a person reported their first race.

\newpage

\huge race_last \normalsize \vspace{12pt}

Label: Race on Last Application

Description: race_last reports a person's race, as recorded on their most recent application entry.

race_last_tabulated <- censoc_numident_v2.1 %>%
    group_by(race_last) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 3)) %>%
    mutate(label = c("White", "Black", "Other", "Asian", "Hispanic", "North American Native", "Missing")) %>%
    select(race_last, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{30pt}

race_last_tabulated

\newpage

\huge race_last_cyear \normalsize \vspace{12pt}

Label: Last Race: Application Year

Description: race_last_cyear reports the year of the application on which a person reported their last race.

\newpage

\huge race_last_cmonth \normalsize \vspace{12pt}

Label: Last Race: Application Month

Description: race_last_cmonth is a numeric variable reporting the month of the application on which a person reported their last race.

\newpage

\huge bpl \normalsize \vspace{12pt}

Label: Birthplace

Description: bpl is a numeric variable reporting a person's place of birth, as recorded in the Numident application or claims records. The accompanying bpl_string variable reports the person's place of birth as a character string. The coding schema matches the detailed IPUMS-USA Birthplace coding schema.

For a complete list of IPUMS Birthplace codes, please see: https://usa.ipums.org/usa-action/variables/BPL

bpl_tabulation <- censoc_numident_v2.1 %>%
    filter(bpl < 10000 | is.na(bpl)) %>%
    group_by(bpl, bpl_string) %>%
    tally() %>%
    ungroup() %>%
    mutate(freq = round(n*100 / sum(n), 2)) %>%
    select(bpl, bpl_string, n, `freq %` = freq)

rows <- seq_len(nrow(bpl_tabulation) %/% 2)

knitr::kable(list(bpl_tabulation[rows,1:4],
           matrix(numeric(), nrow=0, ncol=1),
           bpl_tabulation[-rows, 1:4]),
      caption = "BPL Tabulation (Native born only)",
      label = "tables", format = "latex", booktabs = TRUE)  %>%
  kableExtra::kable_styling(latex_options = c("HOLD_position"))

\newpage

\huge zip_residence

\normalsize

\vspace{12pt}

Label: ZIP Code of Residence at Time of Death

Description: zip_residence is a 9-character string variable reporting a person's ZIP Code of residence at time of death, as recorded in the Numident death records.
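
Many analyses only need the 5-digit ZIP Code. A minimal sketch of extracting it, using invented values and a hypothetical `zip5` name:

```r
# Invented example values: a 9-character ZIP+4 string, a 5-digit ZIP, and a
# missing value.
zip_residence <- c("941101234", "10027", NA)

# Keep the first five characters; missing values stay missing.
zip5 <- substr(zip_residence, 1, 5)
print(zip5)
```

When reading the file, it is safer to parse this column as character, for example with `col_types = cols(zip_residence = col_character())` in `read_csv()`, so that ZIP Codes with leading zeros are not truncated by numeric parsing.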

\newpage \huge socstate

\normalsize

\vspace{12pt}

Label: State where Social Security Number Issued

Description: socstate reports the state in which a person's Social Security card was issued, determined by the first three digits of the Social Security number as recorded in the Numident death records. The accompanying socstate_string variable reports the same state as a character string. The coding schema matches the detailed IPUMS-USA Birthplace coding schema.

\vspace{30pt}

socstate_tabulation <- censoc_numident_v2.1 %>%
    filter(socstate < 10000 | is.na(socstate)) %>%
    group_by(socstate, socstate_string) %>%
    tally() %>%
    ungroup() %>%
    mutate(freq = round(n*100 / sum(n), 2)) %>%
    select(socstate, socstate_string, n, `freq %` = freq)

rows <- seq_len(nrow(socstate_tabulation) %/% 2)

knitr::kable(list(socstate_tabulation[rows,1:4],
           matrix(numeric(), nrow=0, ncol=1),
           socstate_tabulation[-rows, 1:4]),
      caption = "Tabulation of socstate (Native born only)",
      label = "tables", format = "latex", booktabs = TRUE)  %>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage

\huge age_first_application

\normalsize

\vspace{12pt}

Label: Age at First Social Security Application

Description: age_first_application reports the age at which a person submitted their first Social Security Application.

age_first_app_plot <- censoc_numident_v2.1 %>%
  group_by(age_first_application) %>%
  filter(age_first_application %in% c(0:110)) %>% 
  summarise(n = n()) %>%
  ggplot(aes(x = age_first_application, y = n)) + 
  geom_point() +
  geom_line() + 
  theme_minimal(15) + 
  theme(legend.position="bottom") +
  labs(title = "Age of First Application",
       x = "Age of First Application", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma) + 
  scale_x_continuous(breaks = scales::pretty_breaks(n=5))

\vspace{75pt}

age_first_app_plot

\newpage

\huge link_abe_exact_conservative

\normalsize

\vspace{12pt}

Label: Flag for conservative ABE match

Description: A flag variable indicating whether the match was also established by the conservative variant of the ABE algorithm using exact names.

Note: All matches were established with the standard ABE match. A subset of these records were also matched with the conservative ABE match.
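
A minimal sketch of restricting an analysis to the conservative sample. It assumes the flag is coded 1 for a conservative match and 0 otherwise, which this codebook does not state explicitly; the data frame and its values are invented.

```r
# Toy data; the 1/0 coding of the flag is an assumption.
censoc <- data.frame(
  HISTID = c("A1", "B2", "C3", "D4"),
  link_abe_exact_conservative = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Keep only records that were also matched by the conservative ABE variant.
conservative_sample <- censoc[censoc$link_abe_exact_conservative == 1, ]
print(conservative_sample)
```

Analyses on this subset would typically use weight_conservative rather than weight.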

link_abe_exact_conservative_tabulated <- censoc_numident_v2.1 %>%
  group_by(link_abe_exact_conservative) %>%
  tally() %>%
  mutate(freq = signif(n*100 / sum(n), 3)) %>%
  arrange(desc(link_abe_exact_conservative)) %>% 
  mutate(label = c("Match Established with Conservative ABE Algorithm", "Not Established")) %>%
  select(link_abe_exact_conservative, label, n, `freq %` = freq) %>% 
  knitr::kable(format = "pipe")

\vspace{50pt}

link_abe_exact_conservative_tabulated

\newpage

\huge weight \normalsize

Label: Sample Weight [^1]

[^1]: The IPUMS-USA 1940 1% sample also includes a person weight (perwt) that accounts for the 1940 sampling procedure; the 100% complete-count 1940 census has no such weight. For analysis, we recommend using both sets of weights. A final weight can be constructed by multiplying the two weights together.

\vspace{12pt}

Description: A post-stratification person weight that matches Human Mortality Database (HMD) totals for persons (1) born between 1895 and 1939, (2) dying between 1988 and 2005, and (3) dying between ages 65 and 100. Please see the CenSoc Methods Protocol for more details on the weighting procedure.
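
A minimal sketch of combining the CenSoc weight with the IPUMS person weight by multiplication and using the result in a weighted mean. The data values and the `final_weight` column name are invented for illustration.

```r
# Toy data; weight is the CenSoc weight, perwt the IPUMS person weight.
d <- data.frame(
  death_age = c(70, 80, 90),
  weight    = c(10, 20, 30),
  perwt     = c(100, 100, 100)
)

# Final weight: the product of the two weights.
d$final_weight <- d$weight * d$perwt

# Weighted mean age at death using the combined weight.
mean_death_age <- weighted.mean(d$death_age, d$final_weight)
print(mean_death_age)
```

Records with a missing weight fall outside the weighting universe described above and should be dropped before computing weighted statistics.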

weights_tabulated <- censoc_numident_v2.1 %>%
  filter(!is.na(weight)) %>% 
  summarize('Min Weight' = round(min(weight),2), 'Max Weight' = round(max(weight), 2)) %>%
  mutate(id = 1:n()) %>% 
  pivot_longer(-id, names_to = "Label", values_to = "Value") %>% 
  select(Value, Label) %>% 
  add_row(Label = "No Weight Assigned", Value = NA) %>% 
  knitr::kable(format = "markdown")

weights <- censoc_numident_v2.1 %>% 
  filter(!is.na(weight)) %>% 
  group_by(death_age, dyear) %>% 
  summarize(weight = mean(weight), .groups = "drop")

## plot average weight as a Lexis surface
weights_lexis <- weights %>% 
  ggplot() +
  geom_raster(aes(x = dyear, y = death_age,
                  fill = weight)) +
  ## Lexis grid
  geom_hline(yintercept = seq(65, 100, 10),
             alpha = 0.2, lty = "dotted") +
  geom_vline(xintercept = seq(1985, 2005, 10),
             alpha = 0.2, lty = "dotted") +
  geom_abline(intercept = seq(-100, 100, 10)-1910,
              alpha = 0.2, lty = "dotted") +
  scale_fill_viridis_c(option = "magma") +
  scale_x_continuous("Year", expand = c(0.02, 0),
                     breaks = seq(1988, 2005, 5)) +
  scale_y_continuous("Age", expand = c(0, 0),
                     breaks = seq(65, 100, 10)) +
  guides(fill = guide_legend(reverse = TRUE)) +
  # coord
  coord_equal() +
  # theme
  theme_void() +
  theme(
    axis.text = element_text(colour = "black"),
    axis.text.y = element_text(size = 10),
    axis.text.x = element_text(size = 10, angle = 45, hjust = .5), 
    plot.title = element_text(size = 10, vjust = 2),
    legend.text = element_text(size = 10), 
    axis.title=element_text(size = 10,face="bold")
  ) + 
  labs(x = "Year",
       y = "Age",
       title = "Average weight by age at death and year of death")

\vspace{50pt}

weights_tabulated

\vspace{50pt}

weights_lexis

\newpage

\huge weight_conservative \normalsize

Label: Sample Weights (Conservative Sample)

\vspace{12pt}

weights_conservative_tabulated <- censoc_numident_v2.1 %>%
  filter(!is.na(weight_conservative)) %>% 
  summarize('Min Weight' = round(min(weight_conservative),2), 'Max Weight' = round(max(weight_conservative), 2)) %>%
  mutate(id = 1:n()) %>% 
  pivot_longer(-id, names_to = "Label", values_to = "Value") %>% 
  select(Value, Label) %>% 
  add_row(Label = "No Weight Assigned", Value = NA) %>% 
  knitr::kable(format = "pipe")

weights_conservative <- censoc_numident_v2.1 %>% 
  filter(!is.na(weight_conservative)) %>% 
  group_by(death_age, dyear) %>% 
  summarize(weight_conservative = mean(weight_conservative), .groups = "drop")

## plot average conservative weight as a Lexis surface
weights_lexis_conservative <- weights_conservative %>% 
  ggplot() +
  geom_raster(aes(x = dyear, y = death_age,
                  fill = weight_conservative)) +
  ## Lexis grid
  geom_hline(yintercept = seq(65, 100, 10),
             alpha = 0.2, lty = "dotted") +
  geom_vline(xintercept = seq(1985, 2005, 10),
             alpha = 0.2, lty = "dotted") +
  geom_abline(intercept = seq(-100, 100, 10)-1910,
              alpha = 0.2, lty = "dotted") +
  scale_fill_viridis_c(option = "magma") +
  scale_x_continuous("Year", expand = c(0.02, 0),
                     breaks = seq(1988, 2005, 5)) +
  scale_y_continuous("Age", expand = c(0, 0),
                     breaks = seq(65, 100, 10)) +
  guides(fill = guide_legend(reverse = TRUE)) +
  # coord
  coord_equal() +
  # theme
  theme_void() +
  theme(
    axis.text = element_text(colour = "black"),
    axis.text.y = element_text(size=10),
    axis.text.x = element_text(size = 10, angle = 45, hjust = .5), 
    plot.title = element_text(size=10, vjust = 2),
    legend.text = element_text(size = 10), 
    axis.title=element_text(size=10,face="bold")
  ) + 
  labs(x = "Year",
       y = "Age",
       title = "Average weight by age at death and year of death")

\vspace{50pt}

weights_conservative_tabulated

\vspace{50pt}

weights_lexis_conservative

\newpage

\huge IPUMS 1940 Census Variables \normalsize

\vspace{12pt}

The variables below are from the IPUMS-USA 1940 1% census sample. We recommend looking at the terrific documentation on the IPUMS-USA website: https://usa.ipums.org/usa/index.shtml.

| Variable | Label |
|:-------------|:---------------------------------------------|
| pernum | Person number in sample unit |
| perwt | IPUMS person weight[^2] |
| age | Age in 1940 |
| mbpl | Mother's place of birth |
| fbpl | Father's place of birth |
| educd | Educational attainment (detailed) |
| educ_yrs | Educational attainment in years (constructed) |
| empstatd | Employment status (detailed) |
| hispan | Hispanic/Spanish/Latino origin |
| inconwg | Had non-wage/salary income over $50 |
| marst | Marital status |
| nativity | Foreign birthplace or parentage |
| occ | Occupation |
| occscore | Occupational income score |
| ownership | Ownership of dwelling (tenure) |
| race | Race |
| rent | Monthly contract rent |
| serial | Household serial number |
| statefip | State of residence 1940 |
| urban | Urban/rural status |

[^2]: perwt accounts for the sampling procedure used to construct the 1940 1% sample, and is therefore available only in that sample. For analysis, we recommend using both the IPUMS perwt and the CenSoc weight. A final weight can be constructed by multiplying the two weights together.

caseybreen/wcensoc documentation built on Nov. 21, 2024, 5:15 a.m.
