[^updated]: Last updated: r format(Sys.time(), '%d %B, %Y')

\captionsetup[table]{labelformat=empty}

| Page | Variable | Label | |-----------------------------|:---------------------------------------------|:---------------------------------------------| | \hyperlink{page.2}{2} | \hyperlink{page.2}{ssn} |Social Security Number | | \hyperlink{page.3}{3} | \hyperlink{page.3}{birth_string_uncleaned} |Place of birth, uncleaned string | | \hyperlink{page.4}{4} | \hyperlink{page.4}{birth_gnis_code} |Place of birth, GNIS code | | \hyperlink{page.5}{5} | \hyperlink{page.5}{birth_city} |Place of birth, city | | \hyperlink{page.6}{6} | \hyperlink{page.6}{birth_county} |Place of birth, county | | \hyperlink{page.7}{7} | \hyperlink{page.7}{birth_region} |Place of birth, census region | | \hyperlink{page.8}{8} | \hyperlink{page.8}{death_zip} |Place of death, 5-digit ZIP code | | \hyperlink{page.9}{9} | \hyperlink{page.9}{death_city} |Place of death, city | | \hyperlink{page.10}{10} | \hyperlink{page.10}{death_county_fips} |Place of death, FIPS county codes | | \hyperlink{page.11}{11} | \hyperlink{page.11}{death_county} |Place of death, county | | \hyperlink{page.12}{12} | \hyperlink{page.12}{death_state} |Place of death, state | | \hyperlink{page.13}{13} | \hyperlink{page.13}{death_region} |Place of death, census region | | \hyperlink{page.14}{14} | \hyperlink{page.14}{death_country} |Place of death, country | | \hyperlink{page.15}{15} | \hyperlink{page.15}{death_ruc1993} |Place of death, rural-urban continuum |

\vspace{10pt}

\begin{center} \underline{\textbf{Summary and Methodology}} \end{center} \vspace{10pt}

The BUNMD Supplementary Geography File (N = 42,878,885) provides a set of supplementary geography variables reporting place of birth and death for individuals in the BUNMD. This file can be linked onto the BUNMD at the individual-level using Social Security number.

Place of death variables: To construct the place of death variables, we use the ZIP code of last residence available in the Numident Death record. It is likely that this reflects the ZIP code where an individual last lived. We map these ZIP codes onto city, county, state, census region, country, and rural urban continuum codes using a database from the United States Postal Service (USPS) Link. For ZIP codes that have been decommissioned, we use a secondary database from UnitedStatesZipCodes.org.

Place of birth variables: To construct the place of birth variables, we use the uncleaned 12-character city/county of birth string from the Numident Application records. These strings are uncleaned and contain misspellings and other inconsistencies. We mapped these uncleaned strings onto Geographic Names Information System (GNIS) codes using the crosswalk developed for this paper:

| Black, Dan A., Seth G. Sanders, Evan J. Taylor, and Lowell J. Taylor. 2015. “The Impact of the Great | Migration on Mortality of African Americans: Evidence from the Deep South.”American Economic Review. | 105(2):477–503. doi: 10.1257/aer.20120642.

To construct other place of birth variable, we mapped the GNIS codes onto city, county, state, and census regions using a database from the U.S. Board on Geographic Names Link. Please cite Black et al. (2015) if you are using any of the birthplace geography variables in the file.

\newpage

library(dplyr)
library(data.table)
library(tidyverse)
library(knitr)
library(forcats)

BUNMD_Geography_File <- fread("/hdir/0/chrissoria/Birth_Places/BUNMD_Geography_File.csv")

\newpage

\huge ssn \normalsize \vspace{12pt}

Label: Social Security Number

Description: ssn is a numeric variable reporting a person’s Social Security number. This variable uniquely identifies all records in the dataset and can be used to merge onto the BUNMD dataset.

\newpage

\huge birth_string_uncleaned \normalsize \vspace{12pt}

Label: Place of birth, uncleaned city/county string

Description: birth_string_uncleaned reports the 12-character city/county of birth string from the Numident Application records. This variable is uncleaned and unprocessed, and any many contain spelling errors or inconsistencies.

\newpage

\huge birth_gnis_code

\normalsize \vspace{12pt}

Label: Place of birth, GNIS code

Description: birth_gnis_code is a numeric variable that reports a person's Geographic Names Information System (GNIS) code for their place of birth. Each GNIS code maps onto physical locations located in the U.S., including longitude and latitude coordinates, physical feature names, counties, and states.

For more information on GNIS codes, please see https://www.usgs.gov/faqs/what-geographic-names-information-system-gnis.

\newpage

\huge birth_city \normalsize \vspace{12pt}

Label: Place of birth, city

Description: birth_city is a character variable reporting a person's city of birth.

\newpage

\huge birth_county \normalsize \vspace{12pt}

Label: Place of birth, county

Description: birth_county is a character variable reporting a person's county of birth.

\newpage

\huge birth_region \normalsize \vspace{12pt}

Label: Place of birth, census region

Description: birth_region is a character variable reporting a person's census region of birth. For more information on Census Regions, please see https://www.census.gov/programs-surveys/economic-census/guidance-geographies/levels.html.

birthregion_table <- table(BUNMD_Geography_File$birth_region, useNA = "always")

birthregion_table <- as.data.frame(birthregion_table)

birthregion_table$Freq_perc <- round((birthregion_table$Freq/sum(birthregion_table$Freq))*100, digits = 1)

kable(birthregion_table, 
      caption = "Births by Census Region",
      col.names = c("Region", "n","freq %"),
      align = c("l", "r","l"))

\newpage

\huge death_zip \normalsize \vspace{12pt}

Label: Place of death, ZIP code

Description: death_zip is a numeric variable reporting the 5-digit ZIP code of last residence, as recorded in the Numident Death record. We note that ZIP codes are primarily used for USPS mail delivery purposes, not as geographic units. For a more detailed description of ZIP codes, please see: https://faq.usps.com/s/article/ZIP-Code-The-Basics.

\newpage

\huge death_city \normalsize \vspace{12pt}

Label: Place of death, city

Description: death_city is a character variable reporting a person's city of death, as sourced from the ZIP code of their residence at time of death. For ZIP codes that cross city lines, we report the city where the primary post office for that ZIP code is located.

\newpage

\huge death_county_fips \normalsize \vspace{12pt}

Label: Place of death, county FIPS code

Description: death_fips is a numeric variable reporting a person's county FIPS code of death.

For a more detailed description of county FIPS codes, please see: https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt#:~:text=FIPS%20codes%20are%20numbers%20which,to%20which%20the%20county%20belongs.

\newpage

\huge death_county \normalsize \vspace{12pt}

Label: Place of death, county

Description: death_county is a character variable reporting a person's county of death, as sourced from the ZIP code of their residence at time of death.

\newpage

\huge death_state \normalsize \vspace{12pt}

Label: Place of death, state

Description: death_state is a character variable reporting a person's state of death, as sourced from the ZIP code of their residence at time of death.

\newpage

\huge death_region \normalsize \vspace{12pt}

Label: Place of death, census region

Description: death_region is a character variable reporting a person's census region of death, as sourced from the ZIP code of their residence at time of death.

deathregion_table <- table(BUNMD_Geography_File$death_region, useNA = "always")

deathregion_table <- as.data.frame(deathregion_table)

deathregion_table$Freq_perc <- round((deathregion_table$Freq/sum(deathregion_table$Freq))*100, digits = 1)

kable(deathregion_table, 
      caption = "Deaths by Census Region",
      col.names = c("region", "n","freq %"),
      align = c("l", "r","l"))

\newpage

\huge death_country \normalsize \vspace{12pt}

Label: Place of death, country

Description: death_country is a character variable reporting a person's country of death, as sourced from the ZIP code of their residence at time of death. While the majority of ZIP codes are within the U.S., countries that host U.S. military bases on their territories may also have ZIP codes associated with them.

\newpage

\huge death_ruc1993 \normalsize \vspace{12pt}

Label: Place of death, rural-urban continuum code

Description: death_ruc1993 report the Rural-Urban continuum code for a person's county of death, as sourced from the United States Department of Agriculture (USDA). The Urban-Rural Continuum Codes are a classification system developed to categorize U.S. counties based on their degree of urbanization and adjacency to metropolitan areas. The system consists of ten codes, ranging from 0 to 9, where lower codes represent more urbanized counties and higher codes represent more rural areas.

For more information: https://www.ers.usda.gov/data-products/rural-urban-continuum-codes.aspx

ruc1993_table <- table(BUNMD_Geography_File$death_ruc1993, useNA = "always")

ruc1993_table <- as.data.frame(ruc1993_table)

ruc1993_table <- ruc1993_table %>% 
  mutate(Rural_Urban_Label = factor(Var1, levels = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "NA"),
                                    labels = c("Central counties of metro areas of 1 million population or more", "Fringe counties of metro areas of 1 million population or more", "Counties in metro areas of 250,000 to 1 million population", "Counties in metro areas of fewer than 250,000 population", "Urban population of 20,000 or more, adjacent to a metro area", "Urban population of 20,000 or more, not adjacent to a metro area","Urban population of 2,500 to 19,999, adjacent to a metro area","Urban population of 2,500 to 19,999, not adjacent to a metro area","Rural or fewer than 2,500 urban population, adjacent to a metro area","Rural or fewer than 2,500 urban population, not adjacent to a metro area", "NA")))

ruc1993_table$Freq_perc <- round((ruc1993_table$Freq/sum(ruc1993_table$Freq))*100,digits = 1)

ruc1993_table <- ruc1993_table[, c(1, 3, 2, 4)]
# Format the table with kable
kable(ruc1993_table, 
      caption = "Rural-Urban Deaths",
      col.names = c("code", "description", "n","freq %"),
      align = c("l", "r", "l","l"))


caseybreen/wcensoc documentation built on Nov. 25, 2024, 11:32 p.m.