[^updated]: Last updated: `r format(Sys.time(), '%d %B, %Y')`

| Page | Variable | Label |
|----:|:--------------------|:---------------------------------------------|
| \hyperlink{page.2}{2} | \hyperlink{page.2}{ssn} | Social Security Number |
| \hyperlink{page.3}{3} | \hyperlink{page.3}{fname_clean} | First Name (Cleaned) |
| \hyperlink{page.4}{4} | \hyperlink{page.4}{mname_clean} | Middle Name (Cleaned) |
| \hyperlink{page.5}{5} | \hyperlink{page.5}{lname_clean} | Last Name (Cleaned) |
| \hyperlink{page.6}{6} | \hyperlink{page.6}{father_fname_clean} | Father's First Name (Cleaned) |
| \hyperlink{page.7}{7} | \hyperlink{page.7}{father_mname_clean} | Father's Middle Name (Cleaned) |
| \hyperlink{page.8}{8} | \hyperlink{page.8}{father_lname_clean} | Father's Last Name (Cleaned) |
| \hyperlink{page.9}{9} | \hyperlink{page.9}{mother_fname_clean} | Mother's First Name (Cleaned) |
| \hyperlink{page.10}{10} | \hyperlink{page.10}{mother_mname_clean} | Mother's Middle Name (Cleaned) |
| \hyperlink{page.11}{11} | \hyperlink{page.11}{mother_lname_clean} | Mother's Last Name (Cleaned) |

\vspace{5 pt}

Summary: The BUNMD Cleaned Names File (N = 49,337,827) is a supplementary dataset that provides cleaned names for individuals and their parents in the Berkeley Unified Numident Mortality Database (BUNMD). The original name data are taken from Social Security Numident application and death records and can be found in the BUNMD. Name cleaning includes removing non-alphabetic characters, standardizing NA values, and replacing some nicknames and name variants with standardized names. Researchers may attach this file to the full BUNMD using the ssn (Social Security number) variable, which uniquely identifies each person in the dataset.
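As an illustration of attaching the cleaned names to the BUNMD, the sketch below joins two toy data frames on ssn using base R. The data frames, their values, and the `byear` column are hypothetical stand-ins; only the ssn key and the `*_clean` variable names come from this codebook.

```r
# Toy stand-ins for the two files (hypothetical values, not real records).
bunmd <- data.frame(
  ssn   = c(1L, 2L, 3L),              # made-up identifiers, not real SSNs
  byear = c(1910L, 1922L, 1935L),     # hypothetical BUNMD column
  stringsAsFactors = FALSE
)
clean_names <- data.frame(
  ssn         = c(3L, 1L, 2L),
  fname_clean = c("John", "William", "Elizabeth"),
  lname_clean = c("Doe", "McDonald", "Smith"),
  stringsAsFactors = FALSE
)

# Left join on ssn: every BUNMD record is kept and gains the cleaned names.
bunmd_with_names <- merge(bunmd, clean_names, by = "ssn", all.x = TRUE)

# Because ssn is unique in both inputs, the join should not add rows.
stopifnot(nrow(bunmd_with_names) == nrow(bunmd))
```

For the full files, a keyed join in data.table or `dplyr::left_join()` follows the same pattern and is likely to be faster.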

\vspace{10 pt}

Notes:

  1. NA values for names are usually represented with an empty string. However, strings such as "unknown", "unk", "missing", or "not stated" were sometimes present in the name fields of Social Security records, usually for parents' names. Many common words indicating missing data have been removed from this dataset, but some such invalid strings may remain.

  2. This process of name cleaning and standardization is primarily designed to facilitate record linkage. Decisions about how to parse names, standardize names, and convert nicknames to full names largely follow the original implementation of the automated ABE linking algorithm developed by Abramitzky, Boustan and Eriksson (2012, 2014, 2017).
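Because some placeholder strings may remain (note 1), users may want to screen name fields before linking. The sketch below shows one way to do this in base R. The `missing_strings` vector and the `standardize_missing()` helper are hypothetical; the list actually used to build the file is not given in this codebook.

```r
# Hypothetical list of placeholder strings, taken from the examples in note 1.
missing_strings <- c("unknown", "unk", "missing", "not stated")

# Replace NA and placeholder strings with the empty string used for
# missing names in this file.
standardize_missing <- function(x) {
  x[is.na(x)] <- ""
  is_placeholder <- tolower(trimws(x)) %in% missing_strings
  x[is_placeholder] <- ""
  x
}

# Toy input (hypothetical values).
father_fname <- c("John", "UNKNOWN", "unk", NA, "Not Stated", "Mary")
father_fname_screened <- standardize_missing(father_fname)
```

Tabulating the most frequent remaining values, for example with `sort(table(x), decreasing = TRUE)`, is a simple way to spot other invalid strings that a fixed list misses.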

```r
## Library Packages
library(dplyr)
library(data.table)
library(kableExtra)

## Path to the clean names file
## (We don't need to read the whole file to create this codebook)
bunmd_clean_names <- fread("/data/censoc/censoc_data_releases/bunmd/bunmd_v2/bunmd_clean_names_v1.csv",
                           nrows = 10)

## List the variable names documented in this codebook
names(bunmd_clean_names)
```

\newpage

\huge ssn \normalsize \vspace{12pt}

Label: Social Security Number

Description: ssn reports a person's Social Security number, as recorded in Numident death records. It uniquely identifies all records, and can be used to link this file to the complete BUNMD.

\newpage

\huge fname_clean \normalsize \vspace{12pt}

Label: First Name (Cleaned)

Description: fname_clean is a character variable reporting the first 16 letters of the person's cleaned first name, as recorded in the Numident death records. This variable was cleaned from raw first names by removing titles (e.g., Dr.) and replacing nicknames with standard names (e.g., Billy to William). Non-alphabetical characters were also removed.
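The cleaning steps described above can be sketched roughly as follows. This is not the code used to build the file: the `titles` and `nicknames` lookups and the `clean_first_name()` helper are hypothetical, the order of the steps is an assumption, and lowercasing is done here only to simplify the lookup (the letter case of the released variable is not stated in this codebook).

```r
# Hypothetical lookup tables; the real title and nickname lists are not
# given in this codebook.
titles    <- c("dr", "mr", "mrs", "rev")
nicknames <- c(billy = "william", lizzie = "elizabeth")

clean_first_name <- function(x) {
  # Lowercase so the lookups are case-insensitive (assumption).
  x <- tolower(trimws(x))
  # Drop a leading title, with or without a trailing period.
  title_pattern <- paste0("^(", paste(titles, collapse = "|"),
                          ")\\.?[[:space:]]+")
  x <- sub(title_pattern, "", x)
  # Keep alphabetic characters only.
  x <- gsub("[^a-z]", "", x)
  # Replace known nicknames with their standard form.
  is_nick <- x %in% names(nicknames)
  x[is_nick] <- unname(nicknames[x[is_nick]])
  # Keep the first 16 letters.
  substr(x, 1, 16)
}

# Toy input (hypothetical values).
raw_fnames <- c("Dr. Billy", "Lizzie", "Mary-Ann", "Maximilianalexander")
cleaned_fnames <- clean_first_name(raw_fnames)
```

The same sketch would apply to the parents' first-name variables, which the codebook describes as cleaned in the same way.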

\newpage

\huge mname_clean \normalsize \vspace{12pt}

Label: Middle Name (Cleaned)

Description: mname_clean is a character variable reporting the first 16 letters of the person's cleaned middle name, as recorded in the Numident death records. This variable was cleaned by removing non-alphabetical characters.

\newpage

\huge lname_clean \normalsize \vspace{12pt}

Label: Last Name (Cleaned)

Description: lname_clean is a character variable reporting the first 21 letters of the person's cleaned last name, as recorded in the Numident death records. This variable was cleaned from raw last names by removing non-alphabetical characters. Multiple-word last names with certain prefixes were combined into one word (e.g., Mc Donald to McDonald).
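A minimal sketch of the prefix-combining step, under stated assumptions: the `prefixes` vector and the `clean_last_name()` helper are hypothetical, since the codebook does not list which prefixes were combined, and the handling of other multiple-word last names is not specified, so any remaining spaces are left untouched here.

```r
# Hypothetical prefix list; the prefixes actually combined are not listed
# in this codebook.
prefixes <- c("mc", "mac", "van", "de")

clean_last_name <- function(x) {
  x <- trimws(x)
  # Join a leading prefix to the word that follows it
  # (e.g., "Mc Donald" becomes "McDonald").
  prefix_pattern <- paste0("^(", paste(prefixes, collapse = "|"),
                           ")[[:space:]]+")
  x <- sub(prefix_pattern, "\\1", x, ignore.case = TRUE)
  # Remove non-alphabetic characters, leaving any remaining spaces alone.
  x <- gsub("[^A-Za-z ]", "", x)
  # Keep the first 21 characters.
  substr(x, 1, 21)
}

# Toy input (hypothetical values).
raw_lnames <- c("Mc Donald", "O'Brien", "Smith-Jones", "Wolfeschlegelsteinhausen")
cleaned_lnames <- clean_last_name(raw_lnames)
```

Matching the prefix only at the start of the string keeps a name such as "Vandenberg" unchanged, because the pattern requires whitespace after the prefix.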

\newpage

\huge father_fname_clean \normalsize \vspace{12pt}

Label: Father's First Name (Cleaned)

Description: father_fname_clean is a character variable reporting the first 16 letters of the person's father's cleaned first name, as recorded in the Numident application records. This variable was cleaned from the father's raw first name by removing titles (e.g., Dr.) and replacing nicknames with standard names (e.g., Billy to William). Non-alphabetical characters were also removed.

\newpage

\huge father_mname_clean \normalsize \vspace{12pt}

Label: Father's Middle Name (Cleaned)

Description: father_mname_clean is a character variable reporting the first 16 letters of the person's father's cleaned middle name, as recorded in the Numident application records. This variable was cleaned from the father's raw middle name by removing non-alphabetical characters.

\newpage

\huge father_lname_clean \normalsize \vspace{12pt}

Label: Father's Last Name (Cleaned)

Description: father_lname_clean is a character variable reporting the first 21 letters of the person's father's cleaned last name, as recorded in the Numident application records. This variable was cleaned from the father's raw last name by removing non-alphabetical characters. Multiple-word last names with certain prefixes were combined into one word (e.g., Mc Donald to McDonald).

\newpage

\huge mother_fname_clean \normalsize \vspace{12pt}

Label: Mother's First Name (Cleaned)

Description: mother_fname_clean is a character variable reporting the first 16 letters of the person's mother's cleaned first name, as recorded in the Numident application records. This variable was cleaned from the mother's raw first name by removing titles (e.g., Dr.) and replacing nicknames with standard names (e.g., Lizzie to Elizabeth). Non-alphabetical characters were also removed.

\newpage

\huge mother_mname_clean \normalsize \vspace{12pt}

Label: Mother's Middle Name (Cleaned)

Description: mother_mname_clean is a character variable reporting the first 16 letters of the person's mother's cleaned middle name, as recorded in the Numident application records. This variable was cleaned from the mother's raw middle name by removing non-alphabetical characters.

\newpage

\huge mother_lname_clean \normalsize \vspace{12pt}

Label: Mother's Last Name (Cleaned)

Description: mother_lname_clean is a character variable reporting the first 21 letters of the person's mother's cleaned last (maiden) name, as recorded in the Numident application records. This variable was cleaned from the mother's raw last name by removing non-alphabetical characters. Multiple-word last names with certain prefixes were combined into one word (e.g., Mc Donald to McDonald).



caseybreen/wcensoc documentation built on Nov. 21, 2024, 5:15 a.m.