Once keywords (values) have been unnested, they need substantial cleaning using the same Shiny app used for news desks. By this step, your renamed values should be in the "renamed_values" folder. This step joins in the replacement keywords. You also have to ensure that every keyword has exactly one name (category). Possible names are "subject", "persons", "glocations", "organizations", or "creative_works". Keywords with more than one name are somewhat rare, but resolving them simplifies network analysis. This step writes a file to the "multi_named_values" folder. Before running the next step, open this file in a spreadsheet and manually choose one name (category) for each value (keyword). Save the result in the "single_named_values" folder as "single_named_keywords.csv".
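To make the "one name per keyword" requirement concrete, here is a minimal base R sketch of how multi-named keywords could be spotted. It is not the package's implementation: the toy data and the column names `value` and `name` are assumptions made for illustration only.

```r
# Toy unnested keyword data. The column names `value` and `name`
# are assumptions for this sketch, not confirmed by the package.
unnested_df <- data.frame(
  value = c("Obama, Barack", "Amazon", "Amazon",
            "New York City", "Hamilton", "Hamilton"),
  name  = c("persons", "organizations", "glocations",
            "glocations", "persons", "creative_works"),
  stringsAsFactors = FALSE
)

# Drop exact duplicates so a keyword reused with the same name
# is not counted twice.
pairs <- unique(unnested_df[, c("value", "name")])

# Count how many distinct names are attached to each value.
names_per_value <- tapply(pairs$name, pairs$value, length)

# Values with more than one name need a manual decision.
multi_named_values <- names(names_per_value[names_per_value > 1])
multi_named <- pairs[pairs$value %in% multi_named_values, ]
multi_named <- multi_named[order(multi_named$value, multi_named$name), ]
print(multi_named)
```

In this toy data, "Amazon" and "Hamilton" each carry two names, so they are the rows you would resolve by hand in the spreadsheet.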
nyt_bind_values_lookups(values_output_folder = "renamed_values")

nyt_clean_keywords(
  unnested_df,
  values_output_folder = "renamed_values",
  multi_names_input_folder = "multi_named_values",
  multi_names_output_folder = "single_named_values"
)
Argument | Description |
---|---|
values_output_folder | folder to find keyword values post-cleaning |
unnested_df | output of |
multi_names_input_folder | folder to find keyword values with more than 1 name (category) |
multi_names_output_folder | folder to save corrected keyword values csv |
nyt_clean_keywords() returns an unnested df with replaced keyword values and writes out a file of all keywords with more than one name.
if (FALSE) {
  consolidated_unnested_df <- nyt_clean_keywords(unnested_df)
}
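The manual step, choosing one name for each multi-named keyword, can be pictured as applying a small lookup table back onto the unnested data. The sketch below uses base R only and is an illustration under stated assumptions, not the package's code: the columns `article_id`, `value`, and `name`, and the in-memory `single_named` table standing in for "single_named_keywords.csv", are all hypothetical.

```r
# Toy unnested data in which "Amazon" appears under two names.
# All column names here are assumptions for illustration.
unnested_df <- data.frame(
  article_id = c(1, 2, 3),
  value = c("Amazon", "Amazon", "New York City"),
  name  = c("organizations", "glocations", "glocations"),
  stringsAsFactors = FALSE
)

# Stand-in for the manually edited file: one chosen name per value.
single_named <- data.frame(
  value = "Amazon",
  name  = "organizations",
  stringsAsFactors = FALSE
)

# For each row, find the position of its value in the lookup (NA if absent).
idx <- match(unnested_df$value, single_named$value)
has_choice <- !is.na(idx)

# Overwrite the name only where a manual choice exists.
resolved <- unnested_df
resolved$name[has_choice] <- single_named$name[idx[has_choice]]
print(resolved)
```

After this, every value in `resolved` maps to a single name, which is the property the later network analysis relies on.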