Once keywords (values) have been unnested, they usually need substantial cleaning, done with the same Shiny app used for news desks. Before running this step, your renamed values should be in the "renamed_values" folder. This step joins in the replacement keywords.

Every keyword must also end up with exactly one name (category). Possible names are "subject", "persons", "glocations", "organizations", or "creative_works". Keywords with more than one name are somewhat rare, but resolving them simplifies network analysis. This step writes a file of those keywords to the "multi_named_values" folder. Before running the next step, open this file in a spreadsheet and manually choose one name (category) for each value (keyword). Save the result in the "single_named_values" folder as "single_named_keywords.csv".
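To make the "more than one name" case concrete, here is a minimal base R sketch of how such keywords could be spotted. It is an illustration only, not the package's implementation; the toy data and the column names `value` and `name` are assumptions.

```r
# Toy stand-in for unnested keywords. The column names `value` and `name`
# are assumptions for illustration, not the package's confirmed output.
keywords <- data.frame(
  value = c("Amazon", "Amazon", "Paris", "Obama, Barack"),
  name  = c("organizations", "glocations", "glocations", "persons"),
  stringsAsFactors = FALSE
)

# Keep each (value, name) pair once, then count names per value.
pairs <- unique(keywords[, c("value", "name")])
names_per_value <- tapply(pairs$name, pairs$value, length)

# Values carrying more than one name are the ones to resolve by hand.
multi_values <- names(names_per_value)[names_per_value > 1]
multi_named <- pairs[pairs$value %in% multi_values, ]
multi_named
```

In this toy data only "Amazon" carries two names, so it is the only value that would need a manual choice.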

nyt_bind_values_lookups(values_output_folder = "renamed_values")

nyt_clean_keywords(
  unnested_df,
  values_output_folder = "renamed_values",
  multi_names_input_folder = "multi_named_values",
  multi_names_output_folder = "single_named_values"
)

Arguments

values_output_folder

folder to find keyword values post-cleaning

unnested_df

output of nyt_unnest_df()

multi_names_input_folder

folder to find keyword values with more than 1 name (category)

multi_names_output_folder

folder to save corrected keyword values csv

Value

nyt_clean_keywords() returns an unnested data frame with replaced keyword values and writes out a file listing every keyword that has more than one name (category).
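The idea of "replaced keyword values" can be sketched as a left join against a lookup table. This is a hedged base R illustration; the lookup layout and the columns `article_id`, `value`, and `new_value` are hypothetical, and the package may do this differently.

```r
# Hypothetical unnested keywords and a hypothetical lookup of replacements.
unnested <- data.frame(
  article_id = c(1, 1, 2),
  value = c("Obama, Barack H", "Paris (France)", "Obama, Barack"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  value = "Obama, Barack H",
  new_value = "Obama, Barack",
  stringsAsFactors = FALSE
)

# Left join: keep every keyword row, attach a replacement where one exists.
joined <- merge(unnested, lookup, by = "value", all.x = TRUE)

# Use the replacement when present, otherwise keep the original value.
joined$value <- ifelse(is.na(joined$new_value), joined$value, joined$new_value)
joined$new_value <- NULL
joined
```

Keywords without a lookup entry pass through unchanged, so the row count stays the same as the input.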

Examples

if (FALSE) {
  consolidated_unnested_df <- nyt_clean_keywords(unnested_df)
}
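Before moving on to the next step, it can help to sanity-check the manually edited file. The sketch below is an assumption-laden illustration: it writes a toy "single_named_keywords.csv" to a temporary folder so it runs anywhere, and the column names `value` and `name` are hypothetical.

```r
# The five names (categories) listed in the description above.
allowed_names <- c("subject", "persons", "glocations",
                   "organizations", "creative_works")

# Write a toy file to a temporary folder so this sketch is self-contained.
# In real use, `path` would point at your own "single_named_values" folder.
folder <- file.path(tempdir(), "single_named_values")
dir.create(folder, showWarnings = FALSE)
path <- file.path(folder, "single_named_keywords.csv")
write.csv(
  data.frame(value = c("Amazon", "Paris"),
             name = c("organizations", "glocations")),
  path,
  row.names = FALSE
)

single_named <- read.csv(path, stringsAsFactors = FALSE)

# Each value should appear once and carry one of the allowed names.
no_duplicates <- anyDuplicated(single_named$value) == 0
valid_names <- all(single_named$name %in% allowed_names)
stopifnot(no_duplicates, valid_names)
```

If either check fails, the spreadsheet still contains a keyword with more than one name or a misspelled category.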