Values for 'news_desk' are often slightly different spellings of the same item. There are too many instances to fix with a case_when() statement, so use the shiny app to create a lookup table (nyt_run_example("lookup_table_app")). Place the output of the app into the folder "renamed_news_desks". If there is no file of replacements in the "renamed_news_desks" folder, that step is skipped.
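As a rough illustration of why a lookup table scales better than a long case_when() statement, the sketch below consolidates spellings with a small table and base R's match(). The data, the column name `news_desk_new`, and the layout of the lookup table are assumptions made for this example; the file written by the shiny app may be structured differently.

```r
# Hypothetical lookup table; the column names are assumptions,
# not necessarily the format produced by the shiny app.
lookup <- data.frame(
  news_desk     = c("Foreign", "Foreign Desk", "foreign"),
  news_desk_new = c("Foreign", "Foreign", "Foreign"),
  stringsAsFactors = FALSE
)

# Made-up article data with inconsistent news desk spellings.
articles <- data.frame(
  headline  = c("A", "B", "C", "D"),
  news_desk = c("Foreign Desk", "foreign", "Sports", "Foreign"),
  stringsAsFactors = FALSE
)

# match() gives the lookup row for each article, or NA when no entry exists.
idx <- match(articles$news_desk, lookup$news_desk)

# Use the replacement where one exists; otherwise keep the original value.
articles$news_desk <- ifelse(
  is.na(idx),
  articles$news_desk,
  lookup$news_desk_new[idx]
)
```

Values without an entry in the lookup table (here "Sports") pass through unchanged, so the table only needs to list the spellings that should be renamed.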
nyt_bind_news_desk_lookups(news_desk_output_folder = "renamed_news_desks")

nyt_clean_news_desks(
  combined_df,
  news_desk_output_folder = "renamed_news_desks"
)
Argument | Description |
---|---|
news_desk_output_folder | folder to find news desks post-cleaning |
combined_df | output of |
nyt_clean_news_desks() returns a nested df where news desk values have been consolidated based on the lookup table created from the shiny app.
You may not be able to write out the entire lookup table in one file, so you first need to bind together multiple files, if they exist.
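The binding step might look something like the base R sketch below. It is not the package's implementation: `bind_lookup_files()` is a hypothetical helper, and the sketch assumes the lookup files are CSVs that share the same columns, which the source does not state.

```r
# Hypothetical helper, not part of the package. Assumes the lookup files
# are CSVs with identical columns.
bind_lookup_files <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

  # No files (or no such folder): return NULL so the caller can skip the step.
  if (length(files) == 0) {
    return(NULL)
  }

  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  combined <- do.call(rbind, tables)

  # Drop rows that are repeated across files.
  unique(combined)
}

# Usage with two small made-up files in a temporary folder.
folder <- file.path(tempdir(), "renamed_news_desks_demo")
dir.create(folder, showWarnings = FALSE)
write.csv(
  data.frame(news_desk = "foreign", news_desk_new = "Foreign"),
  file.path(folder, "part1.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(news_desk = "Sports Desk", news_desk_new = "Sports"),
  file.path(folder, "part2.csv"),
  row.names = FALSE
)

lookup <- bind_lookup_files(folder)
```

Returning NULL when nothing is found mirrors the documented behaviour of skipping the replacement step when the folder has no files.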
if (FALSE) {
nested_df <- nyt_clean_news_desks(combined_df)
}