Bind monthly rds files into a single nested df and perform initial cleaning

nyt_bind_api_files() combines a folder of rds files into one tbl. nyt_clean_api_tbl() performs initial cleaning on the output of nyt_bind_api_files(), such as removing unnecessary columns. It also writes a separate rds file of news desk values to use as input to the lookup table Shiny app. Your working directory must be the top-level project folder from which nyt_get_data() was called.
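
As a rough illustration of what binding a folder of rds files involves, the sketch below reads every rds file in a folder and stacks the results into one tbl. It is not the package's own code: bind_rds_folder() is a hypothetical helper, the demo files and their columns are made up, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical helper (not part of the package): read every .rds file in
# `folder` and stack the resulting data frames into a single tbl.
bind_rds_folder <- function(folder) {
  files <- list.files(folder, pattern = "\\.rds$",
                      full.names = TRUE, ignore.case = TRUE)
  bind_rows(lapply(files, readRDS))
}

# Throwaway demo data: two "monthly" files with made-up columns.
demo_folder <- tempfile("api_data_demo")
dir.create(demo_folder)
saveRDS(
  tibble(headline = "A", pub_date = as.Date("2006-01-15")),
  file.path(demo_folder, "2006_01.rds")
)
saveRDS(
  tibble(headline = c("B", "C"),
         pub_date = as.Date(c("2006-02-03", "2006-02-20"))),
  file.path(demo_folder, "2006_02.rds")
)

raw_df <- bind_rds_folder(demo_folder)
```

list.files() returns paths in alphabetical order, so file names that sort by year and month keep the combined rows in chronological order.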

nyt_bind_api_files(api_data_folder = "api_data")

nyt_clean_api_tbl(
  api_df,
  news_desk_input_folder = "news_desk_lookup_input",
  news_desk_output_folder = "renamed_news_desks"
)

Arguments

api_data_folder	name of the folder containing the rds files of API data
api_df	tbl resulting from `nyt_bind_api_files()`
news_desk_input_folder	name of the folder in which to write the pre-cleaning news desk values
news_desk_output_folder	name of the folder in which to find the post-cleaning news desk values

Value

nyt_bind_api_files() returns a tbl of raw API data

nyt_clean_api_tbl() returns a cleaner tbl with nested keywords

Details

nyt_clean_api_tbl() also applies a fairly accurate fix for a bug in the NYT API where the same article is returned twice. The two URLs differ, but the date, headline, author and other fields are the same. The problem is especially common in 2006. The "extra" article has the shorter URL, so the function keeps only the row with the longest URL for each headline/pub_date pair.
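
The "keep the longest URL per headline/pub_date pair" rule can be sketched with dplyr as follows. This is a minimal illustration, not the package's implementation: the column names headline, pub_date and web_url, and the toy rows, are assumptions made for the example.

```r
library(dplyr)

# Toy data with one duplicated article. Column names are assumed for
# illustration and may not match the real API output.
articles <- tibble(
  headline = c("Budget Vote Delayed", "Budget Vote Delayed", "Storm Hits Coast"),
  pub_date = as.Date(c("2006-03-01", "2006-03-01", "2006-03-02")),
  web_url  = c(
    "https://example.com/2006/03/01/budget.html",
    "https://example.com/2006/03/01/politics/budget-vote-delayed.html",
    "https://example.com/2006/03/02/storm.html"
  )
)

# For each headline/pub_date pair, keep only the row with the longest URL.
# with_ties = FALSE guarantees exactly one row per pair.
deduped <- articles %>%
  group_by(headline, pub_date) %>%
  slice_max(order_by = nchar(web_url), n = 1, with_ties = FALSE) %>%
  ungroup()
```

The rule is a heuristic: two genuinely different articles that share a headline and publication date would also be collapsed into one row, which is why the fix is described as fairly accurate rather than exact.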

Examples

if (FALSE) {
api_df <- nyt_bind_api_files()
combined_df <- nyt_clean_api_tbl(api_df)
}