nyt_bind_api_files() combines a folder of rds files into one tbl. nyt_clean_api_tbl() performs initial cleaning on the output of nyt_bind_api_files(), such as removing unnecessary columns. It also writes a separate rds file of news desk values to use as input in the lookup table Shiny app. Your working directory must be the top-level folder of the project where nyt_get_data() was called.

nyt_bind_api_files(api_data_folder = "api_data")

nyt_clean_api_tbl(
  api_df,
  news_desk_input_folder = "news_desk_lookup_input",
  news_desk_output_folder = "renamed_news_desks"
)

Arguments

api_data_folder

name of the folder containing the API data rds files

api_df

tbl resulting from nyt_bind_api_files()

news_desk_input_folder

name of the folder where news desks are written before cleaning

news_desk_output_folder

name of the folder where news desks are found after cleaning

Value

nyt_bind_api_files() returns a tbl of raw API data

nyt_clean_api_tbl() returns a cleaner tbl with nested keywords
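
As a rough illustration of what combining a folder of rds files into one table can look like, here is a minimal base R sketch. It is not the package's implementation: the temporary folder, the file names, and the two toy columns are all assumptions made up for the example.

```r
# Hypothetical stand-in for the "api_data" folder, created in a temp location
api_data_folder <- tempfile("api_data_")
dir.create(api_data_folder)

# Two toy rds files standing in for separate API downloads (made-up columns)
saveRDS(
  data.frame(headline = "A", pub_date = "2006-01-01", stringsAsFactors = FALSE),
  file.path(api_data_folder, "part1.rds")
)
saveRDS(
  data.frame(headline = "B", pub_date = "2006-01-02", stringsAsFactors = FALSE),
  file.path(api_data_folder, "part2.rds")
)

# Find every rds file in the folder, read each one, and stack the rows
rds_paths <- list.files(api_data_folder, pattern = "\\.rds$", full.names = TRUE)
combined <- do.call(rbind, lapply(rds_paths, readRDS))
```

Row-binding like this only works when every file has the same columns; the real function may handle mismatches differently.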

Details

This function also provides a fairly accurate fix for a bug in the NYT API where the same article is returned twice. The URLs differ, but the date, headline, author, etc. are the same. The duplication is especially common in 2006. Luckily, the "extra" article has a shorter URL, so the function keeps only the row with the longest URL for every headline/pub_date pair.
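
The longest-URL rule described above can be sketched in base R as follows. This is an illustrative sketch, not the package's code: the helper name keep_longest_url() is hypothetical, and the column names headline, pub_date and web_url are assumptions about how the cleaned table is laid out.

```r
# Toy data with one duplicated article (column names are assumptions)
articles <- data.frame(
  headline = c("A", "A", "B"),
  pub_date = c("2006-01-01", "2006-01-01", "2006-01-02"),
  web_url = c(
    "https://example.com/a",
    "https://example.com/2006/01/01/a",
    "https://example.com/b"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the row with the longest URL per headline/pub_date
keep_longest_url <- function(df) {
  # Sort rows so that longer URLs come first
  ordered <- df[order(-nchar(df$web_url)), , drop = FALSE]
  # duplicated() flags every row after the first for each headline/pub_date pair
  is_dup <- duplicated(ordered[, c("headline", "pub_date")])
  ordered[!is_dup, , drop = FALSE]
}

deduped <- keep_longest_url(articles)
```

Because the match is on headline and pub_date only, two genuinely different articles that share both values would also be collapsed, which is one reason the fix is described as fairly accurate rather than exact.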

Examples

if (FALSE) {
  api_df <- nyt_bind_api_files()
  combined_df <- nyt_clean_api_tbl(api_df)
}