R/02-prepare-nested.R
nyt_bind_api_files.Rd
nyt_bind_api_files() combines a folder of rds files into one tbl.
nyt_clean_api_tbl() performs initial cleaning on the output of nyt_bind_api_files(), such as removing unnecessary columns. It also writes a separate rds file of news desk values to use as input in the lookup table shiny app.
Your working directory must be the top-level folder of the project where nyt_get_data() was called.
nyt_bind_api_files(api_data_folder = "api_data")

nyt_clean_api_tbl(
  api_df,
  news_desk_input_folder = "news_desk_lookup_input",
  news_desk_output_folder = "renamed_news_desks"
)
Argument | Description |
---|---|
api_data_folder | name of the folder where the API data is written |
api_df | tbl resulting from |
news_desk_input_folder | name of the folder where the pre-cleaning news desk values are written |
news_desk_output_folder | name of the folder where the post-cleaning news desk values are found |
nyt_bind_api_files() returns a tbl of raw API data.
nyt_clean_api_tbl() returns a cleaner tbl with nested keywords.
This function also provides a fairly accurate fix for a bug in the NYT API where the same article is returned twice. The two URLs differ, but the date, headline, author, etc. are the same. The duplicates are especially common in 2006. Luckily, the "extra" article has the shorter URL, so the function keeps only the row with the longest URL for every headline/pub_date pair.
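The "keep the longest URL per headline/pub_date pair" rule can be illustrated with a minimal base R sketch. This is not the package's own code: the helper name dedupe_longest_url() is hypothetical, and the column names headline, pub_date and web_url are assumptions about how the cleaned tbl is laid out.

```r
# Hypothetical helper: keep one row per headline/pub_date pair,
# choosing the row whose URL is longest.
# Assumes columns named headline, pub_date and web_url.
dedupe_longest_url <- function(df) {
  # Sort so that, within each pair, the longest URL comes first.
  ord <- order(df$headline, df$pub_date, -nchar(df$web_url))
  sorted <- df[ord, , drop = FALSE]
  # duplicated() flags every row after the first in each pair; drop those.
  keep <- !duplicated(sorted[, c("headline", "pub_date")])
  sorted[keep, , drop = FALSE]
}

# Toy data: article "A" appears twice with different URLs.
toy <- data.frame(
  headline = c("A", "A", "B"),
  pub_date = c("2006-01-01", "2006-01-01", "2006-01-02"),
  web_url = c(
    "https://example.com/a",
    "https://example.com/a-long-version",
    "https://example.com/b"
  ),
  stringsAsFactors = FALSE
)

deduped <- dedupe_longest_url(toy)
```

Note that the result is sorted by headline and pub_date rather than kept in its original order, and that two genuinely different articles sharing a headline and publication date would be collapsed into one, which is why the fix is only "fairly accurate".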
if (FALSE) {
  api_df <- nyt_bind_api_files()
  combined_df <- nyt_clean_api_tbl(api_df)
}
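For orientation, combining a folder of rds files into one table can be sketched in base R as follows. This is only an approximation of the idea, not the implementation of nyt_bind_api_files(): the helper name bind_rds_folder() is hypothetical, and it assumes every rds file holds a data frame with the same columns.

```r
# Hypothetical helper: read every rds file in a folder and row-bind the results.
# Assumes each file contains a data frame with identical column names.
bind_rds_folder <- function(folder) {
  files <- list.files(folder, pattern = "\\.rds$", full.names = TRUE,
                      ignore.case = TRUE)
  if (length(files) == 0) {
    stop("No rds files found in: ", folder)
  }
  pieces <- lapply(files, readRDS)
  do.call(rbind, pieces)
}

# Demonstration in a fresh temporary folder with two small files.
demo_folder <- tempfile("api_data_demo")
dir.create(demo_folder)
saveRDS(data.frame(headline = "A", stringsAsFactors = FALSE),
        file.path(demo_folder, "2006-01.rds"))
saveRDS(data.frame(headline = "B", stringsAsFactors = FALSE),
        file.path(demo_folder, "2006-02.rds"))

combined <- bind_rds_folder(demo_folder)
```

Files are read in the alphabetical order returned by list.files(), so naming files by date keeps the combined rows in chronological order.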