Once you have a final unnested version of the data, the next step is to create a nested version. This step also creates three new columns (india_rank, max_kword, and in_of_n_kword), which describe the relevance of India to an article.
extract_rank(df)

find_max_kword(df)

nyt_re_nest_keywords(full_unnested_df, max_allowable_india_rank = 30)
Argument | Description |
---|---|
df | keyword df belonging to one article |
full_unnested_df | output of |
max_allowable_india_rank | upper limit on the India keyword rank of an article |
nyt_re_nest_keywords() returns a nested df. extract_rank() and find_max_kword() are small helper functions used when iterating over each article.
if (FALSE) {
  full_nested_df <- nyt_re_nest_keywords(full_unnested_df)
}
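
To make the re-nesting step more concrete, here is a minimal base R sketch of the general idea: split the unnested keyword data by article, compute a per-article India rank and maximum keyword rank, and keep only articles whose India rank is within the limit. Everything in it is an assumption made for illustration, not the package's actual implementation: the column names (`article_id`, `keyword`, `rank`), the `sketch_*` function names, the toy data, and the reading of `max_allowable_india_rank` as a filter. It derives only `india_rank` and `max_kword`.

```r
# Hypothetical unnested data: one row per (article, keyword) pair.
# The column names article_id, keyword, and rank are assumptions.
unnested <- data.frame(
  article_id = c("a1", "a1", "a1", "a2", "a2"),
  keyword    = c("Politics", "India", "Trade", "India", "Cricket"),
  rank       = c(1L, 2L, 3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Illustrative stand-ins for the per-article helpers (not the package's code).
# Rank of the "India" keyword within one article, or NA if it is absent.
sketch_extract_rank <- function(df) {
  hit <- df$rank[df$keyword == "India"]
  if (length(hit) == 0) NA_integer_ else min(hit)
}

# Highest keyword rank within one article.
sketch_find_max_kword <- function(df) max(df$rank)

sketch_re_nest <- function(unnested, max_allowable_india_rank = 30) {
  # One keyword data frame per article.
  pieces <- split(unnested[c("keyword", "rank")], unnested$article_id)

  nested <- data.frame(article_id = names(pieces), stringsAsFactors = FALSE)
  nested$keywords   <- unname(pieces)  # list-column of keyword data frames
  nested$india_rank <- vapply(pieces, sketch_extract_rank, integer(1),
                              USE.NAMES = FALSE)
  nested$max_kword  <- vapply(pieces, sketch_find_max_kword, integer(1),
                              USE.NAMES = FALSE)

  # Assumed behaviour: drop articles where India is missing or ranked too low.
  keep <- !is.na(nested$india_rank) &
    nested$india_rank <= max_allowable_india_rank
  nested[keep, , drop = FALSE]
}

nested <- sketch_re_nest(unnested)
```

Each row of `nested` then represents one article, with its keywords stored as a data frame in the `keywords` list-column. Lowering `max_allowable_india_rank` keeps only articles in which India appears nearer the top of the keyword list.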