Title: | Clients to the 'Web of Science' and 'InCites' APIs |
---|---|
Description: | R clients to the 'Web of Science' and 'InCites' <https://clarivate.com/products/data-integration/> APIs, which allow you to programmatically download publication and citation data indexed in the 'Web of Science' and 'InCites' databases. |
Authors: | Christopher Baker [aut, cre] |
Maintainer: | Christopher Baker <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2024-10-27 04:19:18 UTC |
Source: | https://github.com/cran/wosr |
auth asks the API's server for a session ID (SID), which you can then pass along to either query_wos or pull_wos. Note that there are limits on how many session IDs you can get in a given period of time (roughly 5 SIDs in a 5-minute period).
auth(username = Sys.getenv("WOS_USERNAME"), password = Sys.getenv("WOS_PASSWORD"))
username |
Your username. Specify |
password |
Your password. Specify |
A session ID
## Not run: 
# Pass user credentials in manually:
auth("some_username", password = "some_password")

# Use the default of looking for username and password in envvars, so you
# don't have to keep specifying them in your code:
Sys.setenv(WOS_USERNAME = "some_username", WOS_PASSWORD = "some_password")
auth()
## End(Not run)
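Because new session IDs are throttled, it can help to request one SID and reuse it. The following is a minimal sketch of that idea, not part of the package: `make_sid_cache` and `fake_auth` are hypothetical names, and `fake_auth` stands in for `auth()` so the example runs without credentials.

```r
# Hypothetical helper: wraps any function that returns a session ID and
# caches the result so that repeated calls do not request a new SID.
make_sid_cache <- function(auth_fn) {
  sid <- NULL
  function(refresh = FALSE) {
    if (is.null(sid) || refresh) {
      sid <<- auth_fn()
    }
    sid
  }
}

# Stand-in for auth() (an assumption for illustration only); it counts how
# many times a "new session" is requested.
n_requests <- 0
fake_auth <- function() {
  n_requests <<- n_requests + 1
  paste0("SID-", n_requests)
}

get_sid <- make_sid_cache(fake_auth)
first <- get_sid()                # requests a SID
second <- get_sid()               # served from the cache, no new request
third <- get_sid(refresh = TRUE)  # explicitly asks for a fresh SID
```

In real use one would presumably pass `auth` itself (or a wrapper around it) as `auth_fn` and hand the cached value to the `sid` argument of the query functions.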
Use this function when you have a bunch of UTs whose data you want to pull and you need to write a series of UT-based queries to do so (i.e., queries in the form "UT = (WOS:000186387100005 OR WOS:000179260700001)").
create_ut_queries(uts, uts_per_query = 200)
uts |
UTs that will be placed inside the UT-based queries. |
uts_per_query |
Number of UTs to include in each query. Note, there is a limit on how long your query can be, so you probably want to keep this set to around 200. |
A vector of queries. You can feed these queries to pull_wos_apply to download data for each query.
## Not run: 
data <- pull_wos('TS = ("animal welfare") AND PY = (2002-2003)')
queries <- create_ut_queries(data$publication$ut)
pull_wos_apply(queries)
## End(Not run)
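To make the batching concrete, here is a rough sketch of how UTs could be split into UT-based queries of the form shown above. `build_ut_queries` is a hypothetical re-implementation written for illustration; it is not claimed to match the internals of create_ut_queries.

```r
# Hypothetical sketch: split a vector of UTs into chunks and build one
# "UT = (... OR ...)" query string per chunk.
build_ut_queries <- function(uts, uts_per_query = 200) {
  # Assign each UT to a chunk holding at most uts_per_query elements
  chunk_id <- ceiling(seq_along(uts) / uts_per_query)
  chunks <- split(uts, chunk_id)
  queries <- vapply(
    chunks,
    function(x) paste0("UT = (", paste(x, collapse = " OR "), ")"),
    character(1)
  )
  unname(queries)
}

uts <- c("WOS:000186387100005", "WOS:000179260700001", "WOS:000362312600021")
# A small chunk size is used here only to show the splitting
queries <- build_ut_queries(uts, uts_per_query = 2)
```

With three UTs and two UTs per query, the sketch yields two query strings: one joining the first two UTs with OR and one containing the last UT alone.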
Pull cited references
pull_cited_refs(uts, sid = auth(Sys.getenv("WOS_USERNAME"), Sys.getenv("WOS_PASSWORD")), ...)
uts |
Vector of UTs (i.e., publications) whose cited references you want. |
sid |
Session identifier (SID). The default setting is to get a fresh
SID each time you query WoS via a call to |
... |
Arguments passed along to |
A data frame with the following columns:
The publication that is doing the citing. These are the UTs that you submitted to pull_cited_refs. If one of your publications doesn't have any cited refs, it will not appear in this column.
The cited ref's document identifier (similar to a UT).
Roughly equivalent to the cited ref's title.
Roughly equivalent to the cited ref's journal.
The cited ref's first author.
The total number of citations the cited ref has received.
The cited ref's publication year.
The cited ref's page number.
The cited ref's journal volume.
## Not run: 
sid <- auth("your_username", password = "your_password")
uts <- c("WOS:000362312600021", "WOS:000439855300030", "WOS:000294946900020")
pull_cited_refs(uts, sid)
## End(Not run)
Important note: The throttling limits on the InCites API are not documented anywhere and are difficult to determine from experience. As such, whenever pull_incites receives a throttling error from the server, it uses exponential backoff (with a maximum wait time of 45 minutes) to determine how long to wait before retrying.
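As an illustration of exponential backoff with a cap, the sketch below doubles the wait after each failed attempt and never exceeds 45 minutes. The function names, the 1-second base, and the doubling factor are assumptions made for this example; the source only states that backoff is exponential with a 45-minute maximum. Unlike pull_incites, this sketch retries on any error, not just throttling errors.

```r
# Hypothetical schedule: wait base * 2^(attempt - 1) seconds, capped at
# max_wait (45 minutes, expressed in seconds).
backoff_wait <- function(attempt, base = 1, max_wait = 45 * 60) {
  min(base * 2^(attempt - 1), max_wait)
}

# Hypothetical retry loop. sleep_fn is injectable so the demo does not
# actually sleep.
retry_with_backoff <- function(fn, max_attempts = 5, sleep_fn = Sys.sleep) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_attempts) sleep_fn(backoff_wait(attempt))
  }
  stop(result)  # re-signal the last error once attempts are exhausted
}

waits <- vapply(1:15, backoff_wait, numeric(1))

# Demo with a stand-in function that fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("throttled")
  "ok"
}
slept <- numeric(0)
out <- retry_with_backoff(flaky, sleep_fn = function(s) slept <<- c(slept, s))
```

Here `slept` records the waits that would have occurred (1 and 2 seconds), and `waits` shows the schedule flattening at 2700 seconds from the 13th attempt onward.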
pull_incites(uts, key = Sys.getenv("INCITES_KEY"), as_raw = FALSE, ...)
uts |
A vector of UTs whose InCites data you would like to download. Each UT is a 15-digit identifier for a given publication. You can specify the UT using only these 15 digits or you can prefix the 15 digits with "WOS:" (e.g., "000346263300011" or "WOS:000346263300011"). |
key |
The developer key that the server will use for authentication. |
as_raw |
Do you want the data frame that is returned by the API to be
returned to you in its raw form? This option can be useful if the API has
changed the format of the data that it is serving, in which case specifying
|
... |
Arguments passed along to |
A data frame where each row corresponds to a different publication. The definitions for the columns in this data frame can be found online at the API's documentation page (see the DocumentLevelMetricsByUT method details for definitions). Note that the column names are all converted to lowercase by pull_incites and the 0/1 flag variables are converted to booleans. Also note that not all publications indexed in WoS are also indexed in InCites, so you may not get data back for some UTs.
## Not run: 
uts <- c(
  "WOS:000346263300011", "WOS:000362312600021", "WOS:000279885800004",
  "WOS:000294667500003", "WOS:000294946900020", "WOS:000412659200006"
)
pull_incites(uts, key = "some_key")
pull_incites(c("000346263300011", "000362312600021"), key = "some_key")
## End(Not run)
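Since a UT may be given with or without the "WOS:" prefix, it can be convenient to bring a mixed vector into one form before de-duplicating or joining. The helper below is a hypothetical sketch (the name `normalize_ut` and the validation rule are assumptions based on the 15-digit description above), not a function exported by the package.

```r
# Hypothetical helper: accept UTs with or without the "WOS:" prefix and
# return them all in the prefixed form.
normalize_ut <- function(uts) {
  digits <- sub("^WOS:", "", uts)
  # The argument description above says a UT is a 15-digit identifier
  ok <- grepl("^[0-9]{15}$", digits)
  if (!all(ok)) {
    stop("Not a 15-digit UT: ", paste(uts[!ok], collapse = ", "))
  }
  paste0("WOS:", digits)
}

mixed <- c("000346263300011", "WOS:000346263300011", "WOS:000362312600021")
normalized <- unique(normalize_ut(mixed))
```

After normalization the first two entries collapse into one, so `normalized` holds two distinct UTs.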
pull_wos wraps the process of querying, downloading, parsing, and processing Web of Science data.
pull_wos(query, editions = c("SCI", "SSCI", "AHCI", "ISTP", "ISSHP", "BSCI", "BHCI", "IC", "CCR", "ESCI"), sid = auth(Sys.getenv("WOS_USERNAME"), Sys.getenv("WOS_PASSWORD")), ...)
query |
Query string. See the WoS query documentation page for details on how to write a query as well as this list of example queries. |
editions |
Web of Science editions to query. Possible values are listed here. |
sid |
Session identifier (SID). The default setting is to get a fresh
SID each time you query WoS via a call to |
... |
Arguments passed along to |
A list of the following data frames:
A data frame where each row corresponds to a different publication. Note that each publication has a distinct ut. There is a one-to-one relationship between a ut and each of the columns in this table.
A data frame where each row corresponds to a different publication/author pair (i.e., a ut/author_no pair). In other words, each row corresponds to a different author on a publication. You can link the authors in this table to the address and author_address tables to get their addresses (if they exist). See example in FAQs for details.
A data frame where each row corresponds to a different publication/address pair (i.e., a ut/addr_no pair). In other words, each row corresponds to a different address on a publication. You can link the addresses in this table to the author and author_address tables to see which authors correspond to which addresses. See example in FAQs for details.
A data frame that specifies which authors correspond to which addresses on a given publication. This data frame is meant to be used to link the author and address tables together.
A data frame where each row corresponds to a different publication/jsc (journal subject category) pair. There is a many-to-many relationship between ut's and jsc's.
A data frame where each row corresponds to a different publication/keyword pair. These are the author-assigned keywords.
A data frame where each row corresponds to a different publication/keywords_plus pair. These keywords are the keywords assigned by Clarivate Analytics through an automated process.
A data frame where each row corresponds to a different publication/grant agency/grant ID triplet. Not all publications acknowledge a specific grant number in the funding acknowledgement section, hence the grant_id field can be NA.
A data frame where each row corresponds to a different publication/document type pair.
## Not run: 
sid <- auth("your_username", password = "your_password")
pull_wos("TS = (dog welfare) AND PY = 2010", sid = sid)

# Re-use session ID. This is best practice to avoid throttling limits:
pull_wos("TI = \"dog welfare\"", sid = sid)

# Get fresh session ID:
pull_wos("TI = \"pet welfare\"", sid = auth("your_username", "your_password"))

# It's best to see how many records your query matches before actually
# downloading the data. To do this, call query_wos before running pull_wos:
query <- "TS = ((cadmium AND gill*) NOT Pisces)"
query_wos(query, sid = sid) # shows that there are 1,611 matching publications
pull_wos(query, sid = sid)
## End(Not run)
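The author, address, and author_address tables described above are meant to be joined. The sketch below shows one way this might look with base R's merge on small mock data frames. The key columns ut, author_no, and addr_no come from the descriptions above; the assumption that author_address carries all three keys, and the display_name and org columns, are made up for illustration.

```r
# Mock tables standing in for the result of pull_wos (illustrative data only)
author <- data.frame(
  ut = c("WOS:1", "WOS:1", "WOS:2"),
  author_no = c(1L, 2L, 1L),
  display_name = c("Smith, A", "Jones, B", "Lee, C"),  # hypothetical column
  stringsAsFactors = FALSE
)
address <- data.frame(
  ut = c("WOS:1", "WOS:2"),
  addr_no = c(1L, 1L),
  org = c("Univ A", "Univ B"),  # hypothetical column
  stringsAsFactors = FALSE
)
# Assumed layout: one row per author/address pairing on a publication
author_address <- data.frame(
  ut = c("WOS:1", "WOS:1", "WOS:2"),
  author_no = c(1L, 2L, 1L),
  addr_no = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Link authors to addresses through the bridging table
linked <- merge(author, author_address, by = c("ut", "author_no"))
linked <- merge(linked, address, by = c("ut", "addr_no"))
linked <- linked[order(linked$ut, linked$author_no), ]
```

These are inner joins, so authors without an address would drop out; passing `all.x = TRUE` to the merge calls would keep them with NA address fields.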
Run pull_wos across multiple queries
pull_wos_apply(queries, editions = c("SCI", "SSCI", "AHCI", "ISTP", "ISSHP", "BSCI", "BHCI", "IC", "CCR", "ESCI"), sid = auth(Sys.getenv("WOS_USERNAME"), Sys.getenv("WOS_PASSWORD")), ...)
queries |
Vector of queries to issue to the WoS API and pull data for. |
editions |
Web of Science editions to query. Possible values are listed here. |
sid |
Session identifier (SID). The default setting is to get a fresh
SID each time you query WoS via a call to |
... |
Arguments passed along to |
The same set of data frames that pull_wos returns, with the addition of a data frame named query. This data frame tells you which publications were returned by a given query.
## Not run: 
queries <- c('TS = "dog welfare"', 'TS = "cat welfare"')
# we can name the queries so that these names appear in the queries data
# frame returned by pull_wos_apply():
names(queries) <- c("dog welfare", "cat welfare")
pull_wos_apply(queries)
## End(Not run)
Returns the number of records that match a given query. It's best to call this function before calling pull_wos so that you know how many records you're trying to download before attempting to do so.
query_wos(query, editions = c("SCI", "SSCI", "AHCI", "ISTP", "ISSHP", "BSCI", "BHCI", "IC", "CCR", "ESCI"), sid = auth(Sys.getenv("WOS_USERNAME"), Sys.getenv("WOS_PASSWORD")), ...)
query |
Query string. See the WoS query documentation page for details on how to write a query as well as this list of example queries. |
editions |
Web of Science editions to query. Possible values are listed here. |
sid |
Session identifier (SID). The default setting is to get a fresh
SID each time you query WoS via a call to |
... |
Arguments passed along to |
An object of class query_result. This object contains the number of publications that are returned by your query (rec_cnt), as well as some info that pull_wos uses when it calls query_wos internally.
## Not run: 
# Get session ID and reuse it across queries:
sid <- auth("some_username", password = "some_password")

query_wos("TS = (\"dog welfare\") AND PY = (1990-2007)", sid = sid)

# Finds records in which Max Planck appears in the address field.
query_wos("AD = Max Planck", sid = sid)

# Finds records in which Max Planck appears in the same address as Mainz
query_wos("AD = (Max Planck SAME Mainz)", sid = sid)
## End(Not run)
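The advice to check the record count before downloading could be wrapped in a small guard. This is a sketch under stated assumptions: `pull_if_small`, `fake_query`, and `fake_pull` are hypothetical names, and the stand-in functions replace query_wos and pull_wos so the example runs offline. Only the rec_cnt element is taken from the description above.

```r
# Hypothetical guard: count matching records first and refuse to download
# when the count exceeds max_records.
pull_if_small <- function(query, max_records, query_fn, pull_fn) {
  n <- query_fn(query)$rec_cnt
  if (n > max_records) {
    stop(sprintf("Query matches %d records (limit %d); refine it first.",
                 n, max_records))
  }
  pull_fn(query)
}

# Stand-ins for query_wos() and pull_wos() (assumptions for illustration)
fake_query <- function(query) list(rec_cnt = 1611)
fake_pull <- function(query) paste("downloaded:", query)

query <- "TS = ((cadmium AND gill*) NOT Pisces)"
res <- pull_if_small(query, max_records = 5000, fake_query, fake_pull)
too_big <- tryCatch(
  pull_if_small(query, max_records = 1000, fake_query, fake_pull),
  error = function(e) conditionMessage(e)
)
```

With the real package one would presumably pass wrappers such as `function(q) query_wos(q, sid = sid)` and `function(q) pull_wos(q, sid = sid)`, reusing a single SID for both calls.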
Run query_wos across multiple queries
query_wos_apply(queries, editions = c("SCI", "SSCI", "AHCI", "ISTP", "ISSHP", "BSCI", "BHCI", "IC", "CCR", "ESCI"), sid = auth(Sys.getenv("WOS_USERNAME"), Sys.getenv("WOS_PASSWORD")), ...)
queries |
Vector of queries to run. |
editions |
Web of Science editions to query. Possible values are listed here. |
sid |
Session identifier (SID). The default setting is to get a fresh
SID each time you query WoS via a call to |
... |
Arguments passed along to |
A data frame which lists the number of records returned by each of your queries.
## Not run: 
queries <- c('TS = "dog welfare"', 'TS = "cat welfare"')
query_wos_apply(queries)
## End(Not run)
Reads in a series of CSV files (which were written via write_wos_data) and places the data in an object of class wos_data.
read_wos_data(dir)
dir |
Path to the directory where you wrote the CSV files. |
An object of class wos_data.
## Not run: 
sid <- auth("your_username", password = "your_password")
wos_data <- pull_wos("TS = (dog welfare) AND PY = 2010", sid = sid)

# Write files to working directory
write_wos_data(wos_data, ".")

# Read data back into R
wos_data <- read_wos_data(".")
## End(Not run)
Writes each of the data frames in an object of class wos_data to its own CSV file.
write_wos_data(wos_data, dir)
wos_data |
An object of class |
dir |
Path to the directory where you want to write the files. If the
directory doesn't yet exist, |
Nothing. Files are written to disk.
## Not run: 
sid <- auth("your_username", password = "your_password")
wos_data <- pull_wos("TS = (dog welfare) AND PY = 2010", sid = sid)

# Write files to working directory
write_wos_data(wos_data, ".")

# Write files to "wos-data" dir
write_wos_data(wos_data, "wos-data")
## End(Not run)
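The one-CSV-per-data-frame layout can be illustrated with base R alone. The functions below are hypothetical stand-ins, not the package's write_wos_data and read_wos_data; in particular, the file naming scheme (`<table name>.csv`) and the choice to create a missing directory are assumptions of this sketch, and the mock tables are invented data.

```r
# Hypothetical sketch: write each data frame in a named list to its own CSV
write_tables <- function(tables, dir) {
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  for (name in names(tables)) {
    utils::write.csv(tables[[name]], file.path(dir, paste0(name, ".csv")),
                     row.names = FALSE)
  }
  invisible(NULL)
}

# Hypothetical sketch: read every CSV in a directory back into a named list
read_tables <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, utils::read.csv, stringsAsFactors = FALSE)
  names(tables) <- sub("\\.csv$", "", basename(files))
  tables
}

# Mock data standing in for a wos_data object
tables <- list(
  publication = data.frame(
    ut = c("WOS:000186387100005", "WOS:000179260700001"),
    stringsAsFactors = FALSE
  ),
  keyword = data.frame(
    ut = c("WOS:000186387100005", "WOS:000186387100005"),
    keyword = c("animal welfare", "dogs"),
    stringsAsFactors = FALSE
  )
)

out_dir <- file.path(tempdir(), "wos-data-sketch")
write_tables(tables, out_dir)
restored <- read_tables(out_dir)
```

The round trip returns a plain named list; the real read_wos_data additionally returns an object of class wos_data, which this sketch does not attempt to reproduce.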