Title: | glitter makes SPARQL |
---|---|
Description: | This package aims at writing and sending SPARQL queries. It makes the exploration and use of Linked Open Data (Wikidata in particular) easier for those who do not know SPARQL. |
Authors: | Lise Vaudor [aut, cre], Maëlle Salmon [aut] |
Maintainer: | Lise Vaudor <[email protected]> |
License: | GPL-2 |
Version: | 0.2.999 |
Built: | 2024-11-15 03:21:45 UTC |
Source: | https://github.com/lvaudor/glitter |
Correspondence between R-DSL functions and SPARQL functions/operators.
set_functions term_functions misc_functions string_functions numeric_functions datetime_functions operators all_correspondences
A data frame.
R-DSL function
SPARQL function
list-column with R vs SPARQL argument names
An object of class tbl_df (inherits from tbl, data.frame) with 21 rows and 2 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 9 rows and 2 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 12 rows and 3 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 4 rows and 2 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 7 rows and 2 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 6 rows and 2 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 66 rows and 3 columns.
Like dbplyr::sql().
    spq(...)
    is.spq(x)
    as.spq(x)
... |
Character vectors that will be combined into a single SPARQL expression. |
x |
Object to coerce |
Add a triple pattern statement to a query
    spq_add(
      .query = NULL,
      .triple_pattern = NULL,
      .subject = NULL,
      .verb = NULL,
      .object = NULL,
      .prefixes = NULL,
      .required = TRUE,
      .label = NA,
      .within_box = c(NA, NA),
      .within_distance = c(NA, NA),
      .filter = NULL,
      .sibling_triple_pattern = NA
    )
.query |
query |
.triple_pattern |
the triple pattern statement (replaces arguments subject verb and object) |
.subject |
an anonymous variable (for instance, and by default, "?subject") or item (for instance "wd:Q456") |
.verb |
the property (for instance "wdt:P190") |
.object |
an anonymous variable (for instance, and by default, "?object") or item (for instance "wd:Q456") |
.prefixes |
Custom prefixes |
.required |
whether the existence of a value for the triple is required or not (defaults to TRUE). If set to FALSE, other triples in the query are returned even if this particular triple is missing. |
.label |
See |
.within_box |
if provided, rectangular bounding box for the triple query. Provided as list(southwest=c(long=...,lat=...),northeast=c(long=...,lat=...)) |
.within_distance |
if provided, circular bounding box for the triple query. Provided as list(center=c(long=...,lat=...), radius=...), with radius in kilometers. The center can also be provided as a variable (for instance, "?location") for the center coordinates to be retrieved directly from the query. |
.filter |
Filter for the triple. Only use this with |
.sibling_triple_pattern |
Triple this triple is to be grouped with, especially (only?) useful if the sibling triple is optional. |
The arguments .subject, .verb and .object are most useful for programmatic usage; they are actually used within glitter code itself.
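As a rough illustration of that programmatic usage, the sketch below adds one triple pattern per row of a small table. The `triples` data frame and the loop are hypothetical and only serve the example; spq_init(), spq_add() and the .subject, .verb and .object arguments are the ones documented here. The glitter calls are guarded so that the snippet still runs when the package is not installed.

```r
# Hypothetical table of triples (illustration only, not part of glitter)
triples <- data.frame(
  subject = c("?city", "?city"),
  verb    = c("wdt:P31", "wdt:P1082"),
  object  = c("wd:Q515", "?pop"),
  stringsAsFactors = FALSE
)

if (requireNamespace("glitter", quietly = TRUE)) {
  query <- glitter::spq_init()
  # Add one triple pattern per row of the table
  for (i in seq_len(nrow(triples))) {
    query <- glitter::spq_add(
      query,
      .subject = triples$subject[i],
      .verb    = triples$verb[i],
      .object  = triples$object[i]
    )
  }
}
```

Keeping the triples in a table rather than in hand-written strings is one possible design; it makes it easy to generate many similar queries in a loop.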
    # find the cities
    spq_init() %>%
      spq_add("?city wdt:P31/wdt:P279* wd:Q486972") %>%
      spq_label(city) %>%
      spq_mutate(coords = wdt::P625(city),
                 .within_distance = list(center = c(long = 4.84, lat = 45.76), radius = 5)) %>%
      spq_perform()

    # find the individuals of the species
    spq_init() %>%
      spq_add("?mayor wdt:P31 ?species") %>%
      # dog, cat or chicken
      spq_set(species = c('wd:Q144','wd:Q146', 'wd:Q780')) %>%
      # who occupy the function
      spq_add("?mayor p:P39 ?node") %>%
      # of mayor
      spq_add("?node ps:P39 wd:Q30185") %>%
      # of some places
      spq_add("?node pq:P642 ?place") %>%
      spq_perform()
Arrange results by variable value
spq_arrange(.query, ..., .replace = FALSE)
.query |
a list with elements of the query |
... |
variables by which to arrange (or SPARQL strings escaped with spq()) |
.replace |
whether to replace the pre-existing arranging |
A query object
    # descending length, ascending item_label, "R" syntax
    spq_init() %>%
      spq_add("?item wdt:P31/wdt:P279* wd:Q4022") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P2043 ?length") %>%
      spq_add("?item wdt:P625 ?location") %>%
      spq_arrange(desc(length), item_label) %>%
      spq_head(50)

    # descending length, ascending item_label,
    # "R" syntax with quotes e.g. for a loop
    variable = "length"
    spq_init() %>%
      spq_add("?item wdt:P31/wdt:P279* wd:Q4022") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P2043 ?length") %>%
      spq_add("?item wdt:P625 ?location") %>%
      spq_arrange(sprintf("desc(%s)", variable), item_label) %>%
      spq_head(50)

    # descending length, ascending item_label, SPARQL syntax
    spq_init() %>%
      spq_add("?item wdt:P31/wdt:P279* wd:Q4022") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P2043 ?length") %>%
      spq_add("?item wdt:P625 ?location") %>%
      spq_arrange(spq("DESC(?length) ?item_label")) %>%
      spq_head(50)

    # descending as.integer(mort), R syntax
    spq_init() %>%
      spq_add("?oeuvre dcterms:creator ?auteur") %>%
      spq_add("?auteur bio:death ?mort") %>%
      spq_add("?auteur foaf:familyName ?nom") %>%
      spq_filter(as.integer(mort) < as.integer("1924")) %>%
      spq_group_by(auteur, nom, mort) %>%
      spq_arrange(desc(as.integer(mort)))

    # descending xsd:integer(?mort), SPARQL syntax
    spq_init() %>%
      spq_add("?oeuvre dcterms:creator ?auteur") %>%
      spq_add("?auteur bio:death ?mort") %>%
      spq_add("?auteur foaf:familyName ?nom") %>%
      spq_filter(as.integer(mort) < as.integer("1924")) %>%
      spq_group_by(auteur, nom, mort) %>%
      spq_arrange(spq("DESC(xsd:integer(?mort))"))

    # Usage of the .replace argument
    # .replace = FALSE (default)
    spq_init() %>%
      spq_add("?item wdt:P31/wdt:P279* wd:Q4022") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P2043 ?length") %>%
      spq_add("?item wdt:P625 ?location") %>%
      spq_arrange(desc(length)) %>%
      spq_arrange(location) %>%
      spq_head(50)

    # .replace = TRUE
    spq_init() %>%
      spq_add("?item wdt:P31/wdt:P279* wd:Q4022") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P2043 ?length") %>%
      spq_add("?item wdt:P625 ?location") %>%
      spq_arrange(desc(length)) %>%
      spq_arrange(location, .replace = TRUE) %>%
      spq_head(50)

    # Mixing syntaxes
    spq_init() %>%
      spq_add("?item wdt:P31/wdt:P279* wd:Q4022") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P2043 ?length") %>%
      spq_add("?item wdt:P625 ?location") %>%
      spq_arrange(desc(length), spq("?location")) %>%
      spq_head(50)
Assemble query parts into a proper SPARQL query
spq_assemble(.query, strict = TRUE)
.query |
a list with elements of the query |
strict |
whether to perform some linting on the query, and error in case a problem is detected. |
A query object
spq_init() %>% spq_add("?city wdt:P31 wd:Q515") %>% spq_label(city, .languages = "fr$") %>% spq_add("?city wdt:P1082 ?pop") %>% spq_assemble() %>% cat()
Create the request control object for spq_init()
    spq_control_request(
      user_agent = getOption("glitter.ua", "glitter R package (https://github.com/lvaudor/glitter)"),
      max_tries = getOption("glitter.max_tries", 3L),
      max_seconds = getOption("glitter.max_seconds", 120L),
      timeout = getOption("glitter.timeout", 1000L),
      request_type = c("url", "body-form"),
      rate = NULL,
      realm = NULL
    )
user_agent |
a string indicating the user agent to send with the query. |
max_tries, max_seconds |
Cap the maximal number of attempts with |
timeout |
maximum number of seconds to wait ( |
request_type |
a string indicating how the query should be sent: in the URL ( |
rate |
Maximum rate, i.e. maximum number of requests per second. Usually easiest expressed as a fraction, |
realm |
A unique identifier for the throttle pool. If not supplied, defaults to the hostname of the request. |
A list to be used in spq_init()'s request_control argument.
    # Defaults
    spq_control_request()

    # Tweaking values
    spq_control_request(
      user_agent = "Jane Doe https://example.com",
      max_tries = 1L,
      max_seconds = 10L,
      timeout = 10L,
      request_type = "url"
    )
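The defaults in the usage of spq_control_request() are read through getOption(), so they can presumably be changed for a whole session with options(). The sketch below uses only base R to show how that lookup behaves; the values 5L and 60L are arbitrary choices for the example.

```r
# Set two of the options for the session, keeping the old values for restoring
old <- options(glitter.max_tries = 5L, glitter.timeout = 60L)

# Same lookups as in the default arguments of spq_control_request()
max_tries <- getOption("glitter.max_tries", 3L)   # option is set, so it wins
timeout   <- getOption("glitter.timeout", 1000L)  # option is set, so it wins
# Falls back to the second argument when the option is not set
max_seconds <- getOption("glitter.max_seconds", 120L)

# Restore the previous state of the options
options(old)
```

Setting the options once, for instance in a project-level .Rprofile, avoids repeating the same arguments in every call.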
Create the endpoint info object for spq_init()
spq_endpoint_info(label_property = "rdfs:prefLabel")
label_property |
Property used by the endpoint for labelling. |
A list to be used in spq_init()'s endpoint_info argument.
spq_endpoint_info(label_property = "skos:preflabel")
Filters results by adding conditions
    spq_filter(
      .query = NULL,
      ...,
      .label = NA,
      .within_box = c(NA, NA),
      .within_distance = c(NA, NA)
    )
.query |
a list with elements of the query |
... |
filtering conditions (or SPARQL strings escaped with spq()) |
.label |
See |
.within_box |
if provided, rectangular bounding box for the triple query. Provided as list(southwest=c(long=...,lat=...),northeast=c(long=...,lat=...)) |
.within_distance |
if provided, circular bounding box for the triple query. Provided as list(center=c(long=...,lat=...), radius=...), with radius in kilometers. The center can also be provided as a variable (for instance, "?location") for the center coordinates to be retrieved directly from the query. |
A query object
    spq_init() %>%
      spq_filter(item == wdt::P31(wd::Q13442814))

    # Lexemes in English that match an expression
    # here starting with "pota"
    query <- spq_init() |>
      spq_prefix(prefixes = c(dct = "http://purl.org/dc/terms/")) |>
      spq_add(spq('?lexemeId dct:language wd:Q1860')) |>
      spq_mutate(lemma = wikibase::lemma(lexemeId)) |>
      spq_filter(str_detect(lemma, '^pota.*')) |>
      spq_select(lexemeId, lemma)
Group the results by one or more variables
spq_group_by(.query, ...)
.query |
query |
... |
Either R-DSL or strings with variable names |
A query object
spq_init() %>% spq_add("?item wdt:P361 wd:Q297853") %>% spq_add("?item wdt:P1082 ?folkm_ngd") %>% spq_add("?area wdt:P31 wd:Q1907114") %>% spq_label(area) %>% spq_add("?area wdt:P527 ?item") %>% spq_group_by(area, area_label) %>% spq_summarise(total_folkm = sum(folkm_ngd))
Return the first lines of results
spq_head(.query, n = 5)
.query |
a list with elements of the query |
n |
the maximum number of lines to return |
A query object
spq_offset() and spq_head() are only useful when used with spq_arrange(), which makes the order of results predictable.
    # Return the default of 5 items
    spq_init() %>%
      spq_add("?item wdt:P31 wd:Q5") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P19/wdt:P131* wd:Q60") %>%
      spq_add("?item wikibase:sitelinks ?linkcount") %>%
      spq_arrange(desc(linkcount)) %>%
      spq_head()

    # Return 42 items
    spq_init() %>%
      spq_add("?item wdt:P31 wd:Q5") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P19/wdt:P131* wd:Q60") %>%
      spq_add("?item wikibase:sitelinks ?linkcount") %>%
      spq_arrange(desc(linkcount)) %>%
      spq_head(42)
Initialize a query object.
    spq_init(
      endpoint = "wikidata",
      request_control = spq_control_request(
        user_agent = getOption("glitter.ua", "glitter R package (https://github.com/lvaudor/glitter)"),
        max_tries = getOption("glitter.max_tries", 3L),
        max_seconds = getOption("glitter.max_seconds", 120L),
        timeout = getOption("glitter.timeout", 1000L),
        request_type = c("url", "body-form")
      ),
      endpoint_info = spq_endpoint_info(label_property = "rdfs:label")
    )
endpoint |
Endpoint, either name if it is in |
request_control |
An object as returned by |
endpoint_info |
Do not use for an usual endpoint in |
A query object
SPARQL queries are shown using the cli package, with a built-in theme. You can change it by using the cli.user_theme option. We use:
.emph for keywords and functions,
.field for variables,
.pkg for prefixes,
.val for strings,
.url for prefix URLs.
You can also turn off the cli behavior by setting the environment variable "GLITTER.NOCLI" to any non-empty string. That's what we do in glitter snapshot tests.
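A minimal sketch of both settings, using base R only. The `span.field` selector and the colour are assumptions based on the way cli themes are usually written, not something this manual specifies; only the cli.user_theme option and the GLITTER.NOCLI variable come from the text above.

```r
# Assumed theme override: show SPARQL variables (class .field) in magenta
options(cli.user_theme = list(span.field = list(color = "magenta")))

# Turn off the cli display altogether; any non-empty string works
Sys.setenv(GLITTER.NOCLI = "true")

# Check that the variable is seen as non-empty
nocli <- nzchar(Sys.getenv("GLITTER.NOCLI"))
```

Calling Sys.unsetenv("GLITTER.NOCLI") should bring the cli display back for the rest of the session.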
Label variables
    spq_label(
      .query,
      ...,
      .required = FALSE,
      .languages = getOption("glitter.lang", "en$"),
      .overwrite = FALSE
    )
.query |
a list with elements of the query |
... |
variables to label (or SPARQL strings escaped with spq()) |
.required |
whether the existence of a value for the triple is required or not (defaults to FALSE for spq_label()). If set to FALSE, other triples in the query are returned even if this particular triple is missing. |
.languages |
Languages for which to query labels. Use |
.overwrite |
whether to replace variables with their labels. |
spq_label() uses either the label property associated with the usual endpoint (see usual_endpoints) or the property indicated in spq_endpoint_info().
A query object
    spq_init() %>%
      spq_add("?mayor wdt:P31 ?species") %>%
      # dog, cat or chicken
      spq_set(species = c('wd:Q144','wd:Q146', 'wd:Q780')) %>%
      # who occupy the function
      spq_add("?mayor p:P39 ?node") %>%
      # of mayor
      spq_add("?node ps:P39 wd:Q30185") %>%
      # of some places
      spq_add("?node pq:P642 ?place") %>%
      spq_label(mayor, place, .languages = c("fr", "en", "de")) %>%
      spq_perform()
Create and modify variables in the results
    spq_mutate(
      .query,
      ...,
      .label = NA,
      .within_box = c(NA, NA),
      .within_distance = c(NA, NA)
    )
.query |
a list with elements of the query |
... |
variables to create or modify, given as name = definition pairs (or SPARQL strings escaped with spq()) |
.label |
See |
.within_box |
if provided, rectangular bounding box for the triple query. Provided as list(southwest=c(long=...,lat=...),northeast=c(long=...,lat=...)) |
.within_distance |
if provided, circular bounding box for the triple query. Provided as list(center=c(long=...,lat=...), radius=...), with radius in kilometers. The center can also be provided as a variable (for instance, "?location") for the center coordinates to be retrieved directly from the query. |
A query object
    # common name of a plant species in different languages
    # the triplet pattern "wd:Q331676 wdt:P1843 ?statement"
    # creates the variable statement
    # hence our writing it in reverse within the spq_mutate() function
    spq_init() %>%
      spq_mutate(statement = wdt::P1843(wd::Q331676)) %>%
      spq_mutate(lang = lang(statement))
Offset the first generated result
spq_offset(.query, n = 5)
.query |
a list with elements of the query |
n |
the number of lines to skip before the first returned result |
A query object
spq_offset() and spq_head() are only useful when used with spq_arrange(), which makes the order of results predictable.
    # Return 42 items
    spq_init() %>%
      spq_add("?item wdt:P31 wd:Q5") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P19/wdt:P131* wd:Q60") %>%
      spq_add("?item wikibase:sitelinks ?linkcount") %>%
      spq_arrange(desc(linkcount)) %>%
      spq_head(n=42)

    # Return 42 items after the first 11 items
    spq_init() %>%
      spq_add("?item wdt:P31 wd:Q5") %>%
      spq_label(item) %>%
      spq_add("?item wdt:P19/wdt:P131* wd:Q60") %>%
      spq_add("?item wikibase:sitelinks ?linkcount") %>%
      spq_arrange(desc(linkcount)) %>%
      spq_head(42) %>%
      spq_offset(11)
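Combining spq_head() and spq_offset() gives a simple way to page through ordered results. The sketch below is only an illustration: `page_offset()` is a hypothetical helper, not a glitter function, and the glitter pipeline is guarded so that the snippet still runs when the package is not installed.

```r
# Hypothetical helper: number of rows to skip for a 1-based page number
page_offset <- function(page, page_size) {
  stopifnot(page >= 1L, page_size >= 1L)
  (page - 1L) * page_size
}

page_size <- 42L
# Offsets of the first three pages
offsets <- vapply(1:3, page_offset, integer(1), page_size = page_size)

if (requireNamespace("glitter", quietly = TRUE)) {
  # Second page of humans, ordered by number of sitelinks
  second_page <- glitter::spq_init() |>
    glitter::spq_add("?item wdt:P31 wd:Q5") |>
    glitter::spq_add("?item wikibase:sitelinks ?linkcount") |>
    glitter::spq_arrange(desc(linkcount)) |>
    glitter::spq_head(page_size) |>
    glitter::spq_offset(page_offset(2L, page_size))
}
```

The spq_arrange() step matters here: without a fixed order, consecutive pages could overlap or skip results.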
Assemble query parts into a SPARQL query and send it to the endpoint to get a tibble as a result.
    spq_perform(
      .query,
      endpoint = lifecycle::deprecated(),
      user_agent = lifecycle::deprecated(),
      max_tries = lifecycle::deprecated(),
      max_seconds = lifecycle::deprecated(),
      timeout = lifecycle::deprecated(),
      request_type = lifecycle::deprecated(),
      dry_run = FALSE,
      replace_prefixes = FALSE
    )
A tibble with the results of the query.
Control the way the query is performed via the request_control argument of spq_init(). This way you can create a basic spq object with all the correct options corresponding to the SPARQL service you are using, and then use it as the basis of all your subsequent glitter pipelines.
    ## Not run:
    spq_init() %>%
      spq_add(.subject="?city",.verb="wdt:P31",.object="wd:Q515") %>%
      spq_add(.subject="?city",.verb="wdt:P1082",.object="?pop") %>%
      spq_label(city) %>%
      spq_head(n=5) %>%
      spq_perform()
    ## End(Not run)
Add prefixes to the query
spq_prefix(.query = NULL, auto = TRUE, prefixes = NULL)
.query |
a list with elements of the query |
auto |
whether to use built-in prefixes |
prefixes |
a vector of prefixes |
A query object
spq_init() %>% spq_prefix(prefixes=c(dbo="http://dbpedia.org/ontology/"))
Select (and create) particular variables
spq_select(.query = NULL, ..., .spq_duplicate = NULL)
.query |
a list with elements of the query |
... |
variables to select or create (or SPARQL strings escaped with spq()) |
.spq_duplicate |
How to handle duplicates: keep them ( |
A query object
    spq_init() |>
      spq_prefix(prefixes = c(dct = "http://purl.org/dc/terms/")) |>
      spq_add(spq('?lexemeId dct:language wd:Q1860')) |>
      spq_add(spq("?lexemeId wikibase:lemma ?lemma")) |>
      spq_filter(str_detect(lemma, '^pota.*')) |>
      spq_select(- lemma)

    spq_init() |>
      spq_prefix(prefixes = c(dct = "http://purl.org/dc/terms/")) |>
      spq_add(spq('?lexemeId dct:language wd:Q1860')) |>
      spq_add(spq("?lexemeId wikibase:lemma ?lemma")) |>
      spq_filter(str_detect(lemma, '^pota.*')) |>
      spq_select(lemma)
Set helper values for the query (helps with readability)
spq_set(.query, ...)
.query |
query |
... |
Helper values and their definition. |
A query object
    # find the individuals of the species
    spq_init() %>%
      # dog, cat or chicken
      spq_set(species = c('wd:Q144','wd:Q146', 'wd:Q780'), mayorcode = "wd:Q30185") %>%
      spq_filter(mayor == wdt::P31(species)) %>%
      spq_add("?mayor p:P39 ?node") %>%
      # of mayor
      spq_add("?node ps:P39 ?mayorcode") %>%
      # of some places
      spq_add("?node pq:P642 ?place") %>%
      spq_label(species, mayor, place) %>%
      spq_select(-species, -place, -node, -mayor, -mayorcode) %>%
      spq_perform()
Summarise each group of results to fewer results
    spq_summarise(.query, ...)
    spq_summarize(.query, ...)
.query |
a list with elements of the query |
... |
summary variables to compute, given as name = expression pairs (or SPARQL strings escaped with spq()) |
A query object
result = spq_init() %>% spq_add("?item wdt:P361 wd:Q297853") %>% spq_add("?item wdt:P1082 ?folkm_ngd") %>% spq_add("?area wdt:P31 wd:Q1907114") %>% spq_label(area) %>% spq_add("?area wdt:P527 ?item") %>% spq_group_by(area, area_label) %>% spq_summarise(total_folkm = sum(folkm_ngd))
These functions are inspired by dplyr::count() and dplyr::tally(). spq_tally() assumes you've already done the grouping.
    spq_tally(.query, sort = FALSE, name = "n")
    spq_count(.query, ..., sort = FALSE, name = "n")
.query |
a list with elements of the query |
sort |
If |
name |
Name for the count column (like the |
... |
variables to group by before counting (or SPARQL strings escaped with spq()) |
A query object
    ## Not run:
    spq_init() %>%
      spq_add("?film wdt:P31 wd:Q11424") %>%
      spq_mutate(narrative_location = wdt::P840(film)) %>%
      spq_label(narrative_location) %>%
      spq_tally(name = "n_films") %>%
      spq_perform()

    # the same with spq_count
    spq_init() %>%
      spq_add("?film wdt:P31 wd:Q11424") %>%
      spq_mutate(narrative_location = wdt::P840(film)) %>%
      spq_label(narrative_location) %>%
      spq_count(name = "n_films") %>%
      spq_perform()

    # Now with grouping
    spq_init() %>%
      spq_add("?film wdt:P31 wd:Q11424") %>%
      spq_mutate(narrative_location = wdt::P840(film)) %>%
      spq_label(film, narrative_location) %>%
      spq_group_by(narrative_location_label) %>%
      spq_tally(sort = TRUE, name = "n_films") %>%
      spq_perform()

    # More direct with spq_count()
    spq_init() %>%
      spq_add("?film wdt:P31 wd:Q11424") %>%
      spq_mutate(narrative_location = wdt::P840(film)) %>%
      spq_label(film, narrative_location) %>%
      spq_count(narrative_location_label, sort = TRUE, name = "n_films") %>%
      spq_perform()
    ## End(Not run)
Usual endpoints: this dataset allows the user to refer to them using a simplified name rather than their full url.
usual_endpoints
A data frame with usual SPARQL endpoints and abbreviated names
the abbreviated name of the SPARQL endpoint
the full address of the SPARQL endpoint
the property used for labelling
Usual prefixes: this dataset allows the user to refer to usual prefixes in their queries without manually specifying the associated urls.
usual_prefixes
A data frame with usual prefixes
the type of prefix
the prefix itself
the corresponding ontology
...
Wikidata properties
wd_properties
A data frame with 8939 rows and 5 variables:
id
property type
property label
property description
alternative labels
...