Writing Wikidata Queries – Workshop Notes

This morning I attended an online workshop on “Writing Wikidata Queries”, part of the Small wiki toolkits – Indic workshop series.

Get more details here – https://meta.wikimedia.org/wiki/SWT_Indic_Workshop_Series_2020/Workshops#Writing_Wikidata_queries

Thanks to Mahir and the Wikidata team for the nice workshop.

What is Wikidata?

Wikidata is a free, collaborative, multilingual, secondary database, collecting structured data to provide support for Wikipedia, Wikimedia Commons, the other wikis of the Wikimedia movement, and to anyone in the world.

All items in Wikidata are interlinked.

The Wikidata repository consists mainly of items, each one having a label, a description and any number of aliases. Items are uniquely identified by a Q followed by a number, such as Douglas Adams (Q42).

Statements describe detailed characteristics of an Item and consist of a property and a value. Properties in Wikidata have a P followed by a number, such as with educated at (P69).

Read more here – https://www.wikidata.org/wiki/Wikidata:Introduction

At the time of writing, Wikidata has 87,402,940 items. Anyone can add more items.

See here how the items are linked in Wikidata.

[Image: Linked Data - San Francisco]

This is the structure of a Wikidata item page.

[Image: Datamodel in Wikidata]

For a quick introduction to Wikidata, read https://towardsdatascience.com/a-brief-introduction-to-wikidata-bb4e66395eb1

SPARQL is an SQL-like query language that we can use to query the existing data.

We can run the queries at http://query.wikidata.org/

Today we had a quick introduction to SPARQL, with demonstrations. I am sharing here the notes I took during the session. This is not a full tutorial, just my notes, so it may not be easy for beginners. See here for a full tutorial – https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial

Reasonator tool

http://reasonator.toolforge.org – Gives all details about any item as a web page

sample:
reasonator.toolforge.org/?&q=1204

The following details are available in Reasonator:

  • labels
  • statements
  • qualifiers
  • ranks
  • references
  • external identifiers
  • sitelinks

Lexemes

Connect all the linguistic details of any given word: meaning, synonyms, antonyms, part of speech (PoS), etc.

Entity schema

Describes the schema for any item.

Commons structured data

Wikidata items can be connected to images or videos on Commons too.

Other Wikibase sites

Wikidata can be linked with other Wikibase sites like http://wiki.personaldata.io

Notability – Can we add any data to Wikidata?

No. We should add data to Wikidata for notable items only.
The same applies to lexemes. Add references to prove notability.

Probing items using SPARQL

wd:Qxxx, wdt:Pxxx

SELECT ?i {
?i wdt:P31 wd:Q842402.
}

This returns all items that are instances (P31) of Hindu temple (Q842402).

wdt:P31 = instance of (wdt: is the prefix for properties)
wd: = prefix for Wikidata items

This is SPARQL.

Run it at http://query.wikidata.org

While typing, you can select the wd: item from the auto-generated sub-menu.

SELECT ?item {
?item wdt:P31 wd:Q16970. # instances of churches
?item wdt:P17 wd:Q668. # P17 = country, Q668 = India
}

Add labels to the result:

?i rdfs:label ?iLabel

SELECT ?item ?itemLabel {
?item wdt:P31 wd:Q16970. # instances of churches
?item wdt:P17 wd:Q668. # P17 = country, Q668 = India
?item rdfs:label ?itemLabel. # labels in all languages
}

Descriptions and aliases:

?i schema:description ?iDescription.
?i skos:altLabel ?iAltLabel.

Sample:

    SELECT ?i ?iLabel ?iDescription ?iAltLabel {
        ?i wdt:P31 wd:Q842402. # instance of Hindu temple
        ?i rdfs:label ?iLabel. # to get all the labels
        ?i schema:description ?iDescription. # to get all the descriptions
        ?i skos:altLabel ?iAltLabel. # to get all the aliases
    }

In the SELECT clause, list the variables you want returned; then, inside the braces,
add the patterns that describe those variables.

Limiting the query:

Long queries can return a huge number of results and make the browser or OS hang. We can cap the number of results by adding LIMIT.

Add LIMIT 10 (or any other number) after the query.

sample:

    SELECT ?i ?iLabel {
        ?i wdt:P31 wd:Q842402 ; rdfs:label ?iLabel.
    }
    LIMIT 100

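Filter by language:

Add this statement to filter the results by language:

FILTER(LANGMATCHES(LANG(?iDescription),"ta"))

We may get duplicate results if a language has multiple variants of its language code, because LANGMATCHES also matches codes that carry a subtag (for example, "en" matches "en-gb" too).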
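
To avoid the variant duplicates mentioned above, one option is to compare the language tag exactly instead of using LANGMATCHES. This is a minimal sketch that reuses the Hindu temple (Q842402) query from these notes; the language code "ta" and the LIMIT value are only examples.

```sparql
SELECT ?i ?iLabel {
  ?i wdt:P31 wd:Q842402 ; rdfs:label ?iLabel.
  # Exact comparison: keeps only labels tagged "ta",
  # and skips any tag that has an extra subtag after the code
  FILTER(LANG(?iLabel) = "ta")
}
LIMIT 100
```

The trade-off is that labels stored only under a variant code are dropped, so LANGMATCHES remains the better choice when the variants are wanted.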

Filtering labels by language directly:

SELECT ?i ?iLabel {
?i wdt:P31 wd:Q842402 ; rdfs:label ?iLabel.
FILTER(LANGMATCHES(LANG(?iLabel),"bn"))
}

Excluding languages:

SELECT ?i ?iLabel {
?i wdt:P31 wd:Q842402 ; rdfs:label ?iLabel.
FILTER(!LANGMATCHES(LANG(?iLabel),"bn")) # Use ! to negate
}

Using the label service:

SELECT ?i ?iLabel {
?i wdt:P31 wd:Q842402. # No need of rdfs:label 🙂
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Here, en is the fallback language if the bn label is not there.
If nothing exists, the QID is returned.
In the service, use "[AUTO_LANGUAGE],en".
Set the query site language at the top right.
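
The label service can also fill in descriptions and aliases when the matching variable names (?iDescription, ?iAltLabel) are listed in SELECT, and the language parameter accepts an explicit comma-separated fallback list. A minimal sketch, assuming we want Tamil first and English as the fallback; the language codes and the LIMIT value are illustrative choices, not from the workshop.

```sparql
SELECT ?i ?iLabel ?iDescription ?iAltLabel {
  ?i wdt:P31 wd:Q842402. # instance of Hindu temple
  # Languages are tried left to right: "ta" first, then "en"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ta,en". }
}
LIMIT 10
```

Compared with the rdfs:label, schema:description and skos:altLabel patterns, this returns one row per item instead of one row per language and alias combination.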

Probing sitelinks:

This gives the list of wiki sites/pages linked to an item.

?sitelink schema:about ?i; schema:name ?name ; schema:inLanguage "bn"

SELECT ?i ?sitelink ?name {
?i wdt:P31 wd:Q842402 .
?sitelink schema:about ?i ;
schema:name ?name ;
schema:inLanguage "bn" ; # use either this line or the one below it
schema:isPartOf <https://bn.wikipedia.org/> . # use either this line or the one above it
}

Sorting and sorting by labels:

ORDER BY DESC(?variable_name)

SELECT ?i ?iLabel {
?i wdt:P31 wd:Q44539 ; wdt:P131+ wd:Q2088440.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". ?i rdfs:label ?iLabel }
}
ORDER BY ASC(?iLabel)
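
The sample above sorts in ascending order. For comparison, here is a small sketch of the DESC form combined with LIMIT, which returns only the last few labels alphabetically. It reuses the Hindu temple (Q842402) query from earlier in these notes; the "en" filter and the LIMIT value are illustrative assumptions.

```sparql
SELECT ?i ?iLabel {
  ?i wdt:P31 wd:Q842402 ; rdfs:label ?iLabel.
  FILTER(LANG(?iLabel) = "en") # keep English labels only
}
ORDER BY DESC(?iLabel) # sort labels from Z to A
LIMIT 20
```

ORDER BY is written before LIMIT, so the results are sorted first and then cut to 20 rows.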

Avoiding duplicates:

Use SELECT DISTINCT to avoid duplicates.

SELECT DISTINCT ?i ?iLabel {
?i wdt:P40/wdt:P39/wdt:P279* wd:Q16556694.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
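
The query above relies on property paths, which these notes do not spell out. In SPARQL, / chains two properties one after the other, * means zero or more steps along a property, and + means one or more steps. The sketch below is an illustration, not taken from the workshop: it finds Hindu temples (Q842402) including items typed with any subclass of that class.

```sparql
SELECT DISTINCT ?i ?iLabel {
  # P31 = instance of, followed by zero or more P279 = subclass of steps,
  # so items whose class is a subclass of Hindu temple match too
  ?i wdt:P31/wdt:P279* wd:Q842402.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
```

DISTINCT is useful here because an item can reach the same class through more than one path, which would otherwise produce repeated rows.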

Exclusions based on a statement:

SELECT ?i ?iLabel {
?i wdt:P31 wd:Q842402.
MINUS { ?i wdt:P17 wd:Q668. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
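
SPARQL also offers FILTER NOT EXISTS, a different construct from the MINUS shown in the session. For a simple exclusion like this one, where the inner pattern shares the variable ?i with the outer query, the two should give the same results. A hedged sketch of the same exclusion written that way:

```sparql
SELECT ?i ?iLabel {
  ?i wdt:P31 wd:Q842402. # instance of Hindu temple
  # Drop every item that has country (P17) = India (Q668)
  FILTER NOT EXISTS { ?i wdt:P17 wd:Q668. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```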

More samples are at

https://etherpad.wikimedia.org/p/swt-2020-queries

Thanks to Mahir for the nice session, and to all the participants.

I will add the links here once the slides and videos are uploaded online.

The Small wiki toolkits – Indic workshop series is really useful. Don't miss the events. Check the workshop series page for upcoming events.
