Indic Wiki stats as grafana dashboard – wikimedia remote hackathon 2020

Last weekend, I attended “Wikimedia Remote Hackathon 2020”. Because of the pandemic, all events are being moved online.

Like other events, the Wikimedia Hackathon also became a remote event. It was held on May 9 and 10, 2020. Though it was remote, it was well planned, and the organizing team put in the same effort as for an in-person hackathon.

All the announcements, planning, communications, the various sessions, a new Telegram channel for newcomers, plenty of tools, walks with dogs, quick answers over IRC/Telegram for any question, the showcase/demo, music and so on were really well planned. We can learn a lot from the organizing team on how to conduct a remote hackathon.

Here you can get all the details about the event – https://mediawiki.org/wiki/Wikimedia_Hackathon_2020/Remote_Hackathon

I could not decide what to do until the event started. The night before, I thought of getting the page count stats of the Indic Wikipedia sites and showcasing them with good graphs and charts.

For the past few months, I have been writing custom metrics exporters for Prometheus. We can write a custom exporter to expose the Wikipedia page stats as Prometheus metrics. We can then set up a Prometheus server to scrape the data from the exporter and use Grafana to show the graphs.
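To make the exporter idea concrete, here is a minimal sketch of what such an exporter could look like, using only the Python standard library. It is not the code from my repository. The metric name indicwiki_articles, the wiki label, the port and the function names are all assumptions made for illustration, and the counts come from a static sample dictionary instead of a live API call.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sample data only: article counts keyed by wiki language code.
SAMPLE_COUNTS = {"ta": 129087}


def render_metrics(article_counts):
    """Render {wiki_code: article_count} in the Prometheus text exposition format."""
    lines = [
        # Hypothetical metric name, chosen for this sketch.
        "# HELP indicwiki_articles Number of articles per wiki.",
        "# TYPE indicwiki_articles gauge",
    ]
    for wiki, count in sorted(article_counts.items()):
        lines.append(f'indicwiki_articles{{wiki="{wiki}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered metrics on /metrics, where Prometheus scrapes by default."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLE_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the exporter until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the exporter. Prometheus can then be pointed at that host and port as a scrape target, and Grafana can graph the indicwiki_articles series per wiki.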

Good idea, right? Now, how do we get the page counts for any Wikipedia site?

I posted a query on the wikitech-l mailing list.
wikitech-l AT lists.wikimedia.org

See the discussion here
https://lists.wikimedia.org/pipermail/wikitech-l/2020-May/093363.html

This is a wonderful place to get answers for any technical queries related to Wikipedia.

All Wikipedia sites provide a nice web API (the MediaWiki Action API) to interact with them.

Basic information on article counts can be fetched from each wiki
using the Action API’s action=query&meta=siteinfo endpoint. See
<https://www.mediawiki.org/wiki/API:Siteinfo> for more information
about this API.

See <https://ta.wikipedia.org/wiki/%E0%AE%9A%E0%AE%BF%E0%AE%B1%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AF%81:ApiSandbox#action=query&format=json&meta=siteinfo&siprop=statistics>
for an example usage on tawiki.

This URL returns the stats as JSON:
https://ta.wikipedia.org/w/api.php?action=query&format=json&meta=siteinfo&siprop=statistics

With this query, we get the following response.

{
    "batchcomplete": "",
    "query": {
        "statistics": {
            "pages": 406044,
            "articles": 129087,
            "edits": 2961233,
            "images": 7758,
            "users": 174664,
            "activeusers": 431,
            "admins": 40,
            "jobs": 0,
            "queued-massmessages": 0
        }
    }
}

We can parse this and get the required details.
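The parsing step can be sketched in a few lines of Python. This is an illustration and not the exact code from my exporter. The function names and the User-Agent string are made up for this example, and the sample JSON is the tawiki response shown above.

```python
import json
import urllib.request

# URL pattern taken from the tawiki example; {lang} is the wiki language code.
API_URL = (
    "https://{lang}.wikipedia.org/w/api.php"
    "?action=query&format=json&meta=siteinfo&siprop=statistics"
)

# The response shown above, used here as offline sample data.
SAMPLE_RESPONSE = """
{"batchcomplete": "", "query": {"statistics": {"pages": 406044,
 "articles": 129087, "edits": 2961233, "images": 7758, "users": 174664,
 "activeusers": 431, "admins": 40, "jobs": 0, "queued-massmessages": 0}}}
"""


def parse_statistics(json_text):
    """Return the 'statistics' dictionary from a siteinfo JSON response."""
    data = json.loads(json_text)
    return data["query"]["statistics"]


def fetch_statistics(lang):
    """Fetch live statistics for one wiki (needs network access)."""
    request = urllib.request.Request(
        API_URL.format(lang=lang),
        # Placeholder User-Agent for this sketch; use a descriptive one of your own.
        headers={"User-Agent": "indicwiki-stats-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_statistics(response.read().decode("utf-8"))


stats = parse_statistics(SAMPLE_RESPONSE)
print(stats["articles"], stats["activeusers"])
```

For live data, fetch_statistics("ta") should return the same kind of dictionary for Tamil Wikipedia. Looping over the language codes of the Indic wikis gives one dictionary per site.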

That's it. After seeing this answer early in the morning, I could not sleep. Just woke up. Wrote a custom exporter for these metrics for all Indic Wikipedia sites.

Here is the code – http://github.com/tshrinivasan/indicwiki_stats_exporter

Here is the phabricator task – https://phabricator.wikimedia.org/T252212

I used a DigitalOcean droplet to run the exporter, Prometheus and Grafana.
I built a dashboard and also published it to the public Grafana dashboards site. Yay! My first contribution to the public Grafana dashboards.

https://grafana.com/grafana/dashboards/12265

Here is the grafana dashboard – http://139.59.47.5:3000/d/kx1Pb36Zz/indic-wiki-stats?orgId=1

I shared this with the wikitech list and the Telegram groups for the remote hackathon.

It felt good to get an idea and implement it quickly. I found that the Wikistats team provides various analytics here – https://stats.wikimedia.org/#/ta.wikipedia.org/content/pages-to-date/normal|line|2-year|~total|monthly

And here is another site to look for such numbers – https://wikistats.wmflabs.org/display.php?t=wp

Still, this custom exporter with Grafana shows graphs that compare the Indic wikis side by side, which I could not find anywhere else.

I will add more stats for the Indic Wikisource, Wikibooks, Wikinews and Wiktionary sites soon.

Apart from this, I could not join any of the sessions or live demos that happened during the hackathon. I thought all the live sessions would be recorded. Alas, they were live only. There were no recordings because meet.google.com could not share the streams with YouTube.

I could not take part in the showcase event, but I watched it. I was happy to see the great efforts of the other participants.

Thus, the two-day Wikimedia Remote Hackathon 2020 came to an end. I am happy that I could make a small contribution to this event.

Tons of thanks to the event organizers, the Indic Wiki team, the Noolaham Foundation for the server I used, the wikitech mailing list and all Wikipedia contributors for making the world a little better.

One thought on “Indic Wiki stats as grafana dashboard – wikimedia remote hackathon 2020”

  1. //I could not sleep. Just woke up. Wrote a custom // That is exactly it. The free flower that blooms every now and then is a sign of the free roots that lie deep in our minds. May your work continue.
