I have some custom metadata in the pages of my Jekyll site like this:
---
title: Some title
topic: Quantum mechanics
---
I'd like to list all the pages by topic. So I thought about getting a list of all the topics and then iterating over that list. I know how to filter the pages whose topic equals Quantum mechanics, but I don't know how to get a list of all the topics.
You can certainly list all topics with some clever Liquid code, but you'll waste time and add load to jekyll build.
It will be far easier to use Jekyll's built-in categories.
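For reference, the "clever Liquid" route is only a few lines. A minimal sketch, assuming your pages live in site.pages and use the topic front matter key from the question:

{% comment %} Collect the unique topic values, then list the pages under each. {% endcomment %}
{% assign topics = site.pages | map: "topic" | compact | uniq | sort %}
{% for topic in topics %}
  <h2>{{ topic }}</h2>
  <ul>
    {% assign topic_pages = site.pages | where: "topic", topic %}
    {% for p in topic_pages %}
      <li><a href="{{ p.url }}">{{ p.title }}</a></li>
    {% endfor %}
  </ul>
{% endfor %}

With built-in categories, by contrast, site.categories already gives you a category-to-posts mapping to loop over, with no map/uniq pass on every build.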
I'm trying to get a "category tree" from wikipedia for a project I'm working on. The problem is I only want more common topics and fields of study, so the larger dumps I've been able to find have way too many peripheral articles included.
I recently found the Vital articles pages, which seem to be a collection of exactly what I'm looking for. Unfortunately I don't really know how to extract the information from those pages, or how to filter the larger dumps to only include those categories and articles.
To be explicit, my question is: given a vital article level (say level 4), how can I extract the tree of categories and article names for a given list (e.g. People, Arts, Physical sciences) into a CSV or similar file that I can then import into another program? I don't need the actual content of the articles, just the name (and ideally a reference to the article so I can get more information at a later point).
I'm also open to suggestions about how to better accomplish this task.
Thanks!
Have you tried PetScan? It's a Wikimedia-based tool that lets you extract data from pages based on conditions.
You can achieve your goal by going to the tool, opening the "Templates&links" tab, and typing the page name in the "Linked from All of these pages:" field, e.g. Wikipedia:Vital_articles/Level/4/History. If you want to add more than one page to the textarea, enter them line by line.
Finally, press the "Do it!" button and the data will be generated. After that you can download it from the Output tab.
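If you'd rather script this than click through PetScan, the same "linked from" idea can be done directly against the MediaWiki API with prop=links. A minimal Python sketch (the output filename is my own choice; requests is the only dependency):

import csv
import requests

API = "https://en.wikipedia.org/w/api.php"

def linked_articles(page):
    """Yield main-namespace article titles linked from `page`."""
    params = {
        "action": "query",
        "titles": page,
        "prop": "links",
        "plnamespace": 0,   # main namespace only: skips templates, talk pages, etc.
        "pllimit": "max",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for p in data["query"]["pages"].values():
            for link in p.get("links", []):
                yield link["title"]
        if "continue" not in data:   # follow the API's continuation until done
            break
        params.update(data["continue"])

with open("vital_history.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "url"])
    for title in linked_articles("Wikipedia:Vital_articles/Level/4/History"):
        writer.writerow([title, "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")])

Run it once per section page (History, People, Arts, ...) to build up the full level-4 set.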
I am writing an application that needs lists of Wikipedia page titles within a certain category. Some categories work really well for this. For example, Category:English-language_films is a category attributed to about 60k pages. Using the MediaWiki API's list=categorymembers query, I can get a list of all 60k films.
However, this works much less well with something like hockey players in the NHL. Category:Lists_of_National_Hockey_League_players is about as close as a category gets, but it is a category of list pages. It turns out that the concept of NHL players is stored in lists, not categories, whereas the concept of English-language films is stored as a category.
It's rather difficult to obtain the actual list, simply because these lists are themselves broken up into several sub-lists by alphabet or team. It's theoretically possible to screen-scrape the data, but simply getting the list of Wikipedia pages linked from such a page is error-prone.
Is there a straightforward way to get the pages that are members of lists, including expanding sub-lists, using the API? Or some way to tell from the content of a list whether a link is a member of the list or just metadata about a member?
When there is a category of lists of things, chances are there will be a category of the things as well. In your case that would be Category:National Hockey League players. You can walk it recursively with the categorymembers API. (Unlike lists, categories can't contain red links, so depending on your use case that might be a problem.)
Other than that, the Wikipedia APIs won't be much help. You can check Wikidata for something appropriate (e.g. data items with the NHL.com player ID property); that's a different data set, but it is sometimes kept in sync and is always easy to query. If that's not appropriate, you'll have to scrape the HTML.
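A Python sketch of that recursive walk (the seen set guards against the category loops MediaWiki permits):

import requests

API = "https://en.wikipedia.org/w/api.php"

def category_members(cat, seen=None):
    """Recursively yield article titles in `cat` and its subcategories."""
    if seen is None:
        seen = set()
    if cat in seen:          # categories can form loops, so track visited ones
        return
    seen.add(cat)
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": cat,
        "cmtype": "page|subcat",
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for m in data["query"]["categorymembers"]:
            if m["title"].startswith("Category:"):
                yield from category_members(m["title"], seen)
            else:
                yield m["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

for title in category_members("Category:National Hockey League players"):
    print(title)

Note that recursing blindly can pull in loosely related subcategories, so you may want to whitelist or depth-limit the subcategories you descend into.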
I am looking to identify the most popular pages in a Wikipedia category (for example, which graph algorithms had the highest page views in the last year?). However, there seems to be little up-to-date information on Wikipedia APIs, especially for obtaining statistics.
For example, the StackOverflow post on How to use Wikipedia API to get the page view statistics of a particular page in Wikipedia? contains answers that no longer seem to work.
I have dug around a bit, but I am unable to find any usable APIs, other than a really nice website where I could potentially do this manually by typing page titles one by one (up to a maximum of ten pages): https://tools.wmflabs.org/pageviews/. Would appreciate any help. Thanks!
You can use a MediaWiki API call like this to get the titles in the category: https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics
Then you can use this to get page view statistics for each page: https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end
(careful of the rate limit)
E.g. for the last year, article "Physics" (part of the Physics category): https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/all-agents/Physics/daily/20151104/20161104
If you're dealing with large categories, it may be best to start downloading statistics from https://dumps.wikimedia.org/other/pageviews/2016/2016-11/ to avoid making so many REST API calls.
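Putting the two calls together, a rough Python sketch, assuming a modestly sized category (for thousands of pages, use the dumps instead); the User-Agent string and the date range are placeholders:

import time
import requests

API = "https://en.wikipedia.org/w/api.php"
PV = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
      "en.wikipedia.org/all-access/all-agents/{title}/daily/{start}/{end}")
HEADERS = {"User-Agent": "category-pageviews-demo/0.1 (you@example.com)"}

def members(cat):
    """Article titles directly in `cat` (no subcategory recursion)."""
    params = {"action": "query", "list": "categorymembers", "cmtitle": cat,
              "cmnamespace": 0, "cmlimit": "max", "format": "json"}
    while True:
        data = requests.get(API, params=params, headers=HEADERS).json()
        for m in data["query"]["categorymembers"]:
            yield m["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

def yearly_views(title, start="20151104", end="20161104"):
    """Sum the daily view counts for one article over the date range."""
    url = PV.format(title=requests.utils.quote(title.replace(" ", "_"), safe=""),
                    start=start, end=end)
    resp = requests.get(url, headers=HEADERS)
    return sum(item["views"] for item in resp.json().get("items", []))

totals = {}
for t in members("Category:Physics"):
    totals[t] = yearly_views(t)
    time.sleep(0.1)          # stay well under the REST API rate limit

for title, views in sorted(totals.items(), key=lambda kv: -kv[1])[:10]:
    print(views, title)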
TreeViews is a tool designed to do exactly this. Getting good data is going to be hard if your category contains thousands of pages, in which case you'd better do the calculations yourself as Krenair suggests.
I administer my own company internal wiki using MediaWiki. I like MediaWiki because many people are already familiar with it having used Wikipedia. Also, it was a joy to configure and I didn't run into a lot of issues, not being that familiar with PHP. (So I'm not necessarily looking for another solution, like DokuWiki...)
My requirement is that the opening page be a listing of all pages, broken down alphabetically by category - much like a Table of Contents for the entire wiki. It would look like this (on the "Main Page"):
Category 1
  Page A
  Page B
  Page C
Category 2
  Page E
  Page N
  Page X
  Page Z
Category 3
  Page Q
  Page V
Each page has a category assigned to it. I know about the Special:Categories page, but that only shows the categories, and one must drill down (follow the link) to see the pages within each category; I therefore cannot see multiple pages across multiple categories at once.
I have seen Extension:Hierarchy, but this does not fit my needs, because its Table of Contents has to be edited by hand rather than being auto-generated from the "parent" or "category" declared on each page.
Is there already existing functionality for this for MediaWiki? (I understand that as the wiki grows, so too will this Table of Contents page, but that is okay.)
Alternatively, I know about the MediaWiki API. I can create a server-side process that:
Does a MySQL lookup for all pages and their categories
Sorts them
Uses the MediaWiki API to generate this Table of Contents on the Main Page
And I can run this process periodically. I am up for the challenge, because I am a programmer and it is an interesting exercise, but why reinvent the wheel if I don't have to?
CategoryTree is an option. Now, a challenge here is that MediaWiki categories are not hierarchical. In other words, you can have category loops (A>B>C>A). Also, one article can show up in any number of categories, and articles can have no category at all. The only thing that has to be done manually is to put <categorytree>Category Name</categorytree> on the home "Table of Contents" page for each category. Given that new categories are not likely to pop up often, this will not be a terrible issue. One way around the inconvenience entirely is to put all your (top-level) categories into Category:Categories and then display that category via the extension (see the depth and hideroot parameters).
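For example, with everything gathered under Category:Categories, a single tag on the Main Page can render the whole tree (the attribute values here are illustrative):

<categorytree depth="2" hideroot="on">Categories</categorytree>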
Hard to use, but wikistats produces an HTML representation from an XML dump; see e.g. the MediaWiki.org categories.
CatGraph is another analysis tool, seemingly even more complex (though unlike wikistats, I haven't tried setting it up for a wiki of my own).
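If you do end up writing the periodic process described in the question, here is a rough Python sketch of it, using the API for the lookup instead of raw MySQL (the wiki URL is a placeholder, and on a private wiki the session would first need to log in):

import requests

S = requests.Session()
API = "https://wiki.example.com/w/api.php"   # placeholder: your wiki's api.php

def all_categories():
    """Yield every category name on the wiki."""
    params = {"action": "query", "list": "allcategories",
              "aclimit": "max", "format": "json"}
    while True:
        data = S.get(API, params=params).json()
        for c in data["query"]["allcategories"]:
            yield c["*"]
        if "continue" not in data:
            break
        params.update(data["continue"])

def members(cat):
    """Sorted titles of the main-namespace pages in a category."""
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": "Category:" + cat, "cmnamespace": 0,
              "cmlimit": "max", "format": "json"}
    data = S.get(API, params=params).json()
    return sorted(m["title"] for m in data["query"]["categorymembers"])

# Build the Table of Contents as wikitext.
lines = []
for cat in sorted(all_categories()):
    lines.append("== %s ==" % cat)
    lines.extend("* [[%s]]" % t for t in members(cat))

# Saving the page requires a CSRF token.
token = S.get(API, params={"action": "query", "meta": "tokens",
                           "format": "json"}).json()["query"]["tokens"]["csrftoken"]
S.post(API, data={"action": "edit", "title": "Main Page",
                  "text": "\n".join(lines), "token": token, "format": "json"})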
I'd like to create a button on a menu bar that generates a link to a random article from my blog posts (much like Wikipedia has). It's for a client, and they'd like to have this functionality on the site. I'm not familiar with PHP, so I'd like to find a way around it, especially since I don't have access to the root user on my server host's MySQL installation (if that is relevant).
I had a theoretical solution: have a .txt or .xml file containing a list of the URLs of all the posts, with a "key" assigned to each of them. Then, when the user clicks the random article button, the current time (e.g. 1:45) is hashed and mapped to a specific URL. I am fairly new to Drupal, and I was wondering if there was some way to have the random article button use a .c file to execute these steps. The site is hosted on a server running Apache 2, and I have looked through some modules implemented in C. I'm pretty new to all of this (although proficient in C), and have spent many fruitless hours searching for solutions.
In a pure Drupal fashion (I don't know if you are interested in this kind of solution), you could create a view (as a block) which retrieves blog posts, uses a random sort criterion, and limits the results to 1 item. Then configure this view to display fields, add only one field (post title), and check "Link to content" in that field's settings. You'll get one random blog post title rendered as a link to that blog post.
Finally, in Structure -> Block, assign your new block to a region to see it.
It's a pure Drupal / Views / no-code-just-clicks :) way, but it will be far more maintainable and easier to set up than introducing C for such a simple feature.
Views module
Let me know if you try this and have problems configuring your view or anything else.
Good luck