How to delete all pages from a certain category - MediaWiki

Is there an option or a special page that covers the following use case?
I have several pages (more than 10, fewer than 100) in a category.
Instead of deleting each page separately, it would be convenient to delete all pages in that category at once.

[Just to complete the picture] If you are not a sysadmin, or just don't want to install extensions, there is always Pywikibot and its delete.py script.
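A hedged sketch of that route (the category name and summary are placeholders; delete.py accepts the standard page-generator options, so -cat should work, but check python pwb.py delete -help for your Pywikibot version):
# Delete every page in the given category, prompting for confirmation on each one.
python pwb.py delete -cat:"Name of the category" -summary:"Bulk cleanup of obsolete pages"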

There are extensions to do bulk page deletes.
Look at:
Nuke (http://www.mediawiki.org/wiki/Extension:Nuke)
DeleteBatch (http://www.mediawiki.org/wiki/Extension:DeleteBatch)

Related

Using a MySQL query and BASH, how can I Delete, Rename, or Move all image files used by Drupal nodes before a certain date?

BACKSTORY IF YOU'RE INTERESTED: A friend of mine owns a magazine and has been publishing a corresponding Drupal 7 website since 2011. The site has thousands of articles and around 50,000 images supporting those articles. Unfortunately, due to copyright-trolling attorneys, he's already been hit with a couple of copyright infringement lawsuits over images that he thought were from Creative Commons. Since his first lawsuit in 2016, he's made sure all images come from a stock image company. But apparently, very recently, yet another image from before 2016 has caused another copyright troll to seek $18,000 (it's literally a photo of a hot dog, by the way). Nevertheless, his business insurance company just wants to pay the settlement fees rather than risk anything in court, but has demanded that all potentially suspect images be deleted from the site going forward. Since 95% of the stories that have been published on his site have had fewer than 1,000 views anyway (they are worth less than 50 cents from advertisers), he has agreed to take all those images down, because $0.50 is definitely not worth the risk of feeding any more trolls.
QUESTION: What's the best way to delete, rename, or move all the images that are connected to a story node before a certain date in 2016? It would be nice if we could temporarily just change the filenames on the filesystem from "trollfood.jpg" to "trollfood.jpg.bak" (or something) so that if/when he can ensure that an image is in fact in the public domain, he can revive it. It would also be nice if we could replace all the potentially suspect image links (in the db) with placeholder image links for the time being (so that people can still read the articles without wondering where the images have gone; perhaps the placeholder will be a brief explanation of the trolling situation). Anyway, it's been a minute since I've done anything with Drupal, so I've forgotten how Drupal links files to nodes (and he has some custom content types powering his main articles).
I've been able to get all the potentially suspect images in a list via mysql:
SELECT fid, filename, timestamp, from_unixtime(timestamp, "%Y-%m-%e")
FROM drupal7_therooster.file_managed
where timestamp between unix_timestamp('2011-01-01') and unix_timestamp('2017-01-01');
// here's sample output:
# fid filename timestamp from_unixtime(timestamp, "%Y-%m-%e")
6154 _MG_5147.jpg 1373763148 2013-07-14
6155 _MG_5179.jpg 1373763148 2013-07-14
6161 The Lone Bellow (4 of 5).jpg 1373866156 2013-07-15
6162 The Lone Bellow (1 of 5).jpg 1373866156 2013-07-15
Now, how can I use this to find the potentially offending stories that use these images, and perform the following:
1. Create a list of all the stories that use these images, so I can save it in case he ever wants to revive the images. I know SQL well enough; I just don't know which tables keep which data.
2. Create a query that replaces these image associations in those stories with a placeholder image (so if a story uses "trollfood.jpg", that story now uses "safetyimageplaceholder.jpg" instead). Some stories have multiple images attached to them.
3. Once all the potentially offending articles reference a placeholder image instead, I still need to move all the offending files so they can't be accessed by lawyers. I have access via SSH, by the way. Are there any good ways of using bash commands to ONLY move/rename files that match the list I generate from an SQL query? I just want to be careful not to delete/rename/move any images that were NOT part of the query. Bear in mind the file creation dates in the filesystem are all 2017+ on the server, because the server was moved (or copied) in 2017, so the filesystem's original creation dates are inaccurate.
I know this is a long question, and it involves a Drupal site, but I think I might need the help of proper SQL and bash experts, so I've posted it here instead of the Drupal-specific Stack Exchange. I'm totally open to any suggestions if a completely different approach is better suited to this problem. Cheers!
I was able to answer my own question. I had to do three main things:
STEP ONE: Create a query for Drupal's MySQL database that would give me a list of all potential copyright infringing files that were being used by nodes created between 2012 and 2017:
SELECT fm.fid, fm.filename,
n.title, n.nid, from_unixtime(n.created, "%Y-%m-%d") as 'node_date'
FROM file_managed fm
JOIN file_usage fu ON fm.fid = fu.fid
JOIN node n ON fu.id = n.nid
WHERE created BETWEEN unix_timestamp('2012-01-01') AND unix_timestamp('2017-01-01')
ORDER BY node_date
This is a moderately complex query, but basically it joins columns from three tables (Drupal 7's file_managed, node, and file_usage). The file_usage table is a join table that records which files (via fid) are used on which nodes (via nid).
STEP TWO: Organize and filter the data to create a list of files.
I filtered and ordered the results by node created date. I got about 48K records from the join query in step one, and then I created a Google spreadsheet to clean up and sort the data. Here's a sample of the Google spreadsheet. The sheet also includes data from the node_counter table, which tracks page views for each node. Using a simple VLOOKUP to pull the total page views for each nid into the main sheet, the main sheet can then be sorted by page views. I did this so I could prioritize which images attached to each node/article I should check first. This is the SQL query I used to get that data from the db, by the way:
SELECT nid, totalcount, daycount, from_unixtime(timestamp, "%Y-%m-%d") as 'date'
FROM node_counter
ORDER BY totalcount DESC
STEP THREE: Write a Shell Script that will take our filtered list of files, and move them somewhere safe (and off the public webserver).
Basically, I needed a simple BASH script that would use the list of files from step two to move them off the web server. Bear in mind, when each image file is uploaded to the server, Drupal can (and did) create about a dozen different aspect ratios and sizes, and place each one of these copies into a corresponding folder. For example, one image filename could be copied and resized into:
files/coolimage.jpg
files/large/coolimage.jpg
files/hero/coolimage.jpg
files/thumbnails/coolimage.jpg
files/*/coolimage.jpg etc etc.
So, I had to take a list of ~50K filenames, check for those filenames in a dozen different subfolders, and, if they were present in a subfolder, move each of them to an archive folder, all while preserving the folder/file tree structure and leaving behind files that are "safe" to keep on the public web server. So...I ended up writing THIS simple script and open-sourced it on GitHub in case anyone else might benefit from it.
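For reference, a minimal sketch of the approach such a script can take (the files root, archive path, and list filename here are assumptions, not taken from the original script):
#!/usr/bin/env bash
# Move every file named in filelist.txt out of the Drupal files tree into an
# archive directory, preserving the subfolder structure (files/large/, files/hero/, ...).
# filelist.txt is assumed to hold one bare filename per line, e.g. "trollfood.jpg".
set -euo pipefail
FILES_DIR="sites/default/files"    # assumed Drupal files root
ARCHIVE_DIR="/var/archive/files"   # assumed destination outside the web root
LIST="filelist.txt"
while IFS= read -r name; do
  [ -z "$name" ] && continue
  # Find every style copy of this filename and move it, keeping its relative path.
  find "$FILES_DIR" -type f -name "$name" -print0 |
  while IFS= read -r -d '' path; do
    rel="${path#"$FILES_DIR"/}"
    mkdir -p "$ARCHIVE_DIR/$(dirname "$rel")"
    mv -- "$path" "$ARCHIVE_DIR/$rel"
  done
done < "$LIST"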
That's it! Thankfully I knew some SQL and how to use Google spreadsheets...and some basic bash...and, well, how to use Google and solve problems. If Google users are able to find this helpful in the future...cheers!

How to compare two MediaWiki sites

We moved a private MediaWiki site to a new server. Some months later we discovered that one or two users had continued to update the old MediaWiki site. So we have some edits in the old server that need to be copied into the new server.
Does anyone know of a routine or process to (conveniently?) compare and identify edits in the old site?
Per the comments attached to this post, the Recent Changes page might work if that page accepted a starting date. Unfortunately, it is limited to a max of 30 days. In this case, I need to review changes for 12 months.
Identify edits done
Identify and verify edits done by your users since the fork
Using the database (assuming MySQL) and no table prefixes
Give me all the edits done since Dec 01 2018 (including that date):
SELECT rev_id, rev_page, rev_text_id, rev_comment, rev_user, rev_user_text, rev_timestamp
FROM revision
WHERE rev_timestamp > '20181201';
Note that the actual page text is stored in the text table, and the page name in the page table.
Give me all edits done since Dec 01 2018 (including that date) with page name and revision text:
SELECT rev_id, rev_page, page_namespace, page_title, rev_text_id, rev_comment, rev_user, rev_user_text, rev_timestamp, old_text
FROM revision r
LEFT JOIN page p
ON p.page_id = r.rev_page
LEFT JOIN text t
ON t.old_id = r.rev_text_id
WHERE rev_timestamp > '20181201';
Note that with tools like MySQL Workbench you can copy results as MySQL INSERT statements. Depending on what users did to the old wiki, you might only need to transfer records of three tables; however, if there were file uploads, deletions, or user-right changes involved, it gets more complicated. You can track those changes through the logging table.
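If you want those non-edit changes in the same form, a similar query against the logging table works (column names follow the same pre-actor-migration schema as the queries above; adjust credentials and database name to your setup):
mysql -u wikiuser -p wikidb -e "
  SELECT log_id, log_type, log_action, log_timestamp, log_user_text, log_namespace, log_title, log_comment
  FROM logging
  WHERE log_timestamp > '20181201';"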
Using the Web Interface
It is of course possible to show more changes than just 500 for the last 30 days. The settings that allow you to configure this are $wgRCLinkLimits and $wgRCLinkDays. You can also just open the recent changes page, click "30 days" and change the URL parameters so the URL becomes path/to/index.php?title=Special:RecentChanges&days=90&limit=1500 (a limit of 1500 within the last 90 days).
How long recent-changes history is retained depends on $wgRCMaxAge. It is currently 90 days, but you might be in luck if the purge job hasn't deleted the older entries yet.
Logs can be viewed without that limitation. Visit Special:Log in your wiki.
Using the API
list=allrevisions lists all page revisions (i.e. changes).
It allows specifying start timestamps (arvstart) and continuation.
Example: https://commons.wikimedia.org/w/api.php?action=query&list=allrevisions&arvlimit=1000
To see deletions, user right changes, uploads, ... use list=logevents.
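For example, a call that walks allrevisions forward from the fork date (the hostname is a placeholder; use the arvcontinue value from each response to page through the rest):
curl "https://old-wiki.example.org/w/api.php?action=query&list=allrevisions&arvdir=newer&arvstart=2018-12-01T00:00:00Z&arvlimit=500&arvprop=ids|timestamp|user|comment&format=json"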
Fix the issue
Either use database scripts (don't forget to back up first), or use Special:Export on the source wiki and Special:Import on the wiki in need of an update.
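For the export/import route, one possible workflow (hedged; paths assume a standard MediaWiki install and a hypothetical dump filename) is to build the XML with Special:Export on the old wiki, copy it over, and load it from the command line on the new one:
# forked-pages.xml is the file produced by Special:Export on the old wiki.
php maintenance/importDump.php --conf LocalSettings.php forked-pages.xml
# importDump.php does not update recent changes, so rebuild them afterwards.
php maintenance/rebuildrecentchanges.php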
Avoid the issue
For a future migration to a new server, $wgReadOnly might be your friend, avoiding this issue in the first place by making the old wiki read-only.
There is also Extension:Sync, though I am not sure what it is capable of.

How do I delete ContentTypes in Bolt.cm?

I've just installed Bolt CMS and it offered to generate some sample content, which I did. So now there are 3 contenttypes: pages, entries, and showcases.
I want to keep the pages and entries but don't want the showcases. How do I remove them? I can remove the section from the contenttypes.yml file and they disappear from the front-end, but it still leaves the bolt_showcases table in the database, and also 'showcases' rows in other tables like bolt_taxonomy.
Do I need to delete all this manually? How do I make sure I've removed all traces of an old contenttype?
Contenttype records are limited to a single table. The only thing that escapes that boundary is taxonomy relationships.
To achieve what you want, delete the Showcase records in the UI; that will remove the relationships, if any. Then you can safely drop the bolt_showcases table.
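A rough sketch of the database side, assuming default table names and that the taxonomy table has a contenttype column (these are assumptions; verify against your schema before running anything):
mysql -u bolt_user -p bolt_db <<'SQL'
-- Once the Showcase records are gone from the backend, the table can be dropped.
DROP TABLE IF EXISTS bolt_showcases;
-- Clean up any leftover taxonomy rows still pointing at the removed contenttype.
DELETE FROM bolt_taxonomy WHERE contenttype = 'showcases';
SQL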

Batch add tags to posts via phpMyAdmin - WordPress

How would I go about batch-assigning tags to posts via phpMyAdmin? I have a custom table in my database that contains the post ID and one column with a comma-separated list of keywords for each post/record. I want to use the keyword column as the values for my tags for each post.
Is there any way for me to get those tags over to the wp_term_relationships table using an sql query?
Right now I already have each post assigned to one category (and some posts assigned to two categories)...if that makes any difference....I am dealing with almost 200,000 posts.
Thanks for the help!!
I thought I could do this with SQL queries, but that is just over my head, so I did some extensive searching for plugins (free plugins, that is...).
I came up with this, which is working...
I installed the 'WP Post Corrector' plugin; it is old and not updated anymore, but it is working with my 3.5.1 WordPress. I have a massive CSV file with 2 columns (ID, which is the same as my WP post ID, and post_tag, which is a comma-separated list of tags). I split the file up into smaller chunks so PHP or the server wouldn't crap out (http://sourceforge.net/projects/splitcsv/) - I made each file have 5000 records.
Yes, it took me about an hour to upload about 40 files, but now it is done.
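If you'd rather not use a separate splitter utility, a rough equivalent with standard command-line tools (filenames here are placeholders) is:
# Keep the header row, cut the rest into 5000-line pieces, and re-attach the
# header to each piece so every chunk imports cleanly on its own.
head -n 1 tags.csv > header.csv
tail -n +2 tags.csv | split -l 5000 - chunk_
for f in chunk_*; do
  cat header.csv "$f" > "import_${f}.csv"
  rm "$f"
done
rm header.csv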

MySQL - Coming up with a unique key for each record, not the primary key

Ok this is a tricky one to explain.
I am creating an app that will have PAGES; currently I'm using PageID as the key to SELECT the record.
The issue I'm having now is that I want users to be able to EDIT pages, but not lose the previous page (for history and record-keeping reasons, like a changelog or wiki page history).
This is making me think I need a new field in the PAGE table that acts as the pageID, but isn't the Primary Key that is auto-incremented every time a row is added.
Google Docs has a DOCID: /Doc?docid=0Af_mFtumB56WZGM4d3Y3d2JfMTNjcDlkemRjeg
That way I can have multiple records with the same Doc ID, and show a history changelog based on the dateAdded field. And when a user wants to view that DOCID, I simply pull the most recent one.
Thoughts? I appreciate your smart thinking to point me in the right direction!
You're on the right track. What you need is a history or revision id, and a document id. The history id would be the primary key, but you would also have a key on the document id for query purposes.
With history tracking, you add a bit more complexity to your application. You have to be careful that the main view of the document shows the current revision (i.e. the largest history id for a given document id).
As well, if you are storing large documents, every edit is essentially going to add another copy of the document to your database, and the table will quickly grow very large. You might want to consider implementing some kind of "diff" storage, where you store only the changes to the document and not the full thing, or keeping history edits in a separate table for history-searching only.
UUID() generates a 128-bit identifier, like
'6ccd780c-baba-1026-9564-0040f4311e29'
This value will not be repeated in a few million years.
Note that most digits are based on the timestamp and machine information, so many of the digits will look similar across repeated calls, but the value will always be unique.
Keep an audit table with the history of the changes. This will allow you to go back if you need to roll back changes, or to view the change history, for example.
You might model it like this:
An app has multiple pages, a page has multiple versions (each with some version info (e.g., date, edit count), and a foreign key to its page)
Viewing a page shows the most recent version
Saving an edit creates a new version
each document is really a revision:
doc - (doc_id)
revision - (rev_id, doc_id, version_num, name, description, content, author_id, active tinyint default 1)
then you can open any content with just the rev_id: /view?id=21981
SELECT * FROM revision r JOIN doc d ON r.doc_id = d.doc_id WHERE r.rev_id = ?;
This sounds like a good job for two tables to me. You might have one page_header table and one page_content table. The header table would hold static info like title and categorization (whatever), and the content table would hold the actual editable content. Each time the user updates the page, insert a new page_content record instead of updating the existing one. When you display the page, just make sure you grab the latest page_content record. This is a simple way to keep a history and roll back if needed.
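A minimal sketch of that layout, assuming MySQL (table and column names here are illustrative, not from the post above):
mysql -u app_user -p app_db <<'SQL'
CREATE TABLE page_header (
  page_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  title   VARCHAR(255) NOT NULL
);
CREATE TABLE page_content (
  content_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  page_id    INT UNSIGNED NOT NULL,
  body       MEDIUMTEXT,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  KEY idx_page (page_id),
  CONSTRAINT fk_page FOREIGN KEY (page_id) REFERENCES page_header (page_id)
);
-- Display the current version of page 42: grab the latest page_content record.
SELECT h.title, c.body
FROM page_header h
JOIN page_content c ON c.page_id = h.page_id
WHERE h.page_id = 42
ORDER BY c.content_id DESC
LIMIT 1;
SQL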
Good luck!