Accessing titles of Biographies in Wikipedia - mediawiki

I have downloaded the latest Wikipedia dump and parsed it into a MySQL database. Now I have a database table that contains only title and content. My requirement is to extract all biography contents from this table, so I want a dump file that has all biography titles.
Thanks in advance

If you want to get all articles in some category and all its subcategories, you need to use the categorylinks table and walk it recursively to get articles in the subcategories.
It's not clear from your question what exactly you want. If you want articles about biographies, look at Category:Biography. If you want articles that are biographies, look at subcategories of Category:People.
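The recursive walk could be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module in place of MySQL, and it assumes the classic MediaWiki column names (categorylinks.cl_from, cl_to, cl_type and page.page_id, page_title, page_namespace). Check the schema of your own dump, since it may differ. The function name and the max_depth limit are made up for this example.

```python
import sqlite3
from collections import deque

def pages_in_category_tree(conn, root, max_depth=5):
    """Collect titles of pages in `root` and its subcategories (breadth-first)."""
    seen = {root}                 # categories already queued; guards against cycles
    queue = deque([(root, 0)])
    titles = set()
    while queue:
        category, depth = queue.popleft()
        rows = conn.execute(
            "SELECT p.page_title, cl.cl_type "
            "FROM categorylinks cl JOIN page p ON p.page_id = cl.cl_from "
            "WHERE cl.cl_to = ?",
            (category,),
        ).fetchall()
        for title, cl_type in rows:
            if cl_type == "page":
                titles.add(title)
            elif cl_type == "subcat" and title not in seen and depth < max_depth:
                seen.add(title)
                queue.append((title, depth + 1))
    return titles

# Tiny made-up data set, only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT, page_namespace INTEGER);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT);
INSERT INTO page VALUES
  (1, 'Ada_Lovelace', 0), (2, 'Alan_Turing', 0), (3, 'Mathematicians', 14),
  (4, 'People', 14), (5, 'Paris', 0), (6, 'Cities', 14);
INSERT INTO categorylinks VALUES
  (3, 'People', 'subcat'),          -- Mathematicians is a subcategory of People
  (1, 'Mathematicians', 'page'),
  (2, 'Mathematicians', 'page'),
  (4, 'Mathematicians', 'subcat'),  -- deliberate cycle back to People
  (5, 'Cities', 'page');
""")
biographies = pages_in_category_tree(conn, "People")
print(sorted(biographies))
```

The seen set matters because category graphs can contain cycles, and the depth limit keeps the walk from wandering into loosely related categories.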

Related

How to find most duplicate value in a table?

The question is basically: how can we find the values that are repeated most often in a column?
Example: I am building a blog website, and I want to display trending tags.
For this I created 2 tables:
blog
tag
The blog table holds my blog records, and the tag table has three columns:
id (PRIMARY)
name
blog_id
I want to display the names that are repeated most often in the tag table.
So, can you please tell me how to do this?
For example, if Java is repeated most often in the table, I want the output to be like:
JAVA
Database
PHP
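One possible way to get such a list is to group the tag rows by name and order by the count. The sketch below is only an illustration: it uses Python's built-in sqlite3 module in place of MySQL, the sample rows are made up, and top_tags is a hypothetical helper name. The SELECT statement itself should also be valid in MySQL.

```python
import sqlite3

def top_tags(conn, limit=3):
    """Return (name, uses) pairs for the most frequently used tag names."""
    return conn.execute(
        "SELECT name, COUNT(*) AS uses "
        "FROM tag "
        "GROUP BY name "
        "ORDER BY uses DESC, name ASC "   # name breaks ties deterministically
        "LIMIT ?",
        (limit,),
    ).fetchall()

# Made-up sample data matching the three columns described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, blog_id INTEGER)")
sample = ["Java"] * 4 + ["Database"] * 3 + ["PHP"] * 2 + ["CSS"]
conn.executemany(
    "INSERT INTO tag (name, blog_id) VALUES (?, ?)",
    [(name, i) for i, name in enumerate(sample, start=1)],
)
trending = top_tags(conn)
print(trending)
```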

Finding rows in MySQL Table that do NOT have certain text

Suppose I want to find an article in a database table that includes the text "There were many bison". With phpMyAdmin, I can navigate to Search, choose a field, then choose Like %...%, and it will select the article that includes those words.
I'd like to know if there's a way to find all rows that do NOT include that string.
Let me explain my bigger goal. I'm working on articles about many animal species that are divided into sections on Classification, Distribution, Ecology, etc. Each section can be thought of as an independent article, and I was tempted to make unique tables for each of these sections. However, that would be a logistical nightmare; I'd need literally hundreds of tables.
So I just write one long article with each section beginning with something like this:
So if I have articles about 600 species in my database table, and I want to know which articles DO NOT include an Ecology section, I can simply search for all the rows that do not have that particular div, or something similar (e.g. [h2]Ecology[/h2], though with real tags, not brackets).
Is there a way to do that with phpMyAdmin, MySQL Workbench (which I downloaded and installed just today) or some other tool?
Thanks.
You could use NOT REGEXP in SQL; see http://dev.mysql.com/doc/refman/5.1/en/regexp.html.
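As a rough illustration of the negated match, the sketch below uses NOT LIKE, which is the simpler choice when the marker is a fixed string; in MySQL, NOT REGEXP can be used the same way when a pattern is needed. The example runs on Python's built-in sqlite3 module rather than MySQL, and the table layout (articles with title and body), the sample rows and the helper name articles_missing are all assumptions made for the example.

```python
import sqlite3

def articles_missing(conn, marker):
    """Return titles of articles whose body does not contain `marker`."""
    # Note: % and _ inside `marker` would act as LIKE wildcards and need escaping.
    pattern = "%" + marker + "%"
    rows = conn.execute(
        "SELECT title FROM articles WHERE body NOT LIKE ? ORDER BY title",
        (pattern,),
    )
    return [title for (title,) in rows]

# Made-up sample rows: only the bison article has an Ecology section.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [
        ("Bison", "<h2>Classification</h2>...<h2>Ecology</h2>There were many bison"),
        ("Wolf", "<h2>Classification</h2>...<h2>Distribution</h2>..."),
        ("Lynx", "<h2>Distribution</h2>..."),
    ],
)
no_ecology = articles_missing(conn, "<h2>Ecology</h2>")
print(no_ecology)
```

Rows whose body is NULL are not returned by NOT LIKE, so they would need a separate IS NULL check if they matter.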
One solution would be to create a categories table in your database and then assign each article a category. That way you could write a query to select all the articles that have the specific category you want.
An example would be:
table articles:
- article_id (primary key)
- article
- category_cat_id (foreign key that references cat_id)
table category:
- cat_id (primary key)
- cat_name
A query to select all the articles except those in, let's say, the ecology category:
SELECT * FROM articles
LEFT JOIN category
ON articles.category_cat_id = category.cat_id
WHERE cat_name != 'ecology'
To select only the articles in the ecology category instead, change the last line to:
WHERE cat_name = 'ecology'

mySQL store file details in one or many database table design and best way of creating parent and child categories

----- PHP and mySQL -----
I have two quick questions and need some advice.
On my site I will allow users to upload the following files: PDFs, videos and photos. All files uploaded by a user are shown on a profile page. All uploaded files can be searched by name, tags and file type.
What would be the best MySQL database design?
Store all files in one table: easier to display on the user's profile page and to search by type, etc.
One table per type, e.g. pdf, videos and photos: this might be better for performance, but I don't know about searching.
My second question: I allow users to create their own menus/categories with parent and child categories, for example:
->parent category
> child category
> child category
->parent category
> child category
> child category
At the moment I have two database tables: one stores all the parent categories for each user, and the second stores the child categories with a foreign key (id) to the parent category.
To get all the categories, I first get all the parent categories and iterate over them with a foreach loop.
Inside the loop I call a function to get the child categories by parent id.
I want to know whether this is the best approach, or whether it can be done in a MySQL query without looping.
Thanks guys!
For your first question, it depends on what information you want to store about the files.
If it's generic across all types, (name, date, filetype, size, etc.) then a single Files table by itself with a type column makes sense.
But if you're going to save attributes that depend on the kind of file (the frame rate of a video, the height and width of an image, the author of a PDF, for example), then you will also need some ancillary tables to store that information. You don't want a bunch of columns hanging off your file table that are each useful for only one file type.
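A minimal sketch of that layout might look like this: one files table for the generic attributes, plus one small detail table per file type. It uses Python's built-in sqlite3 module in place of MySQL, and every table name, column name and sample row here is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic attributes shared by every upload.
CREATE TABLE files (
  file_id   INTEGER PRIMARY KEY,
  user_id   INTEGER NOT NULL,
  name      TEXT NOT NULL,
  file_type TEXT NOT NULL CHECK (file_type IN ('pdf', 'video', 'photo'))
);
-- Type-specific attributes live in ancillary tables keyed by file_id.
CREATE TABLE video_details (
  file_id    INTEGER PRIMARY KEY REFERENCES files(file_id),
  frame_rate REAL
);
CREATE TABLE photo_details (
  file_id INTEGER PRIMARY KEY REFERENCES files(file_id),
  width   INTEGER,
  height  INTEGER
);
INSERT INTO files VALUES
  (1, 7, 'holiday.jpg', 'photo'),
  (2, 7, 'talk.mp4', 'video'),
  (3, 7, 'thesis.pdf', 'pdf'),
  (4, 8, 'cat.jpg', 'photo');
INSERT INTO video_details VALUES (2, 29.97);
INSERT INTO photo_details VALUES (1, 800, 600);
""")

def files_for_user(conn, user_id, file_type=None):
    """List (name, file_type) for one user's profile page, optionally filtered by type."""
    sql = "SELECT name, file_type FROM files WHERE user_id = ?"
    params = [user_id]
    if file_type is not None:
        sql += " AND file_type = ?"
        params.append(file_type)
    return conn.execute(sql + " ORDER BY name", params).fetchall()

def photos_with_size(conn, user_id):
    """Join the generic table to the photo-specific table only when sizes are needed."""
    return conn.execute(
        "SELECT f.name, d.width, d.height "
        "FROM files f JOIN photo_details d ON d.file_id = f.file_id "
        "WHERE f.user_id = ? ORDER BY f.name",
        (user_id,),
    ).fetchall()

print(files_for_user(conn, 7))
print(photos_with_size(conn, 7))
```

Profile pages and searches only touch the files table, and the detail tables are joined in only when type-specific attributes are needed.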
For your second question, the rough SQL is based on a JOIN between your parent category table and your child category table.
Example pseudocode (#UserID stands for the id of the user whose categories you want):
SELECT p.userid, p.parentcategoryid, c.childcategoryid
FROM ParentCategory p
INNER JOIN ChildCategory c
ON p.parentcategoryid = c.parentcategoryid
WHERE p.userid = #UserID
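To show how one joined query can replace the per-parent loop, here is a small sketch that runs the join once and groups the rows in application code. It uses Python's built-in sqlite3 module in place of PHP and MySQL. The name columns, the sample rows and the helper category_tree are assumptions made for the example. It also swaps the INNER JOIN for a LEFT JOIN so that parents without children still appear.

```python
import sqlite3

def category_tree(conn, user_id):
    """Build {parent_id: {"name": ..., "children": [...]}} from a single query."""
    rows = conn.execute(
        "SELECT p.parentcategoryid, p.name, c.childcategoryid, c.name "
        "FROM ParentCategory p "
        "LEFT JOIN ChildCategory c ON p.parentcategoryid = c.parentcategoryid "
        "WHERE p.userid = ? "
        "ORDER BY p.parentcategoryid, c.childcategoryid",
        (user_id,),
    )
    tree = {}
    for parent_id, parent_name, child_id, child_name in rows:
        node = tree.setdefault(parent_id, {"name": parent_name, "children": []})
        if child_id is not None:      # LEFT JOIN yields NULLs for childless parents
            node["children"].append(child_name)
    return tree

# Made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ParentCategory (parentcategoryid INTEGER PRIMARY KEY, userid INTEGER, name TEXT);
CREATE TABLE ChildCategory (childcategoryid INTEGER PRIMARY KEY, parentcategoryid INTEGER, name TEXT);
INSERT INTO ParentCategory VALUES (1, 42, 'Music'), (2, 42, 'Travel'), (3, 99, 'Other user');
INSERT INTO ChildCategory VALUES (10, 1, 'Rock'), (11, 1, 'Jazz'), (12, 3, 'Hidden');
""")
menu = category_tree(conn, 42)
print(menu)
```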

Database Organization - Website containing many articles

I want to organize articles written on my website. Currently, I have an author submit their work to me (via email) and I copy/paste their article into a .php file and upload their file with FTP. At the same time I need to update the links for the navigation menu based on the new article.
I've been reading that I can put everything into a mysql database.
Right now, I have 2 columns (a music column and a college life column); each column will have articles updated every two weeks by a different author. How do I organize my database?
What I was thinking (after doing some reading):
Table Column:
  Column_id
  Name
  Description
  Create_date
Table Column_authors:
  column_id
  author_id
Table Articles:
  Article_id
  column_id
  Title
  Description/Summary
  Body
  create_date
Table Articles_authors:
  article_id
  author_id
Table Articles_keyword:
  article_id
  keyword_id
Table authors:
  author_id
  Name
  Email
  about
Table Keyword:
  keyword_id
  name
  ?????
(I'm not sure how to organize the keywords; each article can have multiple keywords.)
I'm completely new to organizing data with a database, so I have no idea what I'm doing!
Could someone point me toward a good tutorial?
Please let me know if I need to be more specific.
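For the keyword part of the schema above, a junction table like Articles_keyword is one common way to let an article have several keywords and a keyword belong to several articles. The sketch below illustrates this with Python's built-in sqlite3 module in place of MySQL. The tables are trimmed to the columns needed, and the sample rows and helper names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Articles (article_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Keyword (keyword_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- One row per (article, keyword) pair; the composite key prevents duplicates.
CREATE TABLE Articles_keyword (
  article_id INTEGER NOT NULL REFERENCES Articles(article_id),
  keyword_id INTEGER NOT NULL REFERENCES Keyword(keyword_id),
  PRIMARY KEY (article_id, keyword_id)
);
INSERT INTO Articles VALUES (1, 'Best albums of the year'), (2, 'Surviving finals week');
INSERT INTO Keyword VALUES (1, 'music'), (2, 'reviews'), (3, 'college');
INSERT INTO Articles_keyword VALUES (1, 1), (1, 2), (2, 3);
""")

def keywords_for(conn, article_id):
    """All keyword names attached to one article."""
    rows = conn.execute(
        "SELECT k.name FROM Keyword k "
        "JOIN Articles_keyword ak ON ak.keyword_id = k.keyword_id "
        "WHERE ak.article_id = ? ORDER BY k.name",
        (article_id,),
    )
    return [name for (name,) in rows]

def articles_for(conn, keyword):
    """All article titles tagged with one keyword."""
    rows = conn.execute(
        "SELECT a.title FROM Articles a "
        "JOIN Articles_keyword ak ON ak.article_id = a.article_id "
        "JOIN Keyword k ON k.keyword_id = ak.keyword_id "
        "WHERE k.name = ? ORDER BY a.title",
        (keyword,),
    )
    return [title for (title,) in rows]

print(keywords_for(conn, 1))
print(articles_for(conn, "college"))
```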
You can do this with WordPress. WordPress is built on top of a MySQL database, but you don't really need to mess around with it too much other than setting it up initially, if that (some hosting sites have an automated WordPress install that sets up the database for you).
Once you are all set up, then you can use Posts in WordPress for your articles and the latest article is displayed first, with links to the old ones automatically generated. If you have any static content, you can use Pages in WordPress.

WordPressMU - get blog list, alphabetically sorted by blogname

In WordPress MU, I've tried writing my own query for this but can't seem to get all of the joins I really need. The result set I'm looking for would be something like:
blog_id
blog name
blog path
owner first name
owner last name
and return it all alphabetically, by blog name. The trouble I'm having is that the first and last name of the blog owner are in wp_usermeta, the blog id and path are in wp_blogs, and the blog name is in wp_[blog id here]_options, with wp_usermeta requiring the user ID from wp_users.
Is it possible to join all of this in one query?
There is not a way to combine all of the information into one result set because of the way WPMU handles the database table names.
The best solution I have come up with is some PHP logic that gets the blogs from the wp_blogs table, uses the IDs there to gather information from the wp_X_options tables, and then builds up the information I need. It's the same reason there is no good way to get a list of all the posts across all of the blogs with just a query. You need backend logic to build the query based on the blogs in wp_blogs.
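The "build the query from wp_blogs" idea could look roughly like the sketch below. It is written in Python with the built-in sqlite3 module rather than PHP and MySQL, purely as an illustration. It assumes each blog has a wp_[blog id]_options table with a blogname option, as described above, and it leaves out the owner's first and last name, which would need further lookups in wp_users and wp_usermeta. The function name and sample data are made up.

```python
import sqlite3

def list_blogs_by_name(conn):
    """Return (blog_id, blogname, path) rows sorted alphabetically by blog name."""
    blogs = conn.execute("SELECT blog_id FROM wp_blogs").fetchall()
    if not blogs:
        return []
    # Table names cannot be bound as parameters, so build one SELECT per blog.
    # int() ensures only numeric ids are ever interpolated into the SQL text.
    parts = [
        f"SELECT {int(blog_id)} AS blog_id, option_value AS blogname "
        f"FROM wp_{int(blog_id)}_options WHERE option_name = 'blogname'"
        for (blog_id,) in blogs
    ]
    sql = (
        "SELECT b.blog_id, n.blogname, b.path "
        "FROM wp_blogs b JOIN (" + " UNION ALL ".join(parts) + ") n "
        "ON n.blog_id = b.blog_id "
        "ORDER BY n.blogname COLLATE NOCASE"   # SQLite-specific case-insensitive sort
    )
    return conn.execute(sql).fetchall()

# Made-up sample data: three blogs, each with its own options table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_blogs (blog_id INTEGER PRIMARY KEY, path TEXT)")
for blog_id, path, name in [(1, "/", "Main Site"), (2, "/zebra/", "Zebra Notes"), (3, "/apple/", "apple Diaries")]:
    conn.execute("INSERT INTO wp_blogs VALUES (?, ?)", (blog_id, path))
    conn.execute(f"CREATE TABLE wp_{blog_id}_options (option_name TEXT, option_value TEXT)")
    conn.execute(f"INSERT INTO wp_{blog_id}_options VALUES ('blogname', ?)", (name,))

blog_list = list_blogs_by_name(conn)
print(blog_list)
```

With many blogs the generated UNION grows large, so running one small query per blog and sorting the results in application code may be the more practical variant.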