Complex MySQL query is giving incorrect results - mysql

Note: This follows up on a previous question, "Writing a Complex MySQL Query". Thorough testing showed that the answer given there is incorrect.
I have 3 tables.
Table Words_Learned contains all the words known by a user, and the order in which the words were learned. It has 3 columns: 1) word ID, 2) user ID and 3) the order in which the word was learned.
Table Article contains the articles. It has 3 columns: 1) article ID, 2) unique word count and 3) article contents.
Table Words contains a list of all unique words contained in each article. It has 2 columns: 1) word ID and 2) article ID.
The database diagram is shown below.
Now, using this database and using "only" MySQL, I need to do the work below.
Given a user ID, it should get a list of all words known by this user, sorted in the reverse of the order in which they were learned. In other words, the most recently learned words will be at the top of the list.
Let’s say that a query on a user ID shows that they’ve memorized the following 3 words, and we track the order in which they’ve learned the words.
Octopus - 3
Dog - 2
Spoon - 1
First we get a list of all articles containing the word Octopus, and then do the calculation using table Words on just those articles. Calculation means if that article contains more than 10 words that do not appear in the user’s vocabulary list (pulled from table words_learned), then it is excluded from the listing.
Then, we do a query for all records that contain Dog, but DO NOT contain Octopus.
Then, we do a query for all records that contain Spoon, but DO NOT contain the words Octopus or Dog.
We keep repeating this process until we’ve found 100 records that meet these criteria.
To achieve this process, I did the below (Please visit the SQLFiddle link to see the table structures, test data and my query)
http://sqlfiddle.com/#!2/48dae/1
My query runs there, but the results it generates are invalid. With a proper query, the result should be:
Level 1
Level 1
Level 1
Level 2
Level 2
Level 2
Level 3
Level 3
Here is pseudocode for better understanding.
Do while articles found < 100
{
    for each ($X as known words, in the order that those words were learned)
    {
        Select all articles that contain the word $X, where 1) the article has not been included in any previous loop, and 2) the count of "unknown" words is less than 10.
        Keep these articles in order.
    }
}

select * from (
    select a.idArticle, a.content, max(`order`) max_order
    from words_learned wl
    join words w on w.idwords = wl.idwords
    join article a on a.idArticle = w.idArticle
    where wl.userId = 4
    group by a.idArticle
) a
left join (
    select count(*) unknown_count, w2.idArticle from words w2
    left join words_learned wl2 on wl2.idwords = w2.idwords
        and wl2.userId = 4
    where wl2.idwords is null
    group by w2.idArticle
) unknown_counts on unknown_counts.idArticle = a.idArticle
where unknown_count is null or unknown_count < 10
order by max_order desc
limit 100
http://sqlfiddle.com/#!2/6944b/9
The first derived table selects unique articles a given user knows one or more words from as well as the maximum order value of those words. The maximum order value is used to sort the final results so that articles containing high order words appear first.
The second derived table counts the number of words a given user doesn't know for each article. This table is used to exclude any articles that contain 10 or more words the user doesn't know.
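To see how the two derived tables interact, here is a minimal sketch on made-up data. The table and column names follow the query above; the column types, the sample rows, and the lowered threshold of 2 unknown words (instead of 10) are assumptions chosen only to keep the example small.

```sql
-- Hypothetical minimal schema; names follow the query, types are assumed.
CREATE TABLE article (idArticle INT NOT NULL PRIMARY KEY, content TEXT);
CREATE TABLE words (idwords INT NOT NULL, idArticle INT NOT NULL,
                    PRIMARY KEY (idwords, idArticle));
CREATE TABLE words_learned (idwords INT NOT NULL, userId INT NOT NULL,
                            `order` INT NOT NULL,
                            PRIMARY KEY (idwords, userId));

-- User 4 learned word 1 (spoon), then word 2 (dog), then word 3 (octopus).
INSERT INTO words_learned VALUES (1, 4, 1), (2, 4, 2), (3, 4, 3);

INSERT INTO article VALUES (1, 'A'), (2, 'B'), (3, 'C');
-- Words 10 and 11 are unknown to user 4.
INSERT INTO words VALUES
  (3, 1), (10, 1),          -- article 1: octopus + 1 unknown word
  (2, 2), (1, 2),           -- article 2: dog + spoon, no unknown words
  (1, 3), (10, 3), (11, 3); -- article 3: spoon + 2 unknown words

SELECT * FROM (
    SELECT a.idArticle, a.content, MAX(`order`) max_order
    FROM words_learned wl
    JOIN words w ON w.idwords = wl.idwords
    JOIN article a ON a.idArticle = w.idArticle
    WHERE wl.userId = 4
    GROUP BY a.idArticle
) a
LEFT JOIN (
    SELECT COUNT(*) unknown_count, w2.idArticle
    FROM words w2
    LEFT JOIN words_learned wl2 ON wl2.idwords = w2.idwords
        AND wl2.userId = 4
    WHERE wl2.idwords IS NULL
    GROUP BY w2.idArticle
) unknown_counts ON unknown_counts.idArticle = a.idArticle
WHERE unknown_count IS NULL OR unknown_count < 2  -- demo threshold, 10 in the answer
ORDER BY max_order DESC
LIMIT 100;
-- Article 1 (max_order 3) comes first, then article 2 (max_order 2);
-- article 3 is dropped because it has 2 unknown words.
```

Article 2 has no row in unknown_counts at all, which is why the outer WHERE also has to accept a NULL unknown_count.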

Related

Search a many to many relationship with a wild card, performance issues

I am building a database for an app and I am testing performance issues on a larger data set. I generated about 250,000 location records. Each location can be assigned to many categories and a category can be assigned to many locations. My data-set has 2-4 categories assigned to each location.
I want to allow the user to search for locations by filtering which categories should be allowed using a wild card search. So maybe I want to match all categories with the word "red" in it. So if I type red, now it shows all locations which have a category title that has "red" in it. In addition, I would like to wildcard search the location title with that same string.
I wrote up a query which works but performance is awful in large data-sets. Essentially I am using inner queries which is fine if my limit is set and I find results quick (around .05ms). If I don't find any results right away, it looks like it goes through the whole database and the query takes around 9-10 seconds.
Here is a simplified layout of my database:
locations: id | title | address
categories: id | title
locations_categories: id | location_id | category_id
Here is the query I currently am using:
SELECT `id`,`title`,`address`
FROM (`locations`)
WHERE title LIKE '%string%'
AND id IN (
    SELECT location_id
    FROM locations_categories
    JOIN categories ON categories.id = locations_categories.category_id
    WHERE categories.title LIKE '%string%')
First of all, your main query just uses the value of the subquery, so it can be rewritten:
SELECT location_id
FROM locations_categories
JOIN categories ON categories.id = locations_categories.category_id
WHERE categories.title LIKE '%string%'
But I'd propose to split this query in two, since JOINs can be slow on big datasets. The first one gets the necessary category IDs (with paging):
SELECT id
FROM categories
WHERE title LIKE '%string%' LIMIT <start>, <step>
Then you can get locations_categories:
SELECT location_id FROM locations_categories WHERE category_id IN (...)
And you'll use the location IDs you've got to retrieve corresponding records:
SELECT * FROM locations WHERE id IN (...)
These 3 queries combined will be much faster than your original one.
Also, make sure your title column is indexed; it can be the bottleneck. But since you have a wildcard at the start of the search term, a regular index cannot be used for the LIKE, so you'll have to use a FULLTEXT index here.
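A FULLTEXT lookup might look like the sketch below. The table definition, index name ft_title, and sample rows are assumptions for illustration; FULLTEXT indexes on InnoDB tables need MySQL 5.6 or later. Note that full-text search matches whole words (or word prefixes in boolean mode), so it is not an exact replacement for LIKE '%string%': it would not find "red" inside "bored".

```sql
-- Hypothetical minimal table; only id and title come from the question.
CREATE TABLE categories (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  FULLTEXT INDEX ft_title (title)   -- index name is an assumption
) ENGINE=InnoDB;

INSERT INTO categories (title)
VALUES ('red wine bar'), ('redwood park'), ('bakery');

-- Natural language mode: looks for the whole word "red".
SELECT id, title
FROM categories
WHERE MATCH(title) AGAINST('red' IN NATURAL LANGUAGE MODE);

-- Boolean mode: the trailing * also accepts words that start with "red".
SELECT id, title
FROM categories
WHERE MATCH(title) AGAINST('red*' IN BOOLEAN MODE);
```

Very short search terms can be ignored depending on the server's minimum token length setting, so it is worth testing with the terms users actually type.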
Your explain plan will confirm (or disprove) this but I suspect that your issue is that the leading % in the clauses
WHERE categories.title LIKE '%string%'
and
WHERE title LIKE '%string%'
forces full table scans. Addressing this often requires some knowledge of the domain and application in question.
The simple approach is to only search for 'starts with'. Others include full text searching, function-based indexes, or having a 'grouping table' that presorts and lists the relevant records for known searches.
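The 'starts with' approach could be sketched as follows. The column types and the index name idx_title are assumptions; whether the optimizer actually picks the index depends on the data, so EXPLAIN is the way to check.

```sql
-- Hypothetical minimal table; column names come from the question.
CREATE TABLE locations (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  address VARCHAR(255),
  INDEX idx_title (title)   -- ordinary B-tree index, name is an assumption
);

-- No leading wildcard, so the index on title is usable as a range scan.
EXPLAIN
SELECT id, title, address
FROM locations
WHERE title LIKE 'string%';

-- With a leading wildcard the same index cannot narrow the search.
EXPLAIN
SELECT id, title, address
FROM locations
WHERE title LIKE '%string%';
```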

Find first, second, third, and so forth record per person

I have a 1 to many relationship between people and notes about them. There can be 0 or more notes per person.
I need to bring all the notes together into a single field and since there are not going to be many people with notes and I plan to only bring in the first 3 notes per person I thought I could do this using at most 3 queries to gather all my information.
My problem is in getting the MySQL query together to get the first, second, etc. note per person.
I have a query that lets me know how many notes each person has and I have that in my table. I tried something like
SELECT
    f_note, f_person_id
FROM
    t_person_table,
    t_note_table
WHERE
    t_person_table.f_number_of_notes > 0
    AND t_person_table.f_person_id = t_note_table.f_person_id
GROUP BY
    t_person_table.f_person_id
LIMIT 1 OFFSET 0
I had hoped to run this up to 3 times changing the OFFSET to 1 and then 2 but all I get is just one note coming back, not one note per person.
I hope this is clear, if not read on for an example:
I have 3 people in the table. One person (A) has 0 notes, one (B) with 1 and one (C) with 2.
First I would get the first note for person B and C and insert those into my person table note field.
Then I would get the second note for person C and add that to the note field in the person table.
In the end I would have notes for persons B and C where the note field for person C would be a concatenation of their 2 notes.
Welcome to SO. The thing you're trying to do, selecting the three most recent items from a table for each person mentioned, is not easy in MySQL. But it is possible. See this question, and my answer to it:
Select number of rows for each group where two column values makes one group
Once you have a query giving you the three rows, you can use GROUP_CONCAT() ... GROUP BY to aggregate the note fields.
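On MySQL 8.0 or later, one way to sketch this is with the ROW_NUMBER() window function, which is a different technique from the one in the linked answer. The f_note_id column used to define "first, second, third" is an assumption, since the question does not say how notes are ordered; the sample rows are made up.

```sql
-- Hypothetical note table; f_note_id is an assumed ordering column.
CREATE TABLE t_note_table (
  f_note_id INT NOT NULL PRIMARY KEY,
  f_person_id INT NOT NULL,
  f_note VARCHAR(255) NOT NULL
);

-- Person 2 has one note, person 3 has four.
INSERT INTO t_note_table VALUES
  (1, 2, 'B first'),
  (2, 3, 'C first'),
  (3, 3, 'C second'),
  (4, 3, 'C third'),
  (5, 3, 'C fourth');

-- Number the notes per person, keep the first three, then concatenate them.
SELECT f_person_id,
       GROUP_CONCAT(f_note ORDER BY rn SEPARATOR '; ') AS notes
FROM (
    SELECT f_person_id, f_note,
           ROW_NUMBER() OVER (PARTITION BY f_person_id
                              ORDER BY f_note_id) AS rn
    FROM t_note_table
) ranked
WHERE rn <= 3
GROUP BY f_person_id
ORDER BY f_person_id;
-- Person 2: 'B first'
-- Person 3: 'C first; C second; C third'
```

People without notes simply do not appear in the result, so no f_number_of_notes check is needed here.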
You can get one note per person using a nested query like this:
SELECT
    f_person_id,
    (SELECT f_note
     FROM t_note_table
     WHERE t_person_table.f_person_id = t_note_table.f_person_id
     LIMIT 1) AS note
FROM
    t_person_table
WHERE
    t_person_table.f_number_of_notes > 0
Note that tables in SQL have no defined inherent order, so you should use some form of ORDER BY in the subquery. Otherwise, your results might be random, and repeated runs asking for different notes might unexpectedly return the same data.
If you only aim for a concatenation of notes in any case, then you can use the GROUP_CONCAT function to combine all notes into a single column.
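With an ORDER BY added, fetching the second note per person might look like the sketch below. The f_note_id column is an assumption (the question does not name an ordering column), and the sample rows are made up.

```sql
-- Hypothetical tables; f_note_id is an assumed ordering column.
CREATE TABLE t_person_table (
  f_person_id INT NOT NULL PRIMARY KEY,
  f_number_of_notes INT NOT NULL
);
CREATE TABLE t_note_table (
  f_note_id INT NOT NULL PRIMARY KEY,
  f_person_id INT NOT NULL,
  f_note VARCHAR(255) NOT NULL
);

INSERT INTO t_person_table VALUES (1, 0), (2, 1), (3, 2);
INSERT INTO t_note_table VALUES
  (1, 2, 'B first'), (2, 3, 'C first'), (3, 3, 'C second');

-- OFFSET 1 skips the first note; ORDER BY makes "first" well defined.
SELECT p.f_person_id,
       (SELECT n.f_note
        FROM t_note_table n
        WHERE n.f_person_id = p.f_person_id
        ORDER BY n.f_note_id
        LIMIT 1 OFFSET 1) AS second_note
FROM t_person_table p
WHERE p.f_number_of_notes > 1;
-- Only person 3 qualifies, with second_note = 'C second'.
```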

How to convert list field into many-to-many table in MySQL

I have inherited a database in which a person table has a field called authorised_areas. The front end allows the user to choose multiple entries from a pick list (populated with values from the description field of the area table) and then sets the value of authorised_areas to a comma-delimited list. I am migrating this to a MySQL database and, while I'm at it, I would like to improve the database integrity by removing the authorised_areas field from the person table and creating a many-to-many table person_area which would just hold pairs of person-area keys. There are several hundred person records, so I would like to find a way to do this efficiently using a few MySQL statements, rather than individual insert or update statements.
Just to clarify, the current structure is something like:
person
id  name  authorised_areas
1   Joe   room12, room153, 2nd floor office
2   Anna  room12, room17
area
id  description
1   room12
2   room17
3   room153
4   2nd floor office
...but what I would like is:
person
id  name
1   Joe
2   Anna
area
id  description
1   room12
2   room17
3   room153
4   2nd floor office
person_area
person_id  area_id
1          1
1          3
1          4
2          1
2          2
There is no reference to the area id in the person table (and some text values in the lists are not exactly the same as the description in the area table), so this would need to be done by text or pattern matching. Would I be better off just writing some php code to split the strings, find the matches and insert the appropriate values into the many-to-many table?
I'd be surprised if I were the first person to have to do this, but google search didn't turn up anything useful (perhaps I didn't use the appropriate search terms?) If anyone could offer some suggestions of a way to do this efficiently, I would very much appreciate it.
While it is possible to do this, I would suggest that as a one-off job it would probably be quicker to knock up a PHP (or your favorite scripting language) script to do it with multiple inserts.
If you must do it in a single statement then have a table of integers (0 to 9, cross join against itself to get as big a range as you need) and join this against your original table, using string functions to get the Xth comma and from that each of the values for each row.
Possible, and I have done it but mainly to show that having a delimited field is not a good idea. It would likely be FAR quicker to knock up a script with multiple inserts.
You could base an insert on something like this SELECT (although this also comes up with a blank line for each person as well as the relevant ones, and will only cope with up to 1000 authorised areas per person)
SELECT z.id, z.name, x.an_authorised_area
FROM person z
LEFT OUTER JOIN (
    SELECT DISTINCT a.id, SUBSTRING_INDEX( SUBSTRING_INDEX( authorised_areas, ",", b.ournumbers ) , ",", -1 ) AS an_authorised_area
    FROM person a, (
        SELECT hundreds.i *100 + tens.i *10 + units.i AS ournumbers
        FROM integers AS hundreds
        CROSS JOIN integers AS tens
        CROSS JOIN integers AS units
    )b
)x ON z.id = x.id
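Building on that SELECT, a full migration could be sketched as below, using the sample rows from the question. The column types, the two-digit number range (up to 99 areas per person), the restriction of the numbers to the actual item count (which avoids the blank row), and the exact-match join against area.description are assumptions of this sketch. List entries that do not exactly match a description are silently skipped and would still need manual handling.

```sql
-- Sample data from the question; column types are assumed.
CREATE TABLE person (id INT NOT NULL PRIMARY KEY, name VARCHAR(50),
                     authorised_areas VARCHAR(255));
CREATE TABLE area (id INT NOT NULL PRIMARY KEY, description VARCHAR(100));
INSERT INTO person VALUES
  (1, 'Joe', 'room12, room153, 2nd floor office'),
  (2, 'Anna', 'room12, room17');
INSERT INTO area VALUES
  (1, 'room12'), (2, 'room17'), (3, 'room153'), (4, '2nd floor office');

-- Helper table of digits 0 to 9, as described in the answer.
CREATE TABLE integers (i INT NOT NULL PRIMARY KEY);
INSERT INTO integers VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

CREATE TABLE person_area (
  person_id INT NOT NULL,
  area_id INT NOT NULL,
  PRIMARY KEY (person_id, area_id)
);

INSERT INTO person_area (person_id, area_id)
SELECT DISTINCT p.id, a.id
FROM person p
JOIN (
    -- Numbers 0 to 99; only 1..(item count) survive the ON clause below.
    SELECT tens.i * 10 + units.i AS n
    FROM integers AS tens
    CROSS JOIN integers AS units
) nums
  ON nums.n BETWEEN 1 AND
     1 + CHAR_LENGTH(p.authorised_areas)
       - CHAR_LENGTH(REPLACE(p.authorised_areas, ',', ''))
JOIN area a
  -- Take the n-th list item, trim the space after the comma, match by text.
  ON a.description = TRIM(SUBSTRING_INDEX(
       SUBSTRING_INDEX(p.authorised_areas, ',', nums.n), ',', -1));

SELECT person_id, area_id FROM person_area ORDER BY person_id, area_id;
-- (1,1), (1,3), (1,4), (2,1), (2,2): the person_area rows the question asks for
```

Once the pairs have been checked, the authorised_areas column can be dropped from person in a separate step.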

MySQL join count from one table with ids from another

I have two tables that make up a full text index of article content for search purposes. One of the tables is just a primary key associated with a word, whereas the other records the article it occurred in and its location in the document. A single word can conceivably appear many times in the same document with different locations, so the same word id can occur several times in the word_location table.
Here are the structures:
words:
id bigint
word tinytext
word_location:
id bigint(20)
wordid bigint(20)
location int(11)
article_id int(11)
What I need to write is a query that will find the count of occurrences for each word for any one profile. I need to preserve a zero value for wordids that don't appear at all, so I assume this needs to be a left join. However, whenever I try to add a WHERE clause to limit the results to one article, any wordids that don't appear at all are not included in the result set.
I have tried:
select words.id, COUNT(word_location.wordid) as appears from words left join word_location on words.id = word_location.wordid where article_id = %s GROUP BY words.id
But this query does not return zeros for words that don't appear at all.
How can I modify this left join?
Thanks in advance!
EDIT:
Here is an example data set and the result sets for the different queries.
Example article content:
Bob's Restaurant is one of the finest restaurants in greater
County where you can enjoy the finest Turkish Cuisine.
So the vocabulary table, after being adjusted by the application to exclude stop words, will have rows for Bob, Restaurant, finest, greater, county, enjoy, Turkish, and cuisine. (I'm using this actual article since it's the first in the set, so the ids actually appear starting from integer 1.)
The query provided by @Mark Bannister produces this result set:
wordid  word         occurrences
128     clifton      0
1       bob's        2
2       restaurant   2
3       one          1
4       finest       3
5       restaurants  2
6       greater      1
9       county       1
12      enjoy        3
13      turkish      6
14      cuisine      1
The result set is correct per se, but id 128 doesn't appear in the document at all and is the only thing in the result set with occurrence 0. The goal is to have the entire vocabulary returned with the number of occurrences from the document (this is roughly 2500 different words).
My original problematic query from before the edit above actually returned the same result set, but without ANY 0 occurrence rows at all.
You need to include your article selection in your join condition:
select words.id, COUNT(word_location.wordid) as appears
from words
left join word_location on words.id = word_location.wordid and word_location.article_id = ?
GROUP BY words.id
Including the restriction on article_id in the WHERE clause effectively turns your left join back into an inner join.
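The difference can be seen on a tiny made-up data set. The table and column names follow the structures in the question; the sample rows are assumptions. Rows that find no match in the left join get NULL in every word_location column, and a WHERE test on article_id discards exactly those rows.

```sql
-- Structures from the question, with made-up sample rows.
CREATE TABLE words (id BIGINT NOT NULL PRIMARY KEY, word TINYTEXT);
CREATE TABLE word_location (
  id BIGINT NOT NULL PRIMARY KEY,
  wordid BIGINT NOT NULL,
  location INT NOT NULL,
  article_id INT NOT NULL
);
INSERT INTO words VALUES (1, 'bob'), (2, 'restaurant'), (3, 'clifton');
-- 'clifton' occurs only in article 2.
INSERT INTO word_location VALUES
  (1, 1, 0, 1), (2, 2, 1, 1), (3, 2, 7, 1), (4, 3, 0, 2);

-- Filter in WHERE: words with no row for article 1 are removed.
SELECT w.id, w.word, COUNT(wl.wordid) AS appears
FROM words w
LEFT JOIN word_location wl ON w.id = wl.wordid
WHERE wl.article_id = 1
GROUP BY w.id
ORDER BY w.id;
-- (1, 'bob', 1), (2, 'restaurant', 2)

-- Filter in ON: every word is kept, unmatched ones count as 0.
SELECT w.id, w.word, COUNT(wl.wordid) AS appears
FROM words w
LEFT JOIN word_location wl
  ON w.id = wl.wordid AND wl.article_id = 1
GROUP BY w.id
ORDER BY w.id;
-- (1, 'bob', 1), (2, 'restaurant', 2), (3, 'clifton', 0)
```

COUNT(wl.wordid) counts only non-NULL values, which is what turns the unmatched row into a 0 instead of a 1.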
I would use a subselect instead of a join.
SELECT words.id, (SELECT count(*) FROM word_location WHERE word_location.wordid = words.id AND word_location.article_id = ?) as appears
FROM words
Bit of a guess this one, but I think COUNT() is just disregarding your nulls, not COUNTing them and arriving at 0. (NULL + NULL != 0)
Look at the IFNULL() function, you might be able to do something like:
COUNT(IFNULL(word_location.wordid, 0))
(Disclaimer - I am more used to Oracle's NVL(, ) function hence this is a little speculative!)

Selecting multiple rows based on specific categories (mysql)

I don't think this is a duplicate posting because I've looked around and this seems a bit more specific than what's already been asked (but I could be wrong).
I have 4 tables and one of them is just a lookup table
SELECT exercises.id as exid, name, sets, reps, type, movement, categories.id
FROM exercises
INNER JOIN exercisecategory ON exercises.id = exerciseid
INNER JOIN categories ON categoryid = categories.id
INNER JOIN workoutcategory ON workoutid = workoutcategory.id
WHERE (workoutcategory.id = '$workouttypeid')
AND rand_id > UNIX_TIMESTAMP()
ORDER BY rand_id ASC LIMIT 6;
exercises table contains a list of exercise names, sets, reps, and an id
categories table contains an id, musclegroup, and type of movement
workoutcategory table contains an id, and a more specific motion (ie: upper body push, or upper body pull)
exercisecategory table is the lookup table that contains (and matches the id's) for exerciseid, categoryid, and workoutid
I've also added a column to the exercises table that generates a random number upon entering the row in the database. This number is then updated only for the specified category when it is called, and then sorted and displays the ascending order of the top 6 listings. This generates a nice random entry for me. (Found that solution elsewhere here on SO).
This works fine for generating 6 random exercises from a specific top level category. But I'd like to drill down further. Here's an example...
select all rows inside categoryid 4
then still within the category 4 results, find all that have movementid 2, and then find one entry with a typeid 1, then another for typeid 2, etc
TL;DR: Basically there are a few levels of categories and I'm looking to select a few from here and a few from there, and they're all within this top level. I'm thinking this could all be executed with more than one query but I'm not sure how... in the end I'm looking to end up with one array of the randomized entries.
Sorry for the long read, it's the best explanation I've got.
Just realized I never came back to this posting...
I ended up using several mysql queries within a switch based on what is needed during the request. Worked out perfectly.