Optimize SQL-Query that is using REGEXP in a JOIN - mysql

I have the following situation:
Table Words:
| ID | WORD |
|----|--------|
| 1 | us |
| 2 | to |
| 3 | belong |
| 4 | are |
| 5 | base |
| 6 | your |
| 7 | all |
| 8 | is |
| 9 | yours |
Table Sentence:
| ID | SENTENCE |
|----|-------------------------------------------|
| 1 | <<7>> <<6>> <<5>> <<4>> <<3>> <<2>> <<1>> |
| 2 | <<7>> <<8>> <<9>> |
And I want to replace each <<(\d)>> placeholder with the corresponding word from the Words table.
So the result should be
| ID | SENTENCE |
|----|--------------------------------|
| 1 | all your base are belong to us |
| 2 | all is yours |
What I came up with is the following SQL code:
SELECT id, GROUP_CONCAT(word ORDER BY pos SEPARATOR ' ') AS sentence
FROM (
    SELECT sentence.id, words.word, LOCATE(words.id, sentence.sentence) AS pos
    FROM sentence
    LEFT JOIN words
        ON (sentence.sentence REGEXP CONCAT('<<', words.id, '>>'))
) AS TEMP
GROUP BY id
I made a sqlfiddle for this:
http://sqlfiddle.com/#!2/634b8/4
The code basically works, but I'd like to ask you pros whether there is a way to do this without a derived table or without a filesort in the execution plan.

You should make a table with one entry per word, so your sentense (sic) can be made by joining on that table. It would look something like this
SentenceId, wordId, location
2, 7, 1
2, 8, 2
2, 9, 3
The way you have it set up, you are not taking advantage of your database: you are basically putting several data points into one table field.
The location field (it is tempting to call it "order", but since ORDER is an SQL keyword, don't; you'll hate yourself) can be used to sort the words of the sentence.
(and you might want to rename sentense to sentence?)
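A minimal sketch of that layout, assuming an in-memory SQLite database as a stand-in for MySQL; the table name sentence_words and its column names are illustrative, not from the question. The words are fetched with a plain equality join ordered by location and glued together per sentence in Python.

```python
import sqlite3
from itertools import groupby

# Assumption: SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE sentence_words (
        sentence_id INTEGER NOT NULL,
        word_id     INTEGER NOT NULL REFERENCES words(id),
        location    INTEGER NOT NULL,
        PRIMARY KEY (sentence_id, location)
    );
""")
conn.executemany("INSERT INTO words VALUES (?, ?)", [
    (1, "us"), (2, "to"), (3, "belong"), (4, "are"), (5, "base"),
    (6, "your"), (7, "all"), (8, "is"), (9, "yours"),
])

# The two sentences from the question, stored as one row per word position.
sentences = {1: [7, 6, 5, 4, 3, 2, 1], 2: [7, 8, 9]}
conn.executemany("INSERT INTO sentence_words VALUES (?, ?, ?)", [
    (sid, wid, pos)
    for sid, ids in sentences.items()
    for pos, wid in enumerate(ids, start=1)
])

# Plain equality join on the word id: no REGEXP, no LOCATE.
rows = conn.execute("""
    SELECT sw.sentence_id, w.word
    FROM sentence_words AS sw
    JOIN words AS w ON w.id = sw.word_id
    ORDER BY sw.sentence_id, sw.location
""").fetchall()

# Rows arrive sorted by sentence and position, so groupby sees each sentence once.
result = {
    sid: " ".join(word for _, word in grp)
    for sid, grp in groupby(rows, key=lambda r: r[0])
}
print(result)
```

In MySQL itself the last step could stay in SQL, using GROUP_CONCAT(word ORDER BY location SEPARATOR ' ') with GROUP BY as in the question's query; the join is then on an indexable integer column.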

Related

How can I merge two strings of comma-separated numbers in MySQL?

For example, there are three rooms.
1|gold_room|1,2,3
2|silver_room|1,2,3
3|brown_room|2,4,6
4|brown_room|3
5|gold_room|4,5,6
Then, I'd like to get
gold_room|1,2,3,4,5,6
brown_room|2,3,4,6
silver_room|1,2,3
How can I achieve this?
I've tried select * from room group by name; but it only prints the first row. I also know that CONCAT() can combine two string values.
Please use the query below:
select col2, GROUP_CONCAT(col3) from data group by col2;
Test case:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ab35e8d66ffe3ac6436c17faf97ee9af
I'm not assuming that the lists on separate rows have no elements in common.
First create a table of integers.
mysql> create table n (n int primary key);
mysql> insert into n values (1),(2),(3),(4),(5),(6);
You can join this to your rooms table using the FIND_IN_SET() function. Note that this cannot be optimized. It will execute N full table scans. But it does create an interim set of rows.
mysql> select * from n inner join rooms on find_in_set(n.n, rooms.csv) order by rooms.room, n.n;
+---+----+-------------+-------+
| n | id | room | csv |
+---+----+-------------+-------+
| 2 | 3 | brown_room | 2,4,6 |
| 3 | 4 | brown_room | 3 |
| 4 | 3 | brown_room | 2,4,6 |
| 6 | 3 | brown_room | 2,4,6 |
| 1 | 1 | gold_room | 1,2,3 |
| 2 | 1 | gold_room | 1,2,3 |
| 3 | 1 | gold_room | 1,2,3 |
| 4 | 5 | gold_room | 4,5,6 |
| 5 | 5 | gold_room | 4,5,6 |
| 6 | 5 | gold_room | 4,5,6 |
| 1 | 2 | silver_room | 1,2,3 |
| 2 | 2 | silver_room | 1,2,3 |
| 3 | 2 | silver_room | 1,2,3 |
+---+----+-------------+-------+
Use GROUP BY to reduce these rows to one row per room. Use GROUP_CONCAT() to put the integers together into a comma-separated list.
mysql> select room, group_concat(distinct n.n order by n.n) as csv
from n inner join rooms on find_in_set(n.n, rooms.csv) group by rooms.room
+-------------+-------------+
| room | csv |
+-------------+-------------+
| brown_room | 2,3,4,6 |
| gold_room | 1,2,3,4,5,6 |
| silver_room | 1,2,3 |
+-------------+-------------+
I think this is a lot of work, and impossible to optimize. I don't recommend it.
The problem is that you are storing comma-separated lists of numbers, and then you want to query it as if the elements in the list are discrete values. This is a problem for SQL.
It would be much better if you did not store your numbers in a comma-separated list. Store multiple rows per room, with one number per row. You can run a wider variety of queries if you do this, and it will be more flexible.
For example, the query you asked about, which produces a comma-separated list of numbers per room, becomes simpler, and you don't need the extra n table:
select room, group_concat(n order by n) as csv from rooms group by room
See also my answer to Is storing a delimited list in a database column really that bad?
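As a rough illustration of the one-number-per-row layout, here is a sketch that assumes SQLite in place of MySQL and a hypothetical table name room_numbers. The legacy comma-separated rows from the question are split once during a migration, and the composite primary key takes care of duplicates.

```python
import sqlite3
from itertools import groupby

# Assumption: SQLite stands in for MySQL; room_numbers is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE room_numbers (
        room TEXT    NOT NULL,
        n    INTEGER NOT NULL,
        PRIMARY KEY (room, n)
    )
""")

# The rows from the question, still in comma-separated form.
legacy_rows = [
    ("gold_room", "1,2,3"), ("silver_room", "1,2,3"), ("brown_room", "2,4,6"),
    ("brown_room", "3"), ("gold_room", "4,5,6"),
]

# One-off migration: one row per (room, number). INSERT OR IGNORE is SQLite
# syntax (MySQL spells it INSERT IGNORE); the primary key rejects duplicates.
conn.executemany(
    "INSERT OR IGNORE INTO room_numbers (room, n) VALUES (?, ?)",
    [(room, int(x)) for room, csv in legacy_rows for x in csv.split(",")],
)

rows = conn.execute(
    "SELECT room, n FROM room_numbers ORDER BY room, n"
).fetchall()

# Rebuild a comma-separated list per room only for display.
merged = {
    room: ",".join(str(n) for _, n in grp)
    for room, grp in groupby(rows, key=lambda r: r[0])
}
print(merged)
```

With the data stored this way, the comma-separated string becomes a presentation detail rather than something the database has to search inside.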

How do I select records in MySQL with multiple columns matching map of values?

I have the following 3-column table:
+----+---------+------------+
| ID | First | Last |
+----+---------+------------+
| 1 | Maurice | Richard |
| 2 | Yvan | Cournoyer |
| 3 | Carey | Price |
| 4 | Guy | Lafleur |
| 5 | Steve | Shutt |
+----+---------+------------+
If I want to look for everyone in (Maurice, Guy) I can do select * from table where first in ('Maurice', 'Guy').
If I want to find just Maurice Richard, I can do select * from table where first = "Maurice" and last = "Richard".
How do I do a map, an array of multiples?
[
[Maurice, Richard]
[Guy,Lafleur]
[Yvan,Cournoyer]
]
If I have an arbitrary number of entries, I cannot construct a long complex where (first = "Maurice" and last = "Richard") or (first = "Guy" and last = "Lafleur") or .....
How do I do the moral equivalent of where (first, last) in ((Guy,Lafleur),(Maurice,Richard)) ?
You can do it just like you describe it:
SELECT *
FROM mytable
WHERE (first, last) IN (('Guy','Lafleur'),('Maurice','Richard'))
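For an arbitrary number of pairs, the IN list can be generated with one placeholder group per pair. The sketch below assumes SQLite as a stand-in for MySQL, so it needs the VALUES keyword on the right-hand side; with a MySQL driver the plain list form from the query above should work, and the placeholder style may differ (many drivers use %s instead of ?).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; mytable mirrors the table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, first TEXT, last TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, "Maurice", "Richard"), (2, "Yvan", "Cournoyer"), (3, "Carey", "Price"),
    (4, "Guy", "Lafleur"), (5, "Steve", "Shutt"),
])

pairs = [("Maurice", "Richard"), ("Guy", "Lafleur"), ("Yvan", "Cournoyer")]

# One "(?, ?)" group per pair; the values are bound, never interpolated.
# An empty pairs list would produce invalid SQL and has to be handled separately.
placeholders = ", ".join(["(?, ?)"] * len(pairs))

# SQLite requires a subquery (here VALUES ...) on the right of a row-value IN.
sql = (
    "SELECT id, first, last FROM mytable "
    f"WHERE (first, last) IN (VALUES {placeholders}) "
    "ORDER BY id"
)
params = [value for pair in pairs for value in pair]

matches = conn.execute(sql, params).fetchall()
print(matches)
```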

A better way to search for tags in mysql table

Say I have a table and one of the columns is titled tags with data that is comma separated like this.
"tag1,tag2,new york,tag4"
As you can see, some of tags will have spaces.
What's the best or most accurate way of querying the table for any tags that are equal to "new york"?
In the past I've used:
SELECT id WHERE find_in_set('new york',tags) <> 0
But find_in_set does not work when the value has a space.
I'm currently using this:
SELECT id WHERE concat(',',tags,',') LIKE concat(',%new york%,')
But I'm not sure if this is the best approach.
How would you do it?
When item A can be associated with many of item B, and item B can be associated with many of item A, this is called a many-to-many relationship.
Data with this kind of relationship should be stored in separate tables and joined together only at query time.
Example
Table 1
| product_uid | price | amount |
| 1 | 12000 | 3000 |
| 2 | 30000 | 600 |
Table 2
| tag_uid | tag_value |
| 1 | tag_01 |
| 2 | tag_02 |
| 3 | tag_03 |
| 4 | tag_04 |
Then we use a join table to relate them
Table 3
| entry_uid | product_uid | tag_uid |
| 1 | 1 | 3 |
| 2 | 1 | 4 |
| 3 | 2 | 1 |
| 4 | 2 | 2 |
| 5 | 4 | 2 |
The query will be (if you want to select product 1 and its tags):
SELECT t1.*, t2.tag_value
FROM Table1 AS t1
JOIN Table3 AS join_table ON t1.product_uid = join_table.product_uid
JOIN Table2 AS t2 ON t2.tag_uid = join_table.tag_uid
WHERE t1.product_uid = 1
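Applied to the original question, the lookup for "new york" turns into an equality match on the tag table. This is only a sketch: it assumes SQLite in place of MySQL, and the table names product, tag and product_tag are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_uid INTEGER PRIMARY KEY, price INTEGER, amount INTEGER);
    CREATE TABLE tag (tag_uid INTEGER PRIMARY KEY, tag_value TEXT NOT NULL UNIQUE);
    CREATE TABLE product_tag (
        product_uid INTEGER NOT NULL REFERENCES product(product_uid),
        tag_uid     INTEGER NOT NULL REFERENCES tag(tag_uid),
        PRIMARY KEY (product_uid, tag_uid)
    );
    INSERT INTO product VALUES (1, 12000, 3000), (2, 30000, 600);
    INSERT INTO tag VALUES (1, 'tag1'), (2, 'tag2'), (3, 'new york'), (4, 'tag4');
    INSERT INTO product_tag VALUES (1, 3), (1, 4), (2, 1), (2, 2);
""")

# The tag is compared as a whole value, so embedded spaces need no special handling.
product_ids = [
    row[0]
    for row in conn.execute("""
        SELECT p.product_uid
        FROM product AS p
        JOIN product_tag AS pt ON pt.product_uid = p.product_uid
        JOIN tag AS t ON t.tag_uid = pt.tag_uid
        WHERE t.tag_value = ?
        ORDER BY p.product_uid
    """, ("new york",))
]
print(product_ids)
```

Because tag_value is declared UNIQUE, the database keeps an index on it and the match does not have to scan and parse every tag string.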
If I needed to ignore spaces before and after the commas in tags, for example if tags had a value of:
'atlanta,boston , chicago, los angeles , new york '
and assuming spaces are the only characters I want to ignore, and the tag I'm searching for doesn't have any leading or trailing spaces, then I'd likely use a regular expression. The alternations have to be parenthesized; otherwise the bare ^ or $ branch matches every row. Something like this:
SELECT ...
FROM t
WHERE t.tags REGEXP CONCAT('(^|, *)', 'new york', ' *(,|$)')
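A pattern of this kind is easy to sanity-check outside the database. The sketch below uses Python's re module, on the assumption that it treats these particular constructs (anchors, alternation, groups, the star quantifier) the same way MySQL's REGEXP does; has_tag is a hypothetical helper name.

```python
import re

def has_tag(tags: str, tag: str) -> bool:
    """Return True if tag appears as a whole item in a comma-separated list,
    ignoring spaces around the commas."""
    # The groups keep ^ and $ tied to the item boundaries.
    pattern = r"(^|, *)" + re.escape(tag) + r" *(,|$)"
    return re.search(pattern, tags) is not None

tags = "atlanta,boston , chicago, los angeles , new york "

checks = {
    "new york": has_tag(tags, "new york"),  # last item, trailing space
    "boston": has_tag(tags, "boston"),      # space before the comma
    "atlanta": has_tag(tags, "atlanta"),    # first item
    "york": has_tag(tags, "york"),          # only part of an item
    "los": has_tag(tags, "los"),            # only part of an item
}
print(checks)

# Without the groups, the alternation splits into "^", ", *new york *," and "$",
# and the bare "^" branch matches any string at all.
ungrouped_matches_anything = re.search(r"^|, *new york *,|$", "boston") is not None
print(ungrouped_matches_anything)
```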
I recommend Bill Karwin's excellent book "SQL Antipatterns: Avoiding the Pitfalls of Database Programming"
https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557
Chapter 2 Jaywalking covers the antipattern of comma separated lists.

SQL 'COUNT' not returning what I expect, and somehow limiting results to one row

Some background: an 'image' is part of one 'photoshoot', and may be a part of zero or many 'galleries'. My tables:
'shoots' table:
+----+--------------+
| id | name |
+----+--------------+
| 1 | Test shoot |
| 2 | Another test |
| 3 | Final test |
+----+--------------+
'images' table:
+----+-------------------+------------------+
| id | original_filename | storage_location |
+----+-------------------+------------------+
| 1 | test.jpg | store/test.jpg |
| 2 | test.jpg | store/test.jpg |
| 3 | test.jpg | store/test.jpg |
+----+-------------------+------------------+
'shoot_images' table:
+----------+----------+
| shoot_id | image_id |
+----------+----------+
| 1 | 1 |
| 1 | 2 |
| 3 | 3 |
+----------+----------+
'gallery_images' table:
+------------+----------+
| gallery_id | image_id |
+------------+----------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
+------------+----------+
What I'd like to get back, so I can say 'For this photoshoot, there are X images in total, and these images are featured in Y galleries':
+----+--------------+-------------+---------------+
| id | name | image_count | gallery_count |
+----+--------------+-------------+---------------+
| 3 | Final test | 1 | 1 |
| 2 | Another test | 0 | 0 |
| 1 | Test shoot | 2 | 4 |
+----+--------------+-------------+---------------+
I'm currently trying the SQL below, which appears to work correctly but only ever returns one row. I can't work out why this is happening. Curiously, the below also returns a row even when 'shoots' is empty.
SELECT shoots.id,
shoots.name,
COUNT(DISTINCT shoot_images.image_id) AS image_count,
COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
FROM shoots
LEFT JOIN shoot_images ON shoots.id=shoot_images.shoot_id
LEFT JOIN gallery_images ON shoot_images.image_id=gallery_images.image_id
ORDER BY shoots.id DESC
Thanks for taking the time to look at this :)
You are missing the GROUP BY clause:
SELECT
shoots.id,
shoots.name,
COUNT(DISTINCT shoot_images.image_id) AS image_count,
COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
FROM shoots
LEFT JOIN shoot_images ON shoots.id=shoot_images.shoot_id
LEFT JOIN gallery_images ON shoot_images.image_id=gallery_images.image_id
GROUP BY 1, 2 -- Added this line
ORDER BY shoots.id DESC
Note: MySQL allows GROUP BY to be given either column names or select-list positions, so GROUP BY 1, 2 is equivalent to GROUP BY shoots.id, shoots.name in this case. Positional references in GROUP BY are an extension rather than part of the SQL standard. There are many who consider this "bad coding practice" and advocate always using the column names, but I find it makes the code a lot more readable and maintainable; I've been writing SQL since before many users on this site were born, and using this syntax has never caused me a problem.
FYI, the reason you were getting one row before, and not getting an error, is that MySQL, unlike any other database I know, lets you select non-aggregated columns alongside aggregate functions without a GROUP BY clause (as long as the ONLY_FULL_GROUP_BY SQL mode is disabled; it is enabled by default since MySQL 5.7.5). In such cases, instead of raising an error, MySQL treats all rows as a single group and returns one row, taking the non-aggregated columns from an arbitrary row of that group. That is also why you got a row even when 'shoots' was empty.
Although at first this may seem abhorrent to SQL purists, it can be incredibly handy!
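To see the grouped query against the sample data from the question, here is a small sketch that assumes SQLite as a stand-in for MySQL and spells out the GROUP BY columns by name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables and rows are copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shoots (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shoot_images (shoot_id INTEGER, image_id INTEGER);
    CREATE TABLE gallery_images (gallery_id INTEGER, image_id INTEGER);
    INSERT INTO shoots VALUES (1, 'Test shoot'), (2, 'Another test'), (3, 'Final test');
    INSERT INTO shoot_images VALUES (1, 1), (1, 2), (3, 3);
    INSERT INTO gallery_images VALUES (1, 1), (1, 2), (2, 3), (3, 1), (4, 1);
""")

# One output row per shoot; shoots without images survive thanks to the LEFT JOINs.
counts = conn.execute("""
    SELECT shoots.id,
           shoots.name,
           COUNT(DISTINCT shoot_images.image_id)     AS image_count,
           COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
    FROM shoots
    LEFT JOIN shoot_images   ON shoots.id = shoot_images.shoot_id
    LEFT JOIN gallery_images ON shoot_images.image_id = gallery_images.image_id
    GROUP BY shoots.id, shoots.name
    ORDER BY shoots.id DESC
""").fetchall()

for row in counts:
    print(row)
```

On this data 'Test shoot' comes out with a gallery_count of 3, not the 4 shown in the question's desired output: its two images appear in galleries 1, 3 and 4, and COUNT(DISTINCT ...) counts gallery 1 once. The figure 4 corresponds to counting (gallery, image) pairs instead.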
You should look into the MySQL GROUP BY clause.

MySQL: Sort by group and field

I have a table with the following (simplified) structure:
INT id,
INT type,
INT sort
What I need is a SELECT that sorts my data so that:
all rows of the same type are in sequence, sorted in ascending order by sort internally, and
all "blocks" of one type are sorted by their minimum sort.
Example:
If the table looks like this:
| id | type | sort |
| 1 | 1 | 3 |
| 2 | 3 | 5 |
| 3 | 3 | 1 |
| 4 | 2 | 4 |
| 5 | 1 | 2 |
| 6 | 2 | 6 |
The query should sort the result like this:
| id | type | sort |
| 3 | 3 | 1 |
| 2 | 3 | 5 |
| 5 | 1 | 2 |
| 1 | 1 | 3 |
| 4 | 2 | 4 |
| 6 | 2 | 6 |
I hope this makes it clear enough.
It seems to me that this should be a very common requirement, but I didn't find any examples close enough to adapt to my use case. I suppose I can't avoid at least one subquery, but I couldn't figure it out on my own.
Any help is appreciated, thanks in advance.
By the way: I'm going to use this query with CakePHP 2.1, so if you know of a comfortable way to do it with Cake, please let me know.
This is simpler than it initially sounds. I believe the following should do the trick:
SELECT a.id, a.type, a.sort
FROM Some_Table AS a
JOIN (SELECT type, MIN(sort) AS min
      FROM Some_Table
      GROUP BY type) AS b
  ON b.type = a.type
ORDER BY b.min, a.type, a.sort
For best (fastest) results, you're probably going to want an index on (type, sort).
You want the additional sort by a.type (rather than just (b.min, a.sort)) in case two groups have the same minimum sort value, which would otherwise interleave their rows. If the minimum values are never duplicated, you can remove it.
sort and type are reserved words on some databases and can cause you problems.
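The join-on-minimum approach can be tried against the sample rows from the question. This sketch assumes SQLite instead of MySQL, a hypothetical table name items, and the alias min_sort in place of min.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "items" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, type INTEGER, sort INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, 1, 3), (2, 3, 5), (3, 3, 1), (4, 2, 4), (5, 1, 2), (6, 2, 6),
])

# Each row is joined to the minimum sort value of its type, so whole blocks are
# ordered by that minimum and rows inside a block by their own sort value.
ordered = conn.execute("""
    SELECT a.id, a.type, a.sort
    FROM items AS a
    JOIN (SELECT type, MIN(sort) AS min_sort
          FROM items
          GROUP BY type) AS b
      ON b.type = a.type
    ORDER BY b.min_sort, a.type, a.sort
""").fetchall()

for row in ordered:
    print(row)
```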
Have you tried?
ORDER BY TYPE DESC, SORT ASC