Using REGEXP vs IN on a subquery mysql - mysql

I want to use the data from table 'similar' to find results from table 'releases'
Table 'Similar' has this structure
artist similar_artist
Moodymann Theo Parrish
Moodymann Jeff Mills
Moodymann Marcellus Pittman
Moodymann Rick Wilhite
My query so far is
SELECT * FROM releases
WHERE
releases.all_artists REGEXP 'Moodymann'
OR releases.label_no_country='KDJ'
OR releases.all_artists IN (SELECT similar_artist
FROM similar
WHERE artist='Moodymann')
ORDER BY date DESC
the column 'all_artists' has records like this:
Moodymann | Theo Parrish | Rick Wade
Jeff Mills | Moodymann | Rick Wilhite
So the end query that I want will essentially be this
SELECT * FROM releases
WHERE
releases.all_artists REGEXP 'Moodymann'
OR releases.label_no_country='KDJ'
OR releases.all_artists IN ('Theo Parrish','Jeff Mills','Marcellus Pittman','Rick Wilhite')
To make matches I think I need to use REGEXP instead of IN, but REGEXP with the subquery returns the error 'Subquery returns more than 1 row'. How can I use the data returned from the subquery?
Also the query is taking a long time to run (up to 20 seconds). Is there any way to speed this up? As it stands, it is not usable in my web app.
Thanks!

The only way I know to use REGEXP with a subquery is to have that subquery produce the REGEXP pattern string.
SELECT * FROM releases
WHERE
releases.all_artists REGEXP 'Moodymann'
OR releases.label_no_country='KDJ'
OR releases.all_artists REGEXP (
SELECT GROUP_CONCAT(similar_artist SEPARATOR '|')
FROM similar
WHERE artist='Moodymann'
GROUP BY similar_artist)
ORDER BY date DESC
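The above isn't tested; it is just a theory of what I might try. It's not going to be very optimal, however.
Update
I have since tested this and found that GROUP BY similar_artist should be GROUP BY artist. Grouping by similar_artist produces one row per similar artist, so the subquery still returns more than one row; grouping by artist collapses them into a single pattern string.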
SELECT * FROM releases
WHERE
releases.all_artists REGEXP 'Moodymann'
OR releases.label_no_country='KDJ'
OR releases.all_artists REGEXP (
SELECT GROUP_CONCAT(similar_artist SEPARATOR '|')
FROM similar
WHERE artist='Moodymann'
GROUP BY artist)
ORDER BY date DESC
However, as mentioned by Pheonix you would be better off refactoring your structure to have a releases_artist table. You could then do all this work via JOINs which would be much, much faster.

Try this SQL
SELECT *
FROM releases
WHERE releases.all_artists LIKE '%Moodymann%'
OR releases.label_no_country='KDJ'
ORDER BY date DESC
SQL Fiddle
MySQL 5.5.30 Schema Setup:
CREATE TABLE Table1
(`artist` varchar(9), `similar_artist` varchar(17))
;
INSERT INTO Table1
(`artist`, `similar_artist`)
VALUES
('Moodymann', 'Theo Parrish'),
('Moodymann', 'Jeff Mills'),
('Moodymann', 'Marcellus Pittman'),
('Moodymann', 'Rick Wilhite')
;
create table allt(allf varchar(50));
insert into allt values('Moodymann | Theo Parrish | Rick Wade'),
('Jeff Mills | Moodymann | Rick Wilhite'),
('Jeff Mills | asdasdadasd | Rick Wilhite');
Query 1:
SELECT *
FROM allt
WHERE allt.allf LIKE '%Moodymann%'
Results:
| ALLF                                  |
|---------------------------------------|
| Moodymann | Theo Parrish | Rick Wade  |
| Jeff Mills | Moodymann | Rick Wilhite |

You can do a join on a comma-separated list (it won't be fast, but it might be quicker than using LIKE with a leading wildcard), and you can replace your existing delimiter with a comma to allow this. You can also use a series of UNIONs to make your list of artists behave like a table that you can join on.
Further, you can use UNION instead of your other WHERE clauses, which might well help with index use (MySQL will often use only one index per table in a query, so an OR across different columns can prevent it from using an index for one of the columns it is checking).
As such you can do something like the following:-
SELECT releases.*
FROM releases
INNER JOIN (SELECT 'Theo Parrish' AS anArtist UNION SELECT 'Jeff Mills' UNION SELECT 'Marcellus Pittman' UNION SELECT 'Rick Wilhite') Sub1
ON FIND_IN_SET(Sub1.anArtist, REPLACE(releases.all_artists, " | ", ",")) > 0
UNION
SELECT releases.*
FROM releases
WHERE releases.label_no_country='KDJ'
However if changing the database design to split the pipe separated list of artists onto a different table is even a slight option then do that instead. It will be far quicker and will cope with far greater numbers of artists.
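As a rough illustration of the split-table design recommended above, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL so that it runs anywhere; the releases_artist table, the id and title columns, and the sample rows are all hypothetical and not taken from the question.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL. releases_artist, id, title and the
# sample rows are hypothetical; only `similar` mirrors the question's table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE releases (id INTEGER PRIMARY KEY, title TEXT, label_no_country TEXT);
CREATE TABLE releases_artist (release_id INTEGER, artist TEXT);
CREATE INDEX idx_releases_artist ON releases_artist (artist, release_id);
CREATE TABLE similar (artist TEXT, similar_artist TEXT);

INSERT INTO releases VALUES
  (1, 'Release one', 'KDJ'),
  (2, 'Release two', 'Label X'),
  (3, 'Release three', 'Label X'),
  (4, 'Release four', 'Label X');
INSERT INTO releases_artist VALUES
  (1, 'Moodymann'),
  (2, 'Jeff Mills'),
  (3, 'Rick Wade'),
  (4, 'Theo Parrish'), (4, 'Moodymann');
INSERT INTO similar VALUES
  ('Moodymann', 'Theo Parrish'), ('Moodymann', 'Jeff Mills'),
  ('Moodymann', 'Marcellus Pittman'), ('Moodymann', 'Rick Wilhite');
""")

def related_release_ids(conn, artist, label):
    """Ids of releases by the artist, by a similar artist, or on the label."""
    rows = conn.execute(
        """
        SELECT r.id FROM releases AS r
        JOIN releases_artist AS ra ON ra.release_id = r.id
        WHERE ra.artist = ?
        UNION
        SELECT r.id FROM releases AS r
        JOIN releases_artist AS ra ON ra.release_id = r.id
        JOIN similar AS s ON s.similar_artist = ra.artist
        WHERE s.artist = ?
        UNION
        SELECT r.id FROM releases AS r
        WHERE r.label_no_country = ?
        ORDER BY 1
        """,
        (artist, artist, label),
    ).fetchall()
    return [row[0] for row in rows]

ids = related_release_ids(conn, "Moodymann", "KDJ")
```

Each branch of the UNION filters on a single column with plain equality, so an index on that column can be used, and UNION removes releases that match more than one branch.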

Related

Union as sub query using MySQL 8

I'm wanting to optimize a query using a union as a sub query.
I'm not really sure how to construct the query, though.
I'm using MYSQL 8.0.12
Here is the original query:
---------------
| c1 | c2 |
---------------
| 18182 | 0 |
| 18015 | 0 |
---------------
2 rows in set (0.35 sec)
I'm sorry, but the question doesn't get saved if I paste the SQL query as text and format it using Ctrl+K.
Output expected
---------------
| c1 | c2 |
---------------
| 18182 | 167 |
| 18015 | 0 |
---------------
As output, I would like to have the difference in row counts between the two tables in the UNION ALL.
Since a parenthesized SELECT can be used almost anywhere an expression can go:
SELECT
ABS( (SELECT COUNT(*) FROM tbl_aaa) -
(SELECT COUNT(*) FROM tbl_bbb) ) AS diff;
Also, MySQL is happy to allow a SELECT without a FROM.
There are several ways to go for this, including UNION, but I wouldn't recommend it, as it is IMO a bit 'hacky'. Instead, I suggest you use subqueries or use CTEs.
With subqueries
SELECT
ABS(c_tbl_aaa.size - c_tbl_bbb.size) as diff
FROM (
SELECT
COUNT(*) as size
FROM tbl_aaa
) c_tbl_aaa
CROSS JOIN (
SELECT
COUNT(*) as size
FROM tbl_bbb
) c_tbl_bbb
With CTEs, also known as WITHs
WITH c_tbl_aaa AS (
SELECT
COUNT(*) as size
FROM tbl_aaa
), c_tbl_bbb AS (
SELECT
COUNT(*) as size
FROM tbl_bbb
)
SELECT
ABS(c_tbl_aaa.size - c_tbl_bbb.size) as diff
FROM c_tbl_aaa
CROSS JOIN c_tbl_bbb
In practice, the two forms are equivalent. If you need to join the results on a key rather than cross join them, you could add a constant to each SELECT as a "pseudo id" and join on that.
Since you only want to know the differences, I used the ABS function, which returns the absolute value of a number.
Let me know if you want a solution with UNIONs anyway.
Edit: As @Rick James pointed out, COUNT(*) should be used in the subqueries to count the number of rows, as COUNT(id_***) will only count the rows with non-null values in that field.
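The difference between COUNT(*) and COUNT(column) mentioned in the edit can be seen in a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the id_aaa column name and the sample rows are assumptions made up for the illustration.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL; id_aaa and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_aaa (id_aaa INTEGER);
INSERT INTO tbl_aaa VALUES (1), (2), (NULL), (NULL);
""")

# COUNT(*) counts every row; COUNT(id_aaa) skips rows where id_aaa is NULL.
count_star, count_col = conn.execute(
    "SELECT COUNT(*), COUNT(id_aaa) FROM tbl_aaa"
).fetchone()
```

With two NULL ids among four rows, the two counts disagree, which would skew a row-count difference computed with COUNT(column).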

Mysql-> Group after rand()

I have the following table in Mysql
Name Age Group
abel 7 A
joe 6 A
Rick 7 A
Diana 5 B
Billy 6 B
Pat 5 B
I want to randomize the rows, but they should still remain grouped by the Group column.
For example, I want my result to look something like this:
Name Age Group
joe 6 A
abel 7 A
Rick 7 A
Billy 6 B
Pat 5 B
Diana 5 B
What query should I use to get this result? The entire table should be randomised and then grouped by the "Group" column.
What you describe in your question as GROUPing is more correctly described as sorting. This is a particular issue when talking about SQL databases where "GROUP" means something quite different and determines the scope of aggregation operations.
Indeed "group" is a reserved word in SQL, so although mysql and some other SQL databases can work around this, it is a poor choice as an attribute name.
SELECT *
FROM yourtable
ORDER BY `group`
Using random values also has a lot of semantic confusion. A truly random number would have a different value every time it is retrieved - which would make any sorting impossible (and databases do a lot of sorting which is normally invisible to the user). As long as the implementation uses a finite time algorithm such as quicksort that shouldn't be a problem - but a bubble sort would never finish, and a merge sort could get very confused.
There are also degrees of randomness. There are different algorithms for generating random numbers. For encryption it's critical that the random numbers be evenly distributed and completely unpredictable; often these will use hardware events (sometimes even dedicated hardware), but I don't expect you would need that. But do you want the ordering to be repeatable across invocations?
SELECT *
FROM yourtable
ORDER BY `group`, RAND()
...will give different results each time.
OTOH
SELECT *
FROM yourtable
ORDER BY `group`, MD5(CONCAT(age, name, `group`))
...would give the results always sorted in the same order. While
SELECT *
FROM yourtable
ORDER BY `group`, MD5(CONCAT(CURDATE(), age, name, `group`))
...will give different results on different days.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(name VARCHAR(12) NOT NULL
,age INT NOT NULL
,my_group CHAR(1) NOT NULL
);
INSERT INTO my_table VALUES
('Abel',7,'A'),
('Joe',6,'A'),
('Rick',7,'A'),
('Diana',5,'B'),
('Billy',6,'B'),
('Pat',5,'B');
SELECT * FROM my_table ORDER BY my_group,RAND();
+-------+-----+----------+
| name | age | my_group |
+-------+-----+----------+
| Joe | 6 | A |
| Abel | 7 | A |
| Rick | 7 | A |
| Pat | 5 | B |
| Diana | 5 | B |
| Billy | 6 | B |
+-------+-----+----------+
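Do the random ordering first, then sort by the Group column. Note that Group is a reserved word and has to be quoted, and that MySQL does not guarantee that the derived table's ordering survives the outer ORDER BY.
select Name, Age, `Group`
from (
select *
FROM yourtable
order by RAND()
) t
order by `Group`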
Try this:
SELECT * FROM yourtable ORDER BY `Group`, RAND()
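To check that sorting by the group column first and a random value second keeps each group together, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, so RANDOM() replaces MySQL's RAND(); the table mirrors the my_table example from the answers.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL, so RANDOM() is used instead of RAND().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (
  name TEXT NOT NULL,
  age INTEGER NOT NULL,
  my_group TEXT NOT NULL
);
INSERT INTO my_table VALUES
  ('Abel', 7, 'A'), ('Joe', 6, 'A'), ('Rick', 7, 'A'),
  ('Diana', 5, 'B'), ('Billy', 6, 'B'), ('Pat', 5, 'B');
""")

def shuffled_within_groups(conn):
    """Rows ordered by group, with a random order inside each group."""
    return conn.execute(
        "SELECT name, age, my_group FROM my_table ORDER BY my_group, RANDOM()"
    ).fetchall()

rows = shuffled_within_groups(conn)
groups = [row[2] for row in rows]   # group column in result order
names = sorted(row[0] for row in rows)
```

The order of names inside each group changes between runs, but the group column always comes out sorted, because the random value only breaks ties within a group.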

Aggregate Text data using SQL

I have the following data:
Name | Condition
Mike | Good
Mike | Good
Steve | Good
Steve | Alright
Joe | Good
Joe | Bad
I want to write an IF statement: if Bad exists for a name, classify the name as Bad. If Bad does not exist but Alright exists, classify it as Alright. If only Good exists, classify it as Good.
So my data would turn into:
Name | Condition
Mike | Good
Steve | Alright
Joe | Bad
Is this possible in SQL?
An Access query would be easy if you first create a table which maps Condition to a rank number.
Condition rank
--------- ----
Bad 1
Alright 2
Good 3
Then a GROUP BY query would give you the minimum rank for each Name:
SELECT y.Name, Min(c1.rank) AS MinOfrank
FROM
[YourTable] AS y
INNER JOIN conditions AS c1
ON y.Condition = c1.Condition
GROUP BY y.Name;
If you want to display the Condition string for those ranks, join back to the conditions table again:
SELECT sub.Name, sub.MinOfrank, c2.Condition
FROM
(
SELECT y.Name, Min(c1.rank) AS MinOfrank
FROM
[YourTable] AS y
INNER JOIN conditions AS c1
ON y.Condition = c1.Condition
GROUP BY y.Name
) AS sub
INNER JOIN conditions AS c2
ON sub.MinOfrank = c2.rank;
Performance should be fine with indexes on those conditions fields.
Seems to me this approach could also work in those other databases (MySQL and SQL Server) tagged in the question.
You can use a case statement to rank the conditions then max() or min() to summarize the results before returning them back to the user in the same format.
Query:
SELECT [Name]
, case min(case condition when 'bad' then 0 when 'alright' then 1 else 2 end)
when 0 then 'bad' when 1 then 'alright' when 2 then 'good' end as Condition
from mytable
group by [name]
MySQL has an IF() function.
Here, have a look at it: https://dev.mysql.com/doc/refman/5.1/en/control-flow-functions.html#function_if
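The ranking idea from the answers above (map each condition to a number, take the minimum per name, map it back) can be sketched as follows. It uses Python's sqlite3 module as a stand-in for MySQL; the column is named cond here, a hypothetical rename because CONDITION is a reserved word in MySQL, and the alias worst is also made up for the example.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL; `cond` and `worst` are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (name TEXT, cond TEXT);
INSERT INTO mytable VALUES
  ('Mike', 'Good'), ('Mike', 'Good'),
  ('Steve', 'Good'), ('Steve', 'Alright'),
  ('Joe', 'Good'), ('Joe', 'Bad');
""")

# Rank each condition (Bad = 0, Alright = 1, anything else = 2), keep the
# lowest rank per name, then translate that rank back to its label.
worst_by_name = dict(conn.execute("""
    SELECT name,
           CASE MIN(CASE cond WHEN 'Bad' THEN 0 WHEN 'Alright' THEN 1 ELSE 2 END)
                WHEN 0 THEN 'Bad'
                WHEN 1 THEN 'Alright'
                ELSE 'Good'
           END AS worst
    FROM mytable
    GROUP BY name
""").fetchall())
```

The same nested CASE should carry over to MySQL with little change, although that has not been verified here.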

How to use MySQL regexp with 'and' 'or' in one statement

I am using MySQL 5.x, and am trying to come up with a SQL statement to select rows based on the following dataset:
ID | Type | Name
1 | Silver | Customer A
2 | Golden | Customer B
3 | Silver, Golden | Customer C
4 | Bronze, Silver | Customer D
I need to use regexp (Legacy system reasons) in the SQL statement, where I need to only select ID=1 and ID=4, which means I need "Silver", "Silver with Bronze" customer type, but not "Silver + Golden"
I am not very familiar with regular expressions, been trying with SQL like below:
SELECT DISTINCT `customer_type` FROM `customers` WHERE
`customer_type` regexp
"(Silver.*)(^[Golden].*)"
Where I need to have the regular expressions in one place like above, but not like below:
SELECT DISTINCT `customer_type` FROM `customers` WHERE
`customer_type` regexp
"(Silver.*)"
AND NOT
`customer_type` regexp
"(Golden.*)"
LIKE would work, but I can't use it for special reasons:
SELECT DISTINCT `customer_type` FROM `customers` WHERE
`customer_type` LIKE "%Silver%"
AND NOT
`customer_type` LIKE "%Golden%"
I couldn't get the first SQL statement to work, and I'm not sure whether it is even possible.
Just try this one:
SELECT DISTINCT `id`, `customer_type`
FROM `customers`
WHERE `customer_type` regexp "^.*Silver$"
This matches any value that ends in "Silver", including "Silver" on its own, so it selects IDs 1 and 4 but not "Silver, Golden".
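The behaviour of that pattern against the sample data can be checked with a short sketch. Python's re module stands in for MySQL's REGEXP here, on the assumption that the two agree for a pattern this simple; the extra "Golden, Silver" value is a made-up case that is not in the question's data.

```python
import re

# Sample rows from the question: (ID, Type).
rows = [
    (1, "Silver"),
    (2, "Golden"),
    (3, "Silver, Golden"),
    (4, "Bronze, Silver"),
]

# The pattern from the answer: anything (or nothing) followed by a final "Silver".
pattern = re.compile(r"^.*Silver$")

matched_ids = [row_id for row_id, type_ in rows if pattern.search(type_)]

# Hypothetical extra value: it ends in "Silver", so it matches too,
# even though it also contains "Golden".
matches_reversed_order = pattern.search("Golden, Silver") is not None
```

So the pattern works for the four rows shown only because "Golden" never appears before "Silver" in them; it tests how the value ends, not whether "Golden" is absent.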

Multiple where in (select) slow

I'm creating a procedure that searches for results in a database with about 700k rows.
It looks like this (somewhat simplified):
BEGIN
SET `page` = `page`*8;
SELECT
businessName
FROM tbl_business as bs
INNER JOIN tbl_categories as ct
ON ct.catId = bs.catId
WHERE
(
MATCH(businessName, businessShortDescription) AGAINST (`q`)
OR
businessName like CONCAT(`q`, '%')
)
AND
(
bs.catId IN (SELECT * FROM tmp_search_child_cats)
OR
ct.catPId IN (SELECT * FROM tmp_search_parent_cats)
)
ORDER BY score DESC
LIMIT `page`, 8;
END
catPId is the parent ID, which means that a category can have a parent. Terrible solution, but I'm working with an old db.
I need this to run lightning fast (in a couple of milliseconds).
The temporary tables have only one column with category ID's, like so:
|-- cat --|
| 5 |
| 234 |
| 9 |
|---------|
Any thoughts?
I know this might not be perfect, but what I ended up doing was to actually join the input into a string, and add that to the WHERE IN().
e.g. WHERE IN (3,4,5) etc.
However, if one does go for this approach, be very mindful of SQL injection.
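One way to build such an IN list without opening the door to SQL injection is to generate only the placeholders dynamically and pass the values as parameters. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (MySQL drivers for Python typically use %s rather than ? as the placeholder); the function name and the sample rows are hypothetical.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_business (businessName TEXT, catId INTEGER);
INSERT INTO tbl_business VALUES ('Alpha', 3), ('Beta', 4), ('Gamma', 9);
""")

def businesses_in_categories(conn, cat_ids):
    """Names of businesses whose catId is in cat_ids, using bound parameters."""
    ids = [int(c) for c in cat_ids]      # reject anything that is not a number
    if not ids:
        return []                         # "IN ()" is not valid SQL
    placeholders = ",".join("?" for _ in ids)
    sql = (
        "SELECT businessName FROM tbl_business "
        f"WHERE catId IN ({placeholders}) ORDER BY businessName"
    )
    # Only the placeholder count is interpolated; the values are bound.
    return [row[0] for row in conn.execute(sql, ids)]

found = businesses_in_categories(conn, [3, 4, 5])
```

Only the number of question marks depends on the input, so user-supplied values never become part of the SQL text.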