Converting sub-query to join to create a view - mysql

I have a nested SQL query :
SELECT *
FROM
(
SELECT *
FROM asset_status
ORDER BY session_id DESC
) tmp
GROUP BY asset_id, workflow_element_id
I would like to create a view from this query but MySQL doesn't seem to allow subqueries in views. How can convert this to a join?

SQL Server does allow sub-queries in views. What you can't do, is SELECT * and GROUP BY a, b
Have you tried... (I'll assume this isn't your whole query so I'll make the minimum possible changes)
SELECT asset_id, workflow_element_id
FROM
(
SELECT *
FROM asset_status
-- ORDER BY session_id DESC (Removed as innefective in a view)
) tmp
GROUP BY asset_id, workflow_element_id
Also, note that the ORDER BY in the inner query is innefective (and possibly even dis-allowed), as the outer query is then allowed to re-order it (it won't always come back in a different order, but this layout doesn't guarnatee the order you seem to want). Even in the outer query, it may cause your results to be order when using the view, but again the optimiser is allowed to re-order the results. Unless the ORDER BY is in the query using the view, the order is never absolutely guaranteed...
SELECT * FROM view ORDER BY x
Finally, you tagged this as a LEFT JOIN question. If you have a more complete example of the code, I'm sure someone will suggest an alternative layout. But I'm off out for a few days now. Good luck! :)

There is no need for subquery since the inner order by is not guaranteed to be used at all. You can write:
SELECT DISTINCT asset_id, workflow_element_id
FROM asset_status
If you need to order by session_id you would have to include it in an aggregate, max for example. (or in the group by)
SELECT asset_id, workflow_element_id
FROM asset_status
GROUP BY asset_id, workflow_element_id
ORDER BY MAX(session_id) DESC

According to the MySQL reference manual, you can create views that use sub-queries, but not in the From clause.
Therefore, I think you need to create your view like the following:
select a.*
from asset_status a
join (select asset_id, workflow_element_id, MAX(session_id) session_id
from asset_status
group by asset_id, workflow_element_id) sq
on a.session_id = sq.session_id
However, it probably won't perform as well as your original query.

Related

The last value using GROUP BY

I need to take the last value from table where can_id equal.
So I've tried this SQL query
SELECT com.text, com.can_id
FROM (SELECT * FROM comments ORDER BY id DESC) as com
GROUP BY com.can_id
But if I change ASC / DESC in the first select, the second select will just group without sorting and take the value with the first id
This select will be used like left join in the query.
Example:
I need to get com.text with value "text2" (lasts)
If you are on MySql 8, you can use row_number:
SELECT com.text, com.can_id
FROM (SELECT comments.*,
row_number() over (partition by can_id order by id desc) rn
FROM comments) as com
WHERE rn = 1;
If you are on MySql 5.6+, you can (ab)use group_concat:
SELECT SUBSTRING_INDEX(group_concat(text order by id desc), ',', 1),
can_id
FROM comments
GROUP BY can_id;
In any version of MySQL, the following will work:
SELECT c.*
FROM comments c
WHERE c.id = (SELECT MAX(c2.id)
FROM comments c2
WHERE c2.can_id = c.can_id
);
With an index on comments(can_id, id), this should also have the best performance.
This is better than a group by approach because it can make use of an index and is not limited by some internal limitation on intermediate string lengths.
This should have better performance than row_number() because it does not assign a row number to each row, only then to filter things out.
The order by clause in the inner select is redundant since it's being used as a table, and tables in a relational database are unordered by nature.
While other databases such as SQL Server will treat is as an error, I guess MySql simply ignores it.
I think you are looking for something like this:
SELECT text, can_id
FROM comments
ORDER BY id DESC
LIMIT 1
This way you get the text and can_id associated with the highest id value.

Two select statements on same table and get Count(*)

Im trying to do two queries on the same table to get the Count(*) value.
I have this
SELECT `a`.`name`, `a`.`points` FROM `rank` AS a WHERE `id` = 1
And in the same query I want to do this
SELECT `b`.`Count(*)` FROM `rank` as b WHERE `b`.`points` >= `a`.`points`
I tried searching but did not find how to do a Count(*) in the same query.
Typically you would not intermingle a non aggregate and aggregate query together in MySQL. You might do this in databases which support analytic functions, such as SQL Server, but not in (the current version of) MySQL. That being said, your second query can be handled using a correlated subquery in the select clause the first query. So you may try the following:
SELECT
a.name,
a.points,
(SELECT COUNT(*) FROM rank b WHERE b.points >= a.points) AS cnt
FROM rank a
WHERE a.id = 1;
As I understand from the question, you want to find out in a table for a given id how many rows have the points greater than this row. This can be achieved using full join.
select count(*) from rank a join rank b on(a.id != b.id) where a.id=1 and b.points >= a.points;

MySQL 5.7 How to do GROUP BY with sorting?

Similar to this issue: MySQL 5.7 group by latest record
I'm not sure how to do this properly in 5.7. Also with possibility of 2nd sort column. Working query in 5.6 that I'm trying to replicate in 5.7:
SELECT id FROM test
GROUP BY category
ORDER BY sort1 DESC, sort2 DESC
id is not always the highest, so MAX(id) does not work.
Looking into the link above, the solution for single sort should be:
SELECT t1.*
FROM test t1
INNER JOIN (
SELECT category, max(sort) AS sort FROM test GROUP BY category
) t2 ON t2.category = t1.category AND t2.sort = t1.sort
But how will it work with 2 sorting?
You are using GROUP BY the wrong way.
Think of group by as a way to separate data row into different groups. Each group has multiple rows, based on the value of group by column.
Once you get those groups, selecting table columns (as in: select *) is like picking any row from that group randomly. This is not helpful nor useful.
Usually once we group records (or rows), we need to find meta information about those records. For example: get us the count of records in that group (as in: select count(*)), or the sum of values of a specific column in that group (as in: select sum(price)), or get the min, max or avg values.
So in a nutshell, when you use group by you should use on of the aggregation functions with it, otherwise it's not going to do you any good.
Why don't you have the ORDER BY at your outer query, instead?
SELECT *
FROM (
SELECT 100 AS id, 1 AS category, NULL AS sort
UNION
SELECT 200 AS id, 1 AS category, 2 AS sort
) dt
GROUP BY category
ORDER BY sort DESC;
It seems that what happened to the data when it was grouped, it took the first data while neglecting the ORDER BY DESC. On your first query, it ordered descending first then group by took the first record which is 200. And yes, this shouldn't be the way you should use GROUP BY. It is used in conjunction with aggregate functions.
when you select a column in a group by query that is not one of the columns you are grouping by, (ie, your id) you have no control over the value unless you use another aggregate function. If you want to sort, use MIN or MAX:
SELECT MAX(id), category, FROM `test2`
GROUP BY category; -- always returns 200
SELECT MIN(id), category, FROM `test2`
GROUP BY category; -- always returns 100

Do I need inner ORDER BY when there is an outer ORDER BY?

Here is my query:
( SELECT id, table_code, seen, date_time FROM events
WHERE author_id = ? AND seen IS NULL
) UNION
( SELECT id, table_code, seen, date_time FROM events
WHERE author_id = ? AND seen IS NOT NULL
LIMIT 2
) UNION
( SELECT id, table_code, seen, date_time FROM events
WHERE author_id = ?
ORDER BY (seen IS NULL) desc, date_time desc -- inner ORDER BY
LIMIT 15
)
ORDER BY (seen IS NULL) desc, date_time desc; -- outer ORDER BY
As you see there is an outer ORDER BY and also one of those subqueries has its own ORDER BY. I believe that ORDER BY in subquery is useless because final result will be sorted by that outer one. Am I right? Or that inner ORDER BY has any effect on the sorting?
Also my second question about query above: in reality I just need id and table_code. I've selected seen and date_time just for that outer ORDER BY, Can I do that better?
You need the inner order by when you have a limit in the query. So, the third subquery is choosing 15 rows based on the order by.
In general, when you have limit, you should be using order by. This is particularly true if you are learning databases. You might seem to get the right answer -- and then be very surprised when it doesn't work at some later point in time. Just because something seems to work doesn't mean that it is guaranteed to work.
The outer order by just sorts all the rows returned by the subqueries.

Optimizing database query with up to 10mil rows as result

I have a MySQL Query that i need to optimize as much as possible (should have a load time below 5s, if possible)
Query is as follow:
SELECT domain_id, COUNT(keyword_id) as total_count
FROM tableName
WHERE keyword_id IN (SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
LIMIT ...
X is an integer that comes from an input
domain_id and keyword_id are indexed
database is on localhost, so the network speed should be max
The subquery from the WHERE clause can get up to 10 mil results. Also, for MySQL seems really hard to calculate the COUNT and ORDER BY this count.
I tried to mix this query with SOLR, but no results, getting such a high number of rows at once gives hard time for both MySQL and SOLR
I'm looking for a solution to have the same results, no matter if i have to use a different technology or an improvement to this MySQL query.
Thanks!
Query logic is this:
We have a domain and we are searching for all the keywords that are being used on that domain (this is the sub query). Then we take all the domains that use at least one of the keywords found on the first query, grouped by domain, with the number of keywords used for each domain, and we have to display it ordered DESC by the number of keywords used.
I hope this make sense
You may try JOIN instead of subquery:
SELECT tableName.domain_id, COUNT(tableName.keyword_id) AS total_count
FROM tableName
INNER JOIN tableName AS rejoin
ON rejoin.keyword_id = tableName.keyword_id
WHERE rejoin.domain_id = X
GROUP BY tableName.domain_id
ORDER BY tableName.total_count DESC
LIMIT ...
I am not 100% sure but can you try this please
SELECT t1.domain_id, COUNT(t1.keyword_id) as total_count
FROM tableName AS t1 LEFT JOIN
(SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X) AS t2
ON t1.keyword_id = t2.keyword_id
WHERE t2.keyword_id IS NTO NULL
GROUP BY t1.domain_id
ORDER BY total_count DESC
LIMIT ...
The goal is to replace the WHERE IN clause with INNER JOIN and that will make it lot quicker. WHERE IN clause always make the Mysql server to struggle, but it is even more noticeable when you do it with huge amount of data. Use WHERE IN only if it make you query look easier to be read/understood, you have a small data set or it is not possible in another way (but you probably will have another way to do it anyway :) )
In terms of MySQL all you can do is to minimize Disk IO for the query using covering indexes and rewrite it a little more efficient so that the query would benefit from them.
Since keyword_id has a match in another copy of the table, COUNT(keyword_id) becomes COUNT(*).
The kind of subqueries you use is known to be the worst case for MySQL (it executes the subquery for each row), but I am not sure if it should be replaced with a JOIN here, because It might be a proper strategy for your data.
As you probably understand, the query like:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (X,Y,Z)
GROUP BY domain_id
ORDER BY total_count DESC
would have the best performance with a covering composite index (keyword_id, domain_id [,...]), so it is a must. From the other side, the query like:
SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X
will have the best performance on a covering composite index (domain_id, keyword_id [,...]). So you need both of them.
Hopefully, but I am not sure, when you have the latter index, MySQL can understand that you do not need to select all those keyword_id in the subquery, but you just need to check if there is an entry in the index, and I am sure that it is better expressed if you do not use DISTINCT.
So, I would try to add those two indexes and rewrite the query as:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (SELECT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
Another option is to rewrite the query as follows:
SELECT domain_id, COUNT(*) as total_count
FROM (
SELECT DISTINCT keyword_id
FROM tableName
WHERE domain_id = X
) as kw
JOIN tableName USING (keyword_id)
GROUP BY domain_id
ORDER BY total_count DESC
Once again you need those two composite indexes.
Which one of the queries is quicker depends on the statistics in your tableName.