Why does a subquery with GROUP BY do a full scan twice? - mysql

I am testing two types of queries.
The first type looks like below:
explain select * from ord_order;
explain select * from (select * from ord_order) as tbl;
These two execution plans show the same behavior (one full scan).
However, the second type looks like below:
explain select * from ord_order
group by bundle_or_order_number;
explain select * from
(select * from ord_order
group by bundle_or_order_number) as tbl;
The second query does the full scan twice!
Can someone explain it? Thanks.

First, it doesn't matter because your queries are malformed. Don't use select * with group by. It just doesn't make sense, and current versions of MySQL generally reject it by default (the ONLY_FULL_GROUP_BY SQL mode is enabled by default since MySQL 5.7.5). The question is: What rows do the columns come from? You should be using aggregation functions.
Why are the two queries different? In MySQL lingo, the difference is whether or not the derived table (a subquery in the from clause) is materialized. Whether or not subqueries are materialized depends on the nature of the subquery and what MySQL decides to do in the version you are using.
You can read about optimizing derived tables in the documentation.
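As a rough illustration of the point about aggregation functions, the sketch below rewrites the grouped query so that every selected column is either the grouping column or an aggregate. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The amount column and the sample rows are hypothetical; only ord_order and bundle_or_order_number come from the question.

```python
import sqlite3

# In-memory database used purely for illustration (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ord_order ("
    " id INTEGER PRIMARY KEY,"
    " bundle_or_order_number TEXT,"
    " amount INTEGER)"  # hypothetical column, not from the question
)
conn.executemany(
    "INSERT INTO ord_order (bundle_or_order_number, amount) VALUES (?, ?)",
    [("A", 10), ("A", 5), ("B", 7)],
)

# Every selected column is either the GROUP BY column or an aggregate,
# so there is no ambiguity about which row a value comes from.
grouped = conn.execute(
    "SELECT bundle_or_order_number, COUNT(*) AS orders, SUM(amount) AS total"
    " FROM ord_order"
    " GROUP BY bundle_or_order_number"
    " ORDER BY bundle_or_order_number"
).fetchall()
print(grouped)
```

The same SELECT should also be valid in MySQL with ONLY_FULL_GROUP_BY enabled, since it selects nothing outside the grouping column and aggregates.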

Related

Problem request between MySql And MariaDB

I need your help to resolve a problem.
Before, my website used MySQL 5.5.
Now, it seems to use MariaDB 10.0.
I found no difference, but...
Take this query (I have simplified it for a better understanding):
select * from ( select * from MYTABLE ) tmpTable ORDER BY tmpTable.id DESC
This query WORKS on MySQL and MariaDB.
BUT...
select * from ( select * from MYTABLE ORDER BY MYTABLE.id DESC) tmpTable
I think that when my ORDER BY is inside the inner select, it is not taken into account.
This query DOESN'T WORK! The rows returned are correct, but the ORDER BY has no effect: they come back in ASCENDING order, not DESCENDING as I specified in the inner select.
Does someone understand why? Is it a difference between MySQL and MariaDB?
Thanks a lot!
Have a good day

In SQL, rows of a table have no pre-defined order. You need order by to sort a record set.
What happens with the second query is that the subquery creates a derived table that is then used in the outer query. The fact that you order the rows in the subquery does not make a difference: from the perspective of the outer query, rows of the derived table have no inherent ordering.
In other words there is no guarantee that the inner sort propagates to the outer scope. If you want the resultset to be consistently sorted, use order by in the outer scope.
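A minimal sketch of the recommended form, with ORDER BY in the outer scope. It runs the asker's first query through Python's built-in sqlite3 module as a stand-in for MySQL/MariaDB. The name column and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database used purely for illustration (SQLite, not MariaDB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO MYTABLE (id, name) VALUES (?, ?)",
    [(2, "b"), (1, "a"), (3, "c")],  # hypothetical sample rows
)

# The ORDER BY belongs to the outermost query, so the order of the
# final result set is guaranteed by the SQL itself.
rows = conn.execute(
    "SELECT * FROM (SELECT * FROM MYTABLE) tmpTable"
    " ORDER BY tmpTable.id DESC"
).fetchall()
ids = [row[0] for row in rows]
print(ids)
```

An ORDER BY placed only inside the derived table may or may not survive, depending on the engine and version, which is why the outer clause is the one to rely on.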

why there is performance difference when retrieving data from view vs underlying select of that view

I am querying a view with a single predicate, which returns the records in 4-7 seconds, but when I run the view's underlying query directly with the same predicate, it returns the records in less than a second. I am using MySQL.
I have checked the execution plans of both queries, and they differ significantly when the tables hold hundreds of thousands of records.
Any clue or idea why performance is better when using the query directly?
Following is my view definition
SELECT entity_info.source_entity_info_id AS event_sync_id,
entity_info.source_system_id AS source_system_id,
entity_info.target_system_id AS destination_system_id,
event_sync_info.integrationid AS integration_id,
event_sync_info.source_update_time AS last_updated,
entity_info.source_internal_id AS source_entity_internal_id,
entity_info.source_entity_project AS source_entity_project,
entity_info.target_internal_id AS destination_entity_internal_id,
entity_info.destination_entity_project AS destination_entity_project,
entity_info.source_entity_type AS source_entity_type,
entity_info.destination_entity_type AS destination_entity_type,
event_sync_info.opshub_update_time AS opshub_update_time,
event_sync_info.entity_info_id AS entity_info_id,
entity_info.global_id AS global_id,
entity_info.target_entity_info_id AS target_entity_info_id,
entity_info.source_entity_info_id AS source_entity_info_id,
(
SELECT Count(0) AS `count(*)`
FROM ohrv_failed_event_view_count failed_event_view
WHERE ((
failed_event_view.integration_id = event_sync_info.integrationid)
AND (
failed_event_view.entityinfo = entity_info.source_entity_info_id))) AS no_of_failures
FROM (ohrv_entity_info entity_info
LEFT JOIN ohmt_eai_event_sync_info event_sync_info
ON ((
entity_info.source_entity_info_id = event_sync_info.entity_info_id)))
WHERE (
entity_info.source_entity_info_id IS NOT NULL)
Query examples
select * from view where integration_id=10
The execution plan for this shows 142668 rows being processed for the subquery inside the view.
select QUERY_OF_VIEW and integration_id=10
The execution plan for this looks good: only the required rows are processed.

I think the issue is in the following query:
SELECT * FROM view WHERE integration_id = 10;
This forces MySQL to materialize an intermediate table, against which it then has to query again to apply the restriction in the WHERE clause. On the other hand, in the second version:
SELECT (QUERY_OF_VIEW with WHERE integration_id = 10)
MySQL does not have to materialize anything other than the query in the view itself. That is, in your second version MySQL just has to execute the query in the view, without any subsequent subquery.

According to the documentation, it depends on whether the MERGE algorithm can be used. If it can, MySQL uses it; if it is not applicable, a temporary table has to be generated first. You can also see this answer, which talks about optimization and about when to use a view and when not to.
If the MERGE algorithm cannot be used, a temporary table must be used instead. MERGE cannot be used if the view contains any of the following constructs:
Aggregate functions (SUM(), MIN(), MAX(), COUNT(), and so forth)
DISTINCT
GROUP BY
HAVING
LIMIT
UNION or UNION ALL
Subquery in the select list
Refers only to literal values (in this case, there is no underlying table)

EXISTS vs ALL, ANY, SOME

I'm trying to understand the difference between EXISTS and ALL in MySQL. Let me give you an example:
SELECT *
FROM table1
WHERE NOT EXISTS (
SELECT *
FROM table2
WHERE table2.val < table1.val
);
SELECT *
FROM table1
WHERE val <= ALL( SELECT val FROM table2 );
A quote from MySQL docs:
Traditionally, an EXISTS subquery starts with SELECT *, but it could
begin with SELECT 5 or SELECT column1 or anything at all. MySQL
ignores the SELECT list in such a subquery, so it makes no difference. [1]
Reading this, it seems to me that MySQL should be able to translate both queries to the same relational algebra expression. Both queries are just a simple comparison between values from two tables. However, that doesn't seem to be the case. I tried both queries and the second one performs much better than the first one.
How exactly are these queries handled by the optimizer?
Why can't the optimizer make the first query perform like the second one?
Is it always more efficient to use an ALL/ANY/SOME condition?

The queries in your question are not equivalent, so they will have different execution plans regardless of how well they're optimized. If you used NOT val > ANY(...) then it would be equivalent.
You should always use EXPLAIN to see the execution plan of a query and realize that the execution plan can change as your data changes. Testing and understanding the execution plan will help you determine which methods perform better. There is no hard and fast rule for ALL/ANY/SOME and they're often optimized down to an EXISTS.

get total for limit in mysql using same query?

I am building a pagination method. What I did was:
The first query counts all results and the second query does the normal select with LIMIT.
Is there technically any way to do what I've done, but with only one query?
What I have now:
SELECT count(*) from table
SELECT * FROM table LIMIT 0,10

No one really mentions this, but the correct way of using the SQL_CALC_FOUND_ROWS technique is like this:
Perform your query: SELECT SQL_CALC_FOUND_ROWS * FROM `table` LIMIT 0, 10
Then run this query directly afterwards: SELECT FOUND_ROWS(). The result of this query contains the full count of the previous query, i.e. as if you hadn't used the LIMIT clause. This second query is instantly fast, because the result has already been cached.

You can do it with a subquery :
select
*,
(select count(*) from mytable) as total
from mytable LIMIT 0,10
But I don't think this has any kind of advantage.
edit: Like Ilya said, the total count and the rows have a totally different meaning, there's no real point in wanting to retrieve these data in the same query. I'll stick with the two queries. I just gave this answer for showing that this is possible, not that it is a good idea.

While I've seen some bad approaches to this, when I looked into it previously there were two commonly accepted solutions:
Running your query and then running the same query with a count, as you have done in your question.
Running your query with the SQL_CALC_FOUND_ROWS keyword and then running SELECT FOUND_ROWS() to get the total.
e.g. SELECT SQL_CALC_FOUND_ROWS * FROM table LIMIT 0,10
This second approach is how phpMyAdmin does it.

You can run the first query and then the second query, and so you'll get both the count and the first results.
A query returns a set of records. The count is definitely not one of the records a "SELECT *" query can return, because there is only one count for the entire result set.
Anyway, you didn't say what programming language you run these SQL queries from and what interface you're using. Maybe this option exists in the interface.

SELECT SQL_CALC_FOUND_ROWS
your query here
Limit ...
Then, without running any other queries or closing the session, run
SELECT FOUND_ROWS();
And you will get the total count of rows

The problem with 2 queries is consistency. In the time between the 2 queries the data may change. If your logic requires the count to match exactly the rows the query would return, then your only approach would be to use SQL_CALC_FOUND_ROWS.
As of MySQL 8.0.17, SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated. I don't know of a consistent solution once this functionality is removed.
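One option that is sometimes suggested for getting the page and the total from a single statement is a window function, COUNT(*) OVER (), which is evaluated before LIMIT is applied. The sketch below is only an illustration of that idea, not a drop-in replacement for SQL_CALC_FOUND_ROWS: it runs through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later, or MySQL 8.0 or later), and the items table and its contents are hypothetical.

```python
import sqlite3

# In-memory database used purely for illustration (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO items (label) VALUES (?)",
    [(f"item {n}",) for n in range(1, 26)],  # 25 hypothetical rows
)

# COUNT(*) OVER () counts every row that passes the WHERE clause;
# LIMIT/OFFSET is applied afterwards, so each returned row carries
# the unpaginated total alongside its own columns.
page = conn.execute(
    "SELECT id, label, COUNT(*) OVER () AS total"
    " FROM items"
    " ORDER BY id"
    " LIMIT 10 OFFSET 0"
).fetchall()

total = page[0][2] if page else 0
print(len(page), total)
```

Because the count and the rows come from the same statement, they describe the same data. One caveat: a page past the end of the result returns no rows, and therefore no total either.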

sql group by versus distinct

Why would someone use a group by versus distinct when there are no aggregations done in the query?
Also, does someone know the group by versus distinct performance considerations in MySQL and SQL Server. I'm guessing that SQL Server has a better optimizer and they might be close to equivalent there, but in MySQL, I expect a significant performance advantage to distinct.
I'm interested in dba answers.
EDIT:
Bill's post is interesting, but not applicable. Let me be more specific...
select a, b, c
from table x
group by a, b,c
versus
select distinct a,b,c
from table x

GROUP BY maps groups of rows to one row, per distinct value in specific columns, which don't even necessarily have to be in the select-list.
SELECT b, c, d FROM table1 GROUP BY a;
This query is accepted by MySQL, but it is not standard SQL and other brands do not support it. MySQL trusts that you know what you're doing, selecting b, c, and d in an unambiguous way because they're functional dependencies of a. (Since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, and the query is then accepted only if MySQL can detect that dependency.)
However, Microsoft SQL Server and other brands don't allow this query, because they can't determine the functional dependencies easily. Instead, standard SQL requires you to follow the Single-Value Rule, i.e. every column in the select-list must either be named in the GROUP BY clause or else be an argument to a set function.
Whereas DISTINCT always looks at all columns in the select-list, and only those columns. It's a common misconception that DISTINCT allows you to specify the columns:
SELECT DISTINCT(a), b, c FROM table1;
Despite the parentheses making DISTINCT look like a function call, it is not. It's a query option, and a distinct value in any of the three fields of the select-list will lead to a distinct row in the query result. One of the expressions in this select-list has parentheses around it, but this won't affect the result.
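A small sketch of that behavior, using Python's built-in sqlite3 module as a stand-in (the sample rows are made up). It runs the query with and without the parentheses, and also the equivalent GROUP BY over all three columns from the question.

```python
import sqlite3

# In-memory database used purely for illustration (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (1, 1, 1)],  # a is always 1; one exact duplicate row
)

# The parentheses only wrap the expression a; DISTINCT still applies
# to the whole select-list (a, b, c).
with_parens = conn.execute(
    "SELECT DISTINCT(a), b, c FROM table1 ORDER BY a, b, c"
).fetchall()
without_parens = conn.execute(
    "SELECT DISTINCT a, b, c FROM table1 ORDER BY a, b, c"
).fetchall()

# Grouping by every selected column removes the same duplicates.
grouped = conn.execute(
    "SELECT a, b, c FROM table1 GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()
print(with_parens, without_parens, grouped)
```

If DISTINCT(a) really acted on a alone, the first query would return a single row; instead it returns two, the same as the other two queries.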

A little (VERY little) empirical data from MS SQL Server, on a couple of random tables from our DB.
For the pattern:
SELECT col1, col2 FROM table GROUP BY col1, col2
and
SELECT DISTINCT col1, col2 FROM table
When there's no covering index for the query, both ways produced the following query plan:
|--Sort(DISTINCT ORDER BY:([table].[col1] ASC, [table].[col2] ASC))
     |--Clustered Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]))
and when there was a covering index, both produced:
|--Stream Aggregate(GROUP BY:([table].[col1], [table].[col2]))
     |--Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]), ORDERED FORWARD)
so from that very small sample SQL Server certainly treats both the same.

In MySQL I've found that GROUP BY often performs better than DISTINCT.
Running "EXPLAIN SELECT DISTINCT" shows "Using where; Using temporary", meaning MySQL will create a temporary table,
whereas "EXPLAIN SELECT a, b, c FROM T1, T2 WHERE T2.A = T1.A GROUP BY a" just shows "Using where".

Both would generate the same query plan in MS SQL Server. If you have MS SQL Server, you could just enable the actual execution plan to see which one is better for your needs.
Please have a look at those posts:
http://blog.sqlauthority.com/2007/03/29/sql-server-difference-between-distinct-and-group-by-distinct-vs-group-by/
http://www.sqlmag.com/Article/ArticleID/24282/sql_server_24282.html

If you really are looking for distinct values, DISTINCT makes the source code more readable (for example, if it's part of a stored procedure). If I'm writing ad-hoc queries I'll usually start with GROUP BY, even if I have no aggregations, because I'll often end up adding them.
