How can I EXPLAIN several consecutive queries without executing them? - mysql

Suppose I have a pair of arbitrary SQL queries, each one depending upon the former ones, e.g.
CREATE VIEW v1 ( c3 ) AS SELECT c1 + c2 FROM t1;
SELECT sum(c3) FROM v1;
DROP VIEW v1;
(but note I am not asking about these specific queries - this is just an example; assume I get the queries from a file and do not know them in advance.)
Now, I want to get my DBMS to EXPLAIN its plan for all of my queries (or an arbitrary query in the middle, it's the same problem essentially) - but I do not want it to actually execute any of them.
Is this possible with (1) MySQL? (2) PostgreSQL? (3) MonetDB?

PostgreSQL
You may use the explain statements as follows.
BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;
Refer this, documentation answers your question.
MonetDB
Similarly to the above, except that the transaction-related statements are BEGIN TRANSACTION and ROLLBACK statements (assuming that you are in auto-commit mode to begin with).
Refer this.
MySQL
MySQL explain it self does what you need. No need to ROLLBACK.
Refer this answer.

When you execute a view, the underlying SELECT query is executed, obviously. So in PostgreSQL the actual execution plan is based on this:
-- Common use of the view
SELECT sum(c3) FROM v1;
becomes
-- Expansion of the view into plain SQL
SELECT sum(c3) FROM (SELECT c1 + c2 AS c3 FROM t1) v1;
becomes
-- Flattening by the query planner, this is what actually gets executed
SELECT sum(c1 + c2) FROM t1;
So the answer is:
EXPLAIN SELECT sum(c1 + c2) FROM t1;
This most certainly works for PostgreSQL and most likely for all other DBMSes too, but check their docs on how the query planner works.
If your view definition is very complex, just take the query on the view you want to evaluate and paste the entire view definition in brackets () just before the name of the view (which then effectively becomes an alias for a sub-query). The query planner will do the rest for you.

Related

why there is performance difference when retrieving data from view vs underlying select of that view

I am doing query on view with single predicates which gives me the record in 4-7 seconds, but when i try to retrieve the record with same predicate and directly with underlying query from that view it gives me records in less then seconds. I am using MySQL.
I have tried checking the execution plan of both the query and it gives major differences if i have hundreds of thousands of records in tables.
So any clue or idea why performance is better when using query directly?
Following is my view definition
SELECT entity_info.source_entity_info_id AS event_sync_id,
entity_info.source_system_id AS source_system_id,
entity_info.target_system_id AS destination_system_id,
event_sync_info.integrationid AS integration_id,
event_sync_info.source_update_time AS last_updated,
entity_info.source_internal_id AS source_entity_internal_id,
entity_info.source_entity_project AS source_entity_project,
entity_info.target_internal_id AS destination_entity_internal_id,
entity_info.destination_entity_project AS destination_entity_project,
entity_info.source_entity_type AS source_entity_type,
entity_info.destination_entity_type AS destination_entity_type,
event_sync_info.opshub_update_time AS opshub_update_time,
event_sync_info.entity_info_id AS entity_info_id,
entity_info.global_id AS global_id,
entity_info.target_entity_info_id AS target_entity_info_id,
entity_info.source_entity_info_id AS source_entity_info_id,
(
SELECT Count(0) AS count(*)
FROM ohrv_failed_event_view_count failed_event_view
WHERE ((
failed_event_view.integration_id = event_sync_info.integrationid)
AND (
failed_event_view.entityinfo = entity_info.source_entity_info_id))) AS no_of_failures
FROM (ohrv_entity_info entity_info
LEFT JOIN ohmt_eai_event_sync_info event_sync_info
ON ((
entity_info.source_entity_info_id = event_sync_info.entity_info_id)))
WHERE (
entity_info.source_entity_info_id IS NOT NULL)
Query examples
select * from view where integration_id=10
Execution plan of this processes 142668 rows for sub query that is there in this view
select QUERY_OF_VIEW and integration_id=10
Execution plan of this looks good and only required rows are getting processed.
I think the issue is in the following query:
SELECT * FROM view WHERE integration_id = 10;
This forces MySQL to materialize an intermediate table, against which it then has to query again to apply the restriction in the WHERE clause. On the other hand, in the second version:
SELECT (QUERY_OF_VIEW with WHERE integration_id = 10)
MySQL does not have to materialize anything other than the query in the view itself. That is, in your second version MySQL just has to execute the query in the view, without any subsequent subquery.
refereeing to this link of documentation you can see,that its depend on if the MERGE algorithm can used it will , but if its not applicable so new temp table must generated to find the relations of data, also you can see this answer that talking about optimization and when to use view and when you should not .
If the MERGE algorithm cannot be used, a temporary table must be used
instead. MERGE cannot be used if the view contains any of the
following constructs:
Aggregate functions (SUM(), MIN(), MAX(), COUNT(), and so forth)
DISTINCT
GROUP BY
HAVING
LIMIT
UNION or UNION ALL
Subquery in the select list
Refers only to literal values (in this case, there is no underlying
table)

MySql performance query vs view with 'explain' output

I'm trying to understand why a direct query takes ~0.5s to run but a view using the same query takes ~10s to run. MySql v5.6.27.
Direct Query:
select
a,b,
(select count(*) from TableA i3 where i3.b = i.a) as e,
func1(a) as f, func2(a) as g
from TableA i
where i.b = -1 and i.a > 1500;
Direct Query 'explain' Results:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,i,range,PRIMARY,PRIMARY,4,\N,3629,Using where
2,DEPENDENT SUBQUERY,i3,ALL,\N,\N,\N,\N,7259,Using where
The View's definition/query is the same without the 'where' clause...
select
a,b,
(select count(*) from TableA i3 where i3.b = i.a) as e,
func1(a) as f, func2(a) as g
from TableA i;
Query against the view:
select * from ViewA t where t.b = -1 and t.a > 1500;
Query of View 'explain' results:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,<derived2>,ALL,\N,\N,\N,\N,7259,Using where
2,DERIVED,i,ALL,\N,\N,\N,\N,7259,\N
3,DEPENDENT SUBQUERY,i3,ALL,\N,\N,\N,\N,7259,Using where
Why does the query against the view end up performing 3 full table scans whereas the direct query performs ~1.5?
The short answer is: the MySQL optimizer is not clever enough to do it.
When processing a view, MySQL can either merge the view or create a temporary table for it:
For MERGE, the text of a statement that refers to the view and the view definition are merged such that parts of the view definition replace corresponding parts of the statement.
For TEMPTABLE, the results from the view are retrieved into a temporary table, which then is used to execute the statement.
This applies in a very similar way to derived tables and subqueries too.
The behaviour you are seeking is the merge. This is the default value, and MySQL will try to use it whenever possible. If it is not possible (or rather: if MySQL thinks it is not possible), MySQL has to evaluate the complete view, no matter if you only need one row out of it. This obviously takes more time, and is what happens in your view.
There is a list of things that prevent MySQL from using the merge algorithm:
MERGE cannot be used if the view contains any of the following constructs:
Aggregate functions (SUM(), MIN(), MAX(), COUNT(), and so forth)
DISTINCT
GROUP BY
HAVING
LIMIT
UNION or UNION ALL
Subquery in the select list
Assignment to user variables
Refers only to literal values (in this case, there is no underlying table)
You can test this if MySQL will merge or not: try to create the view specifying the merge-algorithm:
create algorithm=merge view viewA as ...
If MySQL doesn't think it can merge the view, you get the warning
1 warning(s): 1354 View merge algorithm can't be used here for now (assumed undefined algorithm)
In your case, the Subquery in the select list is preventing the merge. This is not because it would be impossible to do. You have already prooven that it is possible to merge it: by just rewriting it.
But the MySQL optimizer didn't see that possibility. It is not specific to views: it will actually not merge it either if you use the unmerged viewcode directly: explain select * from (select a, b, ... from TableA i) as ViewA where .... You would have to test this on MySQL 5.7, as MySQL 5.6 will not merge in this situation on principle (as, in a query, it assumes you want to have a temptable here, even for very simple derived tables that could be merged). MySQL 5.7 will by default try to do it, although it won't work with your view.
As the optimizer gets improved, in some situation, the optimizer will merge even in cases where there is an subquery in the select list, so there are some exceptions to that list. MariaDB, which is based on MySQL, is actually a lot better doing the merge optimization, and would merge your view just like you did it - so it is possible to do it even as a machine.
So to summarize: the MySQL optimizer is currently not clever enough to do it.
And you unfortunately cannot do much about it, except testing if MySQL accepts algorithm=merge and then not using views that MySQL cannot merge and instead merge them yourself.

Sql syntax: select without from clause as subquery in select (subselect)

While editing some queries to add alternatives for columns without values, I accidentally wrote something like this (here is the simplyfied version):
SELECT id, (SELECT name) FROM t
To my surprise, MySQL didn't throw any error, but completed the query giving my expected results (the name column values).
I tried to find any documentation about it, but with no success.
Is this SQL standard or a MySQL specialty?
Can I be sure that the result of this syntax is really the column value from the same (outer) table? The extended version would be like this:
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
but the EXPLAIN reports No tables used in the Extra column for the former version, which I think is very nice.
Here's a simple fiddle on SqlFiddle (it keeps timing out for me, I hope you have better luck).
Clarification: I know about subqueries, but I always wrote subqueries (correlated or not) that implied a table to select from, hence causing an additional step in the execution plan; my question is about this syntax and the result it gives, that in MySQL seems to return the expected value without any.
What you within your first query is a correlated subquery which simply returns the name column from the table t. no actual subquery needs to run here (which is what your EXPLAIN is telling you).
In a SQL database query, a correlated subquery (also known as a
synchronized subquery) is a subquery (a query nested inside another
query) that uses values from the outer query.
https://en.wikipedia.org/wiki/Correlated_subquery
SELECT id, (SELECT name) FROM t
is the same as
SELECT id, (SELECT t.name) FROM t
Your 2nd query
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
Also contains correlated subquery but this one is actually running a query on table t to find records where t1.id = t2.id.
This is the default behavior for the SQL language and it is defined on the SQL ANSI 2011 over ISO/IEC 9075-1:2011(en) documentation. Unfortunately it is not open. This behavior is described on the section 4.11 SQL-Statements.
This behavior happens because the databases process the select comand without the from clause, therefore if it encounters:
select id, (select name) from some
It will try to find that name field as a column of the outer queries to process.
Fortunately I remember that some while ago I've answered someone here and find a valid available link to an SQL ANSI document that is online in FULL but it is for the SQL ANSI 99 and the section may not be the same one as the new document. I think, did not check, that it is around the section 4.30. Take a look. And I really recommend the reading (I did that back in the day).
Database Language SQL - ISO/IEC 9075-2:1999 (E)
It's not standard. In oracle,
select 1, (select 2)
from dual
Throws error, ORA-00923: FROM keyword not found where expected
How can you be sure of your results? Get a better understanding of what the query is supposed to acheive before you write it. Even the exetended version in the question does not make any sense.

EXISTS vs ALL, ANY, SOME

I'm trying to understand the difference between EXISTS and ALL in MySQL. Let me give you an example:
SELECT *
FROM table1
WHERE NOT EXISTS (
SELECT *
FROM table2
WHERE table2.val < table1.val
);
SELECT *
FROM table1
WHERE val <= ALL( SELECT val FROM table2 );
A quote from MySQL docs:
Traditionally, an EXISTS subquery starts with SELECT *, but it could
begin with SELECT 5 or SELECT column1 or anything at all. MySQL
ignores the SELECT list in such a subquery, so it makes no difference. [1]
Reading this, it seems to me that mysql should be able to translate both queries to the same relational algebra expression. Both queries are just a simple comparison between values from two tables. However, that doesn't seem to be the case. I tried both queries and the second one performs much better than the first one.
How are this queries exactly handled by the optimizer?
Why the optimizer can't make the first query perform as the second one?
Is it always more efficient to use an ALL/ANY/SOME condition?
The queries in your question are not equivalent, so they will have different execution plans regardless of how well they're optimized. If you used NOT val > ANY(...) then it would be equivalent.
You should always use EXPLAIN to see the execution plan of a query and realize that the execution plan can change as your data changes. Testing and understanding the execution plan will help you determine which methods perform better. There is no hard and fast rule for ALL/ANY/SOME and they're often optimized down to an EXISTS.

Check if MySQL Table is empty: COUNT(*) is zero vs. LIMIT(0,1) has a result?

This is a simple question about efficiency specifically related to the MySQL implementation. I want to just check if a table is empty (and if it is empty, populate it with the default data). Would it be best to use a statement like SELECT COUNT(*) FROM `table` and then compare to 0, or would it be better to do a statement like SELECT `id` FROM `table` LIMIT 0,1 then check if any results were returned (the result set has next)?
Although I need this for a project I am working on, I am also interested in how MySQL works with those two statements and whether the reason people seem to suggest using COUNT(*) is because the result is cached or whether it actually goes through every row and adds to a count as it would intuitively seem to me.
You should definitely go with the second query rather than the first.
When using COUNT(*), MySQL is scanning at least an index and counting the records. Even if you would wrap the call in a LEAST() (SELECT LEAST(COUNT(*), 1) FROM table;) or an IF(), MySQL will fully evaluate COUNT() before evaluating further. I don't believe MySQL caches the COUNT(*) result when InnoDB is being used.
Your second query results in only one row being read, furthermore an index is used (assuming id is part of one). Look at the documentation of your driver to find out how to check whether any rows have been returned.
By the way, the id field may be omitted from the query (MySQL will use an arbitrary index):
SELECT 1 FROM table LIMIT 1;
However, I think the simplest and most performant solution is the following (as indicated in Gordon's answer):
SELECT EXISTS (SELECT 1 FROM table);
EXISTS returns 1 if the subquery returns any rows, otherwise 0. Because of this semantic MySQL can optimize the execution properly.
Any fields listed in the subquery are ignored, thus 1 or * is commonly written.
See the MySQL Manual for more info on the EXISTS keyword and its use.
It is better to do the second method or just exists. Specifically, something like:
if exists (select id from table)
should be the fastest way to do what you want. You don't need the limit; the SQL engine takes care of that for you.
By the way, never put identifiers (table and column names) in single quotes.