I'm using Knowage for data analysis and I'm facing performance issues, so I'm watching the 'dataset audit' log to see what queries the system performs. I found this one, which makes no sense to me:
SELECT COUNT(*)
FROM
(select TOP(100) PERCENT "ATC_1" AS "ATC_1"
from
(SELECT [ID_AFo]
,[ATC]
,[ATC_1]
,[ATC_3]
,[ATC_4]
,[ATC_5]
FROM [AFO]
) T order by "ATC_1" ASC
) u
The inner T query is the dataset definition query I entered, which is basically a select * from [AFO] on my table; the outer wrappers are generated by Knowage (I never wrote them).
Wouldn't a select count(*) from T have performed the same calculation while avoiding an expensive order by?
EDIT:
The backend (data source) is MSSQL and the cache server is MySQL, so frequent queries run on MySQL.
This query is equivalent to:
SELECT COUNT(*)
FROM [AFO];
The only reason that I can think of for constructing such a query is if the "100" could be set to another value. I'm not sure if SQL Server's optimizer is good enough to eliminate the ORDER BY in the subquery.
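A quick way to check the equivalence is to run both forms against a small table. The sketch below uses Python's built-in sqlite3 module purely as a stand-in engine. This is an assumption: the original runs on SQL Server, and SQLite has no TOP (100) PERCENT, so the generated wrapper is reduced to a plain ordered subquery. The rows are made up.

```python
import sqlite3

# In-memory stand-in for the [AFO] table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AFO (ID_AFo INTEGER, ATC TEXT, ATC_1 TEXT)")
conn.executemany(
    "INSERT INTO AFO VALUES (?, ?, ?)",
    [(1, "A01", "A"), (2, "B02", "B"), (3, "C03", None), (4, "A04", "A")],
)

# Shape of the generated query, minus TOP (100) PERCENT (not available in SQLite).
wrapped = """
SELECT COUNT(*)
FROM (
    SELECT ATC_1 AS ATC_1
    FROM (SELECT ID_AFo, ATC, ATC_1 FROM AFO) T
    ORDER BY ATC_1 ASC
) u
"""
direct = "SELECT COUNT(*) FROM AFO"

wrapped_count = conn.execute(wrapped).fetchone()[0]
direct_count = conn.execute(direct).fetchone()[0]
print(wrapped_count, direct_count)  # both counts are 4
```

The sort cannot change how many rows the subquery returns, so the two counts always agree; whether a given engine actually skips the sort is a separate question that only its execution plan can answer.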
I am running a query on a view with a single predicate, which returns the records in 4-7 seconds, but when I retrieve the records with the same predicate directly from the view's underlying query, it returns them in less than a second. I am using MySQL.
I have checked the execution plans of both queries, and they differ significantly when the tables hold hundreds of thousands of records.
Any clue or idea why performance is better when using the query directly?
Following is my view definition:
SELECT entity_info.source_entity_info_id AS event_sync_id,
entity_info.source_system_id AS source_system_id,
entity_info.target_system_id AS destination_system_id,
event_sync_info.integrationid AS integration_id,
event_sync_info.source_update_time AS last_updated,
entity_info.source_internal_id AS source_entity_internal_id,
entity_info.source_entity_project AS source_entity_project,
entity_info.target_internal_id AS destination_entity_internal_id,
entity_info.destination_entity_project AS destination_entity_project,
entity_info.source_entity_type AS source_entity_type,
entity_info.destination_entity_type AS destination_entity_type,
event_sync_info.opshub_update_time AS opshub_update_time,
event_sync_info.entity_info_id AS entity_info_id,
entity_info.global_id AS global_id,
entity_info.target_entity_info_id AS target_entity_info_id,
entity_info.source_entity_info_id AS source_entity_info_id,
(
SELECT Count(0) AS `count(*)`
FROM ohrv_failed_event_view_count failed_event_view
WHERE ((
failed_event_view.integration_id = event_sync_info.integrationid)
AND (
failed_event_view.entityinfo = entity_info.source_entity_info_id))) AS no_of_failures
FROM (ohrv_entity_info entity_info
LEFT JOIN ohmt_eai_event_sync_info event_sync_info
ON ((
entity_info.source_entity_info_id = event_sync_info.entity_info_id)))
WHERE (
entity_info.source_entity_info_id IS NOT NULL)
Query examples
select * from view where integration_id=10
The execution plan for this processes 142668 rows for the subquery inside the view.
select QUERY_OF_VIEW and integration_id=10
The execution plan for this looks good, and only the required rows are processed.
I think the issue is in the following query:
SELECT * FROM view WHERE integration_id = 10;
This forces MySQL to materialize an intermediate table, against which it then has to query again to apply the restriction in the WHERE clause. On the other hand, in the second version:
SELECT (QUERY_OF_VIEW with WHERE integration_id = 10)
MySQL does not have to materialize anything other than the query in the view itself. That is, in your second version MySQL just has to execute the query in the view, without any subsequent subquery.
Referring to the documentation, you can see that it depends on the view algorithm: if the MERGE algorithm can be used, it will be; if it is not applicable, a new temporary table must be generated to resolve the data. Also see this answer, which talks about optimization and when you should or should not use a view.
If the MERGE algorithm cannot be used, a temporary table must be used
instead. MERGE cannot be used if the view contains any of the
following constructs:
Aggregate functions (SUM(), MIN(), MAX(), COUNT(), and so forth)
DISTINCT
GROUP BY
HAVING
LIMIT
UNION or UNION ALL
Subquery in the select list
Refers only to literal values (in this case, there is no underlying
table)
While editing some queries to add alternatives for columns without values, I accidentally wrote something like this (here is the simplified version):
SELECT id, (SELECT name) FROM t
To my surprise, MySQL didn't throw any error, but completed the query giving my expected results (the name column values).
I tried to find any documentation about it, but with no success.
Is this SQL standard or a MySQL specialty?
Can I be sure that the result of this syntax is really the column value from the same (outer) table? The extended version would be like this:
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
but the EXPLAIN reports No tables used in the Extra column for the former version, which I think is very nice.
Here's a simple fiddle on SqlFiddle (it keeps timing out for me, I hope you have better luck).
Clarification: I know about subqueries, but I always wrote subqueries (correlated or not) that implied a table to select from, hence causing an additional step in the execution plan; my question is about this syntax and the result it gives, that in MySQL seems to return the expected value without any.
What you have in your first query is a correlated subquery that simply returns the name column from the table t. No actual subquery needs to run here (which is what your EXPLAIN is telling you).
In a SQL database query, a correlated subquery (also known as a
synchronized subquery) is a subquery (a query nested inside another
query) that uses values from the outer query.
https://en.wikipedia.org/wiki/Correlated_subquery
SELECT id, (SELECT name) FROM t
is the same as
SELECT id, (SELECT t.name) FROM t
Your 2nd query
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
also contains a correlated subquery, but this one actually runs a query on table t to find records where t1.id = t2.id.
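To see that the short form really returns the outer row's own column, the three variants can be compared side by side. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (an assumption: the question is about MySQL, and SQLite happens to accept the same FROM-less subquery); the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, None)])

# Plain column reference.
plain = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()

# FROM-less scalar subquery: "name" resolves to the current row of the outer query.
short = conn.execute("SELECT id, (SELECT name) FROM t ORDER BY id").fetchall()

# Explicit correlated subquery against a second alias of the same table.
long_form = conn.execute(
    "SELECT id, (SELECT name FROM t AS t1 WHERE t1.id = t2.id) "
    "FROM t AS t2 ORDER BY id"
).fetchall()

print(plain == short == long_form)  # True
```

All three return the same rows here; the explicit form only stays equivalent as long as id is unique, because otherwise the inner query could match more than one row.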
This is the default behavior of the SQL language, and it is defined in the SQL ANSI 2011 standard, ISO/IEC 9075-1:2011(en). Unfortunately it is not openly available. This behavior is described in section 4.11, SQL-Statements.
This behavior happens because the database processes the select command without a from clause, so if it encounters:
select id, (select name) from some
It will try to find that name field as a column of the outer queries to process.
Fortunately, I remember that some time ago I answered someone here and found a working link to an SQL ANSI document that is online in full, but it is for SQL ANSI 99, and the section may not be the same as in the newer document. I think (I did not check) that it is around section 4.30. Take a look. I really recommend reading it (I did that back in the day).
Database Language SQL - ISO/IEC 9075-2:1999 (E)
It's not standard. In Oracle,
select 1, (select 2)
from dual
throws an error: ORA-00923: FROM keyword not found where expected
How can you be sure of your results? Get a better understanding of what the query is supposed to achieve before you write it. Even the extended version in the question does not make any sense.
Related (SQL Server): Count(*) vs Count(1)
Could you please tell me what is better in performance (MySQL)? Count(*) or count(1)?
This is a MySQL answer.
They perform exactly the same - unless you are using MyISAM, then a special case for COUNT(*) exists. I always use COUNT(*) anyway.
https://dev.mysql.com/doc/refman/5.6/en/aggregate-functions.html#function_count
For MyISAM tables, COUNT(*) is optimized to return very quickly if the
SELECT retrieves from one table, no other columns are retrieved, and
there is no WHERE clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization only applies to MyISAM
tables, because an exact row count is stored for this storage engine
and can be accessed very quickly. COUNT(1) is only subject to the same
optimization if the first column is defined as NOT NULL.
EDIT:
Some of you may have missed the dark attempt at humour. I prefer to keep this as a non-duplicate question for any such day when MySQL will do something different to SQL Server. So I threw a vote to reopen the question (with a clearly wrong answer).
The above MyISAM optimization applies equally to
COUNT(*)
COUNT(1)
COUNT(pk-column)
COUNT(any-non-nullable-column)
So the real answer is that they are always the same.
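The one form that can give a different number is COUNT over a nullable column, because COUNT(expr) skips NULLs. The following sketch shows this with Python's built-in sqlite3 module as a stand-in engine (an assumption: the thread is about MySQL, but this NULL-skipping rule is standard SQL behavior); the student table layout and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " nickname TEXT"          # nullable on purpose
    ")"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "ann", "a"), (2, "bob", None), (3, "cy", None)],
)

counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(name), COUNT(nickname) FROM student"
).fetchone()
print(counts)  # (3, 3, 3, 3, 1)
```

The first four values count rows; only COUNT(nickname) counts non-NULL values, which is why it is not interchangeable with the others.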
I have a query which actually runs two queries on a table. I query the whole table, a datediff and then a subquery which tells me the sum of hours each unit spent in certain operational steps. The main query limits the results to the REP depot so technically I don't need to put that same criteria on the subquery since repair_order is unique.
Would it be faster, slower or no difference to apply the depot filter on the subquery?
SELECT
*,
DATEDIFF(date_shipped, date_received) as htg_days,
(SELECT SUM(t3.total_days)
FROM report_tables.cycle_time_days as t3
WHERE t1.repair_order=t3.repair_order
AND (operation='MFG' OR operation='ENG' OR operation='ENGH' OR operation='HOLD')
GROUP BY t3.repair_order) as subt_days
FROM
report_tables.cycle_time_days as t1
WHERE
YEAR(t1.date_shipped)=2010
AND t1.depot='REP'
GROUP BY
repair_order
ORDER BY
date_shipped;
I run into this with a lot of situations but I never know if it would be better to put the filter in the sub query, main query or both.
In this example, moving your WHERE clause that filters by REP into the subquery would actually alter the query. So it wouldn't be about performance at that point, it would be about getting the same result set. In general, though, if you will get exactly the same result set by moving a WHERE clause elsewhere in a complex query, it is better to do so at the most atomic level possible, i.e., in the subquery. Then the subquery returns a smaller result set to the main query before the main query has to process it.
The answer to your question will vary depending on your schema, the complexity of your queries, the reliability of your data, etc. A general rule of thumb is to try to process the least amount of data possible, which generally means filtering it at the lowest level possible as well.
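Whether the extra filter changes the result depends on the data. The sketch below assumes, purely for illustration, that rows belonging to one repair order can carry different depot values; under that assumption the subquery sum differs once the depot filter is repeated inside it. It uses Python's built-in sqlite3 module as a stand-in engine, a simplified version of the query, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cycle_time_days ("
    " repair_order INTEGER, depot TEXT, operation TEXT, total_days INTEGER)"
)
conn.executemany(
    "INSERT INTO cycle_time_days VALUES (?, ?, ?, ?)",
    [
        (100, "REP", "MFG", 5),
        (100, "OTH", "ENG", 3),   # same repair order, different depot (assumed)
        (200, "REP", "MFG", 2),
        (200, "REP", "HOLD", 4),
    ],
)

def subtotals(extra_filter):
    # extra_filter is appended to the subquery's WHERE clause.
    sql = f"""
        SELECT DISTINCT t1.repair_order,
               (SELECT SUM(t3.total_days)
                FROM cycle_time_days AS t3
                WHERE t3.repair_order = t1.repair_order
                  AND t3.operation IN ('MFG', 'ENG', 'ENGH', 'HOLD')
                  {extra_filter}) AS subt_days
        FROM cycle_time_days AS t1
        WHERE t1.depot = 'REP'
        ORDER BY t1.repair_order
    """
    return conn.execute(sql).fetchall()

outer_only = subtotals("")
both_levels = subtotals("AND t3.depot = 'REP'")
print(outer_only)   # [(100, 8), (200, 6)]
print(both_levels)  # [(100, 5), (200, 6)]
```

If every repair order really has a single depot, the two results are identical and the choice becomes a pure performance question.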
When you want to optimize a query, the absolute number one place to start is the EXPLAIN output: see what optimizations the query optimizer was able to figure out and check what the weakest link in the query plan is. Resolve that, rinse, repeat.
You can also use EXPLAIN's "extended" keyword (followed by SHOW WARNINGS) to see the actual query it built to run, which will reveal more about its usage of your criteria. In some cases, it will optimize away duplicate conditions between parent and subqueries. In other cases, it may push the conditions down from the parent into the subquery. In some cases, for (too) complex queries, I've seen it repeat the condition when it was only specified in the query once. Thankfully, you don't have to guess: MySQL's explain plan will reveal all, albeit sometimes in cryptic ways.
I usually use a derived table as a "driver or aggregating" query, then join that result back onto whatever table I want to pull data from:
select
t1.*,
datediff(t1.date_shipped, t1.date_received) as htg_days,
subt_days.total_days
from
cycle_time_days as t1
inner join
(
-- aggregating/driver query
select
repair_order,
sum(total_days) as total_days
from
cycle_time_days
where
year(date_shipped) = 2010 and depot = 'REP' and
operation in ('MFG','ENG','ENGH','HOLD') -- covering index on date, depot, op ???
group by
repair_order -- indexed ??
having
total_days > 14 -- added for demonstration purposes
order by
total_days desc limit 10
) as subt_days on t1.repair_order = subt_days.repair_order
order by
t1.date_shipped;
Here's the query (the largest table has about 40,000 rows)
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
If I run this, it executes very quickly (.05 seconds roughly). It returns 13 rows.
When I add an ORDER BY clause at the end of the query (ordering by any column) the query takes about 10 seconds.
I'm using this database in production now, and everything is working fine. All my other queries are speedy.
Any ideas of what it could be? I ran the query in MySQL's Query Browser, and from the command line. Both places it was dead slow with the ORDER BY.
EDIT: Tolgahan ALBAYRAK's solution works, but can anyone explain why it works?
Maybe this helps:
SELECT * FROM (
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
) AS t ORDER BY CourseID
Is the column you're ordering by indexed?
Indexing drastically speeds up ordering and filtering.
You are selecting from "UserCourse" which I assume is a joining table between courses and users (Many to Many).
You should index the column that you need to order by, in the "UserCourse" table.
Suppose you want to "order by CourseID"; then you need to index it on the UserCourse table.
Ordering by any other column that is not present in the joining table (i.e. UserCourse) may require further denormalization and indexing on the joining table to be optimized for speed.
In other words, you need to have a copy of that column in the joining table and index it.
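The effect of an index on ordering can be seen in a query plan. This is a rough sketch using Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN statement as a stand-in (an assumption: the question is about MySQL, whose planner and EXPLAIN output differ); the trimmed-down UserCourse table and the index name idx_usercourse_course are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserCourse (UserID INTEGER, CourseID INTEGER, CreatedOn TEXT)"
)

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT CourseID FROM UserCourse ORDER BY CourseID"

before = plan(query)  # no index yet: the engine has to sort the rows itself
conn.execute("CREATE INDEX idx_usercourse_course ON UserCourse (CourseID)")
after = plan(query)   # the index already delivers rows in CourseID order

sorts_before = any("TEMP B-TREE" in step for step in before)
sorts_after = any("TEMP B-TREE" in step for step in after)
print(sorts_before, sorts_after)
```

Without the index the plan contains an explicit sort step; with it, the rows are read in index order and the sort step disappears. On MySQL the analogous signal is "Using filesort" in the Extra column of EXPLAIN.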
P.S.
The answer given by Tolgahan Albayrak, although correct for this question, would not produce the desired result, in cases where one is doing a "LIMIT x" query.
Have you updated the statistics on your database? I ran into something similar on mine, where I had 2 identical queries whose only difference was a capital letter; one returned in half a second and the other took nearly 5 minutes. Updating the statistics resolved the issue.
I realise this answer is late, but I have just had a similar problem: adding ORDER BY increased the query time from seconds to 5 minutes. Having tried most other suggestions for speeding it up, I noticed that the /tmp files were growing to 12G for this query. I changed the query so that a varchar(20000) field being returned was wrapped in trim(), and performance improved dramatically (back to seconds). So I guess it's worth checking whether you are returning large varchars as part of your query and, if so, processing them (maybe substring(x, 1, length(x))?) if you don't want to trim them.
Query was returning 500k rows and the /tmp file indicated that each row was using about 20k of data.
A similar question was asked before here.
It might help you as well. Basically it describes using composite indexes and how order by works.
Today I ran into the same kind of problem. As soon as I sorted the result set by a field from a joined table, the whole query became horribly slow and took more than a hundred seconds.
The server was running MySQL 5.0.51a and by chance I noticed that the same query was running as fast as it should have always done on a server with MySQL 5.1. When comparing the explains for that query I saw that obviously the usage and handling of indexes has changed a lot (at least from 5.0 -> 5.1).
So if you encounter such a problem, maybe your resolution is simply to upgrade your MySQL.