My problem seems to be very simple but I'm stuck here. I have a table which has an "nvarchar" column called "SrcID" and I store both numbers and strings in that. Now, when I try to check for "IsNumeric" on that column in a "Join" condition, something like below,
ISNUMERIC(SrcID) = 1 AND SrcID > 15
I am getting the following error:
Msg 245, Level 16, State 1, Line 47
Conversion failed when converting the nvarchar value 'Test' to data type int.
Amazingly, when I remove the check "SrcID > 15", my query is running properly. Should I include anything else in this statement?
Please help me in fixing the issue. Thanks in advance!!
You can't count on the order in which a database will evaluate filtering expressions. There is a query optimizer that will evaluate your SQL and build a plan to execute the query based on what it perceives will yield the best performance.
In this context, IsNumeric() cannot be used with an index, and it means running a function against every row in the table. Therefore, it will almost never provide the best perceived performance. Compare this with the SrcID > 15 expression, which can be matched with an index (if one exists), and is just a single operator expression even if one doesn't. It can also be used to filter down the number of potential rows where the IsNumeric() function needs to run.
You can likely get around this with a view, a subquery, a CTE, a CASE statement, or a computed column. Here's a CTE example:
With NumericOnly As
(
SELECT <columns> FROM MyTable WHERE IsNumeric(SrcID) = 1
)
SELECT <columns> FROM NumericOnly WHERE SrcID > 15
And here's a CASE statement option:
SELECT <columns> FROM MyTable WHERE CASE WHEN IsNumeric(SrcID) = 1 THEN Cast(SrcID As Int) ELSE 0 END > 15
The filters in a WHERE clause are not evaluated in any particular order.
This is a common misconception with SQL Server - the optimizer will check whichever conditions it thinks it can the fastest/easiest, and try to limit the data in the most efficient way possible.
In your example, you probably have an index on SrcID, and the optimizer thinks it will be quicker to FIRST limit the results to where the SrcID > 15, then run the function on all those rows (since the function will need to check every single row otherwise).
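You can try grouping the conditions with parentheses, although parentheses only control how the expression is parsed and do not guarantee the order in which SQL Server evaluates it: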
WHERE (ISNUMERIC(SrcID) = 1) AND SrcID > 15
Or with a case statement:
WHERE CASE WHEN ISNUMERIC(SrcID) = 1 THEN CAST(SrcID AS int) ELSE 0 END > 15
So far following is my scenario :
Parameters controlled by user: (These parameters are controlled by a dashboard but for testing purposes I have created sql parameters in order to change their values)
SET @device_param := "all devices";
SET @date_param_start_bar_chart := '2016-09-01';
SET @date_param_end_bar_chart := '2016-09-19';
SET @country_param := "US";
SET @channel_param := "all channels";
Query that runs at the back-end
SELECT
    country_code,
    channel_report_tag,
    SUM(count_more_then_30_min_play) AS '>30 minutes',
    SUM(count_15_30_min_play) AS '15-30 Minutes',
    SUM(count_0_15_min_play) AS '0-15 Minutes'
FROM
    channel_play_times_cleaned
WHERE IFNULL(country_code, '') =
    CASE
        WHEN @country_param = "all countries"
        THEN IFNULL(country_code, '')
        ELSE @country_param
    END
AND IFNULL(channel_report_tag, '') =
    CASE
        WHEN @channel_param = "all channels"
        THEN IFNULL(channel_report_tag, '')
        ELSE @channel_param
    END
AND IFNULL(device_report_tag, '') =
    CASE
        WHEN @device_param = "all devices"
        THEN IFNULL(device_report_tag, '')
        ELSE @device_param
    END
AND playing_date BETWEEN @date_param_start_bar_chart
    AND @date_param_end_bar_chart
GROUP BY channel_report_tag
ORDER BY SUM(count_more_then_30_min_play) DESC
LIMIT 10;
The index that I have applied is
CREATE INDEX my_index
ON channel_play_times_cleaned (
country_code,
channel_report_tag,
device_report_tag,
playing_date,
channel_report_tag
)
I have followed this link : My SQL Index Cook-Book Guide to create my index.
However the EXPLAIN keyword while executing the above query tells me that there is no index used.
I want to know what I am doing wrong here.
You use functions and CASE expressions in the first three WHERE conditions. A simple index on the fields cannot be used to speed up such lookups.
MySQL could potentially use an index for the playing_date criteria, but that field is not the leftmost in the cited index, therefore the cited index is not suitable for that either.
If I were you, I would remove the logic from the WHERE criteria and move it into the application layer, constructing an SQL statement that has the CASE conditions already resolved and emits only the necessary SQL.
Your CASE expressions in the WHERE clause are forcing full table scans. Clearly, they have to go... but how?
You have to think like the optimizer and remember that its job is to avoid as much work as possible.
Consider this query:
SELECT * FROM users
WHERE first_name LIKE '%a%';
Every row must be read to find all first_name values containing the letter 'a'. Very slow.
Now, this one:
SELECT * FROM users
WHERE first_name LIKE '%a%'
AND 2 < 1;
For each row, you're asking the server to check the first_name again and to include only rows where 2 is a smaller number than 1.
Is it slow, or fast?
It's very fast, because the optimizer detects an Impossible WHERE. There is no point in scanning the rows because 2 < 1 is always false.
Now, use this logic to tell the optimizer what you really want:
Not this:
WHERE IFNULL(country_code, '') =
    CASE
        WHEN @country_param = "all countries"
        THEN IFNULL(country_code, '')
        ELSE @country_param
    END
AND
But this:
WHERE
(
    (
        @country_param = "all countries"
    )
    OR
    (
        @country_param != "all countries"
        AND
        country_code = @country_param
    )
)
AND ...
The difference should be stark. If @country_param = "all countries", the second test is not needed; otherwise, only the rows with the matching country are needed, and this portion of the WHERE clause is false by definition for all other rows, allowing an index on country_code to be used.
One or the other of these OR'ed expressions is always false, and that one will be optimized away early, never evaluated for each row. The expression @country_param != "all countries" should be treated no differently than the expression 2 < 1 or 2 > 1. It is not going to change its truthiness based on the data in the rows, so it only needs to be evaluated once, at the beginning.
Repeat for the other CASE. You should almost never pass columns as arguments to functions in the WHERE clause because the optimizer can't "look backwards through" functions and form an intelligent query plan.
The other answers have explained why your query is slow. I will explain what you should do.
Write code to "construct" the query. It would either leave out the test for country_code if the user said "all countries", or add in AND country_code = "US". No @variables, no CASE, etc.
Then, one 5-column index won't work except for a few cases. Instead, get a feel for what users are asking for, then build a few 2-column indexes to cover the popular cases.
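As a rough illustration of constructing the query in application code, here is a minimal Python sketch. The function name build_report_query, the alias over_30, the sample rows, and the use of sqlite3 with ? placeholders are assumptions made for the example (MySQL drivers usually use a different placeholder style); the table and column names are taken from the question.

```python
import sqlite3

# Fixed whitelist of filterable columns; user input never becomes SQL text.
FILTER_COLUMNS = ("country_code", "channel_report_tag", "device_report_tag")


def build_report_query(start_date, end_date, **filters):
    """Return (sql, params). A filter left out or set to None means "all"."""
    conditions = ["playing_date BETWEEN ? AND ?"]
    params = [start_date, end_date]
    for column in FILTER_COLUMNS:
        value = filters.get(column)
        if value is not None:
            # Only emit the condition when the user actually picked a value.
            conditions.append(f"{column} = ?")
            params.append(value)
    sql = (
        "SELECT channel_report_tag, "
        "SUM(count_more_then_30_min_play) AS over_30 "
        "FROM channel_play_times_cleaned "
        "WHERE " + " AND ".join(conditions) + " "
        "GROUP BY channel_report_tag "
        "ORDER BY over_30 DESC LIMIT 10"
    )
    return sql, params


# Hypothetical sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE channel_play_times_cleaned (country_code TEXT, "
    "channel_report_tag TEXT, device_report_tag TEXT, playing_date TEXT, "
    "count_more_then_30_min_play INTEGER)"
)
conn.executemany(
    "INSERT INTO channel_play_times_cleaned VALUES (?, ?, ?, ?, ?)",
    [
        ("US", "news", "tv", "2016-09-02", 4),
        ("US", "news", "web", "2016-09-03", 3),
        ("US", "sports", "tv", "2016-09-05", 5),
        ("DE", "news", "tv", "2016-09-05", 9),
        ("US", "sports", "tv", "2016-10-01", 8),  # outside the date range
    ],
)

sql, params = build_report_query("2016-09-01", "2016-09-19", country_code="US")
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Because each condition that survives compares a bare column to a value, an index whose leading column matches one of the emitted filters becomes usable.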
Consider the following table:
SELECT id, Bill_Freq, Paid_From, Paid_To, Paid_Dt, rev_code FROM psr_20160708091408;
The requirement is to fetch the row which has rev_code populated with the string **SUM**.
I've also noticed that for every row with rev_code populated as **SUM** its Bill_Freq won't be either null or zero.
So I wrote two queries to fetch the row with the lowest id
Query based on string check in where clause:
select
min(id) as head_id,
bill_freq,
Paid_From,
Paid_To,
Paid_Dt
from
`psr_20160708091408` where rev_code = "**SUM**";
Query based on true condition:
select
min(id) as head_id,
bill_freq,
Paid_From,
Paid_To,
Paid_Dt
from
`psr_20160708091408` where bill_freq;
I haven't seen anyone use the second type, would like to know its reliability and circumstance of failure.
If by "second type" you mean a where clause with no explicit condition, then there is a good reason why you do not see it.
The SQL standard -- and most databases -- require explicit conditions in the where. MySQL allows the shorthand that you use, but it really means:
where bill_freq <> 0
or equivalently:
where bill_freq <> 0 and bill_freq is not null
(A NULL bill_freq makes the condition NULL, which the where treats as false, so those rows are excluded too.)
The more important issue with your query is the min(). I presume that you actually want this:
select p.*
from psr_20160708091408 p
where rev_code = '**SUM**'
order by id
limit 1;
Also, you should use single quotes as string delimiters. That is the ANSI standard and there is rarely any reason to use double quotes.
You can actually use the second type of query, but since your requirement is based on rev_code, it is better to put the condition on rev_code, for two reasons:
Bill_Freq having no NULLs or zeros might be an assumption based only on the current data.
Even if it is true today, your application logic might change in the future and produce a row with a NULL or zero, which would break your query.
So my suggestion is to use the first query, with rev_code.
Please try to use below query
select
id,
bill_freq,
Paid_From,
Paid_To,
Paid_Dt
from
`psr_20160708091408` where rev_code = "**SUM**" ORDER BY id ASC LIMIT 0,1;
Thanks.
The requirement says it itself:
The requirement is to fetch the row which has rev_code populated with the string '**SUM**'.
If a row has a bill_freq that is not null but a rev_code that is not populated with the string '**SUM**', then your logic will obviously fail.
Go for
where rev_code = "**SUM**";
Apologies in advance if this is a common question, I tried researching it but can't seem to find something that fits.
I have a query that pulls data the way I like, but I would like to add a condition so that it only tells me about values that occur 5 times or more in a 60-second period:
select from_unixtime(dateTimeOrigination), callingPartyNumber,
originalCalledPartyNumber, finalCalledPartyNumber, duration, origDeviceName, destDeviceName
from cdr_records
where (from_unixtime(dateTimeOrigination) like '2016-05-20%') and
(callingPartyNumber not like 'b00%') and
(originalCalledPartyNumber not like 'b00%') and
(finalCalledPartyNumber not like 'b00%')
order by originalCalledPartyNumber, dateTimeOrigination;
This query already filters for results in a specified day and orders the results the way I like, but it pulls everything. Can someone tell me how I can say, "only tell me about value originalCalledPartyNumber if it shows up 5 times or more in any 60 second period."?
If we want to filter out the rows where there aren't at least four preceding rows within the past 60 seconds, assuming that dateTimeOrigination is integer type, a 32-bit unix-style timestamp, we can do something like this:
SELECT FROM_UNIXTIME(r.dateTimeOrigination) AS dateTimeOrigination
, r.callingPartyNumber
, r.originalCalledPartyNumber
, r.finalCalledPartyNumber
, r.duration
, r.origDeviceName
, r.destDeviceName
FROM cdr_records r
WHERE r.dateTimeOrigination >= UNIX_TIMESTAMP('2016-05-20')
AND r.dateTimeOrigination < UNIX_TIMESTAMP('2016-05-21')
AND r.callingPartyNumber NOT LIKE 'b00%'
AND r.originalCalledPartyNumber NOT LIKE 'b00%'
AND r.finalCalledPartyNumber NOT LIKE 'b00%'
AND ( SELECT COUNT(1)
FROM cdr_records c
WHERE c.originalCalledPartyNumber = r.originalCalledPartyNumber
AND c.dateTimeOrigination > r.dateTimeOrigination - 60
AND c.dateTimeOrigination <= r.dateTimeOrigination
) > 4
ORDER
BY r.originalCalledPartyNumber
, r.dateTimeOrigination
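To make the counting rule of the correlated subquery above easier to follow, here is a small Python sketch of the same logic outside the database. The function name flag_bursts and the sample calls are hypothetical; timestamps are assumed to be integer seconds, as in the query.

```python
def flag_bursts(calls, window=60, threshold=5):
    """calls: list of (timestamp, called_number) tuples.

    A call is kept when it, together with earlier calls to the same number
    in the trailing `window` seconds, makes at least `threshold` calls.
    This mirrors: COUNT(...) > 4 with ts - 60 < c.ts <= ts.
    """
    flagged = []
    for ts, number in calls:
        count = sum(
            1
            for other_ts, other_number in calls
            if other_number == number and ts - window < other_ts <= ts
        )
        if count >= threshold:
            flagged.append((ts, number))
    # Same ordering as the query: by called number, then by time.
    return sorted(flagged, key=lambda call: (call[1], call[0]))


# Hypothetical sample data: (dateTimeOrigination, originalCalledPartyNumber).
sample_calls = [
    (0, "1001"), (10, "1001"), (20, "1001"), (30, "1001"),
    (40, "1001"), (50, "1001"), (200, "1001"),
    (0, "1002"), (100, "1002"),
]
flagged_calls = flag_bursts(sample_calls)
print(flagged_calls)
```

This version compares every call with every other call, so it is only meant to show the rule; the SQL form with a suitable index is the practical one.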
NOTE: For performance, we prefer to have predicates on bare columns.
With a form like this, with the column wrapped in an expression:
WHERE FROM_UNIXTIME(r.dateTimeOrigination) LIKE '2016-05-20%'
MySQL will evaluate the function for every row in the table, and then compare the return from the function to the literal.
With a form like this:
WHERE r.dateTimeOrigination >= UNIX_TIMESTAMP('2016-05-20')
AND r.dateTimeOrigination < UNIX_TIMESTAMP('2016-05-21')
MySQL will evaluate the expressions on the right side one time, as literals, which allows MySQL to make effective use of a range scan operation on a suitable index.
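The two predicate forms above can be compared in a small experiment. The sketch below uses Python's sqlite3 module as a stand-in, so the planner is SQLite's, not MySQL's, and the plan text will differ from MySQL's EXPLAIN; it only illustrates the general idea. The helper names day_bounds and plan, the index name cdr_ix1, and the sample rows are hypothetical, and the day is assumed to be in UTC.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def day_bounds(day):
    """Return (start, end) epoch seconds for a 'YYYY-MM-DD' day, assuming UTC."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def plan(conn, sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdr_records (dateTimeOrigination INTEGER, duration INTEGER)")
conn.execute("CREATE INDEX cdr_ix1 ON cdr_records (dateTimeOrigination)")

start, end = day_bounds("2016-05-20")
conn.executemany(
    "INSERT INTO cdr_records VALUES (?, ?)",
    [(start - 1, 10), (start, 20), (end - 1, 30), (end, 40)],
)

# Column wrapped in a function: the index on the column cannot be used.
wrapped = (
    "SELECT duration FROM cdr_records "
    "WHERE datetime(dateTimeOrigination, 'unixepoch') LIKE '2016-05-20%'"
)
# Bare column compared to precomputed bounds (half-open range).
bare = (
    "SELECT duration FROM cdr_records "
    "WHERE dateTimeOrigination >= ? AND dateTimeOrigination < ?"
)

wrapped_rows = sorted(conn.execute(wrapped).fetchall())
bare_rows = sorted(conn.execute(bare, (start, end)).fetchall())
print(plan(conn, wrapped))
print(plan(conn, bare, (start, end)))
```

Both queries return the same rows; the difference is in how they are found.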
FOLLOWUP
For best performance of the outer query, the best index would likely be one with a leading column of dateTimeOrigination, preferably also containing the other columns referenced in the predicates:
... ON cdr_records (dateTimeOrigination
,callingPartyNumber,originalCalledPartyNumber,finalCalledPartyNumber)
For best performance, use a covering index, to avoid lookups to the pages in the underlying table. For example:
... ON cdr_records (dateTimeOrigination
,callingPartyNumber,originalCalledPartyNumber,finalCalledPartyNumber
,duration,origDeviceName,destDeviceName)
With that, we'd expect EXPLAIN to show "Using index".
For the correlated subquery, we'd want an index with leading columns like this:
... ON cdr_records (originalCalledPartyNumber,dateTimeOrigination)
I strongly recommend you look at the output from EXPLAIN to see which indexes MySQL is using for the query.
I encounter some strange results in the following query :
SET @indi_id = 768;
SET @generations = 8;
SELECT num, sosa, seq, len, dernier, ful_ful_nom
FROM fullindi
LEFT JOIN lignee_new
ON ((ful_indi_id = dernier) AND (len BETWEEN 1 AND @generations))
RIGHT JOIN numbers
ON ((sosa = num) AND (premier = @indi_id))
WHERE num BETWEEN 1 AND pow(2, @generations)
GROUP BY num
ORDER BY num;
The result looks like this :
Why doesn't the row just before a fully NULL one display the existing values ('sosa', 'len', 'dernier', 'ful_ful_nom') but only the 'seq' value (see rows 43 and 47 in this example)?
What am I missing?
As requested, here are data :
table lignee_new :
table fullindi :
The problem is that MySQL does really dumb things when an Aggregate function is introduced, or a GROUP BY is included, but not all of the fields are in an Aggregate Function or your GROUP BY.
You are asking it to GROUP BY num but none of the other columns in your SELECT are included in the Group BY nor are they being aggregated with a function (SUM, MAX, MIN, AVG, etc..)
In most other RDBMSs this query wouldn't run and would throw an error (as it would in MySQL with ONLY_FULL_GROUP_BY enabled), but MySQL just carries on. To decide which value it should show for each field that isn't num, it just grabs the first value it finds in its data storage, which may be different between InnoDB and whatever else folks use anymore.
My guess is that in your case you have more than one record in lignee_new that has a num of 43. Since you GROUP BY num and nothing else, it just grabs values randomly from your multiple records where num=43 and displays them... which is reasonable. By not including them in an aggregate function you are pretty much saying "I don't care what you display for these other fields, just bring something back" and so MySQL does.
Remove your GROUP BY clause completely and you'll see data that makes sense. Perhaps use WHERE to further filter your records to get rid of nulls or other things you don't need (don't use GROUP BY to filter).
I need to query data from a second table, but only if a rare set of conditions in the primary table is met:
SELECT ..., IF(a AND b AND c AND (SELECT 1 FROM tableb ...)) FROM tablea ...
a, b, and c conditions are almost always false, so my thinking is that the subquery will never execute for most rows in the result set and will thus be way faster than a join. But that would only be true if the IF() statement short-circuits.
Does it?
Thanks for any help you guys can provide.
The answer is YES.
The IF(cond,expr_true,expr_false) within a mysql query is short-circuited.
Here is a test, using @variables to prove the fact:
SET @var:=5;
SELECT IF(1 = 0, (@var:=@var + 1), @var ); -- using ':=' operator to modify 'true' expr @var
SELECT IF(1 = 1, @var, (@var:=@var + 1) ); -- using ':=' operator to modify 'false' expr @var
SELECT @var;
The result is '5' from all three SELECT queries.
Had the IF() function NOT short circuited, the result would be a '5' from SELECT #1, and '6' from SELECT #2, and a '7' from the last "select @var".
This is because the 'true' expression is NEVER executed in select #1, nor is the 'false' expression executed in select #2.
Note the ':=' operator is used to modify an @var within an SQL query (select, from, and where clauses). You can get some really fancy/complex SQL from this. I've used @vars to apply 'procedural' logic within a SQL query.
-- J Jorgenson --
With J. Jorgenson's help I came up with my own test case. His example does not try to short circuit in the condition evaluation, but using his idea I came up with my own test and verified that MySQL does indeed short-circuit the IF() condition check.
SET @var:=5;
SELECT IF(1 = 0 AND (@var:=10), 123, @var); #Expected output: 5
SELECT IF(1 = 1 AND (@var:=10), @var, 123); #Expected output: 10
In the first example, MySQL is properly short-circuiting: @var never gets set to 10.
Thanks for the help J. Jorgenson!
It depends.
IF doesn't short-circuit such that it can be used to avoid truncation warnings with GROUP_CONCAT, for example in:
set @@group_concat_max_len = 5;
select if(true or @var:=group_concat('warns if evaluated'), 'actual result', @var);
the result will be 'actual result' but you'll get a warning:
Warning (Code 1260): Row 1 was cut by GROUP_CONCAT()
which is the same warning you get with less trivial GROUP_CONCAT expressions, such as distinct keys, and without the IF at all.
Try it in the SQL analyzer. If you want to be on the safe side and not have to trust the database to work one way (and not to change that behavior ever in new versions), just make two queries and do the IF programmatically.
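One way to do the IF programmatically is sketched below in Python, with sqlite3 standing in for the real database. The table names tablea and tableb and the conditions a, b, c come from the question; the columns id and tablea_id, the function name, and the sample rows are hypothetical. Python's `and` is guaranteed to short-circuit, so the second query runs only for rows that pass all three checks.

```python
import sqlite3


def rows_with_rare_lookup(conn):
    """Return ([(id, found_in_tableb), ...], number_of_lookups_made)."""
    results = []
    lookups = 0
    primary_rows = conn.execute(
        "SELECT id, a, b, c FROM tablea ORDER BY id"
    ).fetchall()
    for row_id, a, b, c in primary_rows:
        found = False
        if a and b and c:
            # Second query is issued only for the rare qualifying rows.
            lookups += 1
            match = conn.execute(
                "SELECT 1 FROM tableb WHERE tablea_id = ? LIMIT 1", (row_id,)
            ).fetchone()
            found = match is not None
        results.append((row_id, found))
    return results, lookups


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE TABLE tableb (tablea_id INTEGER)")
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 0, 1, 1), (3, 1, 1, 1), (4, 0, 0, 0)],
)
conn.execute("INSERT INTO tableb VALUES (1)")

results, lookups = rows_with_rare_lookup(conn)
print(results, lookups)
```

The trade-off is one extra round trip per qualifying row, which is reasonable only when such rows really are rare.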