MySQL index not working (use-case-specific scenario)

So far, this is my scenario:
Parameters controlled by the user (these are set from a dashboard, but for testing purposes I have created SQL variables so I can change their values):
SET @device_param := "all devices";
SET @date_param_start_bar_chart := '2016-09-01';
SET @date_param_end_bar_chart := '2016-09-19';
SET @country_param := "US";
SET @channel_param := "all channels";
Query that runs at the back-end
SELECT
    country_code,
    channel_report_tag,
    SUM(count_more_then_30_min_play) AS '>30 minutes',
    SUM(count_15_30_min_play) AS '15-30 Minutes',
    SUM(count_0_15_min_play) AS '0-15 Minutes'
FROM
    channel_play_times_cleaned
WHERE IFNULL(country_code, '') =
    CASE
        WHEN @country_param = "all countries"
        THEN IFNULL(country_code, '')
        ELSE @country_param
    END
AND IFNULL(channel_report_tag, '') =
    CASE
        WHEN @channel_param = "all channels"
        THEN IFNULL(channel_report_tag, '')
        ELSE @channel_param
    END
AND IFNULL(device_report_tag, '') =
    CASE
        WHEN @device_param = "all devices"
        THEN IFNULL(device_report_tag, '')
        ELSE @device_param
    END
AND playing_date BETWEEN @date_param_start_bar_chart
    AND @date_param_end_bar_chart
GROUP BY channel_report_tag
ORDER BY SUM(count_more_then_30_min_play) DESC
LIMIT 10;
The index that I have applied is
CREATE INDEX my_index
ON channel_play_times_cleaned (
country_code,
channel_report_tag,
device_report_tag,
playing_date,
channel_report_tag
)
I have followed this link : My SQL Index Cook-Book Guide to create my index.
However the EXPLAIN keyword while executing the above query tells me that there is no index used.
I want to know what I am doing wrong here.

You use functions and CASE expressions in the first three WHERE conditions. A plain column index cannot be used to speed up such lookups.
MySQL could potentially use an index for the playing_date criterion, but that field is not the leftmost column in the cited index, so the cited index is not suitable for that either.
If I were you, I would remove the logic from the WHERE criteria and move it into the application layer, constructing an SQL statement that has the CASE conditions already resolved and emits only the necessary SQL.

Your CASE expressions in the WHERE clause are forcing full table scans. Clearly, they have to go... but how?
You have to think like the optimizer and remember that its job is to avoid as much work as possible.
Consider this query:
SELECT * FROM users
WHERE first_name LIKE '%a%';
Every row must be read to find all first_name values containing the letter 'a'. Very slow.
Now, this one:
SELECT * FROM users
WHERE first_name LIKE '%a%'
AND 2 < 1;
For each row, you're asking the server to check the first_name again and to include only rows where 2 is a smaller number than 1.
Is it slow, or fast?
It's very fast, because the optimizer detects an Impossible WHERE. There is no point in scanning the rows because 2 < 1 is always false.
Now, use this logic to tell the optimizer what you really want:
Not this:
WHERE IFNULL(country_code, '') =
    CASE
        WHEN @country_param = "all countries"
        THEN IFNULL(country_code, '')
        ELSE @country_param
    END
AND
But this:
WHERE
(
    (
        @country_param = "all countries"
    )
    OR
    (
        @country_param != "all countries"
        AND
        country_code = @country_param
    )
)
AND ...
The difference should be stark. If @country_param = "all countries", the second test is not needed; otherwise, only the rows with the matching country are needed, and this portion of the WHERE clause is false by definition for all other rows, allowing an index on country_code to be used.
One or the other of these OR'ed expressions is always false, and that one will be optimized away early, never evaluated for each row. The expression @country_param != "all countries" should be treated no differently than the expression 2 < 1 or 2 > 1. It is not going to change its truthiness based on the data in the rows, so it only needs to be evaluated once, at the beginning.
Repeat for the other CASE. You should almost never pass columns as arguments to functions in the WHERE clause because the optimizer can't "look backwards through" functions and form an intelligent query plan.
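To check that the OR form returns the same rows as the CASE form, here is a small sketch in Python using the built-in sqlite3 module as a stand-in for MySQL. The table plays, its rows, and the helper rows() are made up for illustration. SQLite's planner is not MySQL's, so this only shows that the results match, not that an index gets used.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (country_code TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?)",
    [("US", 1), ("DE", 2), (None, 3), ("US", 4)],  # hypothetical rows
)

# The original pattern: the column is wrapped in IFNULL and compared to a CASE.
CASE_FORM = """
SELECT n FROM plays
WHERE IFNULL(country_code, '') =
      CASE WHEN :p = 'all countries' THEN IFNULL(country_code, '') ELSE :p END
ORDER BY n
"""

# The rewritten pattern: a constant test OR'ed with a bare column comparison.
OR_FORM = """
SELECT n FROM plays
WHERE (:p = 'all countries')
   OR (:p != 'all countries' AND country_code = :p)
ORDER BY n
"""


def rows(sql, param):
    """Run one of the two queries with the given country parameter."""
    return [r[0] for r in conn.execute(sql, {"p": param})]


for param in ("US", "DE", "all countries"):
    print(param, rows(CASE_FORM, param) == rows(OR_FORM, param))
```

One caveat: the two forms differ when the parameter is the empty string. The CASE form then matches rows whose country_code is NULL (because of IFNULL), while the OR form does not.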

The other answers have explained why your query is slow. I will explain what you should do.
Write code to "construct" the query. It would either leave out the test for country_code if the user said "all countries", or add in AND country_code = "US". No @variables, no CASE, etc.
Then, one 5-column index won't work except for a few cases. Instead, get a feel for what users are asking for, then build a few 2-column indexes to cover the popular cases.
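A minimal sketch of such query construction, in Python. The function name build_query and the ALL_SENTINELS mapping are hypothetical, and the %s placeholder style is an assumption that fits common MySQL drivers; adjust it to whatever your driver expects. Column names come from a fixed dictionary, never from user input, and user values are passed only as parameters.

```python
# Sentinel values the dashboard sends when a filter is switched off
# (taken from the question; the mapping itself is a hypothetical helper).
ALL_SENTINELS = {
    "country_code": "all countries",
    "channel_report_tag": "all channels",
    "device_report_tag": "all devices",
}


def build_query(start, end, **filters):
    """Return (sql, params) containing only the conditions actually needed."""
    where = ["playing_date BETWEEN %s AND %s"]
    params = [start, end]
    for column, sentinel in ALL_SENTINELS.items():
        value = filters.get(column, sentinel)
        if value != sentinel:
            # A bare column comparison, so an index on the column stays usable.
            where.append(f"{column} = %s")
            params.append(value)
    sql = (
        "SELECT country_code, channel_report_tag,\n"
        "       SUM(count_more_then_30_min_play) AS '>30 minutes',\n"
        "       SUM(count_15_30_min_play) AS '15-30 Minutes',\n"
        "       SUM(count_0_15_min_play) AS '0-15 Minutes'\n"
        "FROM channel_play_times_cleaned\n"
        "WHERE " + "\n  AND ".join(where) + "\n"
        "GROUP BY channel_report_tag\n"
        "ORDER BY SUM(count_more_then_30_min_play) DESC\n"
        "LIMIT 10"
    )
    return sql, params


sql, params = build_query("2016-09-01", "2016-09-19", country_code="US")
print(sql)
print(params)
```

With the settings from the question, the emitted statement filters only on playing_date and country_code, which is the kind of case a 2-column index can cover.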

Related

Missing values in a query

I encounter some strange results in the following query :
SET @indi_id = 768;
SET @generations = 8;
SELECT num, sosa, seq, len, dernier, ful_ful_nom
FROM fullindi
LEFT JOIN lignee_new
    ON ((ful_indi_id = dernier) AND (len BETWEEN 1 AND @generations))
RIGHT JOIN numbers
    ON ((sosa = num) AND (premier = @indi_id))
WHERE num BETWEEN 1 AND pow(2, @generations)
GROUP BY num
ORDER BY num;
The result looks like this :
Why doesn't the row just before a fully NULL one display the existing values ('sosa', 'len', 'dernier', 'ful_ful_nom'), but only the 'seq' value (see rows 43 and 47 in this example)?
What am I missing?
As requested, here are data :
table lignee_new :
table fullindi :
The problem is that MySQL does really dumb things when an aggregate function or a GROUP BY is introduced but not all of the selected fields are inside an aggregate function or listed in the GROUP BY.
You are asking it to GROUP BY num, but none of the other columns in your SELECT are included in the GROUP BY, nor are they being aggregated with a function (SUM, MAX, MIN, AVG, etc.).
In most other RDBMSs this query would be rejected with an error, but MySQL (unless ONLY_FULL_GROUP_BY is enabled) just carries on. For each field other than num, it picks a value from an arbitrary row in the group, typically the first one it finds in its data storage, which may differ between InnoDB and other storage engines.
My guess is that in your case more than one record from lignee_new joins to num 43. Since you GROUP BY num and nothing else, it just grabs values arbitrarily from the multiple records where num = 43 and displays them... which is reasonable. By not including them in an aggregate function you are pretty much saying "I don't care what you display for these other fields, just bring something back", and so MySQL does.
Remove your GROUP BY clause completely and you'll see data that makes sense. Perhaps use WHERE to further filter your records to get rid of nulls or other things you don't need (don't use GROUP BY to filter).
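To confirm the guess about duplicate rows, you can count how many rows fall into each group. Below is a sketch in Python using the built-in sqlite3 module as a stand-in for MySQL. The column names come from the question, but the rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lignee_new (sosa INTEGER, seq INTEGER, dernier INTEGER)")
conn.executemany(
    "INSERT INTO lignee_new VALUES (?, ?, ?)",
    # Hypothetical data: sosa 43 appears twice, once with a NULL dernier.
    [(42, 1, 100), (43, 1, 101), (43, 2, None), (44, 1, 102)],
)

# Every group listed here is one where a bare GROUP BY has to pick a row
# arbitrarily for the non-aggregated columns.
duplicates = conn.execute(
    """
    SELECT sosa, COUNT(*) AS rows_in_group
    FROM lignee_new
    GROUP BY sosa
    HAVING COUNT(*) > 1
    ORDER BY sosa
    """
).fetchall()

print(duplicates)
```

Any sosa value reported here is a candidate for the odd-looking rows in the grouped output.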

MySQL query with WHERE clause, but drop a condition if no results?

I'm currently doing this via two separate queries from PHP, but would love to optimize and somehow do it in a single query.
First query..
SELECT `referrer`
FROM `tbl_traffic_log`
WHERE `domain` = 'mysite.com'
AND `referrer` != '$referringDomain'
AND CASE WHEN `clicks_in_unique`=0 THEN 2 ELSE `clicks_out_unique`/`clicks_in_unique` END < 1.4
ORDER BY RAND()
LIMIT 1
..and if mysql_num_rows shows no results, I do a second query to try again and check if there are any results minus the referrer != 'partner1.com' part.
The code is basically trying to find a random trade partner who ISN'T the partner who sent that click, but if there are no matches, as a last resort it's ok to send back, provided it matches the other criteria.
I'm pretty sure there is a way to do this, but just can't find a way after searching (probably because I'm not understanding the problem enough to type in the right thing).
Any other critique of the query is welcome as well.
Thank you :)
I think you can do this like this:
SELECT `referrer`
FROM `tbl_traffic_log`
WHERE `domain` = 'mysite.com'
AND CASE WHEN `clicks_in_unique`=0 THEN 2 ELSE `clicks_out_unique`/`clicks_in_unique` END < 1.4
ORDER BY `referrer` != '$referringDomain' desc, RAND()
LIMIT 1
The idea is to put the condition in the order by. The condition (in MySQL) evaluates to either 0 or 1, so we want where the condition is true first (hence the desc). It then chooses a random row. If there are no rows where the condition is true, then it chooses a random row.
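The ordering trick can be tried out with a short Python sketch using the built-in sqlite3 module as a stand-in for MySQL (SQLite spells the function RANDOM() rather than RAND(), and comparisons also evaluate to 0 or 1 there). The helper pick_partner and the sample rows are hypothetical, and the click-ratio condition is left out to keep the example short.

```python
import sqlite3


def pick_partner(conn, domain, referring_domain):
    """Prefer a random partner other than the referrer; fall back to the referrer."""
    row = conn.execute(
        """
        SELECT referrer
        FROM tbl_traffic_log
        WHERE domain = ?
        ORDER BY (referrer != ?) DESC, RANDOM()
        LIMIT 1
        """,
        (domain, referring_domain),
    ).fetchone()
    return row[0] if row else None


# In-memory SQLite database with invented rows (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_traffic_log (domain TEXT, referrer TEXT)")
conn.executemany(
    "INSERT INTO tbl_traffic_log VALUES (?, ?)",
    [
        ("mysite.com", "partner1.com"),
        ("mysite.com", "partner2.com"),
        ("solo.com", "partner1.com"),  # only one partner: the fallback case
    ],
)

print(pick_partner(conn, "mysite.com", "partner1.com"))
print(pick_partner(conn, "solo.com", "partner1.com"))
```

Passing the referring domain as a bound parameter, rather than interpolating '$referringDomain' into the SQL string, also avoids SQL injection.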

Is there a special character in mySql that would return always true in WHERE clauses?

Is there a character, say, $,
SELECT * FROM Persons WHERE firstName='Peter' AND areaCode=$;
such that the statement would return the same as
SELECT * FROM Persons WHERE firstName='Peter'
i.e. areaCode=$ would always return true and thus effectively "turn off" the criterion areaCode=...
I'm writing VBA code in Excel that fetches some rows based on a number of criteria. The criteria can either be enabled or disabled. A character like $ would make the disabling so much easier.
Instead of disabling it, pass it through to your query as NULL and use COALESCE:
SELECT *
FROM Persons
WHERE firstName='Peter'
AND areaCode = COALESCE(<your parameter>, areaCode);
%
See Wildcards
You could use NULL for this purpose:
AND (areaCode = ? OR ? IS NULL)
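The two NULL-based suggestions above behave differently when the column itself contains NULLs. Here is a sketch in Python using the built-in sqlite3 module as a stand-in for MySQL. The rows and the helper names with_or and with_coalesce are made up for illustration.

```python
import sqlite3

# In-memory SQLite database with invented rows (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (firstName TEXT, areaCode INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [("Peter", 212), ("Peter", 415), ("Peter", None), ("Paul", 212)],
)


def with_or(area_code):
    """Filter is disabled when area_code is None: (areaCode = ? OR ? IS NULL)."""
    sql = (
        "SELECT areaCode FROM Persons WHERE firstName = 'Peter' "
        "AND (areaCode = ? OR ? IS NULL) ORDER BY areaCode"
    )
    return [r[0] for r in conn.execute(sql, (area_code, area_code))]


def with_coalesce(area_code):
    """COALESCE variant: compares the column to itself when area_code is None."""
    sql = (
        "SELECT areaCode FROM Persons WHERE firstName = 'Peter' "
        "AND areaCode = COALESCE(?, areaCode) ORDER BY areaCode"
    )
    return [r[0] for r in conn.execute(sql, (area_code,))]


print(with_or(None))        # every Peter, including the one with a NULL areaCode
print(with_coalesce(None))  # drops the NULL row, because NULL = NULL is not true
```

If areaCode can be NULL and those rows should still come back when the filter is off, the OR ... IS NULL form is the safer of the two.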
I think you could use something like
SELECT * FROM Persons WHERE firstName=firstName
of course without quotes
From your question I assume that you actually want the ability to include or exclude the where clause, in which case you need to use or.
SELECT *
FROM Persons
WHERE ( 1 = 2
OR ( firstName = 'Peter'
AND < more conditions if needed >
)
)
In this example 1 <> 2 so the only condition evaluated is firstName = 'Peter'. If you then want to ignore the where clause you change 2 to 1. As 1 = 1 this is evaluated for every row and the rest of the conditions will be ignored.

IsNumeric in SQL Server JOIN

My problem seems to be very simple but I'm stuck here. I have a table which has an "nvarchar" column called "SrcID" and I store both numbers and strings in that. Now, when I try to check for "IsNumeric" on that column in a "Join" condition, something like below,
ISNUMERIC(SrcID) = 1 AND SrcID > 15
I am getting the following error:
Msg 245, Level 16, State 1, Line 47
Conversion failed when converting the nvarchar value 'Test' to data type int.
Amazingly, when I remove the check "SrcID > 15", my query is running properly. Should I include anything else in this statement?
Please help me in fixing the issue. Thanks in advance!!
You can't count on the order in which a database will evaluate filtering expressions. There is a query optimizer that will evaluate your SQL and build a plan to execute the query based on what it perceives will yield the best performance.
In this context, IsNumeric() cannot be used with an index, and it means running a function against every row in the table. Therefore, it will almost never provide the best perceived performance. Compare this with the SrcID > 15 expression, which can be matched with an index (if one exists), and is just a single operator expression even if one doesn't. It can also be used to filter down the number of potential rows where the IsNumeric() function needs to run.
You can likely get around this with a view, a subquery, a CTE, a CASE statement, or a computed column. Here's a CTE example:
With NumericOnly As
(
SELECT <columns> FROM MyTable WHERE IsNumeric(SrcID) = 1
)
SELECT <columns> FROM NumericOnly WHERE SrcID > 15
And here's a CASE statement option:
SELECT <columns> FROM MyTable WHERE CASE WHEN IsNumeric(SrcID) = 1 THEN Cast(SrcID As Int) ELSE 0 END > 15
The filters in a WHERE clause are not evaluated in any particular order.
This is a common misconception with SQL Server - the optimizer will check whichever conditions it thinks it can the fastest/easiest, and try to limit the data in the most efficient way possible.
In your example, you probably have an index on SrcID, and the optimizer thinks it will be quicker to FIRST limit the results to where the SrcID > 15, then run the function on all those rows (since the function will need to check every single row otherwise).
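You can try grouping the conditions with parentheses, though parentheses alone do not guarantee the order of evaluation:
WHERE (ISNUMERIC(SrcID) = 1) AND SrcID > 15
A CASE expression is more reliable, because the cast only happens in the branch where the check has passed:
WHERE CASE WHEN ISNUMERIC(SrcID) = 1 THEN CAST(SrcID AS int) ELSE 0 END > 15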

SQL 2008: Prevent Full-text lookup in query when not needed

From working on this specific situation, it was news to me that the logic operators are not short circuited in SQL.
I routinely do something along these lines in the where clause (usually when dealing with search queries):
WHERE
(@Description IS NULL OR @Description = myTable.Description)
Which, even if it's not short-circuited in this example, doesn't really matter. However, when dealing with the full-text search functions, it does matter. If the second part of that query were CONTAINS(myTable.Description, @Description), it wouldn't work, because the variable is not allowed to be null or empty for these functions.
I found out the WHEN statements of CASE are executed in order, so I can change my query like so to ensure the fulltext lookup is only called when needed, along with changing the variable from null to '""' when it is null to allow the query to execute:
WHERE
(CASE WHEN @Description = '""' THEN 1 WHEN CONTAINS(myTable.Description, @Description) THEN 1 ELSE 0 END = 1)
The above code should prevent the full-text query piece from executing unless there is actually a value to search with.
My question is, if I run this query where @Description is '""', there is still quite a bit of time in the execution plan spent dealing with clustered index seeks and fulltextmatch, even though that table and search does not end up being used at all: is there any way to avoid this?
I'm trying to get this out of a hardcoded dynamic query and into a stored procedure, but if the procedure ends up being slower, I'm not sure I can justify it.
It's not ideal, but maybe something like this would work:
IF @Description = ''
BEGIN
    SELECT ...
END
ELSE
BEGIN
    SELECT ...
    WHERE CONTAINS(mytable.description, @Description)
END
That way you avoid dynamic SQL and also avoid running the FT scan when it's not needed.
As a few general notes, I usually find CONTAINSTABLE to be a bit faster. Also, since the query plan is going to be very different whether you're using my solution or yours, watch out for parameter sniffing. Parameter sniffing is when the optimizer builds a plan based on a passed in specific parameter value.
In case anyone else runs into a scenario like this, this is what I ended up doing, which is pretty close to what M_M was getting at; I broke away the full-text pieces and placed them behind branches:
DECLARE @TableBfullSearch TABLE (TableAId int)
IF (@TableBSearchInfo IS NOT NULL)
    INSERT INTO @TableBfullSearch
    SELECT
        TableAId
    FROM
        TableB
    WHERE
        ...(fulltext search)...
DECLARE @TableCfullSearch TABLE (TableAId int)
IF (@TableCSearchInfo IS NOT NULL)
    INSERT INTO @TableCfullSearch
    SELECT
        TableAId
    FROM
        TableC
    WHERE
        ...(fulltext search)...
--main query with this addition in the where clause
SELECT
    ...
FROM
    TableA
WHERE
    ...
    AND (@TableBSearchInfo IS NULL OR TableAId IN (SELECT TableAId FROM @TableBfullSearch))
    AND (@TableCSearchInfo IS NULL OR TableAId IN (SELECT TableAId FROM @TableCfullSearch))
I think that's probably about as good as it'll get without some sort of dynamic query