SQL 2008: Prevent Full-text lookup in query when not needed - sql-server-2008

From working on this specific situation, it was news to me that the logic operators are not short circuited in SQL.
I routinely do something along these lines in the where clause (usually when dealing with search queries):
WHERE
(@Description IS NULL OR @Description = myTable.Description)
Even if that isn't short-circuited, it doesn't really matter in this example. However, when dealing with the full-text search functions, it does matter. If the second part of that query were CONTAINS(myTable.Description, @Description), it wouldn't work, because the variable is not allowed to be null or empty for these functions.
I found out that the WHEN clauses of a CASE expression are evaluated in order, so I can change my query like so to ensure the full-text lookup is only called when needed, along with changing the variable from null to '""' when it is null to allow the query to execute:
WHERE
(CASE WHEN @Description = '""' THEN 1 WHEN CONTAINS(myTable.Description, @Description) THEN 1 ELSE 0 END = 1)
The above code should prevent the full-text query piece from executing unless there is actually a value to search with.
My question is: if I run this query where @Description is '""', there is still quite a bit of time in the execution plan spent dealing with clustered index seeks and fulltextmatch, even though that table and search do not end up being used at all. Is there any way to avoid this?
I'm trying to get this out of a hardcoded dynamic query and into a stored procedure, but if the procedure ends up being slower, I'm not sure I can justify it.

It's not ideal, but maybe something like this would work:
IF @Description = ''
BEGIN
SELECT ...
END
ELSE
BEGIN
SELECT ...
WHERE CONTAINS(mytable.description, @Description)
END
That way you avoid dynamic SQL and also avoid running the FT scan when it's not needed.
As a few general notes, I usually find CONTAINSTABLE to be a bit faster. Also, since the query plan is going to be very different whether you're using my solution or yours, watch out for parameter sniffing. Parameter sniffing is when the optimizer builds a plan based on a passed in specific parameter value.

In case anyone else runs into a scenario like this, this is what I ended up doing, which is pretty close to what M_M was getting at; I broke away the full-text pieces and placed them behind branches:
DECLARE @TableBfullSearch TABLE (TableAId int)
IF(@TableBSearchInfo IS NOT NULL)
INSERT INTO @TableBfullSearch
SELECT
TableAId
FROM
TableB
WHERE
...(fulltext search)...
DECLARE @TableCfullSearch TABLE (TableAId int)
IF(@TableCSearchInfo IS NOT NULL)
INSERT INTO @TableCfullSearch
SELECT
TableAId
FROM
TableC
WHERE
...(fulltext search)...
--main query with this addition in the where clause
SELECT
...
FROM
TableA
WHERE
...
AND (@TableBSearchInfo IS NULL OR TableAId IN (SELECT TableAId FROM @TableBfullSearch))
AND (@TableCSearchInfo IS NULL OR TableAId IN (SELECT TableAId FROM @TableCfullSearch))
I think that's probably about as good as it'll get without some sort of dynamic query
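For anyone who wants to experiment with the shape of this pattern outside SQL Server, here is a rough sketch in Python using the standard sqlite3 module. It is not the T-SQL above: SQLite stands in for SQL Server, a plain LIKE stands in for the full-text predicate, and the Description column, the b_matches temp table and both function names are made up for the illustration. The point it shows is that the expensive search only runs on the branch where a search term was actually supplied.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (TableAId INTEGER PRIMARY KEY);
        CREATE TABLE TableB (TableAId INTEGER, Description TEXT);
        INSERT INTO TableA VALUES (1), (2), (3);
        INSERT INTO TableB VALUES (1, 'red apple'), (2, 'green pear'), (3, 'red plum');
    """)
    return conn


def search_table_a(conn, b_term=None):
    """Return TableA ids, filtered by a TableB match only when b_term is given."""
    cur = conn.cursor()
    # Plays the role of the table variable in the T-SQL version.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS b_matches (TableAId INTEGER)")
    cur.execute("DELETE FROM b_matches")
    if b_term is not None:
        # The (stand-in) text search runs only on this branch.
        cur.execute(
            "INSERT INTO b_matches "
            "SELECT TableAId FROM TableB WHERE Description LIKE ?",
            (f"%{b_term}%",),
        )
    # Main query: the filter is skipped entirely when no term was passed.
    cur.execute(
        "SELECT TableAId FROM TableA "
        "WHERE (? IS NULL OR TableAId IN (SELECT TableAId FROM b_matches)) "
        "ORDER BY TableAId",
        (b_term,),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    demo = make_demo_db()
    print(search_table_a(demo, "red"))
    print(search_table_a(demo, None))
```

Whether this beats a dynamic query on a real full-text index is something to measure on your own data; the sketch only illustrates the control flow.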

SQL Query - Select a value, then use it again in following statements

I've tried looking it up, and while I think this should be possible I can't seem to find the answer I need anywhere.
I need to lookup a date from one table, then store it for use in a following query.
Below are the statements I have in mind, including my attempt at setting the variable (which I know won't work as written, but I'm unsure of the best way to do or show it otherwise, bar maybe querying it twice inside the IF).
In the latter statement I then want to use either the date given in the second query or, if the date from the first query (the one I'm thinking of storing as a variable) is newer, that date instead.
startDateVariable = (SELECT `userID`, `startDate`
FROM `empDetails`
WHERE `userID` = 1);
SELECT `userID`, SUM(`weeksGROSS`) AS yearGROSS
FROM `PAYSLIP`
WHERE `date` <= "2021-11-15"
AND `date` >= IF( "2020-11-15" > startDateVariable , "2020-11-15" , startDateVariable )
AND `userID` IN ( 1 )
GROUP BY `userID`
Naturally all dates given in the query ("2021-11-15" etc) would be inserted dynamically in the prepared statement.
Now while I've set the userID IN to just query 1, it'd be ideal if I can lookup multiple users this way at once, though I can accept that I may need to make an individual query per user doing it this way.
Much appreciated!
So it turns out I was going about this the wrong way; it looks like the best way to do this, or something similar, is to use an SQL JOIN.
This allows you to query the tables as if they are one.
I also realised that rather than using an IF, I could simply make sure I was looking up dates newer than or equal to both the date given and the start date.
The query below works as required and allows lookup of multiple users at once, as wanted.
SELECT PAYSLIP.userID, employeeDetails.startDate, SUM(PAYSLIP.weeksGROSS) AS yearGROSS
FROM PAYSLIP
INNER JOIN employeeDetails ON employeeDetails.userID=PAYSLIP.userID
WHERE PAYSLIP.date <= "2021-11-15"
AND PAYSLIP.date >= "2020-11-15"
AND PAYSLIP.date >= employeeDetails.startDate
AND PAYSLIP.userID IN ( 1,2,8 )
GROUP BY PAYSLIP.userID
See here for more usage examples: https://www.w3schools.com/sql/sql_join.asp
However, along the lines of my particular question, it is also possible to store variables, e.g.
SET @myvar = 'Example showing how to declare variable';
Then use it in the SQL statement by writing
@myvar where you want the variable to go.
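To see why the two ">=" conditions do the same job as the IF (taking the later of the two dates), here is a small sketch in Python with the standard sqlite3 module standing in for MySQL. The sample rows and the year_gross function name are invented for the illustration; the table and column names follow the JOIN query above, and dates are stored as ISO text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employeeDetails (userID INTEGER PRIMARY KEY, startDate TEXT);
    CREATE TABLE PAYSLIP (userID INTEGER, date TEXT, weeksGROSS REAL);
    INSERT INTO employeeDetails VALUES (1, '2021-01-01'), (2, '2019-06-01');
    INSERT INTO PAYSLIP VALUES
        (1, '2020-12-01', 100),  -- before user 1's start date: excluded
        (1, '2021-02-01', 200),
        (1, '2021-03-01', 300),
        (2, '2020-10-01', 50),   -- before the window start: excluded
        (2, '2020-12-01', 150),
        (2, '2021-12-01', 999);  -- after the window end: excluded
""")


def year_gross(conn, user_ids, window_start, window_end):
    """Sum weeksGROSS per user from the later of window_start and startDate."""
    placeholders = ", ".join("?" for _ in user_ids)
    sql = f"""
        SELECT p.userID, SUM(p.weeksGROSS) AS yearGROSS
        FROM PAYSLIP AS p
        INNER JOIN employeeDetails AS e ON e.userID = p.userID
        WHERE p.date <= ?
          AND p.date >= ?
          AND p.date >= e.startDate
          AND p.userID IN ({placeholders})
        GROUP BY p.userID
        ORDER BY p.userID
    """
    return conn.execute(sql, [window_end, window_start, *user_ids]).fetchall()


if __name__ == "__main__":
    print(year_gross(conn, [1, 2], "2020-11-15", "2021-11-15"))
```

Only the placeholders are formatted into the SQL text; the dates and user ids themselves are passed as bound parameters, which is roughly what a prepared statement would do.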

Where to write FROM in IF?

I've already written the function, I'm trying to use FROM within an IF statement. The code below is written within a function.
IF myParameter IN (SELECT id FROM myTable) AND myTable.myType='Type A' THEN
Of course, it gives an error
Unknown table 'myTable'
when trying to execute the function.
EDIT: myParameter is an INT
Is this what you want?
IF myParameter IN (SELECT id FROM myTable WHERE myTable.myType = 'Type A') THEN
That is, move the condition to the subquery.
Typically, I'd use EXISTS for this kind of check:
IF EXISTS (SELECT * FROM myTable WHERE id = myParameter AND myType='Type A') THEN
EXISTS will stop processing at the first match and, despite requiring a SELECT for syntax reasons, doesn't actually create an intermediate result; IN generally processes the entire query within it, formulates a result set, and then checks the left-hand argument of the IN against potentially everything in that result (it probably stops at the first match too, but it still had to assemble the entire result).
If myParameter is filtering on the table's primary key, it probably will not make much of a difference in this case, but it is something to keep in mind in the more general sense.
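As a quick check that the two forms answer the same question, here is a sketch in Python with the standard sqlite3 module standing in for MySQL. The sample rows and the two function names are made up; it only demonstrates that the results agree, not how either engine executes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (id INTEGER PRIMARY KEY, myType TEXT);
    INSERT INTO myTable VALUES (1, 'Type A'), (2, 'Type B'), (3, 'Type A');
""")


def is_type_a_in(conn, my_parameter):
    """IN form: the condition has been moved inside the subquery."""
    row = conn.execute(
        "SELECT ? IN (SELECT id FROM myTable WHERE myType = 'Type A')",
        (my_parameter,),
    ).fetchone()
    return bool(row[0])


def is_type_a_exists(conn, my_parameter):
    """EXISTS form: only asks whether at least one matching row exists."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM myTable WHERE id = ? AND myType = 'Type A')",
        (my_parameter,),
    ).fetchone()
    return bool(row[0])


if __name__ == "__main__":
    for candidate in (1, 2, 99):
        print(candidate, is_type_a_in(conn, candidate), is_type_a_exists(conn, candidate))
```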

MySQL Index not working ( Use Case specific scenario)

So far following is my scenario :
Parameters controlled by user: (These parameters are controlled by a dashboard but for testing purposes I have created sql parameters in order to change their values)
SET @device_param := "all devices";
SET @date_param_start_bar_chart := '2016-09-01';
SET @date_param_end_bar_chart := '2016-09-19';
SET @country_param := "US";
SET @channel_param := "all channels";
Query that runs at the back-end
SELECT
country_code,
channel_report_tag,
SUM(count_more_then_30_min_play) AS '>30 minutes',
SUM(count_15_30_min_play) AS '15-30 Minutes',
SUM(count_0_15_min_play) AS '0-15 Minutes'
FROM
channel_play_times_cleaned
WHERE IFNULL(country_code, '') =
CASE
WHEN @country_param = "all countries"
THEN IFNULL(country_code, '')
ELSE @country_param
END
AND IFNULL(channel_report_tag, '') =
CASE
WHEN @channel_param = "all channels"
THEN IFNULL(channel_report_tag, '')
ELSE @channel_param
END
AND IFNULL(device_report_tag, '') =
CASE
WHEN @device_param = "all devices"
THEN IFNULL(device_report_tag, '')
ELSE @device_param
END
AND playing_date BETWEEN @date_param_start_bar_chart
AND @date_param_end_bar_chart
GROUP BY channel_report_tag
ORDER BY SUM(count_more_then_30_min_play) DESC
limit 10 ;
The index that I have applied is
CREATE INDEX my_index
ON channel_play_times_cleaned (
country_code,
channel_report_tag,
device_report_tag,
playing_date,
channel_report_tag
)
I have followed this link : My SQL Index Cook-Book Guide to create my index.
However the EXPLAIN keyword while executing the above query tells me that there is no index used.
I want to know what I am doing wrong here.
You use functions and CASE expressions in the first three WHERE conditions. A simple column index cannot be used to speed up such lookups.
MySQL could potentially use an index for the playing_date criterion, but that column is not the leftmost in the cited index, so the cited index is not suitable for that either.
If I were you, I would remove the logic from the WHERE criteria and move it into the application layer, constructing an SQL statement that has the CASE conditions already resolved and emits only the necessary SQL.
Your CASE expressions in the WHERE clause are forcing full table scans. Clearly, they have to go... but how?
You have to think like the optimizer and remember that its job is to avoid as much work as possible.
Consider this query:
SELECT * FROM users
WHERE first_name LIKE '%a%';
Every row must be read to find all first_name values containing the letter 'a'. Very slow.
Now, this one:
SELECT * FROM users
WHERE first_name LIKE '%a%'
AND 2 < 1;
For each row, you're asking the server to check the first_name again and to include only rows where 2 is a smaller number than 1.
Is it slow, or fast?
It's very fast, because the optimizer detects an Impossible WHERE. There is no point in scanning the rows because 2 < 1 is always false.
Now, use this logic to tell the optimizer what you really want:
Not this:
WHERE IFNULL(country_code, '') =
CASE
WHEN @country_param = "all countries"
THEN IFNULL(country_code, '')
ELSE @country_param
END
AND
But this:
WHERE
(
(
@country_param = "all countries"
)
OR
(
@country_param != "all countries"
AND
country_code = @country_param
)
)
AND ...
The difference should be stark. If @country_param = "all countries" the second test is not needed, and otherwise, only the rows with the matching country are needed and this portion of the WHERE clause is false by definition for all other rows, allowing an index on country_code to be used.
One or the other of these OR'ed expressions is always false, and that one will be optimized away, early -- never evaluated for each row. The expression @country_param != "all countries" should be treated no differently than the expression 2 < 1 or 2 > 1. It is not going to change its truthiness based on the data in the rows, so it only needs to be evaluated once, at the beginning.
Repeat for the other CASE. You should almost never pass columns as arguments to functions in the WHERE clause because the optimizer can't "look backwards through" functions and form an intelligent query plan.
The other answers have explained why your query is slow. I will explain what you should do.
Write code to "construct" the query. It would either leave out the test for country_code if the user said "all countries", or add in AND country_code = "US". No @variables, no CASE, etc.
Then, one 5-column index won't work except for a few cases. Instead, get a feel for what users are asking for, then build a few 2-column indexes to cover the popular cases.
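A minimal sketch of that "construct the query" idea, in Python. The function name, the column aliases and the convention that None means "all" are my own assumptions; the table and column names come from the question. The %s placeholders assume a MySQL driver that uses that parameter style (PyMySQL and mysqlclient do); adjust for whatever driver you actually use.

```python
def build_play_time_query(start_date, end_date, country=None, channel=None, device=None):
    """Return (sql, params); a filter left as None means "all" and is omitted."""
    conditions = ["playing_date BETWEEN %s AND %s"]
    params = [start_date, end_date]
    # Column names are fixed here, never taken from user input.
    for column, value in (
        ("country_code", country),
        ("channel_report_tag", channel),
        ("device_report_tag", device),
    ):
        if value is not None:
            # Plain column = value test: no IFNULL, no CASE, so an index can apply.
            conditions.append(f"{column} = %s")
            params.append(value)
    sql = (
        "SELECT channel_report_tag, "
        "SUM(count_more_then_30_min_play) AS over_30_min, "
        "SUM(count_15_30_min_play) AS from_15_to_30_min, "
        "SUM(count_0_15_min_play) AS under_15_min "
        "FROM channel_play_times_cleaned "
        "WHERE " + " AND ".join(conditions) + " "
        "GROUP BY channel_report_tag "
        "ORDER BY over_30_min DESC "
        "LIMIT 10"
    )
    return sql, params


if __name__ == "__main__":
    query, args = build_play_time_query("2016-09-01", "2016-09-19", country="US")
    print(query)
    print(args)
```

With country="US" and the other two filters left out, the generated WHERE clause only mentions playing_date and country_code, which is the case a two-column index on those columns could serve.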

T-SQL stored procedure: Select * From Where Value is not null

I have a table with several columns. I want a stored procedure that will filter a Select * statement on each of those columns if the parameter for that column is not null. Would I have to build the SQL as a string and, if a parameter is not null, append its value to the string? Or is there a simpler built-in mechanism for that sort of thing? I know you have to do the string thing for Oracle, but MS SQL Server has always struck me as more user friendly. I thought I would check first before I dove in.
Thanks
The easy route, assuming col is not nullable, or it is and you don't want NULL rows to match:
WHERE col LIKE COALESCE(@param, col)
-- or the longer version:
WHERE (col LIKE @param OR @param IS NULL)
(Where @param is either NULL or something like '%asdf%'.)
If col is nullable and you do want NULL rows to match, you could try this:
WHERE COALESCE(col, 'x') LIKE COALESCE(@param, col, 'x')
There are other ways to do it, as this could potentially lead to bad plans based on your parameterization settings and what parameters are used the first time it is cached (this can lead to poor plan choice due to "parameter sniffing"), but that is probably largely irrelevant here because your WHERE clause is going to force a table scan anyway.
A common alternative when plan quality becomes an issue is to use dynamic SQL, e.g.
DECLARE @sql NVARCHAR(MAX) = N'SELECT ... FROM ... WHERE 1 = 1';
IF @param IS NOT NULL
BEGIN
SET @sql += ' AND col LIKE ''' + REPLACE(@param, '''', '''''') + '%''';
END
END
It can be helpful in cases like this to make sure the optimize for ad hoc workloads setting is enabled.
For information on parameter sniffing and dynamic SQL, see these posts by Erland Sommarskog:
http://www.sommarskog.se/query-plan-mysteries.html
http://www.sommarskog.se/dynamic_sql.html
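If the NULL handling of the three WHERE variants above is hard to keep straight, a small experiment helps. This is a sketch in Python using the standard sqlite3 module as a stand-in for SQL Server; the table t, its rows and the helper name ids are invented. One thing it shows, at least in SQLite, is that the "longer version" keeps rows where col is NULL when the parameter is NULL, while the short COALESCE version drops them, so the two are only interchangeable when col is not nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT);
    INSERT INTO t VALUES (1, 'asdf one'), (2, 'other'), (3, NULL);
""")

# The three predicates from the answer, with ? in place of the parameter.
SHORT = "col LIKE COALESCE(?, col)"
LONG = "(col LIKE ? OR ? IS NULL)"
NULL_FRIENDLY = "COALESCE(col, 'x') LIKE COALESCE(?, col, 'x')"


def ids(where_clause, param):
    """Run the predicate, binding the same value to every placeholder."""
    sql = f"SELECT id FROM t WHERE {where_clause} ORDER BY id"
    bound = [param] * where_clause.count("?")
    return [row[0] for row in conn.execute(sql, bound)]


if __name__ == "__main__":
    for name, clause in (("short", SHORT), ("long", LONG), ("null-friendly", NULL_FRIENDLY)):
        print(name, ids(clause, "%asdf%"), ids(clause, None))
```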

Count occurrences of a word in a row in MySQL

I'm making a search function for my website, which finds relevant results from a database. I'm looking for a way to count occurrences of a word, but I need to ensure that there are word boundaries on both sides of the word (so I don't end up matching "triple" when I want "rip").
Does anyone have any ideas?
People have misunderstood my question:
How can I count the number of such occurrences within a single row?
This is not the sort of thing that relational databases are very good at, unless you can use fulltext indexing, and you have already stated that you cannot, since you're using InnoDB. I'd suggest selecting your relevant rows and doing the word count in your application code.
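A minimal sketch of what that application-side count could look like, assuming the application is written in Python (the question doesn't say) and that "word boundary" means the regex \b. The function name count_word is made up.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of word in text."""
    # re.escape keeps any regex metacharacters in the search term literal.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


if __name__ == "__main__":
    row_text = "Rip it up; a triple rip, then RIP."
    print(count_word(row_text, "rip"))
```

Here "triple" is not counted, because there is no word boundary on either side of the "rip" inside it.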
You can try this perverted way:
SELECT
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
ORDER BY `count` DESC
This query can be very slow
It looks pretty ugly
REPLACE() is case-sensitive
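The arithmetic behind that query is easier to see outside SQL. Below is a Python sketch of the same length-difference trick (the function name is made up). Note that it also shows a caveat relevant to the question: the trick counts substrings, so the "rip" inside "triple" is counted too. Also, MySQL's LENGTH() counts bytes while Python's len() counts characters, so for multi-byte text the SQL side would want CHAR_LENGTH() to match this.

```python
def count_substring(field, word):
    """Mirror (LENGTH(field) - LENGTH(REPLACE(field, word, ''))) / LENGTH(word)."""
    # Removing every occurrence shortens the string by len(word) per occurrence.
    return (len(field) - len(field.replace(word, ""))) // len(word)


if __name__ == "__main__":
    print(count_substring("rip triple rip", "rip"))
    # Case-sensitive, like REPLACE(); lower-casing both sides works around it.
    print(count_substring("Rip rip", "rip"))
    print(count_substring("Rip rip".lower(), "rip".lower()))
```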
You can overcome the issue of mysql's case-sensitive REPLACE() function by using LOWER().
It's sloppy, but on my end this query runs pretty fast.
To speed things along I retrieve the result set in a select which I have declared as a derived table in my 'outer' query. Since MySQL already has the results at this point, the replace method works pretty quickly.
I created a query similar to the one below to search for multiple terms in multiple tables and multiple columns. I obtain a 'relevance' number equivalent to the sum of the count of all occurrences of all found search terms in all columns searched:
SELECT DISTINCT (
((length(x.ent_title) - length(replace(LOWER(x.ent_title),LOWER('there'),''))) / length('there'))
+ ((length(x.ent_content) - length(replace(LOWER(x.ent_content),LOWER('there'),''))) / length('there'))
+ ((length(x.ent_title) - length(replace(LOWER(x.ent_title),LOWER('another'),''))) / length('another'))
+ ((length(x.ent_content) - length(replace(LOWER(x.ent_content),LOWER('another'),''))) / length('another'))
) as relevance,
x.ent_type,
x.ent_id,
x.this_id as anchor,
page.page_name
FROM (
(SELECT
'Foo' as ent_type,
sp.sp_id as ent_id,
sp.page_id as this_id,
sp.title as ent_title,
sp.content as ent_content,
sp.page_id as page_id
FROM sp
WHERE (sp.title LIKE '%there%' OR sp.content LIKE '%there%' OR sp.title LIKE '%another%' OR sp.content LIKE '%another%' ) AND (sp.title NOT LIKE '%goes%' AND sp.content NOT LIKE '%goes%')
) UNION (
[search a different table here.....]
)
) as x
JOIN page ON page.page_id = x.page_id
WHERE page.rstatus = 'ACTIVE'
ORDER BY relevance DESC, ent_title;
Hope this helps someone
-- Seacrest out
Create a user-defined function like this and use it in your query:
DELIMITER $$
CREATE FUNCTION `getCount`(myStr VARCHAR(1000), myword VARCHAR(100))
RETURNS INT
BEGIN
DECLARE cnt INT DEFAULT 0;
DECLARE result INT DEFAULT 1;
WHILE (result > 0) DO
SET result = INSTR(myStr, myword);
IF(result > 0) THEN
SET cnt = cnt + 1;
SET myStr = SUBSTRING(myStr, result + LENGTH(myword));
END IF;
END WHILE;
RETURN cnt;
END$$
DELIMITER ;
Hope it helps
Refer This
Something like this should work:
select count(*) from table where fieldname REGEXP '[[:<:]]word[[:>:]]';
The gory details are in the MySQL manual, section 11.4.2.
Something like LIKE or REGEXP will not scale (unless it's a leftmost prefix match).
Consider instead using a fulltext index for what you want to do.
select count(*) from yourtable where match(title, body) against ('some_word');
I have used the technique as described in the link below. The method uses length and replace functions of MySQL.
Keyword Relevance
If you want a search, I would advise something like Sphinx or Lucene. I find Sphinx (as an independent full-text indexer) to be a lot easier to set up and run. It runs fast and generates the indexes very fast. Even if you were using MyISAM I would suggest using it; it has a lot more power than a full-text index from MyISAM.
It can also integrate (somewhat) with MySQL.
It depends on what DBMS you are using, some allow writing UDFs that could do this.