Using concat in where conditions, good or bad? - mysql

A simple quiz (many of you probably know this already):
v_book_id and v_genre_id are two variables in my procedure, and in my app there is a query that uses concat in the WHERE condition like this:
SELECT link_id
FROM link
WHERE concat(book_id,genre_id) = concat(v_book_id,v_genre_id);
Now, I know there is a catch/bug in this, which will occur only twice in your lifetime. Can you tell me what it is?
I found this out yesterday and thought I should make some noise about it for everyone else who does this.
Thanks.

Let's have a look
WHERE concat(book_id,genre_id) = concat(v_book_id,v_genre_id);
as opposed to
WHERE book_id = v_book_id AND genre_id = v_genre_id;
There. The second solution is
faster (optimal index usage)
easier to write (less code)
easier to read (what on earth was the author thinking to concatenate numbers???)
more correct (as Alnitak also stated in the question's comments). check out this sample data:
book_id | genre_id
1 | 12
11 | 2
Now set v_book_id = 1 and v_genre_id = 12 and run your concat() query: both rows concatenate to '112', so the query also matches the row (11, 2), which is a funny result indeed.
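The collision described above can be reproduced with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and SQLite's || operator in place of concat(), so the table definition and link_id values here are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link (link_id INTEGER PRIMARY KEY, book_id INTEGER, genre_id INTEGER)"
)
# The two sample rows from the answer; the link_id values are made up.
conn.executemany(
    "INSERT INTO link (link_id, book_id, genre_id) VALUES (?, ?, ?)",
    [(1, 1, 12), (2, 11, 2)],
)

v_book_id, v_genre_id = 1, 12

# Concatenation: both rows turn into the string '112', so both match.
concat_matches = [
    row[0]
    for row in conn.execute(
        "SELECT link_id FROM link "
        "WHERE (book_id || genre_id) = (? || ?) ORDER BY link_id",
        (v_book_id, v_genre_id),
    )
]

# Column-by-column comparison: only the intended row matches.
and_matches = [
    row[0]
    for row in conn.execute(
        "SELECT link_id FROM link "
        "WHERE book_id = ? AND genre_id = ? ORDER BY link_id",
        (v_book_id, v_genre_id),
    )
]

print("concat:", concat_matches)
print("AND:   ", and_matches)
```

The concatenated predicate returns both link rows, while the AND predicate returns only the row that was actually asked for.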
Note, some databases (including MySQL) allow operations on tuples, which may be what the clever author of the above really intended to do:
WHERE (book_id, genre_id) = (v_book_id, v_genre_id);
A working example of such a tuple predicate:
SELECT * FROM (
SELECT 1 x, 2 y FROM DUAL UNION ALL
SELECT 1 x, 3 y FROM DUAL UNION ALL
SELECT 1 x, 2 y FROM DUAL
) a
WHERE (x, y) = (1, 2)
Note, some databases will need extra parentheses around the right-hand side tuple: ((1, 2))

Related

Recursively running a MySQL function

I have a function in MySQL that needs to be run about 50 times (not a set value) in a query. the inputs are currently stored in an array such as
[1,2,3,4,5,6,7,8,9,10]
when executing the MySQL query individually it's working fine, please see below
column_name denotes the column it's getting the data for, in this case, it's a DOUBLE in the database
The second value in the MOD() function is the input I'm supplying MySQL from the aforementioned array
SELECT id, MOD(column_name, 4) AS mod_output
FROM table
HAVING mod_output > 10
To achieve the output I require* the following code works
SELECT id, MOD(column_name, 4) AS mod_output1, MOD(column_name, 5) AS mod_output2, MOD(column_name, 6) AS mod_output3
FROM table
HAVING mod_output1 > 10 AND mod_output2 > 10 AND mod_output3 > 10
However, this is obviously extremely dirty, and with over 50 inputs instead of 3 it will become highly inefficient.
Apart from calling over 50 individual queries, is there a better way to achieve the same sort of output (see below)?
In essence, I need to supply MySQL with a list of values and have it run MOD() over all of them on a specified column.
The only data I need returned is the ids of the rows that match the MOD() function's output with the specified input (see value 2 of the MOD() function) where the output is less than 10
Please note, MOD() has been used as an example function; however, the final function required *should* be a drop-in replacement
example table layout
id | column_name
1 | 0.234977
2 | 0.957739
3 | 2.499387
4 | 48.395777
5 | 9.943782
6 | -39.234894
7 | 23.49859
.....
(The title may be worded wrong, I'm not quite sure how else you'd explain what I'm trying to do here)
Use a join and derived table or temporary table:
SELECT n.n, t.id, MOD(t.column_name, n.n) AS mod_output
FROM table t CROSS JOIN
(SELECT 4 as n UNION ALL SELECT 5 UNION ALL SELECT 6 . . .
) n
WHERE MOD(t.column_name, n.n) > 10;
If you want the results as columns, you can use conditional aggregation afterwards.
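A runnable sketch of the cross join idea follows, using Python's sqlite3 module in place of MySQL. Several things here are assumptions made for illustration: the table is called t (since table is a reserved word), column_name holds integers so that SQLite's % operator can stand in for MOD(), the input list and the threshold of 1 are made up, and the second query reads the question as wanting only the ids that pass the test for every input.

```python
import sqlite3

divisors = [4, 5, 6]  # hypothetical input list; the real one has ~50 values

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, column_name INTEGER)")
conn.executemany(
    "INSERT INTO t (id, column_name) VALUES (?, ?)",
    [(1, 22), (2, 23), (3, 25), (4, 30), (5, 58)],
)

# Load the inputs into a temporary table instead of writing a long UNION ALL.
conn.execute("CREATE TEMP TABLE n (n INTEGER)")
conn.executemany("INSERT INTO n (n) VALUES (?)", [(d,) for d in divisors])

# One row per (input, id) pair that passes the test, as in the answer.
pairs = conn.execute(
    """
    SELECT n.n, t.id, t.column_name % n.n AS mod_output
    FROM t CROSS JOIN n
    WHERE t.column_name % n.n > 1
    ORDER BY t.id, n.n
    """
).fetchall()

# Ids that pass the test for every input (the AND chain from the question).
ids_matching_all = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.id
        FROM t CROSS JOIN n
        WHERE t.column_name % n.n > 1
        GROUP BY t.id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM n)
        ORDER BY t.id
        """
    )
]

print(pairs)
print(ids_matching_all)
```

Growing the input list only means inserting more rows into n; the query text stays the same.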

How to Find First Valid Row in SQL Based on Difference of Column Values

I am trying to find a reliable query which returns the first instance of an acceptable insert range.
Research:
Some of the links below address similar questions, but I could not get any of them to work for me.
Find first available date, given a date range in SQL
Find closest date in SQL Server
MySQL difference between two rows of a SELECT Statement
How to find a gap in range in SQL
and more...
Objective Query Function:
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
Where InsertRange(1) is the value the query should return. In other words, this would be the first instance where the above condition is satisfied.
Table Structure:
Primary Key: StartRange
StartRange(i-1) < StartRange(i)
StartRange(i-1) + EndRange(i-1) < StartRange(i)
Example Dataset
Below is an example User table (3 columns), with a set range distribution. StartRanges are always ordered in a strictly ascending way, UserID are arbitrary strings, only the sequences of StartRange and EndRange matters:
StartRange EndRange UserID
312 6896 user0
7134 16268 user1
16877 22451 user2
23137 25142 user3
25955 28272 user4
28313 35172 user5
35593 38007 user6
38319 38495 user7
38565 45200 user8
46136 48007 user9
My current Query
I am trying to use this query at the moment:
SELECT t2.StartRange, t2.EndRange
FROM user AS t1, user AS t2
WHERE (t1.StartRange - t2.StartRange+1) > NewValue
ORDER BY t1.EndRange
LIMIT 1
Example Case
Given the table, if NewValue = 800, then the returned answer should be 23137. This means, the first available slot would be between user3 and user4 (with an actual slot size = 813):
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
InsertRange = (StartRange(6) - EndRange(5)) > NewValue
23137 = 25955 - 25142 > 800
More Comments
My query above seemed to work for the special case where StartRanges were tightly packed (i.e. StartRange(i) = StartRange(i-1) + EndRange(i-1) + 1). It no longer works with a less tightly packed set of StartRanges.
Keep in mind that SQL tables have no implicit row order. It seems fair to order your table by StartRange value, though.
We can start to solve this by writing a query to obtain each row paired with the row preceding it. In MySQL before version 8.0, it's hard to do this beautifully because it lacks a row numbering function (window functions such as ROW_NUMBER() and LAG() arrived in 8.0).
This works (http://sqlfiddle.com/#!9/4437c0/7/0). It may have nasty performance because it generates O(n^2) intermediate rows. There's no row for user0; it can't be paired with any preceding row because there is none.
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
Then, you can use that as a subquery, and apply your conditions, which are
gap >= 800
first matching row (lowest StartRange value) ORDER BY SB
just one LIMIT 1
Here's the query (http://sqlfiddle.com/#!9/4437c0/11/0)
SELECT SB-EA Gap,
EA+1 Beginning_of_gap, SB-1 Ending_of_gap,
UserId UserID_after_gap
FROM (
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
) pairs
WHERE SB-EA >= 800
ORDER BY SB
LIMIT 1
Notice that you may actually want the smallest matching gap instead of the first matching gap. That's called best fit, rather than first fit. To get that you use ORDER BY SB-EA instead.
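To check the first-fit query against the example dataset without a MySQL server, here is a sketch using Python's sqlite3 module. The table name user_ranges is an assumption made for this example (the original uses user), and SQLite is only standing in for MySQL; the query itself follows the self-join shown above.

```python
import sqlite3

# Example dataset from the question.
ROWS = [
    (312, 6896, "user0"), (7134, 16268, "user1"), (16877, 22451, "user2"),
    (23137, 25142, "user3"), (25955, 28272, "user4"), (28313, 35172, "user5"),
    (35593, 38007, "user6"), (38319, 38495, "user7"), (38565, 45200, "user8"),
    (46136, 48007, "user9"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_ranges "
    "(StartRange INTEGER PRIMARY KEY, EndRange INTEGER, UserID TEXT)"
)
conn.executemany("INSERT INTO user_ranges VALUES (?, ?, ?)", ROWS)

# Pair each row with the largest EndRange at or before its StartRange,
# then keep the first pair whose gap is big enough (first fit).
FIRST_FIT = """
SELECT SB - EA AS Gap,
       EA + 1  AS Beginning_of_gap,
       SB - 1  AS Ending_of_gap,
       UserID  AS UserID_after_gap
FROM (
    SELECT MAX(a.EndRange) AS EA, b.StartRange AS SB, b.UserID AS UserID
    FROM user_ranges a
    JOIN user_ranges b ON a.EndRange <= b.StartRange
    GROUP BY b.StartRange, b.EndRange, b.UserID
) pairs
WHERE SB - EA >= ?
ORDER BY SB
LIMIT 1
"""

new_value = 800
first_fit = conn.execute(FIRST_FIT, (new_value,)).fetchone()
print(first_fit)  # (813, 25143, 25954, 'user4')
```

With NewValue = 800 this finds the 813-wide slot between user3 and user4, matching the example case in the question.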
Edit: There is another way to use MySQL to join adjacent rows that doesn't have the O(n^2) performance issue. It involves employing user variables to simulate a row_number() function. The query involved is a hairball (that's a technical term). It's described in the third alternative of the answer to this question: How do I pair rows together in MYSQL?

Sql Select into array - column has separator

I have a column in my DB that has the following data (yeah i know its wrong to have multiple names separated by some random character)
"John Cusack | Thandie Newton | Chiwetel Ejiofor"
I want to be able to separate these people into an array to use later; even just being able to display them like below would help
John Cusack
Thandie Newton
Chiwetel Ejiofor
any ideas please
thanks in advance
As you say, storing delimited lists in an RDBMS really is not a good idea; however, you may be able to use MySQL's string manipulation functions such as SUBSTRING_INDEX() to obtain your desired results (MySQL doesn't have array types, so I assume you're merely looking to split the data):
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(my_column, '|', 1), -1),
SUBSTRING_INDEX(SUBSTRING_INDEX(my_column, '|', 2), -1),
SUBSTRING_INDEX(SUBSTRING_INDEX(my_column, '|', 3), -1)
FROM my_table
Note that one doesn't actually need to invoke SUBSTRING_INDEX() twice for the first and last elements of the list, but I thought it informative to do so in order that the pattern for further elements can be seen more clearly.
If you were so inclined, you could build a stored procedure that loops over the string populating a temporary table with each found element—but this is all so far away from "good practice" that it's almost certainly not worth delving into it any further.
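To see why the nested call picks out the n-th element, here is a small Python emulation of SUBSTRING_INDEX(). It is a sketch of the documented behaviour for a non-empty delimiter, not MySQL's implementation, and the helper name nth_element is made up for this example.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Emulate MySQL's SUBSTRING_INDEX() for a non-empty delimiter."""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter from the left.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the |count|-th delimiter from the right.
        return delim.join(parts[count:])
    return ""


def nth_element(s: str, delim: str, n: int) -> str:
    """Hypothetical helper: inner call keeps the first n elements,
    outer call keeps the last one of those."""
    return substring_index(substring_index(s, delim, n), delim, -1)


value = "John Cusack | Thandie Newton | Chiwetel Ejiofor"
# The sample data has spaces around the delimiter, so strip each piece.
names = [nth_element(value, "|", i).strip() for i in range(1, 4)]
print(names)
```

The strip() call corresponds to wrapping each expression in TRIM() on the SQL side, which the sample data would need because of the spaces around each delimiter.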
You can try this:
select substring_index(substring_index('a|b|c|h', '|', @r:=@r+1), '|', -1) zxz
from (select @r:=0) x,
(select 'x' xx union select 'v' xx union select 'z' xx union select 'p' xx) z;
Result looks like
+-----+
| zxz |
+-----+
| a   |
| b   |
| c   |
| h   |
+-----+
Found here: Mysql (slightly modified).
Remember: the number of rows produced by the UNION statements has to match the number of delimited elements in your string.
Kind regards

Using IN clause in sql server

My query is like below. I want to select values if Type = 1 and SubType is 1, 3 or 2.
select sum(case when Type = 1 and SubType in (1, 3 or 2) then 1 else 0 end) as 'WorkStations'
Is this right way?
Since you're only trying to get a count of the workstations that meet the criteria as far as I can see:
SELECT COUNT(*) AS Workstations FROM MyWorkStationTable WHERE Type = 1 AND SubType IN (1, 2, 3)
Also, an IN clause is by nature already an OR. Writing OR inside the list is neither valid syntax nor necessary.
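As a quick check that the corrected SUM(CASE ...) form, the COUNT(*) form, and a spelled-out OR chain all agree, here is a sketch using Python's sqlite3 module as a stand-in for SQL Server. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyWorkStationTable (Type INTEGER, SubType INTEGER)")
# Made-up sample rows: three of them have Type = 1 and SubType in (1, 2, 3).
conn.executemany(
    "INSERT INTO MyWorkStationTable (Type, SubType) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 3)],
)

# The question's approach with the IN list fixed (commas only, no OR).
sum_case = conn.execute(
    "SELECT SUM(CASE WHEN Type = 1 AND SubType IN (1, 2, 3) THEN 1 ELSE 0 END) "
    "FROM MyWorkStationTable"
).fetchone()[0]

# The answer's approach: filter in WHERE and count the rows.
count_in = conn.execute(
    "SELECT COUNT(*) FROM MyWorkStationTable "
    "WHERE Type = 1 AND SubType IN (1, 2, 3)"
).fetchone()[0]

# The same filter written as an explicit OR chain.
count_or = conn.execute(
    "SELECT COUNT(*) FROM MyWorkStationTable "
    "WHERE Type = 1 AND (SubType = 1 OR SubType = 2 OR SubType = 3)"
).fetchone()[0]

print(sum_case, count_in, count_or)
```

One difference worth knowing: on a table with no rows at all, SUM() returns NULL while COUNT(*) returns 0.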
If you're simply counting records, your best bet is to use the COUNT function provided by SQL Server. Consider using the following:
SELECT COUNT(*) FROM [Table] WHERE TYPE = 1
AND (SUBTYPE = 1
OR SUBTYPE = 2
OR SUBTYPE = 3)
It is best to avoid using 'IN' as it can lead to unnecessary calls to the SQL engine.
SELECT COUNT(*) [Workstations] FROM [YourTable] t WHERE t.Type = 1 AND t.SubType IN (1, 2, 3)
Try to avoid IN predicates and use joins instead, because IN iterates unnecessarily even when there are only one or two matches. I will explain it with an example.
Suppose I have two list objects.
List 1 List 2
1 12
2 7
3 8
4 98
5 9
6 10
7 6
Using IN, it will search for each List 1 item in List 2, which means iteration will happen 7 x 7 = 49 times!

Need Help streamlining a SQL query to avoid redundant math operations in the WHERE and SELECT

Hey everyone, I am working on a query and am unsure how to make it process as quickly as possible and with as little redundancy as possible. I am really hoping someone there can help me come up with a good way of doing this.
Thanks in advance for the help!
Okay, so here is what I have as best I can explain it. I have simplified the tables and math to just get across what I am trying to understand.
Basically I have a smallish table that never changes and will always only have 50k records like this:
Values_Table
ID Value1 Value2
1 2 7
2 2 7.2
3 3 7.5
4 33 10
….50000 44 17.2
And a couple tables that constantly change and are rather large, eg a potential of up to 5 million records:
Flags_Table
Index Flag1 Type
1 0 0
2 0 1
3 1 0
4 1 1
….5,000,000 1 1
Users_Table
Index Name ASSOCIATED_ID
1 John 1
2 John 1
3 Paul 3
4 Paul 3
….5,000,000 Richard 2
I need to tie all 3 tables together. The most results that are likely to ever be returned from the small table is somewhere in the neighborhood of 100 results. The large tables are joined on the index and these are then joined to the Values_Table ON Values_Table.ID = Users_Table.ASSOCIATED_ID …. That part is easy enough.
Where it gets tricky for me is that I need to return, as quickly as possible, a list limited to 10 results where Value1 and Value2 are mathematically operated on to return a new_value, where that new_value is less than 10, the result is sorted by that new_value, and any other WHERE conditions I need can be applied to the flags. I do need to be able to move along the limit, e.g. LIMIT 0,10 / 10,10 / 20,10 etc. (the offset is zero-based).
In a subsequent (or the same if possible) query I need to get the top 10 count of all types that matched that criteria before the limit was applied.
So for example I want to join all of these and return anything where Value1 + Value2 < 10 AND I also need the count.
So what I want is:
Index Name Flag1 New_Value
1 John 0 9
2 John 0 9
5000000 Richard 1 9.2
The second response would be:
ID (not index) Count
1 2
2 1
I tried this a few ways and ultimately came up with the following somewhat ugly query:
SELECT INDEX, NAME, Flag1, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value2) < 10
ORDER BY New_Value
LIMIT 0,10
And then for the count:
SELECT ID, COUNT(TYPE) as Count, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value2) < 10
GROUP BY TYPE
ORDER BY New_Value
LIMIT 0,10
Being able to filter on the different flags and such in my WHERE clause is important; that may sound stupid to comment on but I mention that because from what I could see a quicker method would have been to use the HAVING statement but I don't believe that will work in certain instance depending on what I want to use my WHERE clause to filter against.
And when filtering using the flags table :
SELECT INDEX, NAME, Flag1, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value2) < 10 AND Flag1 = 0
ORDER BY New_Value
LIMIT 0,10
...filtered count:
SELECT ID, COUNT(TYPE) as Count, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value2) < 10 AND Flag1 = 0
GROUP BY TYPE
ORDER BY New_Value
LIMIT 0,10
That works fine but has to run the math multiple times for each row, and I get the nagging feeling that it is also running the math multiple times on the same row in the Values_table table. My thought was that I should just get only the valid responses from the Values_table first and then join those to the other tables to cut down on the processing; with how SQL optimizes things though I wasn't sure if it might not already be doing that. I know I could use a HAVING clause to only run the math once if I did it that way but I am uncertain how I would then best join things.
My questions are:
Can I avoid running that math twice and still make the query work (or I suppose if there is a good way to make the first one work as well that would be great)?
What is the fastest way to do this, as this is something that will be running very often?
It seems like this should be painfully simple but I am just missing something stupid.
I contemplated pulling into a temp table then joining that table to itself but that seems like I would trade math for iterations against the table and still end up slow.
Thank you all for your help in this and please let me know if I need to clarify anything here!
** To clarify on a question: I can't use a 3rd column with the values pre-calculated because in reality the math is much more complex than addition; I just simplified it for illustration's sake.
Do you have a benchmark query to compare against? Usually it doesn't work to try to outsmart the optimizer. If you have acceptable performance from a starting query, then you can see where extra work is being expended (indicated by disk reads, cache consumption, etc.) and focus on that.
Avoid the temptation to break it into pieces and solve those. That's an antipattern. That includes temp tables especially.
Redundant math is usually ok - what hurts is disk activity. I've never seen a query that needed CPU work reduction on pure calculations.
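Gather your results and put them in a temp table. Compute New_Value in a derived table so the outer WHERE can refer to it (an alias cannot be used in the WHERE of the same SELECT), and leave the LIMIT out here so the counts cover every match:
CREATE TEMPORARY TABLE TempTable AS
SELECT * FROM (
SELECT Users_Table.`Index`, NAME, Type, ID, Flag1, (Value1 + Value2) AS New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.`Index` = Users_Table.`Index`
) AS calc
WHERE New_Value < 10
Return Result for First Query
SELECT `Index`, NAME, Flag1, New_Value
FROM TempTable
ORDER BY New_Value
LIMIT 0,10
Return Results for count of Types per ID
SELECT ID, COUNT(Type)
FROM TempTable
GROUP BY ID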
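Here is one way the compute-once idea could look end to end, sketched with Python's sqlite3 module rather than MySQL. Assumptions made for this example: the Index column is renamed Idx (INDEX is a reserved word), the sample rows are the visible ones from the question, Value1 + Value2 stands in for the real math, and the counts are grouped by ID to match the expected output shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Values_Table (ID INTEGER PRIMARY KEY, Value1 REAL, Value2 REAL);
    CREATE TABLE Users_Table (Idx INTEGER PRIMARY KEY, Name TEXT, ASSOCIATED_ID INTEGER);
    CREATE TABLE Flags_Table (Idx INTEGER PRIMARY KEY, Flag1 INTEGER, Type INTEGER);
    """
)
conn.executemany("INSERT INTO Values_Table VALUES (?, ?, ?)",
                 [(1, 2, 7), (2, 2, 7.2), (3, 3, 7.5), (4, 33, 10)])
conn.executemany("INSERT INTO Users_Table VALUES (?, ?, ?)",
                 [(1, "John", 1), (2, "John", 1), (3, "Paul", 3),
                  (4, "Paul", 3), (5, "Richard", 2)])
conn.executemany("INSERT INTO Flags_Table VALUES (?, ?, ?)",
                 [(1, 0, 0), (2, 0, 1), (3, 1, 0), (4, 1, 1), (5, 1, 1)])

# Compute New_Value once in a derived table, filter on the alias,
# and keep every matching row (no LIMIT yet) for the later count.
conn.execute(
    """
    CREATE TEMP TABLE filtered AS
    SELECT * FROM (
        SELECT u.Idx, u.Name, f.Type, v.ID, f.Flag1,
               v.Value1 + v.Value2 AS New_Value
        FROM Values_Table v
        JOIN Users_Table u ON u.ASSOCIATED_ID = v.ID
        JOIN Flags_Table f ON f.Idx = u.Idx
    ) AS calc
    WHERE New_Value < 10
    """
)

# First result: one page of rows, sorted by the computed value.
page = conn.execute(
    "SELECT Idx, Name, Flag1, New_Value FROM filtered "
    "ORDER BY New_Value, Idx LIMIT 10 OFFSET 0"
).fetchall()

# Second result: top 10 counts per ID over all matches, not just the page.
counts = conn.execute(
    "SELECT ID, COUNT(*) FROM filtered "
    "GROUP BY ID ORDER BY COUNT(*) DESC, ID LIMIT 10"
).fetchall()

print(page)
print(counts)
```

Whether this beats simply repeating the expression is something to measure on real data, since the first answer's point about disk activity dominating CPU cost still applies.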
Is there any chance that you can add a third column to the values_table with the pre-calculated value? Even if the result of your calculation is dependent on other variables, you could run the calculation for the whole table but only when those variables change.