Split the field value in a MySQL JOIN query

I've got a column "code" which may have a string of multiple values e.g. "CODE1&CODE2"... I just need the first one for my JOIN ... kind of like code.split("&")[0]
SELECT myTable.*, otherTable.id AS theID
FROM myTable INNER JOIN otherTable
ON myTable.(+++ code before the & +++) = otherTable.code
The value in myTable may also just be CODE1

SUBSTRING_INDEX will do exactly what you want - return the substring of your column up to the specified character:
SELECT
myTable.*,
otherTable.id AS theID
FROM myTable
INNER JOIN otherTable
ON SUBSTRING_INDEX(myTable.code, '&', 1) = otherTable.code
More info at: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html
And here's a fiddle demoing it: http://sqlfiddle.com/#!2/96a6e/2
Please note that this will be SLOW if you're joining many rows. You're not only eliminating the possibility of using an index, but also performing a slow string operation on every comparison. I wouldn't suggest using this on very large tables. If your data set is huge, you may want to consider rearchitecting your DB.
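If you can alter the table, one way to keep the join indexable is to materialize the prefix in a generated column and index that. This is only a sketch, assuming MySQL 5.7+ and the tables from the question; the column and index names (code_prefix, idx_code_prefix) and the VARCHAR length are assumptions:
-- Hypothetical generated column holding everything before the first '&'
-- (SUBSTRING_INDEX returns the whole value when there is no '&').
ALTER TABLE myTable
  ADD COLUMN code_prefix VARCHAR(64) AS (SUBSTRING_INDEX(code, '&', 1)) STORED,
  ADD INDEX idx_code_prefix (code_prefix);

SELECT myTable.*, otherTable.id AS theID
FROM myTable
INNER JOIN otherTable
  ON myTable.code_prefix = otherTable.code;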

Related

How to use FIND_IN_SET using list of data

I have used FIND_IN_SET multiple times before but this case is a bit different.
Earlier I was searching for a single value in the table, like:
SELECT * FROM tbl_name where find_in_set('1212121212', sku)
But now I have a list of SKUs that I want to search for in the table, e.g.
'3698520147','088586004490','868332000057','081308003405','088394000028','089541300893','0732511000148','009191711092','752830528161'
I have two columns in the table: SKU (like 081308003405) and SKU Variation.
In the SKU column I save a single value, but in the variation column I save the values in comma-separated format, like 081308003405,088394000028,089541300893.
SELECT * FROM tbl_name
WHERE 1
AND upc IN ('3698520147','088586004490','868332000057','081308003405','088394000028',
'089541300893','0732511000148','009191711092','752830528161')
I am using IN to search by UPC value; now I want to search the variation column as well. My concern is how to search the variation column using the SKU list.
For now, I have to check for the UPC variation in a loop, which is taking too much time. Below is the query:
SELECT id FROM products
WHERE 1 AND upcVariation AND FIND_IN_SET('88076164444',upc_variation) > 0
First of all, consider storing the data in a normalized way. Here is a good read: Is storing a delimited list in a database column really that bad?
Now, assuming the following schema and data:
create table products (
id int auto_increment,
upc varchar(50),
upc_variation text,
primary key (id),
index (upc)
);
insert into products (upc, upc_variation) values
('01234', '01234,12345,23456'),
('56789', '45678,34567'),
('056789', '045678,034567');
We want to find products with variations '12345' and '34567'. The expected result is the 1st and the 2nd rows.
Normalized schema - many-to-many relation
Instead of storing the values in a comma separated list, create a new table, which maps product IDs with variations:
create table products_upc_variations (
product_id int,
upc_variation varchar(50),
primary key (product_id, upc_variation),
index (upc_variation, product_id)
);
insert into products_upc_variations (product_id, upc_variation) values
(1, '01234'),
(1, '12345'),
(1, '23456'),
(2, '45678'),
(2, '34567'),
(3, '045678'),
(3, '034567');
The select query would be:
select distinct p.*
from products p
join products_upc_variations v on v.product_id = p.id
where v.upc_variation in ('12345', '34567');
As you can see, with a normalized schema the problem can be solved with quite a basic query, and we can make effective use of indexes.
"Exploiting" a FULLTEXT INDEX
With a FULLTEXT INDEX on (upc_variation) you can use:
select p.*
from products p
where match (upc_variation) against ('12345 34567');
This looks quite "pretty" and is probably efficient. But though it works for this example, I wouldn't feel comfortable with this solution, because I can't say exactly when it doesn't work.
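For completeness, the FULLTEXT index assumed above could be added like this. A sketch only: the index name is an assumption, and with InnoDB, tokens shorter than innodb_ft_min_token_size (3 by default) are not indexed, so very short codes would require lowering that setting and rebuilding the index:
ALTER TABLE products ADD FULLTEXT INDEX ft_upc_variation (upc_variation);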
Using JSON_OVERLAPS()
Since MySQL 8.0.17 you can use JSON_OVERLAPS(). You should either store the values as a JSON array, or convert the list to JSON "on the fly":
select p.*
from products p
where json_overlaps(
'["12345","34567"]',
concat('["', replace(upc_variation, ',', '","'), '"]')
);
No index can be used for this. But neither can for FIND_IN_SET().
Using JSON_TABLE()
Since MySQL 8.0.4 you can use JSON_TABLE() to generate a normalized representation of the data "on the fly". Here again you would either store the data in a JSON array, or convert the list to JSON in the query:
select distinct p.*
from products p
join json_table(
concat('["', replace(p.upc_variation, ',', '","'), '"]'),
'$[*]' columns (upcv text path '$')
) v
where v.upcv in ('12345', '34567');
No index can be used here. And this is probably the slowest solution of all presented in this answer.
RLIKE / REGEXP
You can also use a regular expression:
select p.*
from products p
where p.upc_variation rlike '(^|,)(12345|34567)(,|$)'
See demo of all queries on dbfiddle.uk
You can try the example below:
SELECT * FROM TABLENAME
WHERE 1 AND ( FIND_IN_SET('3698520147', SKU)
OR UPC IN ('3698520147') )
I have a solution you can consider:
1: Create a temporary table (example here: Sql Fiddle):
select
  tablename.id,
  SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.sku_split, ',', numbers.n), ',', -1) sku_variation
from
  numbers inner join tablename
  on CHAR_LENGTH(tablename.sku_split)
     - CHAR_LENGTH(REPLACE(tablename.sku_split, ',', '')) >= numbers.n - 1
order by id, n
2: Use the temporary table to filter with FIND_IN_SET against your data.
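The query in step 1 relies on a helper table named numbers containing consecutive integers from 1 up to at least the maximum number of comma-separated values in one row. A minimal sketch (the name and the range are assumptions):
-- Helper table of integers used to pick the 1st, 2nd, ... element of each list.
CREATE TABLE numbers (n INT PRIMARY KEY);
INSERT INTO numbers (n) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);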
Performance considerations. The main thing that matters for performance is whether some index can be used. The complexity of the expression has only a minuscule impact on overall performance.
Step 1 is to learn what can be optimized, and in what way:
Equal: WHERE x = 1 -- can use index
IN/1: WHERE x IN (1) -- Turned into the Equal case by Optimizer
IN/many: WHERE x IN (22,33,44) -- Usually worse than Equal and better than "range"
Easy OR: WHERE (x = 22 OR x = 33) -- Turned into IN if possible
General OR: WHERE (sku = 22 OR upc = 33) -- not sargable (cf UNION)
Easy LIKE: WHERE x LIKE 'abc' -- turned into Equal
Range LIKE: WHERE x LIKE 'abc%' -- equivalent to "range" test
Wild LIKE: WHERE x LIKE '%abc%' -- not sargable
REGEXP: WHERE x RLIKE 'aaa|bbb|ccc' -- not sargable
FIND_IN_SET: WHERE FIND_IN_SET(x, '22,33,44') -- not sargable, even for single item
JSON: -- not sargable
FULLTEXT: WHERE MATCH(x) AGAINST('aaa bbb ccc') -- fast, but not equivalent
NOT: WHERE NOT ((any of the above)) -- usually poor performance
"Sargable" -- able to use index. Phrased differently "Hiding the column in a function call" prevents using an index.
FULLTEXT: There are many restrictions: "word-oriented", min word size, stopwords, etc. But it is very fast when it applies. Note: When used with outer tests, MATCH comes first (if possible), then further filtering will be done without the benefit of indexes, but on a smaller set of rows.
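As a hedged illustration of that ordering, reusing the products table and the FULLTEXT index from the earlier answer (an assumption here), the second predicate below is evaluated only against the rows MATCH has already narrowed down:
SELECT *
FROM products
WHERE MATCH(upc_variation) AGAINST('12345 34567')
  AND CHAR_LENGTH(upc) > 4;  -- further filtering, done without an index, on the smaller row set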
Even when an expression "can" use an index, it "may not". Whether a WHERE clause makes good use of an index is a much longer discussion than can be put here.
Step 2 Learn how to build composite indexes when you have multiple tests (WHERE ... AND ...):
When constructing a composite (multi-column) index, include columns in this order:
'Equal' -- any number of such columns.
'IN/many' column(s)
One range test (BETWEEN, <, etc)
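A hedged illustration of that ordering, using hypothetical table and column names that are not from the question:
-- Suited to: WHERE status = 'active' AND category_id IN (2,3,5) AND created_at >= '2020-01-01'
ALTER TABLE orders
  ADD INDEX idx_status_category_created (status, category_id, created_at);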
(A couple of side notes.) The Optimizer is smart enough to clean up WHERE 1 AND .... But there are not many such things that the Optimizer will handle. In particular, this is not sargable: AND DATE(x) = '2020-02-20', but the same condition does optimize as a "range" when rewritten as:
AND x >= '2020-02-20'
AND x < '2020-02-20' + INTERVAL 1 DAY
Reading
Building indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Sargable: https://en.wikipedia.org/wiki/Sargable
Tips on Many-to-many: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
This depends on how you use it. In MySQL I found that find_in_set is way faster than using JSON when tested on the following commands, so much faster it wasn't even a competition (to be clear, the speed test did not include the set command line):
Fastest
set @ids = (select group_concat(`ID`) from `table`);
select count(*) from `table` where find_in_set(`ID`, @ids);
10 x slower
set @ids = (select json_arrayagg(`ID`) from `table`);
select count(*) from `table` where `ID` member of( @ids );
34 x slower
set @ids = (select json_arrayagg(`ID`) from `table`);
select count(*) from `table` where JSON_CONTAINS(@ids, convert(`ID`, char));
34 x slower
set @ids = (select json_arrayagg(`ID`) from `table`);
select count(*) from `table` where json_overlaps(@ids, json_array(`ID`));
SELECT * FROM tbl_name t1,(select
group_concat('3698520147',',','088586004490',',','868332000057',',',
'081308003405',',','088394000028',',','089541300893',',','0732511000148',',','009191711092',
',','752830528161') as skuid)t
WHERE FIND_IN_SET(t1.sku,t.skuid)>0

Split the values from ('1,2,3') to ('1','2','3') or (1,2,3) in sql

I have a table field named category_ids (text) which saves another table's ids as "1,2,3".
Now I want to use category_ids in a SQL IN() query. The query will be like tab1.category_id IN (select category_ids from tab2), but I'm facing an issue: select category_ids from tab2 returns '1,2,3', so the IN() query does not work.
Is there any simple way to convert '1,2,3' to ('1','2','3') or (1,2,3) in sql?
You may use FIND_IN_SET here:
SELECT *
FROM tab1
WHERE FIND_IN_SET(tab1.category_id,
(select category_ids from tab2));
This is just a sample query, your actual one may differ. But the point is that if you want to search for '1' inside a CSV string '1,2,3', then there is a way to do it.
As others have already mentioned, you should avoid storing CSV data in your tables. When I see FIND_IN_SET being heavily used there is usually a smell.
Try:
tab1.category_id IN (select REPLACE(category_ids, '''', '') from tab2)
If table2 has more than one row, you can use exists with find_in_set():
select t1.*
from table1 t1
where exists (select 1
from table2 t2
where find_in_set(t1.category_id, t2.category_ids) > 0
);
Note that this is a very poor data structure. You should have a single row for each category id in table2. If you did, then the query would be simpler and have better performance.
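A hedged sketch of that structure (hypothetical names; one row per category id instead of a CSV column), together with the simpler query it allows:
CREATE TABLE table2_normalized (
  id          INT NOT NULL,
  category_id INT NOT NULL,
  PRIMARY KEY (id, category_id)
);

SELECT t1.*
FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2_normalized t2
              WHERE t2.category_id = t1.category_id);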

Find out values which are not available in table

Let's assume the following simple table:
Col1
======
one
two
Let's assume the following simple query:
Select count(*) from TABLE_A where Col1 in ('one','two','three','four')
The above query will produce the following result:
2
Now I want to find out which of the values in the IN condition are not available in table_A.
How can I find the values which are not available in the table?
Like the result below:
three
four
The above query is only an example. In my real query I have 1000 values in the IN condition.
Working Database : DB2
This is one of the workarounds to achieve what you expect.
Instead of hard-coding the values in the IN condition, you can move those values into a table. Then, with a simple LEFT JOIN and a NULL check, you can get the non-matching values.
SELECT MR.Col1
FROM MatchingRecords MR -- here MatchingRecords table contains the IN condition values
LEFT JOIN Table_A TA ON TA.Col1 = MR.Col1
WHERE TA.Col1 IS NULL;
Working DEMO
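For reference, the MatchingRecords table could be created and filled along these lines. A sketch only: the column length is an assumption, and DB2 accepts a multi-row VALUES list in INSERT:
CREATE TABLE MatchingRecords (Col1 VARCHAR(20) NOT NULL);
INSERT INTO MatchingRecords (Col1) VALUES ('one'), ('two'), ('three'), ('four');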
If the values are to be listed in the statement string rather than stored in a table, then perhaps the syntax used for that list of values [apparently composed from some input other than a TABLE] in the IN predicate can be revised. The following revised syntax for a list of values can be used both for the original aggregate query [the first of the two queries below] and for the query being asked about [the second of the two queries below]:
Select count(*)
from TABLE_A
where Col1 in ( values('one'),('two'),('three'),('four') )
; -- report from above query follows:
COUNT ( * )
2
[Bgn-Edit 05-Aug-2016: adding this text and example just below] Apparently at least one DB2 variant balks at unnamed columns for the derived table, so the query just below names the column; I chose COL1 to match the name from the actual TABLE, but that should not be necessary. The (col1) is added to the query carried over from the original pre-edit version; that original version follows this edit/insertion and is missing the (col1) added here:
select *
from ( values('one'),('two'),('three'),('four') ) as x (col1)
except ( select * from table_a )
; -- report from above query follows:
COL1
three
four
The following is the original query given, for which a comment suggests a failure for an unnamed column when run on some unstated DB2 variant; I should have noted that this SQL query functions without error on DB2 for i 7.1.
[End-Edit 05-Aug-2016]
select *
from ( values('one'),('two'),('three'),('four') ) as x
except ( select * from table_a )
; -- report from above query follows:
VALUES
three
four

MySQL select all from list where value not in table

I cannot create a virtual table for this. Basically, what I have is a list of values:
'Succinylcholine','Thiamine','Trandate','Tridol Drip'
I want to know which of those values are not present in table1 and display them. Is this possible? I have tried using left joins and creating a variable with the list which I can compare to the table, but it returns the wrong results.
This is one of the things I have tried:
SET @list = "'Amiodarone','Ammonia Inhalents','Aspirin'";
SELECT @list FROM table1 where @list not in (
SELECT Description
FROM table1
);
With only narrow exceptions, you need to have data in table form to be able to obtain those data in your result set. This is the essential problem that all attempts at a solution to this problem run into, given that you cannot create a temporary table. If indeed you can provide the input in any form or format (per your comment), then you can provide it in the form of a subquery:
(
SELECT 'Amiodarone' AS description
UNION ALL
SELECT 'Ammonia Inhalents'
UNION ALL
SELECT 'Aspirin'
)
(Note that that exercises the biggest of the exceptions I noted: you can select scalars directly, without a base table. If you like, you can express that explicitly -- in MySQL and Oracle, at least -- by selecting FROM DUAL.)
In that case, this should work for you:
SELECT
a.description
FROM
(
SELECT 'Amiodarone' AS description
UNION ALL
SELECT 'Ammonia Inhalents'
UNION ALL
SELECT 'Aspirin'
) a
LEFT JOIN table1
ON a.description = table1.description
WHERE table1.description IS NULL
That won't work. The variable's contents will be treated as a monolithic string - one solid block of letters, not 3 separate comma-separated values. The query will be parsed/executed as:
SELECT ... WHERE "'Amio.....rin'" IN (x,y,z,...)
^--------------^--- string
Plus, since you're just doing a sub-select on the very same table, there's no point in this kind of construct. You could try MySQL's FIND_IN_SET() function:
SELECT @list
FROM table1
WHERE FIND_IN_SET(Description, @list) > 0

Count on DISTINCT of several fields works only on MySQL?

I need a query that works without any changes on these three different database servers: MySQL, MSSQL, PostgreSQL.
In this query I have to calculate a column with the following expression, which works correctly on MySQL:
COUNT(DISTINCT field_char,field_int,field_date) AS costumernum
The fields in the DISTINCT are of different types:
field_char = character
field_int = integer
field_date = datetime
The expression is inside a parent query's SELECT, so if I try to achieve the result with a subquery approach, I stumble into this situation:
SELECT t0.description, t0.depnum,
(select count(*) from (
select distinct f1, f2, f3 from salestable t1
where t1.depnum = t0.depnum
) a) AS numitems
FROM salestable t0
I get an error with this query; how can I reference the value from the parent query?
The expression works correctly on MySQL, but I get an error when I try to execute it on SQL Server or PostgreSQL (the problem is that the COUNT function doesn't accept 3 arguments of different types on MSSQL/PostgreSQL). Is there a way to achieve the same result with an expression that works in each of these database servers (SQL Server, MySQL, PostgreSQL)?
A general way to do this on any platform is as follows:
select count(*) from (
select distinct f1, f2, f3 from table
) a
Edit for new info:
What if you try joining to the distinct list (including the dept) and then doing the count? I created some test data and this seems to work. Make sure the COUNT is on one of the t1 columns - otherwise it will mistakenly return 1 instead of 0 when there are no corresponding entries in t1.
SELECT t0.description, t0.depnum, count(t1.depnum) as 'numitems'
FROM salestable t0
LEFT JOIN (select distinct f1,f2,f3,depnum from salestable) t1
ON t0.depnum = t1.depnum
GROUP BY
t0.description, t0.depnum
How about concatenating?
COUNT(DISTINCT field_char || '.' ||
cast(field_int as varchar) || '.' ||
cast(field_date as varchar)) AS costumernum
Warning: your concatenation operator may vary with RDBMS flavor.
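A possibly more portable variant of the same idea: CONCAT_WS exists in MySQL, PostgreSQL (9.1+), and SQL Server (2017+). This is a sketch, not a guaranteed drop-in, since date-to-text conversion and separator collisions can still differ between engines:
-- Same concatenation idea via CONCAT_WS; values containing '.' could still collide.
COUNT(DISTINCT CONCAT_WS('.', field_char, field_int, field_date)) AS costumernum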
Update
Apparently, the concatenation operator portability is a question by itself:
String concatenation operator in Oracle, Postgres and SQL Server
I tried to help you with the distinct issue.