Complex - returning information that is not found in the database - mysql

I have a strange query to perform from a website. I have sets of arrays that contain pertinent ids from a many tables - 1 table per array. For example (the array name is the name of the table):
Array Set 1:
array "q": 1,2,3
array "u": 1,5
array "k": 7
Array Set 2:
array "t": 2,12
array "o": 8, 25
Array Set 3 (not really a set):
array "e": 5
I have another table, Alignment, which is not represented by the arrays. It performs a one to many relationship, allowing records from tables q,u, and k (array set 1, and recorded as relType/relID in the table) to be linked to records from t and o (array set 2, recorded as keyType/keyID) and e (array set 3, recorded as keyType/keyID). Example below:
Table: Alignment
id keyType keyID relType relID
1 e 5 q 1
2 o 8 q 1
3 o 8 u 1
4 t 2 q 2
5 t 2 k 7
6 t 12 q 1
So, in record 6, a record with an id of 12 from table t is being linked to a record with an id of 1 from table q.
I have to find missing links. The ideal state is that each of the ids from array set 1 have a record in the alignment table linking them to at least 1 record from array set 2. In the example, alignment record 1 does not count towards this goal, because it aligns a set 1 id to a set 3 id (instead of set 2).
Scanning the table, you can quickly see that there are some missing ids from array set 1: "q"-3 and "u"-5.
I've been doing this with script, by looping through each set 1 array and looking for a corresponding record, which generates a whole bunch of sql calls and really kills any page that calls this function.
Is there some way I could accomplish this in a single sql statement?
What would I like the results to look like (ideally):
recordset (consisting magically of data that didn't exist in the table):
relType | relID
q 3
u 5
However, I would be elated with even a binary type answer from the database - were all the proper ids found: true or false? (Though the missing records array is required for other functions, but at least I'd be able to choose between the fast and slow options).
Oh, MySQL 5.1.
User Damp gave me an excellent answer using a temporary table, a join, and an IS NULL statement. But it was before I added in the wrinkle that there was a third array set that needed to be excluded from the results, which also ruins the IS NULL part. I edited his sql statement to look like this:
SELECT *
FROM k2
LEFT JOIN alignment
USING ( relType, relID )
HAVING alignment.keyType IS NULL
OR alignment.keyType = "e"
I've also tried it with a Group By relID (i always thought that was a requirement of the HAVING clause). The problem is that my result set includes "q"-1, which is linked to all three types of records ("o","t", and "e"). I need this result excluded, but I'm not sure how.
Here's the sql I ended up with:
SELECT *
FROM k2
LEFT JOIN (
SELECT *
FROM alignment
WHERE keyType != 'e' and
(
(relType = 'q' AND relID IN ( 1, 2, 3 ))
OR
(relType = 'u' AND relID IN ( 1, 5 ))
OR
(relType = 'k' AND relID IN ( 7 ))
)
)A
USING ( relType, relID )
HAVING keyType Is Null
I have to dump the values for the IN qualifiers with script. The key was not to join to the alignment table directly.

You can try to go this route:
DROP TABLE IF EXISTS k2;
CREATE TEMPORARY TABLE k2 (relType varchar(10),relId int);
INSERT INTO k2 VALUES
('q',1),
('q',2),
('q',3),
('u',1),
('u',5),
('k',7);
SELECT * FROM k2
LEFT JOIN Alignment USING(relType,relId)
HAVING Alignment.keyType IS NULL
This should work well for small tables. Not sure about very large ones though...
EDIT
If you wanted to add a WHERE statement the query would be as follow
SELECT * FROM k2
LEFT JOIN Alignment USING(relType,relId)
WHERE Alignment.keyType != 'e'
HAVING Alignment.keyType IS NULL

Related

How to get the first record of each type in sequence?

Table Data:
ID
Type
1
A
2
A
3
B
4
A
5
A
6
B
7
B
8
A
9
A
10
A
How to get only rows with IDs 1,3,4,6,8, or the first records on type-change by single query?
We were doing this in code using multiple queries and extensive processing especially for large data, is there a way to do this in a single query?
Use LAG() window function to get for every row the previous row's type and compare it to the current type.
Create a flag column that is true if the 2 types are different and use it to filter the table:
WITH cte AS (
SELECT *, type <> LAG(type, 1, '') OVER (ORDER BY id) flag
FROM tablename
)
SELECT * FROM cte WHERE flag;
I assume that the column type does not contain empty values (nulls or
empty strings).
See the demo.

How to properly format overlapping mySQL IN and NOT IN conditions

I have the following mySQL table:
data
1
2
3
4
5
6
7
8
9
I would like to supply my select statement with two seperate lists
Exculde List:
1,4,5,7
Include List:
1,2,3,4,5,6,7
I tried the following statement:
Select * FROM table WHERE data NOT IN ('1,4,5,7') AND data IN ('1,2,3,4,5,6,7)
Expecting the following output:
data
2
3
6
But I received no results. I realize I passed an impossible condition but I don't know how to format my query to return the expected results.
Can anyone tell me what I'm doing wrong here?
IN takes a list of values, not a string that holds a delimited list of values.
Examples:
x IN (1, 2, 3)
x IN ('a', 'b', 'c')
Use IN (1,2,3) and not IN ('1,2,3') as the former compares to individual values 1, 2 and 3 while the latter is against the literal string 1,2,3.
Select * FROM ( (Select * FROM table WHERE data NOT IN ('1,4,5,7') ) AS table WHERE data IN ('1,2,3,4,5,6,7)
you try againt

MySQL Query with LEFT JOIN where second table has a 2-Part Primary Key

I have 2 tables in a MySQL database (storeskus). The first is FBM_Orders and the second is IM_INV.
I am trying the query
SELECT `FBM_Orders`.`order-id`,`FBM_Orders`.`order-item-id`,`FBM_Orders`.`purchase-date`,
`FBM_Orders`.`promise-date`,`FBM_Orders`.`buyer-name`,`FBM_Orders`.`sku`,
`FBM_Orders`.`product-name`,`FBM_Orders`.`quantity-purchased`,
`FBM_Orders`.`recipient-name`,`IM_INV`.`LOC_ID`,`IM_INV`.`QTY_ON_HND`
FROM `FBM_Orders`
LEFT JOIN `IM_INV` ON `FBM_Orders`.`sku` = `IM_INV`.`ITEM_NO`
WHERE `FBM_Orders`.`quantity-to-ship` > 0
ORDER BY `FBM_Orders`.`purchase-date`, `IM_INV`.`LOC_ID` ASC;
Because the IM_INV table has a 2-part primary key: ITEM_NO & LOC_ID, I am getting 4 lines for each ITEM_NO with the QTY_ON_HND for each of the 4 locations (LOC_ID).
I am fairly new to SQL so I'm thrilled to have gotten this far, but how can I make it so that the result is a single line per ITEM_NO but with a column for each LOC_ID with its QTY_ON_HND?
Example:
My current result is
FBM_Order.sku FBM_Order.quantity-purchased IM_INV.LOC_ID QTY_ON_HND
'SCHO645256' 1 AF 2
'SCHO645256' 1 LO 2
'SCHO645256' 1 S 3
'SCHO645256' 1 SL 1
How can I change that to
FBM_Order.sku FBM_Order.quantity-purchased QTY_ON_HND_AF QTY_ON_HND_LO QTY_ON_HND_S QTY_ON_HND_SL
'SCHO645256' 1 2 2 3 1
?
Thanks!
You may load it as you already do and treat it inside your application, but if you really wanna make that inside your MySQL, try GROUP CONCAT and JSON as follows:
SELECT
GROUP_CONCAT(JSON_OBJECT(
'LOC_ID', IM_INV.LOC_ID,
'QTY_ON_HND', QTY_ON_HND
))
{another fields}
FROM `FBM_Orders`
LEFT JOIN `IM_INV` ON `FBM_Orders`.`sku` = `IM_INV`.`ITEM_NO`
WHERE `FBM_Orders`.`quantity-to-ship` > 0
GROUP BY `FBM_Orders`.`order-id`;
Note: JSON is just available for MySQL 5.7+ and may slow down your query a little bit. You're still gonna need convert your data to array inside your application. So it's half done inside your app and half inside your database.

How to Find First Valid Row in SQL Based on Difference of Column Values

I am trying to find a reliable query which returns the first instance of an acceptable insert range.
Research:
some of the below links adress similar questions, but I could get none of them to work for me.
Find first available date, given a date range in SQL
Find closest date in SQL Server
MySQL difference between two rows of a SELECT Statement
How to find a gap in range in SQL
and more...
Objective Query Function:
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
Where InsertRange(1) is the value the query should return. In other words, this would be the first instance where the above condition is satisfied.
Table Structure:
Primary Key: StartRange
StartRange(i-1) < StartRange(i)
StartRange(i-1) + EndRange(i-1) < StartRange(i)
Example Dataset
Below is an example User table (3 columns), with a set range distribution. StartRanges are always ordered in a strictly ascending way, UserID are arbitrary strings, only the sequences of StartRange and EndRange matters:
StartRange EndRange UserID
312 6896 user0
7134 16268 user1
16877 22451 user2
23137 25142 user3
25955 28272 user4
28313 35172 user5
35593 38007 user6
38319 38495 user7
38565 45200 user8
46136 48007 user9
My current Query
I am trying to use this query at the moment:
SELECT t2.StartRange, t2.EndRange
FROM user AS t1, user AS t2
WHERE (t1.StartRange - t2.StartRange+1) > NewValue
ORDER BY t1.EndRange
LIMIT 1
Example Case
Given the table, if NewValue = 800, then the returned answer should be 23137. This means, the first available slot would be between user3 and user4 (with an actual slot size = 813):
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
InsertRange = (StartRange(6) - EndRange(5)) > NewValue
23137 = 25955 - 25142 > 800
More Comments
My query above seemed to be working for the special case where StartRanges where tightly packed (i.e. StartRange(i) = StartRange(i-1) + EndRange(i-1) + 1). This no longer works with a less tightly packed set of StartRanges
Keep in mind that SQL tables have no implicit row order. It seems fair to order your table by StartRange value, though.
We can start to solve this by writing a query to obtain each row paired with the row preceding it. In MySQL, it's hard to do this beautifully because it lacks the row numbering function.
This works (http://sqlfiddle.com/#!9/4437c0/7/0). It may have nasty performance because it generates O(n^2) intermediate rows. There's no row for user0; it can't be paired with any preceding row because there is none.
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
Then, you can use that as a subquery, and apply your conditions, which are
gap >= 800
first matching row (lowest StartRange value) ORDER BY SB
just one LIMIT 1
Here's the query (http://sqlfiddle.com/#!9/4437c0/11/0)
SELECT SB-EA Gap,
EA+1 Beginning_of_gap, SB-1 Ending_of_gap,
UserId UserID_after_gap
FROM (
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
) pairs
WHERE SB-EA >= 800
ORDER BY SB
LIMIT 1
Notice that you may actually want the smallest matching gap instead of the first matching gap. That's called best fit, rather than first fit. To get that you use ORDER BY SB-EA instead.
Edit: There is another way to use MySQL to join adjacent rows, that doesn't have the O(n^2) performance issue. It involves employing user variables to simulate a row_number() function. The query involved is a hairball (that's a technical term). It's described in the third alternative of the answer to this question. How do I pair rows together in MYSQL?

mysql distribution of combinations/values

I have a mysql table which contains some random combination of numbers. For simplicity take the following table as example:
index|n1|n2|n3
1 1 2 3
2 4 10 32
3 3 10 4
4 35 1 2
5 27 1 3
etc
What I want to find out is the number of times a combination has occured in the table. For instance, how many times has the combination of 4 10 or 1 2 or 1 2 3 or 3 10 4 etc occured.
Do I have to create another table that contains all possible combinations and do comparison from there or is there another way to do this?
For a single combination, this is easy:
SELECT COUNT(*)
FROM my_table
WHERE n1 = 3 AND n2 = 10 AND n3 = 4
If you want to do this with multiple combinations, you could create a (temporary) table of them and join that table with you data, something like this:
CREATE TEMPORARY TABLE combinations (
id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
n1 INTEGER, n2 INTEGER, n3 INTEGER
);
INSERT INTO combinations (n1, n2, n3) VALUES
(1, 2, NULL), (4, 10, NULL), (1, 2, 3), (3, 10, 4);
SELECT c.n1, c.n2, c.n3, COUNT(t.id) AS num
FROM combinations AS c
LEFT JOIN my_table AS t
ON (c.n1 = t.n1 OR c.n1 IS NULL)
AND (c.n2 = t.n2 OR c.n2 IS NULL)
AND (c.n3 = t.n3 OR c.n3 IS NULL)
GROUP BY c.id;
(demo on SQLize)
Note that this query as written is not very efficient due to the OR c.n? IS NULL clauses, which MySQL isn't smart enough to optimize. If all your combinations contain the same number of terms, you can leave those out, which will allow the query to make use of indexes on the data table.
Ps. With the query above, the combination (1, 2, NULL) won't match (35, 1, 2). However, (NULL, 1, 2) will, so, if you want both, a simple workaround would be to just include both patterns in your table of combinations.
If you actually have many more columns than shown in your example, and you want to match patterns that occur in any set of consecutive columns, then your really should pack your columns into a string and use a LIKE or REGEXP query. For example, if you concatenate all your data columns into a comma-separated string in a column named data, you could search it like this:
INSERT INTO combinations (pattern) VALUES
('1,2'), ('4,10'), ('1,2,3'), ('3,10,4'), ('7,8,9');
SELECT c.pattern, COUNT(t.id) AS num
FROM combinations AS c
LEFT JOIN my_table AS t
ON CONCAT(',', t.data, ',') LIKE CONCAT('%,', c.pattern, ',%')
GROUP BY c.id;
(demo on SQLize)
You could make this query somewhat faster by making the prefixes and suffixes added with CONCAT() part of the actual data in the tables, but this is still going to be a fairly inefficient query if you have a lot of data to search, because it cannot make use of indexes. If you need to do this kind of substring searching on large datasets efficiently, you may want to use something better suited for than specific purpose than MySQL.
You only have three columns in the table, so you are looking for combinations of 1, 2, and 3 elements.
For simplicity, I'll start with the following table:
select index, n1 as n from t union all
select index, n2 from t union all
select index, n3 from t union all
select distinct index, -1 from t union all
select distinct index, -2 from t
Let's call this "values". Now, we want to get all triples from this table for a given index. In this case, -1 and -2 represent NULL.
select (case when v1.n < 0 then NULL else v1.n end) as n1,
(case when v2.n < 0 then NULL else v2.n end) as n2,
(case when v3.n < 0 then NULL else v3.n end) as n3,
count(*) as NumOccurrences
from values v1 join
values v2
on v1.n < v2.n and v1.index = v2.index join
values v3
on v2.n < v3.n and v2.index = v3.index
This is using the join mechanism to generate the combinations.
This method finds all combinations regardless of ordering (so 1, 2, 3 is the same as 2, 3, 1). Also, this ignores duplicates, so it cannot find (1, 2, 2) if 2 is repeated twice.
SELECT
CONCAT(CAST(n1 AS VARCHAR(10)),'|',CAST(n2 AS VARCHAR(10)),'|',CAST(n3 AS VARCHAR(10))) AS Combination,
COUNT(CONCAT(CAST(n1 AS VARCHAR(10)),'|',CAST(n2 AS VARCHAR(10)),'|',CAST(n3 AS VARCHAR(10)))) AS Occurrences
FROM
MyTable
GROUP BY
CONCAT(CAST(n1 AS VARCHAR(10)),'|',CAST(n2 AS VARCHAR(10)),'|',CAST(n3 AS VARCHAR(10)))
This creates a single column that represents the combination of the values within the 3 columns by concatenating the values. It will count the occurrences of each.