MySQL select from custom set and compare with table data - mysql

Hi, I'm trying to find out which elements don't exist in my database. To do so, I want to compare a list of integers (output from an external script) with the data in a table. How can I do something like this:
SELECT * FROM (1,1,2,3,5,8,13...) l WHERE l NOT IN (select id from table1);

This is probably best done with a left outer join. But, your problem is creating the table of constants:
SELECT *
FROM (select 1 as id union all select 2 union all select 3 union all select 5 union all
select 8 union all select 13 union all select 21 . . .
) ids
where ids.id NOT IN (select id from table1);
This can have odd behavior, if table1.id is ever NULL. The following works more generally:
SELECT *
FROM (select 1 as id union all select 2 union all select 3 union all select 5 union all
select 8 union all select 13 union all select 21 . . .
) ids left outer join
table1 t1
on ids.id = t1.id
where t1.id is null;
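To see the NULL problem concretely, here is a minimal sketch that runs both queries against an in-memory SQLite database. SQLite is only a stand-in for MySQL here (an assumption of this example), but it follows the same three-valued NULL logic; the sample rows in table1 are made up.

```python
import sqlite3

# SQLite stands in for MySQL; the table contents are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.executemany("INSERT INTO table1 (id) VALUES (?)", [(1,), (2,), (None,)])

# Derived table of constants, as in the answer (shortened to four values).
ids_sql = "SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 5"

# NOT IN: a single NULL in table1.id makes every comparison unknown.
not_in_rows = conn.execute(
    f"SELECT ids.id FROM ({ids_sql}) ids "
    "WHERE ids.id NOT IN (SELECT id FROM table1)"
).fetchall()

# LEFT OUTER JOIN anti-join: unaffected by the NULL row.
left_join_rows = conn.execute(
    f"SELECT ids.id FROM ({ids_sql}) ids "
    "LEFT OUTER JOIN table1 t1 ON ids.id = t1.id "
    "WHERE t1.id IS NULL ORDER BY ids.id"
).fetchall()

print(not_in_rows)     # no rows, although 3 and 5 are missing from table1
print(left_join_rows)  # the missing ids 3 and 5
```

The NOT IN version silently returns nothing as soon as one NULL appears in the subquery, which is why the join form is the safer default.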
EDIT:
The size of a MySQL query is limited by the parameter max_allowed_packet. In recent versions its maximum value is 1 GB. You should be able to fit 18,000 rows of:
select <n> union all
into that limit, quite easily. Gosh, I don't even think it would be 1 megabyte. I would say, though, that passing a list of 18,000 ids through the application seems inefficient. It would be nice if one database could just pull the data from the other database, without going through the application.
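A quick back-of-the-envelope check of that size estimate, assuming (hypothetically) that the ids are simply 1 to 18,000; real ids with more digits would add a few bytes per row.

```python
# Hypothetical ids 1..18000; one "select <n> union all " fragment per id.
ids = range(1, 18001)
query_bytes = sum(len(f"select {n} union all ") for n in ids)

print(query_bytes, "bytes")
print(query_bytes < 1_000_000)  # comfortably under one megabyte
```

So the generated statement comes to roughly 400 KB, well under one megabyte.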

If the set to compare is huge, I'd recommend creating a temporary table myids with a single column id, loading all 18K values into it, and running a query like this:
select id from myids where myids.id not in (select id from table1);
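A minimal sketch of the temporary-table approach from a script, again using an in-memory SQLite database as a stand-in for MySQL. The contents of table1 and the external_ids list are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table1 (id) VALUES (?)", [(1,), (2,), (3,), (8,)])

# Hypothetical output of the external script.
external_ids = [1, 1, 2, 3, 5, 8, 13]

# Load the ids into a temporary table in one batch.
conn.execute("CREATE TEMPORARY TABLE myids (id INTEGER)")
conn.executemany("INSERT INTO myids (id) VALUES (?)", [(i,) for i in external_ids])

# DISTINCT collapses ids that appear more than once in the input list.
missing = [row[0] for row in conn.execute(
    "SELECT DISTINCT id FROM myids "
    "WHERE id NOT IN (SELECT id FROM table1) ORDER BY id"
)]
print(missing)
```

Batching the inserts keeps the number of round trips low, which matters more than the query itself once the list grows to thousands of values.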

Related

Redshift nested json extraction

I have a table with two columns, one column named user, one json column named js that looks like this:
{"1":{"partner_id":54,"provider_id":13},
"2":{"partner_id":56,"provider_id":8},
"3":{"partner_id":2719,"provider_id":274}}
I want to select all 'provider_id' values into one column, one row per user. So it should look like this:
user| provider_ids
0001| 13,8,274
0002| 21,36,57,12
How can I do this? Thanks in advance!
The JSON format you provided is not so easy to work with.
Created a table for test purposes:
create table json_test as
select '0001' as usr, '{"1":{"partner_id":54,"provider_id":13},
"2":{"partner_id":56,"provider_id":8},
"3":{"partner_id":2719,"provider_id":274}}'
as json_text
union all
select '0002' as usr, '{"1":{"partner_id":54,"provider_id":21},
"2":{"partner_id":56,"provider_id":36},
"3":{"partner_id":56,"provider_id":57},
"4":{"partner_id":2719,"provider_id":12}}'
as json_text;
Query to return results:
with NS AS (
select 1 as n union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10
)
select usr,
listagg(trim(TRIM(split_part(SPLIT_PART(js.json_text, '},', NS.n),'"provider_id":',2)),'}'),',') within group(order by null) AS t
from NS
join json_test js ON true and NS.n <= REGEXP_COUNT(js.json_text, '\\},') + 1
group by usr;
Notes:
1) do not name a column "user", as it is a reserved keyword
2) add as many dummy rows to the NS subquery as the maximum number of provider records in any JSON value
3) Yes, I know, this isn't very readable SQL :D
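Since the SQL is hard to read, here is a sketch of the same string-splitting logic in Python: split on '},' (one piece per nested object), take whatever follows '"provider_id":', and trim the trailing braces. It only mirrors what the query does to one json_text value and is not Redshift code; the function name is made up.

```python
def provider_ids(json_text: str) -> str:
    """Mimic the SPLIT_PART / TRIM / LISTAGG steps of the query on one value."""
    values = []
    for part in json_text.split("},"):          # SPLIT_PART(json_text, '},', n)
        after_key = part.split('"provider_id":')  # SPLIT_PART(..., '"provider_id":', 2)
        if len(after_key) > 1:
            values.append(after_key[1].strip().strip("}"))  # TRIM(TRIM(...), '}')
    return ",".join(values)                      # LISTAGG(..., ',')


sample = (
    '{"1":{"partner_id":54,"provider_id":13},\n'
    '"2":{"partner_id":56,"provider_id":8},\n'
    '"3":{"partner_id":2719,"provider_id":274}}'
)
result = provider_ids(sample)
print(result)
```

Like the SQL, this relies on provider_id being the last key of each nested object; it is a text trick, not a JSON parser.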

Search for the existence of many objects in a database

Say I have a database with 5 million users, with the columns
id (unsigned int, auto-increment), facebook_id (unsigned int), and name (varchar)
In a program, I have a list of a variable amount of users from a person's facebook friend list (generally ranging from 500-1200 different facebook ids).
What's the most efficient way to send a query to my database that returns the facebook_id's of all of the users where that same facebook_id exists in the database?
Pseudo-code:
$friends = array(12345, 22345, 32345, 42345, 52345, ... ~1000 more);
$q = mysql_query("SELECT * FROM users ...");
$friendsAlreadyUsingApp = parseQuery($q);
This is the topic of an almost endless number of articles, blogs, Q&As etc., and the essence of the problem is that it looks really simple, but isn't.
The heart of the problem is that the parameter looks like it should work with WHERE field IN (), but it does not, because the parameter is a single string that just happens to have lots of commas in it.
So, when that parameter is passed to SQL, it is necessary to process that single string into multiple parts so that the field can be compared to each part. This is where it gets a little complex, as not all database types have the same features to handle this. MySQL, for example, does not have the table variables that MS SQL Server provides.
So, a simple method for MySQL is this:
SET @param := '105,110,125,135,145,155,165,175,185,195,205';
SELECT
*
FROM Users
WHERE FIND_IN_SET(facebook_id, @param) > 0
;
FIND_IN_SET returns the index position of the first argument within the second argument.
Just how well this scales in your database I cannot tell, it might not be acceptable for parameters containing 1000+ id's.
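If the list is available as an array in the application (as in the question's pseudo-code), another option, not part of the answer above, is to generate one placeholder per id so that IN () receives separate values rather than one string. A minimal sketch with Python's sqlite3 module as a stand-in for a MySQL driver; the sample users are made up, and the placeholder character depends on the driver (sqlite3 uses ?, many MySQL drivers use %s).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, facebook_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO users (facebook_id, name) VALUES (?, ?)",
    [(105, "Ann"), (120, "Bob"), (145, "Cy")],  # invented sample rows
)

friends = [105, 110, 125, 135, 145]  # ids from the friend list

# One "?" per id: the values are bound separately, never pasted into the SQL.
placeholders = ", ".join("?" for _ in friends)
sql = (
    f"SELECT facebook_id FROM users "
    f"WHERE facebook_id IN ({placeholders}) ORDER BY facebook_id"
)
already_using = [row[0] for row in conn.execute(sql, friends)]
print(already_using)
```

Because facebook_id is compared directly rather than through a string function, an index on that column can be used.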
So if text processing like FIND_IN_SET is too slow, then each id needs to be broken out from the parameter and inserted into a table. That way the resulting table can be used through an INNER JOIN to filter the users; but this requires a table and inserts which take time, and there may be concurrency issues if more than one user is attempting to use that table at the same time.
The following sets up a table of 10,000 integers (1 to 10,000):
/* Create a table called Numbers */
CREATE TABLE `Numbers`
(
`Number` int PRIMARY KEY
);
/* use cross joins to create 10,000 integers from 1 & store into table */
INSERT INTO Numbers (Number)
select 1 + (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a)) as N
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
;
This "utility table" can then be used to divide a comma separated parameter into a derived table of the individual integers, and this then used in an INNER JOIN to your users table will provide the wanted result.
SET @param := '105,110,125,135,145,155,165,175,185,195,205';
SET @delimit := ',';
SELECT
users.id
, users.facebook_id
, users.name
FROM users
INNER JOIN (
SELECT
CAST(SUBSTRING(iq.param, n.number + 1, LOCATE(@delimit, iq.param, n.number + 1) - n.number - 1) AS UNSIGNED INTEGER) AS itemID
FROM (
SELECT
concat(@delimit, @param, @delimit) AS param
) AS iq
INNER JOIN Numbers n
ON n.Number < LENGTH(iq.param)
WHERE SUBSTRING(iq.param, n.number, 1) = @delimit
) AS derived
ON users.facebook_id = derived.itemID
;
This query can be used as the basis for a stored procedure which might be easier for you to call from PHP.
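To make the SUBSTRING/LOCATE arithmetic easier to follow, here is the same splitting logic sketched in Python. It is only an illustration of what the derived table computes (the function name is made up): every position n that holds a delimiter marks the start of an item, and the item ends at the next delimiter.

```python
def split_with_numbers(param: str, delimit: str = ",") -> list[int]:
    """Mirror the Numbers-table split: n runs over 1-based string positions."""
    padded = delimit + param + delimit           # concat(delimit, param, delimit)
    items = []
    for n in range(1, len(padded)):              # n.Number < LENGTH(iq.param)
        if padded[n - 1] == delimit:             # SUBSTRING(iq.param, n, 1) = delimit
            end = padded.index(delimit, n)       # LOCATE(delimit, iq.param, n + 1), 0-based
            items.append(int(padded[n:end]))     # SUBSTRING(iq.param, n + 1, ...) cast to int
    return items


item_ids = split_with_numbers("105,110,125")
print(item_ids)
```

The Numbers table plays the role of the for loop: the database has no loop construct in a plain SELECT, so it joins against one row per candidate position instead.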

SQL query return what's NOT in table [duplicate]

Just wondering, is it possible to have a query that somehow tells you the values it did not find in a table?
So if I had a query SELECT * FROM mytable WHERE id IN (1,2,3,4,5,6,7,8,9) and only 2,3,6,7,9 were returned, I would like to know that 1,4,5,8 were not found.
It will be a little hard to do a manual comparison, because this is going to be run over approximately 2,000+ rows in a table (the ids are going to be provided via a CSV file which can be copied into the query).
Thanks in advance
This is probably silly, but what about creating a temporary table containing all your IDs, from which you'll subtract the result of your SELECT query?
Untested, but in theory:
Table 1:
+----+-----+
| id | num |
+----+-----+
Table 2:
+----+
| id |
+----+
Table 1 contains the data you're looking for (and num is any field containing any data)
Table 2 contains the IDs from the CSV
SQL:
SELECT `Table2`.`id`
FROM `Table2`
LEFT JOIN `Table1` ON `Table1`.`id` = `Table2`.`id`
WHERE `Table1`.`id` IS NULL
Quick solution: open your CSV file, replace all commas with " union select ", put "select " in front of that line, and use it as the first line of the query at the bottom.
So 1,2,3 becomes
Select 1 union select 2 union select 3
Use this in the query below (EXCEPT is only available in MySQL from version 8.0.31; on older versions use NOT IN or a LEFT JOIN instead):
Select 1 union select 2 union select x -- replace this line with the line generated from your csv
Except
(
Select id from mytable
)
What about:
SELECT *
FROM (select 1 as f
UNION
SELECT 2 as f
UNION
SELECT 3 as f
UNION
SELECT 4 as f
UNION
SELECT 5 as f
UNION
SELECT 6 as f
UNION
SELECT 7 as f
UNION
SELECT 8 as f
UNION
SELECT 9 ) as s1
WHERE f NOT IN (SELECT id FROM mytable);
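Rather than editing the CSV by hand, the derived table can be generated by a small script. A minimal sketch, assuming the CSV is a single line of integers; the helper name constants_query is made up, and SQLite is used as a stand-in to show the result for the example from the question.

```python
import sqlite3


def constants_query(csv_line: str) -> str:
    """Turn '1,2,3' into 'SELECT 1 AS f UNION SELECT 2 UNION SELECT 3'."""
    # int() rejects anything that is not a plain integer, so no raw text reaches the SQL.
    numbers = [int(field) for field in csv_line.split(",")]
    first, rest = numbers[0], numbers[1:]
    return f"SELECT {first} AS f" + "".join(f" UNION SELECT {n}" for n in rest)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable (id) VALUES (?)", [(2,), (3,), (6,), (7,), (9,)])

derived = constants_query("1,2,3,4,5,6,7,8,9")
sql = (
    f"SELECT f FROM ({derived}) AS s1 "
    "WHERE f NOT IN (SELECT id FROM mytable) ORDER BY f"
)
not_found = [row[0] for row in conn.execute(sql)]
print(not_found)
```

With the ids from the question (2, 3, 6, 7, 9 present), this reports 1, 4, 5 and 8 as not found.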

Inserting duplicate records based on a value without using cursor

I had a problem in my database. I have to insert duplicate copies of a particular record into another table based on a value.
First I used a cursor to fetch each record and get the number of duplicates I want, and after that used another cursor for the duplication. Everything worked fine. But with more than 500 records, it became dead slow. Then I did some research and found a way to insert without a cursor.
INSERT INTO report(id, Name)
SELECT i.id,i.Name FROM (SELECT 1 AS id
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 10) AS o
INNER JOIN table i WHERE o.id<=i.frequence;
where frequence is the number of duplicates. Please share any ideas for improving this query.
You could try creating a table with a record for each value from 1 to 10 and then join to that. I'm not sure it would be any faster though. You would have to experiment with it.
In this example the table with the values from 1 to 10 is called "dup" and the field containing these values is called "id".
INSERT INTO report(id, Name)
SELECT i.id, i.Name
FROM table i
JOIN dup d
ON d.id <= i.frequence;
If you have any table that contains a row number that goes at least as high as the maximum frequence, you could do this:
INSERT INTO report(id, Name)
SELECT i.id,i.Name FROM table i
inner join (
select distinct some_row_number_column from some_table
) o on o.some_row_number_column <= i.frequence;
This is basically the same as what you were doing, but it avoids the messy union all statements.
Or you could make a cursor that inserts numbers from 1 to the maximum frequence into a temporary table, then use that in your join. Or you could use a row numbering variable to generate the necessary sequence. Basically, do anything that will generate a list of consecutive numbers from 1 to the maximum that you need.
I would normally use recursion for this (DB2 syntax):
INSERT INTO report(id, Name)
with num_list (num) as (
values (1)
union all
select num + 1 from num_list
where num < (select max(frequence) from table)
)
SELECT i.id,i.Name FROM table i
inner join num_list on num_list.num <= i.frequence;
However, MySQL versions before 8.0 don't support recursive queries.
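For a runnable version of the recursive idea, here is a minimal sketch against an in-memory SQLite database, which accepts WITH RECURSIVE. The source table is called items here only because table is a reserved word, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, frequence INTEGER)")
conn.execute("CREATE TABLE report (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", 3), (2, "b", 1), (3, "c", 0)],  # invented sample rows
)

# num_list yields 1..max(frequence); each item joins to as many numbers
# as its frequence, so it is inserted that many times.
conn.execute("""
    WITH RECURSIVE num_list(num) AS (
        SELECT 1
        UNION ALL
        SELECT num + 1 FROM num_list
        WHERE num < (SELECT MAX(frequence) FROM items)
    )
    INSERT INTO report (id, name)
    SELECT i.id, i.name
    FROM items i
    INNER JOIN num_list ON num_list.num <= i.frequence
""")

counts = conn.execute(
    "SELECT id, COUNT(*) FROM report GROUP BY id ORDER BY id"
).fetchall()
print(counts)
```

An item with frequence 0 matches no number and is therefore not copied at all.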

Changing a query with a numbered result set (with gaps) to return a result with no gaps, containing every number

I have a select statement: select a, b, [...]; which returns the results:
a|b
---------
1|8688798
2|355744
4|457437
7|27834
I want it to return:
a|b
---------
1|8688798
2|355744
3|0
4|457437
5|0
6|0
7|27834
An example query that does not do what I would like, since it does not have the gap numbers:
select
sub.num_of_ratings,
count(sub.rater)
from
(
select
r.rater_id as rater,
count(r.id) as num_of_ratings
from ratings r
group by rater
) as sub
group by num_of_ratings;
Explanation of the query:
If a user rates another user, the rating is listed in the table ratings and the id of the rating user is kept in the field rater_id. Effectively I check for all users who are referred to in ratings and count how many ratings records I find for that user, which is rater / num_of_ratings, and then I use this result to find how many users have rated a given number of times.
At the end I know how many users rated once, how many users rated twice, etc. My problem is that the numbers for count(sub.rater) start fine from 1,2,3,4,5... However, for bigger numbers there are gaps. This is because there might be one user who rated 1028 times - but no user who rated 1027 times.
I don't want to apply stored procedures looping over the result or something like that. Is it possible to fill those gaps in the result without using stored procedures, looping, or creating temporary tables?
If you have a sequence of numbers, then you can do a JOIN with that table and fill in the gaps properly.
You can check out this question on how to get the sequence:
generate an integer sequence in MySQL
Here is one of the answers posted there, which can be used easily, with the limitation that it generates numbers from 1 to 10,000:
SELECT @row := @row + 1 as `row` FROM
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t2,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t3,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t4,
(SELECT @row:=0) t5
Using a sequence of numbers, you can join your result set. For instance, assuming your number list is in a table called numbersList, with column number:
Select number, Count
from
numbersList left outer join
(select
sub.num_of_ratings,
count(sub.rater) as Count
from
(
select
r.rater_id as rater,
count(r.id) as num_of_ratings
from ratings r
group by rater
) as sub
group by num_of_ratings) as num
on num.num_of_ratings=numbersList.number
where numbersList.number<max(num.num_of_ratings)
Your numbers list must be larger than your largest value, obviously, and the restriction will allow it to not have all numbers up to the maximum. (If MySQL does not allow that type of where clause, you can either leave the where clause out to list all numbers up to the maximum, or modify the query in various ways to achieve the same result.)
@mazzucci: the query is too magical and you are not actually explaining the query.
@David: I cannot create a table for that purpose (as stated in the question)
Basically what I need is a select that returns a gap-less list of numbers. Then I can left join on that result set and treat NULL as 0.
What I need is an arbitrary table that holds more records than the length of the final list. I use the table users for that in the following example:
select @row := @row + 1 as `index`
from (select @row := -1) r, users u
limit 101;
This query returns the numbers from 0 to 100. Using it as a subquery in a left join finally fills the gaps.
users is just a dummy to keep the relational engine going and hence producing the numbers incrementally.
select t1.`index` as a, ifnull(t2.b, 0) as b
from (
select @row := @row + 1 as `index`
from (select @row := 0) r, users u
limit 7
) as t1
left join (
select a, b [...]
) as t2
on t1.`index` = t2.a;
I didn't try this exact query live, so have mercy on me if there is a little flaw. But technically it works; you get my point.
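For comparison, here is a sketch of the same gap filling with a recursive CTE instead of a user variable. This is a different technique from the one above, run against SQLite as a stand-in; the table counts stands for the result of the original select a, b [...] query, using the sample values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "counts" is a hypothetical table holding the gappy result from the question.
conn.execute("CREATE TABLE counts (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?)",
    [(1, 8688798), (2, 355744), (4, 457437), (7, 27834)],
)

# seq produces 1..max(a); the left join keeps every number and
# COALESCE turns the missing counts into 0.
rows = conn.execute("""
    WITH RECURSIVE seq(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq WHERE n < (SELECT MAX(a) FROM counts)
    )
    SELECT seq.n AS a, COALESCE(counts.b, 0) AS b
    FROM seq
    LEFT JOIN counts ON counts.a = seq.n
    ORDER BY seq.n
""").fetchall()

for a, b in rows:
    print(f"{a}|{b}")
```

This needs no helper table at all, at the cost of requiring a database version that supports recursive CTEs.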
EDIT:
Just used this concept to get a gapless list of dates to left join measures onto:
select @date := date_add(@date, interval 1 day) as date
from (select @date := '2010-10-14') d, users u
limit 700
This starts from 2010-10-15 and iterates 699 more days.