How does 'SELECT * INTO' work in the background? - sql-server-2014

Today I came across a weird issue that I need your help with. Basically, I am copying a table (temp_a) into another table (temp_b) using the query below.
select * into temp_b
from temp_a
where cast(date_from as date)>='2010-01-01'
and cast(date_from as date)<'2018-01-01'
Temp_a table Structure and sample data:
id int primary key,
name varchar not null,
date_from datetime,
update_time getdate()
Temp_A
ID name date_from update_time
-------------------------------------------------------
1 A 2010-01-01 2010-01-01
2 B 2011-02-02 2011-02-02
3 C 2012-02-02 2012-02-02
4 D 2013-09-09 2013-09-09
5 E 2014-08-06 2014-08-06
But the above query produces duplicate records in the temp_b table.
Temp_B
ID name date_from update_time
-------------------------------------------------------
1 A 2010-01-01 2010-01-01
1 A 2010-01-01 2010-01-01
2 B 2011-02-02 2011-02-02
3 C 2012-02-02 2012-02-02
3 C 2012-02-02 2012-02-02
4 D 2013-09-09 2013-09-09
5 E 2014-08-06 2014-08-06
Can someone please explain why there would be duplicate records in the destination table when there are no duplicates in the source table?
Or
How does "SELECT * INTO" work in the background?

Related

Find rows where ID matches and date is within X days

Somewhat new to SQL and I'm running into a bit of issue with a project. I have a table like this:
ID subscription_ID renewal_date
1 11 2022-01-01 00:00:00
2 11 2022-01-02 00:00:00
3 12 2022-01-01 00:00:00
4 12 2022-01-01 12:00:00
5 13 2022-01-01 12:00:00
6 13 2022-01-03 12:00:00
My goal is to return rows where the subscription_ID matches and the start_date values are within a certain number of days of each other (hours would work as well). For instance, I'd like rows where subscription_ID matches and the start_date values are at most 1 day apart, so that my results from the table above would be:
ID subscription_ID renewal_date
1 11 2022-01-01 00:00:00
2 11 2022-01-02 00:00:00
3 12 2022-01-01 00:00:00
4 12 2022-01-01 12:00:00
Any assistance would be greatly appreciated--thanks!
If I understand correctly maybe you are trying something like:
select t.*
from test_tbl t
join ( SELECT subscription_id
            , MAX(diff) max_diff
       FROM
       ( SELECT x.subscription_id
              , DATEDIFF(MIN(y.start_date),x.start_date) diff
         FROM test_tbl x
         JOIN test_tbl y ON y.subscription_id = x.subscription_id
                        AND y.start_date > x.start_date
         GROUP BY x.subscription_id , x.start_date
       ) z
       GROUP BY subscription_id
     ) as t1 on t.subscription_id=t1.subscription_id
where t1.max_diff<=1;
Result:
id subscription_id start_date
1 11 2022-01-01 00:00:00
2 11 2022-01-02 00:00:00
3 12 2022-01-01 00:00:00
4 12 2022-01-01 12:00:00
The subquery returns:
subscription_id max_diff
11 1
12 0
13 2
which is used in the WHERE condition.
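To try the idea without a MySQL server, here is a rough sketch that runs the same query shape through Python's built-in sqlite3 module. It is only an approximation: SQLite has no DATEDIFF, so the day difference is computed with julianday(date(...)) as a stand-in, and the table test_tbl with a start_date column follows the answer's naming rather than the question's renewal_date.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_tbl (id INTEGER PRIMARY KEY, subscription_id INTEGER, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO test_tbl VALUES (?, ?, ?)",
    [
        (1, 11, "2022-01-01 00:00:00"),
        (2, 11, "2022-01-02 00:00:00"),
        (3, 12, "2022-01-01 00:00:00"),
        (4, 12, "2022-01-01 12:00:00"),
        (5, 13, "2022-01-01 12:00:00"),
        (6, 13, "2022-01-03 12:00:00"),
    ],
)

# julianday(date(a)) - julianday(date(b)) replaces MySQL's DATEDIFF(a, b):
# both ignore the time part and return whole days.
query = """
SELECT t.id
FROM test_tbl t
JOIN (
    SELECT subscription_id, MAX(diff) AS max_diff
    FROM (
        SELECT x.subscription_id,
               julianday(date(MIN(y.start_date))) - julianday(date(x.start_date)) AS diff
        FROM test_tbl x
        JOIN test_tbl y
          ON y.subscription_id = x.subscription_id
         AND y.start_date > x.start_date
        GROUP BY x.subscription_id, x.start_date
    ) z
    GROUP BY subscription_id
) t1 ON t.subscription_id = t1.subscription_id
WHERE t1.max_diff <= 1
ORDER BY t.id
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
```

Because the difference is taken on whole dates, two rows on the same calendar day count as 0 days apart, which matches the max_diff values listed above.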

How do I get the correct id with the query results

I want to create a stored procedure in MySQL, but first I want to get the query right. However, I can't seem to get back the id that corresponds to the DateTime stamps the query returns.
This is the table I am trying to get the result from:
id EventId start end
1 1 2019-04-05 00:00:00 2019-04-07 00:00:00
2 2 2020-04-03 00:00:00 2020-04-03 00:00:00
3 3 2020-04-02 00:00:00 2020-04-02 00:00:00
7 1 2020-06-11 00:00:00 2020-06-11 00:00:00
9 2 2020-06-18 00:00:00 2020-06-18 00:00:00
10 3 2020-06-11 00:00:00 2020-06-11 00:00:00
11 3 2020-06-07 00:00:00 2020-06-07 00:00:00
query:
SELECT DISTINCT Eventid, MIN(start), id
from date_planning
WHERE `start` >= NOW()
GROUP BY Eventid
this gives me the following result
EventId Min(start) id
1 2020-06-11 00:00:00 3
2 2020-06-18 00:00:00 9
3 2020-06-07 00:00:00 10
but these are the correct ids that belong to those DateTimes:
EventId Min(start) id
1 2020-06-11 00:00:00 7
2 2020-06-18 00:00:00 9
3 2020-06-07 00:00:00 11
You want the row with the minimum "future" date for each eventId. To solve this greatest-n-per-group problem, you need to filter rather than aggregate. Here is one option using a correlated subquery:
select dt.*
from date_planning dt
where dt.start = (
    select min(dt1.start)
    from date_planning dt1
    where dt1.eventId = dt.eventId and dt1.start >= now()
)
For performance, you need an index on (eventId, start).
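As a quick way to check the correlated-subquery approach against the sample data, here is a minimal sketch using Python's sqlite3 module instead of MySQL. Two things are assumptions for the sake of a reproducible run: a fixed cutoff parameter replaces now(), and the index name idx_event_start is made up. Only the columns the query needs are created.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE date_planning (id INTEGER PRIMARY KEY, EventId INTEGER, "start" TEXT)'
)
conn.executemany(
    "INSERT INTO date_planning VALUES (?, ?, ?)",
    [
        (1, 1, "2019-04-05 00:00:00"),
        (2, 2, "2020-04-03 00:00:00"),
        (3, 3, "2020-04-02 00:00:00"),
        (7, 1, "2020-06-11 00:00:00"),
        (9, 2, "2020-06-18 00:00:00"),
        (10, 3, "2020-06-11 00:00:00"),
        (11, 3, "2020-06-07 00:00:00"),
    ],
)
# Composite index suggested by the answer; the name is hypothetical.
conn.execute('CREATE INDEX idx_event_start ON date_planning (EventId, "start")')

# Keep only the row whose start equals the earliest future start of its event.
query = """
SELECT dt.EventId, dt."start", dt.id
FROM date_planning dt
WHERE dt."start" = (
    SELECT MIN(dt1."start")
    FROM date_planning dt1
    WHERE dt1.EventId = dt.EventId
      AND dt1."start" >= :cutoff
)
ORDER BY dt.EventId
"""
# A fixed cutoff stands in for now() so the result does not depend on today's date.
next_events = conn.execute(query, {"cutoff": "2020-06-01 00:00:00"}).fetchall()
for row in next_events:
    print(row)
```

Since the filter selects whole rows instead of aggregating, the id always comes from the same row as the minimum start value.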

SQL - group by column 1, order by column 2

Here's my situation: I have two tables named people and contacts, respectively.
id name
1 dev one
2 dev two
3 dev three
4 dev five
5 dev four
id person_id code_name updated_at
1 1 base1 2019-12-18 00:00:01
2 3 base2 2019-12-18 00:00:02
3 2 home 2019-12-18 00:00:03
4 2 home2 2019-12-18 00:00:04
5 3 work 2019-12-18 00:00:05
6 4 work 2019-12-18 00:00:06
7 5 base 2019-12-18 00:00:07
8 4 base2 2019-12-18 00:00:08
9 2 base 2019-12-18 00:00:09
10 5 work 2019-12-18 00:00:10
I'm trying to get a result from contacts that is ordered by the most recent updated_at and grouped by person_id (note: not exactly the SQL "group by"), similar to the following result.
id person_id code_name updated_at
10 5 work 2019-12-18 00:00:10
7 5 base 2019-12-18 00:00:07
9 2 base 2019-12-18 00:00:09
4 2 home2 2019-12-18 00:00:04
3 2 home 2019-12-18 00:00:03
8 4 base2 2019-12-18 00:00:08
6 4 work 2019-12-18 00:00:06
5 3 work 2019-12-18 00:00:05
2 3 base2 2019-12-18 00:00:02
1 1 base1 2019-12-18 00:00:01
Currently I'm ordering the contacts table by person_id desc and updated_at desc, which gives a result close to what I expected but not exactly correct.
See the results of ORDER BY person_id DESC, updated_at DESC: https://monosnap.com/file/xN0cuZAu2x2df4Q5qNDksKq5P3sEjU The contact with id => 1 should be at the top of the result set since it's the most recently updated of them all.
Note: PostgreSQL is my primary use case here, but it would be nice to know the answer for MySQL as well, if there is any difference.
I have tried the following in the PostgreSQL 9.3.
Data Sample:
create table contact
(
id int,
person_id int,
code_name varchar(20),
updated_at timestamp
);
INSERT INTO contact VALUES
(1,1,'base1','2019-12-18 00:00:01'),
(2,3,'base2','2019-12-18 00:00:02'),
(3,2,'home','2019-12-18 00:00:03'),
(4,2,'home2','2019-12-18 00:00:04'),
(5,3,'work','2019-12-18 00:00:05'),
(6,4,'work','2019-12-18 00:00:06'),
(7,5,'base','2019-12-18 00:00:07'),
(8,4,'base2','2019-12-18 00:00:08'),
(9,2,'base','2019-12-18 00:00:09'),
(10,5,'work','2019-12-18 00:00:10');
Query:
DROP TABLE IF EXISTS TEMP_Stage_Table;
SELECT string_agg(id::text,',' order by updated_at desc) id,
       person_id,
       string_agg(code_name,',' order by updated_at desc) code_name,
       string_agg(updated_at::text,',' order by updated_at desc) updated_at INTO TEMP_Stage_Table
FROM contact
GROUP BY person_id
ORDER BY MAX(updated_at) DESC;
SELECT regexp_split_to_table(t.id, E',') AS id,
       t.person_id,
       regexp_split_to_table(t.code_name, E',') AS code_name,
       regexp_split_to_table(t.updated_at, E',') AS updated_at
FROM TEMP_Stage_Table t;
(MySQL/MariaDB syntax)
This will find the "ordering" for each "group of rows" for a person, correct?
SELECT MAX(updated_at), person_id
FROM tbl GROUP BY person_id ;
So, let's make use of that thus:
SELECT y.*
FROM (SELECT MAX(updated_at) AS latest, person_id
FROM tbl GROUP BY person_id ) AS x
JOIN tbl AS y USING(person_id)
ORDER BY x.latest DESC, y.updated_at DESC;
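To see what this ordering produces on the sample data from the question, here is a small sketch that runs the same join through Python's sqlite3 module. SQLite is only a stand-in for MySQL/MariaDB here, and the table is named contact (as in the PostgreSQL sample above) rather than tbl.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/MariaDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact (id INTEGER, person_id INTEGER, code_name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?, ?)",
    [
        (1, 1, "base1", "2019-12-18 00:00:01"),
        (2, 3, "base2", "2019-12-18 00:00:02"),
        (3, 2, "home", "2019-12-18 00:00:03"),
        (4, 2, "home2", "2019-12-18 00:00:04"),
        (5, 3, "work", "2019-12-18 00:00:05"),
        (6, 4, "work", "2019-12-18 00:00:06"),
        (7, 5, "base", "2019-12-18 00:00:07"),
        (8, 4, "base2", "2019-12-18 00:00:08"),
        (9, 2, "base", "2019-12-18 00:00:09"),
        (10, 5, "work", "2019-12-18 00:00:10"),
    ],
)

# x holds one row per person with that person's latest update;
# joining it back lets every contact row be sorted by its group's latest value first.
query = """
SELECT y.id
FROM (SELECT MAX(updated_at) AS latest, person_id
      FROM contact
      GROUP BY person_id) AS x
JOIN contact AS y USING (person_id)
ORDER BY x.latest DESC, y.updated_at DESC
"""
ordered_ids = [row[0] for row in conn.execute(query)]
print(ordered_ids)
```

The id sequence matches the expected result in the question: each person's rows stay together, and the groups are ordered by their most recent update.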

Select the 2 latest records from table

I have data like this in a MySQL table:
id customer_id date price
1 A 2014-01-01 4
2 A 2014-02-01 3
3 B 2014-03-01 2.5
4 B 2014-04-01 1
5 B 2014-05-01 5
6 C 2014-06-01 2
7 D 2014-07-01 2
8 D 2014-08-01 2.5
9 D 2014-09-01 1
I want to get the latest two dates for customer_id A, B and D. My result should be like this:
id customer_id date price
1 A 2014-01-01 4
2 A 2014-02-01 3
4 B 2014-04-01 1
5 B 2014-05-01 5
8 D 2014-08-01 2.5
9 D 2014-09-01 1
Any help is greatly appreciated.
One possible way :
SELECT *
FROM test s
WHERE (
    SELECT COUNT(*)
    FROM test f
    WHERE f.customer_id = s.customer_id AND
          f.`date` >= s.`date`
) <= 2
AND customer_id in('A','B','D');
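For anyone who wants to check the counting trick on the sample data, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (so the backticks become double quotes). The idea is that a row is kept when at most two rows of the same customer have a date greater than or equal to its own.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE test (id INTEGER PRIMARY KEY, customer_id TEXT, "date" TEXT, price REAL)'
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2014-01-01", 4),
        (2, "A", "2014-02-01", 3),
        (3, "B", "2014-03-01", 2.5),
        (4, "B", "2014-04-01", 1),
        (5, "B", "2014-05-01", 5),
        (6, "C", "2014-06-01", 2),
        (7, "D", "2014-07-01", 2),
        (8, "D", "2014-08-01", 2.5),
        (9, "D", "2014-09-01", 1),
    ],
)

# For each row s, count the rows of the same customer dated on or after s;
# a count of 1 or 2 means s is among that customer's two latest rows.
query = """
SELECT s.id
FROM test s
WHERE (
    SELECT COUNT(*)
    FROM test f
    WHERE f.customer_id = s.customer_id
      AND f."date" >= s."date"
) <= 2
AND s.customer_id IN ('A', 'B', 'D')
ORDER BY s.id
"""
latest_two_ids = [row[0] for row in conn.execute(query)]
print(latest_two_ids)
```

Note that ties on date within one customer would change the counts, so this sketch assumes dates are unique per customer, as they are in the sample.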
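Try it like this (note that LIMIT 2 applies to the whole result set, so this returns only the two latest rows overall, not two per customer):
select * from test where customer_id in('A','B','D') order by `date` desc limit 2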

SQL, multiple email addresses problem

I've a user table (MySQL) with the following data
id email creation_date
1 bob#mail.com 2011-08-01 09:00:00
2 bob#mail.com 2011-06-24 02:00:00
3 john#mail.com 2011-02-01 04:00:00
4 john#mail.com 2011-08-05 20:30:00
5 john#mail.com 2011-08-05 23:00:00
6 jill#mail.com 2011-08-01 00:00:00
As you can see, we allow email duplicates, so it's possible to register several accounts with the same email address.
Now I need to select all addresses ordered by creation_date, but without duplicates. This is easy (I think):
SELECT * FROM (SELECT * FROM users ORDER BY creation_date) AS X GROUP BY email
Expected result:
id email creation_date
2 bob#mail.com 2011-06-24 02:00:00
6 jill#mail.com 2011-08-01 00:00:00
3 john#mail.com 2011-02-01 04:00:00
But then I also need to select all other addresses, i.e. all that are not present in the result from the first query. Duplicates are allowed here.
Expected result:
id email creation_date
1 bob#mail.com 2011-08-01 09:00:00
4 john#mail.com 2011-08-05 20:30:00
5 john#mail.com 2011-08-05 23:00:00
Any ideas? Performance is important because the real database is huge.
SELECT a.*
FROM users a
LEFT JOIN (SELECT email, MIN(creation_date) AS min_date FROM users GROUP BY email) x ON
(x.email = a.email AND x.min_date = a.creation_date)
WHERE x.email IS NULL
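Here is a minimal sketch of this LEFT JOIN ... IS NULL (anti-join) pattern on the sample data, run through Python's sqlite3 module as a stand-in for MySQL. The derived table reads from users, and the final ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, creation_date TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "bob#mail.com", "2011-08-01 09:00:00"),
        (2, "bob#mail.com", "2011-06-24 02:00:00"),
        (3, "john#mail.com", "2011-02-01 04:00:00"),
        (4, "john#mail.com", "2011-08-05 20:30:00"),
        (5, "john#mail.com", "2011-08-05 23:00:00"),
        (6, "jill#mail.com", "2011-08-01 00:00:00"),
    ],
)

# x holds the earliest creation_date per email. Rows of users that match x are
# the "first" registrations; rows with no match (x.email IS NULL) are the rest.
query = """
SELECT a.id
FROM users a
LEFT JOIN (SELECT email, MIN(creation_date) AS min_date
           FROM users
           GROUP BY email) x
       ON x.email = a.email AND x.min_date = a.creation_date
WHERE x.email IS NULL
ORDER BY a.id
"""
later_ids = [row[0] for row in conn.execute(query)]
print(later_ids)
```

Changing the last condition to WHERE x.email IS NOT NULL would return the first registration per address instead.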
In SQL Server we would do a SELECT statement using a rank.
Here are some MySQL samples:
How to perform grouped ranking in MySQL
http://thinkdiff.net/mysql/how-to-get-rank-using-mysql-query/
I hope this helps.