Could someone help me with this? I have a couple of tables with some data. I need to query these tables for the number of rows processed per day and load the result into another table:
Table1:
PNO ModelNo OrderNo CustID DAY
1 100012 1000AY 2345 31-AUG
2 109014 100YT8 3452 01-SEP
2 109014 100YT8 3452 31-AUG
Table2:
AN DAST CODE ROWS DAY
19 VEN EFD 19 31-AUG
21 EHT UYE 21 01-SEP
22 VEG WTE 24 01-SEP
Final Table:
DAY Source Rows
31-AUG Table1 2
01-SEP Table1 1
31-AUG Table2 1
01-SEP Table2 2
*Source should be the table name.
Should I use a temp table, or an inner query (subquery)? I would like to know which is more efficient. Please help.
Update: all of these tables are created under the same schema.
As Shaharyar suggests, the reasons for actually creating such a table are probably questionable at best. However, the query to create the resulting table:
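-- ROWS is a reserved word in MySQL 8.0, hence the backticks; UNION ALL skips a needless de-duplication pass
SELECT DAY, 'Table1' AS Source, COUNT(*) AS `Rows` FROM Table1 GROUP BY DAY
UNION ALL
SELECT DAY, 'Table2' AS Source, COUNT(*) AS `Rows` FROM Table2 GROUP BY DAY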
This doesn't scale very well to many tables, though. It would also be preferable to add indexes on the DAY columns.
If you actually want to generate a materialized table, this might do the job:
CREATE TABLE final_table SELECT ...
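A minimal sketch of the UNION ALL approach, using Python's built-in sqlite3 module as a stand-in for the real database (an assumption; MySQL backtick quoting becomes SQLite double quotes). It addresses the scaling concern by generating one SELECT per table from a list, and loads the result with a single INSERT ... SELECT, so no temp table is needed. The helper name rows_per_day_sql is hypothetical, and the sample data assumes Table1's second row is dated 01-SEP, as the expected output implies.

```python
import sqlite3

def rows_per_day_sql(tables):
    """Build one UNION ALL query counting rows per DAY for each table (hypothetical helper)."""
    # Table names cannot be bound as parameters, so they are interpolated;
    # this is only safe because they come from a trusted, hard-coded list.
    parts = [
        f"SELECT DAY, '{t}' AS Source, COUNT(*) AS \"Rows\" FROM \"{t}\" GROUP BY DAY"
        for t in tables
    ]
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (PNO INTEGER, ModelNo INTEGER, OrderNo TEXT, CustID INTEGER, DAY TEXT);
CREATE TABLE Table2 (AN INTEGER, DAST TEXT, CODE TEXT, "ROWS" INTEGER, DAY TEXT);
INSERT INTO Table1 VALUES (1, 100012, '1000AY', 2345, '31-AUG'),
                          (2, 109014, '100YT8', 3452, '01-SEP'),
                          (2, 109014, '100YT8', 3452, '31-AUG');
INSERT INTO Table2 VALUES (19, 'VEN', 'EFD', 19, '31-AUG'),
                          (21, 'EHT', 'UYE', 21, '01-SEP'),
                          (22, 'VEG', 'WTE', 24, '01-SEP');
CREATE TABLE final_table (DAY TEXT, Source TEXT, "Rows" INTEGER);
""")

# Load the per-day counts straight into the target table; no temp table needed.
conn.execute("INSERT INTO final_table " + rows_per_day_sql(["Table1", "Table2"]))
for row in conn.execute('SELECT DAY, Source, "Rows" FROM final_table'):
    print(row)
```

Adding another table then only means adding its name to the list.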
Related
I would like to know the most efficient way to insert records from one table to another based on whether the record has changed. Along with the insertion, an update would also need to be performed.
Some key notes: the most recent record has an endDate of 2100-12-31 to signify that it is open-ended. The strtDate is a copy of theTimestamp. I am working in the Snowflake SQL environment, and I am unable to use user-defined functions.
Suppose I have a Table1:
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp strtDate endDate
100 1 2 302.1 423.5 2001-07-13 2001-07-13 2100-12-31
101 3 6 506.4 236.7 2005-10-25 2005-10-25 2100-12-31
And I want to insert from Table2:
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302.1 423.5 2001-10-31
101 3 6 767.9 236.7 2006-12-05
The columns I want to check to decide whether a record has changed are checkVar1 and checkVar2. In this scenario, the record for ID=100 did not change in the insertion table (Table2), so I don't want to insert it. But ID=101 did change, so I want to insert this record.
Here is how Table1 should now look:
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp strtDate endDate
100 1 2 302.1 423.5 2001-07-13 2001-07-13 2100-12-31
101 3 6 506.4 236.7 2005-10-25 2005-10-25 *2006-12-05*
101 3 6 767.9 236.7 2006-12-05 2006-12-05 2100-12-31
As you can see, the endDate of the old record has been updated with the new record's theTimestamp. Then the new record is inserted as a continuation of the old record, taking on the 2100-12-31 endDate. So there needs to be both an UPDATE and an INSERT at the same time.
My Method:
WITH newTable2Rows AS (
SELECT DISTINCT ID, primKey1, primKey2
FROM Table2
)
, maxTable1Rows AS (
SELECT A.ID, A.primKey1, A.primKey2, A.checkVar1, A.checkVar2, A.theTimestamp, A.strtDate, MAX(A.endDate) AS endDate
FROM Table1 A
JOIN newTable2Rows B
ON A.ID = B.ID AND A.primKey1 = B.primKey1 AND A.primKey2 = B.primKey2
GROUP BY A.ID, A.primKey1, A.primKey2, A.checkVar1, A.checkVar2, A.theTimestamp, A.strtDate
)
INSERT INTO Table1 (
ID, primKey1, primKey2, checkVar1, checkVar2, theTimestamp, strtDate, endDate
)
SELECT
ID, primKey1, primKey2, checkVar1, checkVar2, theTimestamp, theTimestamp AS strtDate, '2100-12-31' AS endDate
FROM Table2
MINUS maxTable1Rows
There is a little bit of pseudocode at the end because I haven't completed it yet. Basically, I wanted to subtract the max Table1 rows from Table2 so that the unchanged rows are removed from the Table2 result. This will leave me with only the changed rows from Table2. After this, I will still need to update the endDate of the matching open (2100-12-31) rows in Table1 to the new record's theTimestamp.
The issue is that storing full rows in the maxTable1Rows CTE is very expensive. I am dealing with tables that contain 100 GB+ of data. The datasets I work with contain over 28 million records and 200+ columns. So I am looking for a method that can perform the UPDATE and INSERT as efficiently as possible. Any help would be greatly appreciated.
Isn't this just a simple use for the MERGE statement? See the Snowflake MERGE documentation.
The MERGE gives you full control to compare columns and do either inserts or updates based on your criteria.
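Because a single MERGE acts on each source row once (it either updates a matched row or inserts an unmatched one), closing the old version and inserting the new version for the same ID is often written as an UPDATE followed by an INSERT inside one transaction. The sketch below swaps in that two-statement technique rather than MERGE, and uses Python's sqlite3 module as a stand-in for Snowflake. It assumes Table2 holds at most one row per (ID, primKey1, primKey2), and it compares only checkVar1 and checkVar2, so no wide rows are copied into an intermediate result. SQLite's null-safe IS NOT plays the role of Snowflake's IS DISTINCT FROM. The function name apply_scd2 is hypothetical.

```python
import sqlite3

OPEN_END = "2100-12-31"  # endDate sentinel for the current version of a record

def apply_scd2(conn: sqlite3.Connection) -> None:
    """Close changed records in Table1 and insert their new versions from Table2."""
    with conn:  # one transaction: commits if both statements succeed, else rolls back
        # 1) Close the open version of every record whose check columns changed,
        #    using the new record's theTimestamp as its endDate.
        conn.execute(
            """
            UPDATE Table1
               SET endDate = (SELECT t2.theTimestamp FROM Table2 AS t2
                               WHERE t2.ID = Table1.ID
                                 AND t2.primKey1 = Table1.primKey1
                                 AND t2.primKey2 = Table1.primKey2)
             WHERE endDate = ?
               AND EXISTS (SELECT 1 FROM Table2 AS t2
                            WHERE t2.ID = Table1.ID
                              AND t2.primKey1 = Table1.primKey1
                              AND t2.primKey2 = Table1.primKey2
                              AND (t2.checkVar1 IS NOT Table1.checkVar1
                                   OR t2.checkVar2 IS NOT Table1.checkVar2))
            """,
            (OPEN_END,),
        )
        # 2) Insert every Table2 row whose key no longer has an open version:
        #    the records closed above, plus keys Table1 has never seen.
        conn.execute(
            """
            INSERT INTO Table1 (ID, primKey1, primKey2, checkVar1, checkVar2,
                                theTimestamp, strtDate, endDate)
            SELECT t2.ID, t2.primKey1, t2.primKey2, t2.checkVar1, t2.checkVar2,
                   t2.theTimestamp, t2.theTimestamp, ?
              FROM Table2 AS t2
             WHERE NOT EXISTS (SELECT 1 FROM Table1 AS t1
                                WHERE t1.ID = t2.ID
                                  AND t1.primKey1 = t2.primKey1
                                  AND t1.primKey2 = t2.primKey2
                                  AND t1.endDate = ?)
            """,
            (OPEN_END, OPEN_END),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER, primKey1 INTEGER, primKey2 INTEGER,
                     checkVar1 REAL, checkVar2 REAL,
                     theTimestamp TEXT, strtDate TEXT, endDate TEXT);
CREATE TABLE Table2 (ID INTEGER, primKey1 INTEGER, primKey2 INTEGER,
                     checkVar1 REAL, checkVar2 REAL, theTimestamp TEXT);
INSERT INTO Table1 VALUES
    (100, 1, 2, 302.1, 423.5, '2001-07-13', '2001-07-13', '2100-12-31'),
    (101, 3, 6, 506.4, 236.7, '2005-10-25', '2005-10-25', '2100-12-31');
INSERT INTO Table2 VALUES
    (100, 1, 2, 302.1, 423.5, '2001-10-31'),
    (101, 3, 6, 767.9, 236.7, '2006-12-05');
""")
apply_scd2(conn)
for row in conn.execute("SELECT * FROM Table1 ORDER BY ID, strtDate"):
    print(row)
```

Running the UPDATE first is what makes this work: afterwards, changed records have no open row, so the INSERT's NOT EXISTS test picks up exactly the changed records plus any brand-new keys, and a second run changes nothing.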
Here is a table
id date name
1 180101 josh
2 180101 peter
3 180101 julia
4 180102 robert
5 180103 patrick
6 180104 josh
7 180104 adam
I need to get all the names that share a date with 'josh'. How can I achieve this without grouping the whole table? I need to keep it efficient (this is not my real table; I just simplified my problem here. I have hundreds of thousands of records, and 99% of the rows have different dates, so rows that can be grouped by date are rare).
So basically, what I want is: if 'josh' is the target, I need to get 'josh,peter,julia,adam' (actually the first 10 distinct names sharing a date with josh).
SELECT
COUNT(date) AS datecount,
GROUP_CONCAT(DISTINCT name) AS names
FROM
tbl
GROUP BY
date
HAVING
datecount > 1
-- AND name IN ('josh') would work nicely for me, but I get an error because 'name' is not in the GROUP BY
LIMIT 10
Any ideas? As I mentioned, it needs to be fast, and most of the rows have unique dates.
Join the table with itself on date:
select distinct t1.name
from tbl t1
join tbl t2 using (date)
where t2.name = 'josh'
For the best performance you would have indexes on (name) and (date, name).
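A runnable sketch of the self-join, using Python's sqlite3 module and the sample rows. To honour the "first 10 distinct names" requirement, it replaces DISTINCT with GROUP BY t1.name plus ORDER BY MIN(t1.id) and LIMIT, which assumes "first" means lowest id. The function name names_sharing_date is hypothetical; the two indexes are the ones suggested above.

```python
import sqlite3

def names_sharing_date(conn, target, limit=10):
    """Names on any date on which `target` also appears (hypothetical helper)."""
    # t2 finds the target's rows; t1 finds every row on those same dates.
    # GROUP BY removes duplicate names; ORDER BY MIN(t1.id) assumes "first"
    # means lowest id, and LIMIT keeps only the first `limit` names.
    rows = conn.execute(
        """
        SELECT t1.name
          FROM tbl AS t1
          JOIN tbl AS t2 USING (date)
         WHERE t2.name = ?
         GROUP BY t1.name
         ORDER BY MIN(t1.id)
         LIMIT ?
        """,
        (target, limit),
    ).fetchall()
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (id INTEGER PRIMARY KEY, date INTEGER, name TEXT);
CREATE INDEX idx_name ON tbl (name);
CREATE INDEX idx_date_name ON tbl (date, name);
INSERT INTO tbl VALUES (1, 180101, 'josh'), (2, 180101, 'peter'),
                       (3, 180101, 'julia'), (4, 180102, 'robert'),
                       (5, 180103, 'patrick'), (6, 180104, 'josh'),
                       (7, 180104, 'adam');
""")
print(",".join(names_sharing_date(conn, "josh")))  # josh,peter,julia,adam
```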
I have a table A and a table B (and a table C, which is not really relevant). The relation is 1:n.
Table A
- id
- c_foreign_key
Table B
- id
- A_id
- datetime
Table A has about 400,000 entries, table B about 20 million.
I have a time range, let's say from 2014/01/01 to 2014/12/31.
What I want for each month in this range is:
Count all entries from table A, grouped by c_foreign_key, where the table A entry has no entries in table B between (month - 1 year) and month.
The Result should look like this:
date c_foreign_key count(*)
--------------------------------
14/01 1 2000
14/01 2 3000
...
14/02 1 4000
14/02 2 6000
...
I already tried a LEFT JOIN and a NOT IN subquery for each month, but the performance wasn't really good.
You should debug your SQL queries with EXPLAIN (more info in the MySQL EXPLAIN syntax documentation). You should also place indexes on your datetime fields for better performance. EXPLAIN is usually used to see which indexes MySQL uses for your query.
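A sketch of one way to get all twelve months from a single query instead of one query per month: generate the months with a recursive CTE, then keep each A row that has no B row in the trailing window via NOT EXISTS. It uses Python's sqlite3 module as a stand-in for MySQL (MySQL would use DATE_SUB/DATE_ADD instead of SQLite's date() modifiers), and it interprets "(month - 1 year) to month" as the half-open range from the first day of the month one year earlier up to the first day of the next month, which is an assumption. The composite index on B (A_id, datetime) is likewise an assumption about what each NOT EXISTS probe needs; the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical tiny data set; the real tables have ~400,000 and ~20 million rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, c_foreign_key INTEGER);
CREATE TABLE B (id INTEGER PRIMARY KEY, A_id INTEGER, datetime TEXT);
-- Composite index: each NOT EXISTS probe becomes a range scan on (A_id, datetime).
CREATE INDEX idx_b_aid_dt ON B (A_id, datetime);
INSERT INTO A VALUES (1, 1), (2, 1), (3, 2), (4, 2);
INSERT INTO B VALUES (1, 1, '2013-06-15 12:00:00'), (2, 3, '2014-03-10 08:00:00');
""")

MONTHLY_SQL = """
WITH RECURSIVE months(m) AS (
    SELECT date(:range_start, 'start of month')
    UNION ALL
    SELECT date(m, '+1 month') FROM months
     WHERE m < date(:range_end, 'start of month')
)
SELECT strftime('%Y-%m', months.m) AS month, A.c_foreign_key, COUNT(*) AS cnt
  FROM months
  CROSS JOIN A
 WHERE NOT EXISTS (
        -- Any B row for this A row inside [month start - 1 year, next month start)?
        SELECT 1 FROM B
         WHERE B.A_id = A.id
           AND B.datetime >= date(months.m, '-1 year')
           AND B.datetime <  date(months.m, '+1 month'))
 GROUP BY months.m, A.c_foreign_key
 ORDER BY months.m, A.c_foreign_key
"""

rows = conn.execute(
    MONTHLY_SQL, {"range_start": "2014-01-01", "range_end": "2014-12-31"}
).fetchall()
for month, key, cnt in rows:
    print(month, key, cnt)
```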
I have two tables with exactly the same structure/columns. I need to combine them without duplicates, based on the second column (title). The first column is the ID; IDs may be duplicated across the tables but should be ignored.
Example of database structure
id (primary key, auto increment), title (unique), description, link
Ex. Table 1
1 Bob thisisbob bob.com
2 Tom thisistom tom.com
3 Chad thisischad chad.com
Ex. Table 2
1 Chris thisischris chris.com
2 Chad thisischad chad.com
3 Dough thisisdough doug.com
What I need in Table 3
1 Bob thisisbob bob.com
2 Tom thisistom tom.com
3 Chad thisischad chad.com
4 Chris thisischris chris.com
5 Dough thisisdough doug.com
Both tables have about 5 million entries/rows each. I need the most efficient way possible to combine them.
It is a little hard to understand exactly what you want. Perhaps, though, this might be:
select *
from table1
union all
select *
from table2
where not exists (select 1 from table1 where table1.title = table2.title);
This will run faster with an index on table1(title); since title is declared UNIQUE, that index already exists.
EDIT:
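If you want to insert them into a third table, you can do the following. Note that SELECT * also copies the original id values, which overlap between the two tables, so the rows are not renumbered 1 to 5 as in the desired output: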
create table table3 as
select *
from table1
union all
select *
from table2
where not exists (select 1 from table1 where table1.title = table2.title);
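A sketch that produces the renumbered table3 from the question: create table3 with its own auto-assigned id, then insert only title, description and link, ordering by source table and original id so that the new ids come out as 1 to 5. It uses Python's sqlite3 module as a stand-in for MySQL, where the id column would be AUTO_INCREMENT; the src helper column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, title TEXT UNIQUE, description TEXT, link TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, title TEXT UNIQUE, description TEXT, link TEXT);
-- table3 assigns its own ids (SQLite's INTEGER PRIMARY KEY plays the role of AUTO_INCREMENT).
CREATE TABLE table3 (id INTEGER PRIMARY KEY, title TEXT UNIQUE, description TEXT, link TEXT);
INSERT INTO table1 VALUES (1, 'Bob', 'thisisbob', 'bob.com'),
                          (2, 'Tom', 'thisistom', 'tom.com'),
                          (3, 'Chad', 'thisischad', 'chad.com');
INSERT INTO table2 VALUES (1, 'Chris', 'thisischris', 'chris.com'),
                          (2, 'Chad', 'thisischad', 'chad.com'),
                          (3, 'Dough', 'thisisdough', 'doug.com');
""")

# Copy every table1 row, then only the table2 rows whose title is new.
conn.execute("""
INSERT INTO table3 (title, description, link)
SELECT title, description, link
  FROM (SELECT 1 AS src, id, title, description, link FROM table1
        UNION ALL
        SELECT 2 AS src, id, title, description, link FROM table2
         WHERE NOT EXISTS (SELECT 1 FROM table1 WHERE table1.title = table2.title))
 ORDER BY src, id  -- table1 rows first, so the new ids follow the order in the question
""")
for row in conn.execute("SELECT * FROM table3 ORDER BY id"):
    print(row)
```

The UNIQUE constraint on table3.title also means a duplicate title would make the INSERT fail rather than slip in silently.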
Question:
I have the table tbl_studentapplication as below, and the possible values for column IdProgram are 1, 2, 3 and 4:
idapplication ICNO IdProgram
1 123 1
2 345 2
3 123 3
4 345 4
5 1234 3
How can I fetch ICNO and COUNT(ICNO) from the table for IdProgram IN (3,4) only, i.e. without fetching an ICNO if that ICNO also belongs to IdProgram IN (1,2)?
I have tried using a subquery in MySQL, but because the table is big it takes a long time to execute and returns nothing (the table has almost 30 columns and one lakh, i.e. 100,000, rows).
output:
ICNO count(ICNO)
1234 1
Do you mean something like this?
SELECT ICNO, count(ICNO)
FROM tbl_studentapplication
WHERE ICNO NOT IN (
SELECT DISTINCT ICNO
FROM tbl_studentapplication
WHERE IdProgram IN (1,2)
)
GROUP BY ICNO
You might want to build a composite index on (IdProgram, ICNO) to make this query faster.
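A sketch of the same idea written with NOT EXISTS instead of NOT IN, using Python's sqlite3 module and the sample rows. One difference worth knowing: if the subquery returns any NULL ICNO, ICNO NOT IN (...) is never true, so the NOT IN version returns no rows at all, while NOT EXISTS is unaffected. The extra NULL row near the end exists only to show that; the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_studentapplication (
    idapplication INTEGER PRIMARY KEY, ICNO INTEGER, IdProgram INTEGER);
INSERT INTO tbl_studentapplication VALUES
    (1, 123, 1), (2, 345, 2), (3, 123, 3), (4, 345, 4), (5, 1234, 3);
-- Lets the NOT EXISTS probe look up (ICNO, IdProgram) pairs from the index.
CREATE INDEX idx_icno_program ON tbl_studentapplication (ICNO, IdProgram);
""")

NOT_EXISTS_SQL = """
SELECT a.ICNO, COUNT(*) AS cnt
  FROM tbl_studentapplication AS a
 WHERE a.IdProgram IN (3, 4)
   AND NOT EXISTS (SELECT 1 FROM tbl_studentapplication AS b
                    WHERE b.ICNO = a.ICNO AND b.IdProgram IN (1, 2))
 GROUP BY a.ICNO
"""

NOT_IN_SQL = """
SELECT ICNO, COUNT(ICNO)
  FROM tbl_studentapplication
 WHERE ICNO NOT IN (SELECT ICNO FROM tbl_studentapplication WHERE IdProgram IN (1, 2))
 GROUP BY ICNO
"""

print(conn.execute(NOT_EXISTS_SQL).fetchall())  # [(1234, 1)]
print(conn.execute(NOT_IN_SQL).fetchall())      # [(1234, 1)]

# A single row with a NULL ICNO in program 1 or 2 empties the NOT IN result.
conn.execute("INSERT INTO tbl_studentapplication VALUES (6, NULL, 1)")
print(conn.execute(NOT_EXISTS_SQL).fetchall())  # [(1234, 1)]
print(conn.execute(NOT_IN_SQL).fetchall())      # []
```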