I have a table with the following columns:
SessionID - identifies the session an action belongs to (say 1-10 rows per SessionID)
ActionName - the name of the action
Time - the time the action occurred
I need to return a new table with the same columns that contains only one row per SessionID, IF that session had an action named either "a" or "b".
That is, my new table should have one row per SessionID that had the action "a" or "b".
I tried:
CREATE TABLE U_SessionID AS
SELECT DISTINCT SessionID FROM test1;
This copies the unique SessionIDs into a new table. Then:
SELECT U_SessionID.SessionID, test1.ActionName, test1.SessionID
FROM test1
INNER JOIN U_SessionID ON (SELECT SessionID
FROM test1
WHERE U_SessionID.SessionID = test1.SessionID AND (ActionName =
"a" OR ActionName = "b")
ORDER BY Time DESC
LIMIT 1);
But this code causes MySQL Workbench to hang (the query times out), and I have no idea whether it even works.
Sample data:
Can you think of a lighter query to do this?
Maybe a better approach would be to:
take all rows with action "a" or action "b":
SELECT * FROM test1 WHERE ActionName = "a" OR ActionName = "b";
then drop the duplicates based on SessionID only (the time order doesn't matter).
Ideas for that?
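For what it's worth, here is one minimal sketch of that second step, assuming MySQL 8.0+ for the window function (test1 and the column names are the ones from the question); it keeps each session's earliest matching row:
SELECT SessionID, ActionName, Time
FROM (
    SELECT SessionID, ActionName, Time,
           ROW_NUMBER() OVER (PARTITION BY SessionID ORDER BY Time) AS rn
    FROM test1
    WHERE ActionName IN ("a", "b")
) ranked
WHERE rn = 1;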
Is the time unique within a SessionID? If so:
SELECT test1.SessionID, test1.ActionName, test1.Time
FROM test1
INNER JOIN
(
    SELECT SessionID,
           MAX(Time) AS MaxTime
    FROM test1
    WHERE ActionName IN ("a", "b")
    GROUP BY SessionID
) sub0
ON test1.SessionID = sub0.SessionID
AND test1.Time = sub0.MaxTime
If Time isn't unique, but assuming the table has a unique column called id:
SELECT test1.SessionID, test1.ActionName, test1.Time
FROM test1
INNER JOIN
(
    SELECT SessionID,
           SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY Time DESC), ',', 1) AS MaxId
    FROM test1
    WHERE ActionName IN ("a", "b")
    GROUP BY SessionID
) sub0
ON test1.SessionID = sub0.SessionID
AND test1.id = sub0.MaxId
EDIT
If you just want a random record for each session id then you could (ab)use the GROUP BY clause. I really do not like this idea: while it probably does work, it might not, depending on the configuration of your MySQL database (ONLY_FULL_GROUP_BY rejects it), and even if it works today an upgrade or configuration change could stop it working. Even then there is no guarantee that it will bring back a single real record rather than a row whose columns are a mix of values from different rows.
So it is included here for completeness and to give you an option, but I strongly advise against using it:
SELECT U_SessionID.SessionID, test1.ActionName, test1.Time
FROM test1
INNER JOIN U_SessionID ON U_SessionID.SessionID = test1.SessionID
WHERE ActionName IN ("a" , "b")
GROUP BY U_SessionID.SessionID
If you want a simple solution that is more future-proof, then use my second solution and remove the ORDER BY clause within the GROUP_CONCAT.
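For reference, that change only touches the subquery; dropping the ORDER BY makes GROUP_CONCAT pick the ids in an arbitrary order, so the first one is effectively random (the MaxId alias is kept so the outer join on test1.id doesn't change):
SELECT SessionID,
       SUBSTRING_INDEX(GROUP_CONCAT(id), ',', 1) AS MaxId
FROM test1
WHERE ActionName IN ("a", "b")
GROUP BY SessionID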
You can change your query as below; the subquery is unnecessary here.
SELECT U_SessionID.SessionID, test1.ActionName, test1.SessionID
FROM test1
INNER JOIN U_SessionID ON U_SessionID.SessionID = test1.SessionID
WHERE ActionName IN ("a" , "b")
ORDER BY `Time` DESC
LIMIT 1;
Requirement:
We have a monthly process that creates Table1 and Table2.
1. Let's say we run our process in Jan; it creates Table2. Since Jan is the first run, Table1 does not exist yet, so the system should put everything from Table2 into Table1.
2. Now let's say we run our process in Feb; it creates Table2. The system should check whether Table1 exists. If yes, and (t2.Run_dt > t1.Insert_dt), then pick all IDs from Table2 which don't exist in Table1 and insert/append those records into Table1.
3. Now let's say we re-run our process in Feb; it creates Table2 again. The system should check whether Table1 exists. If yes, and (t2.Run_dt = t1.Insert_dt), then pick all IDs from Table2 which don't exist in Table1 and insert/append those records into Table1.
4. And so on...
I have these two tables:
Table1
ID Price Insert_Dt
----- ------- -------------
345 24.35 01-APR-2015
Table2
ID Price Run_Date
----- ------- -------------
345 24.35 01-MAY-2015
678 15.35 01-MAY-2015
I want to write a query to update Table1 using the logic below.
If Table2.Run_Date >= Table1.Insert_Dt
and Table2 minus Table1 = records found,
then insert the new records into Table1.
If Table2.Run_Date >= Table1.Insert_Dt
and Table2 minus Table1 = no records found,
then do nothing.
Else do nothing.
DECLARE
    nCount   NUMBER;
    mCount   NUMBER;
    maxRunDt DATE;
    maxInsDt DATE;
BEGIN
    select count(*) into nCount from dba_tables where table_name = 'TABLE1';
    if nCount > 0 then
        select max(b.Run_Date) into maxRunDt from Table2 b;
        select max(a.Insert_Dt) into maxInsDt from Table1 a;
        if maxRunDt >= maxInsDt then
            -- count the IDs in Table2 that are not yet in Table1
            select count(*) into mCount
            from Table2 c
            where c.ID not in (select d.ID from Table1 d);
            if mCount > 0 then
                insert /*+ append */ into Table1 (ID, Price, Insert_Dt)
                select c.ID, c.Price, c.Run_Date
                from Table2 c
                where c.ID not in (select d.ID from Table1 d);
            end if;
        end if;
    end if;
END;
I think this does what you described, but I have a couple of questions:
insert into table1
with max_run as (
select id, max (insert_dt) as max_date
from table1
group by id
)
select
t2.id, t2.price, t2.run_date
from
table2 t2,
max_run t1
where
t2.id = t1.id (+) and
-- (t1.id is null or t2.run_date >= t1.max_date) changed below, per edit
(t1.id is null or t2.run_date > t1.max_date)
Are you sure you want >= insert_dt and not > insert_dt? Having >= means we keep inserting those records over and over, even though they are not really new.
Did you want to insert new records or update existing records with the new info? Your question said you wanted to insert them, and I didn't see any update code in your snippet, so I kept it as an insert.
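For readability, here is the same insert rewritten with ANSI join syntax; this is only a restyling of the query above (same tables and columns, and the > comparison from the edit):
INSERT INTO table1
WITH max_run AS (
    SELECT id, MAX(insert_dt) AS max_date
    FROM table1
    GROUP BY id
)
SELECT t2.id, t2.price, t2.run_date
FROM table2 t2
LEFT JOIN max_run t1 ON t2.id = t1.id
WHERE t1.id IS NULL OR t2.run_date > t1.max_date;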
I'm looking to clone a value in a database to other rows that share the same ID.
Each row has an ID, plus a separate column holding the value I'm looking to duplicate. I'm trying to find any row with data in this column, and clone that value to any other rows with the same ID.
Example:
ID COLUMNNAME
1 Test 1
1
2
2 Test 2
3
3
In this case, Test 1 would clone to the line below, and Test 2 would clone to the line above, while ID 3 would stay blank.
I have:
SELECT `columnname`, `id`
FROM `table`
WHERE `columnname` <> ''
AND `id` = `id`
written up to find the entries with data, but I'm unsure where to go from here as I'm still very new to MySQL.
You can make a self-join with the multiple-table UPDATE syntax:
UPDATE my_table t1 JOIN my_table t2 ON t2.ID = t1.ID AND t2.columnname <> ''
SET t1.columnname = t2.columnname
See it on sqlfiddle.
Or with a subquery (this one uses SQL Server syntax):
Update t1
Set t1.ColumnName = (Select Min(t2.ColumnName) From my_table t2 Where t2.ID = t1.ID And t2.ColumnName <> '')
FROM dbo.my_table t1
Where t1.ColumnName = '' Or t1.ColumnName IS NULL
Fiddle: http://sqlfiddle.com/#!3/5fb81/5
But, the self-join solution is much cleaner and faster.
Right now I am using something like this to delete duplicates in a MySQL table:
delete t2 from my_table1 as t1, my_table1 as t2 where
t1.TestCase = t2.TestCase and t2.id > t1.id;
Say I have a structure like this:
ID TestCase Result
1 T1 PASS
2 T2 FAIL
3 T3 FAIL
4 T3 PASS
Now, in the above case T3 is a duplicate entry, and if I use the SQL I mentioned above, it would delete the 4th row, where the result is PASS; but that is the row I want to keep, and I want row 3, which is FAIL, to be deleted instead.
Any help please?
Thank you.
If I understand correctly, in case of duplicates you want to delete the "FAIL" rows and not the "PASS" ones? In that case you can use the following query:
delete t2 from my_table1 as t1, my_table1 as t2 where
t1.TestCase = t2.TestCase and t2.id != t1.id and t2.Result='FAIL';
But what do you want to do when all the duplicates have "FAIL" in their Result column? With the query above, both will be removed. Do you want to keep one in that case?
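If the answer to that is "keep one", here is a hedged sketch (it assumes MySQL 8.0+ for ROW_NUMBER, and the my_table1 / id / TestCase / Result names used above) that keeps exactly one row per TestCase, preferring PASS and then the lowest id:
DELETE FROM my_table1
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY TestCase
                   ORDER BY (Result = 'PASS') DESC, id
               ) AS rn
        FROM my_table1
    ) AS ranked  -- the extra derived table lets MySQL delete from the same table it reads
    WHERE rn > 1
);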
What's the best way to delete duplicate records in a MySQL database, using Rails or MySQL queries?
What you can do is copy the distinct records into a new table; in MySQL:
CREATE TABLE NewTable AS SELECT DISTINCT * FROM MyTable;
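A sketch of completing the swap (NewTable and MyTable are the placeholder names above; note that CREATE TABLE ... AS SELECT does not copy indexes or the primary key, so re-add those on NewTable first if you need them):
RENAME TABLE MyTable TO MyTable_old, NewTable TO MyTable;
DROP TABLE MyTable_old;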
Here's another idea in no particular language:
rs = `select a, b, count(*) as c from entries group by 1, 2 having c > 1`
rs.each do |a, b, c|
`delete from entries where a=#{a} and b=#{b} limit #{c - 1}`
end
Edit:
Kudos to Olaf for that "having" hint :)
Well, if it's a small table, from the Rails console you can do:
class ActiveRecord::Base
  def non_id_attributes
    atts = self.attributes
    atts.delete('id')
    atts
  end
end

duplicate_groups = YourClass.find(:all).group_by { |element| element.non_id_attributes }.select { |gr| gr.last.size > 1 }
redundant_elements = duplicate_groups.map { |group| group.last - [group.last.first] }.flatten
redundant_elements.each(&:destroy)
Check for duplicate entries:
SELECT DISTINCT(req_field) AS field, COUNT(req_field) AS fieldCount FROM
table_name GROUP BY req_field HAVING fieldCount > 1
Remove the duplicate entries:
DELETE FROM table_name
USING table_name, table_name AS vtable
WHERE
(table_name.id > vtable.id)
AND (table_name.req_field = vtable.req_field)
Replace req_field and table_name with your own names; it should work without any issues.
New to SQL :-)
This is a classic question - often asked in interviews:-)
I don't know whether it'll work in MySQL, but it works in most databases -
> create table t(
> a char(2),
> b char(2),
> c smallint )
> select a,b,c,count(*) from t
> group by a,b,c
> having count(*) > 1
a b c
-- -- ------ -----------
(0 rows affected)
> insert into t values ("aa","bb",1)
(1 row affected)
> insert into t values ("aa","bb",1)
(1 row affected)
> insert into t values ("aa","bc",1)
(1 row affected)
> select a,b,c,count(*) from t group by a,b,c having count(*) > 1
a b c
-- -- ------ -----------
aa bb 1 2
(1 row affected)
If you have a PK (id) in the table (EMP) and want to delete duplicate records on the name column, keeping the row with the lowest id, the following query may be a good approach for large data:
DELETE t3
FROM (
SELECT t1.name, t1.id
FROM (
SELECT name
FROM EMP
GROUP BY name
HAVING COUNT(name) > 1
) AS t0 INNER JOIN EMP t1 ON t0.name = t1.name
) AS t2 INNER JOIN EMP t3 ON t3.name = t2.name
WHERE t2.id < t3.id;
Suppose we have a table named tbl_product and there are duplicates in the fields p_pi_code and p_nats_id.
First create a new table and insert the data from the existing table,
i.e. from tbl_product into newtable1, and then, if needed, from newtable1 into newtable2:
CREATE TABLE `newtable2` (
`p_id` int(10) unsigned NOT NULL auto_increment,
`p_status` varchar(45) NOT NULL,
`p_pi_code` varchar(45) NOT NULL,
`p_nats_id` mediumint(8) unsigned NOT NULL,
`p_is_special` tinyint(4) NOT NULL,
PRIMARY KEY (`p_id`)
) ENGINE=InnoDB;
INSERT INTO newtable1 (p_status, p_pi_code, p_nats_id, p_is_special) SELECT
p_status, p_pi_code, p_nats_id, p_is_special FROM tbl_product group by p_pi_code;
INSERT INTO newtable2 (p_status, p_pi_code, p_nats_id, p_is_special) SELECT
p_status, p_pi_code, p_nats_id, p_is_special FROM newtable1 group by p_nats_id;
After that, all the duplicates in those fields have been removed.
I had to do this recently on Oracle, but the steps would have been the same on MySQL. It was a lot of data, at least compared to what I'm used to working with, so my process to de-dup was comparatively heavyweight. I'm including it here in case someone else comes along with a similar problem.
My duplicate records had different IDs, different updated_at times, possibly different updated_by IDs, but all other columns the same. I wanted to keep the most recently updated of any duplicate set.
I used a combination of Rails logic and SQL to get it done.
Step one: run a rake script to identify the IDs of the duplicate records, using model logic. IDs go in a text file.
Step two: create a temporary table with one column, the IDs to delete, loaded from the text file.
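A minimal sketch of that step (the temp_duplicate_ids name matches the queries below; the file path and loading method are just assumptions, e.g. LOAD DATA on MySQL or SQL*Loader / an external table on Oracle):
-- one-column table holding the ids identified by the rake script
CREATE TABLE temp_duplicate_ids (id INT PRIMARY KEY);
-- hypothetical load on MySQL: one id per line in the text file
LOAD DATA INFILE '/tmp/duplicate_ids.txt' INTO TABLE temp_duplicate_ids;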
Step three: create another temporary table with all the records I'm going to delete (just in case!).
CREATE TABLE temp_duplicate_models
AS (SELECT * FROM models
WHERE id IN (SELECT * FROM temp_duplicate_ids));
Step four: actual deleting.
DELETE FROM models WHERE id IN (SELECT * FROM temp_duplicate_ids);
You can use:
http://lenniedevilliers.blogspot.com/2008/10/weekly-code-find-duplicates-in-sql.html
to get the duplicates and then just delete them via Ruby code or SQL code (I would do it in SQL, but that's up to you :-)
If your table has a PK (or you can easily give it one), you can specify any number of columns in the table that must be equal to qualify a row as a duplicate, with the following query (it may look a bit messy, but it works):
DELETE FROM table WHERE pk_id IN(
SELECT DISTINCT t3.pk_id FROM (
SELECT t1.* FROM table AS t1 INNER JOIN (
SELECT col1, col2, col3, col4, COUNT(*) FROM table
GROUP BY col1, col2, col3, col4 HAVING COUNT(*)>1) AS t2
ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3 AND
t1.col4 = t2.col4)
AS t3, (
SELECT t1.* FROM table AS t1 INNER JOIN (
SELECT col1, col2, col3, col4, COUNT(*) FROM table
GROUP BY col1, col2, col3, col4 HAVING COUNT(*)>1) AS t2
ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3 AND
t1.col4 = t2.col4)
AS t4
WHERE t3.col1 = t4.col1 AND t3.col2 = t4.col2 AND t3.col3 = t4.col3 AND
t3.col4 = t4.col4 AND t3.pk_id > t4.pk_id
)
This will leave the first record entered into the database, deleting the 'newest' duplicates. If you want to keep the last record, switch the > to <.
In MySQL, when I put something like
delete from A where IDA in (select IDA from A)
MySQL said something like "you can't use the same table in the select part of the delete operation."
I just had to delete some duplicate records, and I succeeded with a .php program like this:
<?php
...
$res = hacer_sql("SELECT MIN(IDESTUDIANTE) AS IDTODELETE
    FROM `estudiante`
    GROUP BY `LASTNAME`, `FIRSTNAME`, `CI`, `PHONE`
    HAVING COUNT(*) > 1");
while ( $reg = mysql_fetch_assoc($res) ) {
    hacer_sql("DELETE FROM estudiante WHERE IDESTUDIANTE = {$reg['IDTODELETE']}");
}
?>
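For reference, the same-table restriction can also be worked around in plain SQL by wrapping the subquery in an extra derived table, which MySQL materializes first (a sketch using the same estudiante columns):
DELETE FROM estudiante
WHERE IDESTUDIANTE IN (
    SELECT IDTODELETE FROM (
        SELECT MIN(IDESTUDIANTE) AS IDTODELETE
        FROM estudiante
        GROUP BY LASTNAME, FIRSTNAME, CI, PHONE
        HAVING COUNT(*) > 1
    ) AS to_delete
);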
I am using ALTER TABLE:
ALTER IGNORE TABLE jos_city ADD UNIQUE INDEX(`city`);
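Note that ALTER IGNORE TABLE was removed in MySQL 5.7; on newer versions one equivalent (a sketch, reusing the jos_city table above) is to deduplicate into a copy that has the unique index and then swap it in:
CREATE TABLE jos_city_dedup LIKE jos_city;
ALTER TABLE jos_city_dedup ADD UNIQUE INDEX (city);
-- INSERT IGNORE silently skips rows that would violate the unique index
INSERT IGNORE INTO jos_city_dedup SELECT * FROM jos_city;
RENAME TABLE jos_city TO jos_city_old, jos_city_dedup TO jos_city;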
I used #krukid's answer above to do the following on a table with around 70,000 entries:
rs = 'select a, b, count(*) as c from table_name group by 1, 2 having c > 1'
# returns an array of row hashes
dups = MyModel.connection.select_all(rs)
# convert to an array of [a, b, count] triples
dupsarr = dups.map { |i| [i['a'], i['b'], i['c']] }
# delete all but one row from each duplicate group
dupsarr.each do |a, b, c|
  ActiveRecord::Base.connection.execute("delete from table_name where a=#{MyModel.sanitize(a)} and b=#{MyModel.sanitize(b)} limit #{c - 1}")
end
Here is the Rails solution I came up with. It may not be the most efficient, but that's not a big deal if it's a one-time migration.
distinct_records = MyTable.all.group(:distinct_column_1, :distinct_column_2).map { |mt| mt.id }
duplicates = MyTable.all.to_a.reject { |mt| distinct_records.include? mt.id }
duplicates.each(&:destroy)
First, it groups by all the columns that determine uniqueness; the example shows two, but you could have more or fewer.
Second, it selects the inverse of that group: all the other records.
Third, it deletes all those records.
You could first GROUP BY the column on which you want to delete duplicates, but I am not doing it with GROUP BY; I am writing a self-join instead.
You don't need to create a temporary table.
Delete all duplicates except one record:
The table should have an auto increment column.
A possible solution that I've just come across:
DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
if you want to keep the row with the lowest auto increment id value OR
DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
if you want to keep the row with the highest auto increment id value.
You can cross-check your solution by searching for duplicates again:
SELECT name, COUNT(name) FROM `names` GROUP BY name HAVING COUNT(name) > 1;
If it returns 0 rows, then your query was successful.