Want distinct consecutive row - mysql

I have below table STORAGE_CAPACITY with me.
CREATE TABLE STORAGE_CAPACITY(DATE_TIME DATETIME,COL1 INT,COL2 INT,COL3 INT,COL4 INT);
INSERT INTO STORAGE_CAPACITY values(SYSDATE(),1,2,3,4);
INSERT INTO STORAGE_CAPACITY values(SYSDATE(),1,2,3,4);
INSERT INTO STORAGE_CAPACITY values(SYSDATE(),4,5,6,7);
INSERT INTO STORAGE_CAPACITY values(SYSDATE(),4,5,8,9);
INSERT INTO STORAGE_CAPACITY values(SYSDATE(),1,2,3,4);
SELECT * FROM storage_capacity;
What I want: if two consecutive rows have the same values in COL1 through COL4, I want only the older one.
If the same values appear again later, after a different row, I want that row too.
So my expected output is:
DATE_TIME, COL1, COL2, COL3, COL4
'2017-08-16 16:37:02', '1', '2', '3', '4'
'2017-08-16 16:37:18', '4', '5', '6', '7'
'2017-08-16 16:37:26', '4', '5', '8', '9'
'2017-08-16 16:37:57', '1', '2', '3', '4'

Assuming that the first column defines the ordering, you can use variables for this:
select t.*
from (select t.*,
(@rn := if(@cols = concat_ws(',', col1, col2, col3, col4), @rn + 1,
if(@cols := concat_ws(',', col1, col2, col3, col4), 1, 1)
)
) as rn
from t cross join
(select @cols := '', @rn := 0) params
order by t.date_time
) t
where rn = 1;
Note: To establish the order of inserts, it is safer to use an auto_increment column than a datetime column, because multiple rows can be inserted into the table with the same datetime value.
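On MySQL 8.0 or later, a window function is an alternative to the variables. The following is only a sketch, not the method from the answer above: it runs the query through Python's sqlite3 module so that it is self-contained (SQLite 3.25+ is needed for LAG), and the SELECT itself should carry over to MySQL 8.0. The id column and the timestamp of the second row are made up for the example, and COL1 to COL4 are assumed to be NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE storage_capacity (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- hypothetical column that records insert order
    date_time TEXT, col1 INT, col2 INT, col3 INT, col4 INT
);
INSERT INTO storage_capacity (date_time, col1, col2, col3, col4) VALUES
    ('2017-08-16 16:37:02', 1, 2, 3, 4),
    ('2017-08-16 16:37:10', 1, 2, 3, 4),  -- made-up timestamp for the repeated row
    ('2017-08-16 16:37:18', 4, 5, 6, 7),
    ('2017-08-16 16:37:26', 4, 5, 8, 9),
    ('2017-08-16 16:37:57', 1, 2, 3, 4);
""")

# Keep a row when it has no predecessor or differs from its predecessor
# in at least one of col1..col4 (columns assumed NOT NULL).
query = """
SELECT date_time, col1, col2, col3, col4
FROM (
    SELECT s.*,
           LAG(id)   OVER (ORDER BY id) AS prev_id,
           LAG(col1) OVER (ORDER BY id) AS p1,
           LAG(col2) OVER (ORDER BY id) AS p2,
           LAG(col3) OVER (ORDER BY id) AS p3,
           LAG(col4) OVER (ORDER BY id) AS p4
    FROM storage_capacity s
) x
WHERE prev_id IS NULL
   OR col1 <> p1 OR col2 <> p2 OR col3 <> p3 OR col4 <> p4
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ordering by an auto-increment id rather than by date_time follows the note above about rows that share the same datetime value.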

Related

MySQL Removing duplicates based on condition and multiple columns combinations

I have a table in MySQL as below:
ID, COL1, COL2, VALUE
'1', 'OBJ1', 'OBJ2', '5'
'2', 'OBJ1', 'OBJ2', '1'
'3', 'OBJ2', 'OBJ1', '3'
'4', 'OBJ3', 'OBJ1', '4'
'5', 'OBJ3', 'OBJ4', '6'
The relation between COL1 and COL2 is independent of position, i.e. OBJ1 in COL1 with OBJ2 in COL2 is the same as OBJ1 in COL2 with OBJ2 in COL1. In other words, OBJ1 and OBJ2 share a relationship.
So the pair OBJ1 and OBJ2 has the values 5, 1 and 3.
I want to keep each pair only once, i.e. OBJ1, OBJ2 should occur only once in the table, and OBJ2, OBJ1 should not occur either.
Importantly, I want to retain only the row with the HIGHEST value.
The result I want is thus:
ID, COL1, COL2, VALUE
'1', 'OBJ1', 'OBJ2', '5'
'4', 'OBJ3', 'OBJ1', '4'
'5', 'OBJ3', 'OBJ4', '6'
What is the best and efficient way of doing this? I have over 10 million rows.
I have searched in many forums/Google but cannot find the exact answer I am looking for..
Try this:
SELECT t1.ID, t1.COL1, t1.COL2, t1.VALUE
FROM mytable AS t1
JOIN (
SELECT LEAST(COL1, COL2) AS C1,
GREATEST(COL1, COL2) AS C2,
MAX(VALUE) AS max_Value
FROM mytable
GROUP BY LEAST(COL1, COL2),
GREATEST(COL1, COL2)
) AS t2 ON LEAST(t1.COL1, t1.COL2) = t2.C1 AND GREATEST(t1.COL1, t1.COL2) = t2.C2 AND t1.VALUE = t2.max_Value;
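To check the idea of comparing order-independent pairs against the sample data, here is a small self-contained sketch. It is an illustration only: it uses Python's sqlite3 module instead of MySQL, and because SQLite has no LEAST/GREATEST it normalizes each pair with CASE expressions in a CTE (called norm here, a made-up name). The SQL should also work on MySQL 8.0, which supports CTEs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INT, col1 TEXT, col2 TEXT, value INT);
INSERT INTO mytable VALUES
    (1, 'OBJ1', 'OBJ2', 5),
    (2, 'OBJ1', 'OBJ2', 1),
    (3, 'OBJ2', 'OBJ1', 3),
    (4, 'OBJ3', 'OBJ1', 4),
    (5, 'OBJ3', 'OBJ4', 6);
""")

# c1/c2 hold the pair in a fixed order, so (OBJ2, OBJ1) and (OBJ1, OBJ2)
# fall into the same group. The original col1/col2 are returned unchanged.
query = """
WITH norm AS (
    SELECT id, col1, col2, value,
           CASE WHEN col1 <= col2 THEN col1 ELSE col2 END AS c1,
           CASE WHEN col1 <= col2 THEN col2 ELSE col1 END AS c2
    FROM mytable
)
SELECT n.id, n.col1, n.col2, n.value
FROM norm n
JOIN (
    SELECT c1, c2, MAX(value) AS max_value
    FROM norm
    GROUP BY c1, c2
) m ON n.c1 = m.c1 AND n.c2 = m.c2 AND n.value = m.max_value
ORDER BY n.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two rows of the same pair tie on the highest value, this join returns both of them, so a tie-breaker (for example the lowest id) would still be needed in that case.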
You could use an IN clause with a grouped subselect.
To also handle the distinct pair combinations, you should first put the data into a consistent form,
so that the smaller value of each pair is always in COL1:
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
The query then becomes:
SELECT t1.ID, t1.COL1, t1.COL2, t1.VALUE
FROM (
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
) t1
where (COL1, COL2, value) in (
select col1, col2, max(value)
FROM (
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
) mytable
group by col1, col2
)
or using an inner join
SELECT t1.ID, t1.COL1, t1.COL2, t1.VALUE
FROM (
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
) t1
inner join
(
select col1, col2, max(value) as value
FROM (
select
id
, case when col1 <= col2 then col1 else col2 end COL1
, case when col1 > col2 then col1 else col2 end COL2
, value
from start_table
) mytable
group by col1, col2
) t2 on t1.COL1 = t2.COL1 and t1.COL2 = t2.COL2 and t1.value = t2.value
Rebuild the table so that no dups are allowed; in the process, get rid of the dups. (And get rid of the apparently useless id.)
CREATE TABLE new (
col1 ...,
col2 ...,
`value` ...,
PRIMARY KEY(col1, col2),
INDEX(col2, col1, `value`)
) ENGINE=InnoDB;
INSERT INTO new (col1, col2, `value`)
SELECT LEAST(col1, col2),
GREATEST(col1, col2),
`value`
FROM `real` -- `real` stands for your existing table; quoted because REAL is a reserved word
ON DUPLICATE KEY UPDATE
`value` := GREATEST(`value`, VALUES(`value`));
RENAME TABLE `real` TO old,
new TO `real`;
DROP TABLE old;
In the future, you will need this for INSERTing/UPDATEing new rows:
INSERT INTO new (col1, col2, `value`)
VALUES (?, ?, ?)
ON DUPLICATE KEY UPDATE
`value` := GREATEST(`value`, VALUES(`value`));
(This assumes you want to increase value whenever it is already in the table.)
These changes save space and improve speed (important for 10M rows): getting rid of id, having optimal indexes, using InnoDB, etc.

Error Code: 1060. Duplicate column name

I've been receiving Error Code: 1060:
Duplicate column name 'NULL'
Duplicate column name '2016-08-04 01:25:06'
Duplicate column name 'john'
I need to insert several fields with the same value, but MySQL rejects the statement with the errors above. Presumably it cannot select the same column name more than once; in that case, is there another way of writing the query? Below is my current code:
INSERT INTO test.testTable SELECT *
FROM (SELECT NULL, 'hello', 'john', '2016-08-04 01:25:06', 'john'
, '2016-08-04 01:25:06', NULL, NULL) AS tmp
WHERE NOT EXISTS (SELECT * FROM test.testTable WHERE message= 'hello' AND created_by = 'john') LIMIT 1
My Column:
(id, message, created_by, created_date, updated_by, updated_date, deleted_by, deleted_date)
Please assist, thanks.
Your duplicate column names are coming from your subquery. You select null, john, and 2016-08-04 01:25:06 multiple times. Provide the columns you are selecting with names/aliases:
INSERT INTO test.testTable
SELECT *
FROM (SELECT NULL as col1, 'hello' as col2,
'john' as col3, '2016-08-04 01:25:06' as col4,
'john' as col5, '2016-08-04 01:25:06' as col6,
NULL as col7, NULL as col8) AS tmp
WHERE NOT EXISTS (SELECT *
FROM test.testTable
WHERE message= 'hello' AND created_by = 'john')
LIMIT 1
I'm not sure LIMIT 1 is useful here, since you are only selecting a single row to potentially insert.
You are using a subquery. Because you don't give the columns aliases, MySQL has to choose aliases for you -- and it chooses the formulas used for the definition.
You can write the query without the subquery:
INSERT INTO test.testTable( . . .)
SELECT NULL, 'hello', 'john', '2016-08-04 01:25:06', 'john',
'2016-08-04 01:25:06', NULL, NULL
FROM dual
WHERE NOT EXISTS (SELECT 1
FROM test.testTable tt
WHERE tt.message = 'hello' AND tt.created_by = 'john'
);
If you do use a subquery in the SELECT, then use correlation clauses in the WHERE subquery:
INSERT INTO test.testTable( . . .)
SELECT *
FROM (SELECT NULL as col1, 'hello' as message, 'john' as created_by,
'2016-08-04 01:25:06' as date, 'john' as col2,
'2016-08-04 01:25:06' as col3, NULL as col4, NULL as col5
) t
WHERE NOT EXISTS (SELECT 1
FROM test.testTable tt
WHERE tt.message = t.message AND
tt.created_by = t.created_by
);
In addition, the LIMIT 1 isn't doing anything because you only have one row.
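As a quick check that the NOT EXISTS guard really makes the insert idempotent, here is a minimal sketch. It is not MySQL: it uses Python's sqlite3 module, a cut-down version of the table (only four of the eight columns), and a hypothetical helper named insert_if_absent. SQLite allows a SELECT with a WHERE clause and no FROM, so FROM dual is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE testTable (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message TEXT,
    created_by TEXT,
    created_date TEXT
)""")

def insert_if_absent(message, created_by, created_date):
    """Insert the row only if no row with the same message and creator exists."""
    conn.execute(
        """
        INSERT INTO testTable (message, created_by, created_date)
        SELECT ?, ?, ?
        WHERE NOT EXISTS (SELECT 1
                          FROM testTable tt
                          WHERE tt.message = ? AND tt.created_by = ?)
        """,
        (message, created_by, created_date, message, created_by),
    )

# The second call finds the existing row and inserts nothing.
insert_if_absent('hello', 'john', '2016-08-04 01:25:06')
insert_if_absent('hello', 'john', '2016-08-04 01:25:06')
total = conn.execute("SELECT COUNT(*) FROM testTable").fetchone()[0]
print(total)
```

Listing the target columns explicitly and passing the values as parameters also sidesteps the duplicate column name problem, because no derived table with unnamed columns is involved.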

SQL: Find top n in groupby data

I have a table like this:
id | name | surname | city|
-------------------------------
'1', 'mohit', 'garg', 'delhi'
'2', 'mohit', 'gupta', 'delhi'
'3', 'ankita', 'gupta', 'jaipur'
'4', 'ankita', 'garg', 'jaipur'
'5', 'vivek', 'garg', 'delhi'
I am looking for a query that returns (id,city) grouped by city, with at most two (id) per city, but without using nested queries.
Expected output:
'1', 'delhi'
'2', 'delhi'
'3', 'jaipur'
'4', 'jaipur'
Perhaps the only way without subqueries is to use a trick with substring_index() and group_concat():
select city, substring_index(group_concat(id), ',', 2)
from t
group by city;
This puts the ids in a comma-delimited list, rather than in separate rows. Also, you have to be careful about the size of the intermediate results.
Of course, the accepted practice would use either a subquery in the where clause or a subquery using variables.
EDIT:
Here is a method for getting two ids per city without listing the cities:
select city, min(id) as id
from t
group by city
union
select city, max(id)
from t
group by city;
You can do this with a LEFT OUTER JOIN, although using a subquery will probably be clearer and might be faster. Here's a method using the JOIN:
SELECT
T1.id,
T1.city
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON T2.city = T1.city AND T2.id <= T1.id
GROUP BY
T1.id,
T1.city
HAVING
COUNT(*) <= 2
You're effectively finding all rows in T1 for which the number of rows with the same city and an id less than or equal to its own is at most 2, which means that the row must be one of the first two rows for that city by id.
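To see the counting at work on the question's data, the self-join can be run as a small sketch. This uses Python's sqlite3 module rather than MySQL, and an ORDER BY has been added so the result order is deterministic; the table name My_Table is the placeholder from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE My_Table (id INT, name TEXT, surname TEXT, city TEXT);
INSERT INTO My_Table VALUES
    (1, 'mohit',  'garg',  'delhi'),
    (2, 'mohit',  'gupta', 'delhi'),
    (3, 'ankita', 'gupta', 'jaipur'),
    (4, 'ankita', 'garg',  'jaipur'),
    (5, 'vivek',  'garg',  'delhi');
""")

# For each T1 row, COUNT(*) is the number of rows in the same city whose id
# is <= T1.id, i.e. the position of the row within its city.
query = """
SELECT T1.id, T1.city
FROM My_Table T1
LEFT OUTER JOIN My_Table T2 ON T2.city = T1.city AND T2.id <= T1.id
GROUP BY T1.id, T1.city
HAVING COUNT(*) <= 2
ORDER BY T1.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Row 5 is dropped because three delhi rows (ids 1, 2 and 5) have an id less than or equal to 5.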
Try something like the following (T-SQL; #test is a temporary table):
create table
#test (id int, name varchar(10), name2 varchar(10),place varchar(10))
insert into #test
select
'1', 'mohit', 'garg', 'delhi'
union
select
'2', 'mohit', 'gupta', 'delhi'
union
select
'3', 'ankita', 'gupta', 'jaipur'
union
select
'4', 'ankita', 'garg', 'jaipur'
union
select
'5', 'vivek', 'garg', 'delhi'
with data
as
(
select ROW_NUMBER() OVER(PARTITION BY place ORDER BY id) RN,id,name,name2,place
from #test
),
data1
as(
select id, place
from data
where rn <=2
)
select * from data1

get count of all users for particular category and rank them

How do I get the count of each user's contributions (appearances) for a particular category? The table below has user and category columns. I am looking for a count of how many times each user has contributed/appeared in the table below, and I want to rank the users by that count.
http://sqlfiddle.com/#!2/d4458/2
CREATE TABLE if not exists tblA
(
id int(11) NOT NULL auto_increment ,
user varchar(255),
category int(255),
PRIMARY KEY (id)
);
INSERT INTO tblA (user, category ) VALUES
('1', '1'),
('1', '2'),
('1', '3'),
('1', '1'),
('2', '1'),
('2', '1');
Desired response, e.g. when searching for category '1':
user category count rank
1 1 2 1
2 1 2 2
SELECT USER,
category,
count(*) AS num
FROM tblA
WHERE category=1
GROUP BY USER,
category
ORDER BY num DESC;
demo: http://sqlfiddle.com/#!2/d4458/10/0
SET @prev_value = NULL;
SET @rank_count = 0;
SELECT
i.*,
CASE
WHEN @prev_value = i.num THEN @rank_count
ELSE @rank_count := @rank_count + 1
END AS rank
END AS rank
FROM (
SELECT
user,category,COUNT(*) AS num
FROM tblA
WHERE category=1
GROUP BY user,category
ORDER BY num DESC
) i;
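Where window functions are available (MySQL 8.0+), the ranking can also be written without variables. The sketch below is an assumption-laden illustration, not part of the answers above: it runs on SQLite through Python's sqlite3 module, names the rank column rnk (a made-up alias, chosen because RANK is a reserved word in MySQL 8.0), and breaks ties by user so that two users with the same count get ranks 1 and 2, as in the desired response.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblA (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user TEXT,
    category INT
);
INSERT INTO tblA (user, category) VALUES
    ('1', 1), ('1', 2), ('1', 3), ('1', 1), ('2', 1), ('2', 1);
""")

# Count per user in the inner query, then number the rows by descending count.
query = """
SELECT i.user, i.category, i.num,
       ROW_NUMBER() OVER (ORDER BY i.num DESC, i.user) AS rnk
FROM (
    SELECT user, category, COUNT(*) AS num
    FROM tblA
    WHERE category = 1
    GROUP BY user, category
) i
ORDER BY rnk
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If tied users should share a rank instead, RANK() or DENSE_RANK() ordered by num alone would be the functions to swap in.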

time difference between two similar values

I want to find the time difference between two function names in my database. The database looks like this:
What I want is the time difference between two consecutive rows that have the same function name. For example, the output would contain the difference for "getPrice" between row 2 and row 3, then for "getPrice" between row 3 and row 5, and so on for all other times and all other function names. Please help me, and thanks a lot!
I tried
SELECT a.lid, a.date, (b.date-a.date) as timeDifference
FROM myTable a
INNER JOIN myTable b ON b.lid = (a.lid+1)
ORDER BY a.lid ASC;
The problem is that it gives the time difference for any two consecutive rows, even if their function names are not identical!
@tombom
Here is a table I use for testing; it has different column names than the example I provided earlier. The table looks like this:
After applying your code (with the names changed to match this table), the output looks like this:
As you can see, "getTax" is subtracted from "getPrice" although they are different. How can I solve this problem? Thanks a lot.
The schema I'm trying to build is:
CREATE TABLE `test` (
`id` INT NOT NULL AUTO_INCREMENT ,
`nserviceName` VARCHAR(45) NULL ,
`functionName` VARCHAR(45) NULL ,
`time` TIMESTAMP NULL ,
`tps` INT NULL ,
`clientID` INT NULL ,
PRIMARY KEY (`id`) );
and the inserts are:
INSERT INTO `test` (`id`, `nserviceName`, `functionName`, `time`, `tps`, `clientID`) VALUES ('1', 'X1', 'getPrice', '2013-05-23 00:36:08', '22', '0');
INSERT INTO `test` (`id`, `nserviceName`, `functionName`, `time`, `tps`, `clientID`) VALUES ('2', 'X2', 'getTax', '2013-05-23 00:38:00', '33', '0');
INSERT INTO `test` (`id`, `nserviceName`, `functionName`, `time`, `tps`, `clientID`) VALUES ('3', 'X1', 'getPrice', '2013-05-23 00:35:00', '12', '0');
INSERT INTO `test` (`id`, `nserviceName`, `functionName`, `time`, `tps`, `clientID`) VALUES ('4', 'X1', 'getPrice', '2013-05-23 00:35:00', '11', '0');
INSERT INTO `test` (`id`, `nserviceName`, `functionName`, `time`, `tps`, `clientID`) VALUES ('5', 'X2', 'getTax', '2013-05-23 00:35:00', '88', '0');
INSERT INTO `test` (`id`, `nserviceName`, `functionName`, `time`, `tps`, `clientID`) VALUES ('6', 'X1', 'getPrice', '2013-05-23 00:35:00', '33', '0');
thanks.
@tombom
The operation I want to perform on the table is shown in the following image:
I start from the first record, X1 getPrice, which has no record before it, so no operation is required. Row 2, getTax, has only getPrice before it, which is not identical, so again no operation is performed. Row 3, getPrice, has getTax directly before it, so it skips that row and looks further up to find getPrice; here the time difference between getPrice (#3) and getPrice (#1) is computed. Next, getPrice at row 4 checks the rows above it and finds getPrice directly above, so the time difference between getPrice (#4) and getPrice (#3) is computed. Then getTax at row 5 checks the rows above it until it finds the same functionName (getTax) at row 2, and the time difference between getTax at row 5 and getTax at row 2 is computed.
thanks a lot..
Please have a try with this one:
SELECT lid, `date`, serviceName, functionName, responseTime, sid, timeDifference FROM (
SELECT
IF(@prevFname = functionName, SEC_TO_TIME(TIMESTAMPDIFF(SECOND, `date`, @prevDate)), 'functionName differs') AS timeDifference,
@prevFname := functionName AS a,
@prevDate := `date` AS b,
yt.*
FROM
yourTable yt
, (SELECT @prevFname:=NULL, @prevDate:=NULL) vars
ORDER BY functionName, `date`
) subquery_alias
I like to use user-defined variables in such cases, as I have had very good experiences with their performance, since no self-join is needed.
Also note that I used the timestampdiff function and sec_to_time to polish the output. Timestampdiff is the correct way to subtract different dates (and times). The only downside is that the result of sec_to_time is limited to the range of the TIME data type ('-838:59:59' to '838:59:59'). If this can lead to problems, remove the function again. Read more about both functions on this site.
UPDATE (less complicated than necessary):
SELECT lid, `date`, serviceName, functionName, responseTime, sid, timeDifference FROM (
SELECT
SEC_TO_TIME(TIMESTAMPDIFF(SECOND, @prevDate, `date`)) AS timeDifference,
@prevDate := `date` AS b,
yt.*
FROM
yourTable yt
, (SELECT @prevDate:=NULL) vars
ORDER BY lid
) subquery_alias
UPDATE 2:
This one resets the time difference to 00:00:00 when functionName differs from the previous one.
SELECT * /*choose here only the columns you need*/ FROM (
SELECT
IF(@prevFunction = functionName, SEC_TO_TIME(TIMESTAMPDIFF(SECOND, @prevDate, `time`)), '00:00:00') AS timeDifference,
@prevFunction := functionName AS a,
@prevDate := `time` AS b,
yt.*
FROM
test yt
, (SELECT @prevDate:=NULL, @prevFunction:=NULL) vars
ORDER BY id
) subquery_alias
UPDATE 3:
Okay, what a difficult birth. Just a minor tweak.
SELECT * /*choose here only the columns you need*/ FROM (
SELECT
IF(@prevFunction = functionName, SEC_TO_TIME(TIMESTAMPDIFF(SECOND, @prevDate, `time`)), '00:00:00') AS timeDifference,
@prevFunction := functionName AS a,
@prevDate := `time` AS b,
yt.*
FROM
test yt
, (SELECT @prevDate:=NULL, @prevFunction:=NULL) vars
ORDER BY functionName, id#, `time`
) subquery_alias
ORDER BY id
I order by function name and id again (or time if you prefer) in the subquery, do all the calculations, then sort it by id again in the outer query. That's it.
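The same "previous row with the same functionName" lookup can be expressed with LAG on servers that have window functions (MySQL 8.0+). What follows is a sketch under assumptions, not the method above: it runs on SQLite via Python's sqlite3 module, keeps only the id, functionName and time columns of the test table, and computes the difference in seconds in Python (the helper seconds_between is made up for the example). On MySQL the subtraction could stay in SQL with TIMESTAMPDIFF, as in the queries above.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER PRIMARY KEY, functionName TEXT, time TEXT);
INSERT INTO test VALUES
    (1, 'getPrice', '2013-05-23 00:36:08'),
    (2, 'getTax',   '2013-05-23 00:38:00'),
    (3, 'getPrice', '2013-05-23 00:35:00'),
    (4, 'getPrice', '2013-05-23 00:35:00'),
    (5, 'getTax',   '2013-05-23 00:35:00'),
    (6, 'getPrice', '2013-05-23 00:35:00');
""")

# prev_time is the time of the closest earlier row (by id) with the same
# functionName, or NULL when there is none.
query = """
SELECT id, functionName, time,
       LAG(time) OVER (PARTITION BY functionName ORDER BY id) AS prev_time
FROM test
ORDER BY id
"""

def seconds_between(prev, cur):
    """Return cur - prev in whole seconds, or None if there is no previous row."""
    if prev is None:
        return None
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(cur, fmt) - datetime.strptime(prev, fmt)
    return int(delta.total_seconds())

diffs = [(row[0], row[1], seconds_between(row[3], row[2]))
         for row in conn.execute(query)]
for item in diffs:
    print(item)
```

The negative values come from the sample data itself, where later ids carry earlier timestamps than row 1 and row 2.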