I am trying to select duplicate records based on a match of three columns. The list of triples could be very long (1000), so I would like to make it concise.
When I have a list of size 10 (known duplicates) it only matches 2 (seemingly random ones) and misses the other 8. I expected 10 records to return, but only saw 2.
I've narrowed it down to this problem:
This returns one record. Expecting 2:
select *
from ali
where (accountOid, dt, x) in
(
(64, '2014-03-01', 10000.0),
(64, '2014-04-23', -122.91)
)
Returns two records, as expected:
select *
from ali
where (accountOid, dt, x) in ( (64, '2014-03-01', 10000.0) )
or (accountOid, dt, x) in ( (64, '2014-04-23', -122.91) )
Any ideas why the first query only returns one record?
I'd suggest you don't use IN() for this; instead, use a WHERE EXISTS query, e.g.:
CREATE TABLE inlist
(`id` int, `accountOid` int, `dt` datetime, `x` decimal(18,4))
;
INSERT INTO inlist
(`id`, `accountOid`, `dt`, `x`)
VALUES
(1, 64, '2014-03-01 00:00:00', 10000.0),
(2, 64, '2014-04-23 00:00:00', -122.91)
;
select *
from ali
where exists ( select null
from inlist
where ali.accountOid = inlist.accountOid
and ali.dt = inlist.dt
and ali.x = inlist.x
)
;
I was able to reproduce a problem (compare http://sqlfiddle.com/#!2/7d2658/6 to http://sqlfiddle.com/#!2/fe851/1, both MySQL 5.5.3): when the x column was numeric and the value was negative, the row was NOT matched using IN(), but it was matched, with x as either numeric or decimal, when using a table and WHERE EXISTS.
Perhaps not a conclusive test but personally I wouldn't have used IN() for this anyway.
Why are you not determining the duplicates this way?
select
accountOid
, dt
, x
from ali
group by
accountOid
, dt
, x
having
count(*) > 1
Then use that as a derived table within the where exists condition:
select *
from ali
where exists (
select null
from (
select
accountOid
, dt
, x
from ali
group by
accountOid
, dt
, x
having
count(*) > 1
) as inlist
where ali.accountOid = inlist.accountOid
and ali.dt = inlist.dt
and ali.x = inlist.x
)
see http://sqlfiddle.com/#!2/ede292/1 for the query immediately above
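As a rough way to try the GROUP BY ... HAVING plus WHERE EXISTS pattern without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL here, so it says nothing about the IN() behaviour described above; the sample rows and the id column are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table `ali`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ali (id INTEGER PRIMARY KEY, accountOid INTEGER, dt TEXT, x NUMERIC)"
)

# Hypothetical sample data: two duplicated triples and one unique row.
rows = [
    (64, "2014-03-01", 10000.0),
    (64, "2014-03-01", 10000.0),
    (64, "2014-04-23", -122.91),
    (64, "2014-04-23", -122.91),
    (65, "2014-05-02", 42.0),
]
conn.executemany("INSERT INTO ali (accountOid, dt, x) VALUES (?, ?, ?)", rows)

# Same shape as the query above: a derived table of duplicated triples,
# correlated back to the base table with WHERE EXISTS.
DUPLICATES_SQL = """
SELECT *
FROM ali
WHERE EXISTS (
    SELECT NULL
    FROM (
        SELECT accountOid, dt, x
        FROM ali
        GROUP BY accountOid, dt, x
        HAVING COUNT(*) > 1
    ) AS inlist
    WHERE ali.accountOid = inlist.accountOid
      AND ali.dt = inlist.dt
      AND ali.x = inlist.x
)
ORDER BY id
"""

duplicate_rows = conn.execute(DUPLICATES_SQL).fetchall()
for row in duplicate_rows:
    print(row)
```

Every row that belongs to a duplicated (accountOid, dt, x) triple comes back, including the ones with a negative x; the unique row is left out.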
Related
I want to use a simple query to decrement a value in a table like so:
UPDATE `Table`
SET `foo` = `foo` - 1
WHERE `bar` IN (1, 2, 3, 4, 5)
This works great in examples such as the above, where the IN list contains only unique values, so each matching row has its foo column decremented by 1.
The problem is when the list contains duplicates, for example:
UPDATE `Table`
SET `foo` = `foo` - 1
WHERE `bar` IN (1, 3, 3, 3, 5)
In this case I would like the row where bar is 3 to be decremented three times (or by three), and 1 and 5 to be decremented by 1.
Is there a way to change the behaviour, or an alternative query that I can use where I can get the desired behaviour?
I'm specifically using MySQL 5.7, in case there are any MySQL specific workarounds that are helpful.
Update: I'm building the query in a scripting language, so feel free to provide solutions that perform any additional processing prior to running the query (perhaps as pseudo code, to be as useful to as many as possible?). I don't mind doing it this way, I just want to keep the query as simple as possible while giving the expected result.
If you can process your original list first to get the counts, you could dynamically construct this kind of query:
UPDATE `Table`
SET `foo` = `foo` - CASE `bar` WHEN 1 THEN 1 WHEN 3 THEN 3 WHEN 5 THEN 1 ELSE 0 END
WHERE `bar` IN (1, 3, 5)
;
Note: the ELSE is just being thorough/paranoid; the WHERE should prevent it from ever getting that far.
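Since the query is being built in a scripting language anyway, the counting step could look something like the sketch below. It is only an illustration: build_decrement_query is a hypothetical helper name, and the table and column names are copied from the question. Values are coerced to int before being placed in the SQL text so that nothing else can end up in the statement.

```python
from collections import Counter


def build_decrement_query(values):
    """Build the UPDATE ... CASE statement from a list that may contain duplicates."""
    # Count how many times each bar value occurs; int() rejects non-numeric input.
    counts = Counter(int(v) for v in values)
    if not counts:
        raise ValueError("at least one value is required")

    keys = sorted(counts)
    # One WHEN branch per distinct value, decrementing by its occurrence count.
    cases = " ".join(f"WHEN {bar} THEN {counts[bar]}" for bar in keys)
    in_list = ", ".join(str(bar) for bar in keys)
    return (
        "UPDATE `Table` "
        f"SET `foo` = `foo` - CASE `bar` {cases} ELSE 0 END "
        f"WHERE `bar` IN ({in_list})"
    )


query = build_decrement_query([1, 3, 3, 3, 5])
print(query)
```

For the list in the question this produces the same statement as the hand-written one above, on a single line.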
Here is an example that might be useful for your purpose (note that it uses SQL Server temp-table syntax rather than MySQL):
create table #temp (value int)
create table #mainTable (id int, mainValue int)
insert into #temp (value) values (1),(3),(3),(3),(4)
insert into #mainTable values (1,5),(2,5),(3,5),(4,5)
select value,count(*) as AddValue
into #otherTemp
from #temp t
group by value
update m
set mainValue = m.mainValue+ ot.AddValue
from #otherTemp ot
inner join #mainTable m on m.id=ot.value
select * from #mainTable
This is a little tricky, but you can do it by aggregating first:
update `Table` t join
(select bar, count(*) as factor
from (select 1 as bar union all select 3 as bar union all select 3 as bar union all select 3 as bar union all select 5
) b
group by bar
) b
on t.bar = b.bar
set t.foo = t.foo - b.factor;
When I compare a float value in a WHERE clause, it does not give the expected results.
For example:
SELECT * FROM users WHERE score = 0.61
In this query, score is a column of type double.
The query works if I check for a score of 0.50, but searching for 0.61 finds nothing, even though I have records with 0.61 too.
The query also works if I use:
SELECT * FROM users WHERE trim(score) = 0.61
I suggest you use DECIMAL instead of FLOAT, since your values only have 2 decimal places.
See the MySQL documentation on the DECIMAL type for how to use it.
I hope this solves your problem.
If you did not specify the decimal range for your float column, the comparison will not work without a cast or trim.
This works fine:
-- drop table test_float;
create table test_float(f float(6,4) , d DECIMAL(4,2));
insert into test_float values (0.5,0.5);
insert into test_float values (0.61,0.61);
select * from test_float where f = d;
select * from test_float where f = 0.61;
This doesn't work:
drop table test_float;
create table test_float(f float , d DECIMAL(4,2));
insert into test_float values (0.5,0.5);
insert into test_float values (0.61,0.61);
select * from test_float;
select * from test_float where f = d;
select * from test_float where f = 0.61;
select * from test_float where CAST(f as DECIMAL(16,2)) = 0.61;
It works when the decimal range is 1.
Why, I really don't know.
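One likely explanation for the failing case, assuming an unqualified MySQL FLOAT is stored as a 4-byte single-precision value: 0.61 has no exact binary representation, so the single-precision value that gets stored is not the same number as the double-precision 0.61 it is compared with, whereas 0.5 is exactly representable in both. The sketch below reproduces that effect in Python, using struct to round a value to single precision; as_float32 is a helper name made up for this illustration, and the sketch does not explain why the float(6,4) variant matches.

```python
import struct


def as_float32(value):
    """Round a Python float (a double) to single precision and widen it back."""
    return struct.unpack("f", struct.pack("f", value))[0]


stored_half = as_float32(0.5)   # 0.5 is a power of two, so nothing is lost
stored_061 = as_float32(0.61)   # 0.61 is rounded differently at each precision

print(stored_half == 0.5)       # exact comparison on 0.5
print(stored_061 == 0.61)       # exact comparison on 0.61
print(repr(stored_061))         # the widened single-precision value

# Comparing within a small tolerance (or using DECIMAL) avoids the problem.
print(abs(stored_061 - 0.61) < 1e-6)
```

The first comparison succeeds and the second does not, which matches the behaviour in the test above: 0.5 is found, 0.61 is not. Casting to DECIMAL, as in the last query above, or comparing within a tolerance sidesteps the exact binary comparison.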
I need to calculate the TIMEDIFF between a row and the row whose field dateCompleted is the last one just before this one and then get the value as timeSinceLast.
I can do this easily as a subquery but it's very slow. (About 12-15 times slower than a straight query on the table for just the rows).
#Very slow
Select a.*, TIMEDIFF(a.dateCompleted, (SELECT a2.dateCompleted FROM action a2 WHERE a2.dateCompleted < a.dateCompleted ORDER BY a2.dateCompleted DESC LIMIT 1)) as timeSinceLast
FROM action a;
I tried doing it as a join with itself but couldn't figure out how to get that to work, as I don't know how to apply a LIMIT 1 to the joined table rather than to the query as a whole.
#How to limit the joined table only?
SELECT a.*, TIMEDIFF(a.dateCompleted, a2.dateCompleted)
FROM action a
LEFT JOIN action a2 on a2.dateCompleted < a.dateCompleted
LIMIT 1;
Is this possible in MySQL?
EDIT: Schema and data
http://sqlfiddle.com/#!9/03b5c/3
create table Actions
(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
dateCompleted datetime not null
);
#Notice, they can come out of order.
# The third one would affect the first one in my query as
# it's the first completed date right after the first
insert into Actions (dateCompleted)
values ("2016-05-06 12:11:01");
insert into Actions (dateCompleted)
values ("2016-05-06 12:11:03");
insert into Actions (dateCompleted)
values ("2016-05-06 12:11:02");
insert into Actions (dateCompleted)
values ("2016-05-06 12:11:05");
insert into Actions (dateCompleted)
values ("2016-05-06 12:11:04");
Result (order by dateCompleted):
id dateCompleted timeSinceLast
1, "2016-05-06 12:11:01", null
3, "2016-05-06 12:11:02", 1
2, "2016-05-06 12:11:03", 1
5, "2016-05-06 12:11:04", 1
4, "2016-05-06 12:11:05", 1
(In this simple example, each row comes one second after the previous one.)
SELECT x.*
, MIN(TIMEDIFF(x.datecompleted,y.datecompleted))
FROM actions x
LEFT
JOIN actions y
ON y.datecompleted < x.datecompleted
GROUP
BY x.id
ORDER
BY x.datecompleted;
...or faster...
SELECT x.*
, TIMEDIFF(datecompleted,@prev)
, @prev:=datecompleted
FROM actions x
, (SELECT @prev:=null) vars
ORDER
BY datecompleted;
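The idea behind the faster variant is a single ordered pass that remembers the previous row's dateCompleted. Here is a minimal sketch of that idea in Python, run against the sample data from the question; time_since_last is a hypothetical helper name, and the sketch illustrates the logic rather than replacing the SQL.

```python
from datetime import datetime

# Sample rows from the question: (id, dateCompleted), deliberately out of order.
actions = [
    (1, "2016-05-06 12:11:01"),
    (2, "2016-05-06 12:11:03"),
    (3, "2016-05-06 12:11:02"),
    (4, "2016-05-06 12:11:05"),
    (5, "2016-05-06 12:11:04"),
]


def time_since_last(rows):
    """Return (id, dateCompleted, seconds since the previous completion)."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), row_id) for row_id, ts in rows
    )
    result = []
    prev = None  # plays the role of the running "previous row" variable
    for completed, row_id in parsed:
        delta = None if prev is None else (completed - prev).total_seconds()
        result.append((row_id, completed, delta))
        prev = completed
    return result


for row_id, completed, delta in time_since_last(actions):
    print(row_id, completed, delta)
```

The rows come out in the order 1, 3, 2, 5, 4, with no difference for the first row and one second for each of the others, which matches the expected result in the question.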
A SELECT with a WHERE clause returns a new sequence of items matching the predicate by iteration.
Is there any way in MySQL to determine, as a Boolean, whether any row matching the given search criteria exists?
Sample SQL
CREATE TABLE Ledger
(
PersonID int,
ldate date,
dr float,
cr float,
bal float
);
INSERT INTO Ledger(PersonID, ldate, dr, cr, bal)
VALUES
('1001', '2016-01-23', 105 ,0 ,0),
('1001', '2016-01-24', 0, 5.25, 0),
('1002', '2016-01-24', 0, 150, 0),
('1001', '2016-01-25', 0, 15, 0),
('1002', '2016-01-25', 73, 0, 0);
Here I need to check whether PersonID 1002 exists.
The common way of checking is:
SELECT COUNT(PersonID) > 0 AS my_bool FROM Ledger WHERE PersonID = 1002
SELECT COUNT(*) > 0 AS my_bool FROM Ledger WHERE ldate = '2016-01-24' AND (bal > 75 AND bal <100)
The above two queries are only samples.
But the above SELECT queries iterate over the whole collection and filter the result. That degrades performance on a very big table (over 300k active records).
Is there a version with a predicate (in which case it returns whether or not any items match) and a version without (in which case it returns whether the query-so-far contains any items)?
Here I have given a very simple WHERE clause, but in the real scenario the WHERE clause is complex. In .NET there is a method .Any() that tests whether the collection contains a match, instead of filtering it with .Where().
How can I achieve this in an efficient way?
You can optimise by using EXISTS
SELECT EXISTS
(
SELECT 1
FROM Ledger
WHERE ldate = '2016-01-24'
AND (bal > 75 AND bal < 100)
) AS my_bool
It will return as soon as a match is found.
More info here:
Optimizing Subqueries with EXISTS Strategy
Best way to test if a row exists in a MySQL table
Subqueries with EXISTS vs IN - MySQL
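To see the SELECT EXISTS form return a plain Boolean, here is a small sketch using Python's built-in sqlite3 module, with SQLite standing in for MySQL and the sample data from the question. any_row is a hypothetical helper that plays the role of .Any(); its where_sql argument is trusted SQL text written by the developer, while the values are passed as bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ledger (PersonID INTEGER, ldate TEXT, dr REAL, cr REAL, bal REAL)"
)
conn.executemany(
    "INSERT INTO Ledger (PersonID, ldate, dr, cr, bal) VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "2016-01-23", 105, 0, 0),
        (1001, "2016-01-24", 0, 5.25, 0),
        (1002, "2016-01-24", 0, 150, 0),
        (1001, "2016-01-25", 0, 15, 0),
        (1002, "2016-01-25", 73, 0, 0),
    ],
)


def any_row(connection, where_sql, params=()):
    """Return True if at least one Ledger row satisfies the WHERE clause."""
    # where_sql must be trusted text; user-supplied values belong in params.
    sql = f"SELECT EXISTS (SELECT 1 FROM Ledger WHERE {where_sql})"
    return bool(connection.execute(sql, params).fetchone()[0])


has_1002 = any_row(conn, "PersonID = ?", (1002,))
has_9999 = any_row(conn, "PersonID = ?", (9999,))
has_bal = any_row(conn, "ldate = ? AND bal > 75 AND bal < 100", ("2016-01-24",))
print(has_1002, has_9999, has_bal)
```

With the sample data, only the first check is true: PersonID 1002 exists, 9999 does not, and every bal value is 0, so the balance range finds nothing.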
I have a table as below:
CREATE TABLE IF NOT EXISTS `status`
(`code` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY
,`IMEI` varchar(15) NOT NULL
,`ACC` tinyint(1) NOT NULL
,`datetime` datetime NOT NULL
);
INSERT INTO status VALUES
(1, 123456789012345, 0, '2014-07-09 10:00:00'),
(2, 453253453334445, 0, '2014-07-09 10:05:00'),
(3, 912841851252151, 0, '2014-07-09 10:08:00'),
(4, 123456789012345, 1, '2014-07-09 10:10:00'),
(5, 123456789012345, 1, '2014-07-09 10:15:00');
I need to get all rows for a given IMEI (e.g. 123456789012345) where ACC=1 AND the previous row for the same IMEI has ACC=0. The rows may be one after the other or far apart.
Given the example above, I'd want to get the 4th row (code 4) but not the 5th (code 5).
Any ideas? Thanks.
Assuming that you mean previous row by datetime
SELECT *
FROM status s
WHERE s.imei='123456789012345'
AND s.acc=1
AND (
SELECT acc
FROM status
WHERE imei=s.imei
AND datetime<s.datetime
ORDER BY datetime DESC
LIMIT 1
) = 0
The way I would approach this problem is much different from the approaches given in other answers.
The approach I would use would be to
1) order the rows, first by imei, and then by datetime within each imei. (I'm assuming that datetime is how you are going to determine if a row is "previous" to another row.)
2) sequentially process the rows, first comparing imei from the current row to the imei from the previous row, and then checking if the ACC from the current row is 1 and the ACC from the previous row is 0. Then I would know that the current row was a row to be returned.
3) for each processed row, in the resultset, include a column that indicates whether the row should be returned or not
4) return only the rows that have the indicator column set
A query something like this:
SELECT t.code
, t.imei
, t.acc
, t.datetime
FROM ( SELECT IF(s.imei=@prev_imei AND s.acc=1 AND @prev_acc=0,1,0) AS ret
, s.code AS code
, @prev_imei := s.imei AS imei
, @prev_acc := s.acc AS acc
, s.datetime AS datetime
FROM (SELECT @prev_imei := NULL, @prev_acc := NULL) i
CROSS
JOIN `status` s
WHERE s.imei = '123456789012345'
ORDER BY s.imei, s.datetime, s.code
) t
WHERE t.ret = 1
(I can unpack that a bit, to explain how it works.)
But the big drawback of this approach is that it requires MySQL to materialize the inline view as a derived table (temporary MyISAM table). If there was no predicate (WHERE clause) on the status table, the inline view would essentially be a copy of the entire status table. And with MySQL 5.5 and earlier, that derived table won't be indexed. So, this could present a performance issue for large sets.
Including predicates (e.g. WHERE s.imei = '123456789') to limit the rows from the status table in the inline view query may sufficiently limit the size of the temporary MyISAM table.
The other gotcha with this approach is that the behavior of user-defined variables in the statement is not guaranteed. But we do observe a consistent behavior, which we can make use of; it does work, but the MySQL documentation warns that the behavior is not guaranteed.
Here's a rough overview of how MySQL processes this query.
First, MySQL runs the query for the inline view aliased as i. We don't really care what this query returns, except that we need it to return exactly one row, because of the JOIN operation. What we care about is the initialization of the two MySQL user-defined variables, @prev_imei and @prev_acc. Later, we are going to use these user-defined variables to "preserve" the values from the previously processed row, so we can compare those values to the current row.
The rows from the status table are processed in sequence, according to the ORDER BY clause. (This may change in some future release, but we can observe that it works like this in MySQL 5.1 and 5.5.)
For each row, we compare the values of imei and acc from the current row to the values preserved from the previous row. If the boolean in the IF expression evaluates to TRUE, we return a 1, to indicate that this row should be returned. Otherwise, we return a 0, to indicate that we don't want to return this row. (For the first row processed, we previously initialized the user-defined variables to NULL, so the IF expression will evaluate to 0.)
The @prev_imei := s.imei and @prev_acc := s.acc expressions assign the values from the current row to the user-defined variables, so they will be available for the next row processed.
Note that it's important that the tests of the user-defined variables (the first expression in the SELECT list) come before we overwrite the previous values with the values from the current row.
We can run just the query from the inline view t, to observe the behavior.
The outer query returns rows from the inline view that have the derived ret column set to a 1, rows that we wanted to return.
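The four steps described above (order, walk the rows while remembering the previous one, flag, filter) can also be written out procedurally. The sketch below does that in Python on the sample data from the question; rising_edges is a hypothetical helper name, and the sketch only illustrates the logic of the query, not how MySQL executes it.

```python
# Sample rows from the question: (code, imei, acc, datetime).
status_rows = [
    (1, "123456789012345", 0, "2014-07-09 10:00:00"),
    (2, "453253453334445", 0, "2014-07-09 10:05:00"),
    (3, "912841851252151", 0, "2014-07-09 10:08:00"),
    (4, "123456789012345", 1, "2014-07-09 10:10:00"),
    (5, "123456789012345", 1, "2014-07-09 10:15:00"),
]


def rising_edges(rows, imei=None):
    """Return rows with acc=1 whose previous row for the same imei has acc=0."""
    # Optional filter, like the WHERE s.imei = ... predicate in the inline view.
    selected = [r for r in rows if imei is None or r[1] == imei]
    # Step 1: order by imei, then datetime, then code (same as the ORDER BY).
    selected.sort(key=lambda r: (r[1], r[3], r[0]))

    prev_imei = prev_acc = None  # same role as the two user-defined variables
    flagged = []
    for code, row_imei, acc, ts in selected:
        # Steps 2 and 3: test against the previous row BEFORE overwriting it.
        if row_imei == prev_imei and acc == 1 and prev_acc == 0:
            flagged.append((code, row_imei, acc, ts))
        prev_imei, prev_acc = row_imei, acc
    # Step 4: only the flagged rows are returned.
    return flagged


for row in rising_edges(status_rows, "123456789012345"):
    print(row)
```

On the sample data only the row with code 4 is returned; code 5 is skipped because the row before it already has acc=1.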
select * from status s1
WHERE
ACC = 1
AND code = (SELECT MIN(CODE) FROM status WHERE acc = 1 and IMEI = s1.IMEI)
AND EXISTS (SELECT * FROM status WHERE IMEI = s1.IMEI AND ACC = 0)
AND IMEI = '123456789012345'
-- rank is a reserved word in MySQL 8.0 and later, so the alias is rnk here
SELECT b.code,b.imei,b.acc,b.datetime
FROM
( SELECT x.*
, COUNT(*) rnk
FROM status x
JOIN status y
ON y.imei = x.imei
AND y.datetime <= x.datetime
GROUP
BY x.code
) a
JOIN
( SELECT x.*
, COUNT(*) rnk
FROM status x
JOIN status y
ON y.imei = x.imei
AND y.datetime <= x.datetime
GROUP
BY x.code
) b
ON b.imei = a.imei
AND b.rnk = a.rnk + 1
WHERE b.acc = 1
AND a.acc = 0;
You can do a regular IN() and then group any duplicates (you could also use a LIMIT, but that would only work for one IMEI).
SETUP:
INSERT INTO `status`
VALUES
(1, 123456789012345, 0, '2014-07-09 10:00:00'),
(2, 453253453334445, 0, '2014-07-09 10:05:00'),
(3, 912841851252151, 0, '2014-07-09 10:08:00'),
(4, 123456789012345, 1, '2014-07-09 10:10:00'),
(5, 123456789012345, 1, '2014-07-09 10:15:00'),
(6, 123456789012345, 1, '2014-07-09 10:15:00'),
(7, 453253453334445, 1, '2014-07-09 10:15:00');
QUERY:
SELECT * FROM status
WHERE ACC = 1 AND IMEI IN(
SELECT DISTINCT IMEI FROM status
WHERE ACC = 0)
GROUP BY imei;
RESULTS:
This works with multiple IMEIs that have a 0 followed by a 1.
EDIT:
If you would like to go by the date entered as well, you can order by date first and then group.
SELECT * FROM(
SELECT * FROM status
WHERE ACC = 1 AND IMEI IN(
SELECT DISTINCT IMEI FROM status
WHERE ACC = 0)
ORDER BY datetime
) AS t
GROUP BY imei;