MySQL DELETE query using special id and date field

I have two columns, specialid and date, in tblSpecialTable. The table contains duplicate specialid values. For every specialid that is duplicated, I want to delete the rows with the older date and keep only the newest one.

See my example:
mysql> SELECT * FROM test;
+------+---------------------+
| id | d |
+------+---------------------+
| 1 | 2011-06-29 10:48:41 |
| 2 | 2011-06-29 10:48:44 |
| 3 | 2011-06-29 10:48:46 |
| 1 | 2011-06-29 10:48:52 |
| 2 | 2011-06-29 10:48:53 |
| 3 | 2011-06-29 10:48:55 |
+------+---------------------+
mysql> DELETE t1 FROM test t1 INNER JOIN test t2 ON t1.id = t2.id AND t1.d < t2.d;
Query OK, 3 rows affected (0.00 sec)
mysql> SELECT * FROM test;
+------+---------------------+
| id | d |
+------+---------------------+
| 1 | 2011-06-29 10:48:52 |
| 2 | 2011-06-29 10:48:53 |
| 3 | 2011-06-29 10:48:55 |
+------+---------------------+
See also http://dev.mysql.com/doc/refman/5.0/en/delete.html
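Before running the DELETE it may be worth previewing which rows the self-join matches. The following is a minimal sketch, assuming the same test table as in the example above. Note that rows tied on the newest d for an id do not satisfy t1.d < t2.d, so all of them would survive the DELETE.

```sql
-- Preview: each distinct (id, d) for which a newer row with the same id exists.
-- These are the rows the DELETE ... INNER JOIN statement above would remove.
SELECT DISTINCT t1.id, t1.d
FROM test t1
INNER JOIN test t2
        ON t1.id = t2.id
       AND t1.d < t2.d
ORDER BY t1.id, t1.d;
```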

You have to use a "double-barrelled" match on the combination of fields from another query, keeping only the newest (specialid, date) pair of each group:
DELETE FROM tblSpecialTable
WHERE (specialid, `date`) NOT IN (
SELECT specialid, max_date
FROM (
-- newest date per specialid; the derived table x lets MySQL read the table being deleted from
SELECT specialid, MAX(`date`) AS max_date
FROM tblSpecialTable
GROUP BY specialid) x
)

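Use a tmp table with a UNIQUE key on the specialid column, then fill it with the SQL below. INSERT IGNORE skips any row whose specialid is already present, so with the rows arriving newest first, the newest row per specialid is the one that is kept:
INSERT IGNORE INTO tmp (specialid, date) SELECT specialid, date FROM tblSpecialTable ORDER BY date DESC;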

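-- delete every row that is older than the newest row of a duplicated specialid
DELETE t FROM tblSpecialTable t
INNER JOIN (SELECT specialid, MAX(`date`) AS max_date
FROM tblSpecialTable
GROUP BY specialid
HAVING COUNT(*) > 1) d
ON d.specialid = t.specialid
WHERE t.`date` < d.max_date;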

This isn't a fancy single query, but it does the trick:
CREATE TABLE tmp as SELECT * FROM tblspecialtable ORDER BY date DESC;
DELETE FROM tblspecialtable WHERE 1;
INSERT INTO tblspecialtable SELECT * FROM tmp GROUP BY specialid;
DROP TABLE tmp;
The first line creates a scratch table holding a copy of the rows, ordered by date with the most recent first. The second empties the original table to make room for the fixed values. The third consolidates the rows: with ONLY_FULL_GROUP_BY disabled, MySQL returns one row per specialid, and in practice that tends to be the first one it encounters, i.e. the most recent, although the server does not guarantee which row of a group it picks (with ONLY_FULL_GROUP_BY enabled the statement is rejected). The final line removes the scratch table. The end result is the original table containing unique values of specialid with only the most recent dates.
Also, if you are accessing your MySQL table programmatically, it would be best to check whether an id exists first and then use UPDATE to change the date, or else add a new row if there is no existing specialid. You should also consider making specialid UNIQUE if you don't want duplicates.
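The "update if it exists, otherwise insert" advice above can be left to MySQL itself once specialid is unique. The following is only a sketch under assumptions: the key name uq_specialid and the value 42 are made up for illustration, and the duplicates must already be gone before the key is added.

```sql
-- Assumption: duplicates have already been removed; otherwise adding
-- the unique key fails with a duplicate-entry error.
ALTER TABLE tblSpecialTable
  ADD UNIQUE KEY uq_specialid (specialid);

-- Insert a new row, or refresh the date if this specialid already exists.
-- 42 is a hypothetical specialid used only for illustration.
INSERT INTO tblSpecialTable (specialid, `date`)
VALUES (42, NOW())
ON DUPLICATE KEY UPDATE `date` = NOW();
```

With the unique key in place, the table can no longer accumulate duplicate specialid rows, so the cleanup queries above should only be needed once.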

Related

find all rows that have duplicate entries inside json column

I am stuck with a problem where I have a table with JSON column like this:
ID|VALUE
1 |{"a":"text1234","b":"default"}
2 |{"a":"text1234","b":"default"}
3 |{"a":"text1234","b":"text234"}
4 |{"a":"text1234","b":"default2"}
5 |{"a":"text1234","b":"default2"}
I would like to get all rows where value "b" is duplicate, so with the table above I would get rows 1,2,4,5.
I tried to group rows by value->b
$value_ids = ProductsAttributesValues::groupBy("value->b")->get();
but when I dd($value_ids), the rows are not grouped by the value of b (e.g. "default"). I can't find a way to group them so that I can then count them. Or is there a better way of doing this?
Try the json_extract function:
select count(id) dup_count, json_extract(`value`,"$.b") as dup_value
from test
group by json_extract(`value`,"$.b")
having dup_count>1
;
-- result set:
| dup_count | dup_value |
+-----------+------------+
| 2 | "default" |
| 2 | "default2" |
-- to get the id involved:
select id,dup_count,dup_value
from (select id,json_extract(`value`,"$.b") as dup_v
from test) t1
join
(select count(id) dup_count, json_extract(`value`,"$.b") as dup_value
from test
group by json_extract(`value`,"$.b")
having dup_count>1) t2
on t1.dup_v=t2.dup_value
;
-- result set:
| id | dup_count | dup_value |
+------+-----------+------------+
| 1 | 2 | "default" |
| 2 | 2 | "default" |
| 4 | 2 | "default2" |
| 5 | 2 | "default2" |
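On MySQL 8.0 or later, a window function can return the ids without joining the table to itself. This is a sketch rather than a drop-in replacement: it assumes the same test table as in the answer above and uses the ->> operator (available from MySQL 5.7.13), which returns the extracted value without the surrounding double quotes.

```sql
-- Count how many rows share each "b" value, then keep the rows
-- whose "b" value occurs more than once.
SELECT id, b, dup_count
FROM (
    SELECT id,
           `value`->>'$.b' AS b,
           COUNT(*) OVER (PARTITION BY `value`->>'$.b') AS dup_count
    FROM test
) AS t
WHERE dup_count > 1
ORDER BY id;
```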
Here are queries that can do your task.
/*Extract value of "b" - Step 1*/
DROP TEMPORARY TABLE IF EXISTS d1;
CREATE TEMPORARY TABLE d1
SELECT
ID, `VALUE`, SUBSTR(VALUE FROM POSITION(',"b":' IN VALUE)+5 FOR 1000) AS v
FROM mytest
;
/*Extract value of "b" - Step 2*/
DROP TEMPORARY TABLE IF EXISTS d2;
CREATE TEMPORARY TABLE d2
SELECT
ID, LEFT(v, LENGTH(v)-1) AS b
FROM
d1
;
ALTER TABLE d2 ADD INDEX b(b);
/* Search for duplicates */
DROP TEMPORARY TABLE IF EXISTS duplicates;
CREATE TEMPORARY TABLE duplicates
SELECT
b, COUNT(b) AS b_count
FROM
d2
GROUP BY b HAVING COUNT(b)>1
;
ALTER TABLE duplicates ADD INDEX b(b);
/* Display for duplicates */
SELECT
d2.ID, d2.b
FROM
d2
INNER JOIN duplicates ON d2.b=duplicates.b
;
This should give you:
1 "default"
2 "default"
4 "default2"
5 "default2"

Filter a join on each primary-foreign key relation only

I am using MySql.
I have a table job with a primary key job_pk_id; it stores the details of every job. I also have a table job_running_status in which the job table's job_pk_id is a foreign key; it records each time a job ran. There will be multiple entries for the same job_pk_id, since the same job runs multiple times. The job_running_status table also has a field job_start_time that gives the start time of each run of the job.
My requirement is to get the latest job_running_status for every job. The latest job_running_status is chosen based on the latest job_start_time value (for that particular job only) in job_running_status.
I know this can be achieved using an INNER JOIN between the job and job_running_status tables with ORDER BY job_start_time DESC, but my problem is that the ORDER BY applies across all the jobs in the join, whereas I need it to apply only across the records belonging to a particular job.
EDIT
I understand it might be confusing to understand me by just reading so I am providing some examples:
job table:
job_running_status table:
My final requirement after joining both the tables
Note: after joining I should get only one record for every record in the job table. That record is chosen based on the latest job_start_time for that job.
An example of a correlated sub query in a where clause
drop table if exists t,t1;
create table t(id int);
create table t1(jid int,dt date);
insert into t values
(1),(2),(3);
insert into t1 values
(1,'2018-01-01'),
(1,'2018-02-01'),
(2,'2018-01-01'),
(3,'2018-01-01'),
(3,'2018-02-01'),
(3,'2018-03-01');
select t.id,t1.dt
from t
join t1 on t1.jid = t.id
where t1.dt =(select max(dt) from t1 where t1.jid = t.id);
+------+------------+
| id | dt |
+------+------------+
| 1 | 2018-02-01 |
| 2 | 2018-01-01 |
| 3 | 2018-03-01 |
+------+------------+
3 rows in set (0.00 sec)
If you need the latest n records and you are not on version 8.0 or higher, you can use row number simulation:
select t.id,s.dt
from t
join
(select t1.jid,t1.dt ,
if(t1.jid<>@p,@rn:=1,@rn:=@rn+1) rn,
@p:=t1.jid p
from t1
cross join (select @rn:=0,@p:=0) r
order by t1.jid ,t1.dt desc
) s on s.jid = t.id
where s.rn <= 2;
+------+------------+
| id | dt |
+------+------------+
| 1 | 2018-01-01 |
| 1 | 2018-02-01 |
| 2 | 2018-01-01 |
| 3 | 2018-02-01 |
| 3 | 2018-03-01 |
+------+------------+
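On MySQL 8.0 or higher the user-variable simulation is not needed, because ROW_NUMBER() is available as a window function. A minimal sketch, assuming the same t and t1 tables as above:

```sql
-- Number each job's rows from newest to oldest, then keep the first two.
SELECT t.id, s.dt
FROM t
JOIN (
    SELECT jid,
           dt,
           ROW_NUMBER() OVER (PARTITION BY jid ORDER BY dt DESC) AS rn
    FROM t1
) AS s ON s.jid = t.id
WHERE s.rn <= 2
ORDER BY t.id, s.dt;
```

Changing the limit in the WHERE clause to s.rn = 1 gives just the latest row per job.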
You can try this query. Join the job and job_running_status tables, then join a subquery that gets MAX(job_running_status) for each fk_job_id, so that only the latest row per job is kept.
Test DDL
CREATE TABLE job(
job_pk_id int
);
INSERT INTO job VALUES (1),(2),(3);
CREATE TABLE job_running_status(
fk_job_id INT,
job_running_status DATE
);
INSERT INTO job_running_status VALUES (1,'2018-01-01');
INSERT INTO job_running_status VALUES (1,'2018-02-01');
INSERT INTO job_running_status VALUES (2,'2018-01-03');
INSERT INTO job_running_status VALUES (2,'2018-01-02');
Query
SELECT
j.job_pk_id,
jrs.fk_job_id,
jrs.job_running_status
FROM job j
INNER JOIN job_running_status jrs ON j.job_pk_id = jrs.fk_job_id
INNER JOIN
(SELECT fk_job_id, MAX(job_running_status) job_running_status
FROM job_running_status
GROUP BY fk_job_id) t
ON t.fk_job_id = jrs.fk_job_id AND t.job_running_status = jrs.job_running_status
[Results]:
| job_pk_id | fk_job_id | job_running_status |
|-----------|-----------|--------------------|
| 1 | 1 | 2018-02-01 |
| 2 | 2 | 2018-01-03 |

Odd behavior of max and having in MySQL when max==0

I have the following table:
mysql> select * from foo;
| id | value | bar |
+----+-------+------+
| 1 | 2 | 3 |
| 2 | 0 | 3 |
| 1 | 1 | 5 |
I want to select the tuple with the maximum value for each id. However, when max(value) is 0, I don't get a result.
mysql> select id,max(value),bar from foo group by id having max(value);
| id | max(value) | bar |
+----+------------+------+
| 1 | 2 | 3 |
Is this supposed to behave like that and if so, why?
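Yes. HAVING max(value) treats the aggregate as a boolean condition, and 0 is false in MySQL, so the group whose maximum is 0 is filtered out. More generally, HAVING cannot be used to pick a record out of a group of records as defined by the fields used in the GROUP BY clause. It is applied to the group as a whole.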
So, in your case, you have to do a self-join to get the rest of the table fields:
select t1.id, t1.value, t1...
from foo as t1
join (
select id, max(value) as max_value
from foo
group by id
) as t2 on t1.id = t2.id and t1.value = t2.max_value
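For the foo table from the question, the self-join might be spelled out as follows. This is a sketch that assumes the table has exactly the three columns shown in the question (id, value, bar).

```sql
-- Find the maximum value per id, then join back to foo to fetch
-- the remaining columns of the row that holds that maximum.
SELECT t1.id, t1.value, t1.bar
FROM foo AS t1
JOIN (
    SELECT id, MAX(value) AS max_value
    FROM foo
    GROUP BY id
) AS t2
  ON t1.id = t2.id
 AND t1.value = t2.max_value
ORDER BY t1.id;
-- For the three sample rows this returns (1, 2, 3) and (2, 0, 3).
```

Because there is no HAVING clause, the group whose maximum is 0 is kept. If several rows of one id share the maximum value, all of them are returned.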
IMHO you can get the max pair by ordering on the product (id * value).
create table foo(id int, value int);
insert into foo values
(2,0),
(1,0),
(2,1),
(3,0),
(2,2);
select id, value
from foo
order by (id * value) desc
limit 1;
id | value
2 | 2
drop table foo;

GROUP BY max in mysql

Let's say I have the following three entries:
`id` | `timestamp` | `content` | `reference`
1 | 2012-01-01 | NEWER | 1
2 | 2013-01-01 | NEWEST | 1
3 | 2011-01-01 | OLD | 2
I need the following result from my query:
`id` | `timestamp` | `content` | `reference`
2 | 2013-01-01 | NEWEST | 1
3 | 2011-01-01 | OLD | 2
Here's what I have so far, but it is incorrect:
SELECT * FROM table GROUP BY reference
What would be the correct query here?
I am looking to get the newest piece of content per reference id. In the example above, there are two reference id's (1 & 2), and I want to get the most recent entry for each.
SELECT *
FROM (SELECT * FROM table ORDER BY timestamp desc) as sub
GROUP BY reference
If you wish to expand the query, put limiting logic into the subquery like so for better performance:
SELECT *
FROM (SELECT *
FROM table
WHERE 1=1 and 2=2
ORDER BY timestamp desc
) as sub
GROUP BY reference
I take it you want the newest of each reference? Something like this:
SELECT * FROM my_table
WHERE (reference, timestamp) IN (
SELECT reference, MAX(timestamp) FROM my_table GROUP BY reference
);
-- assumes that a higher id always means a newer timestamp
select * from table where id in
(select max(id) from table group by reference)
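Ordering a subquery and then grouping the outer query depends on which row MySQL happens to pick for each group, which the server does not guarantee. One alternative that does not rely on that is an anti-join. A minimal sketch, assuming the table is called my_table and that id is never NULL:

```sql
-- Keep each row for which no newer row with the same reference exists.
SELECT a.*
FROM my_table AS a
LEFT JOIN my_table AS b
       ON b.reference = a.reference
      AND b.timestamp > a.timestamp
WHERE b.id IS NULL;
```

For the sample data in the question this keeps the rows with id 2 and id 3. Rows tied on the newest timestamp within a reference would all be returned.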

Updating each row separately

I'm trying to update some rows in my database.
I have a table like the following :
id | subid | creation_date
1 | 1/1 | 2011-06-23
1 | 1/2 | 0000-00-00
2 | 2/1 | 2011-06-20
2 | 2/2 | 0000-00-00
What I want is to update the entries whose creation_date is set to "0000-00-00" with the creation_date of the entry with the same id that has a real date.
The result after the request would be :
id | subid | creation_date
1 | 1/1 | 2011-06-23
1 | 1/2 | 2011-06-23
2 | 2/1 | 2011-06-20
2 | 2/2 | 2011-06-20
Does anyone out there have an idea that could help me? It would be perfect to do this with a single request.
Thanks ;)
B.
To get around the problem with the other answer (MySQL does not let the subquery read from the table that is being updated), let's just create a temp table and use that:
CREATE TEMPORARY TABLE foo SELECT id, MAX(creation_date) AS creation_date FROM yo_table GROUP BY id;
UPDATE yo_table SET creation_date = ( SELECT foo.creation_date FROM foo WHERE foo.id = yo_table.id )
WHERE creation_date = '0000-00-00';
update yo_table outter
set creation_date =
(select max(creation_date) from yo_table iner where iner.id = outter.id)
where creation_date = '0000-00-00' -- note that you'll have to edit this according to the data type of your creation_date column
;
Edit: with temp. table
create table yo_table_tmp as select * from yo_table;
update yo_table outter
set creation_date =
(select max(creation_date) from yo_table_tmp iner where iner.id = outter.id)
where creation_date = '0000-00-00' -- note that you'll have to edit this according to the data type of your creation_date column
;
drop table yo_table_tmp;
update table_a as t1, table_a as t2
set t1.creation_date=t2.creation_date
where t1.id=t2.id and (t1.creation_date=0 and t2.creation_date>0);
I think this should work for you:
UPDATE `tableA` `ta`
INNER JOIN (
SELECT `id`, MAX(`creation_date`) AS `creation_date`
FROM `tableA`
WHERE `creation_date` > '0000-00-00'
GROUP BY id
) `tb` ON `ta`.`id` = `tb`.`id`
SET `ta`.`creation_date` = `tb`.`creation_date`
WHERE `ta`.`creation_date` = '0000-00-00';
Hope this helps.
Create a temporary table (here, a derived table inside the subquery).
I simplified your subid values; you can always combine them again in the query result.
mysql> update table1 set creation_date = (SELECT x.creation_date
from (SELECT * from table1 WHERE subid=1) AS x
WHERE x.id =table1.id) WHERE subid=2;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> select * from table1;
+----+-------+---------------+
| id | subid | creation_date |
+----+-------+---------------+
| 1 | 1 | 2011-06-23 |
| 1 | 2 | 2011-06-23 |
| 2 | 1 | 2011-06-20 |
| 2 | 2 | 2011-06-20 |
+----+-------+---------------+
4 rows in set (0.00 sec)