I am doing some data cleanup and I would like to remove duplicate rows by finding records that have the same "picture_id" and "date" values:
Example:
picture_id - 2 date - "13-Jul-18"
picture_id - 2 date - "13-Jul-18"
picture_id - 2 date - "13-Jul-18"
picture_id - 2 date - "13-Jul-18"
DELETE FROM `pictures` WHERE `picture_id` = '2' AND `date` = '13-Jul-18'
Table columns (in order): ID (primary key), picture_id, date, followers
I would like to delete all but one of the duplicate records. It does not matter which one is kept. How can I accomplish this?
In MySQL, you can keep the smallest (or biggest) id using JOIN:
DELETE p
FROM pictures p JOIN
(SELECT p2.picture_id, p2.date, MIN(p2.id) AS min_id
FROM pictures p2
WHERE p2.picture_id = 2 AND p2.date = '2018-07-13'
GROUP BY p2.picture_id, p2.date
) pp
ON p.picture_id = pp.picture_id AND p.date = pp.date AND p.id > pp.min_id;
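Assuming you don't care which ID you keep, you can select one record and delete all the matching records that are not the one selected. MySQL rejects LIMIT directly inside an IN subquery and does not let a subquery read the table being deleted from, so the selection is wrapped in a derived table:
DELETE
FROM pictures
WHERE ID NOT IN (
SELECT ID
FROM (
SELECT ID
FROM pictures
WHERE picture_id = 2 AND
Date = '2018-07-13'
LIMIT 1
) AS kept
) AND
picture_id = 2 AND
Date = '2018-07-13'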
The fact that these are unwanted duplicates makes me think that either your current primary key is insufficient for your purposes or you need to look at a unique constraint.
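As a rough illustration of the two points above (removing the duplicates for every group at once, then adding a unique constraint), here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the syntax is SQLite's; the sample rows and the index name uq_picture_date are made up for the example. In MySQL the same NOT IN subquery on the target table would be rejected, so the JOIN form or a derived-table wrapper would be needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pictures (
        id INTEGER PRIMARY KEY,
        picture_id INTEGER,
        date TEXT,
        followers INTEGER
    );
    -- Made-up sample rows: ids 1-3 and 4-5 are duplicate groups, id 6 is unique.
    INSERT INTO pictures (picture_id, date, followers) VALUES
        (2, '2018-07-13', 4553),
        (2, '2018-07-13', 4552),
        (2, '2018-07-13', 4557),
        (3, '2018-07-13', 4355),
        (3, '2018-07-13', 4351),
        (4, '2018-07-14', 100);
""")

# Keep the lowest id in every (picture_id, date) group and delete the rest.
conn.execute("""
    DELETE FROM pictures
    WHERE id NOT IN (
        SELECT MIN(id) FROM pictures GROUP BY picture_id, date
    )
""")

# With the duplicates gone, a unique index rejects any new duplicate.
conn.execute(
    "CREATE UNIQUE INDEX uq_picture_date ON pictures (picture_id, date)"
)

remaining = conn.execute(
    "SELECT id, picture_id, date FROM pictures ORDER BY id"
).fetchall()
print(remaining)
```

After this runs, inserting another row for an existing (picture_id, date) pair raises sqlite3.IntegrityError instead of silently creating a duplicate.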
You can try something like this:
DROP TABLE IF EXISTS pictures;
CREATE TABLE pictures(picture_id INT(11), `dt` DATE, followers INT(11));
INSERT INTO pictures VALUES
(2,'2018-07-13',4553),
(2,'2018-07-13',4552),
(2,'2018-07-13',4557),
(2,'2018-07-13',4577),
(3,'2018-07-13',4355),
(3,'2018-07-13',4351),
(3,'2018-07-13',4353),
(3,'2018-07-13',4374);
Delete query (keeps the row with the most followers in each group; rows that tie on the maximum are all kept):
DELETE p FROM pictures p
JOIN (
SELECT picture_id, dt, MAX(followers) AS fol
FROM pictures WHERE dt = '2018-07-13' GROUP BY picture_id, dt
) AS main
ON main.dt = p.dt
AND main.picture_id = p.picture_id
WHERE main.fol <> p.followers;
I hope this will solve your problem.
Simply use a common table expression. Deleting through the CTE like this is SQL Server syntax; MySQL does not allow it:
WITH CTE_Duplicates AS
(SELECT picture_id, date, ROW_NUMBER() OVER (PARTITION BY picture_id, date ORDER BY picture_id, date) AS rownumber
FROM pictures)
DELETE FROM CTE_Duplicates WHERE rownumber != 1
It works for me. Please check.
The query below collects temporary overview data for every user into a memory table. Basically, the user sees the count of items by keyword.
The problem is that it calculates the total count of items by keyword_id only.
What I need is to calculate item_count by both keyword_id and item_type (Item.type).
SELECT
`Item`.`user_id` AS `user_id` ,
`ItemKeyword`.`keywordID` AS `keyword_id` ,
`Keyword`.`title` AS `keyword_title`,
count(`ItemKeyword`.`ItemID`) AS `ico_count`
FROM
(
(
`ItemKeyword`
JOIN `Item` ON(
(
`Item`.`id` = `ItemKeyword`.`ItemID`
)
)
)
JOIN `Keyword` ON(
(
`Keyword`.`id` = `ItemKeyword`.`keywordID`
)
)
)
GROUP BY
`Item`.`user_id` ,
`ItemKeyword`.`keywordID`;
Details
For example, the result currently looks like the first table below. Basically, item_count is the total across all item types. What I need is to split this result:
user_id  keyword_id  keyword_title  item_count
1        9645        surveillance   20
into something like this:
user_id  keyword_id  keyword_title  item_count  item_type
1        9645        surveillance   18          1
1        9645        surveillance   2           2
where item_count is calculated by both keyword_id and item_type.
I can't figure out how to include item_type in this query.
Any suggestions?
I do not understand your love of brackets (parentheses). Why so many? In my opinion they hurt readability. That was just a side note.
If you need an extra grouping level, modify the query like this:
SELECT
`Item`.`user_id` AS `user_id` ,
`ItemKeyword`.`keywordID` AS `keyword_id` ,
`Keyword`.`title` AS `keyword_title`,
count(`ItemKeyword`.`ItemID`) AS `ico_count`,
`Item`.`type` AS `item_type`
FROM `ItemKeyword` JOIN `Item` ON
`Item`.`id` = `ItemKeyword`.`ItemID`
JOIN `Keyword` ON
`Keyword`.`id` = `ItemKeyword`.`keywordID`
GROUP BY
`Item`.`user_id` ,
`Item`.`type` ,
`ItemKeyword`.`keywordID`,
`Keyword`.`title`;
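To see the effect of the extra grouping column on a small data set, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL. The table and column names follow the question; the three sample items are invented for the example, and the result alias is written as item_count to match the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (id INTEGER PRIMARY KEY, user_id INTEGER, type INTEGER);
    CREATE TABLE Keyword (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ItemKeyword (ItemID INTEGER, keywordID INTEGER);

    -- Invented sample: user 1 has two items of type 1 and one of type 2,
    -- all tagged with the same keyword.
    INSERT INTO Keyword VALUES (9645, 'surveillance');
    INSERT INTO Item VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2);
    INSERT INTO ItemKeyword VALUES (1, 9645), (2, 9645), (3, 9645);
""")

# Adding Item.type to GROUP BY yields one count per (user, keyword, type).
rows = conn.execute("""
    SELECT Item.user_id, ItemKeyword.keywordID, Keyword.title,
           COUNT(ItemKeyword.ItemID) AS item_count, Item.type AS item_type
    FROM ItemKeyword
    JOIN Item ON Item.id = ItemKeyword.ItemID
    JOIN Keyword ON Keyword.id = ItemKeyword.keywordID
    GROUP BY Item.user_id, Item.type, ItemKeyword.keywordID, Keyword.title
    ORDER BY Item.type
""").fetchall()

for row in rows:
    print(row)
```

The single total of 3 for the keyword is split into one row per item type.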
I am trying to select from two very large tables using a join:
EXPLAIN SELECT SQL_NO_CACHE
e.*
FROM `table_A` e
JOIN
(SELECT id FROM `table_B` /* FORCE index (primary, index_A) */
WHERE id > 338107 AND `index_field_A` = 900000000 AND `index_field_B` = 1
ORDER BY id) AS c
ON `c`.`id` = `e`.`fk_id`
WHERE e.`some_field` IS NULL
LIMIT 2000;
/* BEST EXPLAIN RESULT
USING intersect(index_A,index_B); USING WHERE; USING INDEX
*/
I store the current id (338107) on the application side so that I can fetch the full result in batches (from the start id to the max id).
There is no problem if I get rid of ORDER BY id, but I am not sure that MySQL orders by the PK by default.
There is also no problem if I run the inner SELECT on its own, without the JOIN:
SELECT id FROM `table_B` WHERE id > 338107
AND `index_field_A` = 900000000 AND `index_field_B` = 1 ORDER BY id
but that is useless by itself.
I can get a slightly better EXPLAIN if I force the indexes:
FORCE index (primary, index_A)
but it is still far from good.
Can I get rid of ORDER BY id without negative consequences?
Addition: the PK field used for ordering is auto-increment, and the tables are InnoDB.
What about something like this?
SELECT *
FROM `table_A`
WHERE `some_field` IS NULL
AND `fk_id` IN (
SELECT `id`
FROM `table_B`
WHERE id > 338107 AND `index_field_A` = 900000000 AND `index_field_B` = 1
)
ORDER BY `fk_id`
;
or
SELECT e.*
FROM `table_B` AS c
LEFT JOIN `table_A` AS e
ON c.`id` = e.`fk_id`
AND e.`some_field` IS NULL
WHERE c.id > 338107
AND c.`index_field_A` = 900000000
AND c.`index_field_B` = 1
HAVING e.fk_id IS NOT NULL
ORDER BY e.`fk_id`
;
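Since SQL only guarantees row order when ORDER BY is present, batching by "last seen id" relies on keeping it. The following is a minimal sketch of that batching loop, using Python's built-in sqlite3 module instead of MySQL. The table and column names come from the question; the ten sample rows, the batch size and the helper name fetch_batches are assumptions made for the example, and it assumes at most one table_A row per fk_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_B (
        id INTEGER PRIMARY KEY,
        index_field_A INTEGER,
        index_field_B INTEGER
    );
    CREATE TABLE table_A (
        id INTEGER PRIMARY KEY,
        fk_id INTEGER,
        some_field TEXT
    );
""")
# Invented sample data: ten matching rows in each table.
conn.executemany(
    "INSERT INTO table_B VALUES (?, 900000000, 1)", [(i,) for i in range(1, 11)]
)
conn.executemany(
    "INSERT INTO table_A (fk_id, some_field) VALUES (?, NULL)",
    [(i,) for i in range(1, 11)],
)


def fetch_batches(conn, batch_size):
    """Yield batches of (table_A.id, fk_id) rows in ascending table_B.id order.

    Assumes at most one table_A row per fk_id; with several rows per fk_id,
    the paging key would have to include table_A.id as well.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            """
            SELECT e.id, e.fk_id
            FROM table_A AS e
            JOIN table_B AS c ON c.id = e.fk_id
            WHERE c.id > ?
              AND c.index_field_A = 900000000
              AND c.index_field_B = 1
              AND e.some_field IS NULL
            ORDER BY c.id
            LIMIT ?
            """,
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # The last row has the highest c.id seen so far (c.id = e.fk_id).
        last_id = rows[-1][1]


batches = list(fetch_batches(conn, 4))
print([len(batch) for batch in batches])
```

Without the ORDER BY, the "highest id seen so far" would no longer be a safe starting point for the next batch, because rows with smaller ids could still be unread.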
Good day.
The table structures and the error raised when executing the query are on SQLFiddle.
I have some SQL queries:
First query:
SELECT
n.Type AS Type,
n.UserIdn AS UserIdn,
u.Username AS Username,
n.NewsIdn AS NewsIdn,
n.Header AS Header,
n.Text AS Text,
n.Tags AS Tags,
n.ImageLink AS ImageLink,
n.VideoLink AS VideoLink,
n.DateCreate AS DateCreate
FROM News n
LEFT JOIN Users u ON n.UserIdn = u.UserIdn
Second query:
SELECT
IFNULL(SUM(Type = 'up'),0) AS Uplikes,
IFNULL(SUM(Type = 'down'),0) AS Downlikes,
(IFNULL(SUM(Type = 'up'),0) - IFNULL(SUM(Type = 'down'),0)) AS SumLikes
FROM Likes
WHERE NewsIdn=NewsIdn -- only for example: in the main SQL, NewsIdn = the NewsIdn value from the row of table News
ORDER BY UpLikes DESC
And the third query:
SELECT
count(*) as Favorit
FROM Favorites
WHERE NewsIdn=NewsIdn -- only for example: in the main SQL, NewsIdn = the NewsIdn value from the row of table News
I would like to combine these queries: display all rows from the News table together with the number of Uplikes, DownLikes and Favorit for each NewsIdn value (i.e. for each row of News), ordered by Uplikes DESC.
Please tell me how to do this.
P.S.: in the result I would like the following values:
TYPE USERIDN USERNAME NEWSIDN HEADER TEXT TAGS IMAGELINK VIDEOLINK DATECREATE UPLIKES DOWNLIKES SUMLIKES FAVORIT
image 346412 test 260806 test 1388152519.jpg December, 27 2013 08:55:27+0000 2 0 2 2
image 108546 test2 905554 test2 1231231111111111111111111 123. 123 1388153493.jpg December, 27 2013 09:11:41+0000 1 0 1 0
text 108546 test2 270085 test3 123 .123 December, 27 2013 09:13:30+0000 1 0 1 0
image 108546 test2 764955 test4 1388192300.jpg December. 27 2013 19:58:22+0000 0 1 -1 0
First, your table structures declare all the "Idn" columns as varchar(30). Those appear to be ID keys to the other tables and should be integers for better indexing and joining performance.
Second, this type of process, especially when web-based, is a perfect example for DENORMALIZING the values for likes, dislikes, and favorites by keeping those counters as columns directly on the record (e.g. the News table). When a person likes, dislikes or marks something as a favorite, update the counter right away and be done with it. To initialize the counters, run a one-time bulk SQL update, but also put triggers on the tables so the counts are kept up to date automatically. This way you just query the News table directly and order by whichever counter you need, instead of joining every like/dislike record to all news rows to see which ranks best. Having an index on the News table for that will be your best bet.
Now, that said, with your existing table structures you can do this via pre-aggregate queries, joining them as aliases in the SQL FROM clause, something like:
SELECT
N.Type,
N.UserIdn,
U.UserName,
N.NewsIdn,
N.Header,
N.Text,
N.Tags,
N.ImageLink,
N.VideoLink,
N.DateCreate,
COALESCE( SumL.UpLikes, 0 ) as Uplikes,
COALESCE( SumL.DownLikes, 0 ) as DownLikes,
COALESCE( SumL.NetLikes, 0 ) as NetLikes,
COALESCE( Fav.FavCount, 0 ) as FavCount
from
News N
JOIN Users U
ON N.UserIdn = U.UserIdn
LEFT JOIN ( select
L.NewsIdn,
SUM( L.Type = 'up' ) as UpLikes,
SUM( L.Type = 'down' ) as DownLikes,
SUM( ( L.Type = 'up' ) - ( L.Type = 'down' )) as NetLikes
from
Likes L
group by
L.NewsIdn ) SumL
ON N.NewsIdn = SumL.NewsIdn
LEFT JOIN ( select
F.NewsIdn,
COUNT(*) as FavCount
from
Favorites F
group by
F.NewsIdn ) Fav
ON N.NewsIdn = Fav.NewsIdn
order by
SumL.UpLikes DESC
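A small check of the pre-aggregation idea, sketched with Python's built-in sqlite3 module rather than MySQL. The table names follow the question, the rows are invented, and the output aliases (up_likes and so on) are chosen for the example. For comparison it also runs a query that joins Likes and Favorites to News directly and sums without pre-aggregating, which counts each like once per favorite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE News (NewsIdn TEXT PRIMARY KEY, Header TEXT);
    CREATE TABLE Likes (id INTEGER PRIMARY KEY, NewsIdn TEXT, Type TEXT);
    CREATE TABLE Favorites (id INTEGER PRIMARY KEY, NewsIdn TEXT);

    -- Invented sample: one item with 2 up-likes and 2 favorites,
    -- one item with a single down-like and no favorites.
    INSERT INTO News VALUES ('260806', 'test'), ('764955', 'test4');
    INSERT INTO Likes (NewsIdn, Type) VALUES
        ('260806', 'up'), ('260806', 'up'), ('764955', 'down');
    INSERT INTO Favorites (NewsIdn) VALUES ('260806'), ('260806');
""")

# Aggregate each child table on its own first, then join one row per news item.
pre_aggregated = conn.execute("""
    SELECT N.NewsIdn,
           COALESCE(SumL.UpLikes, 0)   AS up_likes,
           COALESCE(SumL.DownLikes, 0) AS down_likes,
           COALESCE(Fav.FavCount, 0)   AS fav_count
    FROM News N
    LEFT JOIN (SELECT NewsIdn,
                      SUM(Type = 'up')   AS UpLikes,
                      SUM(Type = 'down') AS DownLikes
               FROM Likes
               GROUP BY NewsIdn) SumL ON SumL.NewsIdn = N.NewsIdn
    LEFT JOIN (SELECT NewsIdn, COUNT(*) AS FavCount
               FROM Favorites
               GROUP BY NewsIdn) Fav ON Fav.NewsIdn = N.NewsIdn
    ORDER BY up_likes DESC
""").fetchall()

# Joining both child tables directly multiplies the rows before SUM runs.
direct_join = conn.execute("""
    SELECT N.NewsIdn, SUM(L.Type = 'up') AS up_likes
    FROM News N
    LEFT JOIN Likes L ON L.NewsIdn = N.NewsIdn
    LEFT JOIN Favorites F ON F.NewsIdn = N.NewsIdn
    GROUP BY N.NewsIdn
    ORDER BY N.NewsIdn
""").fetchall()

print(pre_aggregated)
print(direct_join)
```

With this sample the pre-aggregated query reports 2 up-likes for the first item, while the direct join reports 4, because each of the two likes is paired with each of the two favorites.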
Again, I do not understand why you would have an auto-increment numeric ID column on the News table and then ANOTHER value for it, NewsIdn, as a varchar. I would have this table and your other tables reference the News.ID column directly; why have two columns representing the same thing? And obviously, each table you aggregate (Likes, Favorites) should have indexes on whatever columns you join or aggregate on (hence NewsIdn, UserIdn, etc.).
As a final reminder, this type of query ALWAYS runs aggregates against your ENTIRE Likes and Favorites tables EVERY TIME, so I suggest going with denormalized columns that hold the counts and are updated whenever someone likes or favorites an item. You can always go back to the raw tables if you ever want to show or change a particular person's like/dislike/favorite status.
You'll have to read up on triggers, as each database has its own syntax for them.
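To give a feel for the trigger approach, here is a minimal sketch using Python's built-in sqlite3 module. The trigger syntax shown is SQLite's, not MySQL's, and the counter columns UpLikes and DownLikes on News, as well as the trigger names, are assumptions made for the example. Changing the Type of an existing like would need a third trigger, which is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE News (
        id INTEGER PRIMARY KEY,
        Header TEXT,
        UpLikes INTEGER NOT NULL DEFAULT 0,    -- hypothetical counter column
        DownLikes INTEGER NOT NULL DEFAULT 0   -- hypothetical counter column
    );
    CREATE TABLE Likes (
        id INTEGER PRIMARY KEY,
        NewsID INTEGER NOT NULL,
        Type TEXT NOT NULL CHECK (Type IN ('up', 'down'))
    );

    -- Bump the matching counter whenever a like or dislike is recorded.
    CREATE TRIGGER likes_after_insert AFTER INSERT ON Likes
    BEGIN
        UPDATE News
        SET UpLikes = UpLikes + (NEW.Type = 'up'),
            DownLikes = DownLikes + (NEW.Type = 'down')
        WHERE id = NEW.NewsID;
    END;

    -- Undo the bump when a like or dislike is removed.
    CREATE TRIGGER likes_after_delete AFTER DELETE ON Likes
    BEGIN
        UPDATE News
        SET UpLikes = UpLikes - (OLD.Type = 'up'),
            DownLikes = DownLikes - (OLD.Type = 'down')
        WHERE id = OLD.NewsID;
    END;

    INSERT INTO News (id, Header) VALUES (1, 'test');
    INSERT INTO Likes (NewsID, Type) VALUES (1, 'up'), (1, 'up'), (1, 'down');
    DELETE FROM Likes WHERE id = 3;  -- removes the single down-like
""")

# The counters can now be read and sorted on without touching Likes at all.
counts = conn.execute(
    "SELECT UpLikes, DownLikes FROM News WHERE id = 1"
).fetchone()
print(counts)
```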
As for table structures, this is a SIMPLIFIED version of what I would have (I removed many other columns from your SQLFiddle sample):
CREATE TABLE IF NOT EXISTS `News` (
id int(11) NOT NULL AUTO_INCREMENT,
UserID integer NOT NULL,
... other fields
`DateCreate` datetime NOT NULL,
PRIMARY KEY ( id ),
KEY ( UserID )
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=5 ;
The extra key on UserID is there in case you want all news activity created by a specific user.
CREATE TABLE IF NOT EXISTS `Users` (
id int(11) NOT NULL AUTO_INCREMENT,
other fields...
PRIMARY KEY ( id ),
KEY ( LastName, Name )
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=5 ;
The additional key is there in case you want to search by a user's name.
CREATE TABLE IF NOT EXISTS `Likes` (
id int(11) NOT NULL AUTO_INCREMENT,
UserId integer NOT NULL,
NewsID integer NOT NULL,
`Type` enum('up','down') NOT NULL,
`IsFavorite` enum('yes','no') NOT NULL,
`DateCreate` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY ( UserID ),
KEY ( NewsID, IsFavorite )
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=6 ;
The additional keys here are for joining and/or aggregates. I've also added a flag column for being a favorite. This could remove the need for a Favorites table, since it holds the same basic content as Likes. So someone could just LIKE/DISLIKE a given news item, but ALSO flag it as a FAVORITE that the end user wants to be able to reference quickly.
Now, how do these table structures simplify querying? Each table has its own "id" column, and any OTHER table refers to it as tableNameID (UserID, NewsID, LikesID or whatever), and that is the join.
select ...
from
News N
Join Users U
on N.UserID = U.ID
Join Likes L
on N.ID = L.NewsID
Integer columns are easier and more commonly identifiable by others when writing queries... Does this make a little more sense?
SELECT
n.Type AS Type,
n.UserIdn AS UserIdn,
u.Username AS Username,
n.NewsIdn AS NewsIdn,
n.Header AS Header,
n.Text AS Text,
n.Tags AS Tags,
n.ImageLink AS ImageLink,
n.VideoLink AS VideoLink,
n.DateCreate AS DateCreate,
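-- Likes and Favorites are both joined to News, so each like row repeats once per favorite;
-- counting distinct Likes.id (assumed to exist, like Favorites.id) avoids inflated totals
COUNT(DISTINCT CASE WHEN Likes.Type = 'up' THEN Likes.id END) AS Uplikes,
COUNT(DISTINCT CASE WHEN Likes.Type = 'down' THEN Likes.id END) AS Downlikes,
COUNT(DISTINCT CASE WHEN Likes.Type = 'up' THEN Likes.id END)
- COUNT(DISTINCT CASE WHEN Likes.Type = 'down' THEN Likes.id END) AS SumLikes,
COUNT(DISTINCT Favorites.id) as Favorit
FROM News n
LEFT JOIN Users u ON n.UserIdn = u.UserIdn
LEFT JOIN Likes ON Likes.NewsIdn = n.NewsIdn
LEFT JOIN Favorites ON n.NewsIdn=Favorites.NewsIdn
GROUP BY n.NewsIdn
ORDER BY Uplikes DESC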
I have a 'photos' table:
photoID (INT), setID (INT)....
18900 , 234 , ...
18901 , 234 , ...
18902 , 234 , ...
18903 , 249 , ...
18904 , 249 , ...
18905 , 249 , ...
I also have a 'photoKeyword' table:
photoID (INT), keywordID (INT)
18900 , 12
18900 , 21
18901 , 17
18905 , 26
18905 , 10
As you can see from my examples above, photos 18902, 18903 and 18904 do NOT have any keywords in the photoKeyword table. This is exactly what I am trying to establish.
I am trying to produce a list of photoIDs that don't have keywords, but one setID at a time. As you can see, photo 18902 doesn't have keywords and neither do 18903 and 18904, but these three photos belong to two different setIDs.
So running this query once should return only photo 18902. I would then add keywords to this photo so it won't be a problem again. The next time I run the query it should return photos 18903 and 18904, the next set (setID 249) of photos that do not have keywords.
How is this possible? Is it possible using just SQL? I hope you can understand what I am looking to achieve; I lost myself just writing about it!
Any thoughts gratefully received...
Try this, which returns the photos of every set in which no photo has any keyword:
SELECT X.photoID FROM photos X
INNER JOIN
(SELECT DISTINCT P.setID FROM
photos P
LEFT OUTER JOIN (SELECT K.photoID, COUNT(*) C FROM photoKeyword K GROUP BY K.photoID) KC ON KC.photoID = P.photoID
GROUP BY P.setID
HAVING COALESCE(SUM(KC.C), 0) < 1) Y ON X.setID = Y.setID
SELECT photoID
, setID
FROM photos
WHERE photoID NOT IN
( SELECT photoID
FROM photoKeyword
)
AND setID =
( SELECT setID
FROM photos
WHERE photoID NOT IN
( SELECT photoID
FROM photoKeyword
)
ORDER BY setID
LIMIT 1
)
This might be what you need, and it is quite simple when you think about it. Make sure your keyword table has an index on the photo id.
select
p.photoID,
p.setID
from
photos p
LEFT JOIN photoKeyword pkey
on p.photoID = pkey.photoID
where
pkey.photoID IS NULL
By doing a LEFT JOIN, we know it will always attempt to join to the second table. Then, if the second table has no matches, we know the ID we are trying to join on will be null. So, left join and return only those IDs where the value is NULL in the second table. Note that the test must be IS NULL: a comparison with = null is never true.
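The LEFT JOIN ... IS NULL test can be combined with a "lowest set that still has untagged photos" filter to get one set per run. Below is a minimal sketch on the sample data from the question, using Python's built-in sqlite3 module in place of MySQL; the constant name NEXT_UNTAGGED and the keyword id 5 added between the two runs are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (photoID INTEGER PRIMARY KEY, setID INTEGER);
    CREATE TABLE photoKeyword (photoID INTEGER, keywordID INTEGER);
    INSERT INTO photos VALUES
        (18900, 234), (18901, 234), (18902, 234),
        (18903, 249), (18904, 249), (18905, 249);
    INSERT INTO photoKeyword VALUES
        (18900, 12), (18900, 21), (18901, 17), (18905, 26), (18905, 10);
""")

# Photos without keywords, restricted to the lowest setID that still has any.
NEXT_UNTAGGED = """
    SELECT p.photoID
    FROM photos p
    LEFT JOIN photoKeyword k ON k.photoID = p.photoID
    WHERE k.photoID IS NULL
      AND p.setID = (
          SELECT MIN(p2.setID)
          FROM photos p2
          LEFT JOIN photoKeyword k2 ON k2.photoID = p2.photoID
          WHERE k2.photoID IS NULL
      )
    ORDER BY p.photoID
"""

first_run = [row[0] for row in conn.execute(NEXT_UNTAGGED)]

# Tag the photo that was found, then ask again for the next set.
conn.execute("INSERT INTO photoKeyword VALUES (18902, 5)")
second_run = [row[0] for row in conn.execute(NEXT_UNTAGGED)]

print(first_run, second_run)
```

The first run returns only the untagged photo from set 234; once it has a keyword, the second run moves on to the untagged photos of set 249.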