T-SQL remove duplicates from table? - sql-server-2008

We have a table that lists PNG images and their source URLs.
Sometimes the table has rows with the same image URL but different pixel widths and heights. I want to remove such duplicates, keeping only the row that has the biggest image width, then the biggest image height.
I've tried various methods that I used to use in MS Access, such as GROUP BY with First, but First isn't available in SQL Server, so I thought I'd ask for T-SQL help.
Can anyone give T-SQL that would remove the duplicates (keeping the biggest image row of each duplicate)?
CREATE TABLE [dbo].[tblImageSuggestions]
(
[CounterID] [bigint] IDENTITY(700996,1) NOT NULL,
[CreatedDateTime] [datetime] NOT NULL,
[EmailAddress] [nvarchar](200) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ImageOriginalURL] [nvarchar](2000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ImageOriginalWidthPixels] [int] NOT NULL,
[ImageOriginalHeightPixels] [int] NOT NULL,
CONSTRAINT [PK_tblImageSuggestions]
PRIMARY KEY CLUSTERED ([CounterID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
)
SET IDENTITY_INSERT [dbo].[tblImageSuggestions] ON
INSERT [dbo].[tblImageSuggestions] ([CounterID], [CreatedDateTime], [EmailAddress], [ImageOriginalURL], [ImageOriginalWidthPixels], [ImageOriginalHeightPixels])
VALUES (701030, CAST(0x0000A6AD0005543F AS DateTime), N'webmaster@mysite.org', N'MyURL1', 1024, 1024)
INSERT [dbo].[tblImageSuggestions] ([CounterID], [CreatedDateTime], [EmailAddress], [ImageOriginalURL], [ImageOriginalWidthPixels], [ImageOriginalHeightPixels])
VALUES (701031, CAST(0x0000A6AD00055445 AS DateTime), N'webmaster@mysite.org', N'MyURL2', 450, 450)
INSERT [dbo].[tblImageSuggestions] ([CounterID], [CreatedDateTime], [EmailAddress], [ImageOriginalURL], [ImageOriginalWidthPixels], [ImageOriginalHeightPixels])
VALUES (701032, CAST(0x0000A6AD00055489 AS DateTime), N'webmaster@mysite.org', N'MyURL3', 3000, 3000)
INSERT [dbo].[tblImageSuggestions] ([CounterID], [CreatedDateTime], [EmailAddress], [ImageOriginalURL], [ImageOriginalWidthPixels], [ImageOriginalHeightPixels])
VALUES (701033, CAST(0x0000A6AD00055768 AS DateTime), N'webmaster@mysite.org', N'MyURL2', 1024, 1024)
INSERT [dbo].[tblImageSuggestions] ([CounterID], [CreatedDateTime], [EmailAddress], [ImageOriginalURL], [ImageOriginalWidthPixels], [ImageOriginalHeightPixels])
VALUES (701034, CAST(0x0000A6AD00055771 AS DateTime), N'webmaster@mysite.org', N'MyURL1', 450, 450)
INSERT [dbo].[tblImageSuggestions] ([CounterID], [CreatedDateTime], [EmailAddress], [ImageOriginalURL], [ImageOriginalWidthPixels], [ImageOriginalHeightPixels])
VALUES (701035, CAST(0x0000A6AD0005577A AS DateTime), N'webmaster@mysite.org', N'MyURL4', 768, 768)
SET IDENTITY_INSERT [dbo].[tblImageSuggestions] OFF

;with cte as (
Select *,RowNr=Row_Number() over (Partition By ImageOriginalURL Order by ImageOriginalWidthPixels*ImageOriginalHeightPixels Desc)
From [tblImageSuggestions]
)
--Delete From cte Where RowNr>1
Select * from cte Where RowNr>1 -- To be deleted ... Remove if Satisfied
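
The question asks for the widest row first and then the tallest, which is not always the same as the largest width times height. A minimal sketch of that ordering follows. It runs against an in-memory SQLite database from Python purely so that it is self-contained; SQLite standing in for SQL Server, the trimmed-down column list, and the CounterID tie-breaker are all assumptions of the sketch. It also assumes a SQLite build with window functions (3.25 or later). The PARTITION BY / ORDER BY clause is the part meant to carry over to the T-SQL CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down copy of tblImageSuggestions (only the columns the dedupe needs).
conn.execute("""
    CREATE TABLE tblImageSuggestions (
        CounterID INTEGER PRIMARY KEY,
        ImageOriginalURL TEXT NOT NULL,
        ImageOriginalWidthPixels INTEGER NOT NULL,
        ImageOriginalHeightPixels INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO tblImageSuggestions VALUES (?, ?, ?, ?)",
    [
        (701030, "MyURL1", 1024, 1024),
        (701031, "MyURL2", 450, 450),
        (701032, "MyURL3", 3000, 3000),
        (701033, "MyURL2", 1024, 1024),
        (701034, "MyURL1", 450, 450),
        (701035, "MyURL4", 768, 768),
    ],
)

# Number the rows of each URL: widest first, then tallest, then lowest
# CounterID as a tie-breaker. Every row numbered above 1 is a duplicate.
conn.execute("""
    DELETE FROM tblImageSuggestions
    WHERE CounterID IN (
        SELECT CounterID FROM (
            SELECT CounterID,
                   ROW_NUMBER() OVER (
                       PARTITION BY ImageOriginalURL
                       ORDER BY ImageOriginalWidthPixels DESC,
                                ImageOriginalHeightPixels DESC,
                                CounterID ASC
                   ) AS RowNr
            FROM tblImageSuggestions
        )
        WHERE RowNr > 1
    )
""")

remaining = conn.execute(
    "SELECT CounterID, ImageOriginalURL, ImageOriginalWidthPixels "
    "FROM tblImageSuggestions ORDER BY CounterID"
).fetchall()
print(remaining)
```

With the sample rows from the question, the 450-pixel copies of MyURL1 and MyURL2 are the ones removed.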

Related

MySQL get max(PRIMARY KEY) for each compound group

My use case is versioning objects (each identified by the objectOwnerId and objectId pair). I insert rows into a ledger table together with their respective hashes.
The order of the ledger table is defined by the compound PRIMARY KEY: a timestamp with microsecond precision plus 3 additional bytes of entropy at the end to prevent collisions (in case multiple rows get inserted in the same microsecond).
Once the data is stored I need an efficient way to get the latest hash for multiple objects at once. I've come up with a query (see the end of this post) built from sub-selects with JOIN and GROUP BY, but I think it's pretty complex and I am looking for a simpler way to solve the problem, if there is one.
Is there any way to improve it?
It would have been simpler if the PRIMARY KEY were not compound, in which case I could pass the max() value upwards, but that's not the case. I was also wondering whether I could merge my TIMESTAMP(6) (7 bytes) with the BINARY(3) (3 bytes) and store the result as BINARY(10), but wasn't sure if that's easily possible.
Please find the schema, test data and SELECT queries below.
This is my table:
CREATE TABLE `ledger` (
`objectOwnerId` CHAR(10) NOT NULL,
`objectId` VARCHAR(50) NOT NULL,
`objectHash` BINARY(16) NOT NULL,
`timestamp` TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
`timestampAdditionalEntropy` BINARY(3) NOT NULL,
PRIMARY KEY (`timestamp`, `timestampAdditionalEntropy`),
UNIQUE(`objectHash`),
INDEX(`objectId`(10))
);
Let's insert some values:
INSERT INTO ledger (objectOwnerId, objectId, objectHash, timestampAdditionalEntropy) VALUES ('owneraaaaa', 'ida', unhex(substring(sha1(random_bytes(16)), 1, 32)), random_bytes(3));
INSERT INTO ledger (objectOwnerId, objectId, objectHash, timestampAdditionalEntropy) VALUES ('owneraaaaa', 'ida', unhex(substring(sha1(random_bytes(16)), 1, 32)), random_bytes(3));
INSERT INTO ledger (objectOwnerId, objectId, objectHash, timestampAdditionalEntropy) VALUES ('owneraaaab', 'idb', unhex(substring(sha1(random_bytes(16)), 1, 32)), random_bytes(3));
INSERT INTO ledger (objectOwnerId, objectId, objectHash, timestampAdditionalEntropy) VALUES ('owneraaaab', 'idb', unhex(substring(sha1(random_bytes(16)), 1, 32)), random_bytes(3));
INSERT INTO ledger (objectOwnerId, objectId, objectHash, timestampAdditionalEntropy) VALUES ('owneraaaab', 'idb', unhex(substring(sha1(random_bytes(16)), 1, 32)), random_bytes(3));
We've got this dataset:
# objectOwnerId, objectId, objectHash, timestamp, HEX(CAST(timestampAdditionalEntropy AS CHAR(6) CHARACTER SET utf8))
#'owneraaaab', 'idb', 'A8D3B63EFC6C63FD996B8D1931FBF748', '2019-05-29 11:38:12.353521', '725E3D'
#'owneraaaab', 'idb', '9B7395F9EE2F2363BA89C7FBAEDDBB54', '2019-05-29 11:38:12.352524', '8B8162'
#'owneraaaab', 'idb', '80393C5FF4492342D073B5F8B3388EC2', '2019-05-29 11:38:12.351569', 'FEAA02'
#'owneraaaaa', 'ida', '0D84F725ACAC87838C34742CA00BBEF7', '2019-05-29 11:38:12.350648', '41E425'
#'owneraaaaa', 'ida', '9A82C936A25C4648BFB75B692850841B', '2019-05-29 11:38:12.349625', '470685'
returned by this query:
select objectOwnerId, objectId, HEX(CAST(objectHash AS CHAR(32) CHARACTER SET utf8)) as objectHash, timestamp, HEX(CAST(timestampAdditionalEntropy AS CHAR(6) CHARACTER SET utf8))
from ledger
order by timestamp desc, timestampAdditionalEntropy desc;
I need to get this:
# objectOwnerId, objectId, objectHash, timestamp, HEX(CAST(s.timestampAdditionalEntropy AS CHAR(6) CHARACTER SET utf8))
#owneraaaaa, ida, 0D84F725ACAC87838C34742CA00BBEF7, 2019-05-29 11:38:12.350648, 41E425
#owneraaaab, idb, A8D3B63EFC6C63FD996B8D1931FBF748, 2019-05-29 11:38:12.353521, 725E3D
which this query can return:
select s.objectOwnerId, s.objectId, HEX(CAST(objectHash AS CHAR(32) CHARACTER SET utf8)) as objectHash, s.timestamp, HEX(CAST(s.timestampAdditionalEntropy AS CHAR(6) CHARACTER SET utf8)) from (
select s.objectOwnerId, s.objectId, s.timestamp, max(i.timestampAdditionalEntropy) as timestampAdditionalEntropy from (
select objectOwnerId, objectId, max(timestamp) as timestamp
from ledger where ((objectOwnerId = 'owneraaaaa' AND objectId = 'ida') OR (objectOwnerId = 'owneraaaab' AND objectId = 'idb'))
group by objectOwnerId, objectId
) s
JOIN ledger i on i.objectOwnerId = s.objectOwnerId and i.objectId = s.objectId and i.timestamp = s.timestamp
group by objectOwnerId, objectId, timestamp
) s
JOIN ledger i on i.objectOwnerId = s.objectOwnerId and i.objectId = s.objectId and i.timestamp = s.timestamp and i.timestampAdditionalEntropy = s.timestampAdditionalEntropy
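
One possible simplification, sketched under assumptions: rank the rows of each (objectOwnerId, objectId) group by the full compound key in descending order and keep rank 1. The sketch below runs on an in-memory SQLite database from Python so that it is self-contained. The TEXT/BLOB column types are SQLite stand-ins for the MySQL types, and the approach assumes window-function support (MySQL 8.0 or later on the MySQL side, SQLite 3.25 or later here). It is not tuned for the indexes in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ledger (
        objectOwnerId TEXT NOT NULL,
        objectId TEXT NOT NULL,
        objectHash BLOB NOT NULL UNIQUE,
        timestamp TEXT NOT NULL,
        timestampAdditionalEntropy BLOB NOT NULL,
        PRIMARY KEY (timestamp, timestampAdditionalEntropy)
    )
""")

# The five rows shown in the question (hashes and entropy given as hex).
sample = [
    ("owneraaaab", "idb", "A8D3B63EFC6C63FD996B8D1931FBF748", "2019-05-29 11:38:12.353521", "725E3D"),
    ("owneraaaab", "idb", "9B7395F9EE2F2363BA89C7FBAEDDBB54", "2019-05-29 11:38:12.352524", "8B8162"),
    ("owneraaaab", "idb", "80393C5FF4492342D073B5F8B3388EC2", "2019-05-29 11:38:12.351569", "FEAA02"),
    ("owneraaaaa", "ida", "0D84F725ACAC87838C34742CA00BBEF7", "2019-05-29 11:38:12.350648", "41E425"),
    ("owneraaaaa", "ida", "9A82C936A25C4648BFB75B692850841B", "2019-05-29 11:38:12.349625", "470685"),
]
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?, ?, ?)",
    [(o, i, bytes.fromhex(h), t, bytes.fromhex(e)) for o, i, h, t, e in sample],
)

# Rank each object's rows by the compound key, newest first, and keep rank 1.
latest = conn.execute("""
    SELECT objectOwnerId, objectId, hex(objectHash)
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY objectOwnerId, objectId
                   ORDER BY timestamp DESC, timestampAdditionalEntropy DESC
               ) AS rn
        FROM ledger
        WHERE (objectOwnerId = 'owneraaaaa' AND objectId = 'ida')
           OR (objectOwnerId = 'owneraaaab' AND objectId = 'idb')
    )
    WHERE rn = 1
    ORDER BY objectOwnerId
""").fetchall()
print(latest)
```

Because the ORDER BY inside the window covers both key columns, the tie on identical timestamps is resolved in the same pass, so the second level of GROUP BY and JOIN is not needed.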

SQL IF exist date by day do increment update else insert data

How can I express the statement below as a SQL query?
IF EXISTS (SELECT * FROM expense_history
WHERE user_id = 40
AND DATE_FORMAT(expense_history.created_date , '%Y-%m-%d') = '2018-06-02'
AND camp_id='80')
UPDATE expense_history
SET clicks = clicks + 1,
amount = amount + 1
WHERE user_id = 40
AND DATE_FORMAT(expense_history.created_date, '%Y-%m-%d') = '2018-06-02'
AND camp_id = '80'
ELSE
INSERT INTO expense_history (camp_id, created_date, amount, user_id)
VALUES ('80', '2018-06-02 12:12:12', '1', '40')
END IF;
I just want to increment clicks and amount if a row already exists for that day; otherwise I want to add a new row.

This is very tricky in MySQL. You are storing a datetime but you want the date part to be unique.
Starting in MySQL 5.7.?, you can use computed columns for the unique constraint. Here is an example:
create table expense_history (
user_id int,
camp_id int,
amount int default 0,
clicks int default 1,
. . .
created_datetime datetime, -- note I changed the name
created_date date generated always as (date(created_datetime)),
unique (user_id, camp_id, created_date)
);
You can then do the work as:
INSERT INTO expense_history (camp_id, created_datetime, amount, user_id)
VALUES (80, '2018-06-02 12:12:12', 1, 40)
ON DUPLICATE KEY UPDATE
amount = COALESCE(amount + 1, 1),
clicks = COALESCE(clicks + 1, 1);
Earlier versions of MySQL don't support generated columns. Nor do they support functions on unique. But you can use a trick on a prefix index on a varchar to do what you want:
create table expense_history (
user_id int,
camp_id int,
amount int default 0,
clicks int default 1,
. . .
created_datetime varchar(19),
unique (user_id, camp_id, created_datetime(10))
);
This has the same effect.
Another alternative is to store the date and the time in separate columns.
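
The separate-date-column alternative can be sketched as follows. This is only an illustration run on an in-memory SQLite database from Python: SQLite's ON CONFLICT ... DO UPDATE (available from SQLite 3.24) plays the role of MySQL's ON DUPLICATE KEY UPDATE, and the helper name record_click is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The date part is stored in its own column so it can take part in the
# unique constraint that defines "one row per user, campaign and day".
conn.execute("""
    CREATE TABLE expense_history (
        user_id INTEGER NOT NULL,
        camp_id INTEGER NOT NULL,
        amount INTEGER NOT NULL DEFAULT 0,
        clicks INTEGER NOT NULL DEFAULT 1,
        created_datetime TEXT NOT NULL,
        created_date TEXT NOT NULL,
        UNIQUE (user_id, camp_id, created_date)
    )
""")


def record_click(user_id, camp_id, created_datetime):
    """Insert the first click of the day, or increment the existing row."""
    conn.execute(
        """
        INSERT INTO expense_history
            (user_id, camp_id, amount, created_datetime, created_date)
        VALUES (?, ?, 1, ?, date(?))
        ON CONFLICT (user_id, camp_id, created_date) DO UPDATE SET
            clicks = clicks + 1,
            amount = amount + 1
        """,
        (user_id, camp_id, created_datetime, created_datetime),
    )


record_click(40, 80, "2018-06-02 12:12:12")  # new row for 2018-06-02
record_click(40, 80, "2018-06-02 18:30:00")  # same day: increments
record_click(40, 80, "2018-06-03 09:00:00")  # next day: new row

rows = conn.execute(
    "SELECT created_date, clicks, amount FROM expense_history ORDER BY created_date"
).fetchall()
print(rows)
```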

I presumed your database is MySQL because of the DATE_FORMAT() function (and edited your question accordingly).
With the mechanism below you can do what you want,
provided there is a composite primary key on the camp_id, amount and user_id columns:
SET @camp_id = 80,
@amount = 1,
@user_id = 40,
@created_date = sysdate();
INSERT INTO expense_history(camp_id,created_date,amount,user_id,clicks)
VALUES(@camp_id,@created_date,@amount,@user_id,ifnull(clicks,1))
ON DUPLICATE KEY UPDATE
amount = @amount + 1,
clicks = ifnull(clicks,0)+1;

MySQL ORDER BY multiple elements in different where clauses

I have a question that I don't know exactly what to call. Maybe I'm just not using the correct terminology and therefore cannot find the answer.
The case is as follows.
I have a database table with data similar to the following:
booking_id (int)
booking_start (Y-m-d)
booking_starttime (H:i)
booking_hotelstart (Y-m-d)
booking_hotelstarttime (H:i)
booking_hotelend (Y-m-d)
booking_hotelendtime (H:i)
booking_end (Y-m-d)
booking_endtime (H:i)
booking_confirmed (bool)
Now I would like to write a query that does roughly this
(an invalid query, just to demonstrate what I would like):
SELECT `booking_id` FROM `system_bookings` WHERE (
(`booking_start`='2014-10-20' ORDER BY `booking_starttime` ASC)
OR
(`booking_hotelstart`='2014-10-20' ORDER BY `booking_hotelstarttime` ASC)
OR
(`booking_hotelend`='2014-10-20' ORDER BY `booking_endtime` ASC)
OR
(`booking_end`='2014-10-20' ORDER BY `booking_endtime` ASC)
)
AND
`booking_confirmed` = TRUE LIMIT 0, 100
So basically an ORDER BY with a condition attached. But how do I do this? I have no clue how to search for it, so I hope someone can point me in the right direction. I would also like to know what this is called, for future searches.
Thanks in advance!
Edit:
I created some sample data as requested:
CREATE TABLE IF NOT EXISTS `system_bookings` (
`booking_id` int(6) NOT NULL AUTO_INCREMENT,
`booking_start` date NOT NULL,
`booking_starttime` varchar(5) NOT NULL,
`booking_hotelstart` date NOT NULL,
`booking_hotelstarttime` varchar(5) NOT NULL,
`booking_hotelend` date NOT NULL,
`booking_hotelendtime` varchar(5) NOT NULL,
`booking_end` date NOT NULL,
`booking_endtime` varchar(5) NOT NULL,
`booking_confirmed` tinyint(1) NOT NULL,
PRIMARY KEY (`booking_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `system_bookings` (`booking_id`, `booking_start`, `booking_starttime`, `booking_hotelstart`, `booking_hotelstarttime`, `booking_hotelend`, `booking_hotelendtime`, `booking_end`, `booking_endtime`, `booking_confirmed`) VALUES
(1, '2014-10-09', '21:19', '2014-10-08', '21:19', '2014-10-23', '08:00', '2014-10-23', '22:00', 1),
(2, '2014-10-11', '16:00', '2014-10-27', '12:15', '2014-10-28', '17:45', '2014-10-28', '17:45', 1),
(3, '2014-10-10', '20:30', '2014-10-10', '20:30', '2014-10-11', '08:00', '2014-10-20', '14:00', 1),
(4, '2014-10-12', '20:00', '2014-10-12', '20:00', '2014-10-13', '05:00', '2014-10-29', '22:00', 0),
(5, '2014-10-22', '15:00', '2014-10-22', '20:30', '2014-10-23', '04:15', '2014-10-31', '12:00', 1);

You can have multiple conditions in the order by clause. So, formally, you seem to want this:
SELECT `booking_id`
FROM `system_bookings`
WHERE `booking_confirmed` = TRUE AND
(`booking_start` = '2014-10-20' OR
`booking_hotelstart` = '2014-10-20' OR
`booking_hotelend`='2014-10-20' OR
`booking_end`='2014-10-20'
)
ORDER BY (CASE WHEN `booking_start` = '2014-10-20' THEN `booking_starttime`
WHEN `booking_hotelstart` = '2014-10-20' THEN `booking_hotelstarttime`
WHEN `booking_hotelend` = '2014-10-20' THEN `booking_hotelendtime`
WHEN `booking_end` = '2014-10-20' THEN `booking_endtime`
END)
LIMIT 0, 100;
However, this is a bit nonsensical, because it mixes times that belong to different kinds of events into a single sort. I suspect you want to prioritize the where clauses and really want something more like this:
ORDER BY (CASE WHEN `booking_start` = '2014-10-20' THEN 1
WHEN `booking_hotelstart` = '2014-10-20' THEN 2
WHEN `booking_hotelend` = '2014-10-20' THEN 3
WHEN `booking_end` = '2014-10-20' THEN 4
END)
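
A small runnable sketch of the first form (sorting each row by the time that belongs to whichever date matched), using the sample data from the question in an in-memory SQLite database from Python. Two things here are assumptions of the sketch: the date 2014-10-23 is used instead of 2014-10-20 because it matches more than one sample row, and hotel-end matches are sorted by booking_hotelendtime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE system_bookings (
        booking_id INTEGER PRIMARY KEY,
        booking_start TEXT, booking_starttime TEXT,
        booking_hotelstart TEXT, booking_hotelstarttime TEXT,
        booking_hotelend TEXT, booking_hotelendtime TEXT,
        booking_end TEXT, booking_endtime TEXT,
        booking_confirmed INTEGER
    )
""")
conn.executemany(
    "INSERT INTO system_bookings VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2014-10-09", "21:19", "2014-10-08", "21:19", "2014-10-23", "08:00", "2014-10-23", "22:00", 1),
        (2, "2014-10-11", "16:00", "2014-10-27", "12:15", "2014-10-28", "17:45", "2014-10-28", "17:45", 1),
        (3, "2014-10-10", "20:30", "2014-10-10", "20:30", "2014-10-11", "08:00", "2014-10-20", "14:00", 1),
        (4, "2014-10-12", "20:00", "2014-10-12", "20:00", "2014-10-13", "05:00", "2014-10-29", "22:00", 0),
        (5, "2014-10-22", "15:00", "2014-10-22", "20:30", "2014-10-23", "04:15", "2014-10-31", "12:00", 1),
    ],
)

# The CASE picks, per row, the time column of the first date that matched.
ordered_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT booking_id
        FROM system_bookings
        WHERE booking_confirmed = 1
          AND (booking_start = :day OR booking_hotelstart = :day
               OR booking_hotelend = :day OR booking_end = :day)
        ORDER BY CASE
            WHEN booking_start = :day THEN booking_starttime
            WHEN booking_hotelstart = :day THEN booking_hotelstarttime
            WHEN booking_hotelend = :day THEN booking_hotelendtime
            WHEN booking_end = :day THEN booking_endtime
        END
        LIMIT 100
        """,
        {"day": "2014-10-23"},
    )
]
print(ordered_ids)
```

Bookings 1 and 5 both have a hotel end on that day, and booking 5 sorts first because its hotel end time (04:15) is earlier than booking 1's (08:00).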

MySQL. Counting just a part of my column value

I have a question about a table in MySQL. It is easiest to explain by showing it. I have the following table:
CREATE TABLE IF NOT EXISTS `tip_masina` (
`id_tip` int(11) NOT NULL AUTO_INCREMENT,
`marca` varchar(40) NOT NULL,
`pret` int(11) NOT NULL,
PRIMARY KEY (`id_tip`),
UNIQUE KEY `marca` (`marca`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=16 ;
INSERT INTO `tip_masina` (`id_tip`, `marca`, `pret`) VALUES
(1, 'Chevrolet Impala', 8000),
(2, 'Chevrolet Camaro', 10000),
(3, 'Chevrolet Tahoe', 13000),
(4, 'Chevrolet Suburban', 12500),
(5, 'Chevrolet Cobalt', 4000),
(6, 'Dodge Charger', 14000),
(7, 'Dodge Avenger', 9000),
(8, 'Dodge Challenger', 6500),
(9, 'Dodge Dart', 3500),
(10, 'Dodge Durango', 3000),
(11, 'Ford Mustang', 7500),
(12, 'Ford Crown Victoria', 5000),
(13, 'Ford Focus', 4300),
(14, 'Ford Fiesta', 3700),
(15, 'Ford Escort', 1000);
What I want out of this table is to display the vehicle type and the number of vehicles, like:
marca | no_of_vehicles
Chevrolet 5
Dodge 5
Ford 5
Is there any way to do this without splitting the column marca in two columns?

Here is an easy way, using substring_index():
select substring_index(marca, ' ', 1) as marca, count(*)
from tip_masina
group by substring_index(marca, ' ', 1);
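
For a quick check of the grouping idea outside MySQL, here is a sketch on an in-memory SQLite database from Python. SQLite has no substring_index(), so the sketch takes the text before the first space with substr() and instr() instead; that substitution, and the assumption that every marca value contains a space, are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tip_masina (id_tip INTEGER PRIMARY KEY, marca TEXT UNIQUE, pret INTEGER)"
)
conn.executemany(
    "INSERT INTO tip_masina VALUES (?, ?, ?)",
    [
        (1, "Chevrolet Impala", 8000), (2, "Chevrolet Camaro", 10000),
        (3, "Chevrolet Tahoe", 13000), (4, "Chevrolet Suburban", 12500),
        (5, "Chevrolet Cobalt", 4000), (6, "Dodge Charger", 14000),
        (7, "Dodge Avenger", 9000), (8, "Dodge Challenger", 6500),
        (9, "Dodge Dart", 3500), (10, "Dodge Durango", 3000),
        (11, "Ford Mustang", 7500), (12, "Ford Crown Victoria", 5000),
        (13, "Ford Focus", 4300), (14, "Ford Fiesta", 3700),
        (15, "Ford Escort", 1000),
    ],
)

# Group by everything before the first space, i.e. the manufacturer name.
counts = conn.execute("""
    SELECT substr(marca, 1, instr(marca, ' ') - 1) AS make,
           COUNT(*) AS no_of_vehicles
    FROM tip_masina
    GROUP BY make
    ORDER BY make
""").fetchall()
print(counts)
```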

It might be better to split 'marca' into two columns, so that it is easier to find what you want. You then won't need any special functions (such as substring_index) in your queries.
You can use the following code (tested in MySQL Workbench against a table created with your queries):
START TRANSACTION;
ALTER TABLE tip_masina ADD model VARCHAR(60) AFTER marca;
ALTER TABLE tip_masina CHANGE marca company VARCHAR(60);
UPDATE tip_masina SET model = SUBSTRING(company, LOCATE(' ', company) + 1); -- everything after the first space, so 'Ford Crown Victoria' keeps 'Crown Victoria'
ALTER TABLE tip_masina DROP INDEX marca;
UPDATE tip_masina SET company = SUBSTRING_INDEX(company, ' ', 1);
SELECT * FROM tip_masina;

Increase Alphanumeric VARCHAR Entry by Value 1?

On an old project, because of a design that was not thought through, I have a column that really should be auto_increment but cannot be, because its entries are alphanumeric, as follows:
c01
c02
c03
(c99 would continue to c100 and beyond.) The letter was introduced in the past and removing it would require overhauling the system, so I prefer this workaround.
Now I need a way to imitate the auto_increment functionality in the SQL statement myself. My own attempt has gotten as far as the following:
INSERT INTO tags (tag_id, tag_name, tag_description, added_by_user_id, creation_date, last_edited) VALUES (SELECT(MAX(tag_id)+1),
'Love', 'All about love', 7, now(), 0);
This does not work as is, but the idea was to select the highest entry in the column "tag_id" and then simply increase it by 1.
Any ideas how to accomplish this?
By the way, I am not sure whether an alphanumeric entry can simply be incremented this way. I know it can be done somehow; I just don't know how.

If you want to safely get the largest integer value of a tag id of the form c##.., you could use the following expression:
max( convert( substring(tag_id, 2) , unsigned integer) )
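Reading it from the inside out: substring(tag_id, 2) takes everything after the 'c', convert(..., unsigned integer) turns that into a positive number, and max() picks the largest.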
Then your insert statement would look something like this:
set @newid = convert(
(select
max(convert( (substring(tag_id, 2)) , unsigned integer))+1
from tags), char(10)
);
set @newid = if(length(@newid) = 1, concat('0', @newid), @newid);
set @newid = concat('c', @newid);
INSERT INTO tags (tag_id, tag_name, tag_description, added_by_user_id,
creation_date, last_edited)
VALUES (@newid, 'Love', 'All about love', 7, now(), '2012-04-15');
Demo: http://www.sqlfiddle.com/#!2/0bd9f/1

This will increase from c01 to c02 to c03 ... to c99 to c100 to c101 ... to c999 to c1000 etc. The rows are ordered by the numeric part rather than by the string itself, because as strings 'c99' sorts after 'c100'.
set @nextID = (SELECT CONCAT(SUBSTRING(`tag_id`, 1, 1), IF(CHAR_LENGTH(CAST(SUBSTRING(`tag_id`, 2)
AS UNSIGNED)) < 2, LPAD(CAST(CAST(SUBSTRING(`tag_id`, 2) AS UNSIGNED) + 1 AS CHAR), 2,
'0'), CAST(CAST(SUBSTRING(`tag_id`, 2) AS UNSIGNED) + 1 AS CHAR))) FROM `tags` ORDER BY
CAST(SUBSTRING(`tag_id`, 2) AS UNSIGNED) DESC LIMIT 1);
INSERT INTO tags (tag_id, tag_name, tag_description, added_by_user_id,
creation_date, last_edited) VALUES (@nextID, 'Love', 'All about love', 7, NOW(), null);
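
The same idea (take the largest numeric part, add 1, pad to at least two digits, put the 'c' back) can be sketched in a few lines. The sketch uses an in-memory SQLite database from Python and a cut-down tags table; the helper names next_tag_id and add_tag are hypothetical. Note that two sessions running this concurrently could compute the same id, so the unique key on tag_id still matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (tag_id TEXT PRIMARY KEY, tag_name TEXT)")


def next_tag_id(conn):
    """Return the next id of the form c01, c02, ..., c99, c100, ..."""
    (largest,) = conn.execute(
        # Strip the leading letter and compare the rest as a number.
        "SELECT MAX(CAST(substr(tag_id, 2) AS INTEGER)) FROM tags"
    ).fetchone()
    # An empty table yields NULL (None); start counting from 1 in that case.
    return f"c{(largest or 0) + 1:02d}"


def add_tag(conn, name):
    """Insert a tag under the next free id and return that id."""
    tag_id = next_tag_id(conn)
    conn.execute("INSERT INTO tags (tag_id, tag_name) VALUES (?, ?)", (tag_id, name))
    return tag_id


conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [("c01", "first"), ("c02", "second"), ("c99", "ninety-ninth")],
)
first = add_tag(conn, "Love")
second = add_tag(conn, "Hope")
print(first, second)
```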
