I've almost finished a project involving customers and products and only identified at the end that we have duplicate records due to keying errors, where sales staff have added the same customer to the database more than once.
What I need to do is identify the duplicate records by comparing the Customer name and Postcode, then merge their Products so that only one record per customer remains and its Products field contains every product that applies to that customer.
In order to explain this, I have put together a small example.
DROP TABLE IF EXISTS `tblProducts`;
CREATE TABLE `tblProducts` (
`ID` int(10) DEFAULT NULL,
`Customer` varchar(30) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`Postcode` varchar(30) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`Products` varchar(30) COLLATE utf8mb4_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO `tblProducts` VALUES ('1', 'Bradford', 'BR1 2HJ', '111&222&444');
INSERT INTO `tblProducts` VALUES ('2', 'Bradford', 'BR1 2HJ', '222');
INSERT INTO `tblProducts` VALUES ('3', 'Tanner', 'TE4 9PO', '777&333');
INSERT INTO `tblProducts` VALUES ('4', 'Smythe', 'SM3 8KO', '111&222');
INSERT INTO `tblProducts` VALUES ('5', 'Francis', 'FL2 6HG', '444&333');
INSERT INTO `tblProducts` VALUES ('6', 'Tanner', 'TE4 9PO', '555');
INSERT INTO `tblProducts` VALUES ('7', 'Peters', 'PE4 4PE', '444');
INSERT INTO `tblProducts` VALUES ('8', 'Jeffrey', 'JE9 4JK', '444&555&888');
INSERT INTO `tblProducts` VALUES ('9', 'Barnes', 'BA5 5AB', '999');
INSERT INTO `tblProducts` VALUES ('10', 'Smythe', 'SM1 4GE', '888&777&222');
If we run the following query, you will see that we have two duplicates, for Bradford and Tanner.
SELECT Customer, Postcode, COUNT(*) FROM tblProducts group by Customer, Postcode having count(*) > 1
Customer Postcode COUNT(*)
Bradford BR1 2HJ 2
Tanner TE4 9PO 2
The separate duplicate records are:
Customer Postcode Products
Bradford BR1 2HJ 111&222&444
Bradford BR1 2HJ 222
Tanner TE4 9PO 777&333
Tanner TE4 9PO 555
I need to run a MySQL query to 'merge products where customer and postcode count > 1' as above, so the end result will be:
Customer Postcode Products
Bradford BR1 2HJ 111&222&444
Tanner TE4 9PO 777&333&555
Note that there is only one instance of 222 in the first record as 222 already existed. The duplicate record will be removed from the MySQL table so that only one record exists.
I must admit, I had assumed this would be easy for MySQL to achieve. I have spent ages researching merging rows, merging fields and removing duplicates, and have not found anything that specifically helps.
Link to SQL Fiddle if it helps: http://sqlfiddle.com/#!9/966550/4/0
Can anyone help please as I am stuck.
Many thanks,
Rob
SELECT TP.Customer,TP.Postcode,TP.Products
FROM tblProducts TP
INNER JOIN
(
SELECT MIN(ID) ID FROM tblProducts GROUP BY Customer, Postcode
)INNERTABLE ON INNERTABLE.ID=TP.ID
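You can try the above query. It keeps the row with the lowest ID for each Customer and Postcode pair; note that it does not merge the Products values of the rows it leaves out.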
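Because the products are stored as a single '&'-separated string, merging them without repeats is awkward in plain SQL, so one option is to do the merge in application code. Below is a minimal sketch of that idea, not a tested migration: it assumes Python, uses an in-memory SQLite database as a stand-in for MySQL, and the function names merge_duplicate_customers and demo_merge are made up for illustration. Python compares the names case-sensitively, unlike the table's _ci collation, and the merged string may outgrow varchar(30) on real data.

```python
import sqlite3


def merge_duplicate_customers(conn):
    """Merge rows sharing (Customer, Postcode) into the row with the lowest ID."""
    rows = conn.execute(
        "SELECT ID, Customer, Postcode, Products FROM tblProducts ORDER BY ID"
    ).fetchall()

    # (Customer, Postcode) -> row to keep, rows to delete, merged product codes
    groups = {}
    for row_id, customer, postcode, products in rows:
        group = groups.setdefault(
            (customer, postcode), {"keep": row_id, "drop": [], "products": []}
        )
        if row_id != group["keep"]:
            group["drop"].append(row_id)
        # Products may be NULL; split on '&' and keep first-seen order.
        for code in (products or "").split("&"):
            if code and code not in group["products"]:
                group["products"].append(code)

    with conn:  # commit all changes together, or roll back on error
        for group in groups.values():
            if not group["drop"]:
                continue  # no duplicates for this customer
            conn.execute(
                "UPDATE tblProducts SET Products = ? WHERE ID = ?",
                ("&".join(group["products"]), group["keep"]),
            )
            conn.executemany(
                "DELETE FROM tblProducts WHERE ID = ?",
                [(row_id,) for row_id in group["drop"]],
            )


def demo_merge():
    """Load the sample rows from the question, merge, and return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblProducts "
        "(ID INTEGER, Customer TEXT, Postcode TEXT, Products TEXT)"
    )
    conn.executemany(
        "INSERT INTO tblProducts VALUES (?, ?, ?, ?)",
        [
            (1, "Bradford", "BR1 2HJ", "111&222&444"),
            (2, "Bradford", "BR1 2HJ", "222"),
            (3, "Tanner", "TE4 9PO", "777&333"),
            (4, "Smythe", "SM3 8KO", "111&222"),
            (5, "Francis", "FL2 6HG", "444&333"),
            (6, "Tanner", "TE4 9PO", "555"),
            (7, "Peters", "PE4 4PE", "444"),
            (8, "Jeffrey", "JE9 4JK", "444&555&888"),
            (9, "Barnes", "BA5 5AB", "999"),
            (10, "Smythe", "SM1 4GE", "888&777&222"),
        ],
    )
    merge_duplicate_customers(conn)
    return conn.execute(
        "SELECT ID, Customer, Postcode, Products FROM tblProducts ORDER BY ID"
    ).fetchall()
```

Calling demo_merge() leaves eight rows, with Bradford holding 111&222&444 and Tanner holding 777&333&555. The two Smythe rows are untouched because their postcodes differ.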
So this is likely something simple, but I'm pulling my hair out trying to figure out an efficient way of doing this. I've looked at many other Q&A's, and I've messed with DISTINCT, GROUP BY, sub-queries, etc.
I've tried to super-simplify this example. (for the purpose of the example, there's no DB normalization) Here's a SQL fiddle:
http://sqlfiddle.com/#!9/948be7c/1
CREATE TABLE IF NOT EXISTS `orders` (
`id` int NOT NULL,
`name` varchar(90) NULL,
`email` varchar(200) NULL,
`phone` varchar(200) NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `orders` (`id`, `name`, `email`, `phone`) VALUES
('1', 'Bob', 'bob#email.com', NULL),
('2', 'Bobby', 'bob#email.com', '1115551111'),
('3', 'Robert', 'robert#email.com', '1115551111'),
('4', 'Fred', 'fred#email.com', '1115552222'),
('5', 'Freddy', 'fred#email.com', '1115553333')
If I just run a simple select, I'll get:
But I'd like to "de-duplicate" any results that have the same email address or the same phone number, because they will be the same people, even if there are multiple IDs for them and even if their names are spelled differently. I then want to consolidate those results: one of the "distinct" email addresses and one of the "distinct" phone numbers, along with one of the names and one of the IDs.
So that for the above, I'd end up with something like this:
Any suggestions?
I think that you can do what you want by filtering with a correlated subquery:
select o.*
from orders o
where o.id = (
select o1.id
from orders o1
where o1.email = o.email or o1.phone = o.phone
order by o1.phone is not null desc, o1.email is not null desc, id
limit 1
)
This retains just one row out of those that share the same phone or email, giving priority to rows whose phone and email are not null. Ties are broken by picking the lowest id.
For your sample data, this returns:
id name email phone
2 Bobby bob#email.com 1115551111
4 Fred fred#email.com 1115552222
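If you want to check the query against the sample data without a MySQL server, here is a small sketch that runs it on an in-memory SQLite database from Python. SQLite is only a stand-in here and the function name dedup_orders is made up for this example. The inner id is written as o1.id for clarity, and an outer order by is added so the rows come back in a fixed order.

```python
import sqlite3

DEDUP_SQL = """
select o.*
from orders o
where o.id = (
    select o1.id
    from orders o1
    where o1.email = o.email or o1.phone = o.phone
    order by o1.phone is not null desc, o1.email is not null desc, o1.id
    limit 1
)
order by o.id
"""


def dedup_orders():
    """Build the sample orders table and return the de-duplicated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table orders "
        "(id integer primary key, name text, email text, phone text)"
    )
    conn.executemany(
        "insert into orders (id, name, email, phone) values (?, ?, ?, ?)",
        [
            (1, "Bob", "bob#email.com", None),
            (2, "Bobby", "bob#email.com", "1115551111"),
            (3, "Robert", "robert#email.com", "1115551111"),
            (4, "Fred", "fred#email.com", "1115552222"),
            (5, "Freddy", "fred#email.com", "1115553333"),
        ],
    )
    return conn.execute(DEDUP_SQL).fetchall()
```

dedup_orders() returns the rows with id 2 and id 4, the same two rows listed above.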
There are a number of different ways your requirements could be interpreted.
One way would be to reframe it as a constraint: only return a record if one of these is true:
it has a non-null email and phone, and no record exists with the same email and phone and a lower id
it has a non-null email but null phone, and no record exists with the same email and a non-null phone, and no record exists with the same email and a null phone and a lower id
it has a non-null phone but null email, and no record exists with the same phone and a non-null email, and no record exists with the same phone and a null email and a lower id
This translates easily into a couple of joins, no group by or distinct required.
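As a rough sketch of that interpretation, the three rules can be written directly as NOT EXISTS conditions. This uses NOT EXISTS rather than the joins mentioned above, purely because it maps one-to-one onto the rules. It assumes Python with an in-memory SQLite database standing in for MySQL, and the name constrained_ids is invented for the example.

```python
import sqlite3

CONSTRAINT_SQL = """
select o.id
from orders o
where
    -- rule 1: email and phone present, no lower id with the same pair
    (o.email is not null and o.phone is not null
     and not exists (select 1 from orders x
                     where x.email = o.email and x.phone = o.phone
                       and x.id < o.id))
    -- rule 2: email only
 or (o.email is not null and o.phone is null
     and not exists (select 1 from orders x
                     where x.email = o.email and x.phone is not null)
     and not exists (select 1 from orders x
                     where x.email = o.email and x.phone is null
                       and x.id < o.id))
    -- rule 3: phone only
 or (o.phone is not null and o.email is null
     and not exists (select 1 from orders x
                     where x.phone = o.phone and x.email is not null)
     and not exists (select 1 from orders x
                     where x.phone = o.phone and x.email is null
                       and x.id < o.id))
order by o.id
"""


def constrained_ids():
    """Return the ids kept by the three rules for the sample orders data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table orders "
        "(id integer primary key, name text, email text, phone text)"
    )
    conn.executemany(
        "insert into orders (id, name, email, phone) values (?, ?, ?, ?)",
        [
            (1, "Bob", "bob#email.com", None),
            (2, "Bobby", "bob#email.com", "1115551111"),
            (3, "Robert", "robert#email.com", "1115551111"),
            (4, "Fred", "fred#email.com", "1115552222"),
            (5, "Freddy", "fred#email.com", "1115553333"),
        ],
    )
    return [row[0] for row in conn.execute(CONSTRAINT_SQL)]
```

On the sample data this keeps ids 2, 3, 4 and 5: only Bob (id 1) is dropped, because under these rules a row is a duplicate only when both email and phone match, or when its missing field is filled in by another row.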
I have been trying to obtain the latest serial number of a particular product, so that I can show the next available serial number in my admin area.
Following is what I have been trying
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE IF NOT EXISTS `serials` (
`sn` varchar(11) NOT NULL,
`cxid` varchar(11) DEFAULT NULL,
`itmid` varchar(11) DEFAULT NULL,
PRIMARY KEY (`sn`)
);
INSERT INTO `serials` (`sn`, `cxid`, `itmid`) VALUES
('7', '00007', 'Name'),
('8', '00008', 'Name'),
('9', '00010', 'Name'),
('10', '00010', 'Name'),
('11', '00010', 'Name'),
('12', '00012', 'Name'),
('13', '00013', 'Name');
Query 1:
SELECT
sn
FROM serials AS t
INNER JOIN (SELECT MAX(sn) AS one FROM serials where cxid = '00010') AS s ON s.one = t.sn
Results:
I always get an empty result no matter what I do. What might be the problem? Maybe there is a much easier way?
And the point to note is that I have to get the serial only of a particular product, NOT from the entire table.
Are you using the right field? itmid = 0010? Shouldn't it be cxid?
You also have no values matching 0010. You should use '00010'. I'm fairly certain 0010 will not equal 00010 when you are using a varchar data type and you should wrap it in quotes to evaluate it as a string.
Lastly, from the looks of it, sn should be an integer type. As a varchar, MAX() compares the values as strings, so '9' sorts above '11' and you do not get the numerically largest serial. If you are certain you need it as a varchar, a workaround is to force a numeric comparison with ABS:
SELECT MAX(ABS(sn)) AS one FROM serials where cxid = '00010'
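To see the difference between the string and the numeric maximum, here is a small sketch using Python with an in-memory SQLite database as a stand-in for MySQL. It uses CAST(sn AS INTEGER) for the numeric comparison because that is the portable form in SQLite; in MySQL the ABS(sn) shown above or CAST(sn AS UNSIGNED) plays the same role. The function name max_serial is made up for this example.

```python
import sqlite3


def max_serial(cxid):
    """Return (string maximum, numeric maximum) of sn for one cxid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE serials (sn TEXT PRIMARY KEY, cxid TEXT, itmid TEXT)"
    )
    conn.executemany(
        "INSERT INTO serials (sn, cxid, itmid) VALUES (?, ?, ?)",
        [
            ("7", "00007", "Name"),
            ("8", "00008", "Name"),
            ("9", "00010", "Name"),
            ("10", "00010", "Name"),
            ("11", "00010", "Name"),
            ("12", "00012", "Name"),
            ("13", "00013", "Name"),
        ],
    )
    # Text comparison: '9' is greater than '10' and '11'.
    text_max = conn.execute(
        "SELECT MAX(sn) FROM serials WHERE cxid = ?", (cxid,)
    ).fetchone()[0]
    # Numeric comparison after converting the text to an integer.
    numeric_max = conn.execute(
        "SELECT MAX(CAST(sn AS INTEGER)) FROM serials WHERE cxid = ?", (cxid,)
    ).fetchone()[0]
    return text_max, numeric_max
```

max_serial("00010") gives ('9', 11): the string maximum is '9', while the numeric maximum is 11, so the next available serial would be 12.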
You are using the wrong field.
Use this:
SELECT sn FROM serials AS t INNER JOIN (SELECT MAX(sn) AS one FROM serials where cxid = '00010') AS s ON s.one = t.sn
You should use '00010' instead of '0010', because you have no values matching 0010.
I need to copy a number of rows from a table that have the same id_shop value, then insert these rows back into the same table but with a different id_shop value. I'm not sure how to do the latter part. I'm guessing that it will be a variation of the following.
INSERT INTO `ps_hook_module`(`id_module`, `id_shop`, `id_hook`, `position`)
SELECT `id_module`, `id_shop`, `id_hook`, `position` FROM `ps_hook_module` WHERE
`id_shop` = 1
INSERT INTO `ps_hook_module`(`id_module`, `id_shop`, `id_hook`, `position`)
SELECT `id_module`, 42, `id_hook`, `position` FROM `ps_hook_module`
WHERE `id_shop` = 1
Here 42 stands for the new id_shop value you want; the literal replaces the id_shop column in the SELECT list.
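A minimal runnable sketch of the same INSERT ... SELECT, assuming Python with an in-memory SQLite database in place of MySQL. The sample rows and the function names copy_shop_rows and demo_copy are invented for illustration, and the table is created without any keys, so a real ps_hook_module table with a key that includes id_shop may behave differently if the target rows already exist.

```python
import sqlite3


def copy_shop_rows(conn, source_shop, target_shop):
    """Re-insert every row of source_shop with id_shop set to target_shop."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO ps_hook_module (id_module, id_shop, id_hook, position) "
            "SELECT id_module, ?, id_hook, position "
            "FROM ps_hook_module WHERE id_shop = ?",
            (target_shop, source_shop),
        )


def demo_copy():
    """Copy the rows of shop 1 to shop 42 and return the new rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ps_hook_module "
        "(id_module INTEGER, id_shop INTEGER, id_hook INTEGER, position INTEGER)"
    )
    # Hypothetical sample data: two rows for shop 1, one row for shop 2.
    conn.executemany(
        "INSERT INTO ps_hook_module VALUES (?, ?, ?, ?)",
        [(1, 1, 10, 1), (2, 1, 10, 2), (3, 2, 11, 1)],
    )
    copy_shop_rows(conn, 1, 42)
    return conn.execute(
        "SELECT id_module, id_shop, id_hook, position "
        "FROM ps_hook_module WHERE id_shop = 42 ORDER BY id_module"
    ).fetchall()
```

demo_copy() returns the two shop 1 rows again with id_shop 42; the shop 2 row is not copied.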
So I have this table named SAKAI_REALM_RL_FN that has 3 fields
REALM_KEY
ROLE_KEY
FUNCTION_KEY
What this statement needs to do is: if a certain combination of ROLE_KEY and FUNCTION_KEY doesn't exist for a REALM_KEY, then do an insert. There are 2 such combinations to check for each REALM_KEY.
I was already taking a look at this StackOverflow post
I also have the query I was using for the singular inserts:
INSERT INTO `sakai`.`SAKAI_REALM_RL_FN` (`REALM_KEY`, `ROLE_KEY`, `FUNCTION_KEY`) VALUES (248620, 8, 308);
Pseudo-code:
if (ROLE_KEY = 8 and FUNCTION_KEY = 308 don't exist for a REALM_KEY)
then insert ROLE_KEY = 8 and FUNCTION_KEY = 308 for that REALM_KEY
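INSERT INTO `sakai`.`SAKAI_REALM_RL_FN` (`REALM_KEY`, `ROLE_KEY`, `FUNCTION_KEY`)
-- one candidate row per realm, with the static role/function pair
SELECT DISTINCT r.`REALM_KEY`, 8, 308
FROM `sakai`.`SAKAI_REALM_RL_FN` r
-- skip realms that already have this pair
WHERE NOT EXISTS (SELECT 1
FROM `sakai`.`SAKAI_REALM_RL_FN` x
WHERE x.`REALM_KEY` = r.`REALM_KEY`
AND x.`ROLE_KEY` = 8 AND x.`FUNCTION_KEY` = 308);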
Hope that helps...
I wasn't quite sure what you wanted, but here's something that you might find useful.
Schema with few entries:
CREATE TABLE ALOHA (
REALM_KEY VARCHAR(32) NOT NULL,
ROLE_KEY VARCHAR(32) NOT NULL,
FUNCTION_KEY VARCHAR(32) NOT NULL
);
INSERT INTO ALOHA VALUES ('1', '1', '1');
INSERT INTO ALOHA VALUES ('1', '1', '2');
INSERT INTO ALOHA VALUES ('1', '2', '1');
INSERT INTO ALOHA VALUES ('1', '2', '2');
INSERT INTO ALOHA VALUES ('1', '2', '3');
INSERT INTO ALOHA VALUES ('1', '2', '4');
Try to insert 3 entries (only one gets inserted):
INSERT INTO ALOHA (REALM_KEY, ROLE_KEY, FUNCTION_KEY)
SELECT * FROM (
SELECT '1' AS REALM_KEY, '2' AS ROLE_KEY, '1' AS FUNCTION_KEY
UNION ALL
SELECT '1', '2', '3'
UNION ALL
SELECT '1', '2', '5'
) s
WHERE NOT EXISTS
(SELECT 1 FROM ALOHA a
WHERE a.ROLE_KEY = s.ROLE_KEY
AND a.REALM_KEY = s.REALM_KEY
AND a.FUNCTION_KEY = s.FUNCTION_KEY);
The RDBMS is well-equipped to handle this, if you define the correct index.
Sounds like what you need is a compound UNIQUE index across all three columns. When you perform an INSERT IGNORE, the combination will be inserted if it does not already exist.
Note that this will fail if you already have non-unique rows in your table.
ALTER TABLE SAKAI_REALM_RL_FN ADD UNIQUE KEY `idx_unique_realm_role_function` (REALM_KEY, ROLE_KEY, FUNCTION_KEY)
Then the INSERT selects all the REALM_KEY values and static values for the other 2 columns. If the values already exist, they're ignored. Otherwise they are inserted along with the REALM_KEY.
INSERT IGNORE INTO SAKAI_REALM_RL_FN (REALM_KEY, ROLE_KEY, FUNCTION_KEY)
/* SELECT within INSERT gets all REALM_KEY plus the 2 static values */
SELECT
REALM_KEY,
8,
308
FROM SAKAI_REALM_RL_FN
Here's a demo
When you have completed the INSERT IGNORE, you can drop the UNIQUE KEY since it may no longer be needed.
ALTER TABLE SAKAI_REALM_RL_FN DROP KEY `idx_unique_realm_role_function`
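The whole sequence (add the unique index, insert while ignoring duplicates, drop the index) can be tried out with the sketch below. It assumes Python with an in-memory SQLite database standing in for MySQL, so the syntax differs slightly: CREATE UNIQUE INDEX instead of ALTER TABLE ... ADD UNIQUE KEY, and INSERT OR IGNORE instead of INSERT IGNORE. The sample rows and the function names are invented for illustration.

```python
import sqlite3


def add_missing_role_function(conn, role_key, function_key):
    """Give every realm the (role_key, function_key) pair unless it has it."""
    with conn:
        # Fails if the table already contains duplicate rows.
        conn.execute(
            "CREATE UNIQUE INDEX idx_unique_realm_role_function "
            "ON SAKAI_REALM_RL_FN (REALM_KEY, ROLE_KEY, FUNCTION_KEY)"
        )
        # Rows that would violate the unique index are silently skipped.
        conn.execute(
            "INSERT OR IGNORE INTO SAKAI_REALM_RL_FN "
            "(REALM_KEY, ROLE_KEY, FUNCTION_KEY) "
            "SELECT DISTINCT REALM_KEY, ?, ? FROM SAKAI_REALM_RL_FN",
            (role_key, function_key),
        )
        conn.execute("DROP INDEX idx_unique_realm_role_function")


def demo_insert_ignore():
    """Return all rows after adding role 8 / function 308 to every realm."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SAKAI_REALM_RL_FN "
        "(REALM_KEY INTEGER, ROLE_KEY INTEGER, FUNCTION_KEY INTEGER)"
    )
    # Hypothetical data: realm 200 already has the pair (8, 308).
    conn.executemany(
        "INSERT INTO SAKAI_REALM_RL_FN VALUES (?, ?, ?)",
        [(100, 1, 10), (100, 1, 11), (200, 8, 308), (200, 1, 10), (300, 2, 20)],
    )
    add_missing_role_function(conn, 8, 308)
    return conn.execute(
        "SELECT REALM_KEY, ROLE_KEY, FUNCTION_KEY FROM SAKAI_REALM_RL_FN "
        "ORDER BY REALM_KEY, ROLE_KEY, FUNCTION_KEY"
    ).fetchall()
```

After demo_insert_ignore(), realms 100 and 300 each gain one (8, 308) row, and realm 200 still has exactly one.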
I want to make an insert into 2 tables
visits:
visit_id int | card_id int
registration:
registration_id int | type enum('in','out') | timestamp int | visit_id int
I want something like:
INSERT INTO `visits` as v ,`registration` as r
(v.`visit_id`,v.`card_id`,r.`registration_id`, r.`type`, r.`timestamp`, r.`visit_id`)
VALUES (NULL, 12131141,NULL, UNIX_TIMESTAMP(), v.`visit_id`);
I wonder if it's possible
It's not possible with one query, as INSERT can only insert data into one table in MySQL. You can either
write this as two queries and execute them as a batch
create a stored procedure that executes the two INSERT commands
You can wrap those inserts in a transaction if you need to make sure that either both rows are written or neither is.
It seems like the problem you are trying to solve is to get the auto-increment value from the "visits" row to insert into "registration". Am I right?
If so, you can just use the LAST_INSERT_ID() function like this:
INSERT INTO `visits` (`visit_id`,`card_id`)
VALUES (NULL, 12131141);
INSERT INTO `registration` (`registration_id`, `type`, `timestamp`, `visit_id`)
VALUES (NULL, 'in', UNIX_TIMESTAMP(), LAST_INSERT_ID());
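Here is a small sketch of the same two-step pattern wrapped in a transaction, assuming Python with an in-memory SQLite database as a stand-in for MySQL. In SQLite the generated key is read from cursor.lastrowid, which plays the role of LAST_INSERT_ID() here. The schema is a simplified guess at the two tables from the question, and the function names are made up for the example.

```python
import sqlite3
import time


def register_visit(conn, card_id, kind="in"):
    """Insert a visit and its registration row; return the new visit_id."""
    with conn:  # both inserts commit together, or neither does
        cur = conn.execute(
            "INSERT INTO visits (card_id) VALUES (?)", (card_id,)
        )
        visit_id = cur.lastrowid  # auto-generated key of the visits row
        conn.execute(
            "INSERT INTO registration (type, timestamp, visit_id) "
            "VALUES (?, ?, ?)",
            (kind, int(time.time()), visit_id),
        )
    return visit_id


def demo_register():
    """Create both tables, register one visit, return (visit_id, rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits "
        "(visit_id INTEGER PRIMARY KEY AUTOINCREMENT, card_id INTEGER)"
    )
    conn.execute(
        "CREATE TABLE registration ("
        "registration_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "type TEXT CHECK (type IN ('in', 'out')), "
        "timestamp INTEGER, "
        "visit_id INTEGER)"
    )
    visit_id = register_visit(conn, 12131141)
    rows = conn.execute(
        "SELECT r.type, r.visit_id, v.card_id "
        "FROM registration r JOIN visits v ON v.visit_id = r.visit_id"
    ).fetchall()
    return visit_id, rows
```

demo_register() returns visit id 1 together with a single joined row, ('in', 1, 12131141), showing that the registration row points at the visit that was just created.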
You can always do something like this
INSERT IGNORE INTO `table2` VALUES ((select id from table1 where col="value"), 3, 4, 5)
INSERT INTO designation as de,
department as da,
profile as pr
(designation_name,
depart_id,
id,
username,
department,
designation)
select de.designation_name,
de.depart_id,da.id,
pr.username,
pr.department,
pr.designation
from
designation,
department,
profile
de.designation_name='project manager' AND de.id='1' OR
de.depart_id='2' AND de.id='2' OR
da.id='2' OR
pr.username='kapil.purohit' AND pr.id='9' AND pr.status='1' OR
pr.department='1' AND pr.id='9' OR
pr.designation='3' AND pr.id='9' AND pr.status='1'
WHERE
de.id = da.id AND
da.id = pr.id AND
de.id = pr.id AND
ORDER BY de.id DESC