MySQL for removing back to back duplicates - mysql

I have a table where there is some duplicate file access information where a duplicate file access is defined as the same user accessing the same file back to back. In other words, if the user accesses file A,B,A in that order, it's NOT considered a duplicate. So basically, per user I want to make sure that every subsequent access is for a file different from the last one.
UserID FileID
1 1
2 1
1 1 <- Remove
2 1 <- Remove
2 2
1 2
2 2 <- Remove
1 1
1 2
Anyone know how to approach something like this in mysql? Ideally, I would like to use it without the use of a function but I'm open to a function if it's the only option.
The table has the following columns: ID (primary key), userID, fileID, accessTime

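If you wrote a stored procedure, it would look something like this. The rows to remove are collected in a second temp table and deleted after the cursor is closed, as the DELETE statement may fail while the cursor is open.
DELIMITER $$
CREATE PROCEDURE `proc_CURSOR` ()
BEGIN
    -- DECLAREs must come first, in this order: variables, cursors, handlers.
    DECLARE done INT DEFAULT 0;
    DECLARE curUser, curFile, curPk, lastFile INT;
    DECLARE cur1 CURSOR FOR SELECT userId, fileId, pkId FROM table1 ORDER BY time_stamp;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    -- Last file seen per user, and the primary keys of the rows to delete.
    CREATE TEMPORARY TABLE lastUserAccess (userId INT PRIMARY KEY, fileId INT);
    CREATE TEMPORARY TABLE rowsToDelete (pkId INT PRIMARY KEY);
    OPEN cur1;
    read_loop: LOOP
        FETCH cur1 INTO curUser, curFile, curPk;
        IF done THEN
            LEAVE read_loop;
        END IF;
        -- A scalar subquery yields NULL for a new user without firing the NOT FOUND handler.
        SET lastFile = (SELECT fileId FROM lastUserAccess WHERE userId = curUser);
        IF lastFile IS NULL THEN
            INSERT INTO lastUserAccess (userId, fileId) VALUES (curUser, curFile);
        ELSEIF lastFile = curFile THEN
            INSERT INTO rowsToDelete (pkId) VALUES (curPk);
        ELSE
            UPDATE lastUserAccess SET fileId = curFile WHERE userId = curUser;
        END IF;
    END LOOP;
    CLOSE cur1;
    DELETE FROM table1 WHERE pkId IN (SELECT pkId FROM rowsToDelete);
    DROP TEMPORARY TABLE lastUserAccess;
    DROP TEMPORARY TABLE rowsToDelete;
END$$
DELIMITER ;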

For MySQL:
DELETE a FROM tbl a, tbl b WHERE a.Id > b.Id AND
a.UserID = b.UserID AND a.FileID = b.FileID;
Check this fiddle: http://sqlfiddle.com/#!2/aece0a/1
The following won't work in MySQL; it is for SQL Server:
DELETE FROM tbl WHERE ID NOT IN (SELECT MIN(ID)
FROM tbl GROUP BY userID, fileID)
Hope this works for you.
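Note that the self-join delete above removes every later repeat of a user/file pair, so the A,B,A case from the question would lose its second A. On MySQL 8.0 or later, a window function can compare each row with the same user's previous access instead. A minimal sketch, assuming the table is called file_access (a hypothetical name) and has the columns listed in the question:

```sql
-- Assumption: MySQL 8.0+; file_access is a placeholder table name.
DELETE fa
FROM file_access AS fa
JOIN (
    -- For every row, look up the file the same user accessed just before it.
    SELECT ID,
           fileID,
           LAG(fileID) OVER (PARTITION BY userID ORDER BY accessTime, ID) AS prevFileID
    FROM file_access
) AS marked ON marked.ID = fa.ID
-- prevFileID is NULL for a user's first access, so that row is never deleted.
WHERE marked.fileID = marked.prevFileID;
```

ID appears in the ORDER BY only as a tie-breaker for identical access times. Running the inner SELECT on its own first shows which rows would be removed.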

Related

Writing stored procedure which flags duplicate values in a comma separated field in MySQL

I have a database table like this sample:
ID THINGS HAS_DUPLICATES
1 AAA, BBB, AAA NULL
2 CCC, DDD NULL
I am trying to write a stored procedure to flag duplicate values in THINGS field.
After calling the procedure the table will become like this:
ID THINGS HAS_DUPLICATES
1 AAA, BBB, AAA YES
2 CCC, DDD NO
Please be informed that I am trying to resolve it using only SQL and without normalizing my database. I am also aware of other approaches like writing PHP code.
Schema:
DROP TABLE IF EXISTS evilThings; -- orig table with dupes
CREATE TABLE evilThings
( ID INT AUTO_INCREMENT PRIMARY KEY,
THINGS TEXT NOT NULL,
HAS_DUPLICATES INT NULL
);
INSERT evilThings(ID,THINGS) VALUES
(1,"'AAA, BBB, AAA'"),
(2,"'CCC, DDD'");
CREATE TABLE notEvilAssocTable
( ai INT AUTO_INCREMENT PRIMARY KEY, -- no shuffle on inserts
ID INT NOT NULL,
THING VARCHAR(100) NOT NULL,
UNIQUE KEY `unqK_id_thing` (ID,THING) -- no dupes, this is honorable
);
Stored Proc:
DROP PROCEDURE IF EXISTS splitEm;
DELIMITER $$
CREATE PROCEDURE splitEm()
BEGIN
DECLARE lv_ID,pos1,pos2,comma_pos INT;
DECLARE lv_THINGS TEXT;
DECLARE particle VARCHAR(100);
DECLARE strs_done INT DEFAULT FALSE; -- string search done
DECLARE done INT DEFAULT FALSE; -- cursor done
DECLARE cur111 CURSOR FOR SELECT ID,THINGS FROM evilThings ORDER BY ID;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
-- Please note in the above, CURSOR stuff MUST come LAST else "Error 1337: Variable or condition decl aft curs"
-- -------------------------------------------------------------------------------------------------------------------
TRUNCATE TABLE notEvilAssocTable;
OPEN cur111;
read_loop: LOOP
SET strs_done=FALSE;
FETCH cur111 INTO lv_ID,lv_THINGS;
IF done THEN
LEAVE read_loop;
END IF;
SET pos1=1,comma_pos=0;
WHILE !strs_done DO
SET pos2=LOCATE(',', lv_THINGS, comma_pos+1);
IF pos2=0 THEN
SET pos2=LOCATE("'", lv_THINGS, comma_pos+1);
IF pos2!=0 THEN
SET particle=SUBSTRING(lv_THINGS,comma_pos+1,pos2-comma_pos-1);
SET particle=REPLACE(particle,"'","");
SET particle=TRIM(particle);
INSERT IGNORE notEvilAssocTable (ID,THING) VALUES (lv_ID,particle);
END IF;
SET strs_done=1;
ELSE
SET particle=SUBSTRING(lv_THINGS,comma_pos+1,pos2-comma_pos-1);
SET particle=REPLACE(particle,"'","");
SET particle=TRIM(particle);
INSERT IGNORE notEvilAssocTable (ID,THING) VALUES (lv_ID,particle);
SET comma_pos=pos2;
END IF;
END WHILE;
END LOOP;
CLOSE cur111; -- close the cursor
END$$
DELIMITER ;
Test:
call splitEm();
See results of split:
select * from notEvilAssocTable;
Note the gap at ai = 3 in the results. It is the InnoDB auto-increment gap, an expected side effect: the INSERT IGNORE that was rejected as a duplicate still consumed an AUTO_INCREMENT value. It is not a problem. The unique key is doing its job of keeping duplicates out of the new table of split values.
If you did not mean to have the single quote at the beginning and end of the string in the db, then change the routine accordingly.
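The procedure only fills notEvilAssocTable; it does not set HAS_DUPLICATES. One possible follow-up, sketched under the assumption that every comma in THINGS separates two non-empty values, compares the number of values in the string with the number of distinct values that survived the INSERT IGNORE:

```sql
-- Run after CALL splitEm();
UPDATE evilThings AS e
JOIN (
    SELECT ID, COUNT(*) AS uniqueThings
    FROM notEvilAssocTable
    GROUP BY ID
) AS n ON n.ID = e.ID
SET e.HAS_DUPLICATES =
    -- number of commas + 1 = number of values in the string
    (CHAR_LENGTH(e.THINGS) - CHAR_LENGTH(REPLACE(e.THINGS, ',', '')) + 1) > n.uniqueThings;
```

HAS_DUPLICATES is an INT column in the schema above, so the comparison is stored as 1 or 0. For the sample rows this gives 1 for ID 1 and 0 for ID 2.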
Here is the answer to my question, assuming the data in THINGS field are separated by a bar '|'. Our original table will be myTABLE:
ID THINGS THINGSCount THINGSCountUnique HAS_DUPLICATES
1 AAA|BBB|AAA NULL NULL NULL
2 CCC|DDD NULL NULL NULL
Step 1. Check the maximum number of values separated by a bar '|' in THINGS field:
SELECT MAX(ROUND((CHAR_LENGTH(THINGS) - CHAR_LENGTH(REPLACE(THINGS,'|',''))) / CHAR_LENGTH('|')) + 1) FROM myTABLE;
Step 2. Assuming the answer from step 1 was 7, use the following SQL to split the data in the THINGS field into rows. There are many other ways to do the split, which you can find by searching:
CREATE TABLE myTABLE_temp
SELECT ID, SUBSTRING_INDEX(SUBSTRING_INDEX(myTABLE.THINGS, '|', n.n), '|', -1) THINGS
FROM myTABLE JOIN
( SELECT n FROM
( SELECT 1 AS N UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 ) a ) n
ON CHAR_LENGTH(THINGS) - CHAR_LENGTH(REPLACE(THINGS, '|', '')) >= n - 1
ORDER BY ID;
Our myTABLE_temp table will be something like:
ID THINGS
1 AAA
1 BBB
1 AAA
2 CCC
2 DDD
Step 3. Here we create two new tables to hold COUNT(THINGS) and COUNT(DISTINCT THINGS) as following:
# THINGSCount
CREATE TABLE myTABLE_temp_2
SELECT ID, COUNT(THINGS) AS THINGSCount FROM myTABLE_temp GROUP BY ID;
# Remember to ADD INDEX to ID field
UPDATE myTABLE A INNER JOIN myTABLE_temp_2 B ON(A.ID = B.ID) SET A.THINGSCount = B.THINGSCount;
# THINGSCountUnique
CREATE TABLE myTABLE_temp_3
SELECT ID, COUNT(DISTINCT THINGS) AS THINGSCountUnique FROM myTABLE_temp GROUP BY ID;
# Remember to ADD INDEX to ID field
UPDATE myTABLE A INNER JOIN myTABLE_temp_3 B ON(A.ID = B.ID) SET A.THINGSCountUnique = B.THINGSCountUnique;
Final Step: Flag duplicate values:
UPDATE myTABLE SET HAS_DUPLICATES = IF(THINGSCount>THINGSCountUnique, 'DUPLICATES', 'NO');
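If the two count columns are not needed for anything else, step 3 and the final step can probably be folded into one statement. A sketch, assuming myTABLE_temp from step 2 exists and HAS_DUPLICATES is a string column:

```sql
UPDATE myTABLE AS t
JOIN (
    -- One row per ID: how many values it has, and how many of them are distinct.
    SELECT ID,
           COUNT(*) AS total,
           COUNT(DISTINCT THINGS) AS uniq
    FROM myTABLE_temp
    GROUP BY ID
) AS c ON c.ID = t.ID
SET t.HAS_DUPLICATES = IF(c.total > c.uniq, 'YES', 'NO');
```

This skips myTABLE_temp_2 and myTABLE_temp_3 entirely and writes the YES/NO values shown in the question.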

change one table data when single column all data of another table is equal to some value in mysql

I have created 2 tables named snag_list and defect_list. I need to change the status field of snag_list to 2 when the status of every row in defect_list is 2.
Not sure if this helps, but try creating a trigger on defect_list that checks the distinct count of the status column.
If the count is one and the value is 2, update snag_list. An example would look like this:
DELIMITER $$
CREATE TRIGGER checkstatus
AFTER UPDATE ON defect_list
FOR EACH ROW
BEGIN
-- All DECLAREs must come before any other statement in the block.
DECLARE cnt INT;
DECLARE st INT;
SELECT COUNT(DISTINCT status) FROM defect_list INTO cnt;
SELECT DISTINCT status FROM defect_list LIMIT 1 INTO st;
IF cnt = 1 AND st = 2 THEN
UPDATE snag_list SET status = 2;
END IF;
END$$
DELIMITER ;
Your question is very vague but I guess this is what you may be looking for.
DECLARE
count_rec VARCHAR2(10);
data_rec VARCHAR2(10);
BEGIN
SELECT COUNT(DISTINCT status) INTO count_rec FROM defect_list;
SELECT DISTINCT status INTO data_rec FROM defect_list;
IF (count_rec = '1' AND data_rec = '2') THEN
UPDATE snag_list SET status = '2';
END IF;
END;
edit -> You can change the datatype of the 2 variables as required. Go with VARCHAR2 if you're unsure whether the data would be numeric.
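Both answers look at defect_list as a whole. If each defect belongs to one snag, the check is usually wanted per snag. A sketch, assuming a hypothetical defect_list.snag_id column that references snag_list.id (neither column is named in the question):

```sql
-- Assumption: defect_list.snag_id and snag_list.id are placeholder column names.
UPDATE snag_list AS s
SET s.status = 2
-- the snag has at least one defect ...
WHERE EXISTS (
          SELECT 1 FROM defect_list AS d
          WHERE d.snag_id = s.id)
-- ... and none of its defects has a status other than 2
  AND NOT EXISTS (
          SELECT 1 FROM defect_list AS d
          WHERE d.snag_id = s.id AND d.status <> 2);
```

A defect whose status is NULL is not counted as "other than 2" here; add an explicit IS NULL check if that matters for your data.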

MySQL Count products from all subcategories

I have two tables; categories and products. For each category i would like to count how many products there are in all of its subcategories. I already have counted how many are in each category. Example tables are:
Categories:
ID ParentID ProductCount SubCategoryProducts
1 NULL 0
2 1 2
3 2 1
Products:
ProductID CategoryID
123 2
124 2
125 3
So i would like my function to make:
ID ParentID ProductCount SubCategoryProducts
1 NULL 0 3
2 1 2 1
3 2 1 0
It simply needs to be as a select query, no need to update the database.
Any ideas?
EDIT: SQL FIddle: http://sqlfiddle.com/#!2/1941a/4/0
If it were me I'd create a STORED PROCEDURE. The other option is to loop with PHP through the first query, then for each ID run another query - but this kind of logic can slow down your page drastically.
Here's a nice tutorial on stored procedures: http://net.tutsplus.com/tutorials/an-introduction-to-stored-procedures/
Basically you run the same loops I mentioned above for PHP, but inside the database, where they run much faster. The procedure is stored in the database and can be called like a function. The result is the same as a query.
As requested, here's a sample (it actually uses two procedures). In my case, "ags_orgs" plays the role of your categories table and has a parentOrgID column. "getChildOrgs" calls itself recursively, since I had no idea how many levels down I had to go. This was written for MSSQL, so there are probably differences with MySQL. Unfortunately it doesn't count rows; it returns the data. I highly recommend following a tutorial or two to get a better grip on how it works:
USE [dbname]
GO
/****** Object: StoredProcedure [dbo].[getChildOrgs] Script Date: 09/26/2012 15:30:06 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[getChildOrgs]
@myParentID int,
@isActive tinyint = NULL
AS
BEGIN
SET NOCOUNT ON
DECLARE @orgID int, @orgName varchar(255), @level int
DECLARE cur CURSOR LOCAL FOR SELECT orgID FROM dbo.ags_orgs WHERE parentOrgID = @myParentID AND isActive = ISNULL(@isActive, isActive) ORDER BY orderNum, orgName
OPEN cur
fetch next from cur into @orgID
WHILE @@fetch_status = 0
BEGIN
INSERT INTO #temp_childOrgs SELECT orgID,orgName,description,parentOrgID,adminID,isActive,@@NESTLEVEL-1 AS level FROM dbo.ags_orgs WHERE orgID = @orgID
EXEC getChildOrgs @orgID, @isActive
-- get next result
fetch next from cur into @orgID
END
CLOSE cur
DEALLOCATE cur
END
GO
Which is called by this proc:
USE [dbname]
GO
/****** Object: StoredProcedure [dbo].[execGetChildOrgs] Script Date: 09/26/2012 15:29:34 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[execGetChildOrgs]
@parentID int,
@isActive tinyint = NULL,
@showParent tinyint = NULL
AS
BEGIN
CREATE TABLE #temp_childOrgs
(
orgID int,
orgName varchar(255),
description text,
parentOrgID int,
adminID int,
isActive tinyint,
level int
)
-- if this isn't AGS top level (0), make the first record reflect the requested organization
IF @parentID != 0 AND @showParent = 1
BEGIN
INSERT INTO #temp_childOrgs SELECT orgID,orgName,description,parentOrgID,adminID,isActive,0 AS level FROM dbo.ags_orgs WHERE orgID = @parentID
END
exec getChildOrgs @parentID, @isActive
SELECT * FROM #temp_childOrgs
DROP TABLE #temp_childOrgs
END
GO
Here is my procedure for counting products in all subcategories
DELIMITER $$
CREATE PROCEDURE CountItemsInCategories(IN tmpTable INT, IN parentId INT, IN updateId INT)
BEGIN
DECLARE itemId INT DEFAULT NULL;
DECLARE countItems INT DEFAULT NULL;
DECLARE done INT DEFAULT FALSE;
DECLARE recCount INT DEFAULT NULL;
DECLARE
bufItemCategory CURSOR FOR
SELECT
itemCategory.id AS id,
COUNT(CASE WHEN item.isVisible = 1 then 1 ELSE NULL END) items
FROM
itemCategory
LEFT JOIN item ON
item.categoryId = itemCategory.id
WHERE
itemCategory.isVisible = 1 AND itemCategory.categoryParentId = parentId
GROUP BY
itemCategory.id
ORDER BY
itemCategory.name;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
SET max_sp_recursion_depth = 255; -- 255 is the largest value MySQL accepts for this variable
IF tmpTable = 1 THEN
DROP TEMPORARY TABLE IF EXISTS tblResults;
CREATE TEMPORARY TABLE IF NOT EXISTS tblResults(
id INT NOT NULL PRIMARY KEY,
items INT
);
END IF;
OPEN bufItemCategory;
Reading_bufItemCategory: LOOP
FETCH FROM bufItemCategory INTO itemId, countItems;
IF done THEN
LEAVE Reading_bufItemCategory;
END IF;
IF tmpTable = 1 THEN
INSERT INTO tblResults VALUES(itemId, countItems);
ELSE
UPDATE tblResults SET items = items + countItems WHERE id = updateId;
END IF;
SET recCount = (SELECT count(*) FROM itemCategory WHERE itemCategory.categoryParentId = itemId AND itemCategory.isVisible = 1);
IF recCount > 0 THEN
CALL CountItemsInCategories(0, itemId, CASE WHEN updateId = 0 then itemId ELSE updateId END);
END IF;
END LOOP Reading_bufItemCategory;
CLOSE bufItemCategory;
IF tmpTable = 1 THEN
SELECT * FROM tblResults WHERE items > 0;
DROP TEMPORARY TABLE IF EXISTS tblResults;
END IF;
END $$
DELIMITER ;
To call the procedure, run:
CALL CountItemsInCategories(firstLoop, parentId, updateId);
Where the parameters are:
firstLoop - the tmpTable parameter, always 1 for the first call
parentId - parent of the subcategories
updateId - id of the row to update, always 0 for the first call
For example:
CALL CountItemsInCategories(1, 1, 0);
I hope this example will be useful to someone.
This assumes you have a product table named prods:
prod_id|categ_id
and a category table named categ:
categ_id|parent_categ_id
As you seem to be using an Adjacency List structure, where the foreign key column parent_categ_id references the categ_id column of the same table,
the following query should work:
select c1.categ_id,c1.parent_categ_id,count(prods.prod_id)
as product_count from categ c1
join prods on prods.categ_id=c1.categ_id or prods.categ_id
in( with recursive tree(id,parent_id)as
(select categ_id,parent_categ_id from categ
where categ_id=c1.categ_id
union all
select cat.categ_id,cat.parent_categ_id from categ cat
join tree on tree.id=cat.parent_categ_id) select id from tree)
group by c1.categ_id,c1.parent_categ_id
order by product_count
You can do this in one statement if you have a limit on the depth of the hierarchy. You said you only have 4 levels in total.
SELECT SUM(ProductCount)
FROM (
SELECT c0.ID, c0.ProductCount
FROM Categories AS c0
WHERE c0.ID = 1
UNION ALL
SELECT c1.ID, c1.ProductCount
FROM Categories AS c0
JOIN Categories AS c1 ON c0.ID = c1.ParentID
WHERE c0.ID = 1
UNION ALL
SELECT c2.ID, c2.ProductCount
FROM Categories AS c0
JOIN Categories AS c1 ON c0.ID = c1.ParentID
JOIN Categories AS c2 ON c1.ID = c2.ParentID
WHERE c0.ID = 1
UNION ALL
SELECT c3.ID, c3.ProductCount
FROM Categories AS c0
JOIN Categories AS c1 ON c0.ID = c1.ParentID
JOIN Categories AS c2 ON c1.ID = c2.ParentID
JOIN Categories AS c3 ON c2.ID = c3.ParentID
WHERE c0.ID = 1
) AS _hier;
That'll work for this query if you store the hierarchy in the way you're doing, which is called Adjacency List. Basically, the ParentID is the way each node records its position in the hierarchy.
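On MySQL 8.0 or later, a recursive CTE removes the fixed-depth limit. A sketch using the Categories table from the question: it pairs every category with itself and all of its descendants, then sums ProductCount over the descendants only.

```sql
-- Assumption: MySQL 8.0+ (recursive CTEs are not available in earlier versions).
WITH RECURSIVE tree (root_id, id) AS (
    -- Anchor: every category is the root of its own subtree.
    SELECT ID, ID FROM Categories
    UNION ALL
    -- Step: add the children of every node already in a subtree.
    SELECT t.root_id, c.ID
    FROM tree AS t
    JOIN Categories AS c ON c.ParentID = t.id
)
SELECT c.ID,
       c.ParentID,
       c.ProductCount,
       -- Skip the root itself so only subcategory products are counted.
       COALESCE(SUM(CASE WHEN t.id <> t.root_id THEN d.ProductCount END), 0)
           AS SubCategoryProducts
FROM Categories AS c
JOIN tree AS t ON t.root_id = c.ID
JOIN Categories AS d ON d.ID = t.id
GROUP BY c.ID, c.ParentID, c.ProductCount
ORDER BY c.ID;
```

For the three sample categories this yields SubCategoryProducts of 3, 1 and 0, matching the table the question asks for.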
There are a few other ways of storing hierarchies that allow for easier querying of whole trees or subtrees. The best data organization depends on which queries you want to run.
Here are some more resources:
Models for Hierarchical Data with SQL and PHP (user @RaymondNijland linked to it in a comment)
I gave that presentation as a webinar (free to view the recording, but requires registration).
My book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
What is the most efficient/elegant way to parse a flat table into a tree?

SQL Query to find unique values across multiple groups

So I have a table in the format of:
Name, url, xpath, value
The problem is that names A, B, and F share a url U1 and xpath X1 (I don't care about value).
Names C, D, and E either do not have url U1 or do not have xpath X1.
Names B, C, D, and E may share U2 and X2.
I am trying to find the best way to find which url and xpath combinations exist for all names (A-F).
I didn't know whether I should create a temp table with all the unique url and xpath combinations, then go through all the names and, if every name has that url and xpath, add it to a second temp table. I would then return all the results from that final temp table.
Thanks!
Here is some example data:
Name, URL, Xpath, value
John, /MyAttributes.xml, /attribute/arms, 2
John, /MyAttributes.xml, /attributes/legs, 2
John, /MyQualities.xml, /qualities/race, human
Derek, /MyAttributes.xml, /attributes/legs, 2
Derek, /MyQualities.xml, /qualities/race, non-human
The table could grow to hundreds of names, and the information I am trying to gather is that "/MyAttributes.xml, /attributes/legs" exists for both John and Derek. As the db grows, I still need to be able to see which url and xpath combinations exist across all names.
Hopefully that helps providing more data.
Try this:
select
url,
xpath
from table
group by
url,
xpath
having
max(case when name='A' then
1
else
0
end) +
max(case when name='B' then
2
else
0
end) +
max(case when name='C' then
4
else
0
end) +
max(case when name='D' then
8
else
0
end) +
max(case when name='E' then
16
else
0
end) +
max(case when name='F' then
32
else
0
end) = 63;
Here's a variation on the answer submitted by Mark Bannister:
SELECT t.url
FROM
myTable t
CROSS JOIN (SELECT COUNT(DISTINCT name) AS cnt FROM myTable) x
GROUP BY t.url
HAVING COUNT(DISTINCT t.name) = MAX(x.cnt);
To find all urls that exist for all names, try:
select url
from myTable
group by url
having count(distinct name) = (select count(distinct name) from myTable)
To find xpaths that exist for all names, swap xpath and url in the above query.
So this is what I ended up doing. Thanks to everyone who helped. If you know a good way to optimize this, that would be AWESOME.
To summarize, I did end up creating a temp table. I go through all the unique url + xpath combinations and check whether each one exists for all the unique names. If it does, I insert it into my temp table, which I dump out at the end.
BEGIN
DECLARE bDone INT;
DECLARE var1 VARCHAR(845);
DECLARE var2 VARCHAR(45);
DECLARE var3 VARCHAR(800);
DECLARE curs CURSOR FOR SELECT DISTINCT CONCAT(url, xpath), url, xpath FROM myTable;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET bDone = 1;
DROP TEMPORARY TABLE IF EXISTS tblResults;
CREATE TEMPORARY TABLE IF NOT EXISTS tblResults (
url VARCHAR(45),
xpath VARCHAR(800)
);
OPEN curs;
SET bDone = 0;
REPEAT
FETCH curs INTO var1, var2, var3;
IF bDone = 0 AND -- skip the extra pass after the last FETCH finds no row
(
SELECT
COUNT(DISTINCT name)
FROM myTable as l
WHERE
l.url = var2 AND
l.xpath = var3
) = (
SELECT
COUNT(DISTINCT name)
FROM myTable
) THEN
INSERT INTO tblResults VALUES (var2, var3);
END IF;
UNTIL bDone END REPEAT;
CLOSE curs;
SELECT * FROM tblResults;
END
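The cursor can probably be replaced by a single grouped query, the same idea as the url-only answers above but grouping on both columns. A sketch, assuming the myTable name used in the procedure:

```sql
-- Keep each (url, xpath) pair that appears for every distinct name in the table.
SELECT url, xpath
FROM myTable
GROUP BY url, xpath
HAVING COUNT(DISTINCT name) = (SELECT COUNT(DISTINCT name) FROM myTable);
```

Because the total number of names is computed by the subquery, nothing has to change when new names are added.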

Create insert trigger to auto increment int field of composite PK (String, int), restart numbering at 1 for new Strings

I've read that this can be done without issue using MyISAM, as it is the default behavior, but I'm using InnoDB, so I need a trigger for it.
The two PK fields are batch and lineItem. If a record is deleted, I want the numbering to continue from the largest integer for that batch, not fill in the holes.
This is to set up a testing environment for a legacy system. So the schema is the way it is, I thought I'd mention that to avoid any discussion about whether it is good or not.
Edit: I want something like the following insert statement as a trigger
INSERT INTO payroll(`batch`,`lineItem`)
(select 'T105',t1.lineItem + 1 from payroll as t1 where batch = 'T105' order by lineItem desc limit 1);
But where 'T105' (the batch id) is hard coded I want the trigger to pick that up from the insert.
So I want to be able to say something like:
INSERT INTO payroll(`batch`) VALUES ('T001'),('T001'),('T001'),('T002'),('T002'),('T002');
and I would expect to see in the table:
batch lineItem
T001 1
T001 2
T001 3
T002 1
T002 2
T002 3
Getting further:
In trying to implement this I've come up with:
DELIMITER $$
CREATE TRIGGER `co05_test`.`ins_lineItem`
BEFORE INSERT ON `co05_test`.`my_table`
FOR EACH ROW
BEGIN
select lineItem + 1 into @newLineItem from my_table where batch = NEW.batch order by lineItem desc limit 1;
set NEW.lineItem = @newLineItem;
END$$
However when I try...
INSERT INTO `co05_test`.`my_table`(`batch`)VALUES('T001');
I get this error: Column 'lineItem' cannot be null
lineItem is defined as not nullable, but I thought the trigger would set the value!
Solution which I used:
-- Trigger DDL Statements
DELIMITER $$
USE `co05_test`$$
CREATE TRIGGER `co05_test`.`ins_lineItem`
BEFORE INSERT ON `co05_test`.`my_table`
FOR EACH ROW
BEGIN
select count(*) into @batchCount from my_table where batch = NEW.batch;
select lineItem + 1 into @newLineItem from my_table where batch = NEW.batch order by lineItem desc limit 1;
if @batchCount > 0 then
set NEW.lineItem = @newLineItem;
else
set NEW.lineItem = 1;
end if;
END;
$$
Have you tried declaring the variable instead?
DELIMITER $$
CREATE TRIGGER `co05_test`.`ins_lineItem`
BEFORE INSERT ON `co05_test`.`my_table`
FOR EACH ROW
BEGIN
DECLARE newLineItem INT;
SELECT
lineItem + 1 into newLineItem
FROM my_table
WHERE batch = NEW.batch
ORDER BY lineItem DESC
LIMIT 1;
SET NEW.lineItem = newLineItem;
END$$
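Declaring the variable alone still leaves it NULL for the first row of a new batch, because the SELECT finds nothing. A shorter variant, sketched with the same table and trigger names, takes the current maximum and falls back to 0:

```sql
DELIMITER $$
-- Replaces any earlier version of the trigger with the same name.
DROP TRIGGER IF EXISTS `co05_test`.`ins_lineItem`$$
CREATE TRIGGER `co05_test`.`ins_lineItem`
BEFORE INSERT ON `co05_test`.`my_table`
FOR EACH ROW
BEGIN
    -- MAX() over zero rows is NULL, so COALESCE makes a new batch start at 1.
    SET NEW.lineItem = (
        SELECT COALESCE(MAX(lineItem), 0) + 1
        FROM my_table
        WHERE batch = NEW.batch
    );
END$$
DELIMITER ;
```

Because it uses MAX, deleted rows in the middle of a batch are not refilled. Two sessions inserting into the same batch at the same moment could still compute the same number, in which case the primary key would reject one of the inserts.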