SQL Query to find unique values across multiple groups - mysql

So I have a table in the format of:
Name, url, xpath, value
The problem is that Names A, B, and F share the same url U1 and xpath X1 (I don't care about value).
Names C, D, and E either do not have url U1 or do not have xpath X1.
Names B, C, D, and E may share U2 and X2.
I am trying to find the best way to find the url and xpath combinations that exist for all Names (A-F).
I wasn't sure whether I should create a temp table with all the unique url and xpath combinations, then go through all the names and, if every name has that url and xpath, add the combination to a second temp table, and finally return everything from that second temp table.
Thanks!
Here is some example data:
Name, URL, Xpath, value
John, /MyAttributes.xml, /attributes/arms, 2
John, /MyAttributes.xml, /attributes/legs, 2
John, /MyQualities.xml, /qualities/race, human
Derek, /MyAttributes.xml, /attributes/legs, 2
Derek, /MyQualities.xml, /qualities/race, non-human
The table could grow to hundreds of names. The information I am trying to gather is that "/MyAttributes.xml, /attributes/legs" exists for both John and Derek, and as the db grows I still need to be able to see which url and xpath combinations exist across all names.
Hopefully the extra data helps.

Try this:
select url, xpath
from myTable
group by url, xpath
having
    max(case when name = 'A' then 1 else 0 end) +
    max(case when name = 'B' then 2 else 0 end) +
    max(case when name = 'C' then 4 else 0 end) +
    max(case when name = 'D' then 8 else 0 end) +
    max(case when name = 'E' then 16 else 0 end) +
    max(case when name = 'F' then 32 else 0 end)
    = 63;  -- 63 = 1 + 2 + 4 + 8 + 16 + 32, i.e. all six names are present

Here's a variation on the answer submitted by Mark Bannister:
SELECT t.url
FROM
myTable t
CROSS JOIN (SELECT COUNT(DISTINCT name) AS cnt FROM myTable) x
GROUP BY t.url
HAVING COUNT(DISTINCT t.name) = MAX(x.cnt);

To find all urls that exist for all names, try:
select url
from myTable
group by url
having count(distinct name) = (select count(distinct name) from myTable);
To find xpaths that exist for all names, swap xpath and url in the above query. To require the url and xpath together, select and group by both columns.

So this is what I ended up doing. Thanks to everyone who helped. If you know a good way to optimize this, that would be AWESOME.
To summarize, I did end up creating a temp table. I go through all the unique url + xpath combinations and check whether each one exists for all the unique names. If it does, I insert it into my temp table, which I dump out at the end.
BEGIN
DECLARE bDone INT;
DECLARE var1 VARCHAR(845);
DECLARE var2 VARCHAR(45);
DECLARE var3 VARCHAR(800);
DECLARE curs CURSOR FOR SELECT DISTINCT CONCAT(url, xpath), url, xpath FROM myTable;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET bDone = 1;
DROP TEMPORARY TABLE IF EXISTS tblResults;
CREATE TEMPORARY TABLE IF NOT EXISTS tblResults (
url VARCHAR(45),
xpath VARCHAR(800)
);
OPEN curs;
SET bDone = 0;
REPEAT
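FETCH curs INTO var1, var2, var3;
-- Skip the check once the cursor is exhausted; otherwise the last row would be tested (and possibly inserted) twice.
IF bDone = 0 AND
(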
SELECT
COUNT(DISTINCT name)
FROM myTable as l
WHERE
l.url = var2 AND
l.xpath = var3
) = (
SELECT
COUNT(DISTINCT name)
FROM myTable
) THEN
INSERT INTO tblResults VALUES (var2, var3);
END IF;
UNTIL bDone END REPEAT;
CLOSE curs;
SELECT * FROM tblResults;
END
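On the optimization request: the cursor loop can probably be replaced by a single set-based query that groups by both url and xpath, which is the count-based answer above extended to the pair. Below is a minimal sketch that checks the idea against the sample data. It runs in an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, and it assumes the xpath spelling in the sample is normalized to /attributes/...; the SELECT itself uses only syntax that MySQL also accepts.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (name TEXT, url TEXT, xpath TEXT, value TEXT);
INSERT INTO myTable VALUES
    ('John',  '/MyAttributes.xml', '/attributes/arms', '2'),
    ('John',  '/MyAttributes.xml', '/attributes/legs', '2'),
    ('John',  '/MyQualities.xml',  '/qualities/race',  'human'),
    ('Derek', '/MyAttributes.xml', '/attributes/legs', '2'),
    ('Derek', '/MyQualities.xml',  '/qualities/race',  'non-human');
""")

# A (url, xpath) pair qualifies when the number of distinct names that have it
# equals the number of distinct names in the whole table.
shared = conn.execute("""
    SELECT url, xpath
    FROM myTable
    GROUP BY url, xpath
    HAVING COUNT(DISTINCT name) = (SELECT COUNT(DISTINCT name) FROM myTable)
    ORDER BY url, xpath
""").fetchall()

print(shared)
```

Because the total name count is computed by a subquery, nothing needs to change as more names are added. An index on (url, xpath, name) would be the first thing to try if the table gets large.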

Related

SQL: GROUP BY Clause for Comma Separated Values

Can anyone help me check for duplicate values among multiple comma-separated values? I have a customer table in which one can insert multiple comma-separated contact numbers, and I want to find duplicates based on the last five digits. For reference, see the attached screenshot. The required output is:
contact_no. -- count
97359506775 -- 2
390558073039 -- 1
904462511251 -- 1
I would advise you to redesign your database schema, if possible. Your current database violates First Normal Form since your attribute values are not indivisible.
Create a table where id together with a single phone number constitutes a key, this constraint enforces that no duplicates occur.
I don't remember much but I will try to put the idea (it's something which I had used a long time ago):
Create a table-valued function which takes the id and the comma-separated phone numbers as input, and returns a table with one row per id and phone number.
Use this function in query passing id and phone number. The query is such that for each id you get as many rows as the phone numbers. CROSS APPLY/OUTER APPLY needs to be used.
Then you can check for the duplicates.
The function would be something like this:
CREATE FUNCTION udf_PhoneNumbers
(
    @Id INT
    ,@Phone VARCHAR(300)
) RETURNS @PhonesTable TABLE(Id INT, Phone VARCHAR(50))
BEGIN
    DECLARE @CommaIndex INT
    DECLARE @CurrentPosition INT
    DECLARE @StringLength INT
    DECLARE @PhoneNumber VARCHAR(50)
    SELECT @StringLength = LEN(@Phone)
    SELECT @CommaIndex = -1
    SELECT @CurrentPosition = 1
    --index is 1 based
    WHILE @CommaIndex < @StringLength AND @CommaIndex <> 0
    BEGIN
        SELECT @CommaIndex = CHARINDEX(',', @Phone, @CurrentPosition)
        IF @CommaIndex <> 0
            SELECT @PhoneNumber = SUBSTRING(@Phone, @CurrentPosition, @CommaIndex - @CurrentPosition)
        ELSE
            SELECT @PhoneNumber = SUBSTRING(@Phone, @CurrentPosition, @StringLength - @CurrentPosition + 1)
        SELECT @CurrentPosition = @CommaIndex + 1
        -- insert into the function's own return table
        INSERT INTO @PhonesTable VALUES(@Id, @PhoneNumber)
    END
    RETURN
END
Then run CROSS APPLY query:
SELECT
U.*
,UD.*
FROM yourtable U CROSS APPLY udf_PhoneNumbers(Userid, Phone) UD
This will give you the table on which you can run query to find duplicate.
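To make the split-then-count idea concrete, here is a minimal sketch of the same two steps outside the database, in Python. The customer rows are hypothetical (the original data was only in a screenshot), and grouping on the last five digits follows the question's requirement.

```python
from collections import Counter

# Hypothetical rows: (customer id, comma-separated contact numbers).
customers = [
    (1, "97359506775,390558073039"),
    (2, "11119506775, 904462511251"),
]


def split_numbers(rows):
    """Yield one (id, number) pair per contact number, like the CROSS APPLY result."""
    for customer_id, numbers in rows:
        for number in numbers.split(","):
            number = number.strip()
            if number:  # ignore empty pieces left by stray commas
                yield customer_id, number


pairs = list(split_numbers(customers))

# Count how often each five-digit suffix occurs across all customers.
suffix_counts = Counter(number[-5:] for _, number in pairs)
duplicates = {suffix: n for suffix, n in suffix_counts.items() if n > 1}

print(suffix_counts)
print(duplicates)
```

In SQL the second step would be a GROUP BY on RIGHT(Phone, 5) over the rows produced by the function.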

SQLserver Store Column as variable and loop through it

I am still pretty new to SQL server and I am not sure how to do this. I am first creating a table with just the IDs I need:
SELECT DISTINCT
ID_NUMBER
INTO
#IDlist
FROM
V_Rpt_IDs WITH (NOLOCK)
WHERE
ID_NUMBER in (
'1000764169'
,'1005870537'
,'1008053856'
,'1008054376'
,'1008410224'
,'1008411317'
,'1008465318'
,'1008466074'
,'1008492967'
,'1010546872'
,'1010554301')
Select * from #IDlist
And this works fine. But now I would like to declare a variable to represent this column, or each item in this column, so that I can then do a loop where it loops through each ID Number and returns information about each one and then presents all of that as a table. Here is my shot at that:
Declare @IDNumber as VARCHAR(10)
Set @IDNumber = #IDlist.ID_NUMBER
DECLARE @cnt INT = 0
WHILE @cnt < (Select Count(*) From #IDlist)
BEGIN
SELECT TOP 1
NAME
,MAILING_ADDRESS_1
,MAILING_ADDRESS_CITY
,MAILING_STATE
,MAILING_ZIP
from
V_Rpt_Info
WHERE
ID_NUMBER = @IDNumber
SET @cnt = @cnt + 1
END
DROP TABLE #IDlist
But when I set the @IDNumber variable to #IDlist.ID_NUMBER, it says: The multi-part identifier "#IDlist.ID_NUMBER" could not be bound.
How do I do this?
Thanks
The way you set the variable is not correct: SQL Server doesn't know which ID_NUMBER row it should assign to the @IDNumber variable.
You should do this with a SELECT in parentheses, for example:
SET @IDNumber = (SELECT TOP 1 ID_NUMBER FROM #IDlist)
But why would you want to loop through this temporary table this way? Isn't it possible to join the necessary data with this table instead of doing it one by one?
Rather than looping through, you're going to want to join your ID table to your V_Rpt_Info view.
SELECT
NAME
, MAILING_ADDRESS_1
, MAILING_ADDRESS_CITY
, MAILING_STATE
, MAILING_ZIP
FROM V_Rpt_Info V
INNER JOIN #IDlist ID
ON V.ID_NUMBER = ID.ID_NUMBER
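As a rough check that the join returns the same rows as a per-ID loop, here is a small sketch using an in-memory SQLite database through Python's sqlite3 module. The table contents and the reduced column list are made up for illustration, and a plain table named IDlist stands in for the #IDlist temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE V_Rpt_Info (ID_NUMBER TEXT, NAME TEXT, MAILING_STATE TEXT);
CREATE TABLE IDlist (ID_NUMBER TEXT);
-- Hypothetical rows; only the first two IDs are in the list.
INSERT INTO V_Rpt_Info VALUES
    ('1000764169', 'Alice', 'OH'),
    ('1005870537', 'Bob',   'TX'),
    ('9999999999', 'Carol', 'NY');
INSERT INTO IDlist VALUES ('1000764169'), ('1005870537');
""")

# Row-by-row version: one query per ID, which is what the WHILE loop attempts.
looped = []
ids = conn.execute("SELECT ID_NUMBER FROM IDlist ORDER BY ID_NUMBER").fetchall()
for (id_number,) in ids:
    looped += conn.execute(
        "SELECT NAME, MAILING_STATE FROM V_Rpt_Info WHERE ID_NUMBER = ?",
        (id_number,),
    ).fetchall()

# Set-based version: a single join produces the whole result table at once.
joined = conn.execute("""
    SELECT V.NAME, V.MAILING_STATE
    FROM V_Rpt_Info AS V
    INNER JOIN IDlist AS ID ON V.ID_NUMBER = ID.ID_NUMBER
    ORDER BY V.ID_NUMBER
""").fetchall()

print(joined)
```

Both versions return the same rows here; the join does it in one statement and hands back a single result set instead of one per ID.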

MySQL Count products from all subcategories

I have two tables: categories and products. For each category I would like to count how many products there are in all of its subcategories. I have already counted how many are in each category. Example tables are:
Categories:
ID ParentID ProductCount SubCategoryProducts
1 NULL 0
2 1 2
3 2 1
Products:
ProductID CategoryID
123 2
124 2
125 3
So I would like my query to produce:
ID ParentID ProductCount SubCategoryProducts
1 NULL 0 3
2 1 2 1
3 2 1 0
It simply needs to be as a select query, no need to update the database.
Any ideas?
EDIT: SQL FIddle: http://sqlfiddle.com/#!2/1941a/4/0
If it were me, I'd create a STORED PROCEDURE. The other option is to loop through the results of the first query with PHP and run another query for each ID, but this kind of logic can slow down your page drastically.
Here's a nice tutorial on stored procedures: http://net.tutsplus.com/tutorials/an-introduction-to-stored-procedures/
Basically you run the same loops you would with PHP, as mentioned above, but they run much faster. The procedure is stored in the database and can be called like a function. The result is the same as a query.
As requested, here's a sample procedure (or rather, two). In my case, "ags_orgs" acts in a similar way to your categories, in that each row has a parentOrgID. "getChildOrgs" calls itself recursively, since I had no idea how many levels down I had to go. (This was written for MSSQL; there are probably differences with MySQL.) Unfortunately this doesn't count rows; rather, it fetches the data. I highly recommend following a tutorial or two to get a better grip on how it works:
USE [dbname]
GO
/****** Object: StoredProcedure [dbo].[getChildOrgs] Script Date: 09/26/2012 15:30:06 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[getChildOrgs]
    @myParentID int,
    @isActive tinyint = NULL
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @orgID int, @orgName varchar(255), @level int
    DECLARE cur CURSOR LOCAL FOR SELECT orgID FROM dbo.ags_orgs WHERE parentOrgID = @myParentID AND isActive = ISNULL(@isActive, isActive) ORDER BY orderNum, orgName
    OPEN cur
    FETCH NEXT FROM cur INTO @orgID
    WHILE @@FETCH_STATUS = 0
    BEGIN
        INSERT INTO #temp_childOrgs SELECT orgID,orgName,description,parentOrgID,adminID,isActive,@@NESTLEVEL-1 AS level FROM dbo.ags_orgs WHERE orgID = @orgID
        EXEC getChildOrgs @orgID, @isActive
        -- get next result
        FETCH NEXT FROM cur INTO @orgID
    END
    CLOSE cur
    DEALLOCATE cur
END
GO
Which is called by this proc:
USE [dbname]
GO
/****** Object: StoredProcedure [dbo].[execGetChildOrgs] Script Date: 09/26/2012 15:29:34 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[execGetChildOrgs]
    @parentID int,
    @isActive tinyint = NULL,
    @showParent tinyint = NULL
AS
BEGIN
    CREATE TABLE #temp_childOrgs
    (
        orgID int,
        orgName varchar(255),
        description text,
        parentOrgID int,
        adminID int,
        isActive tinyint,
        level int
    )
    -- if this isn't AGS top level (0), make the first record reflect the requested organization
    IF @parentID != 0 AND @showParent = 1
    BEGIN
        INSERT INTO #temp_childOrgs SELECT orgID,orgName,description,parentOrgID,adminID,isActive,0 AS level FROM dbo.ags_orgs WHERE orgID = @parentID
    END
    EXEC getChildOrgs @parentID, @isActive
    SELECT * FROM #temp_childOrgs
    DROP TABLE #temp_childOrgs
END
GO
Here is my procedure for counting products in all subcategories
DELIMITER $$
CREATE PROCEDURE CountItemsInCategories(IN tmpTable INT, IN parentId INT, IN updateId INT)
BEGIN
DECLARE itemId INT DEFAULT NULL;
DECLARE countItems INT DEFAULT NULL;
DECLARE done INT DEFAULT FALSE;
DECLARE recCount INT DEFAULT NULL;
DECLARE
bufItemCategory CURSOR FOR
SELECT
itemCategory.id AS id,
COUNT(CASE WHEN item.isVisible = 1 then 1 ELSE NULL END) items
FROM
itemCategory
LEFT JOIN item ON
item.categoryId = itemCategory.id
WHERE
itemCategory.isVisible = 1 AND itemCategory.categoryParentId = parentId
GROUP BY
itemCategory.id
ORDER BY
itemCategory.name;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
SET max_sp_recursion_depth = 255; -- 255 is the largest value MySQL accepts for this variable
IF tmpTable = 1 THEN
DROP TEMPORARY TABLE IF EXISTS tblResults;
CREATE TEMPORARY TABLE IF NOT EXISTS tblResults(
id INT NOT NULL PRIMARY KEY,
items INT
);
END IF;
OPEN bufItemCategory;
Reading_bufItemCategory: LOOP
FETCH FROM bufItemCategory INTO itemId, countItems;
IF done THEN
LEAVE Reading_bufItemCategory;
END IF;
IF tmpTable = 1 THEN
INSERT INTO tblResults VALUES(itemId, countItems);
ELSE
UPDATE tblResults SET items = items + countItems WHERE id = updateId;
END IF;
SET recCount = (SELECT count(*) FROM itemCategory WHERE itemCategory.categoryParentId = itemId AND itemCategory.isVisible = 1);
IF recCount > 0 THEN
CALL CountItemsInCategories(0, itemId, CASE WHEN updateId = 0 then itemId ELSE updateId END);
END IF;
END LOOP Reading_bufItemCategory;
CLOSE bufItemCategory;
IF tmpTable = 1 THEN
SELECT * FROM tblResults WHERE items > 0;
DROP TEMPORARY TABLE IF EXISTS tblResults;
END IF;
END $$
DELIMITER ;
To call the procedure, just run:
CALL CountItemsInCategories(firstLoop, parentId, updateId);
where the parameters are:
firstLoop - always "1" for the first call
parentId - parent of the subcategories
updateId - id of the row to update, always "0" for the first call
For example:
CALL CountItemsInCategories(1, 1, 0);
I hope this example will be useful to someone.
This assumes you have
a product table named prods
prod_id|categ_id
and a category table named categ
categ_id|parent_categ_id
You seem to be using an Adjacency List structure, where the foreign key column parent_categ_id references the categ_id column of the same table, so
the following query should work
select c1.categ_id,c1.parent_categ_id,count(prods.prod_id)
as product_count from categ c1
join prods on prods.categ_id=c1.categ_id or prods.categ_id
in( with recursive tree(id,parent_id)as
(select categ_id,parent_categ_id from categ
where categ_id=c1.categ_id
union all
select cat.categ_id,cat.parent_categ_id from categ cat
join tree on tree.id=cat.parent_categ_id) select id from tree)
group by c1.categ_id,c1.parent_categ_id
order by product_count
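The recursive idea above can be checked against the question's example tables. The sketch below is one possible formulation, not the answer's exact query: it builds every (ancestor, descendant) pair with a recursive common table expression and then counts the products under each ancestor. It runs in an in-memory SQLite database through Python's sqlite3 module; the CTE name descendants and its columns root_id and id are made up for this example. MySQL accepts WITH RECURSIVE only from version 8.0 on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (ID INTEGER PRIMARY KEY, ParentID INTEGER);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, CategoryID INTEGER);
INSERT INTO Categories VALUES (1, NULL), (2, 1), (3, 2);
INSERT INTO Products VALUES (123, 2), (124, 2), (125, 3);
""")

rows = conn.execute("""
    WITH RECURSIVE descendants(root_id, id) AS (
        -- direct children of every category
        SELECT ParentID, ID FROM Categories WHERE ParentID IS NOT NULL
        UNION ALL
        -- children of something already known to be below root_id
        SELECT d.root_id, c.ID
        FROM descendants AS d
        JOIN Categories AS c ON c.ParentID = d.id
    )
    SELECT c.ID,
           c.ParentID,
           (SELECT COUNT(*) FROM Products AS p
             WHERE p.CategoryID = c.ID) AS ProductCount,
           (SELECT COUNT(*) FROM descendants AS d
              JOIN Products AS p ON p.CategoryID = d.id
             WHERE d.root_id = c.ID) AS SubCategoryProducts
    FROM Categories AS c
    ORDER BY c.ID
""").fetchall()

for row in rows:
    print(row)
```

For the sample data this gives 3, 1 and 0 subcategory products for categories 1, 2 and 3, which matches the output requested in the question.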
You can do this in one statement if you have a limit on the depth of the hierarchy. You said you only have 4 levels in total.
SELECT SUM(ProductCount)
FROM (
SELECT c0.ID, c0.ProductCount
FROM Categories AS c0
WHERE c0.ID = 1
UNION ALL
SELECT c1.ID, c1.ProductCount
FROM Categories AS c0
JOIN Categories AS c1 ON c0.ID = c1.ParentID
WHERE c0.ID = 1
UNION ALL
SELECT c2.ID, c2.ProductCount
FROM Categories AS c0
JOIN Categories AS c1 ON c0.ID = c1.ParentID
JOIN Categories AS c2 ON c1.ID = c2.ParentID
WHERE c0.ID = 1
UNION ALL
SELECT c3.ID, c3.ProductCount
FROM Categories AS c0
JOIN Categories AS c1 ON c0.ID = c1.ParentID
JOIN Categories AS c2 ON c1.ID = c2.ParentID
JOIN Categories AS c3 ON c2.ID = c3.ParentID
WHERE c0.ID = 1
) AS _hier;
That'll work for this query if you store the hierarchy in the way you're doing, which is called Adjacency List. Basically, the ParentID is the way each node records its position in the hierarchy.
There are a few other ways of storing hierarchies that allow for easier querying of whole trees or subtrees. The best data organization depends on which queries you want to run.
Here are some more resources:
Models for Hierarchical Data with SQL and PHP (user @RaymondNijland linked to it in a comment)
I gave that presentation as a webinar (free to view the recording, but requires registration).
My book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
What is the most efficient/elegant way to parse a flat table into a tree?

MySQL for removing back to back duplicates

I have a table where there is some duplicate file access information where a duplicate file access is defined as the same user accessing the same file back to back. In other words, if the user accesses file A,B,A in that order, it's NOT considered a duplicate. So basically, per user I want to make sure that every subsequent access is for a file different from the last one.
UserID FileID
1 1
2 1
1 1 <- Remove
2 1 <- Remove
2 2
1 2
2 2 <- Remove
1 1
1 2
Anyone know how to approach something like this in mysql? Ideally, I would like to use it without the use of a function but I'm open to a function if it's the only option.
The table has the following columns: ID (primary key), userID, fileID, accessTime
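If you made a SPROC it would look something like this. Because a DELETE may fail while the cursor is open, this version collects the duplicate rows in a second temp table and deletes them after the cursor is closed.
CREATE PROCEDURE `proc_CURSOR` ()
BEGIN
    -- DECLAREs must come first: variables, then the cursor, then the handler.
    DECLARE done INT DEFAULT 0;
    DECLARE curUser, curFile, curPk, lastFile INT;
    DECLARE cur1 CURSOR FOR SELECT userId, fileId, pkId FROM table1 ORDER BY time_stamp;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    DROP TEMPORARY TABLE IF EXISTS lastUserAccess;
    CREATE TEMPORARY TABLE lastUserAccess (userId INT PRIMARY KEY, fileId INT);
    DROP TEMPORARY TABLE IF EXISTS toDelete;
    CREATE TEMPORARY TABLE toDelete (pkId INT PRIMARY KEY);
    OPEN cur1;
    read_loop: LOOP
        FETCH cur1 INTO curUser, curFile, curPk;
        IF done THEN
            LEAVE read_loop;
        END IF;
        -- A scalar subquery yields NULL when the user has no previous access.
        SET lastFile = (SELECT fileId FROM lastUserAccess WHERE userId = curUser);
        IF lastFile IS NULL THEN
            INSERT INTO lastUserAccess VALUES (curUser, curFile);
        ELSEIF lastFile = curFile THEN
            INSERT INTO toDelete VALUES (curPk); -- back-to-back duplicate
        ELSE
            UPDATE lastUserAccess SET fileId = curFile WHERE userId = curUser;
        END IF;
    END LOOP;
    CLOSE cur1;
    DELETE FROM table1 WHERE pkId IN (SELECT pkId FROM toDelete);
END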
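For MySQL:
DELETE a from tbl a , tbl b WHERE a.Id>b.Id and
a.UserID = b.UserID and a.FileID= b.FileID;
Check this fiddle http://sqlfiddle.com/#!2/aece0a/1
This one won't work in MySQL; for SQL Server:
DELETE FROM tbl WHERE ID NOT IN (SELECT MIN(ID)
FROM tbl GROUP BY userID, fileID)
Note that both statements keep only the first occurrence of each (UserID, FileID) pair, so they also remove non-adjacent repeats such as the second A in A, B, A.
Hope this works for you.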
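For the back-to-back rule specifically, one set-based option is to compare each row with the same user's immediately preceding row and delete it only when the file matches. The sketch below tries this on the question's sample data in an in-memory SQLite database through Python's sqlite3 module. It assumes that ID order matches access order (the real table would order by accessTime), and the table name access is made up. The ids are selected first and deleted afterwards, partly because MySQL does not let a DELETE read from its own target table in a subquery unless the subquery is wrapped in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access (ID INTEGER PRIMARY KEY, userID INTEGER, fileID INTEGER)"
)
# (userID, fileID) in access order, taken from the question; IDs become 1..9.
sample = [(1, 1), (2, 1), (1, 1), (2, 1), (2, 2), (1, 2), (2, 2), (1, 1), (1, 2)]
conn.executemany("INSERT INTO access (userID, fileID) VALUES (?, ?)", sample)

# A row is a back-to-back duplicate when the same user's previous row
# (the largest smaller ID for that user) refers to the same file.
duplicate_ids = [
    row[0]
    for row in conn.execute("""
        SELECT cur.ID
        FROM access AS cur
        JOIN access AS prev
          ON prev.ID = (SELECT MAX(p.ID)
                        FROM access AS p
                        WHERE p.userID = cur.userID AND p.ID < cur.ID)
        WHERE prev.fileID = cur.fileID
        ORDER BY cur.ID
    """)
]

conn.executemany("DELETE FROM access WHERE ID = ?", [(i,) for i in duplicate_ids])
remaining = conn.execute("SELECT ID, userID, fileID FROM access ORDER BY ID").fetchall()

print(duplicate_ids)
print(remaining)
```

The three rows marked "Remove" in the question (the 3rd, 4th and 7th) are deleted, while the later non-adjacent repeat by user 1 is kept.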

Fetch the occurrences of particular words in particular column of a table

I have about 200 words. I want to see how many times those words occur in a column of a table.
E.g., say we have a table test with a column statements, which has two rows:
How are you. It's been long since I met you.
I am fine how are you.
Now I want to find the occurrences of the words "you" and "how". The output should be something like:
word count
you 3
how 2
since "you" has 3 and "how" has 2 occurrences in the two rows.
How can I do this?
You can do it like this:
Split the phrase and put all items in a different table;
Remove all punctuation;
Make a select using the created table and the words that you want to identify.
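The three steps above can be sketched as follows. This is a minimal illustration, not a MySQL solution: the splitting and punctuation removal are done in Python, and the token table lives in an in-memory SQLite database through the sqlite3 module. The table name token and the choice to lower-case everything are assumptions.

```python
import re
import sqlite3

statements = [
    "How are you. It's been long since I met you.",
    "I am fine how are you.",
]
wanted = ["you", "how"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token (word TEXT NOT NULL)")

for text in statements:
    # Steps 1 and 2: lower-case, replace punctuation (except apostrophes)
    # with spaces, then split on whitespace into one row per word.
    cleaned = re.sub(r"[^\w\s']", " ", text.lower())
    conn.executemany(
        "INSERT INTO token (word) VALUES (?)", [(w,) for w in cleaned.split()]
    )

# Step 3: select only the words of interest and count them.
placeholders = ", ".join("?" for _ in wanted)
counts = conn.execute(
    f"SELECT word, COUNT(*) FROM token WHERE word IN ({placeholders}) "
    "GROUP BY word ORDER BY COUNT(*) DESC, word",
    wanted,
).fetchall()

print(counts)
```

With the 200-word list stored in its own table, the IN list would become a join against that table.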
The way I would approach this is to write a little user defined function to give me the number of times one string appears in another with some allowances for:
upper and lower case
common punctuation
I would then create a table with all of the words that I wish to search for, i.e. your list of 200. Then use the function to count the number of occurrences of each word in every phrase, put that in an inline view, and sum the results up by search word.
Hence:
User Defined Function
DELIMITER $$
CREATE FUNCTION `get_word_count`(phrase VARCHAR(500),word VARCHAR(255), delimiter VARCHAR(1)) RETURNS int(11)
READS SQL DATA
BEGIN
DECLARE cur_position INT DEFAULT 1 ;
DECLARE remainder TEXT;
DECLARE cur_string VARCHAR(255);
DECLARE delimiter_length TINYINT UNSIGNED;
DECLARE total INT;
DECLARE result DOUBLE DEFAULT 0;
DECLARE string2 VARCHAR(255);
SET remainder = replace(phrase,'!',' ');
SET remainder = replace(remainder,'.',' ');
SET remainder = replace(remainder,',',' ');
SET remainder = replace(remainder,'?',' ');
SET remainder = replace(remainder,':',' ');
SET remainder = replace(remainder,'(',' ');
SET remainder = lower(remainder);
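-- lower-case the search word so that it matches the lower-cased phrase
SET string2 = concat(delimiter,lower(trim(word)),delimiter);
SET delimiter_length = CHAR_LENGTH(delimiter);
SET cur_position = 1;
WHILE CHAR_LENGTH(remainder) > 0 AND cur_position > 0 DO
SET cur_position = INSTR(remainder, delimiter);
IF cur_position = 0 THEN
-- wrap the final token in delimiters too, so that 'to' does not match inside ' too '
SET cur_string = concat(delimiter,remainder,delimiter);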
ELSE
SET cur_string = concat(delimiter,LEFT(remainder, cur_position - 1),delimiter);
END IF;
IF TRIM(cur_string) != '' THEN
set result = result + (select instr(string2,cur_string) > 0);
END IF;
SET remainder = SUBSTRING(remainder, cur_position + delimiter_length);
END WHILE;
RETURN result;
END$$
DELIMITER ;
You might have to play with this function a little depending on what allowances you need to make for punctuation and case. Hopefully you get the idea here though!
Populate tables
create table search_word
(id int unsigned primary key auto_increment,
word varchar(250) not null
);
insert into search_word (word) values ('you');
insert into search_word (word) values ('how');
insert into search_word (word) values ('to');
insert into search_word (word) values ('too');
insert into search_word (word) values ('the');
insert into search_word (word) values ('and');
insert into search_word (word) values ('world');
insert into search_word (word) values ('hello');
create table phrase_to_search
(id int unsigned primary key auto_increment,
phrase varchar(500) not null
);
insert into phrase_to_search (phrase) values ("How are you. It's been long since I met you");
insert into phrase_to_search (phrase) values ("I am fine how are you?");
insert into phrase_to_search (phrase) values ("Oh. Not bad. All is ok with the world, I think");
insert into phrase_to_search (phrase) values ("I think so too!");
insert into phrase_to_search (phrase) values ("You know what? I think so too!");
Run Query
select word,sum(word_count) as total_word_count
from
(
select phrase,word,get_word_count(phrase,word," ") as word_count
from search_word
join phrase_to_search
) t
group by word
order by total_word_count desc;
Here is a solution:
SELECT SUM(total_count) as total, value
FROM (
SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
FROM table_name t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
ORDER BY value
) AS x
GROUP BY x.value
) AS y
GROUP BY value
Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1
First we run a query to extract all words, as explained by @peterm (follow his instructions if you want to change the maximum number of words processed). We then turn that into a subquery and COUNT and GROUP BY the value of each word. Finally, we wrap another query around that to merge words that were grouped separately only because of attached punctuation, e.g. hello and hello!, using REPLACE.
Below is a simple solution for the case where you only need to check for certain words rather than compute complete statistics. Note that each query counts the rows that contain the word as a substring, not the total number of occurrences:
SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%how%';
SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%you%';