UPDATE BELOW!
Who can help me out
I have a table:
CREATE TABLE `group_c` (
`parent_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`child_id` int(11) DEFAULT NULL,
`number` int(11) DEFAULT NULL,
PRIMARY KEY (`parent_id`)
) ENGINE=InnoDB;
INSERT INTO group_c(parent_id,child_id)
VALUES (1,1),(2,2),(3,3),(4,1),(5,4),(6,4),(7,6),(8,1),(9,2),(10,1),(11,1),(12,1),(13,0);
I want to update the number field to 1 for each child that has multiple parents:
SELECT group_concat(parent_id), count(*) as c FROM group_c group by child_id having c>1
Result:
GROUP_CONCAT(PARENT_ID) C
12,11,10,8,1,4 6
9,2 2
6,5 2
So all rows with parent_id 12,11,10,8,1,4,9,2,6,5 should be updated to number =1
I've tried something like:
UPDATE group_c SET number=1 WHERE FIND_IN_SET(parent_id, SELECT pid FROM (select group_concat(parent_id), count(*) as c FROM group_c group by child_id having c>1));
but that is not working.
How can I do this?
SQLFIDDLE: http://sqlfiddle.com/#!2/acb75/5
[edit]
I tried to make the example simple but the real thing is a bit more complicated since I'm grouping by multiple fields. Here is a new fiddle: http://sqlfiddle.com/#!2/7aed0/11
Why use GROUP_CONCAT() and then try to do something with its result via FIND_IN_SET()? That's not how SQL is intended to work. You can use a simple JOIN to retrieve your records:
SELECT
parent_id
FROM
group_c
INNER JOIN
(SELECT
child_id,
count(*) as c
FROM
group_c
group by
child_id
having c>1) AS childs
ON childs.child_id=group_c.child_id
See your modified demo. If you want an UPDATE, then just use:
UPDATE
group_c
INNER JOIN
(SELECT
child_id,
count(*) as c
FROM
group_c
group by
child_id
having c>1) AS childs
ON childs.child_id=group_c.child_id
SET
group_c.number=1
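To check the idea against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable). SQLite has no UPDATE ... JOIN syntax, so the sketch expresses the same condition with an IN subquery; MySQL itself rejects a plain subquery on the table being updated (error 1093), which is why the derived-table JOIN above is the usual form there. The function name flag_shared_children is hypothetical.

```python
import sqlite3


def flag_shared_children(conn):
    """Set number = 1 on every row whose child_id occurs more than once."""
    conn.execute(
        """
        UPDATE group_c
           SET number = 1
         WHERE child_id IN (SELECT child_id
                              FROM group_c
                             GROUP BY child_id
                            HAVING COUNT(*) > 1)
        """
    )
    rows = conn.execute(
        "SELECT parent_id FROM group_c WHERE number = 1 ORDER BY parent_id"
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE group_c ("
    "parent_id INTEGER PRIMARY KEY, child_id INTEGER, number INTEGER)"
)
# Same rows as the INSERT in the question.
conn.executemany(
    "INSERT INTO group_c (parent_id, child_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 1), (5, 4), (6, 4), (7, 6),
     (8, 1), (9, 2), (10, 1), (11, 1), (12, 1), (13, 0)],
)

updated = flag_shared_children(conn)
print(updated)
```

The flagged parent_id values should be the same ten ids the question lists from its GROUP_CONCAT output.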
For anyone interested, this is how I solved it. It takes two queries, but in my case that's not really an issue.
UPDATE group_c INNER JOIN (
SELECT parent_id, count( * ) AS c
FROM `group_c`
GROUP BY child1,child2
HAVING c >1
) AS cc ON cc.parent_id = group_c.parent_id
SET group_c.number =1 WHERE number =0;
UPDATE group_c INNER JOIN group_c as gc ON
(gc.child1=group_c.child1 AND gc.child2=group_c.child2 AND gc.number=1)
SET group_c.number=1;
fiddle: http://sqlfiddle.com/#!2/46d0b4/1/0
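For the multi-column case, a single statement may also be enough. The sketch below is an assumption-laden illustration, not the schema from the fiddle: it uses Python's sqlite3 module instead of MySQL, invents a small group_c table with the child1 and child2 columns named above, and flags every row that shares its (child1, child2) pair with at least one other row. The function name flag_duplicate_pairs and the sample rows are hypothetical.

```python
import sqlite3


def flag_duplicate_pairs(conn):
    """Set number = 1 where another row has the same (child1, child2)."""
    conn.execute(
        """
        UPDATE group_c
           SET number = 1
         WHERE EXISTS (SELECT 1
                         FROM group_c AS other
                        WHERE other.child1 = group_c.child1
                          AND other.child2 = group_c.child2
                          AND other.parent_id <> group_c.parent_id)
        """
    )
    rows = conn.execute(
        "SELECT parent_id FROM group_c WHERE number = 1 ORDER BY parent_id"
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE group_c ("
    "parent_id INTEGER PRIMARY KEY, child1 INTEGER, child2 INTEGER, "
    "number INTEGER DEFAULT 0)"
)
# Hypothetical sample rows: pairs (1, 1) and (2, 1) each occur twice.
conn.executemany(
    "INSERT INTO group_c (parent_id, child1, child2) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 1), (5, 2, 1), (6, 3, 3)],
)

flagged = flag_duplicate_pairs(conn)
print(flagged)
```

In MySQL the same condition would have to be written as a self-join or a derived-table JOIN, since a correlated subquery on the updated table is not accepted there.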
Here's a similar solution...
UPDATE group_c a
JOIN
( SELECT DISTINCT x.child_id candidate
FROM group_c x
JOIN group_c y
ON y.child_id = x.child_id
AND y.parent_id < x.parent_id
) b
ON b.candidate = a.child_id
SET number = 1;
http://sqlfiddle.com/#!2/bc532/1
Related
I'm trying to get this to work. When I run the SELECT on the whole dataset, I know that the record with this cust_number shows up in position 6 (when using ORDER BY), but this code returns position 37327, which is its position in the unordered data.
SELECT
x.position,
x.cust_number,
x.company,
x.surname,
x.first_name,
x.title
FROM
(SELECT
@rownum:=@rownum + 1 AS position,
c.cust_number,
company,
surname,
first_name,
title
FROM
1_customer_records c
LEFT JOIN addresses a ON c.fk_addresses_id = a.id
JOIN (SELECT @rownum:=0) r
ORDER BY a.company , c.surname , c.first_name , c.title) x
WHERE
x.cust_number = 43246;
Here is another approach using a temp table
CREATE TEMPORARY TABLE row_calc (id INT AUTO_INCREMENT, fk INT NULL, PRIMARY KEY (id)) ENGINE=MEMORY;
INSERT INTO row_calc(fk)
SELECT
cust_number
FROM
1_customer_records c
LEFT JOIN
addresses a ON c.fk_addresses_id = a.id
ORDER BY company,surname,first_name,title;
SELECT
id
FROM
row_calc
WHERE
fk = 43246 LIMIT 1;
DROP TABLE row_calc;
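The temp-table approach can be tried end to end with a small sketch. It uses Python's sqlite3 module rather than MySQL, renames the customer table to customer_records (a hypothetical name, chosen to avoid quoting an identifier that starts with a digit), and fills both tables with invented rows. The auto-increment id assigned during the ordered INSERT ... SELECT serves as the position.

```python
import sqlite3


def position_of(conn, cust_number):
    """Return the 1-based position of cust_number in the sorted customer list."""
    conn.execute(
        "CREATE TEMP TABLE row_calc ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, fk INTEGER)"
    )
    # Rows are inserted in sorted order, so id mirrors the sort position.
    conn.execute(
        """
        INSERT INTO row_calc (fk)
        SELECT c.cust_number
          FROM customer_records AS c
          LEFT JOIN addresses AS a ON c.fk_addresses_id = a.id
         ORDER BY a.company, c.surname, c.first_name, c.title
        """
    )
    row = conn.execute(
        "SELECT id FROM row_calc WHERE fk = ? LIMIT 1", (cust_number,)
    ).fetchone()
    conn.execute("DROP TABLE row_calc")
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, company TEXT)")
conn.execute(
    "CREATE TABLE customer_records ("
    "cust_number INTEGER PRIMARY KEY, surname TEXT, first_name TEXT, "
    "title TEXT, fk_addresses_id INTEGER)"
)
# Hypothetical sample data; customer 300 has no address (NULL company).
conn.executemany("INSERT INTO addresses VALUES (?, ?)", [(1, "Acme"), (2, "Zeta")])
conn.executemany(
    "INSERT INTO customer_records VALUES (?, ?, ?, ?, ?)",
    [(43246, "Smith", "Ann", "Ms", 1),
     (100, "Brown", "Bob", "Mr", 1),
     (200, "Adams", "Cy", "Mr", 2),
     (300, "Young", "Di", "Dr", None)],
)

position = position_of(conn, 43246)
print(position)
```

The customer without an address sorts first because NULL orders before any company name in ascending order, so customer 43246 lands behind it and behind Brown.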
I have a table with following structure
Table name: matches
It basically stores which product matches which product. I need to process this table
and store the result in a groups table like below.
Table Name: groups
group_ID stores the MIN Product_ID of the Product_IDs that form a group. For example:
if A matches B and B matches C, then three rows should go to the groups table in the format (A, A), (A, B), (A, C).
I have tried looking into correlated subqueries and CTEs, but I couldn't get this to work.
I need to do this all in SQL.
Thanks for the help.
Try this:
;WITH CTE
AS
(
SELECT DISTINCT
M1.Product_ID Group_ID,
M1.Product_ID
FROM matches M1
LEFT JOIN matches M2
ON M1.Product_Id = M2.matching_Product_Id
WHERE M2.matching_Product_Id IS NULL
UNION ALL
SELECT
C.Group_ID,
M.matching_Product_Id
FROM CTE C
JOIN matches M
ON C.Product_ID = M.Product_ID
)
SELECT * FROM CTE ORDER BY Group_ID
You can use OPTION(MAXRECURSION n) to control recursion depth.
SQL FIDDLE DEMO
Something like this (not tested)
with match_groups as (
select product_id,
matching_product_id,
product_id as group_id
from matches
where product_id not in (select matching_product_id from matches)
union all
select m.product_id, m.matching_product_id, p.group_id
from matches m
join match_groups p on m.product_id = p.matching_product_id
)
select group_id, product_id
from match_groups
order by group_id;
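A runnable sketch of the recursive-CTE idea, using Python's sqlite3 module in place of SQL Server (so the keyword is WITH RECURSIVE and there is no MAXRECURSION option). The column names product_id and matching_product_id follow the answers above; the numeric sample data is invented. It assumes the match chains contain no cycles and that each chain's starting product carries the smallest id.

```python
import sqlite3

GROUPS_SQL = """
WITH RECURSIVE match_groups (group_id, product_id) AS (
    -- Anchor: products that are never the target of a match start a group.
    SELECT DISTINCT m.product_id, m.product_id
      FROM matches AS m
     WHERE m.product_id NOT IN (SELECT matching_product_id FROM matches)
    UNION ALL
    -- Recursive step: follow each match from the current product.
    SELECT g.group_id, m.matching_product_id
      FROM match_groups AS g
      JOIN matches AS m ON m.product_id = g.product_id
)
SELECT group_id, product_id
  FROM match_groups
 ORDER BY group_id, product_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches ("
    "product_id INTEGER NOT NULL, matching_product_id INTEGER NOT NULL)"
)
# Hypothetical data: 1 matches 2, 2 matches 3, and 10 matches 11.
conn.executemany("INSERT INTO matches VALUES (?, ?)", [(1, 2), (2, 3), (10, 11)])

groups = conn.execute(GROUPS_SQL).fetchall()
for group_id, product_id in groups:
    print(group_id, product_id)
```

With the chain 1, 2, 3 this yields the rows (1, 1), (1, 2), (1, 3), which corresponds to the (A, A), (A, B), (A, C) format asked for in the question.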
Sample of the Recursive Level:
DECLARE @VALUE_CODE AS VARCHAR(5);
--SET @VALUE_CODE = 'A' -- Specify a level
WITH ViewValue AS
(
SELECT ValueCode
, ValueDesc
, PrecedingValueCode
FROM ValuesTable
WHERE PrecedingValueCode IS NULL
UNION ALL
SELECT A.ValueCode
, A.ValueDesc
, A.PrecedingValueCode
FROM ValuesTable A
INNER JOIN ViewValue V ON
V.ValueCode = A.PrecedingValueCode
)
SELECT ValueCode, ValueDesc, PrecedingValueCode
FROM ViewValue
--WHERE PrecedingValueCode = @VALUE_CODE -- Specific level
--WHERE PrecedingValueCode IS NULL -- Root
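If the numeric depth of each row is wanted as well, the recursive member can carry a counter. The following sketch is an illustration only: it runs on Python's sqlite3 module instead of SQL Server, reuses the ValuesTable column names from the sample above, adds a hypothetical depth column, and uses made-up rows.

```python
import sqlite3

LEVELS_SQL = """
WITH RECURSIVE ViewValue (ValueCode, ValueDesc, PrecedingValueCode, depth) AS (
    -- Roots have no preceding value and sit at depth 0.
    SELECT ValueCode, ValueDesc, PrecedingValueCode, 0
      FROM ValuesTable
     WHERE PrecedingValueCode IS NULL
    UNION ALL
    -- Each child is one level below the row it points back to.
    SELECT a.ValueCode, a.ValueDesc, a.PrecedingValueCode, v.depth + 1
      FROM ValuesTable AS a
      JOIN ViewValue AS v ON v.ValueCode = a.PrecedingValueCode
)
SELECT ValueCode, depth
  FROM ViewValue
 ORDER BY depth, ValueCode
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ValuesTable ("
    "ValueCode TEXT PRIMARY KEY, ValueDesc TEXT, PrecedingValueCode TEXT)"
)
# Hypothetical hierarchy: A is the root, B and D follow A, C follows B.
conn.executemany(
    "INSERT INTO ValuesTable VALUES (?, ?, ?)",
    [("A", "root", None),
     ("B", "child of A", "A"),
     ("C", "child of B", "B"),
     ("D", "child of A", "A")],
)

levels = conn.execute(LEVELS_SQL).fetchall()
for code, depth in levels:
    print(code, depth)
```

Filtering on depth afterwards gives all rows of one level without knowing their parent codes.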
Having these tables:
customers
---------------------
`id` smallint(5) unsigned NOT NULL auto_increment,
`name` varchar(100) collate utf8_unicode_ci NOT NULL,
....
customers_subaccounts
-------------------------
`companies_id` mediumint(8) unsigned NOT NULL,
`customers_id` mediumint(8) unsigned NOT NULL,
`subaccount` int(10) unsigned NOT NULL
I need to get all the customers who have been assigned more than one subaccount for the same company.
This is what I've got:
SELECT * FROM customers
WHERE id IN
(SELECT customers_id
FROM customers_subaccounts
GROUP BY customers_id, companies_id
HAVING COUNT(subaccount) > 1)
This query is too slow, though. It gets even slower if I add DISTINCT to customers_id in the subquery's SELECT, even though the overall query returns the same customer list either way. Maybe there's a better way without a subquery; anything faster will help. I'm also not sure whether it returns an accurate list.
Any help?
You can replace the subquery with an INNER JOIN:
SELECT t1.id
FROM customers t1
INNER JOIN
(
SELECT DISTINCT customers_id
FROM customers_subaccounts
GROUP BY customers_id, companies_id
HAVING COUNT(*) > 1
) t2
ON t1.id = t2.customers_id
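To see what the derived-table JOIN returns, here is a small sketch on Python's sqlite3 module (used instead of MySQL so it runs anywhere). The table and column names come from the question; the rows, the alias multi, and the function name are invented for illustration.

```python
import sqlite3


def customers_with_multiple_subaccounts(conn):
    """Customers having more than one subaccount within a single company."""
    rows = conn.execute(
        """
        SELECT c.id, c.name
          FROM customers AS c
          JOIN (SELECT DISTINCT customers_id
                  FROM customers_subaccounts
                 GROUP BY customers_id, companies_id
                HAVING COUNT(*) > 1) AS multi
            ON multi.customers_id = c.id
         ORDER BY c.id
        """
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE customers_subaccounts ("
    "companies_id INTEGER NOT NULL, customers_id INTEGER NOT NULL, "
    "subaccount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Alpha"), (2, "Beta"), (3, "Gamma")],
)
# Hypothetical rows as (companies_id, customers_id, subaccount).
conn.executemany(
    "INSERT INTO customers_subaccounts VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 1, 101), (2, 1, 102), (2, 1, 103),  # customer 1: two companies, two each
     (1, 2, 200), (2, 2, 201),                            # customer 2: one per company
     (2, 3, 300), (2, 3, 301)],                           # customer 3: two in company 2
)

result = customers_with_multiple_subaccounts(conn)
print(result)
```

Customer 1 qualifies in two companies, so the DISTINCT in the derived table is what keeps that customer from appearing twice in the joined result.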
You can also try using EXISTS(), which may be faster than a join:
SELECT * FROM customers t
WHERE EXISTS(SELECT 1 FROM customers_subaccounts s
WHERE s.customers_id = t.id
GROUP BY s.customers_id, s.companies_id
HAVING COUNT(subaccount) > 1)
You should also consider adding the following indexes (if they don't exist yet):
customers_subaccounts (customers_id,companies_id,subaccount)
customers (id)
Assuming that you want different subaccounts for the company (or that they are guaranteed to be different anyway), then the following could be faster under some circumstances:
select c.*
from (select distinct cs.customers_id
from customers_subaccounts cs join
customers_subaccounts cs2
on cs.customers_id = cs2.customers_id and
cs.companies_id = cs2.companies_id and
cs.subaccount < cs2.subaccount
) cc join
customers c
on c.id = cc.customers_id;
In particular, this can take advantage of an index on customers_subaccounts(customers_id, companies_id, subaccount).
Note: This assumes that the subaccounts are different for the rows you want. What is really needed is a way of defining unique rows in the customers_subaccounts table.
There is a way to speed up the query by caching the subquery result. A simple change to your query, wrapping the subquery in a derived table, tells MySQL that it can cache the subquery result:
SELECT * FROM customers
WHERE id IN
(select * from
(SELECT distinct customers_id
FROM customers_subaccounts
GROUP BY customers_id, companies_id
HAVING COUNT(subaccount) > 1) t1);
I used it many years ago and it helped me very much.
Try the following:
SELECT DISTINCT t1.*
FROM customers t1
INNER JOIN customers_subaccounts t2 ON t1.id = t2.customers_id
GROUP BY t1.id, t1.name, t2.companies_id
HAVING COUNT(t2.subaccount) > 1
You may also add an index on customers_id.
Have a table containing form data. Each row contains a section_id and field_id. There are 50 distinct fields for each section. As users update an existing field, a new row is inserted with an updated date_modified. This keeps a rolling archive of changes.
The problem is that I'm getting erratic results when pulling the most recent set of fields to display on a page.
I've narrowed down the problem to a couple of fields, and have recreated a portion of the table in question on SQLFiddle.
Schema:
CREATE TABLE IF NOT EXISTS `cTable` (
`section_id` int(5) NOT NULL,
`field_id` int(5) DEFAULT NULL,
`content` text,
`user_id` int(11) NOT NULL,
`date_modified` datetime NOT NULL,
KEY `section_id` (`section_id`),
KEY `field_id` (`field_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This query shows all previously edited rows for field_id 39. There are five rows returned:
SELECT cT.*
FROM cTable cT
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here's what I'm trying to do to pull the most recent row for field_id 39. No rows returned:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Record Count: 0;
If I try the same query on a different field_id, say 54, I get the correct result:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=54;
Record Count: 1;
Why would same query work on one field_id, but not the other?
In the subquery that computes the maxima, you need to GROUP BY section_id, field_id. Grouping by field_id alone ignores section_id, the column you are filtering on, so the maximum date can come from a different section:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT section_id,field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY section_id,field_id
) AS max
ON(max.field_id =cT.field_id
AND max.date_modified=cT.date_modified
AND max.section_id=cT.section_id
)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
See Fiddle Demo
You are looking for the max(date_modified) per field_id. But you should look for the max(date_modified) per field_id where the section_id is 123. Otherwise you may find a date for which you find no match later.
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable
WHERE section_id = 123
GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here is the SQL fiddle: http://www.sqlfiddle.com/#!2/0cefd8/19.
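The effect is easy to reproduce with three rows. The sketch below uses Python's sqlite3 module rather than MySQL, trims cTable to the columns that matter (user_id is left out), stores the dates as ISO text, and calls the derived table latest; all of that is an assumption for illustration. Field 39 exists in two sections, and the newest row overall belongs to the other section.

```python
import sqlite3

# Maximum taken per field_id only: the date may belong to another section.
PER_FIELD = """
SELECT cT.content
  FROM cTable AS cT
  JOIN (SELECT field_id, MAX(date_modified) AS date_modified
          FROM cTable
         GROUP BY field_id) AS latest
    ON latest.field_id = cT.field_id
   AND latest.date_modified = cT.date_modified
 WHERE cT.section_id = 123 AND cT.field_id = 39
"""

# Maximum taken per (section_id, field_id): always matches a row in the section.
PER_SECTION_AND_FIELD = """
SELECT cT.content
  FROM cTable AS cT
  JOIN (SELECT section_id, field_id, MAX(date_modified) AS date_modified
          FROM cTable
         GROUP BY section_id, field_id) AS latest
    ON latest.section_id = cT.section_id
   AND latest.field_id = cT.field_id
   AND latest.date_modified = cT.date_modified
 WHERE cT.section_id = 123 AND cT.field_id = 39
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cTable ("
    "section_id INTEGER NOT NULL, field_id INTEGER, content TEXT, "
    "date_modified TEXT NOT NULL)"
)
# Hypothetical rows: the latest edit of field 39 happened in section 456.
conn.executemany(
    "INSERT INTO cTable VALUES (?, ?, ?, ?)",
    [(123, 39, "old", "2014-01-01 10:00:00"),
     (123, 39, "new", "2014-01-05 10:00:00"),
     (456, 39, "other section", "2014-02-01 10:00:00")],
)

broken = conn.execute(PER_FIELD).fetchall()
fixed = conn.execute(PER_SECTION_AND_FIELD).fetchall()
print(len(broken), fixed)
```

The first query finds no row because the per-field maximum date only exists in section 456, while the second returns the latest row of section 123.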
Ok imagine the following DB structure
USERS:
id | name | company_id
1 John 1
2 Jane 1
3 Jack 2
4 Jill 3
COMPANIES:
id | name
1 CompanyA
2 CompanyB
3 CompanyC
4 CompanyD
First I want to SELECT all the companies that have more than one user
SELECT
`c`.`name`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
Easy enough. Now I want to SELECT all the users that belong to a company that has more than one user. I have this combined query, but I think it is not efficient:
SELECT * FROM `users` WHERE `company_id` = (
SELECT
`c`.`id`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
)
Basically I take the id returned from the first query (companies that have more than 1 user) and then query the users table to find all users with that company.
Why not
SELECT * FROM users u GROUP BY u.company_id HAVING COUNT(u.id) > 1
You don't really need any information from the companies table according to the data you say needs returning. "Now I want to SELECT all the users that belong to a company that has more than one user."
try this:
SELECT u.id,u.name,u.company_id FROM users u
inner join companies c on u.company_id = c.id
group by c.id
having count(u.id) > 1
Simplest way to get the users only is probably to keep the subquery but eliminate the join; since it's not a correlated subquery, it should be fairly efficient (obviously an index on company_id helps here);
SELECT u.* FROM USERS u WHERE company_id IN (
SELECT company_id FROM USERS GROUP BY company_id HAVING COUNT(*)>1
);
You could for example rewrite it as a LEFT JOIN, but I suspect it will actually be less efficient since you'd most likely need to use a DISTINCT when using a JOIN;
SELECT DISTINCT u.*
FROM USERS u
LEFT JOIN USERS u2
ON u.company_id=u2.company_id AND u.id<>u2.id
WHERE u2.id IS NOT NULL;
An SQLfiddle to test both.
Try also a semi-join query:
SELECT *
FROM users u
WHERE EXISTS (
SELECT null FROM users u1
WHERE u.company_id=u1.company_id
AND u.id <> u1.id
)
demo --> http://www.sqlfiddle.com/#!2/12dc34/2
Assuming that id is a primary key column, creating an index on the company_id column gives better performance.
If you are really obsessed with the performance of this query, create a composite index on columns company_id + id:
CREATE INDEX very_fast ON users( company_id, id );
Could you try this?
SELECT users.*
FROM users INNER JOIN
(
SELECT company_id
FROM users
GROUP BY company_id
HAVING COUNT(*) > 1
) x USING(company_id);
You should have an index INDEX(company_id)
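The three query shapes suggested in the answers (derived table with INNER JOIN, LEFT JOIN with IS NOT NULL, and EXISTS) can be compared on the sample USERS data from the question. This is a sketch on Python's sqlite3 module, not MySQL, so it only checks that the results agree and says nothing about relative speed; the dictionary keys and the index name are made up.

```python
import sqlite3

QUERIES = {
    "derived table + INNER JOIN": """
        SELECT u.id
          FROM users AS u
          JOIN (SELECT company_id
                  FROM users
                 GROUP BY company_id
                HAVING COUNT(*) > 1) AS x
            ON x.company_id = u.company_id
    """,
    "LEFT JOIN + IS NOT NULL": """
        SELECT DISTINCT u.id
          FROM users AS u
          LEFT JOIN users AS u2
            ON u.company_id = u2.company_id AND u.id <> u2.id
         WHERE u2.id IS NOT NULL
    """,
    "EXISTS": """
        SELECT u.id
          FROM users AS u
         WHERE EXISTS (SELECT 1
                         FROM users AS u1
                        WHERE u1.company_id = u.company_id
                          AND u1.id <> u.id)
    """,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER)"
)
# Composite index as suggested in the answers (hypothetical index name).
conn.execute("CREATE INDEX idx_users_company ON users (company_id, id)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", 1), (2, "Jane", 1), (3, "Jack", 2), (4, "Jill", 3)],
)

results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in QUERIES.items()
}
for name, ids in results.items():
    print(name, ids)
```

All three forms should return John and Jane (ids 1 and 2), the only users whose company has a second user.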
Performance Test
I have tested 3 queries in answers.
Q1 = sub-query (with GROUP BY) and INNER JOIN
Q2 = LEFT JOIN and IS NOT NULL
Q3 = EXISTS
All queries return the same result. The test was run against the TPC-H lineitem table, and the problem is "find line items that belong to an order with more than one line item".
Test Results
The outcome depends on whether you want to retrieve the first N rows or all rows.
Q1 (get FIRST 10K rows) : 2.85 sec
Q2 (get FIRST 10K rows) : 0.03 sec
Q3 (get FIRST 10K rows) : 0.03 sec
Q1 (get all rows) : 8.19 sec
Q2 (get all rows) : 34.12 sec
Q3 (get all rows) : 29.54 sec
Schema and DATA
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM lineitem\G
*************************** 1. row ***************************
COUNT(*): 11997996
1 row in set (1.68 sec)
mysql> SHOW CREATE TABLE lineitem\G
*************************** 1. row ***************************
Table: lineitem
Create Table: CREATE TABLE `lineitem` (
`l_orderkey` int(11) NOT NULL,
`l_partkey` int(11) NOT NULL,
`l_suppkey` int(11) NOT NULL,
`l_linenumber` int(11) NOT NULL,
`l_quantity` decimal(15,2) NOT NULL,
`l_extendedprice` decimal(15,2) NOT NULL,
`l_discount` decimal(15,2) NOT NULL,
`l_tax` decimal(15,2) NOT NULL,
`l_returnflag` char(1) NOT NULL,
`l_linestatus` char(1) NOT NULL,
`l_shipDATE` date NOT NULL,
`l_commitDATE` date NOT NULL,
`l_receiptDATE` date NOT NULL,
`l_shipinstruct` char(25) NOT NULL,
`l_shipmode` char(10) NOT NULL,
`l_comment` varchar(44) NOT NULL,
PRIMARY KEY (`l_orderkey`,`l_linenumber`),
KEY `l_orderkey` (`l_orderkey`),
KEY `l_partkey` (`l_partkey`,`l_suppkey`),
CONSTRAINT `lineitem_ibfk_1` FOREIGN KEY (`l_orderkey`) REFERENCES `orders` (`o_orderkey`),
CONSTRAINT `lineitem_ibfk_2` FOREIGN KEY (`l_partkey`, `l_suppkey`) REFERENCES `partsupp` (`ps_partkey`, `ps_suppkey`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
Queries
Q1 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey)
LIMIT 10000;
Q2 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL
LIMIT 10000;
Q3 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
)
LIMIT 10000;
retrieve entire rows
Q1 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey);
Q2 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL;
Q3 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
);