I'm sure somebody has already asked this question somewhere, but I can't seem to find it.
Is it possible in MySQL to use SUM or GROUP_CONCAT (aggregate functions) combined with DISTINCT?
Example: I have an order product which can have many options and many beneficiaries. How is it possible in one query (without using a subquery) to get the list of options, the sum of the option prices, and the list of beneficiaries?
I've constructed a sample data set:
CREATE TABLE `order`
(id INT NOT NULL PRIMARY KEY);
CREATE TABLE order_product
(id INT NOT NULL PRIMARY KEY
,order_id INT NOT NULL
);
CREATE TABLE order_product_options
(id INT NOT NULL PRIMARY KEY
,title VARCHAR(20) NOT NULL
,price INT NOT NULL
,order_product_id INT NOT NULL
);
CREATE TABLE order_product_beneficiary
(id INT NOT NULL PRIMARY KEY
,name VARCHAR(20) NOT NULL
,order_product_id INT NOT NULL
);
INSERT INTO `order` (`id`) VALUES (1);
INSERT INTO `order_product` (`id`, `order_id`) VALUES (1, 1);
INSERT INTO `order_product_options` (`id`, `title`, `price`, `order_product_id`)
VALUES (1,'option1', 1, 1), (2, 'option2', 2, 1), (3, 'option3', 3, 1), (4, 'option3', 3, 1);
INSERT INTO `order_product_beneficiary` (`id`, `name`, `order_product_id`)
VALUES (1,'mark', 1), (2, 'jack', 1), (3, 'jack', 1);
http://sqlfiddle.com/#!9/37e383/2
The result I would like to have is
id: 1
options: option1, option2, option3, option3
options price: 9
beneficiaries: mark, jack, jack
Is this possible in MySQL without using subqueries? (I know it is possible in Oracle.)
If it's possible, how would you do it?
Thanks
Based on your description, I think you just want DISTINCT in the GROUP_CONCAT(). However, that won't work because of the duplicates (as explained in a comment but not the question).
One solution is to include the ids in the results:
SELECT op.id,
GROUP_CONCAT(DISTINCT opo.title, '(', opo.id, ')' SEPARATOR ', ') AS options,
SUM(opo.price) AS options_price,
GROUP_CONCAT(DISTINCT opb.name, '(', opb.id, ')' SEPARATOR ', ') AS 'beneficiaries'
FROM order_product op INNER JOIN
order_product_options opo
ON opo.order_product_id = op.id INNER JOIN
order_product_beneficiary opb
ON opb.order_product_id = op.id
GROUP BY op.id;
This is not exactly your results, but it might suffice.
EDIT:
Oh, I see. You are joining along two different dimensions and getting a Cartesian product. The solution is to aggregate before joining:
SELECT op.id, opo.options, opo.options_price,
opb.beneficiaries
FROM order_product op INNER JOIN
(SELECT opo.order_product_id,
GROUP_CONCAT(opo.title SEPARATOR ', ') AS options,
SUM(opo.price) AS options_price
FROM order_product_options opo
GROUP BY opo.order_product_id
) opo
ON opo.order_product_id = op.id INNER JOIN
(SELECT opb.order_product_id,
GROUP_CONCAT(opb.name SEPARATOR ', ') AS beneficiaries
FROM order_product_beneficiary opb
GROUP BY opb.order_product_id
) opb
ON opb.order_product_id = op.id;
Here is the SQL Fiddle.
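To see the Cartesian product in numbers, here is a minimal sketch that loads the sample data from the question and compares the direct double join with the pre-aggregated version. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the row multiplication behaves the same way in both engines.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the join fan-out is identical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_product (id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL);
CREATE TABLE order_product_options (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL,
    price INTEGER NOT NULL, order_product_id INTEGER NOT NULL);
CREATE TABLE order_product_beneficiary (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    order_product_id INTEGER NOT NULL);
INSERT INTO order_product VALUES (1, 1);
INSERT INTO order_product_options VALUES
    (1, 'option1', 1, 1), (2, 'option2', 2, 1),
    (3, 'option3', 3, 1), (4, 'option3', 3, 1);
INSERT INTO order_product_beneficiary VALUES
    (1, 'mark', 1), (2, 'jack', 1), (3, 'jack', 1);
""")

# Joining both child tables directly: 4 options x 3 beneficiaries = 12 rows,
# so every price is counted three times.
naive_rows, naive_sum = conn.execute("""
    SELECT COUNT(*), SUM(opo.price)
    FROM order_product op
    JOIN order_product_options opo ON opo.order_product_id = op.id
    JOIN order_product_beneficiary opb ON opb.order_product_id = op.id
    GROUP BY op.id
""").fetchone()

# Aggregating each child table first keeps one row per order product.
fixed_sum, beneficiary_count = conn.execute("""
    SELECT opo.options_price, opb.n
    FROM order_product op
    JOIN (SELECT order_product_id, SUM(price) AS options_price
          FROM order_product_options GROUP BY order_product_id) opo
      ON opo.order_product_id = op.id
    JOIN (SELECT order_product_id, COUNT(*) AS n
          FROM order_product_beneficiary GROUP BY order_product_id) opb
      ON opb.order_product_id = op.id
""").fetchone()

print(naive_rows, naive_sum)          # 12 27
print(fixed_sum, beneficiary_count)   # 9 3
```

The inflated sum (27 instead of 9) is why DISTINCT alone cannot rescue the single-level join: the duplicate 'option3' rows are legitimate data, while the extra copies produced by the join are not.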
Something like this: do the price summation in an inner query, then join the result with the order_product table.
SELECT
op.id,
MAX(opo.title) AS 'options',
MAX(opo.price) AS 'options price',
GROUP_CONCAT(opb.name SEPARATOR ', ') AS 'beneficiaries'
FROM
order_product op
INNER JOIN (
SELECT order_product_id, SUM(price) price, GROUP_CONCAT(title SEPARATOR ', ') title
FROM order_product_options
GROUP BY order_product_id
) opo ON opo.order_product_id = op.id
INNER JOIN order_product_beneficiary opb ON opb.order_product_id = op.id
GROUP BY op.id
Demo
Below are the tables and the SQL query. I am doing a left join and trying to get SUM of a column that's in the left table and count from the right table.
Is it possible to get both in 1 query?
https://www.db-fiddle.com/f/3QuxG1DLgWJ8aGXNbnnwU1/1
select
s.test,
count(distinct s.name),
sum(s.score) score, -- need accurate score
count(a.id) attempts -- need accurate attempt count
from question s
left join attempts a on s.id = a.score_id
group by s.test
create table question (
id int auto_increment primary key,
test varchar(25),
name varchar(25),
score int
);
create table attempts (
id int auto_increment primary key,
score_id int,
attempt_no int
);
insert into question (test, name, score) values
('test1','name1', 10),
('test1','name2', 15),
('test1','name3', 20),
('test1','name4', 25),
('test2','name1', 15),
('test2','name2', 25),
('test2','name3', 30),
('test2','name4', 20);
insert into attempts (score_id, attempt_no) values
(1, 1),
(1, 2),
(1, 3),
(1, 4),
(2, 1),
(2, 2),
(2, 3),
(2, 4);
You need to pre-aggregate before the join:
select q.test, count(distinct q.name),
sum(q.score) score, -- need accurate score
sum(a.num_attempts) attempts -- need accurate attempt count
from question q left join
(select a.score_id, count(*) as num_attempts
from attempts a
group by a.score_id
) a
on q.id = a.score_id
group by q.test;
Here is a db-fiddle.
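As a quick check of the pre-aggregation idea, the sketch below runs both the original query and the pre-aggregated one against the sample data, using Python's sqlite3 module in place of MySQL (an assumption for portability). The COALESCE around the attempt count is an addition of mine so that tests without attempts report 0 rather than NULL.

```python
import sqlite3

# SQLite is used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (
    id INTEGER PRIMARY KEY AUTOINCREMENT, test TEXT, name TEXT, score INTEGER);
CREATE TABLE attempts (
    id INTEGER PRIMARY KEY AUTOINCREMENT, score_id INTEGER, attempt_no INTEGER);
INSERT INTO question (test, name, score) VALUES
    ('test1','name1',10), ('test1','name2',15),
    ('test1','name3',20), ('test1','name4',25),
    ('test2','name1',15), ('test2','name2',25),
    ('test2','name3',30), ('test2','name4',20);
INSERT INTO attempts (score_id, attempt_no) VALUES
    (1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4);
""")

# Original query: each question row is repeated once per attempt,
# so its score is added several times.
naive = conn.execute("""
    SELECT s.test, SUM(s.score), COUNT(a.id)
    FROM question s
    LEFT JOIN attempts a ON s.id = a.score_id
    GROUP BY s.test ORDER BY s.test
""").fetchall()

# Pre-aggregated: at most one attempts row per question, so scores are not repeated.
pre_aggregated = conn.execute("""
    SELECT q.test, SUM(q.score), COALESCE(SUM(a.num_attempts), 0)
    FROM question q
    LEFT JOIN (SELECT score_id, COUNT(*) AS num_attempts
               FROM attempts GROUP BY score_id) a
      ON q.id = a.score_id
    GROUP BY q.test ORDER BY q.test
""").fetchall()

print(naive)           # [('test1', 145, 8), ('test2', 90, 0)]
print(pre_aggregated)  # [('test1', 70, 8), ('test2', 90, 0)]
```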
As Gordon said above, you can pre-aggregate, but his answer will unfortunately get you the incorrect number of attempts. This is due to an issue with how you're structuring your DB schema. It looks like your question table really records scores of attempts at questions, and your attempts table is unnecessary. You should really have a question table that simply contains an ID and a name for the question, and an attempts table that contains an attempt ID, question ID, name, and score.
create table question (
id int auto_increment primary key,
test varchar(25)
);
create table attempts (
id int auto_increment primary key,
question_id int,
name varchar(25),
score int
);
Then your query becomes as simple as:
select
q.id as question_id,
count(distinct a.name) as attempters,
sum(a.score) as total_score,
count(a.id) as total_attempts
from question q join attempts a on q.id = a.question_id
group by q.id
Currently I have two tables.
Customers:
id  name  status
1   adam  1
2   bob   1
3   cain  2
Orders:
customer_id  item
1            apple
1            banana
1            bonbon
2            carrot
3            egg
I'm trying to do an INNER JOIN first then use the resulting table to query against.
So a user can type in a partial name or partial item and get all the names and items.
For example, if a user types in "b", it would return:
customer_id  name  status  items
1            adam  1       apple/banana/bonbon
2            bob   1       carrot
What I am currently doing is:
SELECT * FROM(
SELECT customers.* , GROUP_CONCAT(orders.item SEPARATOR '|') as items
FROM customers
LEFT JOIN orders
ON customers.id = orders.customer_id
group by customers.id
) as t
WHERE t.status = 1 AND ( t.name LIKE "%b%" OR t.items LIKE "%b%")
Which does work, but it is incredibly slow (+2 seconds).
The strange part though is if I run the queries individually the subquery executes in .0004 seconds and the outer query executes in .006 seconds.
But for some reason combining them increases the wait time a lot.
Is there a more efficient way to do this?
CREATE TABLE IF NOT EXISTS `customers` (
`id` int(6),
`name` varchar(255) ,
`status` int(6),
PRIMARY KEY (`id`,`name`,`status`)
);
INSERT INTO `customers` (`id`, `name` , `status`) VALUES
('1', 'Adam' , 1),
('2', 'bob' , 1),
('3', 'cain' , 2);
CREATE TABLE IF NOT EXISTS `orders` (
`customer_id` int(6),
`item` varchar(255) ,
PRIMARY KEY (`customer_id`,`item`)
);
INSERT INTO `orders` (`customer_id`, `item`) VALUES
('1', 'apple'),
('1', 'banana'),
('1', 'bonbon'),
('2', 'carrot'),
('3', 'egg');
According to the query, you are trying to perform a full-text search on the fields name and item. I would suggest adding full-text indexes to them using ngram tokenisation as you are looking up by part of a word:
ALTER TABLE customers ADD FULLTEXT INDEX ft_idx_name (name) WITH PARSER ngram;
ALTER TABLE orders ADD FULLTEXT INDEX ft_idx_item (item) WITH PARSER ngram;
In this case, your query would look as follows:
SELECT
customers.*, GROUP_CONCAT(orders.item SEPARATOR '|')
FROM
customers
LEFT JOIN orders on customers.id = orders.customer_id
WHERE
orders.customer_id IS NOT NULL
AND customers.status = 1
AND (MATCH(customers.name) AGAINST('bo')
OR MATCH(orders.item) AGAINST('bo'))
GROUP BY
customers.id
If needed, you could modify ngram_token_size MySQL system variable as its value is 2 by default, which means two or more characters should be input to perform the search.
Another approach is to implement it by means of a dedicated search engine, e.g. Elasticsearch, when requirements evolve.
SELECT customers.*, GROUP_CONCAT(orders.item SEPARATOR '|') AS items
FROM customers
LEFT JOIN orders
ON customers.id = orders.customer_id AND customers.name LIKE "%adam" AND orders.item LIKE "%b"
GROUP BY customers.id
Filtering the records in the join condition, as the LEFT JOIN starts, should be faster.
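One thing worth checking before relying on this: a condition placed in the ON clause of a LEFT JOIN only restricts which orders rows get attached, not which customers rows come back. The sketch below illustrates the difference on the sample data, using Python's sqlite3 module as a stand-in for MySQL; the pattern 'b%' is chosen purely for illustration and is not taken from the query above.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); LEFT JOIN semantics are the same.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER, name TEXT, status INTEGER, PRIMARY KEY (id, name, status));
CREATE TABLE orders (
    customer_id INTEGER, item TEXT, PRIMARY KEY (customer_id, item));
INSERT INTO customers VALUES (1, 'Adam', 1), (2, 'bob', 1), (3, 'cain', 2);
INSERT INTO orders VALUES
    (1, 'apple'), (1, 'banana'), (1, 'bonbon'), (2, 'carrot'), (3, 'egg');
""")

# Filter in ON: every customer is kept; non-matching orders are simply not joined.
on_filter = conn.execute("""
    SELECT c.id, COUNT(o.item)
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id AND o.item LIKE 'b%'
    GROUP BY c.id ORDER BY c.id
""").fetchall()

# Filter in WHERE: customers without a matching order disappear from the result.
where_filter = conn.execute("""
    SELECT c.id, COUNT(o.item)
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    WHERE o.item LIKE 'b%'
    GROUP BY c.id ORDER BY c.id
""").fetchall()

print(on_filter)     # [(1, 2), (2, 0), (3, 0)]
print(where_filter)  # [(1, 2)]
```

So moving the search terms into ON may change the result set as well as the timing, and it is probably best to compare both the output and the execution plan against the original query.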
I am trying to limit returned results of users to results that are "recent" but where users have a parent, I also need to return the parent.
CREATE TABLE `users` (
`id` int(0) NOT NULL,
`parent_id` int(0) NULL,
`name` varchar(255) NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `times` (
`id` int(11) NOT NULL,
`time` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (1, NULL, 'Alan');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (2, 1, 'John');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (3, NULL, 'Jerry');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (4, NULL, 'Bill');
INSERT INTO `users`(`id`, `parent_id`, `name`) VALUES (5, 1, 'Carl');
INSERT INTO `times`(`id`, `time`) VALUES (2, '2019-01-01 14:40:38');
INSERT INTO `times`(`id`, `time`) VALUES (4, '2019-01-01 14:40:38');
http://sqlfiddle.com/#!9/91db19
In this case I would want to return Alan, John and Bill, but not Jerry because Jerry doesn't have a record in the times table, nor is he a parent of someone with a record. I am on the fence about what to do with Carl, I don't mind getting the results for him, but I don't need them.
I am filtering tens of thousands of users with hundreds of thousands of times records, so performance is important. In general I have about 3000 unique id's coming from times that could be either an id, or a parent_id.
The above is a stripped down example of what I am trying to do, the full one includes more joins and case statements, but in general the above example should be what we work with, but here is a sample of the query I am using (full query is nearly 100 lines):
SELECT id AS reference_id,
CASE WHEN (id != parent_id)
THEN
parent_id
ELSE null END AS parent_id,
parent_id AS family_id,
Rtrim(last_name) AS last_name,
Rtrim(first_name) AS first_name,
Rtrim(email) AS email,
missedappt AS appointment_missed,
appttotal AS appointment_total,
To_char(birth_date, 'YYYY-MM-DD 00:00:00') AS birthday,
To_char(first_visit_date, 'YYYY-MM-DD 00:00:00') AS first_visit,
billing_0_30
FROM users AS p
RIGHT JOIN(
SELECT p.id,
s.parentid,
Count(p.id) AS appttotal,
missedappt,
billing0to30 AS billing_0_30
FROM times AS p
JOIN (SELECT missedappt, parent_id, id
FROM users) AS s
ON p.id = s.id
LEFT JOIN (SELECT parent_id, billing0to30
FROM aging) AS aging
ON aging.parent_id = p.id
WHERE p.apptdate > To_char(Timestampadd(sql_tsi_year, -1, Now()), 'YYYY-MM-DD')
GROUP BY p.id,
s.parent_id,
missedappt,
billing0to30
) AS recent ON recent.patid = p.patient_id
This example is for a Faircom C-Tree database, but I also need to implement a similar solution in Sybase, MySql, and Pervasive, so just trying to understand what I should do for best performance.
Essentially what I need to do is somehow get the RIGHT JOIN to also include the users parent.
NOTES:
based on your fiddle config I'm assuming you're using MySQL 5.6 and thus don't have support for Common Table Expressions (CTE)
I'm assuming each name (child or parent) is to be presented as separate records in the final result set
We want to limit the number of times we have to join the times and users tables (a CTE would make this a bit easier to code/read).
The main query (times -> users(u1) -> users(u2)) will give us child and parent names in separate columns, so we'll use a 2-row dynamic table plus a case statement to pivot the columns into their own rows. (NOTE: I don't work with MySQL and didn't have time to research whether there's a pivot capability in MySQL 5.6.)
-- we'll let 'distinct' filter out any duplicates (eg, 2 'children' have same 'parent')
select distinct
final.name
from
-- cartesian product of 'allnames' and 'pass' will give us
-- duplicate lines of id/parent_id/child_name/parent_name so
-- we'll use a 'case' statement to determine which name to display
(select case when pass.pass_no = 1
then allnames.child_name
else allnames.parent_name
end as name
from
-- times join users left join users; gives us pairs of
-- child_name/parent_name or child_name/NULL
(select u1.id,u1.parent_id,u1.name as child_name,u2.name as parent_name
from times t
join users u1
on u1.id = t.id
left
join users u2
on u2.id = u1.parent_id) allnames
join
-- poor man's pivot code:
-- 2-row dynamic table; no join clause w/ allnames will give us a
-- cartesian product; the 'case' statement will determine which
-- name (child vs parent) to display
(select 1 as pass_no
union
select 2) pass
) final
-- eliminate 'NULL' as a name in our final result set
where final.name is not NULL
order by 1
Result set:
name
==============
Alan
Bill
John
MySQL fiddle
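For comparison, here is a different technique from the pivot above: a plain UNION of "users with a times record" and "parents of those users". This is only a sketch, run with Python's sqlite3 module instead of MySQL (an assumption). It reads more simply but joins times and users twice, which is exactly the cost the pivot approach tries to avoid, so it would need measuring on real data volumes.

```python
import sqlite3

# SQLite stands in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
CREATE TABLE times (id INTEGER PRIMARY KEY, time TEXT);
INSERT INTO users VALUES
    (1, NULL, 'Alan'), (2, 1, 'John'), (3, NULL, 'Jerry'),
    (4, NULL, 'Bill'), (5, 1, 'Carl');
INSERT INTO times VALUES
    (2, '2019-01-01 14:40:38'), (4, '2019-01-01 14:40:38');
""")

names = [row[0] for row in conn.execute("""
    -- users that have a times record themselves
    SELECT u.name
    FROM times t
    JOIN users u ON u.id = t.id
    UNION
    -- parents of those users (UNION also removes duplicate parents)
    SELECT p.name
    FROM times t
    JOIN users u ON u.id = t.id
    JOIN users p ON p.id = u.parent_id
    ORDER BY 1
""")]

print(names)  # ['Alan', 'Bill', 'John']
```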
I need to export my mysql db to a csv file. Where I'm going to be using it can't have related tables, so I need to concat related records into a single field. Is this possible to do? For example, assuming this table structure:
Items: id as INT, name as VARCHAR
ItemIdentifiers: id as INT, item_id as INT, identifier_id as INT
Identifiers: id as INT, identifier as VARCHAR
ItemColors: id as INT, item_id as INT, color_id as INT
Colors: id as INT, color as VARCHAR
and assuming this data:
Items: (1, 'some name')
ItemIdentifiers: (1, 1, 1), (2, 1, 2)
Identifiers: (1, 'ident1'), (2, 'ident2')
ItemColors: (1, 1, 1), (2, 1, 2)
Colors: (1, 'blue'), (2, 'green')
How would I get this:
'some name', 'ident1 ident2', 'blue green'
That's just a basic example, but I hope that conveys what I'm trying to do.
You can use group_concat function in combination with SELECT ... INTO
SELECT DISTINCT
items.name AS `Name`,
GROUP_CONCAT(DISTINCT identifiers.identifier
ORDER BY identifiers.identifier
SEPARATOR ' ') AS `Identifiers`,
GROUP_CONCAT(DISTINCT colors.color
ORDER BY colors.color
SEPARATOR ' ') AS `colors`
INTO OUTFILE '/tmp/data.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM items
JOIN itemidentifiers
ON ( itemidentifiers.item_id = items.id )
JOIN identifiers
ON ( itemidentifiers.identifier_id = identifiers.id )
JOIN itemcolors
ON ( itemcolors.item_id = items.id )
JOIN colors
ON ( colors.id = itemcolors.color_id )
GROUP BY items.id
You might notice there are a lot of JOINs. This is because you are using relational (junction) tables: each one requires an additional JOIN.
Note: The above query is experimental. I haven't tested it yet.
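If writing a file on the database server is not an option, the same flattening can be done client-side. The following is a rough sketch under stated assumptions: it uses Python's sqlite3 and csv modules rather than MySQL, the lowercase table names are mine, and because SQLite's GROUP_CONCAT(DISTINCT ...) accepts neither a custom separator nor ORDER BY, the sorting and space-joining happen in a small hypothetical helper, tidy().

```python
import csv
import io
import sqlite3

# SQLite stands in for MySQL (assumption); table names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE itemidentifiers (id INTEGER PRIMARY KEY, item_id INTEGER, identifier_id INTEGER);
CREATE TABLE identifiers (id INTEGER PRIMARY KEY, identifier TEXT);
CREATE TABLE itemcolors (id INTEGER PRIMARY KEY, item_id INTEGER, color_id INTEGER);
CREATE TABLE colors (id INTEGER PRIMARY KEY, color TEXT);
INSERT INTO items VALUES (1, 'some name');
INSERT INTO itemidentifiers VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO identifiers VALUES (1, 'ident1'), (2, 'ident2');
INSERT INTO itemcolors VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO colors VALUES (1, 'blue'), (2, 'green');
""")

# DISTINCT removes the repeats created by joining two many-to-many relations.
rows = conn.execute("""
    SELECT items.name,
           GROUP_CONCAT(DISTINCT identifiers.identifier),
           GROUP_CONCAT(DISTINCT colors.color)
    FROM items
    JOIN itemidentifiers ON itemidentifiers.item_id = items.id
    JOIN identifiers ON identifiers.id = itemidentifiers.identifier_id
    JOIN itemcolors ON itemcolors.item_id = items.id
    JOIN colors ON colors.id = itemcolors.color_id
    GROUP BY items.id
""").fetchall()


def tidy(joined):
    """Hypothetical helper: sort the comma-joined values and join with spaces."""
    return " ".join(sorted(joined.split(",")))


buffer = io.StringIO()  # swap for open("data.csv", "w", newline="") to write a file
writer = csv.writer(buffer, lineterminator="\n")
for name, idents, item_colors in rows:
    writer.writerow([name, tidy(idents), tidy(item_colors)])

csv_text = buffer.getvalue()
print(csv_text)  # some name,ident1 ident2,blue green
```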
I'm trying to create a SQL query that will detect (possible) duplicate customers in my database:
I have two tables:
Customer with the columns: cid, firstname, lastname, zip. Note that cid is the unique customer id and primary key for this table.
IgnoreForDuplicateCustomer with the columns: cid1, cid2. Both columns are foreign keys that reference Customer(cid). This table is used to state that the customer with cid1 is not the same as the customer with cid2.
So for example, if i have
a Customer entry with cid = 1, firstname="foo", lastname="anonymous" and zip="11231"
and another Customer entry with cid=2, firstname="foo", lastname="anonymous" and zip="11231".
So my SQL query should search for customers that have the same firstname, lastname and zip, and then detect that the customer with cid = 1 is the same as the customer with cid = 2.
However, it should be possible to say that customers cid = 1 and cid = 2 are not the same by storing a new entry in the IgnoreForDuplicateCustomer table with cid1 = 1 and cid2 = 2.
Detecting the duplicate customers works well with this SQL query:
SELECT cid, firstname, lastname, zip, COUNT(*) AS NumOccurrences
FROM Customer
GROUP BY firstname, lastname, zip
HAVING ( COUNT(*) > 1 )
My problem is that I am not able to integrate the IgnoreForDuplicateCustomer table so that,
as in my previous example, the customers with cid = 1 and cid = 2 will not be marked / queried as the same, since there is an entry/rule in the IgnoreForDuplicateCustomer table.
So I tried to extend my previous query by adding a WHERE clause:
SELECT cid, firstname, lastname, COUNT(*) AS NumOccurrences
FROM Customer
WHERE cid NOT IN (
SELECT cid1 FROM IgnoreForDuplicateCustomer WHERE cid2=cid
UNION
SELECT cid2 FROM IgnoreForDuplicateCustomer WHERE cid1=cid
)
GROUP BY firstname, lastname, zip
HAVING ( COUNT(*) > 1 )
Unfortunately this additional WHERE clause has absolutely no impact on my result.
Any suggestions?
Here you are:
Select a.*
From (
select c1.cid 'CID1', c2.cid 'CID2'
from Customer c1
join Customer c2 on c1.firstname=c2.firstname
and c1.lastname=c2.lastname and c1.zip=c2.zip
and c1.cid < c2.cid) a
Left Join (
Select cid1 'CID1', cid2 'CID2'
From ignoreforduplicatecustomer one
Union
Select cid2 'CID1', cid1 'CID2'
From ignoreforduplicatecustomer two) b on a.cid1 = b.cid1 and a.cid2 = b.cid2
where b.cid1 is null
This will get you the IDs of duplicate records from customer table, which are not in table ignoreforduplicatecustomer.
Tested with:
CREATE TABLE IF NOT EXISTS `customer` (
`CID` int(11) NOT NULL AUTO_INCREMENT,
`Firstname` varchar(50) NOT NULL,
`Lastname` varchar(50) NOT NULL,
`ZIP` varchar(10) NOT NULL,
PRIMARY KEY (`CID`))
ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=100 ;
INSERT INTO `customer` (`CID`, `Firstname`, `Lastname`, `ZIP`) VALUES
(1, 'John', 'Smith', '1234'),
(2, 'John', 'Smith', '1234'),
(3, 'John', 'Smith', '1234'),
(4, 'Jane', 'Doe', '1234');
And:
CREATE TABLE IF NOT EXISTS `ignoreforduplicatecustomer` (
`CID1` int(11) NOT NULL,
`CID2` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `ignoreforduplicatecustomer` (`CID1`, `CID2`) VALUES
(1, 2);
Results for my test setup are:
CID1 CID2
1 3
2 3
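The same test setup can be reproduced in a few lines. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption) and expresses the exclusion with NOT EXISTS instead of the LEFT JOIN ... IS NULL form above; it is a variation on the technique, not the exact query from this answer. Checking both column orders means an ignore entry stored as (2, 1) would also be honoured.

```python
import sqlite3

# SQLite stands in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, zip TEXT);
CREATE TABLE ignoreforduplicatecustomer (cid1 INTEGER, cid2 INTEGER);
INSERT INTO customer VALUES
    (1, 'John', 'Smith', '1234'), (2, 'John', 'Smith', '1234'),
    (3, 'John', 'Smith', '1234'), (4, 'Jane', 'Doe', '1234');
INSERT INTO ignoreforduplicatecustomer VALUES (1, 2);
""")

pairs = conn.execute("""
    SELECT c1.cid, c2.cid
    FROM customer c1
    JOIN customer c2
      ON c1.firstname = c2.firstname
     AND c1.lastname = c2.lastname
     AND c1.zip = c2.zip
     AND c1.cid < c2.cid          -- report each pair only once
    WHERE NOT EXISTS (            -- skip pairs listed in either order
        SELECT 1
        FROM ignoreforduplicatecustomer i
        WHERE (i.cid1 = c1.cid AND i.cid2 = c2.cid)
           OR (i.cid1 = c2.cid AND i.cid2 = c1.cid))
    ORDER BY c1.cid, c2.cid
""").fetchall()

print(pairs)  # [(1, 3), (2, 3)]
```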
Edit as per TPete's comment (didn't try it):
SELECT
C1.cid, C1.firstname, C1.lastname
FROM
Customer C1,
Customer C2
WHERE
C1.cid < C2.cid AND
C1.firstname = C2.firstname AND
C1.lastname = C2.lastname AND
C1.zip = C2.zip AND
NOT EXISTS
(SELECT 1 FROM IgnoreForDuplicateCustomer I WHERE I.cid1 = C1.cid AND I.cid2 = C2.cid);
Initially I thought that IgnoreForDuplicateCustomer was a field in the customer table.
Crazy, but I think it works :)
First I join the Customer table with itself on the names to get the duplicates.
Then I exclude the keys in the IgnoreForDuplicateCustomer table (the UNION is needed because the first query returns both cid1, cid2 and cid2, cid1).
The results will be duplicated, but I think you can get the info you need.
select c1.cid, c2.cid
from Customer c1
join Customer c2 on c1.firstname=c2.firstname
and c1.lastname=c2.lastname and c1.zip=c2.zip
and c1.cid!=c2.cid
except
(
select cid1,cid2 from IgnoreForDuplicateCustomer
UNION
select cid2,cid1 from IgnoreForDuplicateCustomer
)
second shot:
select firstname,lastname,zip from Customer
group by firstname,lastname,zip
having (count(*)>1)
except
select c1.firstname, c1.lastname, c1.zip
from Customer c1 join IgnoreForDuplicateCustomer IG on c1.cid=ig.cid1 join Customer c2 on ig.cid2=c2.cid
third:
select firstname,lastname,zip from (
select firstname,lastname,zip from Customer
group by firstname,lastname,zip
having (count(*)>1)
) X
where firstname not in (
select c1.firstname
from Customer c1 join IgnoreForDuplicateCustomer IG on c1.cid=ig.cid1 join Customer c2 on ig.cid2=c2.cid
)