MySQL select with group and one to many relations condition - mysql

For example have such structure:
CREATE TABLE clicks
(`date` varchar(50), `sum` int, `id` int)
;
CREATE TABLE marks
(`click_id` int, `name` varchar(50), `value` varchar(50))
;
where click can have many marks
So example data:
INSERT INTO clicks
(`sum`, `id`, `date`)
VALUES
(100, 1, '2017-01-01'),
(200, 2, '2017-01-01')
;
INSERT INTO marks
(`click_id`, `name`, `value`)
VALUES
(1, 'utm_source', 'test_source1'),
(1, 'utm_medium', 'test_medium1'),
(1, 'utm_term', 'test_term1'),
(2, 'utm_source', 'test_source1'),
(2, 'utm_medium', 'test_medium1')
;
I need to get agregated values of click grouped by date which contains all of selected values.
I make request:
select
c.date,
sum(c.sum)
from clicks as c
left join marks as m ON m.click_id = c.id
where
(m.name = 'utm_source' AND m.value='test_source1') OR
(m.name = 'utm_medium' AND m.value='test_medium1') OR
(m.name = 'utm_term' AND m.value='test_term1')
group by date
and get 2017-01-01 = 700, but I want to get 100 which means that only click 1 has all of marks.
Or if condition will be
(m.name = 'utm_source' AND m.value='test_source1') OR
(m.name = 'utm_medium' AND m.value='test_medium1')
I need to get 300 instead of 600
I found answer in getting distinct click_id by first query and then sum and group by date with condition whereIn, but on real database which is very large and has id as uuid this request executes extrimely slow. Any advices how to get it work propely?

You can achieve it using below queries:
When there are the three conditions then you have to pass the HAVING count(*) >= 3
SELECT cc.DATE
,sum(cc.sum)
FROM clicks AS cc
INNER JOIN (
SELECT id
FROM clicks AS c
LEFT JOIN marks AS m ON m.click_id = c.id
WHERE (
m.NAME = 'utm_source'
AND m.value = 'test_source1'
)
OR (
m.NAME = 'utm_medium'
AND m.value = 'test_medium1'
)
OR (
m.NAME = 'utm_term'
AND m.value = 'test_term1'
)
GROUP BY id
HAVING count(*) >= 3
) AS t ON cc.id = t.id
GROUP BY cc.DATE
When there are the three conditions then you have to pass the HAVING count(*) >= 2
SELECT cc.DATE
,sum(cc.sum)
FROM clicks AS cc
INNER JOIN (
SELECT id
FROM clicks AS c
LEFT JOIN marks AS m ON m.click_id = c.id
WHERE (
m.NAME = 'utm_source'
AND m.value = 'test_source1'
)
OR (
m.NAME = 'utm_medium'
AND m.value = 'test_medium1'
)
GROUP BY id
HAVING count(*) >= 2
) AS t ON cc.id = t.id
GROUP BY cc.DATE
Demo: http://sqlfiddle.com/#!9/fe571a/35
Hope this works for you...

You're getting 700 because the join generates multiple rows for the different IDs. There are 3 rows in the mark table with ID=1 and sum=100 and there are two rows with ID=2 and sum=200. On doing the join where shall have 3 rows with sum=100 and 2 rows with sum=200, so adding these sum gives 700. To fix this you have to aggregate on the click_id too as illustrated below:
select
c.date,
sum(c.sum)
from clicks as c
inner join (select * from marks where (name = 'utm_source' AND
value='test_source1') OR (name = 'utm_medium' AND value='test_medium1')
OR (name = 'utm_term' AND value='test_term1')
group by click_id) as m
ON m.click_id = c.id
group by c.date;
DEMO SQL FIDDLE

I found the right way myself, which works on large amounts of data
The main goal is to make request generate one table with subqueries(conditions) which do not depend on amount of data in results, so the best way is:
select
c.date,
sum(c.sum)
from clicks as c
join marks as m1 ON m1.click_id = c.id
join marks as m2 ON m2.click_id = c.id
join marks as m3 ON m3.click_id = c.id
where
(m1.name = 'utm_source' AND m1.value='test_source1') AND
(m2.name = 'utm_medium' AND m2.value='test_medium1') AND
(m3.name = 'utm_term' AND m3.value='test_term1')
group by date
So we need to make as many joins as many conditions we have

Related

Is there a method of counting an attribute that is in a GROUP BY clause?

I need have created a select statement to list out all the customers that have been to multiple merchants below.
I want to create another statement to display how many of those customers have been to each merchant.
What is the optimal method of approaching this problem?
Lists out all customers that have been to multiple merchants.
WITH valentinesDayMerchant AS (
SELECT m.MerchantId, m.MerchantGroupId, m.WebsiteName
FROM Merchant m
INNER JOIN OpeningHours oh ON m.MerchantId = oh.MerchantId AND oh.DayOfWeek = 'TUE'
LEFT JOIN devices.DeviceConnectionState AS dcs ON dcs.MerchantId = oh.MerchantId
WHERE MerchantStatus = '-' AND (m.PrinterType IN ('V','O') OR dcs.State = 1 OR dcs.StateTransitionDateTime > '2023-01-23')
)
SELECT DISTINCT ul.UserLoginId, ul.FullName, ul.EmailAddress, ul.Mobile
FROM dbo.UserLogin AS ul
INNER JOIN dbo.Patron AS p ON p.UserLoginId = ul.UserLoginId
INNER JOIN valentinesDayMerchant AS m ON (m.MerchantId = ul.ReferringMerchantId OR m.MerchantId IN (SELECT pml.MerchantId FROM dbo.PatronMerchantLink AS pml WHERE pml.PatronId = p.PatronId AND ISNULL(pml.IsBanned, 0) = 0))
LEFT JOIN (
SELECT mg.MerchantGroupId, mg.MerchantGroupName, groupHost.HostName [GroupHostName]
FROM dbo.MerchantGroup AS mg
INNER JOIN dbo.Merchant AS parent ON parent.MerchantId = mg.ParentMerchantId
INNER JOIN dbo.HttpHostName AS groupHost ON groupHost.MerchantID = parent.MerchantId AND groupHost.Priority = 0
) mGroup ON mGroup.MerchantGroupId = m.MerchantGroupId
LEFT JOIN (
SELECT po.PatronId, MAX(po.OrderDateTime) [LastOrder]
FROM dbo.PatronsOrder AS po
GROUP BY po.PatronId
) orders ON orders.PatronId = p.PatronId
INNER JOIN dbo.HttpHostName AS hhn ON hhn.MerchantID = m.MerchantId AND hhn.Priority = 1
WHERE ul.UserLoginId NOT IN (1,2,100,372) AND ul.UserStatus <> 'D' AND (
ISNULL(orders.LastOrder, '2000-01-01') > '2020-01-01' OR ul.RegistrationDate > '2022-01-01'
)
GROUP BY ul.UserLoginId, ul.FullName, ul.EmailAddress, ul.Mobile
HAVING COUNT(m.MerchantId) > 1
Methods I have tried include adding the merchant name to a group by and displaying the count of the customers, however this does not work as I cannot have anything related to the Merchant in the GROUP BY, or I wouldn't be able to use HAVING clause to identify the customers that have been to multiple merchants. I have also tried selecting all the merchants and counting the distinct customers which doesn't work as it takes into account all the customers, not specifically the customers that have been to multiple merchants only.

Mysql Select unique record based on multiple columns and display only group and sum amount

Hi I am trying to query a table that conatains multiple duplicates on Code,Amount and Status How will I do this if I only one to get a result group according to the client_group name and get the sum of amount under that group
SELECT `client`.`client_group`
, FORMAT(SUM(`Data_result`.`Data_result_amount` ),2) as sum
FROM
`qwer`.`Data_result`
INNER JOIN `qwer`.`Data`
ON (`Data_result`.`Data_result_lead` = `Data`.`Data_id`)
INNER JOIN `qwer`.`Data_status`
ON (`Data_result`.`Data_result_status_id` = `Data_status`.`Data_status_id`)
INNER JOIN `qwer`.`client`
ON (`Data`.`Data_client_id` = `client`.`client_id`)
WHERE `Data_status`.`Data_status_name` IN ('PAID') AND MONTH(`Data_result`.`result_ts`) = MONTH(CURRENT_DATE())
AND YEAR(`Data_result`.`result_ts`) = YEAR(CURRENT_DATE())
GROUP BY `client`.`client_group`
Result of said query:
Table
Try to distinct before run the 'sum' check whether this solve your problem
SELECT `client_group` , FORMAT(SUM(`Data_result_amount` ),2) as sum from (
SELECT DISTINCT `client`.`client_group` , `Data_result`.`Data_result_amount`
FROM
`qwer`.`Data_result`
INNER JOIN `qwer`.`Data`
ON (`Data_result`.`Data_result_lead` = `Data`.`Data_id`)
INNER JOIN `qwer`.`Data_status`
ON (`Data_result`.`Data_result_status_id` = `Data_status`.`Data_status_id`)
INNER JOIN `qwer`.`client`
ON (`Data`.`Data_client_id` = `client`.`client_id`)
WHERE `Data_status`.`Data_status_name` IN ('PAID') AND MONTH(`Data_result`.`result_ts`) = MONTH(CURRENT_DATE())
AND YEAR(`Data_result`.`result_ts`) = YEAR(CURRENT_DATE())
) T
GROUP BY `client_group`
you can check the query here http://sqlfiddle.com/#!9/36a3f8/6

How to get nested two values from nested select ? mysql

Query:
SELECT business.bussId,
(select count(invoices.userId) from invoice where invoice.userId = '3000' )
as invoiceCount,
(select SUM(invoices.price) from invoice where invoice.userId = '3000' )
as invoiceprice ,
FROM business WHERE business.bussId=100
How could I get invoice price and invoiceCount using one nested select ?
Move the subquery to the from clause:
SELECT b.bussId, i.invoiceCount, i.invoiceprice
FROM business b cross join
(select count(i.userId) as invoiceCount, SUM(i.price) as invoiceprice
from invoice i
where i.userId = '3000'
) i
WHERE b.bussId = 100;
You can actually write this without the subquery, but your question is specifically about using subqueries.
That form would be:
SELECT b.bussId, count(i.userId) as invoiceCount, SUM(i.price) as invoiceprice
FROM business b left join
invoice i
on i.userId = '3000' and b.bussId = 100
GROUP BY b.bussId;

MySQL search query with multiple joins and subqueries running slow

I have the following query which is actually within a stored procedure, but I removed it as there is too much going on inside the stored procedure. Basically this is the end result which takes ages (more than a minute) to run and I know the reason why - as you will also see from looking at the result of the explain - but I just cannot get it sorted.
Just to quickly explain what this query is doing. It is fetching all products from companies that are "connected" to the company where li.nToObjectID = 37. The result also returns some other information about the other companies like its name, company id, etc.
SELECT DISTINCT
SQL_CALC_FOUND_ROWS
p.id,
p.sTitle,
p.sTeaser,
p.TimeStamp,
p.ExpiryDate,
p.InStoreDate,
p.sCreator,
p.sProductCode,
p.nRetailPrice,
p.nCostPrice,
p.bPublic,
c.id as nCompanyID,
c.sName as sCompany,
m.id as nMID,
m.sFileName as sHighResFileName,
m.nSize,
(
Select sName
FROM tblBrand
WHERE id = p.nBrandID
) as sBrand,
(
Select t.sFileName
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as sFileName,
(
Select t.nWidth
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nWidth,
(
Select t.nHeight
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nHeight,
IF (
(
SELECT COUNT(id) FROM tblLink
WHERE
sType = "company"
AND sStatus = "active"
AND nToObjectID = 37
AND nFromObjectID = u.nCompanyID
),
1,
0
) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
ON (
m.nTypeID = p.id AND
m.sType = "product"
)
INNER JOIN tblUser u
ON u.id = p.nUserID
INNER JOIN tblCompany c
ON u.nCompanyID = c.id
LEFT JOIN tblLink li
ON (
li.sType = "company"
AND li.sStatus = "active"
AND li.nToObjectID = 37
AND li.nFromObjectID = u.nCompanyID
)
WHERE c.bActive = 1
AND p.bArchive = 0
AND p.bActive = 1
AND NOW() <= p.ExpiryDate
AND (
li.id IS NOT NULL
OR (
li.id IS NULL
AND p.bPublic = 1
)
)
ORDER BY p.TimeStamp DESC
LIMIT 0, 52
Click here to see the output for EXPLAIN. Sorry, just couldn't get the formatting correct.
http://i60.tinypic.com/2hdqjgj.png
And lastly the number of rows for all the tables in this query:
tblProducts
Count: 5392
tblBrand
Count: 194
tblCompany
Count: 368
tblUser
Count: 416
tblMedia
Count: 5724
tblLink
Count: 24800
tblThumbnail
Count: 22207
So I have 2 questions:
1. Is there another way of writing this query which might potentially speed it up?
2. What index combination do I need for tblProducts so that not all the rows are searched through?
UPDATE 1
This is the new query after removing the subqueries and making use of left joins instead:
SELECT DISTINCT DISTINCT
SQL_CALC_FOUND_ROWS
p.id,
p.sTitle,
p.sTeaser,
p.TimeStamp,
p.ExpiryDate,
p.InStoreDate,
p.sCreator,
p.sProductCode,
p.nRetailPrice,
p.nCostPrice,
p.bPublic,
c.id as nCompanyID,
c.sName as sCompany,
m.id as nMID,
m.sFileName as sHighResFileName,
m.nSize,
brand.sName as sBrand,
thumb.sFilename,
thumb.nWidth,
thumb.nHeight,
IF (
(
SELECT COUNT(id) FROM tblLink
WHERE
sType = "company"
AND sStatus = "active"
AND nToObjectID = 37
AND nFromObjectID = u.nCompanyID
),
1,
0
) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
ON (
m.nTypeID = p.id AND
m.sType = "product"
)
INNER JOIN tblUser u
ON u.id = p.nUserID
INNER JOIN tblCompany c
ON u.nCompanyID = c.id
LEFT JOIN tblLink li
ON (
li.sType = "company"
AND li.sStatus = "active"
AND li.nToObjectID = 37
AND li.nFromObjectID = u.nCompanyID
)
LEFT JOIN tblBrand AS brand
ON brand.id = p.nBrandID
LEFT JOIN tblThumbnail AS thumb
ON (
thumb.nMediaID = m.id
AND thumb.sType = 'thumbnail'
)
WHERE c.bActive = 1
AND p.bArchive = 0
AND p.bActive = 1
AND NOW() <= p.ExpiryDate
AND (
li.id IS NOT NULL
OR (
li.id IS NULL
AND p.bPublic = 1
)
)
ORDER BY p.TimeStamp DESC
LIMIT 0, 52;
UPDATE 2
ALTER TABLE tblThumbnail ADD INDEX (nMediaID,sType) USING BTREE;
ALTER TABLE tblMedia ADD INDEX (nTypeID,sType) USING BTREE;
ALTER TABLE tblProduct ADD INDEX (bArchive,bActive,ExpiryDate,bPublic,TimeStamp) USING BTREE;
After doing the above changes the explain showed that it is now only searching through 1464 rows on tblProduct instead of 5392.
That's a big query with a lot going on. It's going to take a few steps of work to optimize it. I will take the liberty of just presenting a couple of steps.
First step. Can you get rid of SQL_CALC_FOUND_ROWS and still have your program work correctly? If so, do that. When you specify SQL_CALC_FOUND_ROWS it sometimes means the server has to delay sending you the first row of your resultset until the last row is available.
Second step. Refactor the dependent subqueries to be JOINs instead.
Here's how you might approach that. Part of your query looks like this...
SELECT DISTINCT SQL_CALC_FOUND_ROWS
p.id,
...
c.id as nCompanyID,
...
m.id as nMID,
...
( /* dependent subquery to be removed */
Select sName
FROM tblBrand
WHERE id = p.nBrandID
) as sBrand,
( /* dependent subquery to be removed */
Select t.sFileName
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as sFileName,
( /* dependent subquery to be removed */
Select t.nWidth
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nWidth,
( /* dependent subquery to be removed */
Select t.nHeight
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nHeight,
...
Try this instead. Notice how the brand and thumbnail dependent subqueries disappear. You had three dependent subqueries for the thumbnail; they can disappear into a single JOIN.
SELECT DISTINCT SQL_CALC_FOUND_ROWS
p.id,
...
brand.sName,
thumb.sFilename,
thumb.nWidth,
thumb.nHeight,
...
FROM tblProduct p
INNER JOIN tblMedia AS m ON (m.nTypeID = p.id AND m.sType = 'product')
... (other table joins) ...
LEFT JOIN tblBrand AS brand ON p.id = p.nBrandID
LEFT JOIN tblMedia AS thumb ON (t.nMediaID = m.id AND thumb.sType = 'thumbnail')
I used LEFT JOIN rather than INNER JOIN so MySQL will present NULL values if the joined rows are missing.
Edit
You're using a join pattern that looks like this:
JOIN sometable AS s ON (s.someID = m.id AND s.sType = 'string')
You seem to do this for a few tables. You probably can speed up the JOIN operations by creating compound indexes in those tables. For example, try adding the following index to tblThumbnail: (sType, nMediaID). You can do that with this DDL statement.
ALTER TABLE tblThumbnail ADD INDEX (sType, nMediaID) USING BTREE
You can do similar things to other tables with the same join pattern.

How can I adjust a JOIN clause so that rows that have columns with NULL values are returned in the result?

How can I adjust this JOIN clause so that rows with a NULL value for the CountLocId or CountNatId columns are returned in the result?
In other words, if there is no match in the local_ads table, I still want the user's result from the nat_ads table to be returned -- and vice-versa.
SELECT u.franchise, CountLocId, TotalPrice, CountNatId, TotalNMoney, (
TotalPrice + TotalNMoney
)TotalRev
FROM users u
LEFT JOIN local_rev lr ON u.user_id = lr.user_id
LEFT JOIN (
SELECT lrr_id, COUNT( lad_id ) CountLocId, SUM( price ) TotalPrice
FROM local_ads
GROUP BY lrr_id
)la ON lr.lrr_id = la.lrr_id
LEFT JOIN nat_rev nr ON u.user_id = nr.user_id
INNER JOIN (
SELECT nrr_id, COUNT( nad_id ) CountNatId, SUM( tmoney ) TotalNMoney
FROM nat_ads
WHERE MONTH = 'April'
GROUP BY nrr_id
)na ON nr.nrr_id = na.nrr_id
WHERE lr.month = 'April'
AND franchise != 'Corporate'
ORDER BY franchise
Thanks in advance for your help!
try the following in where clause while making a left join. This will take all rows from right table with matched condition
eg.
LEFT JOIN local_rev lr ON (u.user_id = lr.user_id) or (u.user_id IS NULL)
Use this template, as it ensures that :
you have only one record per user_id (notice all subquerys have a GROUP BY user_id) so for one record on user table you have one (or none) record on subquery
independent joins (and calculated data) are not messed togeder
-
SELECT u.franchise, one.CountLocId, one.TotalPrice, two.CountNatId, two.TotalNMoney, (COALESCE(one.TotalPrice,0) + COALESCE(two.TotalNMoney,0)) TotalRev
FROM users u
LEFT JOIN (
SELECT x.user_id, sum(xORy.whatever) as TotalPrice, count(xORy.whatever) as CountLocId
FROM x -- where x is local_rev or local_ads I dont know
LEFT JOIN y on x.... = y.... -- where y is local_rev or local_ads I dont know
GROUP BY x.user_id
) as one on u.user_id = one.user_id
LEFT JOIN (
SELECT x.user_id, sum(xORy.whatever) as TotalNMoney, count(xORy.whatever) as CountNatId
FROM x -- where x is nat_rev or nat_ads I dont know
LEFT JOIN y on x.... = y.... -- where y is nat_rev or nat_ads I dont know
GROUP BY x.user_id
) as two on u.user_id = two.user_id