Count on multiple tables with missing zero counts


I am running this query to return rows where the count is < 50. It works fine while the count is between 1 and 49, but when the count becomes 0 the row is not returned. The count is based on `coupons`.`status`: when the count is zero, there are no rows in the coupons table with status 1, so the whole row is omitted.
SELECT count(*) AS count, clients.title, plans.name
FROM `coupons`
INNER JOIN `clients` ON `coupons`.`client_id` = `clients`.`id`
INNER JOIN `plans` ON `coupons`.`plan_id` = `plans`.`id`
WHERE `coupons`.`status` = 1
GROUP BY `coupons`.`client_id`, `coupons`.`plan_id`
HAVING count < 50
How can I fix this?
Table definitions:
coupons (id, client_id, plan_id, customer_id, status, code)
plans (id, name)
clients (id, name...)
client_plans (id, client_id, plan_id)
Basically, a client can have multiple plans and a plan can belong to multiple clients.
The coupons table stores predefined coupons that can be allocated to customers. Non-allocated coupons have status 0, while allocated coupons have status 1.
Here I am trying to fetch the count of non-allocated coupons per client and per plan, where the count is either less than 50 or has reached 0.
For example,
If the coupons table has 10 rows with client_id = 1, plan_id = 1 and status 1, the query should return a count of 10. But when the table has 0 rows with client_id = 1, plan_id = 1 and status 1, the query above returns nothing for that combination.

Thank you all for your inputs, this worked.
select
sum(CASE WHEN `coupons`.`status` = 1 THEN 1 ELSE 0 END) as count,
clients.title,
plans.name
from
`clients`
left join
`coupons`
on
`coupons`.`client_id` = `clients`.`id`
left join
`plans`
on
`coupons`.`plan_id` = `plans`.`id`
group by
`coupons`.`client_id`,
`coupons`.`plan_id`
having
count < 50

With the inner joins, the query is not going to return any "zero" counts.
If you want to return "zero" counts, you are going to need an outer join somewhere.
But it's not clear what you are actually trying to count.
Assuming that what you are trying to get is a count of rows from coupons, for every possible combination of rows from plans and clients, you could do something like this:
SELECT COUNT(`coupons`.`client_id`) AS `count`
, clients.title
, plans.name
FROM `plans`
CROSS
JOIN `clients`
LEFT
JOIN `coupons`
ON `coupons`.`client_id` = `clients`.`id`
AND `coupons`.`plan_id` = `plans`.`id`
AND `coupons`.`status` = 1
GROUP
BY `clients`.`id`
, `plans`.`id`
HAVING `count` < 50
This is just a guess at the result set you are expecting. Absent table definitions, example data, and the expected result, we're just guessing.
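To see why the CROSS JOIN plus LEFT JOIN produces zero counts, here is a minimal sketch that runs the query above through Python's built-in sqlite3 module. SQLite stands in for MySQL; this is an assumption, and the constructs used here are expected to behave the same way in both. The sample rows and titles such as 'Acme' are made up for illustration. The status = 1 test sits in the ON clause, so a client/plan pair with no matching coupon still survives the outer join and is counted as 0.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE plans   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE coupons (id INTEGER PRIMARY KEY, client_id INT, plan_id INT, status INT);
    INSERT INTO clients VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO plans   VALUES (1, 'Basic'), (2, 'Pro');
    -- client 1 / plan 1: two coupons with status 1 and one with status 0
    INSERT INTO coupons (client_id, plan_id, status) VALUES (1, 1, 1), (1, 1, 1), (1, 1, 0);
    -- client 2 / plan 2: only a status-0 coupon, so its count should be 0
    INSERT INTO coupons (client_id, plan_id, status) VALUES (2, 2, 0);
""")

# Every (client, plan) pair comes from the CROSS JOIN. The status filter is in
# the ON clause, so unmatched pairs survive the LEFT JOIN with NULL coupon
# columns, and COUNT(coupons.client_id) ignores those NULLs, giving 0.
rows = conn.execute("""
    SELECT COUNT(coupons.client_id) AS cnt, clients.title, plans.name
    FROM plans
    CROSS JOIN clients
    LEFT JOIN coupons
           ON coupons.client_id = clients.id
          AND coupons.plan_id = plans.id
          AND coupons.status = 1
    GROUP BY clients.id, plans.id
    HAVING COUNT(coupons.client_id) < 50
    ORDER BY clients.id, plans.id
""").fetchall()

for cnt, title, name in rows:
    print(cnt, title, name)
```

Moving `coupons.status = 1` into a WHERE clause instead would discard the NULL-extended rows and bring back the original problem.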
FOLLOWUP
Based on your comment, it sounds like you want conditional aggregation.
To "count" only the rows in coupons that have status=1, you can do something like this:
SELECT SUM( `coupons`.`status` = 1 ) AS `count`
, clients.title
, plans.name
FROM `coupons`
JOIN `plans`
ON `plans`.`id` = `coupons`.`plan_id`
JOIN `clients`
ON `clients`.`id` = `coupons`.`client_id`
GROUP
BY `clients`.`id`
, `plans`.`id`
HAVING `count` < 50
There are other expressions you can use to get the conditional "count". For example:
SELECT COUNT( IF(`coupons`.`status`=1, 1, NULL) ) AS `count`
or
SELECT SUM( IF(`coupons`.`status`=1, 1, 0) ) AS `count`
or, for a more ANSI standards compatible approach
SELECT SUM( CASE WHEN `coupons`.`status` = 1 THEN 1 ELSE 0 END ) AS `count`
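A quick way to check that these spellings agree is to run them side by side. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption), with made-up sample rows. SQLite has no IF() function, so the COUNT(IF(...)) form is written as the equivalent CASE ... THEN 1 END, which yields NULL when the condition is false. With the inner joins starting from coupons, a client/plan pair appears only if it has at least one coupon row of any status.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE plans   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE coupons (id INTEGER PRIMARY KEY, client_id INT, plan_id INT, status INT);
    INSERT INTO clients VALUES (1, 'Acme');
    INSERT INTO plans   VALUES (1, 'Basic'), (2, 'Pro');
    -- plan 1: two coupons with status 1, one with status 0
    INSERT INTO coupons (client_id, plan_id, status) VALUES (1, 1, 1), (1, 1, 1), (1, 1, 0);
    -- plan 2: coupons exist, but none with status 1
    INSERT INTO coupons (client_id, plan_id, status) VALUES (1, 2, 0), (1, 2, 0);
""")

# Three spellings of the same conditional count. In SQLite (as in MySQL) a
# comparison evaluates to 1 or 0, so SUM(condition) counts the matching rows.
expressions = {
    "sum_bool": "SUM(coupons.status = 1)",
    "count_case_null": "COUNT(CASE WHEN coupons.status = 1 THEN 1 END)",
    "sum_case": "SUM(CASE WHEN coupons.status = 1 THEN 1 ELSE 0 END)",
}

results = {}
for label, expr in expressions.items():
    results[label] = conn.execute(f"""
        SELECT {expr} AS cnt, clients.title, plans.name
        FROM coupons
        JOIN plans   ON plans.id = coupons.plan_id
        JOIN clients ON clients.id = coupons.client_id
        GROUP BY clients.id, plans.id
        HAVING {expr} < 50
        ORDER BY clients.id, plans.id
    """).fetchall()

for label, rows in results.items():
    print(label, rows)
```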

Related

Select most recent record grouped by 3 columns

I am trying to return the price of the most recent record grouped by ItemNum and FeeSched; Customer can be ignored. I am having trouble figuring out a reasonable way to do that.
The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.
Here is my data:
Customer  Price  ItemNum  FeeSched  Date
5         70.75  01202    12        12-06-2017
5         70.80  01202    12        06-07-2016
5         70.80  01202    12        07-21-2017
5         70.80  01202    12        10-26-2016
5         82.63  02144    61        12-06-2017
5         84.46  02144    61        06-07-2016
5         84.46  02144    61        07-21-2017
5         84.46  02144    61        10-26-2016
I don't have permission to create temporary tables or views, and there is no such thing as a #variable in C-tree, but in most ways it acts like MySQL. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY, I get an error.
I could run the query again selecting only ItemNum, FeeSched and Date and then do an INNER JOIN. However, the query takes a minute to run each time, so it seems there must be a better way that I don't know about.
Here is the query I am running. It isn't really complicated apart from the amount of data it processes. The final result is about 50,000 rows. I can't share much about the database structure, as it is covered by an NDA.
SELECT DISTINCT
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.primfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
UNION ALL
SELECT DISTINCT
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.secfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0

It seemed quite simple when I read the first three paragraphs, but I got a little confused after reading the whole question.
Whatever you have done to get the data posted above, once you have the data in that shape it's easy to retrieve "the most recent record grouped by ItemNum and FeeSched".
How to:
Firstly, sort the whole result set by Date DESC.
Secondly, select the fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregate functions.
So, the query might be something like this:
SELECT t.Price, t.ItemNum, t.FeeSched, t.Date
FROM (SELECT * FROM table ORDER BY Date DESC) AS t
GROUP BY t.ItemNum, t.FeeSched;
How it works:
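When your data is grouped and you select columns without aggregate functions, MySQL returns one row per group, and in practice this has typically been the first row of each group. Because all rows were sorted before grouping, that first row is "the most recent record". Be aware, though, that MySQL documents the values of such non-aggregated columns as indeterminate (and rejects the query when ONLY_FULL_GROUP_BY is enabled), and it may ignore the ORDER BY inside the derived table, so this behavior is not guaranteed.
Let me know if you run into any problems or errors with this approach.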

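You can also try it like this:
Select Price, ItemNum, FeeSched, Date from table where (ItemNum, FeeSched, Date) IN (Select ItemNum, FeeSched, MAX(Date) from table group by ItemNum, FeeSched);
The inner query returns the maximum date for each ItemNum and FeeSched combination. The IN predicate then keeps only the records whose ItemNum, FeeSched and Date match one of those maxima. Comparing Date alone would also match rows from other groups that happen to share some group's maximum date.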
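Here is a minimal sketch of that query, run through Python's sqlite3 module against the sample data from the question. It assumes the Date values are stored in a sortable form (ISO YYYY-MM-DD here). If they were MM-DD-YYYY strings, MAX(Date) would compare them as text and could pick the wrong row. The table name fees is hypothetical. SQLite stands in for C-tree/MySQL and needs version 3.15 or later for the row-value IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "fees" is a hypothetical table name; the columns follow the question.
conn.execute("CREATE TABLE fees (Customer INT, Price REAL, ItemNum TEXT, FeeSched INT, Date TEXT)")
# Sample rows from the question, with dates rewritten as ISO strings so MAX() sorts correctly.
conn.executemany(
    "INSERT INTO fees VALUES (?, ?, ?, ?, ?)",
    [
        (5, 70.75, "01202", 12, "2017-12-06"),
        (5, 70.80, "01202", 12, "2016-06-07"),
        (5, 70.80, "01202", 12, "2017-07-21"),
        (5, 70.80, "01202", 12, "2016-10-26"),
        (5, 82.63, "02144", 61, "2017-12-06"),
        (5, 84.46, "02144", 61, "2016-06-07"),
        (5, 84.46, "02144", 61, "2017-07-21"),
        (5, 84.46, "02144", 61, "2016-10-26"),
    ],
)

# Keep only rows whose (ItemNum, FeeSched, Date) equals that group's latest date.
rows = conn.execute("""
    SELECT Price, ItemNum, FeeSched, Date
    FROM fees
    WHERE (ItemNum, FeeSched, Date) IN (
        SELECT ItemNum, FeeSched, MAX(Date)
        FROM fees
        GROUP BY ItemNum, FeeSched
    )
    ORDER BY ItemNum, FeeSched
""").fetchall()
print(rows)
```

If two rows in a group share the latest date, both are returned. A tie-breaker would be needed to get exactly one.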

Group by and group concat , optimization mysql query without using main pk

My example runs on MySQL version 5.6.34-log.
Problem summary: the query below takes 40 seconds. Whole-table row counts:
order_item: 758423 rows
payment: 177272 rows
submission_entry: 2165698 rows
Details below:
The query is in [1].
I added SQL_NO_CACHE so that repeated test runs are not served from the query cache.
I optimized the indexes (see [2]), but there was no significant improvement.
The table structures are in [3].
The EXPLAIN plan is in [4].
[1]
SELECT SQL_NO_CACHE
`payment`.`id` AS id,
`order_item`.`order_id` AS order_id,
GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
THEN ' '
ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer,
event.name AS event,
COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
`order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
payment.make_order_free = 1))
THEN `order_item`.id
ELSE NULL END) AS qty,
payment.currency AS `currency`,
(SELECT SUM(order_item.sub_total)
FROM order_item
WHERE payment_id =
payment.id) AS sub_total,
CASE WHEN payment.make_order_free = 1
THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
ELSE ROUND(payment.total, 2) END AS 'total',
`payment_type`.`name` AS payment_type,
payment_status.name AS status,
`payment_status`.`id` AS status_id,
DATE_FORMAT(CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
'%Y-%m-%d %H:%i') AS 'created',
`user`.`name` AS 'agent',
event.id AS event_id,
payment.checked,
DATE_FORMAT(CONVERT_TZ(payment.checked_date, '+0:00', '-8:00'),
'%Y-%m-%d %H:%i') AS checked_date,
DATE_FORMAT(CONVERT_TZ(`payment`.`complete_date`, '+0:00', '-8:00'),
'%Y-%m-%d %H:%i') AS `complete date`,
`payment`.`delivery_status` AS `delivered`
FROM `order_item`
INNER JOIN `payment`
ON payment.id = `order_item`.`payment_id` AND (payment.status > 0.0 OR payment.status = -3.0)
LEFT JOIN (SELECT
sum(`payment_refund`.total) AS `refunds_total`,
payment_refunds.payment_id AS `payment_id`
FROM payment
INNER JOIN `payment_refunds` ON payment_refunds.payment_id = payment.id
INNER JOIN `payment` AS `payment_refund`
ON `payment_refund`.id = `payment_refunds`.payment_id_refund
GROUP BY `payment_refunds`.payment_id) AS `refunds` ON `refunds`.payment_id = payment.id
# INNER JOIN event_date_product ON event_date_product.id = order_item.event_date_product_id
# INNER JOIN event_date ON event_date.id = event_date_product.event_date_id
INNER JOIN event ON event.id = order_item.event_id
INNER JOIN payment_status ON payment_status.id = payment.status
INNER JOIN payment_type ON payment_type.id = payment.payment_type_id
LEFT JOIN user ON user.id = payment.completed_by
LEFT JOIN submission_entry ON submission_entry.form_submission_id = `payment`.`form_submission_id`
LEFT JOIN question ON question.id = submission_entry.question_id AND question.var IN ('name', 'email')
WHERE 1 = '1' AND (order_item.status > 0.0 OR order_item.status = -2.0)
GROUP BY `order_item`.`order_id`
HAVING 1 = '1'
ORDER BY `order_item`.`order_id` DESC
LIMIT 10
[2]
CREATE INDEX order_id
ON order_item (order_id);
CREATE INDEX payment_id
ON order_item (payment_id);
CREATE INDEX status
ON order_item (status);
Second Table
CREATE INDEX payment_type_id
ON payment (payment_type_id);
CREATE INDEX status
ON payment (status);
[3]
CREATE TABLE order_item
(
id INT AUTO_INCREMENT
PRIMARY KEY,
order_id INT NOT NULL,
form_submission_id INT NULL,
status DOUBLE DEFAULT '0' NULL,
payment_id INT DEFAULT '0' NULL
);
SECOND TABLE
CREATE TABLE payment
(
id INT AUTO_INCREMENT,
payment_type_id INT NOT NULL,
status DOUBLE NOT NULL,
form_submission_id INT NOT NULL,
PRIMARY KEY (id, payment_type_id)
);
[4] EXPLAIN output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | payment_status | range | PRIMARY | PRIMARY | 8 | NULL | 4 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | payment | ref | PRIMARY,payment_type_id,status | status | 8 | exp_live_18092017.payment_status.id | 17357 | |
| 1 | PRIMARY | payment_type | eq_ref | PRIMARY | PRIMARY | 4 | exp_live_18092017.payment.payment_type_id | 1 | |
| 1 | PRIMARY | user | eq_ref | PRIMARY | PRIMARY | 4 | exp_live_18092017.payment.completed_by | 1 | |
| 1 | PRIMARY | submission_entry | ref | form_submission_id,idx_submission_entry_1 | form_submission_id | 4 | exp_live_18092017.payment.form_submission_id | 2 | |
| 1 | PRIMARY | question | eq_ref | PRIMARY,var | PRIMARY | 4 | exp_live_18092017.submission_entry.question_id | 1 | Using where |
| 1 | PRIMARY | order_item | ref | status,payment_id | payment_id | 5 | exp_live_18092017.payment.id | 3 | Using where |
| 1 | PRIMARY | event | eq_ref | PRIMARY | PRIMARY | 4 | exp_live_18092017.order_item.event_id | 1 | |
| 1 | PRIMARY | <derived3> | ref | key0 | key0 | 5 | exp_live_18092017.payment.id | 10 | Using where |
| 3 | DERIVED | payment_refunds | index | payment_id,payment_id_refund | payment_id | 4 | NULL | 1110 | |
| 3 | DERIVED | payment | ref | PRIMARY | PRIMARY | 4 | exp_live_18092017.payment_refunds.payment_id | 1 | Using index |
| 3 | DERIVED | payment_refund | ref | PRIMARY | PRIMARY | 4 | exp_live_18092017.payment_refunds.payment_id_refund | 1 | |
| 2 | DEPENDENT SUBQUERY | order_item | ref | payment_id | payment_id | 5 | func | 3 | |
Expected result:
The query should take less than 5 seconds instead of 40.
IMPORTANT
Updates
1) Reply to comment 1: there are no foreign keys at all on those two tables.
UPDATE-1:
Locally, the original query takes 40 seconds.
If I remove only the following, it takes 25 seconds (saving 15 seconds):
GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
THEN ' '
ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer
If I remove only the following, it takes about the same time, around 40 seconds (no saving):
COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
`order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
payment.make_order_free = 1))
THEN `order_item`.id
ELSE NULL END) AS qty,
If I remove only the following, it takes around 36 seconds (saving 4 seconds):
(SELECT SUM(order_item.sub_total)
FROM order_item
WHERE payment_id =
payment.id) AS sub_total,
CASE WHEN payment.make_order_free = 1
THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
ELSE ROUND(payment.total, 2) END AS 'total',

Remove HAVING 1=1; the optimizer may not be smart enough to ignore it. Please provide the EXPLAIN SELECT output as text (not HTML) so we can see what the optimizer is doing.
It seems wrong to have a composite PK in this case: PRIMARY KEY (id, payment_type_id). Please justify it.
Please explain the meaning of status or the need for DOUBLE: status DOUBLE
It will take some effort to figure out why the query is so slow. Let's start by tossing the normalization parts, such as dates, event name and currency. That is, whittle the query down to just enough to find the desired rows, without the details for each row. If it is still slow, let's debug that. If it is 'fast', add the other pieces back one by one to find out which one causes the performance problem.
Is just id the PRIMARY KEY of each table? Or are there more exceptions (like payment)?
It seems 'wrong' to specify a value for question.var, but then use LEFT to imply that it is optional. Please change all LEFT JOINs to INNER JOINs unless I am mistaken on this issue.
Are any of the tables (perhaps submission_entry and event_date_product) "many-to-many" mapping tables? If so, then follow the tips here to get some performance gains.
When you come back please provide SHOW CREATE TABLE for each table.

Guided by the following strategies:
pre-evaluating aggregations into temporary tables;
placing payment at the top, since this seems to be the most deterministic table;
grouping joins, to make the table relationships explicit to the query optimizer;
I present a revised version of your query:
-- -----------------------------------------------------------------------------
-- Summarization of order_item
-- -----------------------------------------------------------------------------
drop temporary table if exists _ord_itm_sub_tot;
create temporary table _ord_itm_sub_tot(
primary key (payment_id)
)
SELECT
payment_id,
--
COUNT(
DISTINCT
CASE
WHEN(
`order_item`.status > 0 OR
(
`order_item`.status != -1 AND
`order_item`.status >= -2 AND
`payment`.payment_type_id != 8 AND
payment.make_order_free = 1
)
) THEN `order_item`.id
ELSE NULL
END
) AS qty,
--
SUM(order_item.sub_total) sub_total
FROM
order_item
inner join payment
on payment.id = order_item.payment_id
where order_item.status > 0.0 OR order_item.status = -2.0
group by payment_id;
-- -----------------------------------------------------------------------------
-- Summarization of payment_refunds
-- -----------------------------------------------------------------------------
drop temporary table if exists _pay_ref_tot;
create temporary table _pay_ref_tot(
primary key(payment_id)
)
SELECT
payment_refunds.payment_id AS `payment_id`,
sum(`payment_refund`.total) AS `refunds_total`
FROM
`payment_refunds`
INNER JOIN `payment` AS `payment_refund`
ON `payment_refund`.id = `payment_refunds`.payment_id_refund
GROUP BY `payment_refunds`.payment_id;
-- -----------------------------------------------------------------------------
-- Summarization of submission_entry
-- -----------------------------------------------------------------------------
drop temporary table if exists _sub_ent;
create temporary table _sub_ent(
primary key(form_submission_id)
)
select
submission_entry.form_submission_id,
GROUP_CONCAT(
DISTINCT (
CASE WHEN coalesce(submission_entry.text, '') = '' THEN ' '
ELSE submission_entry.text
END
)
ORDER BY question.var
DESC SEPARATOR 0x1D
) AS buyer
from
submission_entry
LEFT JOIN question
ON(
question.id = submission_entry.question_id
AND question.var IN ('name', 'email')
)
group by submission_entry.form_submission_id;
-- -----------------------------------------------------------------------------
-- The result
-- -----------------------------------------------------------------------------
SELECT SQL_NO_CACHE
`payment`.`id` AS id,
`order_item`.`order_id` AS order_id,
--
_sub_ent.buyer,
--
event.name AS event,
--
_ord_itm_sub_tot.qty,
--
payment.currency AS `currency`,
--
_ord_itm_sub_tot.sub_total,
--
CASE
WHEN payment.make_order_free = 1 THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
ELSE ROUND(payment.total, 2)
END AS 'total',
--
`payment_type`.`name` AS payment_type,
`payment_status`.`name` AS status,
`payment_status`.`id` AS status_id,
--
DATE_FORMAT(
CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
'%Y-%m-%d %H:%i'
) AS 'created',
--
`user`.`name` AS 'agent',
event.id AS event_id,
payment.checked,
--
DATE_FORMAT(CONVERT_TZ(payment.checked_date, '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS checked_date,
DATE_FORMAT(CONVERT_TZ(payment.complete_date, '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS `complete date`,
--
`payment`.`delivery_status` AS `delivered`
FROM
`payment`
INNER JOIN(
`order_item`
INNER JOIN event
ON event.id = order_item.event_id
)
ON `order_item`.`payment_id` = payment.id
--
inner join _ord_itm_sub_tot
on _ord_itm_sub_tot.payment_id = payment.id
--
LEFT JOIN _pay_ref_tot
on _pay_ref_tot.payment_id = `payment`.id
--
INNER JOIN payment_status ON payment_status.id = payment.status
INNER JOIN payment_type ON payment_type.id = payment.payment_type_id
LEFT JOIN user ON user.id = payment.completed_by
--
LEFT JOIN _sub_ent
on _sub_ent.form_submission_id = `payment`.`form_submission_id`
WHERE
1 = 1
AND (payment.status > 0.0 OR payment.status = -3.0)
AND (order_item.status > 0.0 OR order_item.status = -2.0)
ORDER BY `order_item`.`order_id` DESC
LIMIT 10
The query in your question uses aggregate functions without explicit groupings that match them. This is pretty awkward, so in my solution I try to devise aggregations that 'make sense'.
Please, run this version and tell us your findings.
Please check carefully not only the timing statistics but also the summarized results.

(The tables and query are too complex for me to do the transformation for you. But here are the steps.)
1. Reformulate the query without any mention of refunds. That is, remove the derived table and the reference to it in the complex CASE.
2. Debug and time the resulting query. Keep the GROUP BY order_item.order_id ORDER BY order_item.order_id DESC LIMIT 10, and apply any other optimizations already suggested. In particular, get rid of HAVING 1=1, since it is in the way of a likely optimization.
3. Make the query from step 2 a 'derived table'...
Something like:
SELECT lots of stuff
FROM ( query from step 2 ) AS step2
LEFT JOIN ( ... ) AS refunds ON step2... = refunds...
ORDER BY step2.order_item DESC
The ORDER BY is repeated, but neither the GROUP BY, nor the LIMIT need be repeated.
Why? The principle here is...
Currently, it is going into the refunds correlated subquery thousands of times, only to toss it all but 10 times. The reformulation cuts that back to only 10 times.
(Caveat: I may have missed a subtlety preventing this reformulation from working as I presented it. If it does not work, see if you can make the 'principle' help you anyway.)

Here is the minimum you should do every time you see a query with many joins and pagination. First, select the 10 (LIMIT 10) ids you group by from the first table (order_item), using as few joins as possible. Then join those ids back to the first table and make all the other joins. That way you will not carry thousands of rows and columns that you do not need to display through temporary tables.
You look at the inner joins and WHERE conditions, GROUP BYs and ORDER BYs to see whether you need any other tables to filter out rows, group or order ids from the first table. In your case, it doesn't seem you need any joins, except for payment.
Now you write the query to select those ids:
SELECT o.order_id, o.payment_id
FROM order_item o
JOIN payment p
ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
WHERE o.status > 0.0 OR o.status = -2.0
ORDER BY order_id DESC
LIMIT 10
If there might be several payments for a single order, you should use GROUP BY order_id DESC instead of ORDER BY. To make the query work quicker you need a BTREE index on status column for order_item table, or even a composite index on (status, payment_id).
Now, when you are sure that the ids are those that you expected, you make all other joins:
SELECT order_item.order_id,
`payment`.`id`,
GROUP_CONCAT ... -- and so on from the original query
FROM (
SELECT o.order_id, o.payment_id
FROM order_item o
JOIN payment p
ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
WHERE o.status > 0.0 OR o.status = -2.0
ORDER BY order_id DESC
LIMIT 10
) as ids
JOIN order_item ON ids.order_id = order_item.order_id
JOIN payment ON ids.payment_id = payment.id
LEFT JOIN ( ... -- and so on
The idea is that you significantly reduce the size of the temporary tables you need to process. Now every row selected by the joins will be used in the result set.
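The following sketch illustrates the "pick the ids first, then join" idea. It uses Python's sqlite3 module as a stand-in for MySQL and a tiny synthetic schema that keeps only the columns needed here; the data is invented. One adjustment is my own assumption: the id subquery uses SELECT DISTINCT. Without it, an order with several order_item rows would occupy several of the 10 slots and be duplicated when joined back. The check at the end compares the deferred-join query with the straightforward join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (id INTEGER PRIMARY KEY, status REAL, total REAL);
    CREATE TABLE order_item (id INTEGER PRIMARY KEY, order_id INT, payment_id INT,
                             status REAL, sub_total REAL);
    CREATE INDEX order_item_status_payment ON order_item (status, payment_id);
""")
# Synthetic data: 200 orders, each paid by one payment and having two line items.
for order_id in range(1, 201):
    pay_status = -1 if order_id % 7 == 0 else 1   # every 7th payment is filtered out
    conn.execute("INSERT INTO payment VALUES (?, ?, 20.0)", (order_id, pay_status))
    conn.executemany(
        "INSERT INTO order_item (order_id, payment_id, status, sub_total) VALUES (?, ?, 1, 10.0)",
        [(order_id, order_id)] * 2,
    )

# Straightforward version: join everything, group, then keep 10 groups.
full = conn.execute("""
    SELECT o.order_id, SUM(o.sub_total), p.total
    FROM order_item o
    JOIN payment p ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
    WHERE o.status > 0.0 OR o.status = -2.0
    GROUP BY o.order_id
    ORDER BY o.order_id DESC
    LIMIT 10
""").fetchall()

# Deferred join: find the 10 ids with minimal joins, then fetch the details.
deferred = conn.execute("""
    SELECT ids.order_id, SUM(order_item.sub_total), payment.total
    FROM (
        SELECT DISTINCT o.order_id, o.payment_id
        FROM order_item o
        JOIN payment p ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
        WHERE o.status > 0.0 OR o.status = -2.0
        ORDER BY o.order_id DESC
        LIMIT 10
    ) AS ids
    JOIN order_item ON order_item.order_id = ids.order_id
    JOIN payment ON payment.id = ids.payment_id
    WHERE order_item.status > 0.0 OR order_item.status = -2.0
    GROUP BY ids.order_id
    ORDER BY ids.order_id DESC
""").fetchall()

print(deferred == full)
```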
UPD1: Another thing is that you should simplify the aggregation in LEFT JOIN:
SELECT
sum(payment.total) AS `refunds_total`,
refs.payment_id AS `payment_id`
FROM payment_refunds refs
JOIN payment ON payment.id = refs.payment_id_refund
GROUP BY refs.payment_id
or even replace the LEFT JOIN with a correlated subquery, since the correlated subquery will then be executed only for those 10 rows. Make sure you use this whole three-column query as a derived table in the main query. Otherwise the correlated subquery will be computed for every row of the resulting join, before the GROUP BY:
SELECT
ids.order_id,
ids.payment_id,
(SELECT SUM(p.total)
FROM payment_refunds refs
JOIN payment p
ON refs.payment_id_refund = p.id
WHERE refs.payment_id = ids.payment_id
) as refunds_total
FROM (
SELECT o.order_id, o.payment_id
FROM order_item o
JOIN payment p
ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
WHERE o.status > 0.0 OR o.status = -2.0
ORDER BY order_id DESC
LIMIT 10
) as ids
You will also need an index on (payment_id, payment_id_refund) on payment_refunds, and you can even try a covering index on (id, total) on payment as well.
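Here is a small sketch of the last query, again using Python's sqlite3 module as a stand-in for MySQL with invented rows. Refunds are modelled as separate payment rows linked through payment_refunds, as in the question. The correlated subquery sits in the SELECT list of the query over the limited ids, so it is evaluated only for the rows that survive the LIMIT. I added an outer ORDER BY because the row order coming out of a derived table is not guaranteed. SUM over no matching refunds returns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (id INTEGER PRIMARY KEY, status REAL, total REAL);
    CREATE TABLE order_item (id INTEGER PRIMARY KEY, order_id INT, payment_id INT, status REAL);
    CREATE TABLE payment_refunds (payment_id INT, payment_id_refund INT);
    -- the index suggested above
    CREATE INDEX refs_payment ON payment_refunds (payment_id, payment_id_refund);

    -- orders 1..5, each paid by the payment with the same id
    INSERT INTO payment VALUES (1, 1, 100.0), (2, 1, 100.0), (3, 1, 100.0),
                               (4, 1, 100.0), (5, 1, 100.0);
    INSERT INTO order_item (order_id, payment_id, status)
        VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1), (5, 5, 1);

    -- refund payments, linked to the payments they refund
    INSERT INTO payment VALUES (101, -3, -30.0), (102, -3, -20.0), (103, -3, -10.0);
    INSERT INTO payment_refunds VALUES (5, 101), (5, 102), (4, 103);
""")

rows = conn.execute("""
    SELECT ids.order_id,
           ids.payment_id,
           (SELECT SUM(p.total)
              FROM payment_refunds refs
              JOIN payment p ON refs.payment_id_refund = p.id
             WHERE refs.payment_id = ids.payment_id) AS refunds_total
    FROM (
        SELECT DISTINCT o.order_id, o.payment_id
        FROM order_item o
        JOIN payment p ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
        WHERE o.status > 0.0 OR o.status = -2.0
        ORDER BY o.order_id DESC
        LIMIT 3
    ) AS ids
    ORDER BY ids.order_id DESC
""").fetchall()
print(rows)
```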

MySQL row count

I have a very large table (~1 000 000 rows) and a complicated query with unions, joins and WHERE clauses (the user can select different ORDER BY columns and directions). I need to get a row count for pagination. If I run the query without counting rows, it completes very fast. How can I implement pagination in the fastest way?
I tried to use EXPLAIN SELECT and SHOW TABLE STATUS to get approximate row count, but it is very different from real row count.
My query is like this one (simplified):
SELECT * FROM (
(
SELECT * FROM table_1
LEFT JOIN `table_a` ON table_1.record_id = table_a.id
LEFT JOIN `table_b` ON table_a.id = table_b.record_id
WHERE table_1.a > 10 AND table_a.b < 500 AND table_b.c = 1
ORDER BY x ASC
LIMIT 0, 10
)
UNION
(
SELECT * FROM table_2
LEFT JOIN `table_a` ON table_2.record_id = table_a.id
LEFT JOIN `table_b` ON table_a.id = table_b.record_id
WHERE table_2.d < 10 AND table_a.e > 500 AND table_b.f = 1
ORDER BY x ASC
LIMIT 0, 10
)
) tbl ORDER BY x ASC LIMIT 0, 10
The query result without the limit is about 100 000 rows. How can I get this approximate count in the fastest way?
My production query example is like this one:
SELECT SQL_CALC_FOUND_ROWS * FROM (
(
SELECT
articles_log.id AS log_id, articles_log.source_table,
articles_log.record_id AS id, articles_log.dat AS view_dat,
articles_log.lang AS view_lang, '1' AS view_count, '1' AS unique_view_count,
articles_log.user_agent, articles_log.ref, articles_log.ip,
articles_log.ses_id, articles_log.bot, articles_log.source_type, articles_log.link,
articles_log.user_country, articles_log.user_platform,
articles_log.user_os, articles_log.user_browser,
`contents`.dat AS source_dat, `contents_trans`.header, `contents_trans`.custom_text
FROM articles_log
INNER JOIN `contents` ON articles_log.record_id = `contents`.id
AND articles_log.source_table = 'contents'
INNER JOIN `contents_trans` ON `contents`.id = `contents_trans`.record_id
AND `contents_trans`.lang='lv'
WHERE articles_log.dat > 0
AND articles_log.dat >= 1488319200
AND articles_log.dat <= 1489355999
AND articles_log.bot = '0'
AND (articles_log.record_id NOT LIKE '%\_404' AND articles_log.record_id <> '404'
OR articles_log.source_table <> 'contents')
)
UNION
(
SELECT
articles_log.id AS log_id, articles_log.source_table,
articles_log.record_id AS id, articles_log.dat AS view_dat,
articles_log.lang AS view_lang, '1' AS view_count, '1' AS unique_view_count,
articles_log.user_agent, articles_log.ref, articles_log.ip,
articles_log.ses_id, articles_log.bot,
articles_log.source_type, articles_log.link,
articles_log.user_country, articles_log.user_platform,
articles_log.user_os, articles_log.user_browser,
`news`.dat AS source_dat, `news_trans`.header, `news_trans`.custom_text
FROM articles_log
INNER JOIN `news` ON articles_log.record_id = `news`.id
AND articles_log.source_table = 'news'
INNER JOIN `news_trans` ON `news`.id = `news_trans`.record_id
AND `news_trans`.lang='lv'
WHERE articles_log.dat > 0
AND articles_log.dat >= 1488319200
AND articles_log.dat <= 1489355999
AND articles_log.bot = '0'
AND (articles_log.record_id NOT LIKE '%\_404' AND articles_log.record_id <> '404'
OR articles_log.source_table <> 'contents')
)
) tbl ORDER BY view_dat ASC LIMIT 0, 10
Many thanks!

If you can use UNION ALL instead of UNION (which is shorthand for UNION DISTINCT), in other words if you don't need to remove duplicates, you can try adding the counts of the two subqueries:
SELECT
(
SELECT COUNT(*) FROM table_1
LEFT JOIN `table_a` ON table_1.record_id = table_a.id
LEFT JOIN `table_b` ON table_a.id = table_b.record_id
WHERE table_1.a > 10 AND table_a.b < 500 AND table_b.c = 1
)
+
(
SELECT COUNT(*) FROM table_2
LEFT JOIN `table_a` ON table_2.record_id = table_a.id
LEFT JOIN `table_b` ON table_a.id = table_b.record_id
WHERE table_2.d < 10 AND table_a.e > 500 AND table_b.f = 1
)
AS cnt
Without ORDER BY and without UNION the engine might not need to create a huge temp table.
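The equivalence only holds for UNION ALL. A minimal sketch with Python's sqlite3 module (standing in for MySQL; the one-column tables and values are made up) shows that the summed counts match COUNT(*) over UNION ALL. UNION, which removes duplicates, can return fewer rows.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL; table_1/table_2 and column x are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (x INT);
    CREATE TABLE table_2 (x INT);
    INSERT INTO table_1 VALUES (1), (2), (3);
    INSERT INTO table_2 VALUES (3), (4);   -- 3 appears in both tables
""")

# Adding two independent counts: no combined temp table is needed.
summed = conn.execute("""
    SELECT (SELECT COUNT(*) FROM table_1 WHERE x > 0)
         + (SELECT COUNT(*) FROM table_2 WHERE x > 0) AS cnt
""").fetchone()[0]

# Counting the UNION ALL result gives the same number.
union_all = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT x FROM table_1 WHERE x > 0
        UNION ALL
        SELECT x FROM table_2 WHERE x > 0
    )
""").fetchone()[0]

# UNION (DISTINCT) drops the duplicate 3, so the summed count would overcount it.
union_distinct = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT x FROM table_1 WHERE x > 0
        UNION
        SELECT x FROM table_2 WHERE x > 0
    )
""").fetchone()[0]

print(summed, union_all, union_distinct)
```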
Update
For your original query try the following:
Select only count(*).
Remove OR articles_log.source_table <> 'contents' from the first part (contents), since we know it's never true there.
Remove AND (articles_log.record_id NOT LIKE '%\_404' AND articles_log.record_id <> '404' OR articles_log.source_table <> 'contents') from the second part (news), since we know it's always true there: OR articles_log.source_table <> 'contents' is always true.
Remove the joins with contents and news. You can join the *_trans tables directly using record_id, but the articles_log.source_table condition from those joins then has to move into the WHERE clause.
Remove articles_log.dat > 0, since it's redundant with articles_log.dat >= 1488319200.
The resulting query:
SELECT (
SELECT COUNT(*)
FROM articles_log
INNER JOIN `contents_trans`
ON `contents_trans`.record_id = articles_log.record_id
AND `contents_trans`.lang='lv'
WHERE articles_log.source_table = 'contents'
AND articles_log.bot = '0'
AND articles_log.dat >= 1488319200
AND articles_log.dat <= 1489355999
AND articles_log.record_id NOT LIKE '%\_404'
AND articles_log.record_id <> '404'
) + (
SELECT COUNT(*)
FROM articles_log
INNER JOIN `news_trans`
ON `news_trans`.record_id = articles_log.record_id
AND `news_trans`.lang='lv'
WHERE articles_log.source_table = 'news'
AND articles_log.bot = '0'
AND articles_log.dat >= 1488319200
AND articles_log.dat <= 1489355999
) AS cnt
Try the following index combinations:
articles_log(bot, dat, record_id)
contents_trans(lang, record_id)
news_trans(lang, record_id)
or
contents_trans(lang, record_id)
news_trans(lang, record_id)
articles_log(record_id, bot, dat)
Which combination is the better one depends on the data.
I might be wrong on one or more points, since I don't know your data and business logic. If so, try adjusting the rest accordingly.

You can get the calculation when you run the query using SQL_CALC_FOUND_ROWS as explained in the documentation:
select SQL_CALC_FOUND_ROWS *
. . .
And then running:
select FOUND_ROWS()
However, the first run still has to generate all of the data. Also, SQL_CALC_FOUND_ROWS only lifts the outermost LIMIT, not the LIMITs inside your subqueries, so you are going to get a count of at most 20 rows.
Given the structure of your query and what you want to do, I would think first about optimizing the query. For instance, is UNION really needed? It incurs overhead for removing duplicates. As pointed out in a comment, your joins are really inner joins disguised as outer joins. Indexes might also improve performance.
You might want to ask another question, providing sample data and desired results to get advice on such issues.

how to mysql group by date multiple left join

I have the following schema:
http://sqlfiddle.com/#!9/bd3a4/1
I would like to group by date() with WHERE user_id = ? and count the results per day.
Required result: Day | TotalRequests | TotalOrders

Since you could have an order on day 1 and a request on day 8, you may have entries on one side but not the other. To meet your needs, I would do a UNION of all orders and all requests, each aggregated individually by date, and then roll those values up. The inner pre-aggregate query is where the per-user WHERE clause is applied. It also has a recSource column indicating where each record came from: 'O' for orders and 'R' for requests. That way the roll-up knows which column each count belongs in.
select
preAgg.recDate,
SUM( case when preAgg.recSource = 'O' then preAgg.recCount else 0 end ) as OrderCount,
SUM( case when preAgg.recSource = 'R' then preAgg.recCount else 0 end ) as RequestCount
from
( select
date(o.created_at) recDate,
'O' as recSource,
count(*) as recCount
from
orders o
where
o.user_id = 3
group by
date(o.created_at)
UNION ALL
select
date(r.created_at) recDate,
'R' as recSource,
count(*) as recCount
from
requests r
where
r.user_id = 3
group by
date(r.created_at) ) preAgg
group by
preAgg.recDate
order by
preAgg.recDate
For query optimization, I would ensure that your orders and requests tables both have an index on (user_id, created_at).
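Here is a minimal sketch of the pre-aggregate and roll-up query, run through Python's sqlite3 module as a stand-in for MySQL. This is an assumption: MySQL's DATE() and SQLite's date() both extract the date part of a datetime value. The table layout follows the column names used in the query, and the rows are invented. The user id is passed as a bound parameter instead of being hard-coded. The sketch also creates the suggested (user_id, created_at) indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, user_id INT, created_at TEXT);
    CREATE TABLE requests (id INTEGER PRIMARY KEY, user_id INT, created_at TEXT);
    CREATE INDEX orders_user_created   ON orders (user_id, created_at);
    CREATE INDEX requests_user_created ON requests (user_id, created_at);
    INSERT INTO orders (user_id, created_at) VALUES
        (3, '2017-03-01 10:00:00'), (3, '2017-03-01 12:00:00'),
        (3, '2017-03-02 09:00:00'), (5, '2017-03-01 08:00:00');
    INSERT INTO requests (user_id, created_at) VALUES
        (3, '2017-03-02 11:00:00'), (3, '2017-03-08 15:00:00'),
        (5, '2017-03-08 16:00:00');
""")

SQL = """
    SELECT preAgg.recDate,
           SUM(CASE WHEN preAgg.recSource = 'O' THEN preAgg.recCount ELSE 0 END) AS OrderCount,
           SUM(CASE WHEN preAgg.recSource = 'R' THEN preAgg.recCount ELSE 0 END) AS RequestCount
    FROM (
        -- per-day order counts for one user, tagged O
        SELECT date(o.created_at) AS recDate, 'O' AS recSource, COUNT(*) AS recCount
        FROM orders o
        WHERE o.user_id = ?
        GROUP BY date(o.created_at)
        UNION ALL
        -- per-day request counts for the same user, tagged R
        SELECT date(r.created_at) AS recDate, 'R' AS recSource, COUNT(*) AS recCount
        FROM requests r
        WHERE r.user_id = ?
        GROUP BY date(r.created_at)
    ) AS preAgg
    GROUP BY preAgg.recDate
    ORDER BY preAgg.recDate
"""

user_id = 3
rows = conn.execute(SQL, (user_id, user_id)).fetchall()
for day, orders, requests in rows:
    print(day, orders, requests)
```

Days that have only orders or only requests still appear, with a 0 in the other column. A join between the two tables would lose those days.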

You can use the following query:
SELECT
DATE(o.created_at) AS Day
,COUNT(r.id) AS TotalRequests
,COUNT(o.id) AS TotalOrders
FROM orders o
LEFT JOIN
requests r ON
r.id = o.request_id
WHERE o.user_id = 3
GROUP BY DATE(r.created_at), DATE(o.created_at),o.user_id

Fixing SQL Query so it will become more Efficient

I've got 3 tables:
1. mobile_users - with id, phone_type, ...
2+3. iphone_purchases and android_purchases - with id, status, user_id, ...
I am trying to get all of the users who made 2 or more purchases.
A successful purchase is identified by status > 0.
I am also trying to get the total number of users in the mobile_users table in the same query.
This is the query I came up with:
SELECT COUNT(*) AS `users`,
( SELECT COUNT(*)
FROM `mobile_users`
) AS `total`
FROM `mobile_users`
WHERE `mobile_users`.`phone_type` = 'iphone'
AND ( SELECT COUNT(*)
FROM ( SELECT `status`,
`user_id`
FROM `iphone_purchases`
UNION
SELECT `status`,
`user_id`
FROM `android_purchases`
) AS `purchase_list`
WHERE `purchase_list`.`status` > 0
AND `purchase_list`.`user_id` = `mobile_users`.`id`
) >= 2
It's very slow, and I have to find a way to improve it.
Any help would be appreciated!
Edit:
Also, please take into consideration that I'm building this query with subqueries in PHP.
I'm building it with more conditions in the WHERE clause.

Your query is just returning counts of users, not each user.
The following restructures your query. It counts the number of purchases for iphones and androids separately, and then combines them using left outer join. The where clause simply combines the counts:
select mu.*, i.cnt as iphones, a.cnt as androids
from mobile_users mu left outer join
(SELECT `user_id`, count(*) as cnt
FROM `iphone_purchases`
where `status` > 0
group by user_id
) i
on i.user_id = mu.id left outer join
(SELECT `user_id`, count(*) as cnt
FROM `android_purchases`
where `status` > 0
group by user_id
) a
on a.user_id = mu.id
where coalesce(i.cnt, 0) + coalesce(a.cnt, 0) >= 2;
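Here is a minimal sketch of this query with Python's sqlite3 module, standing in for MySQL. The phone_type values and purchase rows are invented. User 1 has two identical successful iPhone purchases. Counting per table keeps both, whereas the UNION in the question's subquery would collapse them into a single (status, user_id) row. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mobile_users (id INTEGER PRIMARY KEY, phone_type TEXT);
    CREATE TABLE iphone_purchases  (id INTEGER PRIMARY KEY, status INT, user_id INT);
    CREATE TABLE android_purchases (id INTEGER PRIMARY KEY, status INT, user_id INT);
    INSERT INTO mobile_users VALUES (1, 'iphone'), (2, 'android'), (3, 'iphone');
    INSERT INTO iphone_purchases (status, user_id) VALUES (1, 1), (1, 1), (0, 3), (1, 2);
    INSERT INTO android_purchases (status, user_id) VALUES (1, 2), (2, 3);
""")

# Count successful purchases per user in each table separately, then combine.
rows = conn.execute("""
    SELECT mu.*, i.cnt AS iphones, a.cnt AS androids
    FROM mobile_users mu
    LEFT OUTER JOIN (SELECT user_id, COUNT(*) AS cnt
                     FROM iphone_purchases
                     WHERE status > 0
                     GROUP BY user_id) i ON i.user_id = mu.id
    LEFT OUTER JOIN (SELECT user_id, COUNT(*) AS cnt
                     FROM android_purchases
                     WHERE status > 0
                     GROUP BY user_id) a ON a.user_id = mu.id
    WHERE COALESCE(i.cnt, 0) + COALESCE(a.cnt, 0) >= 2
    ORDER BY mu.id
""").fetchall()
print(rows)
```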
