Best way to subtract SUM and COUNT with 2 table select? - mysql

I have come up with a solution to count the total of fields based on a specific group, but it looks quite lengthy for the result I expect.
I have some basic knowledge of SQL.
Are there obvious improvements to be made, and why?
Why I would like to shorten this: it would be easier to implement in ORM-type systems.
Changing the schema is not an option.
Schema and sample data: http://sqlfiddle.com/#!9/62df6
Query I'm using:
SELECT s.release_id,
(s.shipments_total - IFNULL(sh.shipment_entries, 0)) AS shipments_left
FROM
( SELECT release_id,
SUM(shipments) AS shipments_total
FROM subscriptions
WHERE is_paid = 1
AND shipments > 1
GROUP BY release_id ) AS s
LEFT JOIN
( SELECT release_id,
COUNT(*) AS shipment_entries
FROM shipments
GROUP BY release_id ) AS sh ON s.release_id = sh.release_id
Expected result on sample data is in sqlfiddle.

If you bring the condition in-line and remove the group by, then you don't need ifnull():
SELECT s.release_id,
(SUM(s.Shipments) -
(SELECT COUNT(*)
FROM shipments sh
WHERE sh.release_id = s.release_id
)
) AS shipments_left
FROM subscriptions s
WHERE is_paid = 1 AND shipments > 1
GROUP BY s.release_id;
The subquery returns 0 if nothing matches, not NULL (with the GROUP BY, it would return NULL). I am not sure if this is easier with your ORM model. Your original version is fine from a SQL point of view.
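The difference between the two forms of the scalar subquery can be checked in isolation. The following is a minimal sketch, assuming a scratch table named demo_shipments (a hypothetical name, not part of the question's schema) and a release_id of 99 that has no rows:

```sql
-- Hypothetical scratch table, used only for this illustration.
CREATE TEMPORARY TABLE demo_shipments (release_id INT);
INSERT INTO demo_shipments VALUES (1), (1);

-- No GROUP BY: an aggregate over zero rows still yields one row, so this is 0.
SELECT (SELECT COUNT(*)
        FROM demo_shipments
        WHERE release_id = 99) AS without_group_by;

-- With GROUP BY: zero rows means zero groups, so the scalar subquery is NULL.
SELECT (SELECT COUNT(*)
        FROM demo_shipments
        WHERE release_id = 99
        GROUP BY release_id) AS with_group_by;
```

This is why the GROUP BY variant needs IFNULL() around the subquery while the plain COUNT(*) variant does not.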

You can bring the join inline instead:
SELECT s.release_id,
SUM(s.Shipments) - IFNULL(( SELECT COUNT(*) AS shipment_entries
FROM shipments sh
WHERE sh.release_id = s.release_id
GROUP BY sh.release_id ), 0) AS shipments_left
FROM subscriptions s
WHERE is_paid = 1
AND shipments > 1
GROUP BY s.release_id
The execution plan for this should be cheaper too; compare both versions with EXPLAIN on your own data.


Using the results of a function multiple times for duplicates - SQL

I am trying to produce a result that shows duplicates in a table. One method I found for getting duplicates and showing them is to run the select statement again through an inner join. However, one of my columns needs to be the result of a function, and the only thing I can think of is to use an alias, but I can't use the alias twice in a SELECT statement.
I am not sure of the best way to run this code to get the duplicates I need.
My code is below:
SELECT EXTRACT(YEAR_MONTH FROM date) as 'ndate', a.transponderID
FROM dispondo_prod_disposition.event a
inner JOIN (SELECT EXTRACT(YEAR_MONTH FROM date) as ???,
transponderID, COUNT(*)
FROM dispondo_prod_disposition.event
GROUP BY mdate, transponderID
HAVING count(*) > 1 ) b
ON ndate = ???
AND a.transponderID = b.transponderID
ORDER BY b.transponderID
SELECT b.ndate, transponderID
FROM dispondo_prod_disposition.event a
INNER JOIN ( SELECT EXTRACT(YEAR_MONTH FROM date) as ndate,
transponderID
FROM dispondo_prod_disposition.event
GROUP BY 1, 2
HAVING COUNT(*) > 1 ) b USING (transponderID)
WHERE b.ndate = ??? -- for example, WHERE b.ndate = 202201
ORDER BY transponderID
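A minimal sketch of one way around the alias problem, assuming the goal is to list every row whose (month, transponderID) pair occurs more than once. A SELECT alias cannot be referenced in the ON clause of the same query level, so the EXTRACT expression is repeated there, while the derived table exposes it under the alias ndate. Table and column names are taken from the question; joining on both columns is an assumption about the intended result.

```sql
SELECT b.ndate, a.transponderID
FROM dispondo_prod_disposition.event AS a
INNER JOIN (
    -- One row per (month, transponder) pair that appears more than once.
    SELECT EXTRACT(YEAR_MONTH FROM `date`) AS ndate,
           transponderID
    FROM dispondo_prod_disposition.event
    GROUP BY ndate, transponderID
    HAVING COUNT(*) > 1
) AS b
    -- Repeat the expression here; the outer alias is not visible in ON.
    ON EXTRACT(YEAR_MONTH FROM a.`date`) = b.ndate
   AND a.transponderID = b.transponderID
ORDER BY a.transponderID, b.ndate;
```

Matching on the month as well as the transponder keeps rows from other months out of the result.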

MySQL Inner join naming error?

http://sqlfiddle.com/#!9/e6effb/1
I'm trying to get a top 10 by revenue per brand for France in December.
There are 2 tables (the first table has the date, the second has the brand, and I'm trying to join them).
I get this error: "FUNCTION db_9_d870e5.SUM does not exist. Check the 'Function Name Parsing and Resolution' section in the Reference Manual"
Is my use of INNER JOIN there correct?
It's because you had an extra space after SUM. Please change it from
SUM (o1.total_net_revenue) to SUM(o1.total_net_revenue).
See the 'Function Name Parsing and Resolution' section of the MySQL Reference Manual for more about it.
After correcting that, your query still had another error: you were not selecting order_id in your intermediate table i2. Here is the edited query:
SELECT o1.order_id, o1.country, i2.brand,
SUM(o1.total_net_revenue)
FROM orders o1
INNER JOIN (
SELECT i1.brand, SUM(i1.net_revenue) AS total_net_revenue,order_id
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY 1,2,3
ORDER BY 4
LIMIT 10
--EDIT: stack Fan is correct that o1.total_net_revenue exists. My confusion was because the data structure duplicated three columns between the tables, including the one that was being looked for.
There were a couple of errors with your SQL statement:
1. You were referencing an invalid column in your outer-select SUM function. I believe you're actually after i2.total_net_revenue.
2. The table structure is terrible: the "important" columns (country, revenue, order_id) are duplicated between the two tables. I would also expect the revenue columns to share the same name if they always hold the same values. In the example, there's no difference between i1.net_revenue and o1.total_net_revenue.
3. In your inner join, you didn't reference i1.order_id, which meant that your "on" clause couldn't execute correctly.
PROTIP:
When you run into an issue like this, take all the complicated bits out of your query and get the base query working correctly first. THEN add your functions.
PROTIP:
In your GROUP BY clause, reference the actual columns, NOT the column numbers. It makes your query more robust.
This is the query I ended up with:
SELECT o1.order_id, o1.country, i2.brand,
SUM(i2.total_net_revenue) AS total_rev
FROM orders o1
INNER JOIN (
SELECT i1.order_id, i1.brand, SUM(i1.net_revenue) AS total_net_revenue
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY o1.order_id, o1.country, i2.brand
ORDER BY total_rev
LIMIT 10

SQL query needs optimization

SELECT LM.user_id,LM.users_lineup_id, min( LM.total_score ) AS total_score
FROM vi_lineup_master LM JOIN
vi_contest AS C
ON C.contest_unique_id = LM.contest_unique_id join
(SELECT min( total_score ) as total_score
FROM vi_lineup_master
GROUP BY group_unique_id
) as preq
ON LM.total_score = preq.total_score
WHERE LM.contest_unique_id = 'iledhSBDO' AND
C.league_contest_type = 1
GROUP BY group_unique_id
The above query finds the loser per group of a game. It returns accurate results, but it does not respond with large data. How can I optimize this?
You can try to move your JOINs to subqueries. Also, pay attention to the "wrong" GROUP BY usage in the outer query. In MySQL (when ONLY_FULL_GROUP_BY is disabled) you can group by some columns and select others that are not in the GROUP BY clause without any aggregate function, but the database can't guarantee which row's values it will return. For the sake of consistency in your application, wrap them in an aggregate function.
Check if this one helps:
SELECT
MIN(LM.user_id) AS user_id,
MIN(LM.users_lineup_id) AS users_lineup_id,
MIN(LM.total_score) AS total_score
FROM vi_lineup_master LM
WHERE 1=1
-- check if this "contest_unique_id" is equal
-- to 'iledhSBDO' for a "league_contest_type" valued 1
AND LM.contest_unique_id IN
(
SELECT C.contest_unique_id
FROM vi_contest AS C
WHERE 1=1
AND C.contest_unique_id = 'iledhSBDO'
AND C.league_contest_type = 1
)
-- check if this "total_score" is one of the
-- "min(total_score)" from each "group_unique_id"
AND LM.total_score IN
(
SELECT MIN(total_score)
FROM vi_lineup_master
GROUP BY group_unique_id
)
GROUP BY LM.group_unique_id
;
Also, some pieces of this query may seem redundant, but it's because I did not want to change the filters you wrote, just moved them.
Also, your query logic seems a bit strange to me, based on the tables/columns names and how you wrote it... please, check the comments in my query which reflects what I understood of your implementation.
Hope it helps.
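Note that applying MIN() to each column independently can combine values from different rows of the same group. The following is a sketch of an alternative, assuming the goal is the complete lowest-scoring row for each group of the given contest: compute the minimum score per group once, then join back on both the group and the score. The aliases m and min_score are introduced here for illustration; everything else comes from the question.

```sql
SELECT LM.group_unique_id,
       LM.user_id,
       LM.users_lineup_id,
       LM.total_score
FROM vi_lineup_master AS LM
JOIN vi_contest AS C
  ON C.contest_unique_id = LM.contest_unique_id
JOIN (
    -- Lowest score per group, restricted to the contest of interest
    -- so the derived table stays small.
    SELECT group_unique_id, MIN(total_score) AS min_score
    FROM vi_lineup_master
    WHERE contest_unique_id = 'iledhSBDO'
    GROUP BY group_unique_id
) AS m
  ON m.group_unique_id = LM.group_unique_id
 AND m.min_score = LM.total_score
WHERE LM.contest_unique_id = 'iledhSBDO'
  AND C.league_contest_type = 1;
```

If several lineups tie for the lowest score in a group, this returns all of them. An index on vi_lineup_master (contest_unique_id, group_unique_id, total_score) would likely help both the derived table and the join, though that is worth confirming with EXPLAIN.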

How to use actual row count (COUNT(*)) in WHERE clause without writing the same query as subquery?

I have something like this:
SELECT id, fruit, pip
FROM plant
WHERE COUNT(*) = 2;
This weird query is self-explanatory, I guess. COUNT(*) here means the number of rows in the plant table. My requirement is that I need to retrieve values from the specified fields only if the total number of rows in the table = 2. But this doesn't work: invalid use of aggregate function COUNT.
I cannot do this:
SELECT COUNT(*) as cnt, id, fruit, pip
FROM plant
WHERE cnt = 2;
for one, it limits the number of rows outputted to 1, and two, it gives the same error: invalid use of aggregate function.
What I can do is instead:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
But then that subquery is the main query re-run. I'm presenting here a small example of the larger part of the problem, though I know an additional COUNT(*) subquery in the given example isn't that big an overhead.
Edit: I do not know why the question is downvoted. The COUNT(*) I'm trying to get is from a view (a temporary table) in the query which is a large query with 5 to 6 joins and additional where clauses. To re-run the query as a subquery to get the count is inefficient, and I can see the bottleneck as well.
Here is the actual query:
SELECT U.UserName, E.Title, AE.Mode, AE.AttemptNo,
IF(AE.Completed = 1, 'Completed', 'Incomplete'),
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = @excAttemptId
) AS Inst_Count,
(
SELECT COUNT(DISTINCT(AQ.FK_QId))
FROM attempt_question AS AQ
JOIN `question` AS Q
ON Q.PK_Id = AQ.FK_QId
LEFT JOIN actions AS A
ON A.FK_QId = AQ.FK_QId
WHERE AQ.FK_ExcAttemptId = @excAttemptId
AND (
Q.Type = @descQtn
OR Q.Type = @actQtn
AND A.type = 'CTVI.NotImplemented'
AND A.IsDelete = @status
AND (
SELECT COUNT(*)
FROM actions
WHERE FK_QId = A.FK_QId
AND type != 'CTVI.NotImplemented'
AND IsDelete = @status
) = 0
)
) AS NotEvalInst_Count,
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = @excAttemptId
AND Mark = @mark
) AS CorrectAns_Count,
E.AllottedTime, AE.TimeTaken
FROM attempt_exercise AS AE
JOIN ctvi_exercise_tblexercise AS E
ON AE.FK_EId = E.PK_EId
JOIN ctvi_user_table AS U
ON AE.FK_UId = U.PK_Id
JOIN ctvi_grade AS G
ON AE.FK_GId = G.PK_GId
WHERE AE.PK_Id = @excAttemptId
-- AND COUNT(AE.*) = @number --the portion in contention.
Kindly ignore the above query and guide me to right direction from the small example query I posted, thanks.
In MySQL, you can only do what you tried:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
or this variation:
SELECT id, fruit, pip
FROM plant
JOIN
(
SELECT COUNT(*) AS cnt
FROM plant
) AS c
ON c.cnt = 2;
Whether the 1st or the 2nd is more efficient, depends on the version of MySQL (and the optimizer). I would bet on the 2nd one, on most versions.
In other DBMSs that have window functions, you can also do the first query that @Andomar suggests.
Here is a suggestion to reduce the cost of calculating the derived table twice, once to get the rows and once more to get the count. If the derived table is expensive to calculate and has thousands or millions of rows, calculating them twice only to throw them away is indeed a problem. This may improve efficiency, as it limits the rows calculated in each of the two passes to 3:
SELECT p.*
FROM
( SELECT id, fruit, pip
FROM plant
LIMIT 3
) AS p
JOIN
( SELECT COUNT(*) AS cnt
FROM
( SELECT 1
FROM plant
LIMIT 3
) AS tmp
) AS c
ON c.cnt = 2 ;
After re-reading your question, you're trying to return rows only if there are 2 rows in the entire table. In that case I think your own example query is already the best.
On another DBMS, you could use a Windowing function:
select *
from (
select *
, count(*) over () as cnt
from plant
) as SubQueryAlias
where cnt = 2
But the OVER clause is not supported in MySQL versions before 8.0.
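On MySQL 8.0 and later, which added window functions and common table expressions, the expensive query can be written once and counted without repeating its text. The following is a sketch under that version assumption; the CTE name base is hypothetical and stands in for the large query, with plant used as the placeholder from the small example.

```sql
WITH base AS (
    -- Stand-in for the large query; it appears only once in the statement.
    SELECT id, fruit, pip
    FROM plant
)
SELECT b.id, b.fruit, b.pip
FROM (
    -- COUNT(*) OVER () attaches the total row count of base to every row.
    SELECT base.*, COUNT(*) OVER () AS cnt
    FROM base
) AS b
WHERE b.cnt = 2;
```

The filter has to sit in an outer query because window functions are evaluated after WHERE and cannot be referenced in it directly.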
Old, wrong answer below:
The where clause works before grouping. It works on single rows, not groups of rows, so you can't use aggregates like count or max in the where clause.
To set filters that work on groups of rows, use the having clause. It works after grouping and can be used to filter with aggregates:
SELECT id, fruit, pip
FROM plant
GROUP BY
id, fruit, pip
HAVING COUNT(*) = 2;
The other answers do not fulfill the original question which was to filter the results "without using a subquery".
You can actually do this by using a variable in 2 consecutive MySQL statements:
SET @count = 0;
SELECT * FROM
(
SELECT id, fruit, pip, @count := @count + 1 AS count
FROM plant
) tmp
WHERE @count = 2;

optimize Mysql: get latest status of the sale

In the following query, I show the latest status of the sale (by stage, in this case the number 3). The query is based on a subquery in the status history of the sale:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 sec and returns 1011 records.
Because I have to filter the sales by the name of the state, which would mean repeating the subquery in a WHERE clause, I decided to rewrite the query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query shows the same results (1011). But the problem is it takes 0.0753 sec
Reviewing the possibilities I have found the factor that makes the difference in the speed of the query:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time. Why does it work better? Is there any way to use this clause with the joins? I would appreciate your help.
EDIT
I will show the results of EXPLAIN for each query respectively (q1 and q2; the output itself is missing here).
Interesting, so that little statement basically determines whether there is a match between t_record.id_sale and t_sale.id_sale.
Why is this making your query run faster? Because WHERE conditions are applied before the subselects in the SELECT list are evaluated, so if there is no record to go with the sale, the subselect is never processed for that row. That saves you some time, which is why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against, but you can always append it to the WHERE clause and find out. Add the keyword EXPLAIN to the beginning of your query and you will get an execution plan, which will help you optimize things. Probably the best way to get better results with your join syntax is to add some indexes to your tables.
But I ask you, is this even necessary? You have a query returning in less than 8 hundredths of a second. Unless this query is being run thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.
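A minimal sketch of both suggestions, assuming t_record has no suitable index yet. The index name is made up for illustration; the table and column names are taken from the question's queries.

```sql
-- Hypothetical index (skip it if an equivalent one already exists).
-- It serves the EXISTS lookup by id_sale and the "latest id_record
-- per sale" lookup.
CREATE INDEX idx_t_record_sale_record ON t_record (id_sale, id_record);

-- Prefixing the statement with EXPLAIN shows the chosen plan
-- instead of returning the rows.
EXPLAIN
SELECT v.id_sale
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE v.flag = 1
  AND v.id_quarters = 4
  AND EXISTS (
      SELECT 1
      FROM t_record r
      WHERE r.id_sale = v.id_sale
  );
```

Running EXPLAIN before and after creating the index shows whether the optimizer actually uses it for the t_record lookup.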