MariaDB (MySQL) "AVG" not working with HAVING

My question is: why do the following two SQL statements produce different results? (I explain both afterwards. Tested with the MariaDB bundled with XAMPP 7.0.8.)
(1)
SELECT
    stock_exchange_code,
    summed
FROM (
    SELECT
        stock_exchange_code,
        summed
    FROM (
        SELECT
            STOCK_EXCHANGE_CODE,
            sum(SHARE_PRICE * SHARE_CNT) AS summed
        FROM LISTED_AT
        WHERE DATE_VALID = STR_TO_DATE('04-12-2015', '%d-%m-%Y')
        GROUP BY STOCK_EXCHANGE_CODE
    ) a
) b
HAVING summed > avg(summed)
(2)
SELECT
    stock_exchange_code,
    summed
FROM (
    SELECT
        STOCK_EXCHANGE_CODE,
        sum(SHARE_PRICE * SHARE_CNT) AS summed
    FROM LISTED_AT
    WHERE DATE_VALID = STR_TO_DATE('04-12-2015', '%d-%m-%Y')
    GROUP BY STOCK_EXCHANGE_CODE
) a
WHERE summed > (SELECT avg(a.summed)
                FROM (SELECT
                          sum(SHARE_PRICE * SHARE_CNT) AS summed
                      FROM LISTED_AT
                      WHERE DATE_VALID = STR_TO_DATE('04-12-2015', '%d-%m-%Y')
                      GROUP BY STOCK_EXCHANGE_CODE) a)
Result of those queries:
(1) will give you an empty set (I do not understand why)
(2) will give you 2 rows, which is the correct answer
Explanation of the 2 Select statements:
SELECT
    STOCK_EXCHANGE_CODE,
    sum(SHARE_PRICE * SHARE_CNT) AS summed
FROM LISTED_AT
WHERE DATE_VALID = STR_TO_DATE('04-12-2015', '%d-%m-%Y')
GROUP BY STOCK_EXCHANGE_CODE
This is the part of the SELECT statement that sums up all share values per stock exchange on a specific day.
The output is:
BRX 122653.50
L&S 275000.00
MXK 500000.00
STU 140415.00
XETRA 254610.00
And AVG(summed) = 258535.7
With statement (1) (which I tried first), I wrap the query in an outer SELECT to be certain that the grouping is global. Looking at it now, there is one unnecessary "select all columns by name" level, but this should not matter here. With the outer SELECT I try to apply the HAVING clause.
I want all stock exchanges whose summed value ("summed") on a specific day is above average. As far as I understand HAVING, it should calculate the global average (of the 5 stock exchanges above) and compare each row against that.
I do not know why this is not working. Changing summed > avg(summed) to summed <> avg(summed) returns one row (BRX 122653.50).
summed > 0 results in all 5 rows returned.
This is why I think it is the average that does not work with HAVING, and not the other way round.
(2)
This is quite the same as the first, replacing the HAVING clause with a more explicit average calculation. As you can see, there are 2 subqueries named "a" and both are the same (the second one lacks the stock_exchange_code field). Practically this query is identical to the first one, with worse code quality (duplication).
My question is: to me the 2 queries should return identical results. Why do they return different results?
tl;dr
Average or HAVING clause does not seem to work in MySQL (MariaDB). Why don't the 2 SQL statements at the beginning return the same result?

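Using an aggregate function without GROUP BY groups the entire table into one row. That is, HAVING summed > avg(summed) reduces the result to a single row, not to some subset of the rows. In that single row, the non-aggregated summed is taken from an arbitrary row (MariaDB allows this unless ONLY_FULL_GROUP_BY is enabled); judging from your <> experiment, it was BRX's 122653.50, which is below the average, so the result is empty. Hence, #1 is not useful.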
In the second query, spelling out avg(summed) as a scalar subquery (SELECT ...) produces a single value that is then compared against each row.
It seems that you have an extra level of SELECTs in both queries.
You can use EXPLAIN SELECT ... to get more clues on what is going on.
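To see the scalar-subquery approach run end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MariaDB. The rows are hypothetical, picked only to reproduce the per-exchange totals from the question, and the date is stored as an ISO string instead of going through STR_TO_DATE. It shows the subquery written once as a CTE, plus a window-function variant where AVG(...) OVER () attaches the overall average to every row rather than collapsing the rows. Both forms need MariaDB 10.2+ or MySQL 8.0+ (and SQLite 3.25+ for the window function); on older servers, the duplicated derived table from query (2) is the portable fallback.

```python
import sqlite3

# Hypothetical rows (not the asker's real data): one holding per exchange,
# chosen so SUM(share_price * share_cnt) reproduces the posted totals.
# The 2015-12-05 row must be excluded by the date filter.
ROWS = [
    ("BRX", 122653.50, 1, "2015-12-04"),
    ("L&S", 275000.00, 1, "2015-12-04"),
    ("MXK", 500000.00, 1, "2015-12-04"),
    ("STU", 140415.00, 1, "2015-12-04"),
    ("XETRA", 254610.00, 1, "2015-12-04"),
    ("MXK", 999999.00, 1, "2015-12-05"),
]

# The per-exchange totals are defined once in a CTE; AVG() runs in its own
# scalar subquery, so it does not collapse the outer result to one row.
CTE_QUERY = """
WITH a AS (
    SELECT stock_exchange_code, SUM(share_price * share_cnt) AS summed
    FROM listed_at
    WHERE date_valid = '2015-12-04'
    GROUP BY stock_exchange_code
)
SELECT stock_exchange_code, summed
FROM a
WHERE summed > (SELECT AVG(summed) FROM a)
ORDER BY stock_exchange_code
"""

# Window-function variant: AVG(...) OVER () adds the overall average as a
# column on every row instead of grouping the rows together.
WINDOW_QUERY = """
SELECT stock_exchange_code, summed
FROM (
    SELECT stock_exchange_code, summed, AVG(summed) OVER () AS avg_summed
    FROM (
        SELECT stock_exchange_code, SUM(share_price * share_cnt) AS summed
        FROM listed_at
        WHERE date_valid = '2015-12-04'
        GROUP BY stock_exchange_code
    ) AS a
) AS w
WHERE summed > avg_summed
ORDER BY stock_exchange_code
"""


def above_average(query):
    """Load ROWS into an in-memory table and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE listed_at (stock_exchange_code TEXT, share_price REAL, "
        "share_cnt INTEGER, date_valid TEXT)"
    )
    conn.executemany("INSERT INTO listed_at VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(above_average(CTE_QUERY))     # [('L&S', 275000.0), ('MXK', 500000.0)]
print(above_average(WINDOW_QUERY))  # the same two rows
```

Both queries return the two exchanges above the 258535.7 average, matching the result of query (2).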

Related

Limiting the count query in MySQL?

I am trying to do a simple test where I'm pulling from a table the information of a specific part number as such:
SELECT *
FROM table_name
WHERE part_no IN ('abc123')
This returns 25 rows. Now I want to count the rows that meet the "accepted" condition in a specific column, but only among the 10 most recent rows. My approach is to write it as follows:
Select Count(*)
FROM table_name
WHERE part_no IN ('abc123') AND lot IN ('accepted')
ORDER BY date DESC
LIMIT 10
I'm having a hard time getting the ORDER BY and LIMIT operations to work. I could use help just getting it to limit appropriately, and I can figure out the rest from there.
Edit: I understand that the operations are applied to the COUNT, which returns only one row with a value, but I included the second snippet to show where I am stuck in my thought process.
Your query SELECT Count(*) FROM ... will always return exactly one row.
It's not 100% clear what exactly you want to do, but if you want to know how many of the last 10 have been accepted, you could use a subquery - something like:
SELECT COUNT(*) FROM (
    SELECT lot
    FROM table_name
    WHERE part_no IN ('abc123')
    ORDER BY date DESC
    LIMIT 10
) AS recent -- MySQL requires an alias for every derived table
WHERE lot IN ('accepted')
The inner query will return the 10 most recent rows for part abc123, then the outer query will count the accepted ones.
There are also other solutions (for example, you could have the inner query output a field that is 0 when the part is not accepted and 1 when it is accepted, then take the sum). Depending on which exact dialect/database you are using, you may also have more elegant options.
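A small sketch comparing the original attempt with the subquery and the 0/1-sum variant, using Python's sqlite3 as a stand-in for MySQL. The data is hypothetical: 15 daily rows for abc123, accepted on even days. The original attempt is shown without its ORDER BY, since the point is that LIMIT only ever sees the single COUNT(*) row. In MySQL the flag can also be written SUM(lot = 'accepted'), because comparisons evaluate to 1 or 0 there.

```python
import sqlite3

# Hypothetical data: part 'abc123' has one row per day for 2021-01-01..15;
# even days are 'accepted', odd days 'rejected'. The extra part checks that
# the part_no filter is applied.
ROWS = [
    ("abc123", f"2021-01-{day:02d}", "accepted" if day % 2 == 0 else "rejected")
    for day in range(1, 16)
] + [("xyz999", "2021-01-20", "accepted")]

# Original attempt (minus ORDER BY): LIMIT 10 applies to the single
# COUNT(*) result row, so every accepted abc123 row is counted.
NAIVE = """
SELECT COUNT(*)
FROM table_name
WHERE part_no IN ('abc123') AND lot IN ('accepted')
LIMIT 10
"""

# Subquery: pick the 10 most recent rows first, then count accepted ones.
SUBQUERY = """
SELECT COUNT(*)
FROM (
    SELECT lot
    FROM table_name
    WHERE part_no IN ('abc123')
    ORDER BY date DESC
    LIMIT 10
) AS recent
WHERE lot IN ('accepted')
"""

# 0/1 flag variant: sum a flag over the same 10 rows.
FLAG_SUM = """
SELECT SUM(CASE WHEN lot = 'accepted' THEN 1 ELSE 0 END)
FROM (
    SELECT lot
    FROM table_name
    WHERE part_no = 'abc123'
    ORDER BY date DESC
    LIMIT 10
) AS recent
"""


def scalar(query):
    """Run a single-value query against an in-memory copy of ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (part_no TEXT, date TEXT, lot TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", ROWS)
    (value,) = conn.execute(query).fetchone()
    conn.close()
    return value


print(scalar(NAIVE), scalar(SUBQUERY), scalar(FLAG_SUM))  # 7 5 5
```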
SELECT COUNT(*) returns ONE ROW, so ORDER BY and LIMIT are applied to that single result row, not to the rows being counted.

Mysql DISTINCT with more than one column (remove duplicates)

My table is called training_session.
I am trying to print out some information from my data, but I do not want any duplicates. I still get duplicates somehow; can someone tell me what I am doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it does not work.
I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
They can look similar, but they serve different purposes.
Use DISTINCT when you want to show a set of columns without any combination of values repeating. It involves no calculations or aggregate (group) functions. DISTINCT applies to the whole selected row, so adding more columns to your SELECT generally returns more rows, because every distinct combination of values is kept.
Use GROUP BY when you want one row per distinct value of certain columns, together with group functions that calculate data for each group. In other words, use GROUP BY when you need group functions.
See the group functions you can use here:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
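The difference can be seen on a tiny, hypothetical training_session table. This sketch uses Python's sqlite3 as a stand-in for MySQL; DISTINCT and GROUP BY behave the same way for these queries.

```python
import sqlite3

# Hypothetical training_session rows: (athlete_id, duration).
ROWS = [(1, 30), (1, 45), (1, 30), (2, 60), (2, 20)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER)")
conn.executemany("INSERT INTO training_session VALUES (?, ?)", ROWS)

# DISTINCT on one column: one row per athlete.
one_col = conn.execute(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id"
).fetchall()

# DISTINCT on two columns: one row per distinct (athlete_id, duration) pair,
# so an athlete appears once for every different duration.
two_cols = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session "
    "ORDER BY athlete_id, duration"
).fetchall()

# GROUP BY: one row per athlete, with aggregates computed over each group.
grouped = conn.execute(
    "SELECT athlete_id, COUNT(*), MAX(duration) FROM training_session "
    "GROUP BY athlete_id ORDER BY athlete_id"
).fetchall()
conn.close()

print(one_col)   # [(1,), (2,)]
print(two_cols)  # [(1, 30), (1, 45), (2, 20), (2, 60)]
print(grouped)   # [(1, 3, 45), (2, 2, 60)]
```

Only the exact duplicate (1, 30) is removed by the two-column DISTINCT. GROUP BY is the form that gives one row per athlete.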
EDIT 1:
It seems like you are trying to get the "latest" duration of each athlete. I'll assume that scenario, given that there is no ID column.
Here is my alternate solution:
SELECT a.athlete_id,
       ( SELECT b.duration
         FROM training_session AS b
         WHERE b.athlete_id = a.athlete_id -- connect to the outer row
         ORDER BY [latest column to sort] DESC -- placeholder: the column that defines "latest"
         LIMIT 1
       ) last_duration
FROM training_session AS a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This is a subquery in the SELECT list (a correlated scalar subquery). With the help of LIMIT 1, it returns only the topmost record. A subquery used this way must return at most one row; otherwise MySQL raises error 1242 (Subquery returns more than 1 row).
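A runnable sketch of the correlated subquery above, using Python's sqlite3 as a stand-in for MySQL. The session_date column is a hypothetical stand-in for "[latest column to sort]"; the answer does not name the real column, and the rows are invented.

```python
import sqlite3

# Hypothetical rows: (athlete_id, duration, session_date). session_date is
# an assumed column standing in for "[latest column to sort]".
ROWS = [
    (1, 30, "2021-03-01"),
    (1, 45, "2021-03-05"),
    (1, 30, "2021-03-03"),
    (2, 60, "2021-03-02"),
    (2, 20, "2021-03-04"),
]

QUERY = """
SELECT a.athlete_id,
       (SELECT b.duration
        FROM training_session AS b
        WHERE b.athlete_id = a.athlete_id  -- correlate with the outer row
        ORDER BY b.session_date DESC       -- newest session first
        LIMIT 1) AS last_duration          -- at most one row
FROM training_session AS a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE training_session "
    "(athlete_id INTEGER, duration INTEGER, session_date TEXT)"
)
conn.executemany("INSERT INTO training_session VALUES (?, ?, ?)", ROWS)
latest = conn.execute(QUERY).fetchall()
conn.close()

print(latest)  # [(1, 45), (2, 20)]
```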
MySQL's DISTINCT clause is used to filter out duplicate rows.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, duration), DISTINCT applies to the combination of athlete_id and duration, so each returned record is unique only as a whole, hence the results you're getting. In other words, the query is working correctly.

Specific where clause in Mysql query

So I have a MySQL table with over 9 million records. They are call records. Each record represents 1 individual call. The columns are as follows:
CUSTOMER
RAW_SECS
TERM_TRUNK
CALL_DATE
There are others but these are the ones I will be using.
So I need to count the total number of calls for a certain week on a certain term trunk. I then need to sum up the number of seconds for those calls. Then I need to count the total number of calls that were shorter than 7 seconds. I always do this in 2 queries and combine them, but I was wondering if there is a way to do it in one. I'm new to MySQL, so I'm sure my syntax is horrific, but here is what I do...
Query 1:
SELECT CUSTOMER, SUM(RAW_SECS), COUNT(*)
FROM Mytable
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY CUSTOMER;
Query 2:
SELECT CUSTOMER, COUNT(*)
FROM Mytable2
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2') AND RAW_SECS < 7
GROUP BY CUSTOMER;
Is there any way to combine these two queries into one? Or maybe just a better way of doing it? I appreciate all the help!
There are 2 ways of achieving the expected outcome in a single query:
conditional counting: use a case expression or if() function within the count() (or sum()) to count only specific records
use a self join: LEFT JOIN the table to itself on the table's id field, and in the join condition restrict the right-hand alias to calls shorter than 7 seconds
The advantage of the 2nd approach is that you may be able to use indexes to speed it up, while the conditional counting cannot use indexes.
SELECT m1.CUSTOMER, SUM(m1.RAW_SECS), COUNT(m1.CUSTOMER), COUNT(m2.CUSTOMER)
FROM Mytable m1
LEFT JOIN Mytable m2 ON m1.id = m2.id AND m2.RAW_SECS < 7
WHERE m1.TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2') -- qualified: both aliases have TERM_TRUNK
GROUP BY m1.CUSTOMER;
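The conditional-counting option is described above but not shown, so here is a sketch of it next to the self-join, using Python's sqlite3 as a stand-in for MySQL. The call records and the id column are hypothetical, and the week filter on CALL_DATE is left out, as in the original queries. In MySQL, SUM(RAW_SECS < 7) would also work because comparisons evaluate to 1 or 0; CASE is used here for portability.

```python
import sqlite3

# Hypothetical call records: (id, customer, raw_secs, term_trunk).
ROWS = [
    (1, "A", 3, "Mytrunk1"),
    (2, "A", 10, "Mytrunk1"),
    (3, "A", 5, "Mytrunk2"),
    (4, "A", 2, "Othertrunk"),  # excluded by the trunk filter
    (5, "B", 30, "Mytrunk2"),
    (6, "B", 6, "Mytrunk1"),
]

# Conditional counting: the CASE yields 1 only for calls under 7 seconds.
CONDITIONAL = """
SELECT customer,
       SUM(raw_secs) AS total_secs,
       COUNT(*) AS calls,
       SUM(CASE WHEN raw_secs < 7 THEN 1 ELSE 0 END) AS short_calls
FROM mytable
WHERE term_trunk IN ('Mytrunk1', 'Mytrunk2')
GROUP BY customer
ORDER BY customer
"""

# Self-join from the answer, with qualified column names.
SELF_JOIN = """
SELECT m1.customer, SUM(m1.raw_secs), COUNT(m1.customer), COUNT(m2.customer)
FROM mytable m1
LEFT JOIN mytable m2 ON m1.id = m2.id AND m2.raw_secs < 7
WHERE m1.term_trunk IN ('Mytrunk1', 'Mytrunk2')
GROUP BY m1.customer
ORDER BY m1.customer
"""


def run(query):
    """Run a query against an in-memory copy of ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, customer TEXT, "
        "raw_secs INTEGER, term_trunk TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(CONDITIONAL))  # [('A', 18, 3, 2), ('B', 36, 2, 1)]
print(run(SELF_JOIN))    # same rows
```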

SQL Error (1242): Subquery returns more than 1 row - Not fixable with (IN) or a JOIN (I think?)

All - I'm trying to do a basic (in theory) on-time completion report. I'd like to list
assigned_to_id | Percent on-time (as a percent - but this is not important now)
I figure I tell MySQL to get the count of all tasks and a list of all tasks marked closed on a date prior to the due date, and give me that number... Seems simple?
I'm a sysadmin - not a SQL Developer so excuse the grossness to follow!
I've got
select issues.assigned_to_id, tb1.percent from (
Select
(select count(*) from issues where issues.due_date >= date(issues.closed_on) group by issues.assigned_to_id)/
(select count(*) from issues group by issues.assigned_to_id) as percent
from issues)
as tb1
group by tb1.percent;
It's been mixed up a bit by my attempts to solve the multiple-rows issue, so it may be even worse off than when I started, but if I could get a list of users with their percentage that would be great!
I'd love to use something like a "for each", but I know that doesn't exist.
Thanks!
You've got a division operation, e.g. (foo) / (bar), and both the numerator and the denominator are subqueries. Since you're expecting to take those subqueries and divide their answers, they MUST return a SINGLE value each, e.g. 1 / 2.
The error message indicates that one (or probably both) is returning a multi-value query result, so in effect you're trying to do something like 1,2,3 / 4,5,6, which is not a valid math operation, and you end up with your error message.
Fix the subqueries so they return only a SINGLE value each.
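SQL Server's CROSS/OUTER APPLY matches this case. MySQL has no APPLY keyword, but its equivalent is a LATERAL derived table (MySQL 8.0.14 and later). The SQL Server shape is:
SELECT T.*,Data.Value FROM [Table] T OUTER APPLY
You can try the LATERAL form in MySQL.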
What you should probably be doing is using the IF() function inside SUM():
SELECT
assigned_to_id,
SUM(IF(due_date >= date(closed_on), 1, 0))/SUM(1) AS percent
FROM issues
GROUP BY assigned_to_id
ORDER BY percent DESC
Note here I am grouping by assigned_to_id and ordering by percent. This allows you to calculate the percentage for each assigned_to_id group and order those groups by percent.
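The per-group percentage can be checked with a small sketch in Python's sqlite3, standing in for MySQL; the issue rows are hypothetical. The sketch swaps MySQL's IF() for the portable CASE expression, and it multiplies by 1.0 because SQLite truncates integer / integer, whereas MySQL's / already returns a decimal. COUNT(*) is used in place of SUM(1); both count the rows in each group.

```python
import sqlite3

# Hypothetical issues: (assigned_to_id, due_date, closed_on).
ROWS = [
    (1, "2020-01-05", "2020-01-05 10:00:00"),  # on time (same day)
    (1, "2020-01-10", "2020-01-08 09:00:00"),  # on time
    (1, "2020-01-10", "2020-01-02 17:30:00"),  # on time
    (1, "2020-01-03", "2020-01-04 08:00:00"),  # late
    (2, "2020-02-01", "2020-01-31 12:00:00"),  # on time
    (2, "2020-02-01", "2020-02-03 12:00:00"),  # late
]

# CASE replaces MySQL's IF(); "* 1.0" forces real division in SQLite.
QUERY = """
SELECT assigned_to_id,
       SUM(CASE WHEN due_date >= date(closed_on) THEN 1 ELSE 0 END) * 1.0
           / COUNT(*) AS percent
FROM issues
GROUP BY assigned_to_id
ORDER BY percent DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (assigned_to_id INTEGER, due_date TEXT, closed_on TEXT)")
conn.executemany("INSERT INTO issues VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()

print(result)  # [(1, 0.75), (2, 0.5)]
```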
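If we rewrite your query as literally as possible, the two subqueries have to be correlated with the outer assigned_to_id; otherwise every user gets the same overall ratio. Try this:
select issues.assigned_to_id,
  (select count(*) from issues i2 where i2.assigned_to_id = issues.assigned_to_id and i2.due_date >= date(i2.closed_on)) /
  (select count(*) from issues i3 where i3.assigned_to_id = issues.assigned_to_id) as perc
from issues group by issues.assigned_to_id;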
I think you want something like this: a list of assignee ids and the percentage done on time.
select distinct issues.assigned_to_id, done_on_time.c / issue_count.c as percent
from issues
join (select issues.assigned_to_id, count(*) as c
      from issues
      where issues.due_date >= date(issues.closed_on)
      group by issues.assigned_to_id) as done_on_time
  on issues.assigned_to_id = done_on_time.assigned_to_id -- inner join: assignees with no on-time issues are omitted
join (select issues.assigned_to_id, count(*) as c
      from issues
      group by issues.assigned_to_id) as issue_count
  on issues.assigned_to_id = issue_count.assigned_to_id;

Performing arithmetic operations on derived SQL values

I'm doing some statistics based on a database of states. I would like to output the rank of a state and its percentage compared to the other states (i.e. state X's value is higher than 55% of the other states' values).
I'm trying something like this:
SELECT
count(*) AS TotalStates,
(SELECT COUNT(*) FROM states) AS NumberStates,
(TotalStates/NumStates) AS percentage
FROM states
WHERE CRITERIA > 7.5
I'm getting an SQL error, TotalStates (my derived value) is not found. How can I get all three of these values returned with one query?
You can put the main calculations in a subselect, then reference the aliased columns in the outer query, both to pull the already calculated values and to obtain another one from them:
SELECT
TotalStates,
NumberStates,
TotalStates / NumberStates AS percentage
FROM (
SELECT
COUNT(*) AS TotalStates,
(SELECT COUNT(*) FROM states) AS NumberStates
FROM states
WHERE CRITERIA > 7.5
) s
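A quick check of the derived-table approach, using Python's sqlite3 as a stand-in for MySQL with ten hypothetical criteria values. One deliberate difference from the query above: SQLite truncates integer / integer, so the sketch writes TotalStates * 1.0 / NumberStates. In MySQL, plain TotalStates / NumberStates already returns a decimal.

```python
import sqlite3

# Hypothetical criteria values for ten states; four are strictly above 7.5.
VALUES = [3.0, 8.1, 5.5, 9.0, 7.5, 2.2, 7.6, 6.0, 10.0, 4.4]

# The aliases are created in the inner SELECT and referenced in the outer one.
QUERY = """
SELECT TotalStates,
       NumberStates,
       TotalStates * 1.0 / NumberStates AS percentage
FROM (
    SELECT COUNT(*) AS TotalStates,
           (SELECT COUNT(*) FROM states) AS NumberStates
    FROM states
    WHERE CRITERIA > 7.5
) s
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (criteria REAL)")
conn.executemany("INSERT INTO states VALUES (?)", [(v,) for v in VALUES])
result = conn.execute(QUERY).fetchall()
conn.close()

print(result)  # [(4, 10, 0.4)]
```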
The error you are getting comes from trying to use a derived value (a column alias) in the same SELECT clause that creates it. You will need to do something along these lines:
SELECT count(*) as TotalStates,
(SELECT count(*) from states) as NumberStates,
(count(*)/(SELECT count(*) from states)) as percentage
FROM states
WHERE criteria = x
However, this is not very efficient or desirable for readability or maintainability. Is there a design reason that you cannot perform this in two queries, or better yet, get the two data items in separate queries and calculate the percentage in the consuming code?