Average on a count() in same query - mysql

I'm currently working on an assignment which requires me to find the average on the number of resources for each module. The current table looks like this:
ResourceID ModulID
1 1
2 7
3 2
4 4
5 1
6 1
So basically, I'm trying to figure out how to get the average number of resources. The only
relevant test data here is for module 1, which has 3 different resources connected to it. But I need to display all of the results.
This is my code:
select avg(a.ress) GjSnitt, modulID
from
(select count(ressursID) as ress
from ressursertiloppgave
group by modulID) as a, ressursertiloppgave r
group by modulID;
Obviously it isn't working, but I'm currently at loss on what to change at this point. I would really appreciate any input you guys have.

This is the query you are executing, written in a slightly less obtuse syntax.
SELECT
avg(a.ress) as GjSnitt
, modulID
FROM
(SELECT COUNT(ressursID) as ress
FROM ressursertiloppgave
GROUP BY modulID) as a
CROSS JOIN ressursertiloppgave r <--- Cross join are very very rare!
GROUP BY modulID;
You are cross joining the table, making (6x6=) 36 rows in total and condensing this down to 4, but because the total count is 36, the outcome is wrong.
This is why you should never use implicit joins.
Rewrite the query to:
SELECT AVG(a.rcount) FROM
(select count(*) as rcount
FROM ressursertiloppgave r
GROUP BY r.ModulID) a
If you want the individual rowcount and the average at the bottom do:
SELECT r1.ModulID, count(*) as rcount
FROM ressursertiloppgave r1
GROUP BY r1.ModulID
UNION ALL
SELECT 'avg = ', AVG(a.rcount) FROM
(select count(*) as rcount
FROM ressursertiloppgave r2
GROUP BY r2.ModulID) a

I got the solution
SELECT AVG(counter)
FROM
(
SELECT COUNT(column to count) AS counter FROM table
) AS counter
Note that the nickname {counter} was added in SELECT COUNT and at the end of the inner SELECT

Related

Adding Row Values when there are no results - MySQL

Problem Statement: I need my result set to include records that would not naturally return because they are NULL.
I'm going to put some simplified code here since my code seems to be too long.
Table Scores has Company_type, Company, Score, Project_ID
Select Score, Count(Project_ID)
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
Results in the following:
Score Projects
5 95
4 94
3 215
2 51
1 155
Everything is working fine until I apply a condition to company_type that does not include results in one of the 5 score categories. When this happens, I don't have 5 rows in my result set any more.
It displays like this:
Score Projects
5 5
3 6
1 3
I'd like it to display like this:
Score Projects
5 5
4 0
3 6
2 0
1 3
I need the results to always display 5 rows. (Scores = 1-5)
I tried one of the approaches below by Spencer7593. My simplified query now looks like this:
SELECT i.score AS Score, IFNULL(count(*), 0) AS Projects
FROM (SELECT 5 AS score
UNION ALL
SELECT 4
UNION ALL
SELECT 3
UNION ALL
SELECT 2
UNION ALL
SELECT 1) i
LEFT JOIN Scores ON Scores.score = i.score
GROUP BY Score
ORDER BY i.score DESC
And gives the following results, which is accurate except that the rows with 1 in Projects should actually be 0 because they are derived by the "i". There are no projects with a score of 5 or 2.
Score Projects
5 1
4 5
3 6
2 1
1 3
Solved! I just needed to adjust my count to specifically look at the project count - count(project) rather than count(*). This returned the expected results.
If you always want your query to return 5 rows, with Score values of 5,4,3,2,1... you'll need a rowsource that supplies those Score values.
One approach would be to use a simple query to return those fixed values, e.g.
SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
Then use that query as inline view, and do an outer join operation to the results from your current query
SELECT i.score AS `Score`
, IFNULL(q.projects,0) AS `Projects`
FROM ( SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
) i
LEFT
JOIN (
-- the current query with "missing" Score rows goes here
-- for completeness of this example, without the query
-- we emulate that result with a different query
SELECT 5 AS score, 95 AS projects
UNION ALL SELECT 3, 215
UNION ALL SELECT 1, 155
) q
ON q.score = i.score
ORDER BY i.score DESC
It doesn't have to be the view query in this example. But there does need to be a rowsource that the rows can be returned from. You could, for example, have a simple table that contains those five rows, with those five score values.
This is just an example approach for the general approach. It might be possible to modify your existing query to return the rows you want. But without seeing the query, the schema, and example data, we can't tell.
FOLLOWUP
Based on the edit to the question, showing an example of the current query.
If we are guaranteed that the five values of Score will always appear in the Scores table, we could do conditional aggregation, writing a query like this:
SELECT s.score
, COUNT(IF(s.company_type = :company_type,s.project_id,NULL)) AS projects
FROM Scores s
GROUP BY s.score
ORDER BY s.score DESC
Note that this will require a scan of all the rows, so it may not perform as well. The "trick" is the IF function, which returns a NULL value in place of project_id, when the row would have been excluded by the WHERE clause.)
If we are guaranteed that project_id is non-NULL, we could use a more terse MySQL shorthand expression to achieve an equivalent result...
, IFNULL(SUM(s.company_type = :company_type),0) AS projects
This works because MySQL returns 1 when the comparison is TRUE, and otherwisee returns 0 or NULL.
Try something like this:
select distinct score
from (
select distinct score from scores
) s
left outer join (
Select Score, Count(Project_ID) cnt
FROM Scores
WHERE company_type= :company_type
) x
on s.score = x.score
Your posted query would not work without a group by statement. However, even there, if you don't have those particular scores for that company type, it wouldn't work either.
One option is to use an outer join. That would require a little more work though.
Here's another option using conditional aggregation:
select Score, sum(company_type=:company_type)
from Scores
group by Score

Mysql: Order by max N values from subquery

I'm about to throw in the towel with this.
Preface: I want to make this work with any N, but for the sake of simplicity, I'll set N to be 3.
I've got a query (MySQL, specifically) that needs to pull in data from a table and sort based on top 3 values from that table and after that fallback to other sort criteria.
So basically I've got something like this:
SELECT tbl.id
FROM
tbl1 AS maintable
LEFT JOIN
tbl2 AS othertable
ON
maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC,
maintable.timestamp DESC
Which is all basic textbook stuff. But the issue is I need the first ORDER BY clause to only get the three biggest values in othertable.timestamp and then fallback on maintable.timestamp.
Also, doing a LIMIT 3 subquery to othertable and join it is a no go as this needs to work with an arbitrary number of WHERE conditions applied to maintable.
I was almost able to make it work with a user variable based approach like this, but it fails since it doesn't take into account ordering, so it'll take the FIRST three othertable values it finds:
ORDER BY
(
IF(othertable.timestamp IS NULL, 0,
IF(
(#rank:=#rank+1) > 3, null, othertable.timestamp
)
)
) DESC
(with a #rank:=0 preceding the statement)
So... any tips on this? I'm losing my mind with the problem. Another parameter I have for this is that since I'm only altering an existing (vastly complicated) query, I can't do a wrapping outer query. Also, as noted, I'm on MySQL so any solutions using the ROW_NUMBER function are unfortunately out of reach.
Thanks to all in advance.
EDIT. Here's some sample data with timestamps dumbed down to simpler integers to illustrate what I need:
maintable
id timestamp
1 100
2 200
3 300
4 400
5 500
6 600
othertable
id timestamp
4 250
5 350
3 550
1 700
=>
1
3
5
6
4
2
And if for whatever reason we add WHERE NOT maintable.id = 5 to the query, here's what we should get:
1
3
4
6
2
...because now 4 is among the top 3 values in othertable referring to this set.
So as you see, the row with id 4 from othertable is not included in the ordering as it's the fourth in descending order of timestamp values, thus it falls back into getting ordered by the basic timestamp.
The real world need for this is this: I've got content in "maintable" and "othertable" is basically a marker for featured content with a timestamp of "featured date". I've got a view where I'm supposed to float the last 3 featured items to the top and the rest of the list just be a reverse chronologic list.
Maybe something like this.
SELECT
id
FROM
(SELECT
tbl.id,
CASE WHEN othertable.timestamp IS NULL THEN
0
ELSE
#i := #i + 1
END AS num,
othertable.timestamp as othertimestamp,
maintable.timestamp as maintimestamp
FROM
tbl1 AS maintable
CROSS JOIN (select #i := 0) i
LEFT JOIN tbl2 AS othertable
ON maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC) t
ORDER BY
CASE WHEN num > 0 AND num <= 3 THEN
othertimestamp
ELSE
maintimestamp
END DESC
Modified answer:
select ilv.* from
(select sq.*, #i:=#i+1 rn from
(select #i := 0) i
CROSS JOIN
(select m.*, o.id o_id, o.timestamp o_t
from maintable m
left join othertable o
on m.id = o.id
where 1=1
order by o.timestamp desc) sq
) ilv
order by case when o_t is not null and rn <=3 then rn else 4 end,
timestamp desc
SQLFiddle here.
Amend where 1=1 condition inside subquery sq to match required complex selection conditions, and add appropriate limit criteria after the final order by for paging requirements.
Can you use a union query as below?
(SELECT id,timestamp,1 AS isFeatured FROM tbl2 ORDER BY timestamp DESC LIMIT 3)
UNION ALL
(SELECT id,timestamp,2 AS isFeatured FROM tbl1 WHERE NOT id in (SELECT id from tbl2 ORDER BY timestamp DESC LIMIT 3))
ORDER BY isFeatured,timestamp DESC
This might be somewhat redundant, but it is semantically closer to the question you are asking. This would also allow you to parameterize the number of featured results you want to return.

One MySQL query to get AVG by different Groupings?

Wondering is there is a way to write the following in ONE MySQL query.
I have a table:
cust_ID | rpt_name | req_secs
In the query I'd like to get:
the AVG req_secs when grouped by cust_ID
the AVG req_secs when grouped by rpt_name
the total req_secs AVG
I know I can do separate grouping queries on the same table then UNION the results into one. But I was hoping there was some way to do it in one query.
Thanks.
Well, the following would does two out of three:
select n,
(case when n = 1 then cast(cust_id as varchar(255)) else rpt_name end) as grouping,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cust_id else rpt_name end);
This essentially "doubles" the data and then does the aggregation for each group. This assumes that cust_id and rpt_name are of compatible types. (The query could be tweaked if this is not the case.)
Actually, you can get the overall average by using rollup:
select n,
(case when n = 1 then cust_id else rpt_name end) as grouping,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as varchar(255)) else rpt_name end) with rollup
This works for average because the average is the same on the "doubled" data as for the original data. It wouldn't work for sum() or count().
No there is not. You can group by a combination of cust_ID and rpt_name at the same time (i.e. two levels of grouping) but you are not going to be able to do separate top-level groupings and then a non-grouped aggregation at the same time.
Because of the way GROUP BY works, the SQL to do this is a little tricky. One way to get the result is to get three copies of the rows, and group each set of rows separately.
SELECT g.gkey
, IF(g.grp='cust_id',t.cust_ID,IF(g.grp='rpt_name',t.rpt_name,'')) AS gval
, AVG(t.req_secs) AS avg_req_secs
FROM (SELECT 'cust_id' AS gkey UNION ALL SELECT 'rpt_name' UNION ALL SELECT 'total') g
CROSS
JOIN mytable t
GROUP
BY g.gkey
, IF(g.grp='cust_id',t.cust_ID,IF(g.grp='rpt_name',t.rpt_name,''))
The inline view aliased as "g" doesn't have to use UNION ALL operators, you just need a rowset that returns exactly 3 rows with distinct values. I just used the UNION ALL as a convenient way to return three literal values as a rowset, so I could join that to the original table.

MySQL select occurence by count

I have a table with two columns like this:
source_cid inchikey
---------- --------
1 qqmn
2 qqmn
3 ccmm
Now I want to select source_cids which have same inchikeys
Here is my query:
SELECT source_cid, count(*) as c
FROM inchikey
GROUP BY inchikey HAVING count(*)>1
This code runs forever. How can I modify it?
First off, as Anigel has stated we need to see your create statements and you should be using indexes.
Secondly, your query does not display all the rows that should be displayed.
See: http://www.sqlfiddle.com/#!2/a810d/7
SELECT source_cid, count(*) as c
FROM inchikey
GROUP BY inchikey HAVING count(*)>1;
Unfortunately only row with source_cids 1 is outputted.
select * from inchikey i,
(
SELECT i2.inchikey, count(i2.source_cid) as c
FROM inchikey i2
GROUP BY i2.inchikey HAVING count(i2.source_cid)>1
) as cd
where cd.inchikey = i.inchikey;
With this, rows with 1 and 2 are outputted.
Try creating a repeating index on (source_cid, inchikey) on table inchikey, then try running the query:
SELECT inchikey, group_concat(distinct source_cid) source_cids, count(*) as c
FROM inchikey
GROUP BY inchikey HAVING count(distinct source_cid)>1
(Your existing query will only show one source_cid for each repeating inchikey.)

Union All Query takes too long

This question have been asked multiple times I am sure, but every case is different.
I have MySQL setup on a strong computer with 2GB RAM, it does not do too much so the computer is sufficient.
The following query has been built as a view :
create view view_orders as
select distinct
tbl_orders_order.order_date AS sort_col,
tbl_orders_order.order_id AS order_id,
_utf8'website' AS src,tbl_order_users.company AS company,
tbl_order_users.phone AS phone,
tbl_order_users.full_name AS full_name,
time_format(tbl_orders_order.order_date,_utf8'%H:%i') AS c_time,
date_format(tbl_orders_order.order_date,_utf8'%d/%m/%Y') AS c_date,
tbl_orders_order.comments AS comments,
tbl_orders_order.tmp_cname AS tmp_cname,
tbl_orders_order.tmp_pname AS tmp_pname,
count(tbl_order_docfiles.docfile_id) AS number_of_files,
(case tbl_orders_order.status when 1 then _utf8'completed' when 2 then _utf8'hc' when 0 then _utf8'not-completed' when 3 then _utf8'hc-canceled' end) AS status,
tbl_orders_order.employee_name AS employee_name,
tbl_orders_order.status_date AS status_date,
tbl_orders_order.cancel_reason AS cancel_reason
from
tbl_orders_order left join tbl_order_users on tbl_orders_order.user_id = tbl_order_users.user_id
left join
tbl_order_docfiles on tbl_order_docfiles.order_id = tbl_orders_order.order_id
group by
tbl_orders_order.order_id
union all
select distinct tbl_h.h_date AS sort_col,
(case tbl_h.sub_oid when 0 then tbl_h.order_number else concat(tbl_h.order_number,_utf8'-',tbl_h.sub_oid) end) AS order_id,
(case tbl_h.type when 1 then _utf8'פקס' when 2 then _utf8'email' end) AS src,_utf8'' AS company,
_utf8'' AS phone,_utf8'' AS full_name,time_format(tbl_h.h_date,_utf8'%H:%i') AS c_time,
date_format(tbl_h.h_date,_utf8'%d/%m/%Y') AS c_date,_utf8'' AS comments,tbl_h.client_name AS tmp_cname,
tbl_h.project_name AS tmp_pname,
tbl_h.quantity AS number_of_files,
_utf8'completed' AS status,
tbl_h.computer_name AS employee_name,
_utf8'' AS status_date,
_utf8'' AS cancel_reason
from tbl_h;
The query used UNION, than I read an article about UNION ALL and now uses that.
Query alone takes about 3 seconds to execute (UNION took 4.5-5.5 seconds)
Each part in seperate runs in seconds.
The application does sorting and select on this view, which makes it processing time even larger - about 6 seconds when query is cached, about 12 seconds or more if data has changed.
I see no other way to combine these two results, as both sorted needs to display to the user and I guess something I am doing is wrong.
Of course both tables uses primary keys.
UPDATE!!!!
It didn't help, I got the utf8/case/date_format out of the union query, and removed distincts, now query takes 4 seconds (even longer).
query without case/date/utf8 (only union) was shortened to 2.3 seconds (0.3 seconds improvement).
create view view_orders as
select *,
(CASE src
WHEN 1 THEN
_utf8'fax'
WHEN 2 THEN
_utf8'mail'
WHEN 3 THEN
_utf8'website'
END) AS src,
time_format(order_date,'%H:%i') AS c_time,
date_format(order_date,'%d/%m/%Y') AS c_date,
(CASE status
WHEN 1 THEN
_utf8'completed'
WHEN 2 THEN
_utf8'hc handling'
WHEN 0 THEN
_utf8'not completed'
WHEN 3 THEN
_utf8'canceled'
END) AS status
FROM
(
select
o.order_date AS sort_col,
o.order_id,
3 AS src,
u.company,
u.phone,
u.full_name,
o.order_date,
o.comments,
o.tmp_cname,
o.tmp_pname,
count(doc.docfile_id) AS number_of_files,
o.status,
o.employee_name,
o.status_date,
o.cancel_reason
from
tbl_orders_order o
LEFT JOIN
tbl_order_users u ON u.user_id = o.user_id
LEFT JOIN
tbl_order_docfiles doc ON doc.order_id = o.order_id
GROUP BY
o.order_id
union all
select
h.h_date AS sort_col,
(case h.sub_oid when 0 then h.order_number else concat(h.order_number,'-',h.sub_oid) end) AS order_id,
h.type as src,
'' AS company,
'' AS phone,
'' AS full_name,
h.h_date,
'' AS comments,
h.client_name AS tmp_cname,
h.project_name AS tmp_pname,
h.quantity AS number_of_files,
1 AS status,
h.computer_name AS employee_name,
'' AS status_date,
'' AS cancel_reason
from tbl_h h
)
Think about your using UNION and DISTINCT keywords. Can your query really result in duplicate rows? If yes, the optimal query for removing duplicates would probably be of this form:
SELECT ... -- No "DISTINCT" here
UNION
SELECT ... -- No "DISTINCT" here
There is probably no need for DISTINCT in the two subqueries. If duplicates are impossible anyway, try using this form instead. This will be the fastest execution of your query (without further optimising the subqueries):
SELECT ... -- No "DISTINCT" here
UNION ALL
SELECT ... -- No "DISTINCT" here
Rationale: Both UNION and DISTINCT apply a "UNIQUE SORT" operation on your intermediate result sets. Depending on how much data your subqueries return, this can be very expensive. That's one reason why omitting DISTINCT and replacing UNION by UNION ALL is much faster.
UPDATE Another idea, if you do have to remove duplicates: Remove duplicates first in an inner query, and format dates and codes only afterwards in an outer query. That will accelerate the "UNIQUE SORT" operation because comparing 32/64-bit integers is less expensive than comparing varchars:
SELECT a, b, date_format(c), case d when 1 then 'completed' else '...' end
FROM (
SELECT a, b, c, d ... -- No date format here
UNION
SELECT a, b, c, d ... -- No date format here
)
It may be related to the UNION triggering a character set conversion. For example cancel_reason in the one query is defined as utf8, but in the other it is not specified.
Check if there is a very high cpu spike when you run this query, this would indicate conversion.
Personally I would have done a union of the raw data first, and then applied the case and conversion statements. But I am not sure that that would make a difference in the performance.
Can you try this one:
SELECT
o.order_date AS sort_col,
o.order_id AS order_id,
_utf8'website' AS src,
u.company AS company,
u.phone AS phone,
u.full_name AS full_name,
time_format(o.order_date,_utf8'%H:%i') AS c_time,
date_format(o.order_date,_utf8'%d/%m/%Y') AS c_date,
o.comments AS comments,
o.tmp_cname AS tmp_cname,
o.tmp_pname AS tmp_pname,
COALESCE(d.number_of_files, 0) AS number_of_files,
( CASE o.status WHEN 1 THEN _utf8'completed'
WHEN 2 THEN _utf8'hc'
WHEN 0 THEN _utf8'not-completed'
WHEN 3 THEN _utf8'hc-canceled'
END ) AS status,
o.employee_name AS employee_name,
o.status_date AS status_date,
o.cancel_reason AS cancel_reason
FROM
tbl_orders_order AS o
LEFT JOIN
tbl_order_users AS u
ON o.user_id = u.user_id
LEFT JOIN
( SELECT order_id
, COUNT(*) AS number_of_files
FROM tbl_order_docfiles
GROUP BY order_id
) AS d
ON d.order_id = o.order_id
UNION ALL
SELECT
tbl_h.h_date AS sort_col,
...
FROM tbl_h