Mysql count id that appear in 2 columns - mysql

I have a table with these columns:
id
user_id
player_in
player_out
date
I need to make a report that counts how many times each "player" appears, whether in the player_in field or in the player_out field.
For example, if I have these 2 rows in the table (in the respective order):
id user_id player_in player_out
1 1 88 56
2 7 77 88
The result for the player 88 will be 2, and for the players 56 and 77, just 1

Use a subquery that employs union all to get the two columns into one column, then use a standard count(*):
Note: This query includes individual totals for ins and outs, as per a further request in the comments to this answer.
select
player_id,
count(*) as total,
sum(ins) as ins,
sum(outs) as outs
from (
select
player_in as player_id,
1 as ins,
0 as outs
from mytable
union all
select player_out, 0, 1
from mytable
) x
group by player_id
Note: you must use union all (not just union), because union removes duplicates whereas union all does not.
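As a quick check of the query above against the two sample rows from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the query itself uses only syntax common to both), and the table definition is a guess based on the column list in the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, user_id INTEGER, "
    "player_in INTEGER, player_out INTEGER)"
)
# The two sample rows from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, 88, 56), (2, 7, 77, 88)],
)

union_query = """
SELECT player_id, COUNT(*) AS total, SUM(ins) AS ins, SUM(outs) AS outs
FROM (
    SELECT player_in AS player_id, 1 AS ins, 0 AS outs FROM mytable
    UNION ALL
    SELECT player_out, 0, 1 FROM mytable
) x
GROUP BY player_id
ORDER BY player_id
"""
union_rows = conn.execute(union_query).fetchall()
for row in union_rows:
    print(row)  # (player_id, total, ins, outs)
```

Player 88 appears once in each column, so it ends up with a total of 2, while 56 and 77 each get 1.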

You could use a cross-join to a 2-row virtual table to unpivot the player_* columns, then group the results, like this:
SELECT
player,
COUNT(*) AS total_count
FROM (
SELECT
CASE WHEN x.is_in THEN t.player_in ELSE t.player_out END AS player
FROM mytable t
CROSS JOIN (SELECT TRUE AS is_in UNION ALL SELECT FALSE) x
) s
GROUP BY
player
;
That is, every row of the original table is essentially duplicated, and each copy supplies either player_in or player_out, depending on whether the derived table's is_in column is TRUE or FALSE, to form a single player column. This method of unpivoting might perform better than the UNION method suggested by @Bohemian, because the (physical) table is read just once. You would need to test and compare both methods, though, to determine whether there is any substantial benefit to this approach in your particular situation.
To calculate in and out counts, as you have requested in one of your comments to the above mentioned answer, you could extend my original suggestion like this:
SELECT
player,
COUNT( is_in OR NULL) AS in_count,
COUNT(NOT is_in OR NULL) AS out_count,
COUNT(*) AS total_count
FROM (
SELECT
x.is_in,
CASE WHEN x.is_in THEN t.player_in ELSE t.player_out END AS player
FROM mytable t
CROSS JOIN (SELECT TRUE AS is_in UNION ALL SELECT FALSE) x
) s
GROUP BY
player
;
As you can see, the derived table now additionally returns the is_in column in its own right, and the column is used in two conditional aggregations for counting how many times a player was in and out. (The OR NULL trick works because COUNT ignores NULLs: a true condition OR NULL is TRUE, while a false condition OR NULL is NULL, so only the rows where the condition holds are counted.)
You could also rewrite the COUNT(condition OR NULL) entries as SUM(condition). That would certainly shorten both expressions, and some also find the SUM method of counting clearer or more elegant. In either event, there would likely be no difference in performance, so choose whichever method suits your taste better.
A SQL Fiddle demo of the second query can be found here.
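To see that COUNT(condition OR NULL) and SUM(condition) agree, here is a small sketch run on the question's two sample rows. It assumes SQLite (via Python's sqlite3) in place of MySQL, and uses 1 and 0 instead of TRUE and FALSE so that it also runs on older SQLite builds; the extra in_sum and out_sum columns are added only for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, user_id INTEGER, "
    "player_in INTEGER, player_out INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, 88, 56), (2, 7, 77, 88)],
)

unpivot_query = """
SELECT player,
       COUNT(is_in OR NULL)     AS in_count,   -- counts rows where is_in is true
       COUNT(NOT is_in OR NULL) AS out_count,  -- counts rows where is_in is false
       SUM(is_in)               AS in_sum,     -- same value as in_count
       SUM(NOT is_in)           AS out_sum,    -- same value as out_count
       COUNT(*)                 AS total_count
FROM (
    SELECT x.is_in,
           CASE WHEN x.is_in THEN t.player_in ELSE t.player_out END AS player
    FROM mytable t
    CROSS JOIN (SELECT 1 AS is_in UNION ALL SELECT 0) x
) s
GROUP BY player
ORDER BY player
"""
unpivot_rows = conn.execute(unpivot_query).fetchall()
for row in unpivot_rows:
    print(row)
```

Both styles give the same per-player counts here; which one to use is a matter of taste, as noted above.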

Related

Getting the aggregation result from the single query

In the following, I am querying the same table 2 times. The second query is a nested query inside left join but queries the same table. The only difference is the addition of the aggregation function count, the result of which is used by the outer query. Is there a better way to approach this?
select sm.student_id, sm.marks, smarks.d as d_marks from student_marks as sm
left join(
select m.student_id, count(distinct m.marks) as d from student_marks as m group by m.student_id
) as smarks on smarks.student_id = sm.student_id;
Is it possible to do this in a single query without using a left join?
Yes, there is an alternative approach: window functions. COUNT(DISTINCT ...) is not allowed as a window function, but you can get the same result by using DENSE_RANK() twice, once ordering the column you want a distinct count of ascending and once descending, then adding the two ranks together and subtracting one:
SELECT sm.student_id,
sm.marks,
DENSE_RANK() OVER(PARTITION BY sm.student_id ORDER BY sm.marks DESC) +
DENSE_RANK() OVER(PARTITION BY sm.student_id ORDER BY sm.marks ASC) - 1 AS d_marks
FROM student_marks AS sm
N.B. this is not guaranteed to perform any better just because you are referencing a table one fewer times.
To explain the DENSE_RANK() trick, consider a simple data set:
marks | dense_rank ASC | dense_rank DESC
1 | 1 | 3
1 | 1 | 3
2 | 2 | 2
3 | 3 | 1
The two ranks added together will always be one more than the number of distinct items in the set (i.e. 1+3, 2+2, and 3+1 all equal 4), so we just need to take one off the result, and this gives us our distinct count of items in the set without actually using COUNT(DISTINCT ...), which isn't allowed (as noted in the restrictions).
ADDENDUM
If marks is nullable (which I had assumed it would not be) and you don't want null rows included in the count, then as noted in the comments this wouldn't quite work as it is, you'd need to remove any null rows from the total, which can be done using:
- MAX(CASE WHEN sm.marks IS NULL THEN 1 ELSE 0 END) OVER(PARTITION BY sm.student_id)
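The following sketch compares the DENSE_RANK() expression, including the NULL adjustment, with a plain grouped COUNT(DISTINCT). It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module rather than the asker's database, and the sample rows are invented for illustration. SQLite, like MySQL, sorts NULLs first in ascending order, so a NULL mark takes up one rank of its own, which is what the adjustment removes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_marks (student_id INTEGER, marks INTEGER)")
# Hypothetical data: student 3 has one NULL mark.
conn.executemany(
    "INSERT INTO student_marks VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (1, 3), (2, 5), (2, 5), (3, None), (3, 4), (3, 4)],
)

# Distinct count per student via two dense ranks, minus the NULL adjustment.
windowed = conn.execute("""
SELECT DISTINCT sm.student_id,
       DENSE_RANK() OVER (PARTITION BY sm.student_id ORDER BY sm.marks DESC)
     + DENSE_RANK() OVER (PARTITION BY sm.student_id ORDER BY sm.marks ASC)
     - 1
     - MAX(CASE WHEN sm.marks IS NULL THEN 1 ELSE 0 END)
           OVER (PARTITION BY sm.student_id) AS d_marks
FROM student_marks AS sm
ORDER BY sm.student_id
""").fetchall()

# Reference result using an ordinary grouped COUNT(DISTINCT).
grouped = conn.execute("""
SELECT student_id, COUNT(DISTINCT marks)
FROM student_marks
GROUP BY student_id
ORDER BY student_id
""").fetchall()

print(windowed)
print(grouped)
```

On this data both queries report the same distinct count for every student, which is the property the trick relies on.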

SELECT count of subquery before applying LIMIT (clickhouse)

I have a subquery that aggregates some UNION ALL selects. Over that I prepare the SELECT to create cross-tab and limit it to let's say 20. I would like to be able to retrieve the total COUNT of sub query results before I am limiting them in main query. This is for the purpose of trying to build a pagination that receives the total number of records and then the specific page record grid.
Sample query:
SELECT
name,
sumIf(metric_value, metric_name = 'data') AS data,
sumif(....
FROM
(SELECT
name, metric_name, SUM(metric_value) as metric_value
FROM
(SELECT
name, 'data' AS metric_name, SUM(data) AS metric_value
FROM
table
WHERE
date > '2017-01-01 00:00:00'
GROUP BY
name
UNION ALL
SELECT
name, 'data' AS metric_name, SUM(data) AS metric_value
FROM
table2
WHERE
date > '2017-01-01 00:00:00'
GROUP BY
name
UNION ALL
SELECT
name, 'data' AS metric_name, SUM(data) AS metric_value
FROM
table3
WHERE
date > '2017-01-01 00:00:00'
GROUP BY
name
UNION ALL
.
.
.)
GROUP BY
name, metric_name)
GROUP BY
name
ORDER BY
name ASC
LIMIT 0,20;
The first subselect returns tons of data, so I thought I could count it and return the count as one column value or row, which would then propagate to the main select that limits the output to 20 results. I need to know the size of the entire result set, but I don't want to run the same query twice, once without the limit and once with it, just to get the COUNT. There are at least 12 UNION ALL third-level subselects, so why waste resources? I am open to generic SQL solutions, not necessarily ones specific to ClickHouse.
I was thinking of using count(*) OVER (), but that is not supported, so if that's the only option, I know I need to run the query twice.
The first thing to mention is that usually nobody is interested in the exact number of pages a query produces. It can easily be estimated, and almost no one will care how exact the estimate is. However, if you have a link to the last page in your GUI, people will often click it just to see whether it works.
Nevertheless, there are cases when an analyst has to visit all the pages, and then the GUI should display the exact amount of work. The good news is that in that case a better strategy is to cache a snapshot of the whole results table, and counting the rows in that table is no longer a problem.
So it makes sense to discuss with the customers whether they really need the exact count, because unneeded full scans many times per day may affect the database load and the billing.
Anyway, if you still need to estimate the number of rows, you can simplify the query so that it only counts rows. As I understand it, this is something like:
SELECT SUM(cnt) as row_count
FROM (
SELECT COUNT(DISTINCT name) as cnt FROM table1 WHERE date > ...
UNION ALL
SELECT COUNT(DISTINCT name) as cnt FROM table2 WHERE date > ...
...
) as counts;
or, if data is a constant metric name (this form also avoids double-counting names that occur in more than one table):
SELECT COUNT(DISTINCT name) as row_count
FROM (
SELECT DISTINCT name FROM table1 WHERE date > ...
UNION ALL
SELECT DISTINCT name FROM table2 WHERE date > ...
...
) as names;
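The difference between the two counting queries shows up when the same name occurs in more than one table. The sketch below uses SQLite through Python's sqlite3 module instead of ClickHouse, two hypothetical tables table1 and table2, and leaves out the date filter for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, data INTEGER)")
conn.execute("CREATE TABLE table2 (name TEXT, data INTEGER)")
# Hypothetical data: the name 'b' is present in both tables.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 3)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [("b", 4), ("c", 5)])

# Sum of per-table distinct counts: an upper bound when names overlap.
summed = conn.execute("""
SELECT SUM(cnt) AS row_count
FROM (
    SELECT COUNT(DISTINCT name) AS cnt FROM table1
    UNION ALL
    SELECT COUNT(DISTINCT name) AS cnt FROM table2
) AS counts
""").fetchone()[0]

# Distinct count over the combined names: the number of rows that
# a final GROUP BY name would produce.
exact = conn.execute("""
SELECT COUNT(DISTINCT name) AS row_count
FROM (
    SELECT DISTINCT name FROM table1
    UNION ALL
    SELECT DISTINCT name FROM table2
) AS names
""").fetchone()[0]

print(summed, exact)
```

If the final result is grouped by name only, the second form matches the number of rows to paginate, while the first is only an estimate whenever names are shared between tables.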

Adding Row Values when there are no results - MySQL

Problem Statement: I need my result set to include records that would not naturally return because they are NULL.
I'm going to put some simplified code here since my code seems to be too long.
Table Scores has Company_type, Company, Score, Project_ID
Select Score, Count(Project_ID)
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
Results in the following:
Score Projects
5 95
4 94
3 215
2 51
1 155
Everything is working fine until I apply a condition to company_type that does not include results in one of the 5 score categories. When this happens, I don't have 5 rows in my result set any more.
It displays like this:
Score Projects
5 5
3 6
1 3
I'd like it to display like this:
Score Projects
5 5
4 0
3 6
2 0
1 3
I need the results to always display 5 rows. (Scores = 1-5)
I tried one of the approaches below by Spencer7593. My simplified query now looks like this:
SELECT i.score AS Score, IFNULL(count(*), 0) AS Projects
FROM (SELECT 5 AS score
UNION ALL
SELECT 4
UNION ALL
SELECT 3
UNION ALL
SELECT 2
UNION ALL
SELECT 1) i
LEFT JOIN Scores ON Scores.score = i.score
GROUP BY Score
ORDER BY i.score DESC
And gives the following results, which is accurate except that the rows with 1 in Projects should actually be 0 because they are derived by the "i". There are no projects with a score of 5 or 2.
Score Projects
5 1
4 5
3 6
2 1
1 3
Solved! I just needed to adjust my count to specifically look at the project count - count(project) rather than count(*). This returned the expected results.
If you always want your query to return 5 rows, with Score values of 5,4,3,2,1... you'll need a rowsource that supplies those Score values.
One approach would be to use a simple query to return those fixed values, e.g.
SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
Then use that query as inline view, and do an outer join operation to the results from your current query
SELECT i.score AS `Score`
, IFNULL(q.projects,0) AS `Projects`
FROM ( SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
) i
LEFT
JOIN (
-- the current query with "missing" Score rows goes here
-- for completeness of this example, without the query
-- we emulate that result with a different query
SELECT 5 AS score, 95 AS projects
UNION ALL SELECT 3, 215
UNION ALL SELECT 1, 155
) q
ON q.score = i.score
ORDER BY i.score DESC
It doesn't have to be the view query in this example. But there does need to be a rowsource that the rows can be returned from. You could, for example, have a simple table that contains those five rows, with those five score values.
This is just an example approach for the general approach. It might be possible to modify your existing query to return the rows you want. But without seeing the query, the schema, and example data, we can't tell.
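Here is a minimal sketch of the outer-join approach applied directly to a Scores table, assuming SQLite (via Python's sqlite3) in place of MySQL and made-up sample rows. Two details are assumptions of this sketch rather than part of the answer above: the company_type filter is placed in the ON clause so that it does not discard the unmatched score rows, and COUNT(s.project_id) is shown next to COUNT(*) to illustrate why the latter reports 1 for scores with no projects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scores (company_type TEXT, company TEXT, "
    "score INTEGER, project_id INTEGER)"
)
# Hypothetical data: company type 'A' has no projects with score 4, 2 or 1.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?, ?)",
    [("A", "c1", 5, 1), ("A", "c1", 5, 2), ("A", "c2", 3, 3), ("B", "c3", 4, 4)],
)

outer_join_query = """
SELECT i.score,
       COUNT(s.project_id) AS projects,   -- ignores the NULLs of unmatched rows
       COUNT(*)            AS star_count  -- counts the unmatched row itself
FROM (SELECT 5 AS score
      UNION ALL SELECT 4
      UNION ALL SELECT 3
      UNION ALL SELECT 2
      UNION ALL SELECT 1) i
LEFT JOIN Scores s
       ON s.score = i.score
      AND s.company_type = :company_type
GROUP BY i.score
ORDER BY i.score DESC
"""
outer_rows = conn.execute(outer_join_query, {"company_type": "A"}).fetchall()
for row in outer_rows:
    print(row)
```

All five scores are returned, and only the COUNT(s.project_id) column shows 0 for the scores that have no matching projects.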
FOLLOWUP
Based on the edit to the question, showing an example of the current query.
If we are guaranteed that the five values of Score will always appear in the Scores table, we could do conditional aggregation, writing a query like this:
SELECT s.score
, COUNT(IF(s.company_type = :company_type,s.project_id,NULL)) AS projects
FROM Scores s
GROUP BY s.score
ORDER BY s.score DESC
Note that this will require a scan of all the rows, so it may not perform as well. The "trick" is the IF function, which returns a NULL value in place of project_id when the row would have been excluded by the WHERE clause.
If we are guaranteed that project_id is non-NULL, we could use a more terse MySQL shorthand expression to achieve an equivalent result...
, IFNULL(SUM(s.company_type = :company_type),0) AS projects
This works because MySQL returns 1 when the comparison is TRUE, and otherwise returns 0 or NULL.
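The two conditional-aggregation forms can be compared side by side. This sketch assumes SQLite through Python's sqlite3 module; since older SQLite versions have no IF() function, a CASE expression stands in for MySQL's IF(), and the sample rows are invented so that every score from 1 to 5 appears at least once in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scores (company_type TEXT, company TEXT, "
    "score INTEGER, project_id INTEGER)"
)
# Hypothetical data covering all five scores across two company types.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?, ?)",
    [
        ("A", "c1", 5, 1), ("A", "c1", 5, 2), ("A", "c2", 3, 3),
        ("B", "c3", 4, 4), ("B", "c3", 3, 5), ("B", "c4", 2, 6), ("B", "c4", 1, 7),
    ],
)

conditional_query = """
SELECT s.score,
       -- CASE without ELSE yields NULL, which COUNT skips (stand-in for IF()).
       COUNT(CASE WHEN s.company_type = :company_type
                  THEN s.project_id END)                  AS via_count,
       -- The comparison evaluates to 1 or 0, so SUM counts the matches.
       IFNULL(SUM(s.company_type = :company_type), 0)     AS via_sum
FROM Scores s
GROUP BY s.score
ORDER BY s.score DESC
"""
conditional_rows = conn.execute(conditional_query, {"company_type": "A"}).fetchall()
for row in conditional_rows:
    print(row)
```

Because no WHERE clause removes rows, every score present in the table produces a row, and scores without a matching company type simply get a count of 0.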
Try something like this:
select s.score, coalesce(x.cnt, 0) as cnt
from (
select distinct score from scores
) s
left outer join (
Select Score, Count(Project_ID) cnt
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
) x
on s.score = x.score
Your posted query would not work without a group by statement. However, even there, if you don't have those particular scores for that company type, it wouldn't work either.
One option is to use an outer join. That would require a little more work though.
Here's another option using conditional aggregation:
select Score, sum(company_type=:company_type)
from Scores
group by Score

Mysql Ranking Query on 2 columns

Table
id user_id rank_solo lp
1 1 15 45
2 2 7 79
3 3 17 15
How can I sort out a ranking query that sorts on rank_solo ( This ranges from 0 to 28) and if rank_solo = rank_solo , uses lp ( 0-100) to further determine ranking?
(If lp = lp, add a ranking for no tie rankings)
The query should give me the ranking from a certain random user_id. How is this performance wise on 5m+ rows?
So
User_id 1 would have ranking 2
User_id 2 would have ranking 3
User_id 3 would have ranking 1
You can get the ranking using variables:
select t.*, (@rn := @rn + 1) as ranking
from t cross join
(select @rn := 0) params
order by rank_solo desc, lp;
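As an alternative to user variables, the same numbering can be sketched with the ROW_NUMBER() window function, which is a different technique from the answer above and needs MySQL 8.0 or later. The example below runs on SQLite 3.25+ through Python's sqlite3 module as a stand-in. It assumes that a higher rank_solo and a higher lp both mean a better rank, and that remaining ties are broken by user_id; those ordering choices are assumptions, not something the question states outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, user_id INTEGER, rank_solo INTEGER, lp INTEGER)"
)
# The three sample rows from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1, 15, 45), (2, 2, 7, 79), (3, 3, 17, 15)],
)

# Number the rows by rank_solo, then lp, then user_id as a final tie-breaker.
ranked_sql = """
SELECT user_id,
       ROW_NUMBER() OVER (ORDER BY rank_solo DESC, lp DESC, user_id) AS ranking
FROM t
"""
rankings = dict(conn.execute(ranked_sql).fetchall())
print(rankings)

# Ranking of one specific user: filter the numbered rows from the outside.
one_user = conn.execute(
    f"SELECT ranking FROM ({ranked_sql}) r WHERE user_id = ?", (2,)
).fetchone()[0]
print(one_user)
```

On the sample rows this reproduces the rankings listed in the question. Note that looking up a single user this way still numbers every row first, so on a large table it is worth measuring before relying on it.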
You can use ORDER BY to sort your query:
SELECT *
FROM `Table`
ORDER BY rank_solo, lp
I'm not sure I quite understand what you're saying. With that many rows, create an index on the fields you're using to do your selects. For example, in the MySQL client use:
create index RANKINGS on mytablename(rank_solo,lp,user_id);
Depending on what you use in your query to select the data, you may change the index or add another index with a different field combination. This has improved performance on my tables by a factor of 10 or more.
As for the query, if you're selecting a specific user then could you not just use:
select rank_solo from table where user_id={user id}
If you want the highest ranking individual, you could:
select * from yourtable order by rank_solo,lp limit 1
Remove the limit 1 to list them all.
If I've misunderstood, please comment.
An alternative would be to use a 2nd table.
table2 would have the following fields:
rank (auto_increment)
user_id
rank_solo
lp
With the rank field as auto increment, as it's populated, it will automatically populate with values beginning with "1".
Once the 2nd table is ready, just do this when you want to update the rankings:
delete from table2;
insert into table2 (user_id, rank_solo, lp) select user_id,rank_solo,lp from table1 order by rank_solo,lp;
It may not be "elegant" but it gets the job done. Plus, if you create an index on both tables, this query would be very quick since the fields are numeric.

One MySQL query to get AVG by different Groupings?

Wondering if there is a way to write the following in ONE MySQL query.
I have a table:
cust_ID | rpt_name | req_secs
In the query I'd like to get:
the AVG req_secs when grouped by cust_ID
the AVG req_secs when grouped by rpt_name
the total req_secs AVG
I know I can do separate grouping queries on the same table then UNION the results into one. But I was hoping there was some way to do it in one query.
Thanks.
Well, the following would do two out of three:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as grp,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end);
This essentially "doubles" the data and then does the aggregation for each group. This assumes that cust_id and rpt_name are of compatible types. (The query could be tweaked if this is not the case.)
Actually, you can get the overall average by using rollup:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as grp,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end) with rollup
This works for average because the average is the same on the "doubled" data as for the original data. It wouldn't work for sum() or count().
No there is not. You can group by a combination of cust_ID and rpt_name at the same time (i.e. two levels of grouping) but you are not going to be able to do separate top-level groupings and then a non-grouped aggregation at the same time.
Because of the way GROUP BY works, the SQL to do this is a little tricky. One way to get the result is to get three copies of the rows, and group each set of rows separately.
SELECT g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,'')) AS gval
, AVG(t.req_secs) AS avg_req_secs
FROM (SELECT 'cust_id' AS gkey UNION ALL SELECT 'rpt_name' UNION ALL SELECT 'total') g
CROSS
JOIN mytable t
GROUP
BY g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,''))
The inline view aliased as "g" doesn't have to use UNION ALL operators, you just need a rowset that returns exactly 3 rows with distinct values. I just used the UNION ALL as a convenient way to return three literal values as a rowset, so I could join that to the original table.
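The three-copies idea can be tried out on a tiny data set. This sketch assumes SQLite through Python's sqlite3 module in place of MySQL, so a CASE expression replaces the nested IF() calls and cust_ID is cast to text to keep the gval column a single type; the three sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (cust_ID INTEGER, rpt_name TEXT, req_secs REAL)")
# Hypothetical data: two customers, two report names.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "B", 20), (2, "A", 30)],
)

# Each row of mytable is joined to all three grouping keys, and every
# copy is grouped by the value that belongs to its key.
three_way_query = """
SELECT g.gkey,
       CASE g.gkey
            WHEN 'cust_id'  THEN CAST(t.cust_ID AS TEXT)
            WHEN 'rpt_name' THEN t.rpt_name
            ELSE ''
       END AS gval,
       AVG(t.req_secs) AS avg_req_secs
FROM (SELECT 'cust_id' AS gkey
      UNION ALL SELECT 'rpt_name'
      UNION ALL SELECT 'total') g
CROSS JOIN mytable t
GROUP BY g.gkey, gval
ORDER BY g.gkey, gval
"""
three_way_rows = conn.execute(three_way_query).fetchall()
for row in three_way_rows:
    print(row)
```

The result contains one row per customer, one row per report name, and a single 'total' row with the overall average, all from one pass over the joined rows.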