get distinct user rows from table while counting a different row - mysql

I am trying to count rows from a SQL Table named "reports", sorted by "working" being 0 / 1 / 2.
Currently I have this SQL query which works okay to give me three rows, each with a counter of how many there are that had "working" as either 0 / 1 / 2.
SELECT `working`, COUNT(`working`) AS `total` FROM `reports`
WHERE `appid` = 379720
GROUP BY `working`
ORDER BY `report_id` DESC LIMIT 30
So it currently (correctly as per the SQL) gives me something like:
Working
Total
0
12
1
34
2
18
What I want to do though, is have only one row per user counted, which I can't quite wrap my head around. I can't use a distinct select on an "author_id" field as that ends up included and I can't group by it since I need it grouped by the working int.
To be clear: I want the same results display, but only count one per unique "author_id" from each row.
Any pointers?

You seem to want count(distinct):
SELECT `working`, COUNT(DISTINCT author_id) AS `total`
FROM `reports`
WHERE `appid` = 379720
GROUP BY `working`
ORDER BY `report_id` DESC

Related

Average value for top n records?

i have this SQL Schema: http://sqlfiddle.com/#!9/eb34d
In particular these are the relevant columns for this question:
ut_id,ob_punti
I need to get the average of the TOP n (where n is 4) values of "ob_punti" for each user (ut_id)
This query returns the AVG of all values of ob_punti grouped by ut_id:
SELECT ut_id, SUM(ob_punti), AVG(ob_punti) as coefficiente
FROM vw_obiettivi_2015
GROUP BY ut_id ORDER BY ob_punti DESC
But i can't figure out how to get the AVG for only the TOP 4 values.
Can you please help?
It will give SUM and AVG of top 4. You may replace 4 by n to get top n.
select ut_id,SUM(ob_punti), AVG(ob_punti) from (
select #rank:=if(#prev_cat=ut_id,#rank+1,1) as rank,ut_id,ob_punti,#prev_cat:=ut_id
from Table1,(select #rank:=0, #prev_cat:="")t
order by ut_id, ob_punti desc
) temp
where temp.rank<=4
group by ut_id;
This is not exactly related to the question asked, I am placing this because some one might get benefited.
I got the hackerearth problem to write mysql query to fetch top 10 records based on average of product quantity in stock available.
SELECT productName, avg(quantityInStock) from products
group by quantityInStock
order by quantityInStock desc
limit 10
Note: If someone can make better the above query, please welcome to modify.

Assistance with complex MySQL query (using LIMIT ?)

I wonder if anyone could help with a MySQL query I am trying to write to return relevant results.
I have a big table of change log data, and I want to retrieve a number of record 'groups'. For example, in this case a group would be where two or more records are entered with the same timestamp.
Here is a sample table.
==============================================
ID DATA TIMESTAMP
==============================================
1 Some text 1379000000
2 Something 1379011111
3 More data 1379011111
3 Interesting data 1379022222
3 Fascinating text 1379033333
If I wanted the first two grouped sets, I could use LIMIT 0,2 but this would miss the third record. The ideal query would return three rows (as two rows have the same timestamp).
==============================================
ID DATA TIMESTAMP
==============================================
1 Some text 1379000000
2 Something 1379011111
3 More data 1379011111
Currently I've been using PHP to process the entire table, which mostly works, but for a table of 1000+ records, this is not very efficient on memory usage!
Many thanks in advance for any help you can give...
Get the timestamps for the filtering using a join. For instance, the following would make sure that the second timestamp is in a completed group:
select t.*
from t join
(select timestamp
from t
order by timestamp
limit 2
) tt
on t.timestamp = tt.timestamp;
The following would get the first three groups, no matter what their size:
select t.*
from t join
(select distinct timestamp
from t
order by timestamp
limit 3
) tt
on t.timestamp = tt.timestamp;

MySQL equally distributed random rows with WHERE clause

I have this table,
person_id int(10) pk
points int(6) index
other columns not very important
I have this random function which is very fast on a table with 10M rows:
SELECT person_id
FROM persons AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(person_id)
FROM persons)) AS id)
AS r2
WHERE r1.person_id >= r2.id
ORDER BY r1.person_id ASC
LIMIT 1
This is all great but now I wish to show only people with points > 0. Example table:
PERSON_ID POINTS
1 4
2 6
3 0
4 3
When I append AND points > 0 to the where clause, person_id 3 can't be selected, so a gap is created and when the random select person_id 3, person_id 4 will be selected. This gives person 4 a bigger chance to be chosen. Any one got suggestions how I can adjust the query to make it work with the where clause and give all rows same % of chance to be selected.
Info table: The table is uniform, no gaps in person_id's. About 90% will have 0 points. I want to make the query for where points = 0 and points > 0.
Before someone will say, use rand(): this is not solution for tables with more than a few 100k rows.
Bonus question: will it be possible to select x random rows in 1 query, so I do not have to call this query a few times when i want more random rows?
Important note: performance is key, with 10M+ rows the query may not take much longer than the current query, which takes 0.0005 seconds, I prefer to stay under 0.05 second.
Last note: If you think the query will never be this fast with above requirements, but another solution is possible (like fetching 100 rows and showing x random which has more than 0 points), please tell :)
Really appreciate your help and all help is welcome :)
You could generate in-line gap-free id's for the records that you really want to work with, and generate then the random selector using the total number of records available.
Try with this (props to the chosen answer here for the row_number generator):
SELECT r1.*
FROM
(SELECT person_id,
#curRow := #curRow + 1 AS row_number
FROM persons as p,
(SELECT #curRow := 0) r0
WHERE points>0) r1
, (SELECT COUNT(1) * RAND() id
FROM persons
WHERE points>0) r2
WHERE r1.person_id>=r2.id
ORDER BY r1.person_id ASC
LIMIT 1;
You can mess with it in this sqlfiddle.

specific ordering in mysql

I have a sql statement that brings back ids. Currently I am ordering the id's with the usual "ORDER BY id". What I need to be able to do is have the query order the first 3 rows by specific id's that I set. The order the remaining as it is currently. For example, I want to say the first 3 rows will be id's 7,10,3 in that order, then the rest of the rows will be ordered by the id as usual.
right now i just have a basic sql statement...
SELECT * from cards ORDER BY card_id
SELECT *
FROM cards
ORDER BY
CASE card_id WHEN 7 THEN 1 WHEN 10 THEN 2 WHEN 3 THEN 3 ELSE 4 END,
card_id
A bit shorter than Quassnoi's query, with FIELD :
-- ...
ORDER BY FIELD(card_id, 3, 10, 7) DESC
You have to invert the order because of the DESC, I didn't find a way to do it more naturally.

Mysql subquery with sum causing problems

This is a summary version of the problems I am encountering, but hits the nub of my problem. The real problem involves huge UNION groups of monthly data tables, but the SQL would be huge and add nothing. So:
SELECT entity_id,
sum(day_call_time) as day_call_time
from (
SELECT entity_id,
sum(answered_day_call_time) as day_call_time
FROM XCDRDNCSum201108
where (day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
) as summary
is the problem: when the table in the subquery XCDRDNCSum201108 returns no rows, because it is a sum, the column values contain null. And entity_id is part of the primary key, and cannot be null.
If I take out the sum, and just query entity_id, the subquery contains no rows, and thus the outer query does not fail, but when I use sum, I get error 1048 Column 'entity_id' cannot be null
how do I work around this problem ? Sometimes there is no data.
You are completely overworking the query... pre-summing inside, then summing again outside. In addition, I understand you are not a DBA, but if you are ever doing an aggregation, you TYPICALLY need the criteria that its grouped by. In the case presented here, you are getting sum of calls for all entity IDs. So you must have a group by any non-aggregates. However, if all you care about is the Grand total WITHOUT respect to the entity_ID, then you could skip the group by, but would also NOT include the actual entity ID...
If you want inclusive to show actual time per specific entity ID...
SELECT
entity_id,
sum(answered_day_call_time) as day_call_time,
count(*) number_of_calls
FROM
XCDRDNCSum201108
where
(day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
group by
entity_id
This would result in something like (fictitious data)
Entity_ID Day_Call_Time Number_Of_Calls
1 10 3
2 45 4
3 27 2
If all you cared about were the total call times
SELECT
sum(answered_day_call_time) as day_call_time,
count(*) number_of_calls
FROM
XCDRDNCSum201108
where
(day_of_the_month >= 10 AND day_of_the_month<=24)
and LPAD(core_range,4,"0")="0987"
and LPAD(subrange,3,"0")="654"
and SUBSTR(LPAD(core_number,7,"0"),4,7)="3210"
This would result in something like (fictitious data)
Day_Call_Time Number_Of_Calls
82 9
Would:
sum(answered_day_call_time) as day_call_time
changed to
ifnull(sum(answered_day_call_time),0) as day_call_time
work? I'm assuming mysql here but the coalesce function would/should work too.