Obtain running frequency distribution from previous N rows of MySQL database - mysql

I have a MySQL database where one column contains status codes. The column is of type int and the values will only ever be 100,200,300,400. It looks like below; other columns removed for clarity.
id | status
----------------
1 300
2 100
3 100
4 200
5 200
6 300
7 100
8 400
9 300
10 300
11 100
12 400
13 400
14 400
15 300
16 300
The id field is auto-generated and will always be sequential. I want to have a third column displaying a comma-separated string of the frequency distribution of the status codes of the previous 10 rows. It should look like this.
id | status | freq
-----------------------------------
1 300
2 100
3 100
4 200
5 200
6 300
7 100
8 400
9 300
10 300
11 100 300,100,200,400 -- from rows 1-10
12 400 100,300,200,400 -- from rows 2-11
13 400 100,300,200,400 -- from rows 3-12
14 400 300,400,100,200 -- from rows 4-13
15 300 400,300,100,200 -- from rows 5-14
16 300 300,400,100 -- from rows 6-15
I want the most frequent code listed first. And where two status codes have the same frequency it doesn't matter to me which is listed first but I did list the smaller code before the larger in the example. Lastly, where a code doesn't appear at all in the previous ten rows, it shouldn't be listed in the freq column either.
And to be very clear the row number that the frequency string appears on does NOT take into account the status code of that row; it's only the previous rows.
So what have I done? I'm pretty green with SQL. I'm a programmer and I find this SQL language a tad odd to get used to. I managed the following self-join select statement.
select *, avg(b.status) freq
from sample a
join sample b
on (b.id < a.id) and (b.id > a.id - 11)
where a.id > 10
group by a.id;
Using the aggregate function avg, I can at least demonstrate the concept. The derived table b provides the correct rows to the avg function but I just can't figure out the multi-step process of counting and grouping rows from b to get a frequency distribution and then collapse the frequency rows into a single string value.
Also I've tried using standard stored functions and procedures in place of the built-in aggregate functions, but it seems the b derived table is out of scope or something. I can't seem to access it. And from what I understand writing a custom aggregate function is not possible for me as it seems to require developing in C, something I'm not trained for.
Here's sql to load up the sample.
create table sample (
id int NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
status int
);
insert into sample(status) values(300),(100),(100),(200),(200),(300)
,(100),(400),(300),(300),(100),(400),(400),(400),(300),(300),(300)
,(100),(400),(100),(100),(200),(500),(300),(100),(400),(200),(100)
,(500),(300);
The sample has 30 rows of data to work with. I know it's a long question, but I just wanted to be as detailed as I could be. I've worked on this for a few days now and would really like to get it done.
Thanks for your help.

The only way I know of to do what you're asking is a BEFORE INSERT trigger. It has to be BEFORE INSERT because you want to set a value in the row being inserted, which can only be done in a BEFORE trigger. Unfortunately, that also means the row won't have been assigned an ID yet, so hopefully it's safe to assume that, at the time a new record is inserted, the last 10 records in the table are the ones you're interested in. Your trigger will need to count the status values of the last 10 rows and use the GROUP_CONCAT function to join them into a single string, ordered by the count, most frequent first. I've been using SQL Server mostly and I don't have access to a MySQL server at the moment to test this, but hopefully my syntax will be close enough to at least get you moving in the right direction:
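-- Assumes sample already has a freq column, e.g. freq varchar(50).
-- DELIMITER is a mysql client command; it keeps the semicolons in the body from ending the statement early.
DELIMITER //
create trigger sample_trigger BEFORE INSERT ON sample
FOR EACH ROW
BEGIN
  DECLARE _freq varchar(50);
  -- Take the 10 newest rows, count each status, then join the codes, most frequent first.
  SELECT GROUP_CONCAT(tbl.status ORDER BY tbl.Occurrences DESC, tbl.status) INTO _freq
  FROM (SELECT last10.status, COUNT(*) AS Occurrences
        FROM (SELECT status FROM sample ORDER BY id DESC LIMIT 10) AS last10
        GROUP BY last10.status) AS tbl;
  SET new.freq = _freq;
END//
DELIMITER ;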

SELECT id, GROUP_CONCAT(status ORDER BY freq desc) FROM
(SELECT a.id as id, b.status, COUNT(*) as freq
FROM
sample a
JOIN
sample b ON (b.id < a.id) AND (b.id > a.id - 11)
WHERE
a.id > 10
GROUP BY a.id, b.status) AS sub
GROUP BY id;
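A quick way to check this query against the expected freq column is to run its inner part on the first 16 sample rows. This is only a sketch under assumptions: Python's sqlite3 module stands in for MySQL, the concatenation is done in Python rather than with GROUP_CONCAT, and the names STATUSES and freq_column are made up for the example.

```python
import sqlite3

# First 16 status values from the question's sample data.
STATUSES = [300, 100, 100, 200, 200, 300, 100, 400,
            300, 300, 100, 400, 400, 400, 300, 300]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, status INTEGER)")
conn.executemany("INSERT INTO sample (status) VALUES (?)", [(s,) for s in STATUSES])

# Inner query of the answer: count each status among the 10 rows before a.id.
rows = conn.execute("""
    SELECT a.id, b.status, COUNT(*) AS freq
    FROM sample a
    JOIN sample b ON b.id < a.id AND b.id > a.id - 11
    WHERE a.id > 10
    GROUP BY a.id, b.status
""").fetchall()

# Stand-in for GROUP_CONCAT(status ORDER BY freq DESC, status):
# most frequent code first, smaller code first on ties.
by_id = {}
for row_id, status, freq in rows:
    by_id.setdefault(row_id, []).append((-freq, status))
freq_column = {row_id: ",".join(str(status) for _, status in sorted(pairs))
               for row_id, pairs in by_id.items()}

for row_id in sorted(freq_column):
    print(row_id, freq_column[row_id])
```

With the smaller code listed first on ties, the strings for ids 11 to 16 match the freq column in the question.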

Related

How can this query be answered? Select SUM(1) FROM table

select * from "Test"."EMP"
id
1
2
3
4
5
Select SUM(1) FROM "Test"."EMP";
Select SUM(2) FROM "Test"."EMP";
Select SUM(3) FROM "Test"."EMP";
Why is the output of these queries the following?
5
10
15
Also, I don't understand why they write the table name like this: "Test"."EMP"
Your table has 5 records. The statement select 1 from test.emp returns 5 records, each with the value 1.
id
1
1
1
1
1
This is because the DB engine simply returns 1 for each existing record without reading the contents of the row. The same happens for select <any static value> from test.emp, so the same happens for 2 and 3:
id
2
2
2
2
2
Hence 5 records are returned with the static value, and the sum of those values is the product of the static number passed in the select statement and the total number of records in the table.
Additional note: count(1) is not cheaper than count(*). MySQL's InnoDB, for example, handles both the same way, so there is no performance reason to prefer count(1).
I don't think it's "Test"."EMP" with double quotes; in MySQL it's probably `Test`.`EMP` with backticks instead (double quotes are the standard SQL identifier quote, but MySQL only treats them that way when the ANSI_QUOTES SQL mode is enabled). The notation means database_name.table_name. Qualifying the name like this makes sure you get the right table from the right database; in this case, you're explicitly querying the EMP table in the Test database. Read more about identifier qualifiers.
As for SUM(x), the x gets repeated once for each row present in the table. So SUM(1) on 5 rows is 1+1+1+1+1, SUM(2) on 5 rows is 2+2+2+2+2, and so on.
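The arithmetic above can be checked with a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for whatever database holds "Test"."EMP", and a hypothetical table named emp with the five ids from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER)")
conn.executemany("INSERT INTO emp (id) VALUES (?)", [(i,) for i in range(1, 6)])

# SELECT <constant> yields that constant once per row...
ones = [row[0] for row in conn.execute("SELECT 1 FROM emp")]

# ...so SUM(<constant>) is the constant times the number of rows.
row_count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
sums = {k: conn.execute(f"SELECT SUM({k}) FROM emp").fetchone()[0] for k in (1, 2, 3)}

print(ones)       # [1, 1, 1, 1, 1]
print(row_count)  # 5
print(sums)       # {1: 5, 2: 10, 3: 15}
```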

ORDER BY and GROUP BY those results in a single query

I am trying to query a dataset from a single table, which contains quiz answers/entries from multiple users. I want to pull out the highest scoring entry from each individual user.
My data looks like the following:
ID TP_ID quiz_id name num_questions correct incorrect percent created_at
1 10154312970149546 1 Joe 3 2 1 67 2015-09-20 22:47:10
2 10154312970149546 1 Joe 3 3 0 100 2015-09-21 20:15:20
3 125564674465289 1 Test User 3 1 2 33 2015-09-23 08:07:18
4 10153627558393996 1 Bob 3 3 0 100 2015-09-23 11:27:02
My query looks like the following:
SELECT * FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`
ORDER BY `correct` DESC
In my mind, what that should do is get the two users from the IN clause, order them by the number of correct answers and then group them together, so I should be left with the 2 highest scores from those two users.
In reality it's giving me two results, but the one from Joe gives me the lower of the two values (2), with Bob first with a score of 3. Swapping to ASC ordering keeps the scores the same but places Joe first.
So, how could I achieve what I need?
You're after the groupwise maximum, which can be obtained by joining the grouped results back to the table:
SELECT * FROM entries NATURAL JOIN (
SELECT TP_ID, MAX(correct) correct
FROM entries
WHERE TP_ID IN ('10153627558393996', '10154312970149546')
GROUP BY TP_ID
) t
Of course, if a user has multiple records with the maximal score, it will return all of them; should you only want some subset, you'll need to express the logic for determining which.
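As a rough check of the groupwise-maximum join, here is a sketch that loads the four sample rows and runs the same idea. It assumes SQLite via Python's sqlite3 module instead of MySQL, keeps only the columns the join needs, and spells the join condition out explicitly rather than using NATURAL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (ID INTEGER PRIMARY KEY, TP_ID TEXT, name TEXT, correct INTEGER)")
conn.executemany(
    "INSERT INTO entries (ID, TP_ID, name, correct) VALUES (?, ?, ?, ?)",
    [(1, "10154312970149546", "Joe", 2),
     (2, "10154312970149546", "Joe", 3),
     (3, "125564674465289", "Test User", 1),
     (4, "10153627558393996", "Bob", 3)])

# Find each user's best score first, then join back to fetch the full row.
best = conn.execute("""
    SELECT e.ID, e.name, e.correct
    FROM entries e
    JOIN (SELECT TP_ID, MAX(correct) AS correct
          FROM entries
          WHERE TP_ID IN ('10153627558393996', '10154312970149546')
          GROUP BY TP_ID) t
      ON t.TP_ID = e.TP_ID AND t.correct = e.correct
    ORDER BY e.ID
""").fetchall()

print(best)  # [(2, 'Joe', 3), (4, 'Bob', 3)]
```

Joe's row with 3 correct answers is returned instead of the arbitrary row that the plain GROUP BY picked.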
MySQL is quite lax when it comes to GROUP BY clauses, but as a rule of thumb you should try to follow the rule that other DBMSs enforce:
In a GROUP BY query, each selected column should either be part of the GROUP BY clause or be wrapped in an aggregate function.
For your query I would suggest:
SELECT `TP_ID`,`name`,max(`correct`) FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`,`name`
Since your table seems quite denormalized, the name part of the GROUP BY could be omitted, but it might be necessary in other cases.
ORDER BY only specifies the order in which the results are returned; it has no effect on which results are returned. So you need to apply the max() function to get the highest number of right answers.

Iterate through a column and summarize findings

I have a table (t1) in MySQL with the following contents:
type time full
0 11 yes
1 22 yes
0 11 no
3 13 no
I would like to create a second table (t2) from this that will summarize the information found in t1 like the following:
type time num_full total
0 11 1 2
1 22 1 1
3 13 0 1
I want to be able to iterate through the type column in order to be able to start this summary, something like a for-loop. The types can be up to a value of n, so I would rather not write n+1 WHERE statements, then have to update the code every time more types are added.
Notice how t2 skipped the type of value 2? This has also been escaping me when I try looping. I only want the types found to have rows created in t2.
While a direct answer would be nice, it would be much more helpful to be pointed to some sources where I could figure this out, or both.
This may do what you want
create table if not exists t2
select type, time,
  sum(`full` = 'yes') num_full, -- counts the rows where full is 'yes'
  count(*) total
from t1
group by type, time
order by type, time;
depending on how you want to aggregate the time column.
This is a starting point for reference on the GROUP BY functions: https://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
And here for the CREATE TABLE syntax:
https://dev.mysql.com/doc/refman/5.6/en/create-table.html
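To see how grouping replaces the loop, here is a sketch that builds t1 from the question and summarizes it in one statement. It assumes SQLite via Python's sqlite3 module as a stand-in for MySQL, and it counts the 'yes' rows with a CASE expression, which is one portable way to write the conditional sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t1 (type INTEGER, time INTEGER, "full" TEXT)')
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                 [(0, 11, "yes"), (1, 22, "yes"), (0, 11, "no"), (3, 13, "no")])

# One output row per (type, time) pair that actually occurs, so type 2
# never shows up and no per-type WHERE clauses are needed.
summary = conn.execute("""
    SELECT type, time,
           SUM(CASE WHEN "full" = 'yes' THEN 1 ELSE 0 END) AS num_full,
           COUNT(*) AS total
    FROM t1
    GROUP BY type, time
    ORDER BY type, time
""").fetchall()

print(summary)  # [(0, 11, 1, 2), (1, 22, 1, 1), (3, 13, 0, 1)]
```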

Limit On Accumulated Column in MySQL

I'm trying to find an elegant way to write a query that only returns enough rows for a certain column to add up to at least n.
For example, let's say n is 50, and the table rows look like this:
id count
1 12
2 13
3 5
4 18
5 14
6 21
7 13
Then the query should return:
id count
1 12
2 13
3 5
4 18
5 14
Because the count column adds up to more than n = 50 (62, to be exact).
It must return the results consecutively starting with the smallest id.
I've looked a bit into accumulators, like in this one: MySQL select "accumulated" column
But AFAIK, there is no way to have the LIMIT clause in an SQL query limit on an SUM instead of a row count.
I wish I could say something like this, but alas, this is not valid SQL:
SELECT *
FROM elements
LIMIT sum(count) > 50
Also, please keep in mind the goal here is to insert the result of this query into another table atomically in an automated, performance-efficient fashion, so please no suggestions to use a spreadsheet or anything that's not SQL compatible.
Thanks
There are many ways to do this. One is to use a correlated subquery:
SELECT id,
       count
FROM (SELECT *,
             -- running total of all earlier rows; COALESCE turns the first row's NULL into 0
             (SELECT COALESCE(SUM(count), 0)
              FROM yourtable b
              WHERE b.id < a.id) AS Run_tot
      FROM yourtable a) ou
WHERE Run_tot < 50
ORDER BY id
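The correlated subquery can be tried on the question's seven rows with the sketch below. It assumes SQLite via Python's sqlite3 module rather than MySQL, uses the table name elements from the question, and exposes the running total as run_tot so the cut-off is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (id INTEGER PRIMARY KEY, count INTEGER)")
conn.executemany("INSERT INTO elements (id, count) VALUES (?, ?)",
                 [(1, 12), (2, 13), (3, 5), (4, 18), (5, 14), (6, 21), (7, 13)])

N = 50
# run_tot is the sum of all rows BEFORE the current one, so a row is kept
# as long as the rows ahead of it have not reached N yet.
rows = conn.execute("""
    SELECT id, count, run_tot
    FROM (SELECT a.id, a.count,
                 (SELECT COALESCE(SUM(b.count), 0)
                  FROM elements b
                  WHERE b.id < a.id) AS run_tot
          FROM elements a) ou
    WHERE run_tot < ?
    ORDER BY id
""", (N,)).fetchall()

print(rows)                      # [(1, 12, 0), (2, 13, 12), (3, 5, 25), (4, 18, 30), (5, 14, 48)]
print(sum(r[1] for r in rows))   # 62
```

Row 5 is still included because the total before it is 48; row 6 is dropped because the total before it is already 62.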

Getting the MAX Record for the Most Recent Serial Numbers

I have the following table with some sample data.
Record_ID Counter Serial Owner
1 0 AAA Jack
2 1 AAA Kevin
3 0 BBB Jane
4 1 BBB Wendy
Based on data similar to the above, I am trying to write a SQL query for MySQL that gets the record with the maximum Counter value per Serial number. The part I seem to be having trouble with is getting the query to get the last 50 unique serial numbers that were updated.
Below is the query I came up with so far based on this StackOverflow question.
SELECT *
FROM `history` his
INNER JOIN(SELECT serial,
Max(counter) AS MaxCount
FROM `tracking`
WHERE serial IN (SELECT serial
FROM `history`)
GROUP BY serial
ORDER BY record_id DESC) q
ON his.serial = q.serial
AND his.counter = q.maxcount
LIMIT 0, 50
It looks like a classic greatest-n-per-group problem, which can be solved by something like this:
select his.Record_ID, his.Counter, his.Serial, his.Owner
from History his
inner join(
select Serial, max(Counter) Counter
from History
group by Serial
) ss on his.Serial = ss.Serial and his.Counter = ss.Counter
If you need specific filters on your data set, apply them in the sub-query.
Another source with more explanation on the problem here: SQL Select only rows with Max Value on a Column