Table statistics (aka row count) over time - mysql

I'm preparing a presentation about one of our apps and was asking myself the following question: "Based on the data stored in our database, how much growth has happened over the last couple of years?"
So I'd like to show, in one output/graph, how much data we have stored since the beginning of the project.
My current query looks like this:
SELECT DATE_FORMAT(created,'%y-%m') AS label, COUNT(id) FROM table GROUP BY label ORDER BY label;
The example output would be:
11-03: 5
11-04: 200
11-05: 300
Unfortunately, this query is missing the accumulation. I would like to receive the following result:
11-03: 5
11-04: 205 (200 + 5)
11-05: 505 (200 + 5 + 300)
Is there any way to solve this problem in MySQL without having to run the query in a PHP loop?

Yes, there's a way to do that. One approach uses MySQL user-defined variables (and relies on behavior that is not guaranteed):
SELECT s.label
     , s.cnt
     , @tot := @tot + s.cnt AS running_subtotal
  FROM ( SELECT DATE_FORMAT(t.created,'%y-%m') AS `label`
              , COUNT(t.id) AS cnt
           FROM articles t
          GROUP BY `label`
          ORDER BY `label`
       ) s
 CROSS
  JOIN ( SELECT @tot := 0 ) i
Let's unpack that a bit.
The inline view aliased as s returns the same resultset as your original query.
The inline view aliased as i returns a single row. We don't really care what it returns (except that we need it to return exactly one row because of the JOIN operation); what we care about is the side effect: a value of zero gets assigned to the @tot user variable.
Since MySQL materializes the inline view as a derived table before the outer query runs, the variable is already initialized by the time the outer query starts processing rows.
For each row processed by the outer query, the value of cnt is added to @tot.
Returning s.cnt in the SELECT list is entirely optional; it's just there as a demonstration.
N.B. The MySQL reference manual specifically states that this behavior of user-defined variables is not guaranteed.
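On MySQL 8.0 or later, a window function is one way to get the same running total without relying on the evaluation order of user variables. The following is a minimal sketch of that alternative technique (it is not what the answer above describes), assuming the same articles table and created column used in the query above.

```sql
-- Requires MySQL 8.0+ (window functions are not available in 5.x).
-- The derived table s is the original per-month count; the outer query
-- accumulates cnt over the months in label order.
SELECT s.label,
       s.cnt,
       SUM(s.cnt) OVER (ORDER BY s.label) AS running_subtotal
FROM (
    SELECT DATE_FORMAT(t.created, '%y-%m') AS label,
           COUNT(t.id) AS cnt
    FROM articles t
    GROUP BY label
) s
ORDER BY s.label;
```

With the sample counts from the question (5, 200, 300), running_subtotal would be 5, 205 and 505. Because each label appears only once after the GROUP BY, the default window frame adds up exactly the rows up to and including the current month.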

Related

How do I find rows in a database that are not in the expected order?

Say I have a table with the structure
recordNumber: INTEGER (autoincrement)
insertedOn: DATETIME
Normally, when data gets inserted into the table, recordNumber is incremented and insertedOn is always the current time. So normally the following should be true:
select insertedOn order by recordNumber === select insertedOn order by insertedOn
But that's actually not the case. My question is: how do I query the database to find the first recordNumber that breaks the condition?
You can use the LAG window function, but whether it is available depends on the particular database you are using.
If we assume that your increments are always by 1, then this is a little more generic:
select top 1 *
from YrTbl Ths
where exists (select 1
              from YrTbl Prev
              where Prev.recordNumber + 1 = Ths.recordNumber
                and Prev.insertedOn >= Ths.insertedOn
             )
order by Ths.recordNumber
TOP n might work a little differently in your environment; if you are using MySQL you might like to use LIMIT 1 at the end of the query, for example.
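For databases that do support window functions, the LAG variant mentioned at the start of this answer could look roughly like the sketch below. It assumes MySQL 8.0+ syntax and the same placeholder table name YrTbl; the aliases t and prevInsertedOn are made up for illustration.

```sql
-- LAG() fetches insertedOn from the previous row in recordNumber order,
-- so this does not depend on recordNumber increasing by exactly 1.
SELECT t.recordNumber, t.insertedOn, t.prevInsertedOn
FROM (
    SELECT recordNumber,
           insertedOn,
           LAG(insertedOn) OVER (ORDER BY recordNumber) AS prevInsertedOn
    FROM YrTbl
) t
WHERE t.prevInsertedOn > t.insertedOn   -- earlier record has a later timestamp
ORDER BY t.recordNumber
LIMIT 1;                                -- on SQL Server, use TOP 1 instead
```

Note that this sketch uses a strict > comparison, so two consecutive rows with identical timestamps are not reported; change it to >= to match the EXISTS query above.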

DENSE_RANK() OVER and IFNULL()

Let's say I have a table like this -
id | number
1  | 1
2  | 1
3  | 1
I want to return the second largest number, and if there isn't, return NULL instead. In this case, since all the numbers in the table are the same, there isn't the second largest number, so it should return NULL.
This code works -
SELECT IFNULL((
    SELECT number
    FROM (SELECT *, DENSE_RANK() OVER(ORDER BY number DESC) AS ranking
          FROM test) r
    WHERE ranking = 2), NULL) AS SecondHighestNumber;
However, after I rearranged the query, it doesn't work anymore -
SELECT IFNULL(number, NULL) AS SecondHighestNumber
FROM (SELECT *, DENSE_RANK() OVER(ORDER BY number DESC) AS ranking
      FROM test) r
WHERE ranking = 2;
It returns blank instead of NULL. Why?
Explanation
This is something of a byproduct of the way you are using a subquery in your SELECT clause, in an outer query that really has no FROM clause of its own.
It is easy to see with a very simple example. We create an empty table. Then we select from it where id = 1 (no results, as expected).
CREATE TABLE #foo (id int);
SELECT * FROM #foo WHERE id = 1; -- Empty results
But now if we take a left turn and turn that into a subquery in the SELECT clause - we get a result!
CREATE TABLE #foo (id int);
SELECT (SELECT * FROM #foo WHERE id = 1) AS wtf; -- one record in results with value NULL
I'm not sure what else we could ask our SQL engine to do for us - perhaps cough up an error and say I can't do this? Maybe return no results? We are telling it to select an empty result set as a value in the SELECT clause, in a query that doesn't have any FROM clause (personally I would like SQL to cough up an error and say I can't do this ... but it's not my call).
I hope someone else can explain this better, more accurately or technically - or even just give a name to this behavior. But in a nutshell there it is.
tl;dr
So your first query has a SELECT clause with an IFNULL function in it that uses a subquery ... and otherwise is a SELECT without a FROM. So this is a little weird but does what you want, as shown above. On the other hand, your second query is "normal" SQL that selects from a table, filters the results, and lets you know it found nothing -- which might not be what you want but I think actually makes more sense ;)
Footnote: my "sql" here is T-SQL, but I believe this simple example would work the same in MySQL. And for what it's worth, I believe Oracle (back when I learned it years ago) actually would cough up errors here and say you can't have a SELECT clause with no FROM.
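If the goal is simply "second largest number, or NULL when there is none", one option that avoids the wrapper subquery is to lean on an aggregate: a query with an aggregate function and no GROUP BY always returns exactly one row, and MAX over zero rows is NULL. A minimal sketch, assuming the table test(id, number) from the question (this is a different technique from the DENSE_RANK queries above):

```sql
-- The inner query finds the largest number; the outer MAX looks only at
-- values strictly below it. If no such value exists, the filtered set is
-- empty and MAX yields NULL, still as a single row.
SELECT MAX(number) AS SecondHighestNumber
FROM test
WHERE number < (SELECT MAX(number) FROM test);
```

For the sample data, where every number is 1, this returns one row containing NULL.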

MYSQL error causing calculated result to be written to next row

Using MYSQL 5.7
This query creates a new table with the columns order_month and SKU plus the calculated columns qty_mth, count_mth and avg_month. The resulting table correctly reflects the first 4 columns, but the final column (avg_month), while correctly calculated, is written on the next row.
CREATE TABLE tbl_temp_si_trans4
    (temp_si_trans4id INT NOT NULL AUTO_INCREMENT,
    PRIMARY KEY temp_trans4idkey (temp_si_trans4id),
    INDEX index1 (order_month,SKU))
SELECT order_month, SKU,
    @qty_mth := SUM(net_qty_after_refund) AS qty_mth,
    @count_mth := COUNT(DISTINCT(order_year)) AS count_mth,
    @avg_month := @qty_mth/@count_mth AS avg_month
FROM order_trans4
GROUP BY order_month, SKU
Please see below example of result:
I have tried the following modifications to the avg_month calculation line, with the same result:
(@qty_mth/@count_mth) AS avg_month and
@qty_mth/@count_mth AS avg_month
From the documentation:
As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the
results you expect, but this is not guaranteed.
However, you can use a subquery instead of user-defined variables to achieve the effect you are looking for. Something similar to the following:
SELECT x.total_sale,
       x.f1 / x.total_sale AS f1_percent
FROM (
    SELECT s.f1,
           s.f1 + s.f2 AS total_sale
    FROM sales s
) x
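Applied to the table in the question, the same pattern might look like the sketch below. It reuses the column names from the question's query and is shown as a plain SELECT; the alias x is arbitrary, and the query could be placed after the CREATE TABLE clause in the same way as the original.

```sql
-- The derived table x computes both aggregates once per (order_month, SKU);
-- the outer query divides them, so no user variables are assigned or read.
SELECT x.order_month,
       x.SKU,
       x.qty_mth,
       x.count_mth,
       x.qty_mth / x.count_mth AS avg_month
FROM (
    SELECT order_month,
           SKU,
           SUM(net_qty_after_refund) AS qty_mth,
           COUNT(DISTINCT order_year) AS count_mth
    FROM order_trans4
    GROUP BY order_month, SKU
) x;
```

Since both values in each row of x come from the same group, avg_month always belongs to the row it is displayed on.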

Missing values in a query

I encounter some strange results in the following query :
SET @indi_id = 768;
SET @generations = 8;
SELECT num, sosa, seq, len, dernier, ful_ful_nom
FROM fullindi
LEFT JOIN lignee_new
    ON ((ful_indi_id = dernier) AND (len BETWEEN 1 AND @generations))
RIGHT JOIN numbers
    ON ((sosa = num) AND (premier = @indi_id))
WHERE num BETWEEN 1 AND pow(2, @generations)
GROUP BY num
ORDER BY num;
The result looks like this :
Why does the row just before a fully NULL one not display the existing values ('sosa', 'len', 'dernier', 'ful_ful_nom') but only the 'seq' value (see rows 43 and 47 in this example)?
What am I missing?
As requested, here are data :
table lignee_new :
table fullindi :
The problem is that MySQL does really dumb things when an aggregate function or a GROUP BY is introduced but not all of the selected fields are inside an aggregate function or listed in the GROUP BY.
You are asking it to GROUP BY num, but none of the other columns in your SELECT are included in the GROUP BY, nor are they being aggregated with a function (SUM, MAX, MIN, AVG, etc.).
In most other RDBMSs this query wouldn't run and would throw an error (as it would in MySQL with the ONLY_FULL_GROUP_BY SQL mode enabled), but here MySQL just carries on. To decide which value to show for each field that isn't num, it just grabs the first value it finds in its data storage, which may differ between InnoDB and other storage engines.
My guess is that in your case you have more than one record in lignee_new that has a num of 43. Since you GROUP BY num and nothing else, it just grabs values randomly from your multiple records where num=43 and displays them... which is reasonable. By not including them in an aggregate function you are pretty much saying "I don't care what you display for these other fields, just bring something back" and so MySQL does.
Remove your GROUP BY clause completely and you'll see data that makes sense. Perhaps use WHERE to further filter your records to get rid of nulls or other things you don't need (don't use GROUP BY to filter).

Stored procedure to execute a query and return selected values if the query returns only 1 result

So my query is the following, which may return many results:
SELECT P_CODE, NAME FROM TEST.dbo.PEOPLE
WHERE NAME LIKE '%JA%'
  AND P_CODE LIKE '%003%'
  AND DOB LIKE '%1958%'
  AND HKID = ''
  AND (MOBILE LIKE '%28%' OR TEL LIKE '%28%')
I would like to integrate this into a Stored Procedure (or View?) so that it will only return a result if the query results in exactly 1 row. If there's 0 or > 1, then it should return no results.
If you just want to return an empty resultset in cases other than 1:
;WITH x AS
(
    SELECT P_CODE, NAME, c = COUNT(*) OVER()
    FROM TEST.dbo.PEOPLE
    WHERE NAME LIKE '%JA%'
      AND P_CODE LIKE '%003%'
      AND DOB LIKE '%1958%'
      AND HKID = ''
      AND (MOBILE LIKE '%28%' OR TEL LIKE '%28%')
)
SELECT P_CODE, NAME FROM x WHERE c = 1;
Otherwise, you'll have to run the query twice (or dump the results to intermediate storage, such as a #temp table) - once to get the count, and once to decide based on the count whether to run the SELECT or not.
Effectively you want something akin to FirstOrDefault() from the Linq-to-SQL implementation, but done on the server side. That means you will need to execute the query in a stored procedure, dump the results into a temp table (or table variable), and then check @@ROWCOUNT afterwards to get the number of rows that were returned and decide whether or not to forward the results on to the caller. If you do, be sure to use TOP 1 in the query from the temp table so that you only get a single result out as you desire.
UPDATE:
I described an alternative to the solution Aaron describes in his answer (which I like better).
Removed unnecessary TOP specifier in solution specification.
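The temp-table route described in this answer could be sketched in T-SQL roughly as follows. The names #matches and @n are hypothetical, and the statements are shown as a plain batch; wrapping them in CREATE PROCEDURE is left out.

```sql
-- Run the search once and keep the matches in a temp table.
SELECT P_CODE, NAME
INTO #matches
FROM TEST.dbo.PEOPLE
WHERE NAME LIKE '%JA%'
  AND P_CODE LIKE '%003%'
  AND DOB LIKE '%1958%'
  AND HKID = ''
  AND (MOBILE LIKE '%28%' OR TEL LIKE '%28%');

-- Capture the row count right away, before any other statement changes
-- @@ROWCOUNT.
DECLARE @n int = @@ROWCOUNT;

-- Returns the single match when there is exactly one row, and an empty
-- resultset (with the same columns) otherwise.
SELECT P_CODE, NAME
FROM #matches
WHERE @n = 1;

DROP TABLE #matches;
```

Filtering on @n in the final SELECT, rather than wrapping it in IF, means the caller always receives a resultset with the same shape, whether or not it contains a row.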