I have a table that looks like this:
CREATE TABLE `table_growth` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` datetime DEFAULT CURRENT_TIMESTAMP,
`table_name` varchar(50) DEFAULT NULL,
`rows` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=184 DEFAULT CHARSET=utf8
Example of rows in the table:
+-----+---------------------+--------------------------+-------+
| id | timestamp | table_name | rows |
+-----+---------------------+--------------------------+-------+
| 110 | 2019-03-01 06:00:00 | attachments | 640 |
| 111 | 2019-03-01 06:00:00 | contacts | 0 |
| 112 | 2019-03-01 06:00:00 | copy_menuitem_options | 3038 |
| 113 | 2019-03-01 06:00:00 | copy_menuitem_suboptions | 9779 |
| 114 | 2019-03-01 06:00:00 | copy_menuitems | 12118 |
| 115 | 2019-03-02 06:00:00 | attachments | 638 |
| 116 | 2019-03-02 06:00:00 | contacts | 0 |
| 117 | 2019-03-02 06:00:00 | copy_menuitem_options | 3039 |
| 118 | 2019-03-02 06:00:00 | copy_menuitem_suboptions | 9789 |
| 119 | 2019-03-02 06:00:00 | copy_menuitems | 12128 |
+-----+---------------------+--------------------------+-------+
I want to calculate the difference in rows between two days for each table, i.e. the rows value where date(timestamp)='2019-03-02' minus the rows value where date(timestamp)='2019-03-01'.
Expected results
table_name | Rows Diff
------------------------------
attachments | 2
contacts | 0
copy_menuitem_options | 1
copy_menuitem_suboptions| 10
copy_menuitems | 10
I tried these queries, but they are failing somewhere.
SELECT x.table_name
, (y.rows-x.rows)as diff
FROM dbadmin.table_growth x
JOIN dbadmin.table_growth y
ON y.id = x.id
AND DATE(y.timestamp) = '2019-03-02'
WHERE DATE(x.timestamp) = '2019-03-01';
select x.table_name, (y.rows - x.rows) as diff
from table_growth x join
table_growth y on y.id=x.id and DATE(y.timestamp) = '2019-03-02'
WHERE DATE(x.timestamp) = '2019-03-01';
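Your second query is on the right track, but the join condition is partially off. Since id is unique per row, joining on y.id = x.id can only pair a row with itself, and no row has both dates, so the result is empty. You should be asserting that the table names, not the ids, match: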
SELECT
x.table_name,
(y.rows - x.rows) AS diff
FROM table_growth x
INNER JOIN table_growth y
ON x.table_name = y.table_name and
DATE(y.timestamp) = '2019-03-02'
WHERE
DATE(x.timestamp) = '2019-03-01';
Note: your expected output is slightly ambiguous. It is not clear which day's rows value should be subtracted from which (attachments drops from 640 to 638, yet the expected diff is 2), or whether perhaps you want an absolute value.
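A quick way to see the difference between the two join conditions is to run both on the sample rows. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for the demo; SQLite's DATE() returns 'YYYY-MM-DD' text, so the date filters behave the same way on this data). "rows" is double-quoted only because ROWS is a keyword.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table_growth (id INTEGER PRIMARY KEY, timestamp TEXT, '
    'table_name TEXT, "rows" INTEGER)'
)
conn.executemany(
    "INSERT INTO table_growth VALUES (?, ?, ?, ?)",
    [
        (110, "2019-03-01 06:00:00", "attachments", 640),
        (111, "2019-03-01 06:00:00", "contacts", 0),
        (112, "2019-03-01 06:00:00", "copy_menuitem_options", 3038),
        (113, "2019-03-01 06:00:00", "copy_menuitem_suboptions", 9779),
        (114, "2019-03-01 06:00:00", "copy_menuitems", 12118),
        (115, "2019-03-02 06:00:00", "attachments", 638),
        (116, "2019-03-02 06:00:00", "contacts", 0),
        (117, "2019-03-02 06:00:00", "copy_menuitem_options", 3039),
        (118, "2019-03-02 06:00:00", "copy_menuitem_suboptions", 9789),
        (119, "2019-03-02 06:00:00", "copy_menuitems", 12128),
    ],
)

# Join on id: each row can only pair with itself, and no row has both dates.
by_id = conn.execute("""
    SELECT x.table_name, y."rows" - x."rows"
    FROM table_growth x
    JOIN table_growth y ON y.id = x.id AND DATE(y.timestamp) = '2019-03-02'
    WHERE DATE(x.timestamp) = '2019-03-01'
""").fetchall()

# Join on table_name: pairs each table's 2019-03-01 row with its 2019-03-02 row.
by_name = conn.execute("""
    SELECT x.table_name, y."rows" - x."rows" AS diff
    FROM table_growth x
    JOIN table_growth y
      ON y.table_name = x.table_name AND DATE(y.timestamp) = '2019-03-02'
    WHERE DATE(x.timestamp) = '2019-03-01'
    ORDER BY x.table_name
""").fetchall()

print(by_id)  # []
print(by_name)
```

With day two minus day one, attachments comes out as -2; if the expected 2 is meant literally, wrapping the expression in ABS() (available in both MySQL and SQLite) gives that instead.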
This will work:
select distinct a.table_name, (a.rows - b.rows) diff
from table_growth a, table_growth b
where a.table_name = b.table_name
and date(a.timestamp) = '2019-03-02'
and date(b.timestamp) = '2019-03-01';
If you have only one row per table per date, then this might be the fastest approach:
SELECT g.table_name,
SUM(CASE WHEN DATE(g.timestamp) = '2019-03-02'
THEN g.rows
WHEN DATE(g.timestamp) = '2019-03-01'
THEN -g.rows
ELSE 0
END) as diff
FROM dbadmin.table_growth g
WHERE g.timestamp >= '2019-03-01' AND
g.timestamp < '2019-03-03'
GROUP BY g.table_name;
In particular, this can make use of an index on table_growth(timestamp, table_name, rows).
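The sign trick is easy to check: each table's day-two rows are added and its day-one rows subtracted, so the group sum is the difference. A minimal sketch with Python's sqlite3 standing in for MySQL; the index name growth_ts is made up. Filtering on the raw timestamp range, rather than on DATE(timestamp), is what leaves an index on timestamp usable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table_growth (id INTEGER PRIMARY KEY, timestamp TEXT, '
    'table_name TEXT, "rows" INTEGER)'
)
# Composite index from the answer (hypothetical name); the range filter can use it.
conn.execute('CREATE INDEX growth_ts ON table_growth (timestamp, table_name, "rows")')

names = ["attachments", "contacts", "copy_menuitem_options",
         "copy_menuitem_suboptions", "copy_menuitems"]
day1 = [640, 0, 3038, 9779, 12118]
day2 = [638, 0, 3039, 9789, 12128]
sample = [(110 + i, "2019-03-01 06:00:00", name, n)
          for i, (name, n) in enumerate(zip(names, day1))]
sample += [(115 + i, "2019-03-02 06:00:00", name, n)
           for i, (name, n) in enumerate(zip(names, day2))]
conn.executemany("INSERT INTO table_growth VALUES (?, ?, ?, ?)", sample)

# Day-two rows count positive, day-one rows negative; the SUM is the difference.
diffs = conn.execute("""
    SELECT g.table_name,
           SUM(CASE WHEN DATE(g.timestamp) = '2019-03-02' THEN g."rows"
                    WHEN DATE(g.timestamp) = '2019-03-01' THEN -g."rows"
                    ELSE 0 END) AS diff
    FROM table_growth g
    WHERE g.timestamp >= '2019-03-01' AND g.timestamp < '2019-03-03'
    GROUP BY g.table_name
    ORDER BY g.table_name
""").fetchall()
print(diffs)
```

Only one pass over the table is needed, which is where the speed advantage over the self-join comes from.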
I have a problem with a SQL SELECT query; I can't figure out what it needs to be.
This is what my items table looks like:
| id | i_id | last_seen | spot |
----------------------------------------------------
| 1 | ls100 | 2017-03-10 15:30:40 | spot800 |
| 2 | ls100 | 2017-03-10 16:20:15 | spot753 |
| 3 | ls200 | 2017-03-10 16:33:10 | spot800 |
| 4 | ls300 | 2017-03-10 15:30:40 | spot800 |
| 5 | ls300 | 2017-03-10 12:10:30 | spot800 |
| 6 | ls400 | 2017-03-10 10:30:10 | spot800 |
This is what I'm trying to obtain:
| id | i_id | last_seen | spot |
----------------------------------------------------
| 3 | ls200 | 2017-03-10 16:33:10 | spot800 |
| 5 | ls300 | 2017-03-10 12:10:30 | spot800 |
So I need the rows where spot = 'spot800' and last_seen is the MAX (but only if that DateTime is the newest among all spots with the same i_id), and finally the DateTime must be later than '2017-03-10 11:00:00'.
This is what I have so far:
SELECT *
FROM items
WHERE spot = 'spot800'
HAVING MAX(`last_seen`)
AND `last_seen` > '2017-03-10 11:00:00'
E.g.:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,i_id INT NOT NULL
,last_seen DATETIME NOT NULL
,spot INT NOT NULL
);
INSERT INTO my_table VALUES
(1,100,'2017-03-10 15:30:40',800),
(2,100,'2017-03-10 14:20:15',753),
(3,200,'2017-03-10 16:33:10',800),
(4,300,'2017-03-10 15:30:40',800),
(5,300,'2017-03-10 12:10:30',800),
(6,400,'2017-03-10 10:30:10',800);
SELECT x.*
FROM my_table x
LEFT
JOIN my_table y
ON y.i_id = x.i_id
AND y.last_seen < x.last_seen
WHERE x.last_seen > '2017-03-10 11:00:00'
AND x.spot = 800
AND y.id IS NULL;
+----+------+---------------------+------+
| id | i_id | last_seen | spot |
+----+------+---------------------+------+
| 3 | 200 | 2017-03-10 16:33:10 | 800 |
| 5 | 300 | 2017-03-10 12:10:30 | 800 |
+----+------+---------------------+------+
2 rows in set (0.00 sec)
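The LEFT JOIN ... IS NULL pattern is an anti-join: a row x survives only when no row y matches the ON condition. A minimal sketch that replays the answer's sample data in SQLite via Python's sqlite3 (standing in for MySQL) and checks the two rows shown above. The comparison direction decides which row per i_id is kept: with <, a row survives only if nothing for its i_id is earlier; with >, only if nothing is later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY,
        i_id INTEGER NOT NULL,
        last_seen TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM:SS' sorts correctly as text
        spot INTEGER NOT NULL
    )
""")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
    (1, 100, "2017-03-10 15:30:40", 800),
    (2, 100, "2017-03-10 14:20:15", 753),
    (3, 200, "2017-03-10 16:33:10", 800),
    (4, 300, "2017-03-10 15:30:40", 800),
    (5, 300, "2017-03-10 12:10:30", 800),
    (6, 400, "2017-03-10 10:30:10", 800),
])

# Keep x only when no row y with the same i_id has an earlier last_seen.
result = conn.execute("""
    SELECT x.*
    FROM my_table x
    LEFT JOIN my_table y
      ON y.i_id = x.i_id AND y.last_seen < x.last_seen
    WHERE x.last_seen > '2017-03-10 11:00:00'
      AND x.spot = 800
      AND y.id IS NULL
    ORDER BY x.id
""").fetchall()
print(result)
```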
Use MAX and GROUP BY.
SELECT id, i_id, MAX(last_seen), spot
FROM items
WHERE spot = 'spot800'
AND last_seen > '2017-03-10 11:00:00'
GROUP BY id, i_id, spot
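There are several things wrong with your statement.
Firstly, HAVING filters the groups produced by GROUP BY (without a GROUP BY, the whole result is treated as one group), so it's not what you are looking for.
Also, MAX is an aggregate function, not a boolean one: it returns a value rather than a true/false condition, so it can't be used in a WHERE clause, and in a HAVING clause it should be compared with something (e.g. HAVING MAX(last_seen) > ...) rather than used as a condition on its own. And even if it did work, MAX would only return the entry with the time '2017-03-10 16:33:10', which is not what you expected.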
Try this instead:
SELECT * FROM items WHERE (spot='spot800' AND last_seen > '2017-03-10 11:00:00');
This query does a group by on lead_source_id:
SELECT ch.lead_source_id,
Count(DISTINCT ch.repurchased_date)
FROM customers_history ch
WHERE ch.repurchased_date >= '2014-04-01'
AND ch.repurchased_date < '2014-05-01'
AND ch.lead_source_id IS NOT NULL
GROUP BY ch.lead_source_id;
And this query totals the records in the table:
SELECT Count(DISTINCT( repurchased_date ))
FROM customers_history
INNER JOIN (SELECT DISTINCT( customer_id ) AS xcid
FROM customers_history
WHERE repurchased_date >= '2014-04-01'
AND repurchased_date < '2014-05-01'
AND lead_source_id IS NOT NULL) AS Temp
ON Temp.xcid = customer_id
WHERE repurchased_date >= '2014-04-01'
AND repurchased_date < '2014-05-01'
AND lead_source_id IS NOT NULL;
On our production data, the totals from the first query come to 7963, but the second query returns 7905. Why the difference, and how can we fix our queries?
Here's our table layout:
+--------+-------------+----------------+---------------------+--------+
| id | customer_id | lead_source_id | repurchased_date | Rating |
+--------+-------------+----------------+---------------------+--------+
| 422923 | 420450 | 4 | 2014-04-14 09:16:48 | Warm |
| 422924 | 420450 | 4 | 2014-04-14 09:16:48 | Cold |
| 422956 | 420450 | 4 | 2014-04-14 09:16:49 | Hot |
| 422933 | 420451 | 37 | 2014-04-14 09:18:41 | Hot |
| 422938 | 420452 | 1 | 2014-04-10 20:50:30 | Hot |
| 422984 | 420452 | 1 | 2014-04-12 20:50:30 | Warm |
| 422940 | 420453 | 47 | 2014-04-14 09:20:27 | Hot |
+--------+-------------+----------------+---------------------+--------+
EDIT
To answer some of the possibilities about nulls:
select count(id) from customers_history where customer_id is null: 0
select count(id) from customers_history where lead_source_id is null: 5103
select count(id) from customers_history where repurchased_date is null: 0
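The most obvious conclusion is that some lead_source_ids share values of repurchased_date: a date shared by two lead sources is counted once in each group by the first query, but only once by the second.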
Another possibility is that you have NULL values for customer_id and the second filters these out.
The third possibility is that NULL values of lead_source_id are adding additional values in the first query.
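Here is a small reproduction of the first explanation, using Python's sqlite3 as a stand-in for MySQL. It loads the question's sample rows plus one hypothetical row (id 422999, customer 420454) that gives lead source 37 the same timestamp as lead source 4, then runs both queries essentially as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers_history (
        id INTEGER PRIMARY KEY, customer_id INTEGER, lead_source_id INTEGER,
        repurchased_date TEXT, Rating TEXT
    )
""")
conn.executemany("INSERT INTO customers_history VALUES (?, ?, ?, ?, ?)", [
    (422923, 420450, 4, "2014-04-14 09:16:48", "Warm"),
    (422924, 420450, 4, "2014-04-14 09:16:48", "Cold"),
    (422956, 420450, 4, "2014-04-14 09:16:49", "Hot"),
    (422933, 420451, 37, "2014-04-14 09:18:41", "Hot"),
    (422938, 420452, 1, "2014-04-10 20:50:30", "Hot"),
    (422984, 420452, 1, "2014-04-12 20:50:30", "Warm"),
    (422940, 420453, 47, "2014-04-14 09:20:27", "Hot"),
    # Hypothetical extra row: lead source 37 shares a timestamp with lead source 4.
    (422999, 420454, 37, "2014-04-14 09:16:48", "Hot"),
])

# First query: distinct dates are counted separately inside each group.
per_source = conn.execute("""
    SELECT ch.lead_source_id, COUNT(DISTINCT ch.repurchased_date)
    FROM customers_history ch
    WHERE ch.repurchased_date >= '2014-04-01'
      AND ch.repurchased_date < '2014-05-01'
      AND ch.lead_source_id IS NOT NULL
    GROUP BY ch.lead_source_id
""").fetchall()
sum_per_source = sum(count for _, count in per_source)

# Second query: one COUNT(DISTINCT) over all qualifying rows.
overall = conn.execute("""
    SELECT COUNT(DISTINCT repurchased_date)
    FROM customers_history
    INNER JOIN (SELECT DISTINCT customer_id AS xcid
                FROM customers_history
                WHERE repurchased_date >= '2014-04-01'
                  AND repurchased_date < '2014-05-01'
                  AND lead_source_id IS NOT NULL) AS Temp
      ON Temp.xcid = customer_id
    WHERE repurchased_date >= '2014-04-01'
      AND repurchased_date < '2014-05-01'
      AND lead_source_id IS NOT NULL
""").fetchone()[0]

print(sum_per_source, overall)  # 7 6
```

Without the hypothetical row, both counts on this sample are 6. If shared dates are the only cause, the production gap of 7963 - 7905 = 58 is the sum, over each shared date, of the number of extra lead sources it appears under.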
Suppose I have a table that tracks if a payment is missed like this:
+----+---------+------------+------------+---------+--------+
| id | loan_id | amount_due | due_at | paid_at | missed |
+----+---------+------------+------------+---------+--------+
| 1 | 1 | 100 | 2013-08-17 | NULL | NULL |
| 5 | 1 | 100 | 2013-09-17 | NULL | NULL |
| 7 | 1 | 100 | 2013-10-17 | NULL | NULL |
+----+---------+------------+------------+---------+--------+
And, for example, I ran a query that checks if a payment is missed like this:
UPDATE loan_payments
SET missed = 1
WHERE DATEDIFF(NOW(), due_at) >= 10
AND paid_at IS NULL
Then suppose that the row with id = 1 gets affected. I want the amount_due of the row with id = 1 to be added to the amount_due of the next row, so the table would look like this:
+----+---------+------------+------------+---------+--------+
| id | loan_id | amount_due | due_at | paid_at | missed |
+----+---------+------------+------------+---------+--------+
| 1 | 1 | 100 | 2013-08-17 | NULL | 1 |
| 5 | 1 | 200 | 2013-09-17 | NULL | NULL |
| 7 | 1 | 100 | 2013-10-17 | NULL | NULL |
+----+---------+------------+------------+---------+--------+
Any advice on how to do it?
Thanks
Take a look at this SQL Fiddle:
MySQL 5.5.32 Schema Setup:
CREATE TABLE loan_payments
(`id` int, `loan_id` int, `amount_due` int,
`due_at` varchar(10), `paid_at` varchar(4), `missed` varchar(4))
;
INSERT INTO loan_payments
(`id`, `loan_id`, `amount_due`, `due_at`, `paid_at`, `missed`)
VALUES
(1, 1, 100, '2013-09-17', NULL, NULL),
(3, 2, 100, '2013-09-17', NULL, NULL),
(5, 1, 100, '2013-10-17', NULL, NULL),
(7, 1, 100, '2013-11-17', NULL, NULL)
;
UPDATE loan_payments AS l
LEFT OUTER JOIN (SELECT loan_id, MIN(ID) AS ID
FROM loan_payments
WHERE DATEDIFF(NOW(), due_at) < 0
GROUP BY loan_id) AS l2 ON l.loan_id = l2.loan_id
LEFT OUTER JOIN loan_payments AS l3 ON l2.id = l3.id
SET l.missed = 1, l3.amount_due = l3.amount_due + l.amount_due
WHERE DATEDIFF(NOW(), l.due_at) >= 10
AND l.paid_at IS NULL
;
Query 1:
SELECT *
FROM loan_payments
Results:
| ID | LOAN_ID | AMOUNT_DUE | DUE_AT | PAID_AT | MISSED |
|----|---------|------------|------------|---------|--------|
| 1 | 1 | 100 | 2013-09-17 | (null) | 1 |
| 3 | 2 | 100 | 2013-09-17 | (null) | 1 |
| 5 | 1 | 200 | 2013-10-17 | (null) | (null) |
| 7 | 1 | 100 | 2013-11-17 | (null) | (null) |
Unfortunately I don't have time at the moment to write out full-blown SQL, but here's the pseudocode I think you need to implement:
select all DISTINCT loan_id from table loan_payments
for each loan_id:
set missed = 1 for all outstanding payments for loan_id (as determined by date)
select the sum of all outstanding payments for loan_id
add this sum to the amount_due for the loan's next due date after today
Refer to this for how to loop using pure MySQL: http://dev.mysql.com/doc/refman/5.7/en/cursors.html
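The answer suggests looping with a MySQL cursor; as an alternative, here is the same pseudocode sketched as a plain Python loop over sqlite3 (a swapped-in technique, not a stored procedure). Assumptions, all mine: the helper name roll_missed_payments is hypothetical, the 10-day grace period comes from the question's DATEDIFF(NOW(), due_at) >= 10, today is passed in explicitly so the result is reproducible, and rows already marked missed are skipped so a second run doesn't carry the same amount forward twice.

```python
import sqlite3
from datetime import date, timedelta


def roll_missed_payments(conn, today, grace_days=10):
    """Hypothetical helper: for each loan, mark unpaid payments at least
    grace_days overdue as missed and add their total to the loan's next
    payment due after `today`. If no later payment exists, nothing is carried."""
    cutoff = (today - timedelta(days=grace_days)).isoformat()
    overdue = "loan_id = ? AND paid_at IS NULL AND missed IS NULL AND due_at <= ?"
    loan_ids = [r[0] for r in conn.execute("SELECT DISTINCT loan_id FROM loan_payments")]
    for loan_id in loan_ids:
        # Sum the newly overdue payments before marking them.
        (total,) = conn.execute(
            f"SELECT COALESCE(SUM(amount_due), 0) FROM loan_payments WHERE {overdue}",
            (loan_id, cutoff),
        ).fetchone()
        if total == 0:
            continue
        conn.execute(f"UPDATE loan_payments SET missed = 1 WHERE {overdue}", (loan_id, cutoff))
        # Add the overdue total to the earliest payment due strictly after today.
        conn.execute(
            "UPDATE loan_payments SET amount_due = amount_due + ? WHERE id = ("
            " SELECT id FROM loan_payments WHERE loan_id = ? AND due_at > ?"
            " ORDER BY due_at LIMIT 1)",
            (total, loan_id, today.isoformat()),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan_payments (id INTEGER PRIMARY KEY, loan_id INTEGER, "
    "amount_due INTEGER, due_at TEXT, paid_at TEXT, missed INTEGER)"
)
conn.executemany("INSERT INTO loan_payments VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 100, "2013-08-17", None, None),
    (5, 1, 100, "2013-09-17", None, None),
    (7, 1, 100, "2013-10-17", None, None),
])

roll_missed_payments(conn, date(2013, 8, 27))
after_first = conn.execute("SELECT * FROM loan_payments ORDER BY id").fetchall()
roll_missed_payments(conn, date(2013, 8, 27))  # second run finds nothing new
after_second = conn.execute("SELECT * FROM loan_payments ORDER BY id").fetchall()
print(after_first)
```

With today set to 2013-08-27, only the 2013-08-17 payment is 10 days overdue, so its 100 moves onto the 2013-09-17 row, matching the table in the question.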
I fixed my own problem by adding a missed_at field. I store the current timestamp in a variable ($now) before updating the first row to missed = 1 and missed_at = $now, and then I run this query to update the next row's amount_due:
UPDATE loan_payments lp1 JOIN loan_payments lp2
ON lp1.loan_id = lp2.loan_id AND lp1.due_at > lp2.due_at
SET lp1.amount_due = lp2.amount_due + lp1.amount_due
WHERE lp2.missed_at = $now AND DATEDIFF(lp1.due_at, lp2.due_at) <= DAYOFMONTH(LAST_DAY(lp2.due_at))
I wish I could just add LIMIT 1 to that query, but it turns out MySQL doesn't allow LIMIT in a multiple-table UPDATE (one with a JOIN).
So all in all, I used two queries to achieve what I want. It did the trick.
Please advise if you have better solutions.
Thanks!
I have a table called asset_usages which records the viewing of an asset by a viewer. The relevant fields are
id (int)
asset_id (int)
viewer_type (string)
viewer_id (int)
viewed_at (datetime)
I have a new field I just added called time_between_viewings, which is an int field representing seconds. I want to set this to the time, in seconds, since that asset was last viewed. So, if I had these records:
+-----+----------+-----------+-------------+---------------------+-----------------------+
| id | asset_id | viewer_id | viewer_type | viewed_at | time_between_viewings |
+-----+----------+-----------+-------------+---------------------+-----------------------+
| 506 | 7342 | 1182 | User | 2009-01-05 11:10:01 | NULL |
| 509 | 7342 | 1182 | User | 2009-01-05 11:12:47 | NULL |
| 514 | 6185 | 1182 | User | 2009-01-05 11:14:28 | NULL |
| 524 | 6185 | 1182 | User | 2009-01-05 11:28:18 | NULL |
| 618 | 1234 | 1182 | User | 2009-01-05 11:29:03 | NULL |
| 729 | 1234 | 1182 | User | 2009-01-05 11:29:01 | NULL |
+-----+----------+-----------+-------------+---------------------+-----------------------+
then time_between_viewings should be set as follows:
+-----+----------+-----------+-------------+---------------------+-----------------------+
| id | asset_id | viewer_id | viewer_type | viewed_at | time_between_viewings |
+-----+----------+-----------+-------------+---------------------+-----------------------+
| 506 | 7342 | 1182 | User | 2009-01-05 11:10:01 | NULL |
| 509 | 7342 | 1182 | User | 2009-01-05 11:12:47 | 166 |
| 514 | 6185 | 1182 | User | 2009-01-05 11:14:28 | NULL |
| 524 | 6185 | 1182 | User | 2009-01-05 11:28:18 | 830 |
| 618 | 1234 | 1182 | User | 2009-01-05 11:29:03 | 2 |
| 729 | 1234 | 1182 | User | 2009-01-05 11:29:01 | NULL |
+-----+----------+-----------+-------------+---------------------+-----------------------+
where 166, 830 and 2 are the time differences between each pair, in seconds.
What would be the sql to populate this field? I can't quite figure it out.
IMPORTANT NOTE: the data is not always inserted into the db in chronological order, i.e. you could have two records A and B, where B has a higher id but A has a later value for viewed_at. So looking for the first matching record with a lower id would not necessarily give you the previous viewing by the same person; you'll need to examine all the records in the database.
thanks! max
EDIT - stated that time_between_viewings is an int field representing seconds.
EDIT - added a couple of rows as an example of a row with a higher id but earlier timestamp
EDIT - I just realised that I didn't state the question properly. The time_between_viewings should be equal to the time since the asset was last viewed by the same viewer, i.e. the time between the record and the previous (based on viewed_at) record that has the same asset_id, viewer_id and viewer_type. The example data I gave still holds, but I could have put in some different viewer_id and viewer_type values to flesh the example out a bit.
It would be helpful if you prepared a sample table and data inserts.
Read this to learn why it matters so much if you want to get help: http://tkyte.blogspot.com/2005/06/how-to-ask-questions.html
This time I created it for you, click on this link: http://sqlfiddle.com/#!2/9719a/2
And try this query (you will find this query together with sample data under the above link) :
select alias1.*,
timestampdiff( second, previous_viewed_at, viewed_at )
as time_between_viewings
from (
select alias.*,
(
select viewed_at from (
select
( select count(*) from asset_usages y
where x.asset_id = y.asset_id
and y.viewed_at < x.viewed_at
) as rn,
x.*
from asset_usages x
) xyz
where xyz.asset_id = alias.asset_id
and xyz.rn = alias.rn - 1
) previous_viewed_at
from (
select
( select count(*) from asset_usages y
where x.asset_id = y.asset_id
and y.viewed_at < x.viewed_at
) as rn,
x.*
from asset_usages x
) alias
) alias1;
This SELECT statement will give you the right data. You might need to do the update in chunks, though.
You can drop the ORDER BY clause for the UPDATE statement. As written, the derived data doesn't depend on the order of rows in the outer SELECT statement.
select asset_id, viewer_id, viewer_type, viewed_at,
prev_viewed_at, timestampdiff(second, prev_viewed_at, viewed_at) elapsed_sec
from (select asset_id, viewer_id, viewer_type, viewed_at,
(select max(t2.viewed_at)
from Table1 t2
where t2.viewed_at < Table1.viewed_at
and t2.asset_id = Table1.asset_id
and t2.viewer_id = Table1.viewer_id
and t2.viewer_type = Table1.viewer_type
) prev_viewed_at
from Table1
)t3
order by asset_id, viewer_id, viewed_at;
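To check the correlated MAX() approach as an actual UPDATE, including the EDIT's requirement that asset_id, viewer_id and viewer_type all match, here is a minimal sketch in Python's sqlite3 standing in for MySQL. Two differences are assumptions to keep in mind: SQLite has no TIMESTAMPDIFF, so the seconds come from strftime('%s', ...), and SQLite accepts a subquery that reads the table being updated, which MySQL rejects (error 1093); in MySQL you would join the UPDATE to a derived table instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE asset_usages (
        id INTEGER PRIMARY KEY, asset_id INTEGER, viewer_id INTEGER,
        viewer_type TEXT, viewed_at TEXT, time_between_viewings INTEGER
    )
""")
conn.executemany(
    "INSERT INTO asset_usages (id, asset_id, viewer_id, viewer_type, viewed_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (506, 7342, 1182, "User", "2009-01-05 11:10:01"),
        (509, 7342, 1182, "User", "2009-01-05 11:12:47"),
        (514, 6185, 1182, "User", "2009-01-05 11:14:28"),
        (524, 6185, 1182, "User", "2009-01-05 11:28:18"),
        (618, 1234, 1182, "User", "2009-01-05 11:29:03"),
        (729, 1234, 1182, "User", "2009-01-05 11:29:01"),  # higher id, earlier time
    ],
)

# Previous viewing = latest earlier viewed_at (not lower id) for the same
# asset and viewer; MAX() over no rows is NULL, so first viewings stay NULL.
conn.execute("""
    UPDATE asset_usages
    SET time_between_viewings = (
        SELECT CAST(strftime('%s', asset_usages.viewed_at) AS INTEGER)
             - CAST(strftime('%s', MAX(p.viewed_at)) AS INTEGER)
        FROM asset_usages p
        WHERE p.asset_id = asset_usages.asset_id
          AND p.viewer_id = asset_usages.viewer_id
          AND p.viewer_type = asset_usages.viewer_type
          AND p.viewed_at < asset_usages.viewed_at
    )
""")
result = conn.execute(
    "SELECT id, time_between_viewings FROM asset_usages ORDER BY id"
).fetchall()
print(result)
```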
I've got a MySQL table that has a running total:
+---------------------+--------+
| Timestamp | Total |
+---------------------+--------+
| 2012-07-04 05:35:00 | 1.280 | 1.280-1.280 = 0
| 2012-07-04 09:25:00 | 2.173 | 2.173-1.280 = 0.893
| 2012-07-04 09:30:00 | 2.219 | 2.219-1.280 = 0.939
| 2012-07-04 15:00:00 | 7.778 | 7.778-1.280 = 6.498
| 2012-07-04 21:05:00 | 13.032 | 13.032-1.280 = 11.752
| 2012-07-04 22:00:00 | 13.033 | 13.033-1.280 = 11.753
| 2012-07-05 05:20:00 | 13.033 | 13.033-13.033 = 0
| 2012-07-05 07:10:00 | 13.140 | 13.140-13.033 = 0.107
| 2012-07-05 10:15:00 | 14.993 | 14.993-13.033 = 1.960
| 2012-07-05 11:35:00 | 16.870 | 16.870-13.033 = 3.837
+---------------------+--------+
What I'm looking for is a query that determines the aggregated daily increase for each interval.
I've tried to show the desired outcome as well as the calculation behind each row. I've already tried several things with a join, but I can't work out how to determine the starting value for each day.
Thanks.
I can't vouch for the efficiency of this query, but it does get you the results you are looking for:
SELECT t1.`Timestamp`, t1.`Total`,
CASE WHEN t1.`timestamp` =
(SELECT MIN(t2.`Timestamp`)
FROM myTable t2
WHERE DATE(t2.`Timestamp`)=DATE(t1.`Timestamp`))
THEN 0
ELSE t1.`Total` - (SELECT MIN(t3.`Total`)
FROM myTable t3
WHERE DATE(t3.`Timestamp`)=DATE(t1.`Timestamp`))
END AS Diff
FROM myTable t1
ORDER BY `Timestamp`
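Alternate solution (more efficient, I think), joining each row to a derived table of per-day minimum totals instead of running two correlated subqueries per row: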
SELECT t1.`Timestamp`, t1.`Total`, (t1.`Total` - d1.MinVal) diff
FROM myTable t1
INNER JOIN
(SELECT DATE(`Timestamp`) ts_date,
MIN(`Total`) AS MinVal
FROM myTable
GROUP BY ts_date) d1
ON DATE(t1.`Timestamp`) = d1.ts_date
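A minimal sketch of the derived-table solution in Python's sqlite3, standing in for MySQL, that reproduces the diff column from the question. Like both answers, it takes each day's MIN(Total) as that day's starting value, which is only correct because a running total never goes down during the day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE myTable ("Timestamp" TEXT, "Total" REAL)')
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [
    ("2012-07-04 05:35:00", 1.280),
    ("2012-07-04 09:25:00", 2.173),
    ("2012-07-04 09:30:00", 2.219),
    ("2012-07-04 15:00:00", 7.778),
    ("2012-07-04 21:05:00", 13.032),
    ("2012-07-04 22:00:00", 13.033),
    ("2012-07-05 05:20:00", 13.033),
    ("2012-07-05 07:10:00", 13.140),
    ("2012-07-05 10:15:00", 14.993),
    ("2012-07-05 11:35:00", 16.870),
])

# Each row minus the minimum Total of its own day.
rows = conn.execute("""
    SELECT t1."Timestamp", t1."Total" - d1.MinVal AS diff
    FROM myTable t1
    JOIN (SELECT DATE("Timestamp") AS ts_date, MIN("Total") AS MinVal
          FROM myTable
          GROUP BY DATE("Timestamp")) d1
      ON DATE(t1."Timestamp") = d1.ts_date
    ORDER BY t1."Timestamp"
""").fetchall()
diffs = [round(diff, 3) for _, diff in rows]  # round away binary floating-point noise
print(diffs)
```

Storing Total as DECIMAL in MySQL would avoid the floating-point noise that the rounding step cleans up here.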