average of duplicate data from one table in mysql

I have a table like this:
+----+---------------------+---------+
| ID | startDate           | uvIndex |
+----+---------------------+---------+
|  1 | 2014-10-29 06:37:21 |     120 |
|  2 | 2014-10-29 08:57:00 |     135 |
|  3 | 2014-10-28 05:37:21 |     120 |
|  4 | 2014-10-28 09:30:21 |     160 |
|  5 | 2014-10-28 10:28:21 |     150 |
|  6 | 2014-10-26 16:27:01 |     150 |
|  7 | 2014-10-26 17:57:21 |     110 |
+----+---------------------+---------+
From this I want one row per distinct date; where a date occurs more than once, the row should show the average of its values. Like this:
+----+------------+--------------+
| ID | startDate  | AVG(uvIndex) |
+----+------------+--------------+
|  1 | 2014-10-29 |        127.5 |
|  2 | 2014-10-28 |       143.33 |
|  3 | 2014-10-26 |          130 |
+----+------------+--------------+
My attempt so far:
SELECT AVG(accumulatedLux), startTime
FROM `blutooth_accumulated_data`
WHERE `startTime` BETWEEN '2014-10-25' AND '2014-10-30'
GROUP BY startTime
Please help me out.

As far as I can tell, your query should produce the data you want. The only issue is the id, which you can get using a variable. Does this do what you want?
SELECT (@id := @id + 1) as id, startTime, AVG(accumulatedLux)
FROM `blutooth_accumulated_data` CROSS JOIN
     (SELECT @id := 0) vars
WHERE `startTime` BETWEEN '2014-10-25' AND '2014-10-30'
GROUP BY startTime
ORDER BY startTime;
One possible complication is if startTime has a time component. If so, you want this:
SELECT (@id := @id + 1) as id, date(startTime) as startDate, AVG(accumulatedLux)
FROM `blutooth_accumulated_data` CROSS JOIN
     (SELECT @id := 0) vars
WHERE date(startTime) BETWEEN '2014-10-25' AND '2014-10-30'
GROUP BY date(startTime)
ORDER BY date(startTime);
A small note: so that MySQL can use an index on startTime, write the WHERE clause as:
WHERE startTime >= date('2014-10-25') and startTime < date('2014-10-30') + interval 1 day
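To try the group-by-date idea without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The table name uv_data is an assumption made for illustration; the rows are the sample data from the question. SQLite's date() plays the role of MySQL's date(), and the half-open range keeps the filter on the raw column, as recommended above.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uv_data (id INTEGER PRIMARY KEY, startDate TEXT, uvIndex REAL)"
)
conn.executemany(
    "INSERT INTO uv_data VALUES (?, ?, ?)",
    [
        (1, "2014-10-29 06:37:21", 120),
        (2, "2014-10-29 08:57:00", 135),
        (3, "2014-10-28 05:37:21", 120),
        (4, "2014-10-28 09:30:21", 160),
        (5, "2014-10-28 10:28:21", 150),
        (6, "2014-10-26 16:27:01", 150),
        (7, "2014-10-26 17:57:21", 110),
    ],
)

# Group on the date part only, so rows from the same day collapse into one.
# The half-open range (>= first day, < day after the last) covers 10-25..10-30.
daily_avg = conn.execute(
    """
    SELECT date(startDate) AS day, ROUND(AVG(uvIndex), 2) AS avg_uv
    FROM uv_data
    WHERE startDate >= '2014-10-25' AND startDate < '2014-10-31'
    GROUP BY date(startDate)
    ORDER BY day DESC
    """
).fetchall()

for day, avg_uv in daily_avg:
    print(day, avg_uv)
```

The same GROUP BY date(...) shape carries over to MySQL; only the row-number column from the answer is left out here.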

Related

Cumulative sum as dependant subquery

I have a table called transactions which contains sellers and their transactions: sale, waste, and whenever they receive products. The structure is essentially as follows:
seller_id transaction_date quantity reason product unit_price
--------- ---------------- -------- ------ ------- ----------
1 2018-01-01 10 import 1 100.0
1 2018-01-01 -5 sale 1 100.0
1 2018-01-01 -1 waste 1 100.0
2 2018-01-01 -3 sale 4 95.5
I need a daily summary for each seller, including the value of their sales, waste and starting inventory. The problem is that the starting inventory is a cumulative sum of quantities up until the given day (imports on the given day are also included). I have the following query:
SELECT
    t.seller_id,
    t.transaction_date,
    SUM(t.quantity * t.unit_price) as amount,
    t.reason as reason,
    (
        SELECT SUM(unit_price * quantity) FROM transactions
        WHERE seller_id = t.seller_id
        AND (transaction_date <= t.transaction_date)
        AND (
            transaction_date < t.transaction_date
            OR reason = 'import'
        )
    ) as opening_balance
FROM transactions t
GROUP BY
    t.transaction_date,
    t.seller_id,
    t.reason
The query works and I get the desired results. However, even after creating indexes for both the outer query and the subquery, it takes far too long (about 30 seconds), because the opening_balance query is a dependent subquery that is recalculated for every row.
How can I optimize or rewrite this query?
Edit: the subquery had a small bug with a missing WHERE condition. I fixed it, but the essence of the question is the same. I created a fiddle with example data to play around with:
https://www.db-fiddle.com/f/ma7MhufseHxEXLfxhCtGbZ/2

The following approach, which uses user-defined variables, can perform better than the correlated subquery. A temporary variable (temp) carries part of the calculation and also appears in the output; you can ignore that column.
You can try the following query (I can add more explanation if needed):
Query
SELECT dt.reason,
       dt.amount,
       @bal := CASE
                 WHEN dt.reason = 'import'
                      AND @sid <> dt.seller_id THEN dt.amount
                 WHEN dt.reason = 'import' THEN @bal + @temp + dt.amount
                 WHEN @sid = 0
                      OR @sid = dt.seller_id THEN @bal
                 ELSE 0
               end AS opening_balance,
       @temp := CASE
                  WHEN dt.reason <> 'import'
                       AND @sid = dt.seller_id
                       AND @td = dt.transaction_date THEN @temp + dt.amount
                  ELSE 0
                end AS temp,
       @sid := dt.seller_id AS seller_id,
       @td := dt.transaction_date AS transaction_date
FROM (SELECT seller_id,
             transaction_date,
             reason,
             Sum(quantity * unit_price) AS amount
      FROM transactions
      WHERE seller_id IS NOT NULL
      GROUP BY seller_id,
               transaction_date,
               reason
      ORDER BY seller_id,
               transaction_date,
               Field(reason, 'import', 'sale', 'waste')) AS dt
CROSS JOIN (SELECT @sid := 0,
                   @td := '',
                   @bal := 0,
                   @temp := 0) AS user_vars;
Result (note that I have ordered by seller_id first and then transaction_date)
| reason | amount | opening_balance | temp | seller_id | transaction_date |
| ------ | ------ | --------------- | ----- | --------- | ---------------- |
| import | 1250 | 1250 | 0 | 1 | 2018-12-01 |
| sale | -850 | 1250 | -850 | 1 | 2018-12-01 |
| waste | -100 | 1250 | -950 | 1 | 2018-12-01 |
| import | 950 | 1250 | 0 | 1 | 2018-12-02 |
| sale | -650 | 1250 | -650 | 1 | 2018-12-02 |
| waste | -450 | 1250 | -1100 | 1 | 2018-12-02 |
| import | 2000 | 2000 | 0 | 2 | 2018-12-01 |
| sale | -1200 | 2000 | -1200 | 2 | 2018-12-01 |
| waste | -250 | 2000 | -1450 | 2 | 2018-12-01 |
| import | 750 | 1300 | 0 | 2 | 2018-12-02 |
| sale | -600 | 1300 | -600 | 2 | 2018-12-02 |
| waste | -450 | 1300 | -1050 | 2 | 2018-12-02 |
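For comparison, here is a hedged sketch of the same opening-balance rule written with a window function instead of user-defined variables. This swaps in a different technique from the answer above. It runs on Python's built-in sqlite3 as a stand-in; MySQL 8.0 also supports window functions, but this exact query has not been checked there. The first four rows are the question's sample; the two rows for 2018-01-02 are hypothetical, added so the carry-over between days is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (seller_id INTEGER, transaction_date TEXT,"
    " quantity INTEGER, reason TEXT, product INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2018-01-01", 10, "import", 1, 100.0),
        (1, "2018-01-01", -5, "sale", 1, 100.0),
        (1, "2018-01-01", -1, "waste", 1, 100.0),
        (2, "2018-01-01", -3, "sale", 4, 95.5),
        # Hypothetical second day, not part of the original sample.
        (1, "2018-01-02", 5, "import", 1, 100.0),
        (1, "2018-01-02", -2, "sale", 1, 100.0),
    ],
)

# Step 1: collapse to one row per seller and day (net value and imports).
# Step 2: opening balance = net value of all earlier days + today's imports.
opening = conn.execute(
    """
    WITH daily AS (
        SELECT seller_id,
               transaction_date,
               SUM(quantity * unit_price) AS day_total,
               SUM(CASE WHEN reason = 'import'
                        THEN quantity * unit_price ELSE 0 END) AS day_imports
        FROM transactions
        GROUP BY seller_id, transaction_date
    )
    SELECT seller_id,
           transaction_date,
           COALESCE(SUM(day_total) OVER (
               PARTITION BY seller_id
               ORDER BY transaction_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ), 0) + day_imports AS opening_balance
    FROM daily
    ORDER BY seller_id, transaction_date
    """
).fetchall()

for row in opening:
    print(row)
```

Because the running sum is computed once over the per-day aggregate, the correlated subquery is no longer re-run for every output row. The per-reason amounts can be joined back on seller_id and transaction_date if they are needed in the same result.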

Do you mean something like this?
SELECT s.*, @balance := @balance + (s.quantity * s.unit_price) AS opening_balance
FROM (
    SELECT t.* FROM transactions t
    ORDER BY t.seller_id, t.transaction_date, t.reason
) s
CROSS JOIN (SELECT @balance := 0) AS INIT
GROUP BY s.transaction_date, s.seller_id, s.reason;
SAMPLE
MariaDB [test]> select * from transactions;
+----+-----------+------------------+----------+------------+--------+
| id | seller_id | transaction_date | quantity | unit_price | reason |
+----+-----------+------------------+----------+------------+--------+
| 1 | 1 | 2018-01-01 | 10 | 100 | import |
| 2 | 1 | 2018-01-01 | -5 | 100 | sale |
| 3 | 1 | 2018-01-01 | -1 | 100 | waste |
| 4 | 2 | 2018-01-01 | -3 | 99.5 | sale |
+----+-----------+------------------+----------+------------+--------+
4 rows in set (0.000 sec)
MariaDB [test]> SELECT s.* ,@balance:=@balance+(s.quantity*s.unit_price) AS opening_balance FROM (
    -> SELECT t.* FROM transactions t
    -> ORDER BY t.seller_id,t.transaction_date,t.reason
    -> ) s
    -> CROSS JOIN ( SELECT @balance:=0) AS INIT
    -> GROUP BY s.transaction_date, s.seller_id, s.reason;
+----+-----------+------------------+----------+------------+--------+-----------------+
| id | seller_id | transaction_date | quantity | unit_price | reason | opening_balance |
+----+-----------+------------------+----------+------------+--------+-----------------+
| 1 | 1 | 2018-01-01 | 10 | 100 | import | 1000 |
| 2 | 1 | 2018-01-01 | -5 | 100 | sale | 500 |
| 3 | 1 | 2018-01-01 | -1 | 100 | waste | 400 |
| 4 | 2 | 2018-01-01 | -3 | 99.5 | sale | 101.5 |
+----+-----------+------------------+----------+------------+--------+-----------------+
4 rows in set (0.001 sec)
MariaDB [test]>

SELECT
    t.seller_id,
    t.transaction_date,
    SUM(quantity) as amount,
    t.reason as reason,
    quantityImport
FROM transaction t
inner join
(
    select sum(ifnull(quantityImport,0)) quantityImport, p.transaction_date, p.seller_id from
    ( /* subquery get all the date and seller distinct row */
        select transaction_date, seller_id, reason
        from transaction
        group by seller_id, transaction_date
    )
    as p
    left join
    ( /* subquery get all the date and seller and the import quantity */
        select sum(quantity) quantityImport, transaction_date, seller_id
        from transaction
        where reason='Import'
        group by seller_id, transaction_date
    ) as n
    on
        p.seller_id=n.seller_id
    and
        p.transaction_date>=n.transaction_date
    group by
        p.seller_id, p.transaction_date
) as q
where
    t.seller_id=q.seller_id
and
    t.transaction_date=q.transaction_date
GROUP BY
    t.transaction_date,
    t.seller_id,
    t.reason;

Accumulated formula in mysql

I have 3 tables in my database. Here is how they look:
tbl_production:
+--------+------------+-----+-------+
| id_pro | date | qty | stock |
+--------+------------+-----+-------+
| 1 | 2017-09-09 | 100 | 93 |
| 2 | 2017-09-10 | 100 | 100 |
tbl_out:
+--------+------------+-----+
| id_out | date | qty |
+--------+------------+-----+
| 1 | 2017-09-09 | 50 |
| 2 | 2017-09-09 | 50 |
| 3 | 2017-09-10 | 50 |
| 4 | 2017-09-10 | 50 |
tbl_return:
+--------+------------+-----+
| id_out | date | qty |
+--------+------------+-----+
| 1 | 2017-09-09 | 48 |
| 2 | 2017-09-09 | 50 |
| 3 | 2017-09-10 | 60 |
| 4 | 2017-09-10 | 35 |
I would like to get the stock of the day (sotd). The result should be:
+------------+------+
| date       | sotd |
+------------+------+
| 2017-09-09 |   98 |
| 2017-09-10 |  193 |
+------------+------+
Each value is computed as:
accumulated stock from the days before + tbl_production.qty
- SUM(tbl_out.qty) GROUP BY date + SUM(tbl_return.qty) GROUP BY date
The stock for 2017-09-09 is 0 (because this is the first production) + 100 - 100 + 98 = 98
The stock for 2017-09-10 is 98 (accumulated stock from the days before) + 100 - 100 + 95 = 193
I already have a query like this, but it can't be executed:
SET @running_count := 0;
SELECT *,
       @running_count := @running_count + qty - (SELECT SUM(qty) FROM tbl_out GROUP BY date) + (SELECT SUM(qty) FROM tbl_return GROUP BY date) AS Counter
FROM tbl_production
ORDER BY id_prod;
How can I get this result?

In MySQL, GROUP BY and variables don't always work well together. Try:
SELECT p.date,
       (@qty := @qty + qty) as running_qty
FROM (SELECT p.date, SUM(qty) as qty
      FROM tbl_production p
      GROUP BY p.date
     ) p CROSS JOIN
     (SELECT @qty := 0) params
ORDER BY p.date;
EDIT:
If you want the value from the day before, the expression is a bit complicated, but not hard:
SELECT p.date,
       (CASE WHEN (@save_qty := @qty) = NULL THEN -1 -- never happens
             WHEN (@qty := @qty + qty) = NULL THEN -1 -- never happens
             ELSE @save_qty
        END) as start_of_day
FROM (SELECT p.date, SUM(qty) as qty
      FROM tbl_production p
      GROUP BY p.date
     ) p CROSS JOIN
     (SELECT @qty := 0) params
ORDER BY p.date;
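The queries above accumulate production only. As a rough sketch of the full stock-of-the-day formula from the question (production minus out plus return, carried forward), the following uses Python's built-in sqlite3 as a stand-in for MySQL and a window function in place of the @qty variable. That is a swapped-in technique, not the method shown in the answer. The stock column of tbl_production is left out because the formula does not use it, and the alias names movements, daily, delta and net are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_production (id_pro INTEGER, date TEXT, qty INTEGER);
    CREATE TABLE tbl_out (id_out INTEGER, date TEXT, qty INTEGER);
    CREATE TABLE tbl_return (id_out INTEGER, date TEXT, qty INTEGER);
    INSERT INTO tbl_production VALUES (1, '2017-09-09', 100), (2, '2017-09-10', 100);
    INSERT INTO tbl_out VALUES (1, '2017-09-09', 50), (2, '2017-09-09', 50),
                               (3, '2017-09-10', 50), (4, '2017-09-10', 50);
    INSERT INTO tbl_return VALUES (1, '2017-09-09', 48), (2, '2017-09-09', 50),
                                  (3, '2017-09-10', 60), (4, '2017-09-10', 35);
    """
)

# Stack all three tables as signed movements, net them per day,
# then take a running total over the days.
sotd = conn.execute(
    """
    WITH movements AS (
        SELECT date, qty AS delta FROM tbl_production
        UNION ALL
        SELECT date, -qty FROM tbl_out
        UNION ALL
        SELECT date, qty FROM tbl_return
    ),
    daily AS (
        SELECT date, SUM(delta) AS net FROM movements GROUP BY date
    )
    SELECT date, SUM(net) OVER (ORDER BY date) AS sotd
    FROM daily
    ORDER BY date
    """
).fetchall()

for date, stock in sotd:
    print(date, stock)
```

Aggregating each day before accumulating avoids the problem in the original attempt, where the scalar subqueries with GROUP BY date return more than one row.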

Get last balance sign change in (My)SQL

I have a Transaction table that records every amount added to or subtracted from the balance of a Customer, with the new balance:
+----+------------+------------+--------+---------+
| id | customerId | timestamp | amount | balance |
+----+------------+------------+--------+---------+
| 1 | 1 | 1000000001 | 10 | 10 |
| 2 | 1 | 1000000002 | -20 | -10 |
| 3 | 1 | 1000000003 | -10 | -20 |
| 4 | 2 | 1000000004 | -5 | -5 |
| 5 | 2 | 1000000005 | -5 | -10 |
| 6 | 2 | 1000000006 | 10 | 0 |
| 7 | 3 | 1000000007 | -5 | -5 |
| 8 | 3 | 1000000008 | 10 | 5 |
| 9 | 3 | 1000000009 | 10 | 15 |
| 10 | 4 | 1000000010 | 5 | 5 |
+----+------------+------------+--------+---------+
The Customer table stores the current balance, and looks like:
+----+---------+
| id | balance |
+----+---------+
| 1 | -20 |
| 2 | 0 |
| 3 | 15 |
| 4 | 5 |
+----+---------+
I would like to add a balanceSignSince column, that would store the timestamp at which the balance sign last changed. Transitioning to and from positive, negative, or zero counts as a balance change.
After the update, based on the above data, the Customer table should contain:
+----+---------+------------------+
| id | balance | balanceSignSince |
+----+---------+------------------+
| 1 | -20 | 1000000002 |
| 2 | 0 | 1000000006 |
| 3 | 15 | 1000000008 |
| 4 | 5 | 1000000010 |
+----+---------+------------------+
How can I write a SQL query that updates every Customer with the last time the balance sign changed, based on the Transaction table?
I suspect I can't do this without a quite complex stored procedure, but am curious to see if any clever ideas come up.

This uses a simulated rank() function.
select customerId, min(tstamp) from
(
    select tstamp,
           if (@cust = customerId and sign(@bal) = sign(balance), @rn := @rn,
               if (@cust = customerId and sign(@bal) <> sign(balance), @rn := @rn + 1, @rn := 0)) as rn,
           @cust := customerId as customerId, @bal := balance as balance
    from
        (select @rn := 0) x,
        (select id, @cust := customerId as customerId, tstamp, amount, @bal := balance as balance
         from trans order by customerId, tstamp desc) y
) z
where rn = 0
group by customerId;
Check it: http://rextester.com/XJVKK61181
This script returns a table like this:
+------------+----+------------+---------+
| tstamp | rn | customerId | balance |
+------------+----+------------+---------+
| 1000000003 | 0 | 1 | -20 |
| 1000000002 | 0 | 1 | -10 |
| 1000000001 | 1 | 1 | 10 |
| 1000000006 | 0 | 2 | 0 |
| 1000000005 | 2 | 2 | -10 |
| 1000000004 | 2 | 2 | -5 |
| 1000000009 | 0 | 3 | 15 |
| 1000000008 | 2 | 3 | 5 |
| 1000000007 | 3 | 3 | -5 |
| 1000000010 | 0 | 4 | 5 |
+------------+----+------------+---------+
Then select min(tstamp) of the rows where rn = 0:
+------------+-------------+
| customerId | min(tstamp) |
+------------+-------------+
| 1          | 1000000002  |
| 2          | 1000000006  |
| 3          | 1000000009  |
| 4          | 1000000010  |
+------------+-------------+

Updated answer with the restriction that this needs to work on the existing data
The following query should work for most cases. There is still an issue with customers that have only a single transaction or no sign change. As this is a one-time update, I would run the query below and then do a simple update for all customers that still have no timestamp set; for them, it is the timestamp of their first transaction:
# Find the smallest timestamp, e.g. the
# transaction which changed the signum.
SELECT
p.customerId as customerId,
MIN(t.timestamp) as balanceSignSince
FROM
transaction as t,
(
# find the latest timestamp having
# a different sign for each user.
# Here is the issue with users having
# only a single transaction or no sign
# changes.
SELECT
u.customerId as customerId,
MAX(t.timestamp) as balanceSignSince
FROM
transaction as t,
customer as c,
(
# find the timestamp of the very last
# transaction for every user.
SELECT
t.customerId as customerId,
MAX(t.timestamp) as lastTransaction
FROM
transaction as t
GROUP BY
t.customerId
) as u
WHERE
u.customerId = c.id
AND u.customerId = t.customerId
AND SIGN(c.balance) <> SIGN(t.balance)
GROUP BY
u.customerId
) as p
WHERE
p.customerId = t.customerId
AND p.balanceSignSince < t.timestamp
GROUP BY
p.customerId;
Fiddle: http://sqlfiddle.com/#!9/bd0760/13
Original Answer
This should work to get the timestamp of a sign change:
SELECT
    c.id as id,
    MAX(t.timestamp) as balanceSignSince
FROM
    transaction as t,
    customer as c
WHERE
    t.customerId = c.id
    AND SIGN(t.balance) <> SIGN(c.balance)
GROUP BY
    c.id
This needs to be executed before the customer table is updated with the new balance. If you have a trigger on transaction:insert, you should probably put the above into the query that updates the customer table.
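A minimal sketch of the one-time update, assuming the customer table already holds the balance of each customer's latest transaction. It uses Python's built-in sqlite3 as a stand-in for MySQL, the table name trans and column tstamp from the first answer, and a hand-written sign expression because SIGN() is not available in every SQLite build. The COALESCE(..., 0) fallback covers customers with a single transaction or no sign change, for whom the first transaction is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trans (id INTEGER, customerId INTEGER, tstamp INTEGER,
                        amount INTEGER, balance INTEGER);
    CREATE TABLE customer (id INTEGER, balance INTEGER, balanceSignSince INTEGER);
    INSERT INTO trans VALUES
        (1, 1, 1000000001, 10, 10), (2, 1, 1000000002, -20, -10),
        (3, 1, 1000000003, -10, -20), (4, 2, 1000000004, -5, -5),
        (5, 2, 1000000005, -5, -10), (6, 2, 1000000006, 10, 0),
        (7, 3, 1000000007, -5, -5), (8, 3, 1000000008, 10, 5),
        (9, 3, 1000000009, 10, 15), (10, 4, 1000000010, 5, 5);
    INSERT INTO customer (id, balance) VALUES (1, -20), (2, 0), (3, 15), (4, 5);
    """
)

# For each customer: find the latest transaction whose balance sign differs
# from the current one (0 if there is none), then take the first transaction
# after it. (x > 0) - (x < 0) evaluates to -1, 0 or 1, i.e. the sign of x.
conn.execute(
    """
    UPDATE customer
    SET balanceSignSince = (
        SELECT MIN(t.tstamp)
        FROM trans t
        WHERE t.customerId = customer.id
          AND t.tstamp > COALESCE(
                (SELECT MAX(d.tstamp)
                 FROM trans d
                 WHERE d.customerId = customer.id
                   AND ((d.balance > 0) - (d.balance < 0))
                       <> ((customer.balance > 0) - (customer.balance < 0))),
                0)
    )
    """
)

updated = conn.execute(
    "SELECT id, balance, balanceSignSince FROM customer ORDER BY id"
).fetchall()
for row in updated:
    print(row)
```

On the sample data this reproduces the balanceSignSince column the question asks for.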

Cumulative sum on time interval

Following is users table
---------------------------------------
| uid | reg_date |
---------------------------------------
| 1 | 2011-07-20 02:24:36 |
---------------------------------------
| 2 | 2012-10-03 07:37:43 |
---------------------------------------
| ... | ... ... ... ... ... |
---------------------------------------
| 300000 | 2015-12-19 04:13:51 |
---------------------------------------
I want monthly cumulative counts for the last year, counting back from curdate().
I have tried the following query:
SELECT month,
       @cnt := @cnt + total cum_sum
FROM (
    SELECT MONTH(reg_date) month,
           COUNT(*) total
    FROM users
    WHERE reg_date >= CURDATE() - INTERVAL 1 YEAR
    GROUP BY YEAR(reg_date), MONTH(reg_date)
) n, (SELECT @cnt := 0) users_alias
But it produces the last twelve months as if there were no data before them. I want the running total to start from the actual cumulative count at that month. How can I achieve this? Thanks.
UPDATE
desired output
-----------------------
| month | cum_sum |
-----------------------
| 10 | 1000 |
-----------------------
| 11 | 1500 |
-----------------------
| 12 | 2550 |
-----------------------
| 1 | 9700 |
-----------------------
| 2 | 11000 |
-----------------------
| 3 | 14000 |
-----------------------
| 4 | 15700 |
-----------------------
| 5 | 20000 |
-----------------------
| 6 | 22000 |
-----------------------
| 7 | 27000 |
-----------------------
| 8 | 31000 |
-----------------------
| 9 | 35000 |
-----------------------
| 10 | 41000 |
-----------------------

You initialize the @cnt variable to 0 at the end of the SQL statement:
(SELECT @cnt := 0) users_alias
You need to change this to initialize the counter to the number of users registered before the one-year window:
(SELECT @cnt := count(*) from users
 where reg_date < CURDATE() - INTERVAL 1 YEAR) users_alias
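To illustrate the effect of starting the counter at the number of earlier registrations, here is a small sketch on Python's built-in sqlite3, used as a stand-in for MySQL. The six users and the fixed cutoff date are made up for the example; the cutoff stands in for CURDATE() - INTERVAL 1 YEAR so that the output is reproducible. A window function replaces the @cnt variable, which is a swapped-in technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, reg_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        # Hypothetical registrations: two before the window, four inside it.
        (1, "2014-03-10 02:24:36"),
        (2, "2014-11-02 07:37:43"),
        (3, "2015-01-15 10:00:00"),
        (4, "2015-01-20 11:30:00"),
        (5, "2015-03-05 09:15:00"),
        (6, "2015-12-19 04:13:51"),
    ],
)

# Assumed fixed cutoff, standing in for CURDATE() - INTERVAL 1 YEAR.
cutoff = "2014-12-31"

# Base count = users registered before the cutoff; the running sum of the
# monthly totals is added on top of it.
cumulative = conn.execute(
    """
    WITH monthly AS (
        SELECT strftime('%Y-%m', reg_date) AS ym, COUNT(*) AS total
        FROM users
        WHERE reg_date >= :cutoff
        GROUP BY ym
    )
    SELECT ym,
           (SELECT COUNT(*) FROM users WHERE reg_date < :cutoff)
           + SUM(total) OVER (ORDER BY ym) AS cum_sum
    FROM monthly
    ORDER BY ym
    """,
    {"cutoff": cutoff},
).fetchall()

for ym, cum_sum in cumulative:
    print(ym, cum_sum)
```

Grouping on year and month together also keeps the months in chronological order when the window spans two calendar years.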

MySQL - Time difference between rows of the same type

I have a table with a list of agent_ids, a previous_status, a new status, and a time stamp. I'm trying to determine the time difference between each status change, by agent, in order to determine how long an agent was active in a particular status.
For example:
+------+----------+------------+------------+---------------------+
| id   | agent_id | old_status | new_status | date_time           |
+------+----------+------------+------------+---------------------+
| 1    | 1        | offline    | online     | 2015-06-11 09:00:01 |
| 2    | 1        | online     | busy       | 2015-06-11 09:30:23 |
| 3    | 3        | offline    | online     | 2015-06-11 09:31:27 |
| 4    | 1        | busy       | offline    | 2015-06-11 09:31:45 |
| 5    | 3        | online     | offline    | 2015-06-11 09:32:10 |
+------+----------+------------+------------+---------------------+
The goal would be to create a new result table with a time_difference column,
and the time_difference column for row 5 for example, should be 43 seconds, which is the difference between row 5 (the most recent status for agent_id 3) and row 3, the previous status for agent_id 3. Likewise, the time_difference for row 4 should be difference between row 4 and row 2.

You can do something along the lines of
SELECT id, agent_id, old_status, new_status, date_time, seconds
FROM
(
    SELECT id, agent_id, old_status, new_status, date_time,
           IF(@a = agent_id, TIMESTAMPDIFF(SECOND, @p, date_time), NULL) seconds,
           @a := agent_id, @p := date_time
    FROM table1 t CROSS JOIN (SELECT @p := NULL, @a := NULL) i
    ORDER BY agent_id, id
) q
Output:
+------+----------+------------+------------+---------------------+---------+
| id | agent_id | old_status | new_status | date_time | seconds |
+------+----------+------------+------------+---------------------+---------+
| 1 | 1 | offline | online | 2015-06-11 09:00:01 | NULL |
| 2 | 1 | online | busy | 2015-06-11 09:30:23 | 1822 |
| 4 | 1 | busy | offline | 2015-06-11 09:31:45 | 82 |
| 3 | 3 | offline | online | 2015-06-11 09:31:27 | NULL |
| 5 | 3 | online | offline | 2015-06-11 09:32:10 | 43 |
+------+----------+------------+------------+---------------------+---------+

You can approach this without variables, using a correlated subquery:
select t.*,
       timestampdiff(second, t.date_time, t.next_date_time) as secs
from (select t.*,
             (select t2.date_time
              from table1 t2
              where t2.agent_id = t.agent_id and
                    t2.date_time > t.date_time
              order by t2.date_time
              limit 1
             ) as next_date_time
      from table1 t
     ) t
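A third option, sketched here under the assumption that window functions are available (SQLite 3.25+ in this example; MySQL 8.0 also provides LAG), is to read the previous row's timestamp directly. The example runs on Python's built-in sqlite3 as a stand-in, reuses the table name table1 from the first answer, and computes the difference from epoch seconds because SQLite has no TIMESTAMPDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, agent_id INTEGER, old_status TEXT,"
    " new_status TEXT, date_time TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "offline", "online", "2015-06-11 09:00:01"),
        (2, 1, "online", "busy", "2015-06-11 09:30:23"),
        (3, 3, "offline", "online", "2015-06-11 09:31:27"),
        (4, 1, "busy", "offline", "2015-06-11 09:31:45"),
        (5, 3, "online", "offline", "2015-06-11 09:32:10"),
    ],
)

# LAG() fetches the previous status change of the same agent; the first row
# of each agent has no predecessor, so its difference is NULL (None).
diffs = conn.execute(
    """
    SELECT id, agent_id,
           CAST(strftime('%s', date_time) AS INTEGER)
           - CAST(strftime('%s', LAG(date_time) OVER (
                 PARTITION BY agent_id ORDER BY date_time)) AS INTEGER)
             AS seconds
    FROM table1
    ORDER BY agent_id, date_time
    """
).fetchall()

for row in diffs:
    print(row)
```

This measures the time since the previous change, like the variable-based answer; the correlated subquery above measures the time until the next change instead.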
