Suppose I have the following rows in a table:
| id | user_id | amount | date       |
| -- | ------- | ------ | ---------- |
| 1  | 1       | 100    | 2019-09-30 |
| 2  | 2       | 100    | 2019-09-30 |
| 3  | 1       | 100    | 2019-09-30 |
| 4  | 3       | 100    | 2019-10-01 |
| 5  | 1       | 75     | 2019-10-01 |
| 6  | 3       | 100    | 2019-10-01 |
| 7  | 1       | 35     | 2019-10-01 |
I am trying to find a way to get all the rows with user_id = 1 where the sum(amount) < 300 and date <= '2019-10-01'.
What I am trying to do is to only process records that meet a certain threshold sum. I am not quite sure where to start.
Expected Result
| id | user_id | amount | date       |
| -- | ------- | ------ | ---------- |
| 1  | 1       | 100    | 2019-09-30 |
| 3  | 1       | 100    | 2019-09-30 |
| 5  | 1       | 75     | 2019-10-01 |
Here is what I have tried so far
SELECT id, SUM(amount) as total_sum
FROM table
WHERE date <= '2019-10-01' AND user_id = 1
ORDER BY date ASC
HAVING total_sum <= 300
I don't get the desired output based on the above query.
MySQL Version currently using: 5.7.25
I did look at this question: MySQL select records with sum greater than threshold, assuming they are trying to do the same thing, but it isn't what I am looking for.
This is a rolling-sum problem. In MySQL 8.0.2 and above, you can solve it using window functions with frames. In older versions, we can do the same using user-defined session variables.
We first calculate the rolling sum using session variables.
Then we use that result set in a derived table and find the id where the running total crosses the "barrier" of 300. The barrier is reached when the new rolling sum exceeds 300; we set the barrier value to 1 at that row, 0 for the rows before it, and 2 or more for the rows after it.
We will only consider the rows where barrier is 0.
Try (works for all MySQL versions):
Query #1
SELECT dt.id,
dt.user_id,
dt.amount,
dt.date
FROM
(
SELECT
t.id,
t.user_id,
t.amount,
t.date,
#barrier := CASE
WHEN
(#tot_qty := #tot_qty + t.amount) > 300
THEN (#barrier + 1)
ELSE 0
END AS barrier
FROM
your_table AS t
CROSS JOIN (SELECT #tot_qty := 0,
#barrier := 0) AS user_init
WHERE t.user_id = 1
AND t.date <= '2019-10-01'
ORDER BY t.user_id, t.date, t.id
) AS dt
WHERE dt.barrier = 0
ORDER BY dt.user_id, dt.date, dt.id;
Result
| id | user_id | amount | date |
| --- | ------- | ------ | ---------- |
| 1 | 1 | 100 | 2019-09-30 |
| 3 | 1 | 100 | 2019-09-30 |
| 5 | 1 | 75 | 2019-10-01 |
View on DB Fiddle
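For reference, on MySQL 8.0.2+ the rolling sum mentioned above can be written with a window function instead of session variables. A minimal sketch, assuming the same table and column names as the question:
SELECT dt.id,
       dt.user_id,
       dt.amount,
       dt.date
FROM
(
  SELECT t.id,
         t.user_id,
         t.amount,
         t.date,
         -- running total per user, ordered the same way as the session-variable query
         SUM(t.amount) OVER (PARTITION BY t.user_id
                             ORDER BY t.date, t.id
                             ROWS UNBOUNDED PRECEDING) AS running_total
  FROM your_table AS t
  WHERE t.user_id = 1
    AND t.date <= '2019-10-01'
) AS dt
WHERE dt.running_total <= 300
ORDER BY dt.user_id, dt.date, dt.id;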
If you don't like to use session variables (some experienced SO users dislike them vehemently), you can use a technique based on a self-join, and then GROUP BY with HAVING to filter the rows.
The general idea is that we LEFT JOIN to get the previous rows for the specific user_id, aggregate to get the rolling sum, and then filter using the HAVING clause.
Query
SELECT
t1.*
FROM
your_table AS t1
LEFT JOIN your_table AS t2
ON t2.user_id = t1.user_id
AND t2.date <= t1.date
AND t2.id <= t1.id
WHERE t1.user_id = 1
AND t1.date <= '2019-10-01'
GROUP BY t1.user_id, t1.date, t1.id, t1.amount
HAVING COALESCE(SUM(t2.amount),0) < 300;
Result
| id | user_id | amount | date |
| --- | ------- | ------ | ---------- |
| 1 | 1 | 100 | 2019-09-30 |
| 3 | 1 | 100 | 2019-09-30 |
| 5 | 1 | 75 | 2019-10-01 |
View on DB Fiddle
You can benchmark both approaches and decide which one is suitable.
For these queries, you will need a composite index on (user_id, date).
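For example (the index name here is just an assumption, use whatever fits your naming convention):
ALTER TABLE your_table ADD INDEX idx_user_date (user_id, date);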
Related
I have a table like this:
created_date | id | status | completed_date
2019-03-20 | 1 | Open |
2019-03-20 | 2 | Open |
2019-03-19 | 3 | Comp | 2019-03-21
2019-03-21 | 4 | Comp | 2019-03-22
2019-03-22 | 5 | Comp | 2019-03-22
2019-03-18 | 6 | Open |
I want to find the count of all the IDs that were created on or before '2019-03-21' and had a status of 'Open', OR were created on or before '2019-03-21' with a 'Comp' status but were completed after '2019-03-21'.
Below is the query I have:
SELECT
CAST(CREATED AS DATE),
COUNT(DISTINCT id)
FROM testtable
WHERE
(CAST(CREATED AS DATE) <= '2019-03-21' AND status = 'Open')
OR (
CAST(CREATED AS DATE) <= '2019-03-21' AND status='Comp'
AND CAST(COMPLETED AS DATE) > '2019-03-21'
)
It gives the correct result, i.e., on the 21st, 4 IDs were open. But now I want this information for each of the last 4 days. How do I modify the query to do that?
The output should be:
created_date | count(ID)
2019-03-21 | 4
2019-03-20 | 4
2019-03-19 | 2
2019-03-18 | 1
Please help!!
Here is a solution that works correctly with your sample data. It generates a list of dates using the distinct values found in column created_date, and then LEFT JOINs it with the table. The JOIN conditions carry the logic you described. It seems to me that you do not need to check the status of the records, since, apparently, any record that has a non-NULL completed_date is in status Comp.
SELECT
dt.created_date,
COUNT(t.id)
FROM
(SELECT DISTINCT created_date FROM mytable) dt
LEFT JOIN mytable t
ON t.created_date <= dt.created_date
AND (t.completed_date IS NULL OR t.completed_date > dt.created_date)
GROUP BY dt.created_date
This Demo on DB Fiddle with your sample data returns:
| created_date | COUNT(t.id) |
| ------------ | ----------- |
| 2019-03-18 | 1 |
| 2019-03-19 | 2 |
| 2019-03-20 | 4 |
| 2019-03-21 | 4 |
| 2019-03-22 | 3 |
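If you only want the last 4 days, as in your expected output, one way (a sketch, assuming created_date is a DATE column and using a hard-coded range as an example) is to filter the derived list of dates:
SELECT
    dt.created_date,
    COUNT(t.id)
FROM
    (SELECT DISTINCT created_date
     FROM mytable
     WHERE created_date BETWEEN '2019-03-18' AND '2019-03-21') dt  -- or a CURDATE()-based range
LEFT JOIN mytable t
    ON t.created_date <= dt.created_date
    AND (t.completed_date IS NULL OR t.completed_date > dt.created_date)
GROUP BY dt.created_date
ORDER BY dt.created_date DESC;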
I have a table called transactions which contains sellers and their transactions: sale, waste, and whenever they receive products. The structure is essentially as follows:
seller_id transaction_date quantity reason product unit_price
--------- ---------------- -------- ------ ------- ----------
1 2018-01-01 10 import 1 100.0
1 2018-01-01 -5 sale 1 100.0
1 2018-01-01 -1 waste 1 100.0
2 2018-01-01 -3 sale 4 95.5
I need a daily summary of each seller, including the value of their sales, waste and starting inventory. The problem is that the starting inventory is a cumulative sum of quantities up until the given day (the imports on the given day are also included). I have the following query:
SELECT
t.seller_id,
t.transaction_date,
SUM(t.quantity * t.unit_price) as amount,
t.reason as reason,
(
SELECT SUM(unit_price * quantity) FROM transactions
WHERE seller_id = t.seller_id
AND (transaction_date <= t.transaction_date)
AND (
transaction_date < t.transaction_date
OR reason = 'import'
)
) as opening_balance
FROM transactions t
GROUP BY
t.transaction_date,
t.seller_id,
t.reason
The query works and I get the desired results. However, even after creating indices for both the outer query and the subquery, it takes way too much time (about 30 seconds), because the opening_balance query is a dependent subquery which is calculated for each row over and over again.
How can I optimize or rewrite this query?
Edit: the subquery had a small bug with a missing WHERE condition; I fixed it, but the essence of the question is the same. I created a fiddle with example data to play around with:
https://www.db-fiddle.com/f/ma7MhufseHxEXLfxhCtGbZ/2
The following approach using user-defined variables can be more performant than the correlated subquery. A temp variable is used to carry the intra-day calculation logic; it also shows up in the output, and you can ignore that column.
You can try the following query (I can add more explanation if needed):
Query
SELECT dt.reason,
dt.amount,
#bal := CASE
WHEN dt.reason = 'import'
AND #sid <> dt.seller_id THEN dt.amount
WHEN dt.reason = 'import' THEN #bal + #temp + dt.amount
WHEN #sid = 0
OR #sid = dt.seller_id THEN #bal
ELSE 0
end AS opening_balance,
#temp := CASE
WHEN dt.reason <> 'import'
AND #sid = dt.seller_id
AND #td = dt.transaction_date THEN #temp + dt.amount
ELSE 0
end AS temp,
#sid := dt.seller_id AS seller_id,
#td := dt.transaction_date AS transaction_date
FROM (SELECT seller_id,
transaction_date,
reason,
Sum(quantity * unit_price) AS amount
FROM transactions
WHERE seller_id IS NOT NULL
GROUP BY seller_id,
transaction_date,
reason
ORDER BY seller_id,
transaction_date,
Field(reason, 'import', 'sale', 'waste')) AS dt
CROSS JOIN (SELECT #sid := 0,
#td := '',
#bal := 0,
#temp := 0) AS user_vars;
Result (note that I have ordered by seller_id first and then transaction_date)
| reason | amount | opening_balance | temp | seller_id | transaction_date |
| ------ | ------ | --------------- | ----- | --------- | ---------------- |
| import | 1250 | 1250 | 0 | 1 | 2018-12-01 |
| sale | -850 | 1250 | -850 | 1 | 2018-12-01 |
| waste | -100 | 1250 | -950 | 1 | 2018-12-01 |
| import | 950 | 1250 | 0 | 1 | 2018-12-02 |
| sale | -650 | 1250 | -650 | 1 | 2018-12-02 |
| waste | -450 | 1250 | -1100 | 1 | 2018-12-02 |
| import | 2000 | 2000 | 0 | 2 | 2018-12-01 |
| sale | -1200 | 2000 | -1200 | 2 | 2018-12-01 |
| waste | -250 | 2000 | -1450 | 2 | 2018-12-01 |
| import | 750 | 1300 | 0 | 2 | 2018-12-02 |
| sale | -600 | 1300 | -600 | 2 | 2018-12-02 |
| waste | -450 | 1300 | -1050 | 2 | 2018-12-02 |
View on DB Fiddle
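If you are on MySQL 8.0+ or MariaDB 10.2+, the same opening-balance logic can also be written with CTEs and a window function. This is only a sketch against the question's transactions table, shown as an alternative to the user-variable approach; day_totals and balances are just illustrative CTE names:
WITH day_totals AS (
    SELECT seller_id,
           transaction_date,
           SUM(quantity * unit_price) AS day_amount,
           SUM(CASE WHEN reason = 'import'
                    THEN quantity * unit_price ELSE 0 END) AS day_imports
    FROM transactions
    GROUP BY seller_id, transaction_date
),
balances AS (
    SELECT seller_id,
           transaction_date,
           -- everything before this day, plus this day's imports
           COALESCE(SUM(day_amount) OVER (PARTITION BY seller_id
                                          ORDER BY transaction_date
                                          ROWS BETWEEN UNBOUNDED PRECEDING
                                                   AND 1 PRECEDING), 0)
             + day_imports AS opening_balance
    FROM day_totals
)
SELECT t.seller_id,
       t.transaction_date,
       t.reason,
       SUM(t.quantity * t.unit_price) AS amount,
       b.opening_balance
FROM transactions t
JOIN balances b
  ON b.seller_id = t.seller_id
 AND b.transaction_date = t.transaction_date
GROUP BY t.seller_id, t.transaction_date, t.reason, b.opening_balance;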
Do you mean something like this?
SELECT s.* ,#balance:=#balance+(s.quantity*s.unit_price) AS opening_balance FROM (
SELECT t.* FROM transactions t
ORDER BY t.seller_id,t.transaction_date,t.reason
) s
CROSS JOIN ( SELECT #balance:=0) AS INIT
GROUP BY s.transaction_date, s.seller_id, s.reason;
SAMPLE
MariaDB [test]> select * from transactions;
+----+-----------+------------------+----------+------------+--------+
| id | seller_id | transaction_date | quantity | unit_price | reason |
+----+-----------+------------------+----------+------------+--------+
| 1 | 1 | 2018-01-01 | 10 | 100 | import |
| 2 | 1 | 2018-01-01 | -5 | 100 | sale |
| 3 | 1 | 2018-01-01 | -1 | 100 | waste |
| 4 | 2 | 2018-01-01 | -3 | 99.5 | sale |
+----+-----------+------------------+----------+------------+--------+
4 rows in set (0.000 sec)
MariaDB [test]> SELECT s.* ,#balance:=#balance+(s.quantity*s.unit_price) AS opening_balance FROM (
-> SELECT t.* FROM transactions t
-> ORDER BY t.seller_id,t.transaction_date,t.reason
-> ) s
-> CROSS JOIN ( SELECT #balance:=0) AS INIT
-> GROUP BY s.transaction_date, s.seller_id, s.reason;
+----+-----------+------------------+----------+------------+--------+-----------------+
| id | seller_id | transaction_date | quantity | unit_price | reason | opening_balance |
+----+-----------+------------------+----------+------------+--------+-----------------+
| 1 | 1 | 2018-01-01 | 10 | 100 | import | 1000 |
| 2 | 1 | 2018-01-01 | -5 | 100 | sale | 500 |
| 3 | 1 | 2018-01-01 | -1 | 100 | waste | 400 |
| 4 | 2 | 2018-01-01 | -3 | 99.5 | sale | 101.5 |
+----+-----------+------------------+----------+------------+--------+-----------------+
4 rows in set (0.001 sec)
MariaDB [test]>
SELECT
t.seller_id,
t.transaction_date,
SUM(quantity) as amount,
t.reason as reason,
quantityImport
FROM transactions t
inner join
(
select sum(ifnull(quantityImport,0)) quantityImport,p.transaction_date,p.seller_id from
( /* subquery: get all the distinct date and seller rows */
select transaction_date, seller_id
from transactions
group by seller_id, transaction_date
)
as p
left join
( /* subquery get all the date and seller and the import quantity */
select sum(quantity) quantityImport,transaction_date ,seller_id
from transactions
where reason='Import'
group by seller_id, transaction_date
) as n
on
p.seller_id=n.seller_id
and
p.transaction_date>=n.transaction_date
group by
p.seller_id,p.transaction_date
) as q
where
t.seller_id=q.seller_id
and
t.transaction_date=q.transaction_date
GROUP BY
t.transaction_date,
t.seller_id,
t.reason;
I have a similar table:
+----+--------+--------+------------+-----------+
| id | amount | rif_id | date | is_closed |
+----+--------+--------+------------+-----------+
| 1 | 20 | NULL | 2017-11-12 | 1 |
| 2 | -5 | 1 | 2017-11-13 | NULL |
| 3 | -10 | 1 | 2017-11-24 | NULL |
| 4 | 7 | NULL | 2017-11-25 | 0 |
| 5 | -5 | 1 | 2017-11-26 | NULL |
| 6 | -5 | 4 | 2017-11-28 | NULL |
| 7 | 11.20 | NULL | 2017-11-30 | 0 |
+----+--------+--------+------------+-----------+
I need to get the most recent id where the SUM of amount is equal to zero, referring to rif_id.
In my example, I need to get this result:
+----+--------+--------+------------+-----------+
| id | amount | rif_id | date | is_closed |
+----+--------+--------+------------+-----------+
| 5 | -5 | 1 | 2017-11-26 | NULL |
+----+--------+--------+------------+-----------+
I need ID 5 because that -5 amount, summed with the amounts of IDs 1, 2 and 3, is exactly zero.
FYI, the column "is_closed" is set to 1 when the last transaction of a group is inserted into the table. In effect, when ID 5 is inserted, my script calculates and updates ID 1 with "is_closed" = 1. I don't know if this matters for building the SELECT.
I think you can do something like this:
Get all the records with the same rif_id that have an id lower than the current record's id.
Keep only the records whose accumulated value equals 0: sum(t2.amount) = 0.
Sort by id and take the first one: Order by id desc Limit 1.
Select
t1.id,
t1.amount,
t1.rif_id,
t1.date,
t1.is_closed
From
your_table t1
left join your_table t2 on t2.rif_id = t1.rif_id and t2.id < t1.id
Group by
t1.id,
t1.amount,
t1.rif_id,
t1.date,
t1.is_closed
Having
sum(t2.amount) = 0
Order by
id desc
Limit 1
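As a hedged variation on the idea above (column names from the question; your_table is a placeholder): if the row that opens the group (the one whose id the others reference via rif_id) and the current row itself are also included in the sum, the query returns id 5 for the sample data:
Select t1.id, t1.amount, t1.rif_id, t1.date, t1.is_closed
From your_table t1
left join your_table t2
       on (t2.rif_id = t1.rif_id or t2.id = t1.rif_id)  -- sibling rows plus the opening row
      and t2.id <= t1.id                                -- up to and including the current row
Where t1.rif_id is not null
Group by t1.id, t1.amount, t1.rif_id, t1.date, t1.is_closed
Having sum(t2.amount) = 0
Order by t1.id desc
Limit 1;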
I have a Transaction table that records every amount added to or subtracted from the balance of a Customer, with the new balance:
+----+------------+------------+--------+---------+
| id | customerId | timestamp | amount | balance |
+----+------------+------------+--------+---------+
| 1 | 1 | 1000000001 | 10 | 10 |
| 2 | 1 | 1000000002 | -20 | -10 |
| 3 | 1 | 1000000003 | -10 | -20 |
| 4 | 2 | 1000000004 | -5 | -5 |
| 5 | 2 | 1000000005 | -5 | -10 |
| 6 | 2 | 1000000006 | 10 | 0 |
| 7 | 3 | 1000000007 | -5 | -5 |
| 8 | 3 | 1000000008 | 10 | 5 |
| 9 | 3 | 1000000009 | 10 | 15 |
| 10 | 4 | 1000000010 | 5 | 5 |
+----+------------+------------+--------+---------+
The Customer table stores the current balance, and looks like:
+----+---------+
| id | balance |
+----+---------+
| 1 | -20 |
| 2 | 0 |
| 3 | 15 |
| 4 | 5 |
+----+---------+
I would like to add a balanceSignSince column that would store the timestamp at which the balance sign last changed. Transitioning between positive, negative, or zero counts as a sign change.
After the update, based on the above data, the Customer table should contain:
+----+---------+------------------+
| id | balance | balanceSignSince |
+----+---------+------------------+
| 1 | -20 | 1000000002 |
| 2 | 0 | 1000000006 |
| 3 | 15 | 1000000008 |
| 4 | 5 | 1000000010 |
+----+---------+------------------+
How can I write a SQL query that updates every Customer with the last time the balance sign changed, based on the Transaction table?
I suspect I can't do this without a quite complex stored procedure, but am curious to see if any clever ideas come up.
This uses a simulated rank() function.
select customerId, min(tstamp) from
(
select tstamp,
if (#cust = customerId and sign(#bal) = sign(balance), #rn := #rn,
if (#cust = customerId and sign(#bal) <> sign(balance), #rn := #rn + 1, #rn := 0)) as rn,
#cust := customerId as customerId, #bal := balance as balance
from
(select #rn := 0) x,
(select id, #cust := customerId as customerId, tstamp, amount, #bal := balance as balance
from trans order by customerId, tstamp desc) y
) z
where rn = 0
group by customerId;
Check it: http://rextester.com/XJVKK61181
This script returns a table like this:
+------------+----+------------+---------+
| tstamp | rn | customerId | balance |
+------------+----+------------+---------+
| 1000000003 | 0 | 1 | -20 |
| 1000000002 | 0 | 1 | -10 |
| 1000000001 | 1 | 1 | 10 |
| 1000000006 | 0 | 2 | 0 |
| 1000000005 | 2 | 2 | -10 |
| 1000000004 | 2 | 2 | -5 |
| 1000000009 | 0 | 3 | 15 |
| 1000000008 | 2 | 3 | 5 |
| 1000000007 | 3 | 3 | -5 |
| 1000000010 | 0 | 4 | 5 |
+------------+----+------------+---------+
Then selecting min(tstamp) of the rows where rn = 0 gives:
+------------+-------------+
| customerId | min(tstamp) |
+------------+-------------+
|          1 |  1000000002 |
|          2 |  1000000006 |
|          3 |  1000000009 |
|          4 |  1000000010 |
+------------+-------------+
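For what it's worth, on MySQL 8.0+ (or MariaDB 10.2+) a LAG()-based sketch can replace the simulated rank; the table and column names below are assumed from the question:
SELECT customerId,
       MAX(timestamp) AS balanceSignSince
FROM (
    SELECT customerId,
           timestamp,
           SIGN(balance) AS sgn,
           LAG(SIGN(balance)) OVER (PARTITION BY customerId
                                    ORDER BY timestamp) AS prev_sgn
    FROM transaction
) x
WHERE prev_sgn IS NULL        -- first transaction of the customer
   OR prev_sgn <> sgn         -- the sign changed on this row
GROUP BY customerId;
On the question's sample data this should return the timestamps from the expected Customer table (for example 1000000008 for customer 3).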
Updated answer with the restriction that this needs to work on the existing data
The following query should work for most cases; there is still an issue with customers having only a single transaction or no sign change. As this is a one-time update, I would run the query below and then do a simple update for all users not having a timestamp set; for them it's going to be the timestamp of their first transaction:
# Find the smallest timestamp, e.g. the
# transaction which changed the signum.
SELECT
p.customerId as customerId,
MIN(t.timestamp) as balanceSignSince
FROM
transaction as t,
(
# find the latest timestamp having
# a different sign for each user.
# Here is the issue with users having
# only a single transaction or no sign
# changes.
SELECT
u.customerId as customerId,
MAX(t.timestamp) as balanceSignSince
FROM
transaction as t,
customer as c,
(
# find the timestamp of the very last
# transaction for every user.
SELECT
t.customerId as customerId,
MAX(t.timestamp) as lastTransaction
FROM
transaction as t
GROUP BY
t.customerId
) as u
WHERE
u.customerId = c.id
AND u.customerId = t.customerId
AND SIGN(c.balance) <> SIGN(t.balance)
GROUP BY
u.customerId
) as p
WHERE
p.customerId = t.customerId
AND p.balanceSignSince < t.timestamp
GROUP BY
p.customerId;
Fiddle: http://sqlfiddle.com/#!9/bd0760/13
Original Answer
This should work to get the timestamp of a sign change:
SELECT
c.id as id,
MAX(t.timestamp) as balanceSignSince
FROM
transaction as t,
customer as c
WHERE
t.customerId = c.id
AND SIGN(t.balance) <> SIGN(c.balance)
GROUP BY c.id
This needs to be executed before the customer table is updated with the new balance. If you have a trigger on transaction:insert, you should probably put the above into the query updating the customer table.
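If it helps, here is a hedged sketch of turning that SELECT into the actual UPDATE of the customer table, assuming the balanceSignSince column has already been added:
UPDATE customer AS c
JOIN (
    -- same logic as the SELECT above, aggregated per customer;
    -- the GROUP BY forces this derived table to be materialized,
    -- which should avoid the "can't specify target table for update" error
    SELECT cu.id AS id,
           MAX(t.timestamp) AS balanceSignSince
    FROM transaction AS t
    JOIN customer AS cu ON t.customerId = cu.id
    WHERE SIGN(t.balance) <> SIGN(cu.balance)
    GROUP BY cu.id
) AS s ON s.id = c.id
SET c.balanceSignSince = s.balanceSignSince;
As noted above, this has to run before customer.balance is overwritten with newer values.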
I have three tables with schema as below:
Table: Apps
| ID (bigint) | USERID (Bigint)| START_TIME (datetime) |
-------------------------------------------------------------
| 1 | 13 | 2013-05-03 04:42:55 |
| 2 | 13 | 2013-05-12 06:22:45 |
| 3 | 13 | 2013-06-12 08:44:24 |
| 4 | 13 | 2013-06-24 04:20:56 |
| 5 | 13 | 2013-06-26 08:20:26 |
| 6 | 13 | 2013-09-12 05:48:27 |
Table: Hosts
| ID (bigint) | APPID (Bigint)| DEVICE_ID (Bigint) |
-------------------------------------------------------------
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 1 |
| 4 | 3 | 3 |
| 5 | 1 | 4 |
| 6 | 2 | 3 |
Table: Usage
| ID (bigint) | APPID (Bigint)| HOSTID (Bigint) | Factor (varchar) |
-------------------------------------------------------------------------------------
| 1 | 1 | 1 | Low |
| 2 | 1 | 3 | High |
| 3 | 2 | 2 | Low |
| 4 | 3 | 4 | Medium |
| 5 | 1 | 5 | Low |
| 6 | 2 | 2 | Medium |
Now, given an input userid, I want to get the count of Usage rows for each month (across all apps) for each "Factor", month-wise, for the last 6 months.
If a DEVICE_ID appears more than once in a month (based on START_TIME, by joining Apps and Hosts), only the latest rows of Usage (based on the combination of Apps, Hosts and Usage) should be considered when calculating the count.
Example output of the query for the above example should be: (for input user id=13)
| MONTH | USAGE_COUNT | FACTOR |
-------------------------------------------------------------
| 5 | 0 | High |
| 6 | 0 | High |
| 7 | 0 | High |
| 8 | 0 | High |
| 9 | 0 | High |
| 10 | 0 | High |
| 5 | 2 | Low |
| 6 | 0 | Low |
| 7 | 0 | Low |
| 8 | 0 | Low |
| 9 | 0 | Low |
| 10 | 0 | Low |
| 5 | 1 | Medium |
| 6 | 1 | Medium |
| 7 | 0 | Medium |
| 8 | 0 | Medium |
| 9 | 0 | Medium |
| 10 | 0 | Medium |
How is this calculated?
For the month of May 2013 (05-2013), there are two Apps in table Apps.
In table Hosts, these apps are associated with device_ids 1, 1, 1, 4, 3.
For this month (05-2013), for device_id = 1, the latest value of start_time is 2013-05-12 06:22:45 (from tables Hosts and Apps), so in table Usage we look for the combination appid = 2 and hostid = 2, for which there are two rows, one with factor Low and the other Medium.
For this month (05-2013), for device_id = 4, following the same procedure we get one entry, i.e. one Low.
Similarly, all the values are calculated.
To get the last 6 months via a query, I'm trying the following:
SELECT MONTH(DATE_ADD(NOW(), INTERVAL aInt MONTH)) AS aMonth
FROM
(
SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2 UNION SELECT -3 UNION SELECT -4 UNION SELECT -5
) AS months
Please check sqlfiddle: http://sqlfiddle.com/#!2/55fc2
Because the calculation you're doing involves the same join multiple times, I started by creating a view.
CREATE VIEW `app_host_usage`
AS
SELECT a.id "appid", h.id "hostid", u.id "usageid",
a.userid, a.start_time, h.device_id, u.factor
FROM apps a
LEFT OUTER JOIN hosts h ON h.appid = a.id
LEFT OUTER JOIN `usage` u ON u.appid = a.id AND u.hostid = h.id
WHERE a.start_time > DATE_ADD(NOW(), INTERVAL -7 MONTH)
The WHERE condition is there because I made the assumption that you don't want July 2005 and July 2006 to be grouped together in the same count.
With that view in place, the query becomes
SELECT months.Month, COUNT(DISTINCT device_id), factors.factor
FROM
(
-- Get the last six months
SELECT (MONTH(NOW()) + aInt + 11) % 12 + 1 "Month" FROM
(SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2 UNION SELECT -3 UNION SELECT -4 UNION SELECT -5) LastSix
) months
JOIN
(
-- Get all known factors
SELECT DISTINCT factor FROM `usage`
) factors
LEFT OUTER JOIN
(
-- Get factors for each device...
SELECT
MONTH(start_time) "Month",
device_id,
factor
FROM app_host_usage a
WHERE userid=13
AND start_time IN (
-- ...where the corresponding usage row is connected
-- to an app row with the highest start time of the
-- month for that device.
SELECT MAX(start_time)
FROM app_host_usage a2
WHERE a2.device_id = a.device_id
GROUP BY MONTH(start_time)
)
GROUP BY MONTH(start_time), device_id, factor
) usageids ON usageids.Month = months.Month
AND usageids.factor = factors.factor
GROUP BY factors.factor, months.Month
ORDER BY factors.factor, months.Month
which is insanely complicated, but I've tried to add comments explaining what each part does. See this sqlfiddle: http://sqlfiddle.com/#!2/5c871/1/0