MySQL SUM previous row by date column using Union

I am hoping I am just stumped because it's the end of the work day on a Monday, and someone here can give me a hand.
Basically, I have two tables that hold invoice information and one table that holds payment information. Using the following query, I get the first part of my display.
SELECT d.id, i.id as invid, i.company_id, d.total, created, adjustment FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
WHERE company_id = '69350'
UNION
SELECT id, 0, comp_id, amount_paid, uploaded_date, 'paid' FROM tbl_finance_invoice_paid_items
WHERE comp_id = '69350'
ORDER BY created
What I want to do is:
Create a new column called "Balance" that adds total to the previous total by the created column regardless of how the rest of the table is sorted.
To give a quick example, my current output is something like:
id | invid | company_id | total | created | adjustment
12 | 16 | 1 | 40 | 01/01/16| 0
100| 0 | 1 | 10 | 01/05/16| 0
50 | 20 | 1 | 50 | 05/01/16| 0
What my goal is would be:
id | invid | company_id | total | created | adjustment | balance |Notes
12 | 16 | 1 | 40 | 01/01/16| 0 | 40 | 0 + 40
100| 0 | 1 | 10 | 01/05/16| 1 | 50 | 40 + 10
50 | 20 | 1 | 50 | 05/01/16| 0 | 100 | 50 + 50
And regardless of sorting by id, invid, total, created, etc, the balance would always be tied to the created date.
So if I added a "Where adjustment = '1'" to my sql, I would get:
100| 0 | 1 | 10 | 01/05/16| 1 | 50 | 40 + 10

Since the OP confirmed my understanding in comments, I'm basing my answer on the following assumption:
The running total would be tied to the order of created_date. The
running total would only be affected by company id as a filtering
criterion, all other filters should be disregarded for that
calculation.
Since the running total may have a different ORDER BY and different filtering criteria than the rest of the query, the running total calculation has to be placed in a subquery.
The other assumption I have to make is that there cannot be more than one invoice with the same created date for a single company id, since the original query in the OP does not have any GROUP BY or summing either.
I prefer to use the approach suggested by @OMG Ponies in this post on SO, where he initializes the MySQL variable holding the running total in a subquery, so there is no need to initialize the variable in a separate SET statement.
SELECT d.id, i.id as invid, i.company_id, rt.total, rt.cumulative_sum, rt.created, adjustment
FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
LEFT JOIN
(SELECT d.total, created, @running_total := @running_total + d.total AS cumulative_sum
FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
JOIN (SELECT @running_total := 0) r -- no join condition, so this produces a Cartesian join
WHERE company_id = '69350'
ORDER BY created) rt
ON i.created=rt.created -- this is also an assumption, I do not know which original table holds the created field
WHERE company_id = '69350' and adjustment=1
ORDER BY d.id
If you need to take the amounts from the tbl_finance_invoice_paid_items into account as well, then you need to add that to the subquery.
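For comparison, here is a minimal sketch of the same idea with a window function instead of a user variable, run through Python's sqlite3 module so it can be tried without a server. The table and column names (invoices, payments, kind) are simplified, hypothetical stand-ins for the OP's tables, and SQLite stands in for MySQL; SUM() OVER needs MySQL 8.0 or later. The balance is computed over all rows of the company in the inner query, and any extra filter is applied only afterwards, so a filtered row keeps its balance.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the invoice and payment tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER, company_id INTEGER, total INTEGER, created TEXT);
CREATE TABLE payments (id INTEGER, comp_id INTEGER, amount_paid INTEGER, uploaded_date TEXT);
INSERT INTO invoices VALUES (16, 1, 40, '2016-01-01'), (20, 1, 50, '2016-05-01');
INSERT INTO payments VALUES (100, 1, 10, '2016-01-05');
""")

# Innermost query: the unioned rows. Middle query: running balance over ALL
# rows of the company, ordered by date. Outer filters are added afterwards.
BALANCE_SQL = """
SELECT id, total, created, kind, balance
FROM (
    SELECT u.id, u.total, u.created, u.kind,
           SUM(u.total) OVER (ORDER BY u.created, u.id) AS balance
    FROM (
        SELECT id, total, created, 'invoice' AS kind
        FROM invoices WHERE company_id = :company
        UNION ALL
        SELECT id, amount_paid, uploaded_date, 'paid'
        FROM payments WHERE comp_id = :company
    ) AS u
) AS b
"""

def balance_rows(company, kind=None):
    """Return rows with their balance; `kind` optionally filters the output."""
    sql, params = BALANCE_SQL, {"company": company}
    if kind is not None:
        sql += " WHERE kind = :kind"
        params["kind"] = kind
    return conn.execute(sql + " ORDER BY created", params).fetchall()

all_rows = balance_rows(1)
paid_rows = balance_rows(1, kind="paid")
print(all_rows)
print(paid_rows)
```

With the sample rows from the question, the balances come out as 40, 50 and 100, and the filtered call returns only the payment row, still carrying the balance 50.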

Related

Combine multiple table and use Group By Function in MYSQL

I have 5 different datasets from 5 different tables. From each of those tables I have taken the grouped data below.
select number,count(*) as total from tb01 group by number limit 5;
select number,count(*) as total from tb02 group by number limit 5;
Like that I can retrieve 5 different datasets. Here is an example.
+-----------+-------+
| number | total |
+-----------+-------+
| 114000259 | 1 |
| 114000400 | 1 |
| 114000686 | 1 |
| 114000858 | 1 |
| 114003895 | 1 |
+-----------+-------+
Now I need to combine those 5 different tables into a tabular format like the one below.
+-----------+-------+-------+-------+
| number | tb01 | tb02 | tb03 |
+-----------+-------+-------+-------+
| 114000259 | 1 | 2 | 1 |
| 114000400 | 1 | 0 | 1 |
| 114000686 | 1 | 3 | 1 |
| 114000858 | 1 | 1 | 5 |
| 114003895 | 1 | 0 | 1 |
+-----------+-------+-------+-------+
Can someone help me combine those 5 grouped data sets and get the union as above?
Note: I don't need the headers to match the table names; these headers can be anything.
Also, I don't need the LIMIT 5; it is only there to get a sample of 5 rows. I have a large dataset.
It's a job for JOINs and subqueries. My answer will consider three tables. It should be obvious how to expand it to five.
Your first subquery: get all possible numbers.
SELECT number FROM tb01 UNION
SELECT number FROM tb02 UNION
SELECT number FROM tb03
Then you have a subquery for each table to get the count.
SELECT number, COUNT(*) AS total
FROM tb02 GROUP BY number
Then you LEFT JOIN everything and SELECT from that.
SELECT numbers.number,
tb01.total tb01,
tb02.total tb02,
tb03.total tb03
FROM (
SELECT number FROM tb01 UNION
SELECT number FROM tb02 UNION
SELECT number FROM tb03
) numbers
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb01 GROUP BY number
) tb01 ON numbers.number = tb01.number
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb02 GROUP BY number
) tb02 ON numbers.number = tb02.number
LEFT JOIN (
SELECT number, COUNT(*) AS total
FROM tb03 GROUP BY number
) tb03 ON numbers.number = tb03.number
You can add ORDER BY and LIMIT clauses to that overall query as necessary.
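The first subquery together with the LEFT JOIN ensures that you get results even if some of your tables are missing number rows. (Some DBMSs have FULL OUTER JOIN, but MySQL does not.) A number that is missing from a table comes back as NULL rather than 0; wrap the totals in COALESCE(total, 0) if you want zeros as in your sample.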
Pro tip: If you use LIMIT without ORDER BY, you get an unpredictable subset of your rows. Unpredictable is worse than random, because you get the same subset in testing with small tables, but when your tables grow you may start getting different subsets. You'll never catch the problem in unit testing. LIMIT without ORDER BY is a serious error.
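As a rough, runnable check of the pattern above, here is a sketch using Python's sqlite3 module with three tiny tables. The sample rows are made up for illustration, the aliases c1, c2 and c3 are my own, and SQLite stands in for MySQL. It adds COALESCE so that missing counts show as 0, and an ORDER BY so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb01 (number INTEGER);
CREATE TABLE tb02 (number INTEGER);
CREATE TABLE tb03 (number INTEGER);
-- Made-up sample rows, only for illustration.
INSERT INTO tb01 VALUES (114000259), (114000400);
INSERT INTO tb02 VALUES (114000259), (114000259);
INSERT INTO tb03 VALUES (114000400), (114000686);
""")

COMBINED_SQL = """
SELECT numbers.number,
       COALESCE(c1.total, 0) AS tb01,   -- NULL (no rows) becomes 0
       COALESCE(c2.total, 0) AS tb02,
       COALESCE(c3.total, 0) AS tb03
FROM (
    SELECT number FROM tb01 UNION
    SELECT number FROM tb02 UNION
    SELECT number FROM tb03
) AS numbers
LEFT JOIN (SELECT number, COUNT(*) AS total FROM tb01 GROUP BY number) AS c1
       ON numbers.number = c1.number
LEFT JOIN (SELECT number, COUNT(*) AS total FROM tb02 GROUP BY number) AS c2
       ON numbers.number = c2.number
LEFT JOIN (SELECT number, COUNT(*) AS total FROM tb03 GROUP BY number) AS c3
       ON numbers.number = c3.number
ORDER BY numbers.number
"""

combined = conn.execute(COMBINED_SQL).fetchall()
for row in combined:
    print(row)
```

Each derived table joins on its own alias, which is the detail that is easiest to get wrong when copying the block for a fourth and fifth table.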

How can I calculate prices based on currency table in one select?

I have a table of invoices that can be in multiple currencies that looks like this:
| id | issue_date | total | currency |
|----|------------|-------|----------|
| 1 | 2020-04-20 | 1234 | EUR |
| 2 | 2020-04-26 | 2345 | USD |
| 1 | 2020-04-27 | 9876 | EUR |
| 3 | 2020-04-28 | 3456 | RON |
And I have a currency table that holds currency exchange rates that looks like this:
| id | date | currency_id | rate |
|----|------------|-------------|---------|
| 1 | 2020-04-20 | EUR | 1 |
| 2 | 2020-04-20 | USD | 1.08600 |
| 3 | 2020-04-20 | RON | 4.83560 |
What I would like to achieve is to calculate each invoice price based on its issue_date, currency and a target currency.
All currency exchange rates are based on EUR, so its rate will always be 1. Rates are updated daily, but some dates are missing (exchange rates don't update during weekends), so the calculation needs to be based on the most recent exchange rate on or before invoice.issue_date.
So what I tried was this:
SELECT
`i`.`id`,
`i`.`total`,
`i`.`currency`,
`exr1`.`rate` as `invoice_rate`,
`exr2`.`rate` AS `target_rate`,
`i`.`total` * `exr1`.`rate` as `euro_price`,
`i`.`total` * `exr1`.`rate` / `exr2`.`rate` AS `target_price`
FROM `invoices` as `i`
LEFT JOIN `exchange_rates` AS `exr1`
ON
`exr1`.`currency_id` = `i`.`currency` AND
`exr1`.`date` = `i`.`issue_date`
LEFT JOIN `exchange_rates` as `exr2`
ON
`exr2`.`currency_id` = 'RON' AND
`exr2`.`date` = `i`.`issue_date`
GROUP BY
`i`.`id`,
`invoice_rate`,
`target_rate`
ORDER BY `i`.`issue_date` DESC
Problem nr. 1
Because there are no exchange rates for the exact invoice dates, I get NULL values. I tried changing the LEFT JOIN ON condition to something like exr1.date <= i.issue_date, but then GROUP BY invoice doesn't work anymore (I get duplicates).
Problem nr. 2
For rows that have exchange rates on that exact day I get wrong values because based on the target currency I need to either multiply or divide:
i.total * exr1.rate * exr2.rate AS usd_price vs i.total * exr1.rate / exr2.rate AS usd_price
https://www.db-fiddle.com/f/e5GnVnry5sAiXwbuScV6JT/19
This is a (rare) case where a dependent subquery is the way to go. Here's the overall query (https://www.db-fiddle.com/f/e5GnVnry5sAiXwbuScV6JT/21)
SELECT id,
total,
currency,
rate,
total / rate euro_price
FROM ( SELECT i.id,
i.total,
i.currency,
(SELECT e.rate
FROM exchange_rates e
WHERE e.currency_id = i.currency
AND e.date <= i.issue_date
ORDER BY e.date DESC
LIMIT 1) rate
FROM invoices i
) d
The dependent subquery is this:
SELECT e.rate
FROM exchange_rates e
WHERE e.currency_id = i.currency
AND e.date <= i.issue_date
ORDER BY e.date DESC
LIMIT 1
It finds the exchange rate for the most recent date equal to or before the issue_date. It's called dependent because it refers to column values in its outer query.
This isn't going to be fast. A covering index on exchange_rates(currency_id, date DESC, rate) will help. Like this.
CREATE INDEX lookup ON exchange_rates(currency_id, date DESC, rate);
I used a nested query so the outer query can simply refer to rate as a column when it needs to, rather than repeating the dependent subquery.
Also note I think you want to divide, not multiply, when computing your euro_price.
I left the second rate lookup to you.
Pro tip: Only use the backtick marks when your table or column name is a reserved word in the query language. Your queries are MUCH easier to read without them.
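For what it's worth, the second rate lookup could be sketched like this, again through Python's sqlite3 module with SQLite standing in for MySQL. It assumes that rate means "units of that currency per 1 EUR", which is how I read the sample table; the column aliases invoice_rate, target_rate and target_price and the :target parameter are my own names. Under that assumption you divide by the invoice currency's rate to get EUR and then multiply by the target currency's rate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER, issue_date TEXT, total REAL, currency TEXT);
CREATE TABLE exchange_rates (date TEXT, currency_id TEXT, rate REAL);
INSERT INTO invoices VALUES
    (2, '2020-04-26', 2345, 'USD'),
    (3, '2020-04-28', 3456, 'RON');
INSERT INTO exchange_rates VALUES
    ('2020-04-20', 'EUR', 1),
    ('2020-04-20', 'USD', 1.086),
    ('2020-04-20', 'RON', 4.8356);
""")

# Two dependent subqueries: one for the invoice currency, one for the target
# currency, each picking the latest rate on or before the issue date.
CONVERT_SQL = """
SELECT id,
       total / invoice_rate               AS euro_price,
       total / invoice_rate * target_rate AS target_price
FROM (
    SELECT i.id, i.total,
           (SELECT e.rate FROM exchange_rates e
             WHERE e.currency_id = i.currency AND e.date <= i.issue_date
             ORDER BY e.date DESC LIMIT 1) AS invoice_rate,
           (SELECT e.rate FROM exchange_rates e
             WHERE e.currency_id = :target AND e.date <= i.issue_date
             ORDER BY e.date DESC LIMIT 1) AS target_rate
    FROM invoices i
) AS d
ORDER BY id
"""

converted = conn.execute(CONVERT_SQL, {"target": "RON"}).fetchall()
for row in converted:
    print(row)
```

Both invoices fall on dates without a rate row, so both pick up the 2020-04-20 rates. The RON invoice converted to RON comes back as its original total (up to floating-point rounding), which is a quick sanity check on the direction of the division.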

MySQL - Retrieve the max value of an associated column within a LEFT JOIN with a different perimeter than the WHERE clause of the main query

I'm using MySQL 5.6 and have a SELECT query with a LEFT JOIN, but I need to retrieve the max of an associated column (email_nb) with a different "perimeter" of constraints.
Let's take an example: let me state that it is a mere example with only 5 rows but it should work also when I have thousands... (I'm stating this since there is a LIMIT clause in my query)
Table 'query_results'
+-----------------------------+------------+--------------+
| query_result_id | query_id | author |
+-----------------------------+------------+--------------+
| 2 | 1 | john |
| 3 | 1 | eric |
| 7 | 3 | martha |
| 9 | 4 | john |
| 10 | 1 | john |
+-----------------------------+------------+--------------+
Table 'customers_emails'
+-------------------+-----------------+--------------+-----------+-------------+------------------------
| customer_email_id | query_result_id | customer_id | author | email_nb | days_since_sending
+-------------------+-----------------+--------------+-----------+-------------+------------------------
| 5 | 2 | 12 | john | 2 | 150
| 12 | 3 | 7 | eric | 4 | 90
| 27 | 3 | 12 | eric | 2 | 86
| 40 | 9 | 15 | john | 9 | 87
| 42 | 2 | 12 | john | 7 | 23
| 51 | 10 | 12 | john | 3 | 89
+-------------------+-----------------+--------------+-----------+-------------+-----------------------
Notes:
you can have a query_result where the author appears in NO row at all in any of the customers_emails, hence the LEFT JOIN I'm using.
You can see author is by design kind of duplicated as it's both on the first table and the second table each time associated with a query_result_id. It's important to note.
email_nb is an integer between 0 and 10
there is a LIMIT clause as I need to retrieve a set number of records
Today my query aims at retrieving query_results with a certain number of conditions. The specificity is that I make sure to retrieve query_results with an author who does not appear in any customer_email_id where the days_since_sending would be less than 60 days: it means I check these days_since_sending not only within the records for this query, but across all customers_emails, thanks to the NOT IN subquery (see below).
This is my current query for customer_id = 12 and query_id = 1
SELECT
qr.query_result_id,
qr.author
FROM
query_results qr
LEFT JOIN
customers_emails ce
ON
qr.author = ce.author
WHERE
qr.query_id = 1 AND
qr.author IS NOT NULL
AND qr.author NOT IN (
SELECT recipient
FROM customers_emails
WHERE
(
customer_id = 12 AND
( days_since_sending >= 60) )
)
# we don't take by coincidence/bad luck 2 query results with the same author
GROUP BY
qr.author
ORDER BY
qr.query_result_id ASC
LIMIT
20
This is the expected output:
+-----------------------------+------------+--------------+
| query_result_id | author | email_nb |
+-----------------------------+------------+--------------+
| 10 | john | 7 |
| 3 | eric | 2 |
+-----------------------------+------------+--------------+
My challenge/difficulty today:
Notice on the 2nd line Eric is tied to email_nb 2 and not the max of all Eric's emails which could have been 4 if we had taken the max of email_nb across ALL messages to author=eric. but we stay within the limit of customer_id = 12 so there's only one left with email_nb = 2
Also notice that on the first line, the email_nb associated with query_result = 10 is 7, and not 3, which could have been the case as 3 is what appears in table customers_emails on the last line.
Indeed for emails to 'john' i had the choice between email_nb 2, 7 and 3 but I take highest so it's 7 (even if this email is from more than 60 days ago !! This is very important and part of what I don't know how to do: the perimeters are different: today I retrieve all the query_results where the author has NOT been sent a email for the past 60 days (see the NOT IN subquery) BUT I need to have in the column the max email_nb sent to john by customer_id=12 and query_id=1 EVEN if it was sent more than 60 days ago so these are different perimeters...Don't really know how to do this...
In other words, I don't want to find the max(email_nb) within the same WHERE clauses (such as days_since_sending >= 60) or within the same LIMIT and GROUP BY as my current query: what I need is to retrieve the maximum value of email_nb for customer_id=12 AND query_id=1 and sent to john across ALL records on the customers_emails table!
If there is no associated row on customers_emails at all (meaning no email has ever been sent by this customer for this query in the past), then the email_nb should be something like NULL.
This means I do NOT want this output:
+-----------------------------+------------+--------------+
| query_result_id | author | email_nb |
+-----------------------------+------------+--------------+
| 10 | john | 3 |
| 3 | eric | 2 |
+-----------------------------+------------+--------------+
How to achieve this in MySQL 5.6 ?
Since your question was a bit confusing, I came up with this.
select
max(q.query_result_id) as query_result_id,q.author,max(email_nb) as email_nb
from query_results q
left join customers_emails c on q.author=c.author
where customer_id=12 and query_id=1
group by q.author;
I think the best thing to do in a situation like this is break it down into smaller queries and then combine them together.
The first thing you want to do is this:
The specificity is that I make sure to retrieve query_results with an author who does not appear in any customer_email_id where the days_since_sending would be less than 60 days
This might look something like this:
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
This will get you the list of authors (with duplicates removed) that haven't had an email in the last 60 days that appear for the given query ID. Your next requirement is the following:
I need to have in the column the max email_nb sent to john by customer_id=12 and query_id=1 EVEN if it was sent more than 60 days ago
This query could look like this:
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
That gets you the maximum email_nb for each author/query_result combination, not taking into consideration the date at all.
The only thing left to do is reduce the set of results from the second query down to only the authors that appear in the first query. There are a few different methods for doing that. For example, you could INNER JOIN the two queries by author:
SELECT b.* FROM (
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
) b INNER JOIN (
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
) a ON a.author = b.author
Alternatively, you could filter with an IN clause:
SELECT b.* FROM (
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
) b
WHERE b.author IN (
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
)
There are most likely ways to speed this query up or shorten it, but if you need to do that, you now have a working query to compare the results against.
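One more way to keep the two "perimeters" apart, sketched under my own assumptions: put the 60-day check in the WHERE clause and compute the maximum in a correlated scalar subquery that has its own, independent conditions. The sketch runs through Python's sqlite3 module (SQLite standing in for MySQL 5.6; nothing here needs window functions). The rows are adapted from the question but partly hypothetical: john's recent email is given 70 days instead of 23 so that he passes the 60-day check, and 'zoe' is an invented author with no emails at all. The alias first_result_id is also mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE query_results (query_result_id INTEGER, query_id INTEGER, author TEXT);
CREATE TABLE customers_emails (customer_id INTEGER, author TEXT,
                               email_nb INTEGER, days_since_sending INTEGER);
-- Adapted, partly hypothetical sample rows (see the note above).
INSERT INTO query_results VALUES
    (2, 1, 'john'), (3, 1, 'eric'), (7, 3, 'martha'), (10, 1, 'john'), (11, 1, 'zoe');
INSERT INTO customers_emails VALUES
    (12, 'john', 2, 150), (7, 'eric', 4, 90), (12, 'eric', 2, 86),
    (12, 'john', 7, 70),  (12, 'john', 3, 89);
""")

MAX_NB_SQL = """
SELECT MIN(qr.query_result_id) AS first_result_id,
       qr.author,
       -- Perimeter 2: max over ALL emails of this customer to this author.
       (SELECT MAX(ce.email_nb) FROM customers_emails ce
         WHERE ce.author = qr.author
           AND ce.customer_id = :customer) AS email_nb
FROM query_results qr
WHERE qr.query_id = :query
  AND qr.author IS NOT NULL
  -- Perimeter 1: nobody emailed this author in the last 60 days.
  AND NOT EXISTS (SELECT 1 FROM customers_emails r
                   WHERE r.author = qr.author
                     AND r.days_since_sending < 60)
GROUP BY qr.author
ORDER BY first_result_id
LIMIT 20
"""

result = conn.execute(MAX_NB_SQL, {"customer": 12, "query": 1}).fetchall()
for row in result:
    print(row)
```

An author with no matching emails gets NULL for email_nb, because MAX over an empty set is NULL. NOT EXISTS is used instead of NOT IN so that a NULL author in customers_emails cannot silently empty the result.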

Mysql join with counting results in another table

I have two tables, one with ranges of numbers and a second with numbers. I need to select all ranges that have at least one number with a status not in (2, 0). I have tried a number of different joins; some of them took forever to execute, and the one I ended up with is fast but selects only a very small number of ranges.
SELECT SQL_CALC_FOUND_ROWS md_number_ranges.*
FROM md_number_list
JOIN md_number_ranges
ON md_number_list.range_id = md_number_ranges.id
WHERE md_number_list.phone_num_status NOT IN (2, 0)
AND md_number_ranges.reseller_id=1
GROUP BY range_id
LIMIT 10
OFFSET 0
What I need is something like "select all ranges, join numbers where number.range_id = range.id and where there is at least one number with phone_num_status not in (2, 0)".
Any help would be really appreciated.
Example data structure:
md_number_ranges:
id | range_start | range_end | reseller_id
1 | 000001 | 000999 | 1
2 | 100001 | 100999 | 2
md_number_list:
id | range_id | number | phone_num_status
1 | 1 | 0000001 | 1
2 | 1 | 0000002 | 2
3 | 2 | 1000012 | 0
4 | 2 | 1000015 | 2
I want to be able to select range 1, because it has one number with status 1, but not range 2, because both of its numbers have statuses that I do not want to select.
It's a bit hard to tell what you want, but perhaps this will do:
SELECT *
from md_number_ranges m
join (
SELECT md_number_ranges.id
, count(*) as FOUND_ROWS
FROM md_number_list
JOIN md_number_ranges
ON md_number_list.range_id = md_number_ranges.id
WHERE md_number_list.phone_num_status NOT IN (2, 0)
AND md_number_ranges.reseller_id=1
GROUP BY range_id
) x
on x.id=m.id
LIMIT 10
OFFSET 0
Is this what you're looking for?
SELECT DISTINCT r.*
FROM md_number_ranges r
JOIN md_number_list l ON r.id = l.range_id
WHERE l.phone_num_status NOT IN (0,2)
SQL Fiddle Demo
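"At least one matching row" can also be written with EXISTS, which avoids both the DISTINCT and the GROUP BY. A small sketch with the sample data from the question, run through Python's sqlite3 module (SQLite standing in for MySQL; the reseller_id filter is left out here so that both sample ranges are candidates):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE md_number_ranges (id INTEGER PRIMARY KEY, range_start TEXT,
                               range_end TEXT, reseller_id INTEGER);
CREATE TABLE md_number_list (id INTEGER PRIMARY KEY, range_id INTEGER,
                             number TEXT, phone_num_status INTEGER);
INSERT INTO md_number_ranges VALUES
    (1, '000001', '000999', 1), (2, '100001', '100999', 2);
INSERT INTO md_number_list VALUES
    (1, 1, '0000001', 1), (2, 1, '0000002', 2),
    (3, 2, '1000012', 0), (4, 2, '1000015', 2);
""")

# A range qualifies as soon as one of its numbers has a status other than 0 or 2.
RANGES_SQL = """
SELECT r.id, r.range_start, r.range_end
FROM md_number_ranges r
WHERE EXISTS (SELECT 1 FROM md_number_list l
               WHERE l.range_id = r.id
                 AND l.phone_num_status NOT IN (0, 2))
ORDER BY r.id
LIMIT 10 OFFSET 0
"""

ranges = conn.execute(RANGES_SQL).fetchall()
print(ranges)
```

Only range 1 comes back, since range 2 has nothing but statuses 0 and 2. An index on md_number_list(range_id, phone_num_status) would probably help on large tables, though I have not measured that.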

How to query number of changes in a column in MySQL

I have a table that stores items with two properties. So the table has three columns:
item_id | property_1 | property_2 | insert_time
1 | 10 | 100 | 2012-08-24 00:00:01
1 | 11 | 100 | 2012-08-24 00:00:02
1 | 11 | 101 | 2012-08-24 00:00:03
2 | 20 | 200 | 2012-08-24 00:00:04
2 | 20 | 201 | 2012-08-24 00:00:05
2 | 20 | 200 | 2012-08-24 00:00:06
That is, each time either property of any item changes, a new row is inserted. There is also a column storing the insertion time. Now I want to get the number of changes in property_2. For the table above, I should get
item_id | changes_in_property_2
1 | 2
2 | 3
How can I get this?
This will tell you how many distinct values were entered. If it was changed back to a previous value, it will not be counted as a new change, though. Without a chronology to your data, hard to do much more.
select item_id, count(distinct property_2)
from Table1
group by item_id
Here is the closest that I could get to your desired result. I should note however, that you are asking for the number of changes to property_2 based on item_id. If you are analyzing strictly those two columns, then there is only 1 change for item_id 1 and 2 changes for item_id 2. You would need to expand your result to aggregate by property_1. Hopefully, this fiddle will show you why.
SELECT a.item_id,
SUM(
CASE
WHEN a.property_2 <>
(SELECT property_2 FROM tbl b
WHERE b.item_id = a.item_id AND b.insert_time > a.insert_time
ORDER BY b.insert_time LIMIT 1) THEN 1
ELSE 0
END) AS changes_in_property_2
FROM tbl a
GROUP BY a.item_id
My take :
SELECT
i.item_id,
SUM(CASE WHEN i.property_1 != p.property_1 THEN 1 ELSE 0 END) + 1
AS changes_1,
SUM(CASE WHEN i.property_2 != p.property_2 THEN 1 ELSE 0 END) + 1
AS changes_2
FROM items i
LEFT JOIN items p
ON p.item_id = i.item_id AND p.insert_time =
(SELECT MAX(q.insert_time) FROM items q
WHERE q.insert_time < i.insert_time AND i.item_id = q.item_id)
GROUP BY i.item_id;
There is one entry for each item that is not selected in i, the one that has no predecessor. It counts for a change though, that's why the sums are incremented.
I would do it this way, with user-defined variables to keep track of the previous row's value.
SELECT item_id, MAX(c) AS changes_in_property_2
FROM (
SELECT IF(@i = item_id, IF(@p = property_2, @c, @c:=@c+1), @c:=1) AS c,
(@i:=item_id) AS item_id,
(@p:=property_2)
FROM `no_one_names_their_table_in_sql_questions` AS t,
(SELECT @i:=0, @p:=0) AS _init
ORDER BY insert_time
) AS sub
GROUP BY item_id;
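If window functions are available (MySQL 8.0 or later; not in the 5.x versions), LAG() gives the previous row's value directly and avoids user variables. A minimal sketch with the question's data, run through Python's sqlite3 module (SQLite 3.25 or later standing in for MySQL); the table name items and the alias prev are my own choices. Like the expected output in the question, it counts the first value of each item as a change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER, property_1 INTEGER,
                    property_2 INTEGER, insert_time TEXT);
INSERT INTO items VALUES
    (1, 10, 100, '2012-08-24 00:00:01'),
    (1, 11, 100, '2012-08-24 00:00:02'),
    (1, 11, 101, '2012-08-24 00:00:03'),
    (2, 20, 200, '2012-08-24 00:00:04'),
    (2, 20, 201, '2012-08-24 00:00:05'),
    (2, 20, 200, '2012-08-24 00:00:06');
""")

# prev is NULL on the first row of each item; that row counts as a change too.
CHANGES_SQL = """
SELECT item_id,
       SUM(CASE WHEN prev IS NULL OR prev <> property_2 THEN 1 ELSE 0 END)
           AS changes_in_property_2
FROM (
    SELECT item_id, property_2,
           LAG(property_2) OVER (PARTITION BY item_id
                                 ORDER BY insert_time) AS prev
    FROM items
) AS t
GROUP BY item_id
ORDER BY item_id
"""

changes = conn.execute(CHANGES_SQL).fetchall()
print(changes)
```

Note that this treats a NULL previous value the same as "no previous row", so it would need adjusting if property_2 itself can be NULL.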