MySQL Query for getting Payments & Invoices in one query - mysql

A customer has asked me to pull some extra accounting data for their statements, and I have been having trouble writing a query that can accommodate this.
What they initially wanted was to get all "payments" from a certain date range that belong to a certain account, ordered oldest first, which was a simple statement as follows:
SELECT * FROM `payments`
WHERE `sales_invoice_id` = ?
ORDER BY `created_at` ASC;
However, now they want newly raised "invoices" as part of the statement, and I cannot seem to write the correct query for this. UNION does not seem to work, and JOINs behave like, well, joins, so I cannot get a separate row for each payment/invoice; it instead merges them together. Ideally, I would retrieve a result set as follows:
payment.amount | invoice.amount | created_at
---------------|----------------|--------------------
NULL           | 250.00         | 2014-02-28 08:00:00
120.00         | NULL           | 2014-02-28 08:00:00
This way I can loop through these to generate a full statement. The latest query I tried was the following:
SELECT * FROM `payments`
LEFT OUTER JOIN `sales_invoices`
ON `payments`.`account_id` = `sales_invoices`.`account_id`
ORDER BY `created_at` ASC;
The first problem is that the "created_at" column is ambiguous, so I am not sure how to merge the two. The second problem is that it merges rows and brings back duplicate rows, which is incorrect.
Am I thinking about this the wrong way or is there a feature of MySQL I am not aware of that can do this for me? Thanks.

You can use union all for this. You just need to define the columns carefully:
select ip.*
from ((select p.invoice_id, p.amount as payment, NULL as invoice, p.created_at
       from payments p
      ) union all
      (select i.invoice_id, NULL, i.amount, i.created_at
       from sales_invoices i
      )
     ) ip
order by created_at asc;
Your question is specifically about MySQL. Other databases support a type of join called the full outer join, which can also be used for this type of query (MySQL does not support full outer join).
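To see the pattern end to end, here is a minimal runnable sketch using SQLite via Python (the UNION ALL syntax is standard and behaves the same in MySQL). The table and column names follow the answer's query and are simplified assumptions, not the asker's real schema:

```python
import sqlite3

# In-memory stand-in for the payments / sales_invoices tables
# (hypothetical simplified columns).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE payments (invoice_id INTEGER, amount REAL, created_at TEXT);
CREATE TABLE sales_invoices (invoice_id INTEGER, amount REAL, created_at TEXT);
INSERT INTO sales_invoices VALUES (1, 250.00, '2014-02-28 08:00:00');
INSERT INTO payments VALUES (1, 120.00, '2014-02-28 09:00:00');
""")

# Pad each branch with a NULL so both selects have the same column list,
# then sort the combined set by the shared created_at column.
rows = con.execute("""
    SELECT invoice_id, amount AS payment, NULL AS invoice, created_at
      FROM payments
    UNION ALL
    SELECT invoice_id, NULL, amount, created_at
      FROM sales_invoices
    ORDER BY created_at ASC
""").fetchall()

for row in rows:
    print(row)
# One row per invoice and per payment, interleaved by date:
# (1, None, 250.0, '2014-02-28 08:00:00')
# (1, 120.0, None, '2014-02-28 09:00:00')
```

Because the ORDER BY applies to the whole union, each statement line comes out as its own row with the other amount column NULL, exactly the shape the question asked for.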

Related

MySQL Query Join and Group by rows not in sync

I have a query I thought was working until I added more values into the equation.
This has resulted in the rows in the results not matching the actual data.
Here is the data set I am querying:
My Query:
SELECT c.Company_Name, c.Branch_Name, Till_No, Database_Build_Date, Database_Updated_Date, Touchkeys_Build_Date, Touchkeys_Updated_Date, MAX(DateTime)
FROM `TillDatabase`
INNER JOIN Customers c USING(ClientID, Branch)
WHERE `ClientID` = 1 AND `Branch` = 1
Group by Till_No
Order by Till_No Asc;
My Results:
With the GROUP BY and MAX(DateTime), I would have expected only Tills 1, 2 and 3 to show, using the latest date.
What have I failed to acknowledge, or am I going about the query the wrong way?
Thanks
(Sorry, I tried to format the table as it shows, but I have no idea how to do it and it came out badly formatted.)
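The likely pitfall: with only `GROUP BY Till_No`, MySQL picks the other, non-aggregated columns from an arbitrary row of each group, so they need not come from the row holding `MAX(DateTime)`. A common portable fix (not posted in this thread, so treat it as a sketch with assumed column types) is to compute the per-till maximum in a derived table and join back to it; here demonstrated with SQLite via Python:

```python
import sqlite3

# Hypothetical cut-down TillDatabase: two rows for the same till,
# with different build info and timestamps.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TillDatabase (Till_No INTEGER, DateTime TEXT,
                           Database_Build_Date TEXT);
INSERT INTO TillDatabase VALUES
  (1, '2021-01-01', 'old-build'),
  (1, '2021-06-01', 'new-build');
""")

# Derived table m finds the newest DateTime per till; joining back on
# (Till_No, DateTime) keeps only the row that actually holds that maximum.
rows = con.execute("""
    SELECT t.Till_No, t.Database_Build_Date, t.DateTime
      FROM TillDatabase t
      JOIN (SELECT Till_No, MAX(DateTime) AS md
              FROM TillDatabase
             GROUP BY Till_No) m
        ON m.Till_No = t.Till_No AND m.md = t.DateTime
""").fetchall()

print(rows)
# The non-aggregated columns now come from the latest row, not an arbitrary one.
```

This guarantees the selected columns and the maximum date stay in sync, which a bare `GROUP BY Till_No` does not.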

How to JOIN a table on an aggregate

Issue Summary:
I need to join a single row to a table output based on an aggregate function, in this case the most recent corresponding record. Various other questions on this topic seem to work on the basis that both tables' values are required (INNER JOIN, etc.), but in my case the aggregate needs to work on a LEFT JOIN table that is often going to be NULL.
MySQL 5.7;
Here are a couple of illustrative tables:
Tables
Core_data:
-----------
create table `core`
(
core_id int(8) unsigned auto_increment primary key,
some_name varchar(80) null,
some_data varchar(80) null,
some_values .... etc.
)
Linked_data:
------------
create table `linked_data`
(
link_id smallint(6) unsigned auto_increment primary key,
core_id int(8) unsigned,
data_date date,
some_linked_data_values varchar(80) null
)
I have a query dealing with dozens of tables, selecting one row from the core table along with various LEFT JOIN data from dozens of other tables.
The illustrated linked_data table holds dated data, and the date is important: only the most recent row should be returned.
Example Data:
linked_data
------------
link_id | core_id | data_date | data_value | ...
-------------------------------------------------
1 | 2 | 2020-09-03 | something | ...
2 | 4 | 2019-07-29 | whatever | ...
3 | 1 | 2017-11-09 | yews | ...
4 | 4 | 2018-04-10 | socks | ...
I want to join only the row with core_id = 4 AND the maximum date value. How can I create this within the JOIN scenario? I can't put the MAX aggregate into the JOIN ... ON condition.
My Current SQL:
My SQL is something like this:
SELECT ... many columns ...,
ld.data_value,
ld.data_date,
more.columns ...
FROM core
LEFT JOIN table1 ON core.core_id = table1.core_id
LEFT JOIN table2 ON core.core_id = table2.core_id
LEFT JOIN table3 ON core.core_id = table3.core_id
... etc ...
LEFT JOIN linked_data ld ON core.core_id = ld.core_id AND MAX(ld.data_date)
WHERE ... core_id = value
For one table I need only the result row that has the highest value of a date-based column; linked_data may hold no data at all for a given core row, so the LEFT JOIN may return NULL.
Expected result:
For core_id = 4 I want to output a single SQL row result containing linked_data.data_value = whatever. For core_id = 5 I want to output the rest of the data but nothing from the linked_data table.
What Have I tried already?
This answer is noted as correct, but it is also noted that it will become very slow very quickly with larger amounts of data.
This answer puts the qualifier in the WHERE clause, but there is no promise that linked_data will contain any result at all. I could of course add further conditionals into the WHERE clause, but I was hoping to avoid this.
This MySQL post has another possible solution, but comments on it also state it is very slow (that may be user error on their part; I've not tested it yet).
I have also tried using a SELECT in the LEFT JOIN like so:
SELECT ... many columns ...,
ld.data_value,
ld.data_date,
more.columns ...
FROM core
LEFT JOIN table1 ON core.core_id = table1.core_id
LEFT JOIN table2 ON core.core_id = table2.core_id
LEFT JOIN table3 ON core.core_id = table3.core_id
... etc ...
LEFT JOIN (
SELECT linked_data FROM linked_data ldi WHERE core.core_id = ldi.core_id AND MAX(ldi.data_date)
) as ld ON core.core_id = ldi.core_id
WHERE ... core_id = value
Referenced from this Q&A
But this still tells me "Aggregate calls are not allowed here".
EDIT: I found why the aggregate wasn't allowed; it was a simple syntax mistake on my part. I have put up a full answer to clarify this Q&A, as I couldn't find any relevant answers when I was searching, so this may be useful to someone.
If anyone has a more correct way of solving the original issue please share!
In its shortest form, from what you provided: you can have a pre-query resulting in an alias for the join. This pre-query can group/max per core.
That said, since the linked_data table is auto-increment, I would assume (yes, I know about "assume", but you can confirm) that as each record is added, its date is always the date it was added. So ID 100 may have a date of Jan 14, 2020, and you would never have an earlier-dated record with a higher ID, such as ID 101 = Nov 3, 2019. Each ID added has a later date than the last record, regardless of the core_id.
You can then continue your additional "left-joins" to other tables as needed.
REVISION FROM COMMENT CLARIFICATION
Martin, from the clarification you provided (the data comes from multiple sources, and the date can represent older data), just revise the pre-query inner SQL to the following. The query is heavily commented to clarify how the OVER/PARTITION query works in this scenario.
Now, integrating with all the rest of your stuff, I will only join to your primary core table as a LEFT JOIN:
select
cd.core_id,
cd.some_name,
cd.some_data,
cd.some_values,
ld2.link_id,
ld2.data_date,
ld2.some_linked_data_values
from
core_data cd
left join
( select pq1.*
from
-- first, all columns I want to have returned from
-- the linked_data table
( select core_id,
data_date,
link_id,
-- dense_rank() returns sequential counter value
-- starting at 1 based on every change of the
-- PARTITION BY Core_ID in next part below
dense_rank()
over ( partition by
-- primary sorting by the core_id
core_id
order by
-- then within each core_id, descending by date
data_date desc,
-- and then by the link_id descending, just in case
-- there are multiple records for the same core_id
-- AND the same date... So you get the most
-- recently added linked_data record for given core
link_id desc ) as sqlrow
from linked_data ) pq1
where
-- now, from inner partition/over query, only get the record
-- where the sqlrow = 1, as result of dense_rank()
-- that resets to 1 every time core_id changes
pq1.sqlrow=1 ) PQ
on cd.core_id = PQ.core_id
LEFT JOIN linked_data ld2
on PQ.Link_id = ld2.link_id
The inner query with OVER / PARTITION BY is basically making a first-pass of the data and ordering it first by the partition (the core id), then sub-sorted by the data_date DESCENDING (so most recent date first regardless of being added earlier or later from the import from whatever external sources), then sub-sorted by link_id descending based on the most recent record added for any given date.
The final outer WHERE clause is basically stating only give me back the first row for every core_id. So now, you have the proper critical elements to re-do the LEFT join back to the original core ID, yet have the proper link_id to get the proper record at your final query result.
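A runnable sketch of this partition/rank idea, using the question's sample rows in SQLite via Python. Note the window-function approach requires MySQL 8.0+ (the question targets 5.7, which lacks window functions), so this illustrates the mechanism rather than a 5.7-compatible query:

```python
import sqlite3

# Sample rows from the question (SQLite 3.25+ supports window functions).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE linked_data (link_id INTEGER PRIMARY KEY, core_id INTEGER,
                          data_date TEXT, data_value TEXT);
INSERT INTO linked_data VALUES
  (1, 2, '2020-09-03', 'something'),
  (2, 4, '2019-07-29', 'whatever'),
  (3, 1, '2017-11-09', 'yews'),
  (4, 4, '2018-04-10', 'socks');
""")

# Rank rows within each core_id, newest date first (link_id breaks ties),
# then keep only rank 1: the most recent row per core.
rows = con.execute("""
    SELECT core_id, data_date, data_value
      FROM (SELECT core_id, data_date, data_value,
                   DENSE_RANK() OVER (PARTITION BY core_id
                                      ORDER BY data_date DESC,
                                               link_id DESC) AS sqlrow
              FROM linked_data) AS pq1
     WHERE sqlrow = 1
     ORDER BY core_id
""").fetchall()

print(rows)
# core_id 4 keeps only its newest row ('2019-07-29', 'whatever');
# '2018-04-10' / 'socks' is filtered out.
```

The `sqlrow = 1` filter is exactly the answer's outer WHERE clause: one row survives per core_id, ready to be LEFT JOINed back to the core table.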
After writing out this whole question, I found that a simple syntax fix on my latest example SQL gave the correct way to do this (I have done this the correct way in the past but had not learnt it).
I note the lack of (or rather my inability to easily find) a clear definitive answer on the Interwebs to my query, so here's my methodology...
The correct way to do this type of aggregated join with respect to getting 0 or 1 results from the joined table is as follows:
LEFT JOIN and then wrap the subquery in brackets and alias it as ... as per usual.
Put the aggregation function into the subquery, ignoring the join criteria; the join criteria go in the outer query's ON ... section.
Use ORDER BY in the subquery and set the aggregate at that point.
So;
SELECT ... many columns ...,
ld.data_value,
ld.data_date,
more.columns ...
FROM core
LEFT JOIN table1 ON core.core_id = table1.core_id
LEFT JOIN table2 ON core.core_id = table2.core_id
LEFT JOIN table3 ON core.core_id = table3.core_id
... etc ...
LEFT JOIN (
SELECT ldi.* FROM linked_data ldi WHERE optional = 1 ORDER BY MAX(ldi.data_date)
) as ld ON core.core_id = ld.core_id
WHERE ... core_id = value
There is no need for a WHERE clause in the subquery, but if you do use one it should be static and not reference the outer table, as that is handled in the ON clause. I put in WHERE optional = 1 for familiarity.
The nature of the LEFT JOIN is that it will return 0 or 1 results, so I find there's no need for much else.
If anyone has a better way of solving my original issue please let me know!
P.S. I have not efficiency-tested this answer at all.

MySQL query is only selecting one entry

I have two tables, h_user and appointment, and this query where I want to get all the users that missed more than 3 appointments in the last trimester. I am doing it like this:
select h_user.name from h_user
inner join appointment on appointment.id_user=h_user.id
having count(appointment.missed='y' and date(appointment.datetime)>(curdate()-interval 3 month))>3;
My problem is that when I run this I only get one user, when I should get two, since I included these (the third value is not relevant here; it's the doctor's id):
insert into appointment values('2019-10-11 16:00:00','1','10','y');
insert into appointment values('2019-11-15 10:00:00','1','11','y');
insert into appointment values('2019-12-14 10:00:00','1','11','y');
insert into appointment values('2019-11-21 10:00:00','1','11','y');
insert into appointment values('2019-10-21 10:00:00','1','11','y');
insert into appointment values('2019-10-11 16:00:00','2','12','y');
insert into appointment values('2019-11-15 10:00:00','2','13','y');
insert into appointment values('2019-12-14 10:00:00','2','13','y');
insert into appointment values('2019-11-21 10:00:00','2','13','y');
insert into appointment values('2019-10-21 10:00:00','2','13','y');
Also, when I delete the user the result gives me and run the query again, it returns the other one, so I know it works for only one user at a time. If anyone could help me figure out the problem, that would be great. Thanks in advance!
Basically, your query is missing a GROUP BY clause (which old versions of MySQL allow), so it is giving you wrong results. Just add the missing clause (you do want to include the primary key column of the users table in the GROUP BY, in case two different users have the same name).
You should also move all the conditions to the WHERE clause for efficiency. I would also recommend against applying DATE() to a table column, since this defeats an existing index; you can get the same results without the function.
Consider:
select u.name
from h_user u
inner join appointment a on a.id_user = u.id
where a.datetime > curdate() - interval 3 month and a.missed = 'y'
group by u.id, u.name
having count(*) > 3;
Demo on DB Fiddle:
| name |
| :--- |
| foo |
| bar |
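The fix can be verified end to end with the question's own insert data. A runnable sketch using SQLite via Python, with assumed table definitions (and a fixed cutoff date in place of `CURDATE() - INTERVAL 3 MONTH`, so the 2019 sample rows stay in range):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE h_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE appointment (datetime TEXT, id_user INTEGER,
                          id_doctor INTEGER, missed TEXT);
INSERT INTO h_user VALUES (1, 'foo'), (2, 'bar');
INSERT INTO appointment VALUES
  ('2019-10-11 16:00:00', 1, 10, 'y'),
  ('2019-11-15 10:00:00', 1, 11, 'y'),
  ('2019-12-14 10:00:00', 1, 11, 'y'),
  ('2019-11-21 10:00:00', 1, 11, 'y'),
  ('2019-10-21 10:00:00', 1, 11, 'y'),
  ('2019-10-11 16:00:00', 2, 12, 'y'),
  ('2019-11-15 10:00:00', 2, 13, 'y'),
  ('2019-12-14 10:00:00', 2, 13, 'y'),
  ('2019-11-21 10:00:00', 2, 13, 'y'),
  ('2019-10-21 10:00:00', 2, 13, 'y');
""")

# Filter in WHERE, group per user, then apply the > 3 count in HAVING.
rows = con.execute("""
    SELECT u.name
      FROM h_user u
      JOIN appointment a ON a.id_user = u.id
     WHERE a.datetime > '2019-10-01' AND a.missed = 'y'
     GROUP BY u.id, u.name
    HAVING COUNT(*) > 3
     ORDER BY u.id
""").fetchall()

print(rows)
# Both users have 5 missed appointments in range, so both are returned.
```

Without the GROUP BY, the HAVING clause aggregates the whole joined set into a single group, which is why the original query returned only one user.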
You are missing a GROUP BY h_user.name clause, and you should also move your second condition into a WHERE clause:
select h_user.name
from h_user inner join appointment on
appointment.id_user=h_user.id
where date(appointment.datetime)>(curdate()-interval 3 month)
group by h_user.name
having sum(appointment.missed='y')>3
Note that it's safer to use the user's id in a group by clause to avoid cases where 2 or more users have the same name.
So this would be better:
select h_user.id, h_user.name
.................................
group by h_user.id, h_user.name
.................................

MySQL - How do I optimize appending field from table b to a query of table a

I know this has to be a fairly common issue, and I am sure the answer is readily available, but I am not sure how to phrase my search, so I have been forced to troubleshoot this on my own for the most part.
Table A
id | content_id | score
1 | 2 | 16
2 | 2 | 4
3 | 3 | 8
4 | 3 | 12
Table B
id | content
1 | "Content Goes Here"
2 | "Content Goes Here"
3 | "Content Goes Here"
Objective: SUM all scores from table A, group by the unique content_id and show the content associated with the id, ordered by the sum score.
Current Working Query:
SELECT a.content_id, b.content, SUM(a.score) AS sum
FROM table_a a
LEFT JOIN table_b b ON a.content_id = b.id
GROUP BY a.content_id
ORDER BY sum ASC;
Problem: As far as I can tell, with the way I have structured my query, the content is grabbed from table_b by looping through each record in table_a, checking for a record in table_b with an identical id, and grabbing the content field. The problem is that table_a has nearly 500k+ records while table_b has 112, which means that potentially 500,000 x 112 cross-table lookups/matches are being performed just to attach 112 unique content fields to a total of 112 rows in the final result set.
HELP!: How do I more efficiently append the 112 content fields from table_b to the 112 results produced by the query? I am guessing it has something to do with the query execution order: somehow only looking up and appending the content field to each matched result row AFTER the sums are produced and the set is narrowed down to 112 records. I have studied the MySQL documentation and benchmarked various subqueries, several joins, and even tried playing with UNION. It is probably abundantly obvious to you guys, but my brain just can't get around it.
FYI: As mentioned earlier, the query does work. The results are produced in about 8 to 10 seconds, and each subsequent query after that is immediate because of query caching. But with how simple this is, I know that 8 seconds can at LEAST be cut in half. I just feel it deep down in my guts.
I hope this is concise enough, if I need to clarify or explain something better please let me know! Thanks in advance.
The MySQL query optimiser only allows "nested loop joins".** These are the internal operators by which an INNER JOIN is evaluated. Other RDBMSs allow other kinds of join operators which are more efficient.
However, in your case you can try this. Hopefully the optimiser will do the aggregate before the JOIN:
SELECT
a.content_id, b.content, a.sum
FROM
(
SELECT content_id, SUM(score) AS sum
FROM table_a
GROUP BY content_id
) a
JOIN table_b b ON a.content_id = b.id
ORDER BY
sum ASC;
In addition, if you don't need the results ordered, you can use ORDER BY NULL, which usually removes a filesort from the EXPLAIN. And of course, I assume that there are indexes on the two content_id columns (one primary key, one foreign key index).
Finally, I would also assume that an INNER JOIN will be enough: every a.content_id exists in table_b. If not, you are missing a foreign key and an index on a.content_id.
** It's getting better, but you need MariaDB or MySQL 5.6.
This should be a little faster:
SELECT
tmp.content_id,
b.content,
tmp.asum
FROM (
SELECT
a.content_id,
SUM(a.score) AS asum
FROM
table_a a
GROUP BY
a.content_id
ORDER BY
NULL
) as tmp
LEFT JOIN table_b b
ON tmp.content_id = b.id
ORDER BY
tmp.asum ASC
You can use EXPLAIN to check the query execution plan for both queries when you want to benchmark them
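The aggregate-first pattern from both answers can be checked against the question's own sample data. A runnable sketch using SQLite via Python; the content strings are placeholders, and a content_id tiebreaker is added to the ORDER BY because the two sample sums happen to be equal:

```python
import sqlite3

# Table A / Table B from the question (content strings are hypothetical).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table_a (id INTEGER PRIMARY KEY, content_id INTEGER, score INTEGER);
CREATE TABLE table_b (id INTEGER PRIMARY KEY, content TEXT);
INSERT INTO table_a VALUES (1, 2, 16), (2, 2, 4), (3, 3, 8), (4, 3, 12);
INSERT INTO table_b VALUES (1, 'one'), (2, 'two'), (3, 'three');
""")

# Collapse table_a to one row per content_id FIRST, then join the tiny
# aggregated result against table_b -- the join touches ~112 rows, not 500k.
rows = con.execute("""
    SELECT tmp.content_id, b.content, tmp.asum
      FROM (SELECT content_id, SUM(score) AS asum
              FROM table_a
             GROUP BY content_id) AS tmp
      LEFT JOIN table_b b ON tmp.content_id = b.id
     ORDER BY tmp.asum ASC, tmp.content_id ASC
""").fetchall()

print(rows)
# Both groups sum to 20 (16+4 and 8+12) with their content attached.
```

The key design point is that the GROUP BY happens inside the derived table, so the outer join only ever sees one pre-summed row per content_id.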

INNER JOIN on 3 tables with SUM()

I am having a problem trying to JOIN across a total of three tables:
Table users: userid, cap (ADSL bandwidth)
Table accounting: userid, sessiondate, used bandwidth
Table adhoc: userid, date, amount purchased
I want to have 1 query that returns a set of all users, their cap, their used bandwidth for this month and their adhoc purchases for this month:
< TABLE 1 ><TABLE2><TABLE3>
User   | Cap | Adhoc | Used
-------|-----|-------|-----
marius | 3   | 1     | 3.34
bob    | 1   | 2     | 1.15
(simplified)
Here is the query I am working on:
SELECT
`msi_adsl`.`id`,
`msi_adsl`.`username`,
`msi_adsl`.`realm`,
`msi_adsl`.`cap_size` AS cap,
SUM(`adsl_adhoc`.`value`) AS adhoc,
SUM(`radacct`.`AcctInputOctets` + `radacct`.`AcctOutputOctets`) AS used
FROM
`msi_adsl`
INNER JOIN
(`radacct`, `adsl_adhoc`)
ON
(CONCAT(`msi_adsl`.`username`,'#',`msi_adsl`.`realm`)
= `radacct`.`UserName` AND `msi_adsl`.`id`=`adsl_adhoc`.`id`)
WHERE
`canceled` = '0000-00-00'
AND
`radacct`.`AcctStartTime`
BETWEEN
'2010-11-01'
AND
'2010-11-31'
AND
`adsl_adhoc`.`time`
BETWEEN
'2010-11-01 00:00:00'
AND
'2010-11-31 00:00:00'
GROUP BY
`radacct`.`UserName`, `adsl_adhoc`.`id` LIMIT 10
The query works, but it returns wrong values for both adhoc and used; my guess would be a logical error in my joins, but I can't see it. Any help is very much appreciated.
Your query layout is too spread out for my taste. In particular, the BETWEEN/AND conditions should be on 1 line each, not 5 lines each. I've also removed the backticks, though you might need them for the 'time' column.
Since your table layouts don't match your sample query, it makes life very difficult. However, the table layouts all include a UserID (which is sensible), so I've written the query to do the relevant joins using the UserID. As I noted in a comment, if your design makes it necessary to use a CONCAT operation to join two tables, then you have a recipe for a performance disaster. Update your actual schema so that the tables can be joined by UserID, as your table layouts suggest should be possible. Obviously, you can use functions results in joins, but (unless your DBMS supports 'functional indexes' and you create appropriate indexes) the DBMS won't be able to use indexes on the table where the function is evaluated to speed the queries. For a one-off query, that may not matter; for production queries, it often does matter a lot.
There's a chance this will do the job you want. Since you are aggregating over two tables, you need the two sub-queries in the FROM clause.
SELECT u.UserID,
u.username,
u.realm,
u.cap_size AS cap,
h.AdHoc,
a.OctetsUsed
FROM msi_adsl AS u
JOIN (SELECT UserID, SUM(AcctInputOctets + AcctOutputOctets) AS OctetsUsed
FROM radacct
WHERE AcctStartTime >= '2010-11-01' AND AcctStartTime < '2010-12-01'
GROUP BY UserID
) AS a ON a.UserID = u.UserID
JOIN (SELECT UserID, SUM(Value) AS AdHoc
FROM adsl_adhoc
WHERE time >= '2010-11-01 00:00:00' AND time < '2010-12-01 00:00:00'
GROUP BY UserId
) AS h ON h.UserID = u.UserID
WHERE u.canceled = '0000-00-00'
LIMIT 10
Each sub-query computes the value of the aggregate for each user over the specified period, generating the UserID and the aggregate value as output columns; the main query then simply pulls the correct user data from the main user table and joins with the aggregate sub-queries.
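This can be demonstrated with a minimal runnable sketch using SQLite via Python (the rows and values are hypothetical; column names follow the answer's query). One user with two accounting sessions and one ad-hoc purchase shows why pre-aggregating avoids the inflated values from the original query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE msi_adsl (UserID INTEGER PRIMARY KEY, username TEXT,
                       realm TEXT, cap_size INTEGER, canceled TEXT);
CREATE TABLE radacct (UserID INTEGER, AcctStartTime TEXT,
                      AcctInputOctets INTEGER, AcctOutputOctets INTEGER);
CREATE TABLE adsl_adhoc (UserID INTEGER, time TEXT, Value INTEGER);
INSERT INTO msi_adsl VALUES (1, 'marius', 'x.net', 3, '0000-00-00');
INSERT INTO radacct VALUES (1, '2010-11-05', 100, 200), (1, '2010-11-10', 50, 50);
INSERT INTO adsl_adhoc VALUES (1, '2010-11-03 12:00:00', 1);
""")

# Each derived table collapses to one row per user BEFORE the join,
# so the two aggregates cannot multiply each other's row counts.
rows = con.execute("""
    SELECT u.UserID, u.username, u.cap_size, h.AdHoc, a.OctetsUsed
      FROM msi_adsl u
      JOIN (SELECT UserID, SUM(AcctInputOctets + AcctOutputOctets) AS OctetsUsed
              FROM radacct
             WHERE AcctStartTime >= '2010-11-01' AND AcctStartTime < '2010-12-01'
             GROUP BY UserID) a ON a.UserID = u.UserID
      JOIN (SELECT UserID, SUM(Value) AS AdHoc
              FROM adsl_adhoc
             WHERE time >= '2010-11-01' AND time < '2010-12-01'
             GROUP BY UserID) h ON h.UserID = u.UserID
     WHERE u.canceled = '0000-00-00'
""").fetchall()

print(rows)
# OctetsUsed = (100+200) + (50+50) = 400; AdHoc = 1, not doubled.
```

Had the three tables been joined first (as in the question), each radacct row would pair with each adsl_adhoc row, and both SUMs would be multiplied by the other table's row count.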
I think that the problem is here
FROM `msi_adsl`
INNER JOIN
(`radacct`, `adsl_adhoc`)
ON
(CONCAT(`msi_adsl`.`username`,'#',`msi_adsl`.`realm`)
= `radacct`.`UserName` AND `msi_adsl`.`id`=`adsl_adhoc`.`id`)
You are mixing joins with a Cartesian product, and this is not a good idea because it is much harder to debug. Try this:
FROM `msi_adsl`
INNER JOIN
`radacct`
ON
CONCAT(`msi_adsl`.`username`,'#',`msi_adsl`.`realm`) = `radacct`.`UserName`
JOIN `adsl_adhoc` ON `msi_adsl`.`id`=`adsl_adhoc`.`id`