Trying to create a single report with multiple outer joins - ms-access

I'm new to Access with close to no knowledge in coding besides a few basic syntax. All I know comes from what I could find through the Internet.
I'm trying to create a report on the performance of academics based on their publications, grants and number of students. I have no problem creating all 3 reports individually but I've been requested to combine all 3 into a single report. The report needs to include all academics regardless if they have any publications, grants or students, AND must generate it in such a way that it is grouped based on each individual academic. I'm not very good with explanations but it should look something like this:
Academic 1
Publications
Grants
Students
Academic 2
Publications
Grants
Students
The relationship is something like this (sorry if the explanation is bad):
[Academics] 1-M [Supervision] M-1 [Students]
[Academics] 1-M [Published] M-1 [Publications]
[Academics] 1-M [Funding] M-1 [Grants]
The tables on the "1" side of the relationship have a primary key that is linked together by the tables in between. Academics have more than one grant, publication and student, and it is a report which is supposed to span a couple of years.
I've tried grouping based on Academic and placing subreports in the same group. Sadly, there are multiples of the same record for each subreport (e.g. the same publication will repeat itself several times) regardless if the query records themselves are distinct or if I've grouped them based on publication, etc.

The following is one way to get counts for your report -- assuming you want counts or sums or some combination:
Create three queries (maybe like you already have) where each query joins 'Academics' to one of the relationship tables (Supervision, Published, Funding). Make these 'Totals' queries where you 'Group By' the unique field in 'Academics' and 'Count' some field in the other table. I named these Q1, Q2 and Q3; for example, Q1 is:
SELECT Academics.AName, Count(Supervision.AName) AS Students
FROM Academics
INNER JOIN Supervision ON Academics.AName = Supervision.AName
GROUP BY Academics.AName;
Create a fourth query to be used in the report that joins 'Academics' to Q1, Q2 and Q3 as follows:
SELECT Academics.AName, Sum(Q1.Students) AS Students, Sum(Q2.Publications) AS Pubs,
Sum(Q3.Funding) AS Funds
FROM ((Academics
LEFT JOIN Q1 ON Academics.AName = Q1.AName)
LEFT JOIN Q2 ON Academics.AName = Q2.AName)
LEFT JOIN Q3 ON Academics.AName = Q3.AName
GROUP BY Academics.AName;
This will return ALL Academics - even if no students, funds, or pubs. You can set a filter to only include an Academic if they have a related record.
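To see why the pre-aggregated queries matter, here is a minimal sketch using Python's built-in sqlite3 rather than Access. The sample names (Ada, Ben, Cy) and the Student, Publication and GrantName columns are made up for illustration, and COALESCE stands in for the NULL handling you would do in the report. It compares the counts from pre-aggregated subqueries with a naive join across two of the relationship tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Ada has 2 students and 3 publications,
# Ben has 1 student and 1 grant, Cy has nothing at all.
conn.executescript("""
CREATE TABLE Academics (AName TEXT PRIMARY KEY);
CREATE TABLE Supervision (AName TEXT, Student TEXT);
CREATE TABLE Published (AName TEXT, Publication TEXT);
CREATE TABLE Funding (AName TEXT, GrantName TEXT);
INSERT INTO Academics VALUES ('Ada'), ('Ben'), ('Cy');
INSERT INTO Supervision VALUES ('Ada', 's1'), ('Ada', 's2'), ('Ben', 's3');
INSERT INTO Published VALUES ('Ada', 'p1'), ('Ada', 'p2'), ('Ada', 'p3');
INSERT INTO Funding VALUES ('Ben', 'g1');
""")

# Each subquery plays the role of Q1, Q2 or Q3: one row per academic.
rows = conn.execute("""
SELECT a.AName,
       COALESCE(q1.Students, 0),
       COALESCE(q2.Pubs, 0),
       COALESCE(q3.Funds, 0)
FROM Academics AS a
LEFT JOIN (SELECT AName, COUNT(*) AS Students FROM Supervision GROUP BY AName) AS q1
       ON a.AName = q1.AName
LEFT JOIN (SELECT AName, COUNT(*) AS Pubs FROM Published GROUP BY AName) AS q2
       ON a.AName = q2.AName
LEFT JOIN (SELECT AName, COUNT(*) AS Funds FROM Funding GROUP BY AName) AS q3
       ON a.AName = q3.AName
ORDER BY a.AName
""").fetchall()

# Naive version: joining the detail tables directly repeats every student
# once per publication, so Ada's student count is inflated.
naive = conn.execute("""
SELECT a.AName, COUNT(s.Student)
FROM Academics AS a
LEFT JOIN Supervision AS s ON a.AName = s.AName
LEFT JOIN Published AS p ON a.AName = p.AName
GROUP BY a.AName
ORDER BY a.AName
""").fetchall()

print(rows)   # one row per academic, including Cy with zeros
print(naive)  # Ada's students are counted once per publication
```

The repeated records in the subreports described in the question are the same effect: two independent one-to-many joins in one query multiply each other.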


MySQL - When shouldn't I Join tables? Combinatorial Explosion of values

I am working on a database called classicmodels, which I found at: https://www.mysqltutorial.org/mysql-sample-database.aspx/
I realized that when I executed an Inner Join between 'payments' and 'orders' tables, a 'cartesian explosion' occurred. I understand that these two tables are not meant to be joined. However, I would like to know if it is possible to identify this just by looking at the relational schema or if I should check the tables one by one.
For instance, the customer number '141' appears 26 times in the 'orders table', which I found by using the following code:
SELECT
customerNumber,
COUNT(customerNumber)
FROM
orders
WHERE customerNumber=141
GROUP BY customerNumber;
And the same customer number (141) appears 13 times in the payments table:
SELECT
customerNumber,
COUNT(customerNumber)
FROM
payments
WHERE customerNumber=141
GROUP BY customerNumber;
Finally, I executed an Inner Join between 'payments' and 'orders' tables, and selected only the rows with customer number '141'. MySQL returned 338 rows, which is the result of 26*13. So, my query is multiplying the number of times this 'customer n°' appears in 'orders' table by the number of times it appears in 'payments'.
SELECT
o.customernumber,
py.amount
FROM
customers c
JOIN
orders o ON c.customerNumber=o.customerNumber
JOIN
payments py ON c.customerNumber=py.customerNumber
WHERE o.customernumber=141;
My question is the following:
1 ) Is there a way to look at the relational schema and identify if a Join can be executed (without generating a combinatorial explosion)? Or should I check table by table to understand how the relationship between them is?
Important Note: I realized that there are two asterisks in the payments table's representation in the relational schema below. Maybe this means that this table has a composite primary key (customerNumber+checkNumber). The problem is that 'checkNumber' does not appear in any other table.
This is the database's relational schema provided by the 'MySQL Tutorial' website:
Thank you for your attention!
This is called "combinatorial explosion" and it happens when rows in one table each join to multiple rows in other tables.
(It's not "overestimation" or any sort of estimation. It's counting data items multiple times when it should only count them once.)
It's a notorious pitfall of summarizing data in one-to-many relationships. In your example each customer may have no orders, one order, or more than one. Independently, they may have no payments, one, or many.
The trick is this: use subqueries so your top-level query avoids joining one-to-many relationships serially. The query you showed us does exactly that serial join (customers to orders, then customers to payments), which is why the rows multiply.
You can use this subquery to get a result set with just one row per customer. (Try it.)
SELECT customernumber,
SUM(amount) amount
FROM payments
GROUP BY customernumber
Likewise you can get the value of all orders for each customer with this
SELECT o.customernumber,
SUM(od.quantityOrdered * od.priceEach) amount
FROM orders o
JOIN orderdetails od ON o.orderNumber = od.orderNumber
GROUP BY o.customernumber
This JOIN won't explode in your face because customer can have multiple orders, and each order can have multiple details. So it's a strict hierarchical rollup.
Now, we can use these subqueries in the main query.
SELECT c.customernumber, p.payments, o.orders
FROM customers c
LEFT JOIN (
SELECT o.customernumber,
SUM(od.quantityOrdered * od.priceEach) orders
FROM orders o
JOIN orderdetails od ON o.orderNumber = od.orderNumber
GROUP BY o.customernumber
) o ON c.customernumber = o.customernumber
LEFT JOIN (
SELECT customernumber,
SUM(amount) payments
FROM payments
GROUP BY customernumber
) p ON c.customernumber = p.customernumber
Takehome tricks:
A subquery IS a table (a virtual table) that can be used wherever you might mention a table or a view.
The GROUP BY stuff in this query happens separately in two subqueries, so no combinatorial explosions.
All three participants in the toplevel JOIN have either one or zero rows per customernumber.
The LEFT JOINs are there so we can still see customers with (importantly for a business) no orders or no payments. With the ordinary inner JOIN, rows have to match both sides of the ON conditions or they're omitted from the resultset.
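If you want to try the idea without the full classicmodels database, here is a minimal sketch using Python's built-in sqlite3 instead of MySQL. The four tables are cut down to the columns the queries need, and the data (customers 1 and 2, whole-number amounts) is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customer 1 has 2 orders (worth 30 and 15) and
# 2 payments (30 and 10); customer 2 has neither.
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach INTEGER);
CREATE TABLE payments (customerNumber INTEGER, checkNumber TEXT, amount INTEGER);
INSERT INTO customers VALUES (1), (2);
INSERT INTO orders VALUES (10, 1), (11, 1);
INSERT INTO orderdetails VALUES (10, 2, 5), (10, 1, 20), (11, 1, 15);
INSERT INTO payments VALUES (1, 'A1', 30), (1, 'A2', 10);
""")

# Serial one-to-many joins: every payment is repeated once per order.
naive = conn.execute("""
SELECT COUNT(*), SUM(py.amount)
FROM customers c
JOIN orders o ON c.customerNumber = o.customerNumber
JOIN payments py ON c.customerNumber = py.customerNumber
WHERE c.customerNumber = 1
""").fetchone()

# Aggregate each branch first, then join: at most one row per customer
# arrives from each subquery, so nothing is multiplied.
safe = conn.execute("""
SELECT c.customerNumber, p.payment_total, o.order_total
FROM customers c
LEFT JOIN (
    SELECT o.customerNumber,
           SUM(od.quantityOrdered * od.priceEach) AS order_total
    FROM orders o
    JOIN orderdetails od ON o.orderNumber = od.orderNumber
    GROUP BY o.customerNumber
) o ON c.customerNumber = o.customerNumber
LEFT JOIN (
    SELECT customerNumber, SUM(amount) AS payment_total
    FROM payments
    GROUP BY customerNumber
) p ON c.customerNumber = p.customerNumber
ORDER BY c.customerNumber
""").fetchall()

print(naive)  # 4 joined rows, and the payments are summed twice over
print(safe)   # one row per customer; customer 2 shows NULLs
```

The naive query reports 80 in payments for a customer who paid 40, which is the double counting described above.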
Pro tip Format your SQL queries fanatically carefully: They are really verbose. Adm. Grace Hopper would be proud. That means they get quite long and nested, putting the Structured in Structured Query Language. If you, or anybody, is going to reason about them in future, we must be able to grasp the structure easily.
Pro tip 2 The data engineer who designed this database did a really good job thinking it through and documenting it. Aspire to this level of quality. (Rarely reached in the real world.)
In this particular case, your behavior should depend on the accounting style being supported by the database, and this does not appear to be "open item" style accounting, i.e. when an order is raised for 1000 there does not need to be a payment against it for 1000. This is perhaps unusual in most consumer experience because you will be quite familiar with open item style ordering from Amazon: you buy a 500 dollar TV and a 500 dollar games console, the order is a thousand dollars and you pay for it, the payment going against the order. However, you're also familiar with "balance forward" accounting if you paid for that order using your credit card, because you make similar purchases every day for a month and then you get a statement from your bank saying you owe 31000 and you pay a lump of money, which doesn't even have to be 31k. You aren't expected to make 31 payments of 1000 to your bank at the end of the month. Your bank allocates it to the oldest items on the account (if they're nice, or the newest items if they're not) and may eventually charge you interest on unpaid transactions.
1 ) Is there a way to look at the relational schema and identify if a Join can be executed
Yes, you can tell looking at the schema- customer has many orders, customer makes many payments, but there is no relation between the order and payment tables at all so we can see there is no attempt to directly attach a payment to an order. You can see that customer is a parent table of payment and order, and therefore enjoys a relationship with each of them but they do not relate to each other. If you had Person, Car and Address tables, a person has many addresses during their life, and many cars but it doesn't mean there is a relationship between cars and addresses
In such a case it simply doesn't make sense to join payments to customers to orders because they do not relate that way. If you want to make such a join and not suffer a Cartesian explosion then you absolutely have to sum one side or the other (or both) to ensure that your joins are 1:1 and 1:M (or 1:1 and 1:1). You cannot arrange a join that is a pair of 1:M.
Going back to the car/person/address example to make any meaningful joins, you have to build more information into the question and arrange the join to create the answer. Perhaps the question is "what cars did they own while they lived at" - this flattens the Person:Address relationship to 1:1 but leaves Person:Car as 1:M so they might have owned many cars during their time in that house. "What was the newest car they owned while living at..." might be 1:1 on both sides if there is a clear winner for "newest" (though if they bought two cars manufactured at identical times...)
Which side you sum in your orders case will depend on what you want to know, but in this case I'd say you usually want to know "which orders haven't been paid for" and that's summing all payments and rolling summing all orders then looking at what point the rolling sum exceeds the sum of payments.. those are the unpaid orders
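As a rough sketch of that rolling-sum idea in plain Python: the order dates and amounts below are invented, and the sketch assumes payments are allocated to the oldest orders first (balance forward style). The first order it flags may be partly covered rather than wholly unpaid.

```python
from itertools import accumulate

# Hypothetical orders for one customer, oldest first: (order date, order value).
orders = [("2004-01-05", 1000), ("2004-02-10", 500), ("2004-03-15", 700)]
# Hypothetical payments from the same customer, in any order.
payments = [1200, 100]

total_paid = sum(payments)

# Running total of order values, in the same order as the orders list.
running_totals = list(accumulate(value for _, value in orders))

# An order is not fully covered once the running total exceeds what was paid.
unpaid = [order for order, running in zip(orders, running_totals) if running > total_paid]

print(total_paid)      # sum of all payments
print(running_totals)  # cumulative order value after each order
print(unpaid)          # orders at or beyond the point where the money ran out
```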
Take a look again at your database graph (the one that was present in the first iteration of your question). See the lines between tables have 3 angled legs on one end - that's the many end. You can start at any table in the graph and join to other tables by walking along the relationship. If you're going from the many end to the one end, and assuming you've picked out a single row in the start table (a single order) you can always walk to any other table in the many->one direction and not increase your row count. If you walk the other way you potentially increase your row count. If you split and walk two ways that both increase row count you get a Cartesian explosion. Of course, also you don't have to only join on relation lines, but that's out of scope for the question
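Here is a small sketch of that "walking" rule, again with Python's sqlite3 and invented data (one customer, 3 orders, 2 payments). It only counts rows for each direction of travel:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customer 141 with 3 orders and 2 payments.
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER);
CREATE TABLE payments (customerNumber INTEGER, checkNumber TEXT, amount INTEGER);
INSERT INTO customers VALUES (141);
INSERT INTO orders VALUES (1, 141), (2, 141), (3, 141);
INSERT INTO payments VALUES (141, 'X1', 100), (141, 'X2', 200);
""")

def row_count(sql):
    """Run a SELECT COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]

# Many -> one: start from a single order and walk to its customer.
many_to_one = row_count("""
SELECT COUNT(*) FROM orders o
JOIN customers c ON o.customerNumber = c.customerNumber
WHERE o.orderNumber = 1
""")

# One -> many: start from the customer and walk to the orders.
one_to_many = row_count("""
SELECT COUNT(*) FROM customers c
JOIN orders o ON c.customerNumber = o.customerNumber
WHERE c.customerNumber = 141
""")

# Split and walk two one -> many branches at once: the counts multiply.
two_branches = row_count("""
SELECT COUNT(*) FROM customers c
JOIN orders o ON c.customerNumber = o.customerNumber
JOIN payments py ON c.customerNumber = py.customerNumber
WHERE c.customerNumber = 141
""")

print(many_to_one, one_to_many, two_branches)
```

With 3 orders and 2 payments the last query returns 3 * 2 rows, the same arithmetic as the 26 * 13 = 338 in the question.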
ps: this is easier to see on the db diagram than the ERD in the question because the database purely concerns itself with the columns that are foreign keyed. The ERD is saying a customer has zero or one payments with a particular check number but the database will only be concerned with "the customer ID appears once in the customer table and multiple times in the payment table" because only part of the compound primary key of payment is keyed to the customer table. In other words, the ERD is concerned with business logic relations too, but the db diagram is purely how tables relate and they aren't necessarily aligned. For this reason the db diagrams are probably easier to read when walking round for join strategies
After seeing the answers of Caius Jard and O.Jones (please, check their replies), which kindly helped me to clarify this doubt, I decided to create a table to identify which customers paid for all orders they made and which ones did not. This creates a pertinent reason to join 'orders', 'orderdetails', 'payments' and 'customers' tables, because some orders may have been cancelled or still may be 'On Hold', as we can see in their corresponding 'status' in the 'orders' table. Also, this enables us to execute this join without generating a 'combinatorial explosion'.
I did this by using the CASE statement, which registers when py.amount and amount_in_orders match, don't match or when they are NULL (customers which did not make orders or payments):
SELECT
c.customerNumber,
py.amount,
amount_in_orders,
CASE
WHEN py.amount=amount_in_orders THEN 'Match'
WHEN py.amount IS NULL AND amount_in_orders IS NULL THEN 'NULL'
ELSE 'Don''t Match'
END AS `Match`
FROM
customers c
LEFT JOIN(
SELECT
o.customerNumber, SUM(od.quantityOrdered*od.priceEach) AS amount_in_orders
FROM
orders o
JOIN orderdetails od ON o.orderNumber=od.orderNumber
GROUP BY o.customerNumber
) o ON c.customerNumber=o.customerNumber
LEFT JOIN(
SELECT customernumber, SUM(amount) AS amount
FROM payments
GROUP BY customerNumber
) py ON c.customerNumber=py.customerNumber
ORDER BY py.amount DESC;
The query returned 122 rows. The images below are fractions of the generated output, so you can visualize what happened:
For instance, we can see that the customers identified by the numbers '141', '124', '119' and '496' did not pay for all the orders they made. Maybe some of them were cancelled or maybe they simply have not paid for them yet.
And this image shows some of the columns (not all of them) that are NULL:

Access: Counting Number of Occurrences in 2 Columns [closed]

I'm working on a database for work, and I need to figure out a way for Access to count the number of projects that each employee is assigned. Projects have 1 or 2 employees assigned, and my boss needs to be able to quickly figure out how many projects each person is working on. Below is an example table:
Project   | Employee 1 | Employee 2
Project A | John Doe   | Jane Doe
Project B | Jane Doe   | Sam Smith
Project C | Jane Doe   | John Doe
Project D | Sam Smith  | Anna Smith
Project E | Anna Smith | John Doe
And here is the result I'm looking for:
Employee   | # of Projects
John Doe   | 3
Jane Doe   | 3
Sam Smith  | 2
Anna Smith | 2
The table you described is probably not the best way to store the data and I think it's only making your job more difficult. The value of a relational database is that you can have data living in different tables but related based on primary/ foreign keys which makes it significantly easier to pull reports like the one you described. It seems to me like this table might have previously lived in Excel, and I would spend some time now establishing relationships in Access which will save you time and headaches later. I would suggest creating 3 separate tables: employees, projects, and project employee assignments.
The Employees table should have 3 fields: EmployeeID, which should be set to AutoNumber in Design view and then selected as the primary key, and FirstName and LastName, both short text fields. This EmployeeID field will be referenced in the Project-Employee Assignments table.
The projects table should have 2 fields: ProjectID, also set to AutoNumber in Design view and selected as the primary key, and ProjectName which will also be a short text field. You can also add other fields, perhaps a text field for ProjectDescription would be helpful later on.
The Project-Employee Assignments table should have 2 fields: EmployeeID and ProjectID. If you aren't familiar with one-to-one, one-to-many, and many-to-many relationships I would suggest looking it up- you are describing a many-to-many relationship between the projects and employees, that is, one project can have many employees and one employee can be involved in many projects. This table exists to establish those relationships between employees and projects.
From here, go to the database tools tab and select Relationships. You'll need to establish a one-to-many relationship between the Employees table and the Assignments table on the EmployeeID field. You'll also need to establish a one-to-many relationship between the Projects table and the Project-Employee Assignments table on the ProjectID field.
Enter each relationship between projects and employees in the Assignments table. If you have a short list of projects and employees, you can do this directly in the table, but I'd suggest creating a form to do this with 2 combo boxes that each select from the lists of existing projects and employees, respectively. There are many tutorials about creating combo boxes that show informative columns, like employee name, but save the ID numbers to the table. Search "Bind Combo Box to Primary Key but display a Description field" for one example.
Finally, create a query to count projects per employee. You should include your Employees table, as well as your Project-Employee Assignments table. Select FirstName and LastName from the Employees table. Select both columns (EmployeeID and ProjectID) from the Project-Employee Assignments table. Unclick "show" for EmployeeID. Right-click anywhere in the query to get a menu of more options and click the sigma for totals. Set the total for EmployeeID, FirstName, and LastName to "Group By" and for ProjectID to "Count" then save the query. Run the query and enjoy having your totals!
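For reference, the query those Design View steps produce is roughly the SQL in the sketch below. The sketch uses Python's built-in sqlite3 instead of Access, shortens the assignments table name to Assignments (my naming, not something Access requires), and loads the sample data from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized layout: employees, projects, and one row per assignment.
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Projects (ProjectID INTEGER PRIMARY KEY, ProjectName TEXT);
CREATE TABLE Assignments (EmployeeID INTEGER, ProjectID INTEGER);
INSERT INTO Employees VALUES (1, 'John', 'Doe'), (2, 'Jane', 'Doe'),
                             (3, 'Sam', 'Smith'), (4, 'Anna', 'Smith');
INSERT INTO Projects VALUES (1, 'Project A'), (2, 'Project B'), (3, 'Project C'),
                            (4, 'Project D'), (5, 'Project E');
INSERT INTO Assignments VALUES (1, 1), (2, 1), (2, 2), (3, 2), (2, 3),
                               (1, 3), (3, 4), (4, 4), (4, 5), (1, 5);
""")

# Group By on the employee fields, Count on ProjectID.
counts = conn.execute("""
SELECT e.FirstName, e.LastName, COUNT(a.ProjectID) AS ProjectCount
FROM Employees AS e
INNER JOIN Assignments AS a ON e.EmployeeID = a.EmployeeID
GROUP BY e.EmployeeID, e.FirstName, e.LastName
ORDER BY e.EmployeeID
""").fetchall()

print(counts)
```

Because of the INNER JOIN, an employee with no assignments would not appear at all; a LEFT JOIN from Employees would list them with a count of 0.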
Elizabeth Ham's answer is very thorough and I recommend following her advice, but knowing that sometimes we don't have time to do a complete overhaul, here's some instructions on how to get results from the given table structure. As Elizabeth and I pointed out (in my comment), a single query could have gotten the requested data if the tables were complete and properly normalized.
Because there are multiple employee columns for which you want statistics, you need to join the given table at least twice, each time grouping on a different column and using a different alias. It is possible to do this using the visual Design View, however it is usually easier to post questions and answers on StackOverflow using SQL text, so that's what follows. Just paste the following code into the SQL view of a query, then you should be able to switch between SQL view and Design View.
Save the following SQL statements as two separate, named queries: [ProjectCount1] and [ProjectCount2]. Saving them allows you to refer to these queries multiple times in other queries (without embedding redundant subqueries):
SELECT P.[Employee 1] As Employee, Count(P.[Project]) As ProjectCount
FROM Project As P
GROUP BY P.[Employee 1];
SELECT P.[Employee 2] As Employee, Count(P.[Project]) As ProjectCount
FROM Project As P
GROUP BY P.[Employee 2];
Now create a UNION query for the purpose of creating a unique list of employees from the two source columns. The UNION will automatically keep only distinct values (i.e. remove duplicates). (By the way, UNION ALL would return all rows from both tables including duplicates.) Save this query as [Employees]:
SELECT Employee FROM [ProjectCount1]
UNION
SELECT Employee FROM [ProjectCount2]
Finally, combine them all into a list of unique employees with a total sum of projects for each:
SELECT
E.Employee As Employee, nz(PC1.ProjectCount, 0) + nz(PC2.ProjectCount, 0) As ProjectCount
FROM
([Employees] AS E LEFT JOIN [ProjectCount1] As PC1
ON E.[Employee] = PC1.[Employee])
LEFT JOIN [ProjectCount2] As PC2
ON E.[Employee] = PC2.Employee
ORDER BY E.[Employee]
Note 1: The function nz() converts null values to the given non-null value, in this case 0 (zero). This ensures that you'll get a valid sum even when an employee appears in only one column (and as such has a null value in the other column).
Note 2: This solution will double count an employee if it's listed as both [Employee 1] and [Employee 2] in the original table. I assume that there are proper constraints to exclude that case, but if needed, one could do a self join on the second query [ProjectCount2] to exclude such double entries.
Note 3: If you do decide to follow Elizabeth's advice and you already have a lot of data in the existing structure, the above queries can also be useful in generating data for the new, normalized table structure. For instance, you could insert the unique list of employees from the above UNION query directly into a newly normalized [Employee] table.
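If you'd like to check the logic outside Access, the sketch below replays the same four queries with Python's built-in sqlite3 on the sample table from the question. The saved queries become views, and COALESCE replaces nz(), which is specific to Access:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The un-normalized table from the question, with its two employee columns.
conn.executescript("""
CREATE TABLE Project (Project TEXT, [Employee 1] TEXT, [Employee 2] TEXT);
INSERT INTO Project VALUES
    ('Project A', 'John Doe', 'Jane Doe'),
    ('Project B', 'Jane Doe', 'Sam Smith'),
    ('Project C', 'Jane Doe', 'John Doe'),
    ('Project D', 'Sam Smith', 'Anna Smith'),
    ('Project E', 'Anna Smith', 'John Doe');

-- Views stand in for the saved Access queries.
CREATE VIEW ProjectCount1 AS
    SELECT [Employee 1] AS Employee, COUNT(Project) AS ProjectCount
    FROM Project GROUP BY [Employee 1];
CREATE VIEW ProjectCount2 AS
    SELECT [Employee 2] AS Employee, COUNT(Project) AS ProjectCount
    FROM Project GROUP BY [Employee 2];
CREATE VIEW Employees AS
    SELECT Employee FROM ProjectCount1
    UNION
    SELECT Employee FROM ProjectCount2;
""")

# COALESCE turns the NULL from a missing LEFT JOIN match into 0.
totals = conn.execute("""
SELECT E.Employee,
       COALESCE(PC1.ProjectCount, 0) + COALESCE(PC2.ProjectCount, 0) AS ProjectCount
FROM Employees AS E
LEFT JOIN ProjectCount1 AS PC1 ON E.Employee = PC1.Employee
LEFT JOIN ProjectCount2 AS PC2 ON E.Employee = PC2.Employee
ORDER BY E.Employee
""").fetchall()

print(totals)
```

The totals match the result the question asks for, just sorted by employee name.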

database - MS Access

I have three tables: Inventory, Book and DVD. Now I have to make Book and DVD subdatasheets of the Inventory table. Is it possible to do this via a query in MS Access?
Assuming the inventID columns in the Book/DVD tables equal the inventoryID in the inventory table then yes'ish.
If you want actual subdatasheets you wouldn't need a query, just open the inventory table and insert the subdatasheet. The problem is you can only have one subdatasheet so you won't be able to see the Book and DVD subdatasheets together.
You can run a query but viewing both DVD and Book together will result in a weird looking query where columns for Book and DVD will show on the same line. This is how you would write that query:
SELECT Inventory.*, DVD.*, Book.*
FROM (Inventory LEFT JOIN DVD ON Inventory.inventoryID = DVD.inventid)
LEFT JOIN Book ON Inventory.inventoryID = Book.inventID;
It would output like this:
Sample query (I didn't create all the fields)
In the example, Test 1, 2 and 3 are books and 4 and 5 are DVD's
You might be better served by making separate inventory tables for both types of items and getting rid of the Inventory table altogether. When you want to see everything together, you can run a Union query that brings both tables together and shows only the common fields (everything that's currently in the Inventory table).
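A Union query along those lines might look like the sketch below. It runs on Python's built-in sqlite3 rather than Access, and the columns (title, author, runtime) are guesses on my part since the real fields weren't posted. The extra ItemType column is only there so you can tell the rows apart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical separate tables; only inventID and title are common to both.
conn.executescript("""
CREATE TABLE Book (inventID INTEGER PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE DVD (inventID INTEGER PRIMARY KEY, title TEXT, runtime INTEGER);
INSERT INTO Book VALUES (1, 'Test 1', 'Author A'), (2, 'Test 2', 'Author B'),
                        (3, 'Test 3', 'Author C');
INSERT INTO DVD VALUES (4, 'Test 4', 90), (5, 'Test 5', 120);
""")

# UNION ALL stacks the rows; each SELECT lists the same common fields.
combined = conn.execute("""
SELECT inventID, title, 'Book' AS ItemType FROM Book
UNION ALL
SELECT inventID, title, 'DVD' AS ItemType FROM DVD
ORDER BY inventID
""").fetchall()

print(combined)
```

Each item now appears on its own row instead of Book and DVD columns sharing one line.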

MS Access: Report to contain info from third table

I'm currently working on a database for a company, for them to use when making production orders. A report is to be made consisting of several things, mainly product number, order number etc etc. A part of the report is to be made up of a list of spare parts needed for the production of the item in question. I have a table with an order number and product number, which needs to look in another table to find the necessary spare parts. However, the name, location and stock of those spare parts are in a third table, and I can't seem to find a way to include these things automatically when the product number is known. I'm pretty new to MS Access, so any help will be greatly appreciated. Thanks!
I have a table called Table1, which uses a combobox to automatically fill boxes such as production time, test time etc from a given product number. This data is gathered from the second table StandardTimes, which has as a primary key the product number. Other columns in this table includes production area, standard quantity, average production time, and also includes in several columns, the necessary spare parts needed. In a third table called Inventory, we have the product numbers of the spare parts, their location in storage, and number of items currently in store. I created a report using a query which takes an order number, and creates a report on that order number from Table1. What needs to be included in this report is a list of the spareparts necessary, the location in the storage, and the number of items currently in store.
Revised from new user input
Your question still does not provide actual columns or data. As a result, it's hard to model your needs.
However, based on what I can read, I think you are missing some basic design setup items in a relational model.
Assuming that you have 3 tables: Table1 (Orders), StandardTimes (Products) and Inventory (SpareParts)
In English, every order has one or more products. Every product has one or more spare parts. Really you'd want an orders table, and an order details table which has records for each item as part of that order. But I'm answering it on your setup which I believe is flawed.
Orders <-(1:M)-> Products <- (1:M) -> SpareParts
You have an OrderID, a ProductID, and a SparePartID.
A query such as this would join those 3 tables together with that kind of relationship.
SELECT o.OrderNum, o.ProductNum, st.ProductionArea, st.StandardQuantity, i.SparePartsNum, i.Location, i.Qty
FROM (Orders AS o
INNER JOIN StandardTimes AS st ON o.ProductNum = st.ProductNum)
INNER JOIN Inventory AS i ON i.ProductNum = st.ProductNum;
Some sample data would be helpful to help design the queries.
In principle you would need to join the tables together to get the desired result.
You would join the productID on tblOrders to the ProductID on tblProducts. This will net you the name of the product etc.
This would be an INNER join, as every order has a product.
You would then join to tblSpareParts, also using the productID so that you could return the status of the spare parts for that product. This might be a LEFT JOIN instead of an INNER, but it depends on if you maintain a value of 0 for spare parts (e.g. Every product has a corresponding spare parts record), or if you only maintain a spare parts record for items which have spare parts.
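To make the INNER versus LEFT JOIN difference concrete, here is a small sketch with Python's built-in sqlite3. The table layouts and the sample rows (a Widget with two spare parts, a Gadget with none) are invented, since the real columns weren't posted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the Widget has spare parts records, the Gadget has none.
conn.executescript("""
CREATE TABLE tblProducts (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE tblOrders (OrderNum INTEGER PRIMARY KEY, ProductID INTEGER);
CREATE TABLE tblSpareParts (ProductID INTEGER, PartName TEXT, Location TEXT, Qty INTEGER);
INSERT INTO tblProducts VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO tblOrders VALUES (100, 1), (101, 2);
INSERT INTO tblSpareParts VALUES (1, 'Bolt', 'A1', 40), (1, 'Gasket', 'B2', 0);
""")

QUERY = """
SELECT o.OrderNum, p.ProductName, sp.PartName
FROM tblOrders AS o
INNER JOIN tblProducts AS p ON o.ProductID = p.ProductID
{join_type} JOIN tblSpareParts AS sp ON sp.ProductID = p.ProductID
ORDER BY o.OrderNum, sp.PartName
"""

# INNER JOIN drops order 101 because its product has no spare parts rows.
inner_rows = conn.execute(QUERY.format(join_type="INNER")).fetchall()
# LEFT JOIN keeps order 101 and fills the spare parts columns with NULL.
left_rows = conn.execute(QUERY.format(join_type="LEFT")).fetchall()

print(inner_rows)
print(left_rows)
```

If the report must list every order, including those whose product has no spare parts records, the LEFT JOIN version is the one to use.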

What is the best way to count rows in a mySQL complex table

I have a table with the following fields (for example);
id, reference, customerId.
Now, I often want to log an enquiry for a customer.. BUT, in some cases, I need to filter the enquiry based on the customers country... which is in the customer table..
id, Name, Country..for example
At the moment, my application shows 15 enquiries per page and I am SELECTing all enquiries, and for each one, checking the country field in customerTable based on the customerId to filter the country. I would also count the number of enquiries this way to find out the total number of enquiries and be able to display the page (Page 1 of 4).
As the database is growing, I am starting to notice a bit of lag, and I think my methodology is a bit flawed!
My first guess at how this should be done, is I can add the country to the enquiryTable. Problem solved, but does anyone else have a suggestion as to how this might be done? Because I don't like the idea of having to update each enquiry every time the country of a contact is changed.
Thanks in advance!
It looks to me like this data should be spread over 3 tables
customers
enquiries
countries
Then by using joins you can bring out the customer and country data and filter by either. Something like.....
SELECT
enquiries.enquiryid,
enquiries.enquiredetails,
customers.customerid,
customers.reference,
customers.countryid,
countries.name AS countryname
FROM
enquiries
INNER JOIN customers ON enquiries.customerid = customers.customerid
INNER JOIN countries ON customers.countryid = countries.countryid
WHERE countries.name='United Kingdom'
You should definitely be only touching the database once to do this.
Depending on how you are accessing your data, you may be able to get a row count without issuing a second COUNT(*) query. You haven't mentioned what programming language or data access strategy you use, so it is difficult to be more helpful with the count. If you have no easy way of determining the row count from within the data access layer of your code, then you could use a stored procedure with an output parameter to give you the row count without making two round trips to the database. It all depends on your architecture, data access strategy and how close you are to your database.
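As a starting point, here is a sketch of the join-based filter combined with paging, written with Python's built-in sqlite3 because the question doesn't say which language or driver is in use. It takes the simplest route of a COUNT(*) query plus a LIMIT/OFFSET query rather than the stored procedure idea, and the table contents (40 UK enquiries, 5 French ones) are invented:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (countryid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customers (customerid INTEGER PRIMARY KEY, reference TEXT, countryid INTEGER);
CREATE TABLE enquiries (enquiryid INTEGER PRIMARY KEY, customerid INTEGER);
INSERT INTO countries VALUES (1, 'United Kingdom'), (2, 'France');
INSERT INTO customers VALUES (1, 'ACME', 1), (2, 'Dupont', 2);
""")
# Hypothetical data: enquiries 1-40 belong to the UK customer, 41-45 to the French one.
conn.executemany(
    "INSERT INTO enquiries VALUES (?, ?)",
    [(i, 1 if i <= 40 else 2) for i in range(1, 46)],
)

# Shared FROM/WHERE clause so the count and the page use the same filter.
FILTERED = """
FROM enquiries
INNER JOIN customers ON enquiries.customerid = customers.customerid
INNER JOIN countries ON customers.countryid = countries.countryid
WHERE countries.name = ?
"""

def fetch_page(country, page, page_size=15):
    """Return (rows for the requested page, total number of pages)."""
    total = conn.execute("SELECT COUNT(*) " + FILTERED, (country,)).fetchone()[0]
    rows = conn.execute(
        "SELECT enquiries.enquiryid " + FILTERED
        + " ORDER BY enquiries.enquiryid LIMIT ? OFFSET ?",
        (country, page_size, (page - 1) * page_size),
    ).fetchall()
    return rows, math.ceil(total / page_size)

rows, total_pages = fetch_page("United Kingdom", page=3)
print(len(rows), total_pages)
```

The database does the country filtering and only the 15 rows for the current page come back, instead of every enquiry being checked one by one in application code.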