Related to Join vs. sub-query but a different type of situation, and just trying to understand how this works.
I had to make a view where I get a bunch of employee codes from one table, and I have to get their names from a different table - the same two tables every time. I arranged my query like this:
SELECT
(SELECT name from emptable where empcod = code1) as emp1, code1,
(SELECT name from emptable where empcod = code2) as emp2, code2,
[repeat 6 times]
FROM codetable
It is more complicated than this, and more tables are joined, but this is the element I want to ask about. My boss says joining like so is better:
SELECT e1.name, c.code1, e2.name, c.code2, e3.name, code3 [etc]
FROM codetable c
INNER JOIN emptable e1 ON e1.empcod = c.code1
INNER JOIN emptable e2 ON e2.empcod = c.code2
INNER JOIN emptable e3 ON e3.empcod = c.code3
My reasoning, aside from not having to go search in the joins which table gets whose name and why, is the way I understand the join goes like this:
Take whole table A
Take whole table B
Combine all the data from both tables according to the 'ON' section of the join
select one single string from this complete combination of two whole tables from which I need no other data
I think it's obvious that this seems like it would take up a lot of resources. I understand the subquery as
Get one datum from table A (the employee code)
Match this one datum to every record from table B until you find a match
As soon as you get a match, bring back this one single datum from this other table (the employee's name)
Understanding that in the table of employees, the employee code is a primary key and cannot be duplicated, so every subquery can only ever give me one single string back.
It seems to me that comparing ONE number from one table to ONE number from another table and retrieving ONE string related to that number would be less resource-intensive than matching ALL of the data in two whole tables together in order to get this one string. But I figure I don't know what these databases are doing behind the scenes, and a lot of people seem to prefer joins. Can anyone explain this to me, if I'm understanding it wrong? The other posts I find here of similar situations tend to want more information from more tables, I'm not immediately finding anything about matching the same two tables six or seven times to retrieve one single string for every configuration.
Thanks in advance.
So as ScaisEdge explained, a join only gets executed once - and thus only spends time and resources once - no matter how many rows you have, whereas each of the six subselects get executed once for every row. If you have 100 rows, you're executing six joins once or you're executing 6 subselects 100 times.
It makes sense that this would be more resource-intensive, and I did not explain clearly enough that my case involves only one row at a time - in which case I guess the difference would be negligible anyway.
Edit2: Chose to separate the queries and collate/handle the information as a whole outside of the database's output. Taking these out in a .CSV format, and adding them into Excel where I'm going to be running the actual numbers.
Query 1 to pull out orders and desired info:
SELECT
shipstation_orders_v2.id AS SSO_id,
shipstation_orders_v2.order_number AS SSO_orderNumber,
shipstation_orders_v2.order_id AS SSO_orderID,
shipstation_orders_v2.storename AS SSO_storeName,
shipstation_orders_v2.order_date AS SSO_orderDate,
shipstation_orders_v2.order_total AS SSO_orderTotal,
shipstation_orders_v2.name AS SSO_name,
shipstation_orders_v2.company AS SSO_company
FROM shipstation_orders_v2
GROUP BY shipstation_orders_v2.id,
shipstation_orders_v2.order_number,
shipstation_orders_v2.order_id,
shipstation_orders_v2.storename,
shipstation_orders_v2.order_date,
shipstation_orders_v2.order_total,
shipstation_orders_v2.name,
shipstation_orders_v2.company
ORDER BY SSO_orderDate
Query 2 to pull out fulfillments and equivalent info:
SELECT DISTINCT
shipstation_orders_v2.id AS SSO_id,
shipstation_fulfillments.id AS SSF_id,
shipstation_fulfillments.order_number AS SSF_orderNumber,
shipstation_orders_v2.order_number AS SSO_orderNumber,
shipstation_orders_v2.order_id AS SSO_orderID,
shipstation_orders_v2.storename AS SSO_storeName,
shipstation_orders_v2.order_date AS SSO_orderDate,
shipstation_fulfillments.order_date AS SSF_orderDate,
shipstation_orders_v2.order_total AS SSO_orderTotal,
shipstation_fulfillments.amount_paid AS SSF_amountPaid,
shipstation_orders_v2.name AS SSO_name,
shipstation_orders_v2.company AS SSO_company,
shipstation_fulfillments.name AS SSF_name,
shipstation_fulfillments.company AS SSF_company
FROM shipstation_fulfillments
INNER JOIN shipstation_orders_v2
ON shipstation_fulfillments.order_number =
shipstation_orders_v2.order_number
WHERE shipstation_fulfillments.order_number =
shipstation_orders_v2.order_number
GROUP BY shipstation_orders_v2.id,
shipstation_fulfillments.id,
shipstation_fulfillments.order_number,
shipstation_orders_v2.order_number,
shipstation_orders_v2.order_id,
shipstation_orders_v2.storename,
shipstation_orders_v2.order_date,
shipstation_fulfillments.order_date,
shipstation_orders_v2.order_total,
shipstation_fulfillments.amount_paid,
shipstation_orders_v2.name,
shipstation_orders_v2.company,
shipstation_fulfillments.name,
shipstation_fulfillments.company
Edit: Question marked as answered. I figured out another way to do it that wasn't quite as harebrained. Props to DRapp for getting my brain moving.
Original Code is below Wall of Text
I'm a self-taught MySQL database user. I won't say administrator, since it's just me. I've put together a small database for work - about 60,000 rows and a maximum of 51 columns spread over three tables. I use this at work as a way to organize a fairly disparate sales data setup and make sense of it to identify trends, seasonality, all that good stuff. I work primarily with Shipstation data.
My problem is when I needed to introduce this third table. With two tables, obviously, it's just a simple JOIN. I got that working just fine. I'm having quite a bit of trouble setting up the JOINs correctly for this third table.
I'm attempting to JOIN the data from the two innermost queries to shipstation_orders_v2 and order_keys to the shipstation_fulfillments results I have in the third table.
For those of you who don't use Shipstation or aren't familiar with this element of it, fulfillments are in a different category than orders and don't use quite the same data. This is my dirty way of gluing them together so we have some decent, manipulable information on sales and shipping trends, etc.
I am making an internal query from shipstation_orders_v2 to order_keys as a way to SELECT DISTINCT the sum totals of split orders. I had problems with data duplication before I built up that subquery. With the (now) subquery and sub-subquery, the duping problem has been eliminated and with just those two tables it worked fine.
The issue is, when I'm making the SELECT from shipstation_fulfillments with a JOIN to the subquery and sub-subquery, I'm hitting a roadblock.
I've gotten several errors while working on this query. In order of occurrence and resolution:
Error 2013, lost connection to server during query (which told me I'm doing a full table read on three joined tables, since it isn't erroring out beforehand, but my rinkadink setup can't handle it). I got rid of that one.
Then, Error 1051 for an unidentified table name shipstation_fulfillments. To me I think it might be an issue for the query aliases. I am not sure.
Finally, good ole Error 1064, incorrect syntax on the first subquery after the
SELECT shipstation_fulfillments arguments.
Being self-taught, I'd virtually guarantee I'm merely missing an element of syntax somewhere that would appear fairly obvious to a well-practiced user of MySQL. Below is my current query setup.
If there needs to be any clarification, let me know.
SELECT
`shipstation_fulfillments`.`order_date` AS `orderDate`,
`shipstation_fulfillments`.`order_number` AS `orderNumber`,
(`shipstation_fulfillments`.`amount_paid` + `shipstation_fulfillments`.`tax_paid`) AS "Total Paid",
`shipstation_fulfillments`.`name` AS `name`,
`shipstation_fulfillments`.`company` AS `company`,
FROM
(
(SELECT
COUNT(`shipstation_orders_v2`.`order_key`) AS `orderCount`,
`shipstation_orders_v2`.`key_id` AS `key_id`,
`shipstation_orders_v2`.`order_number` AS `order_number`,
MAX(`shipstation_orders_v2`.`order_date`) AS `order_date`,
`shipstation_orders_v2`.`storename` AS `store`,
(`shipstation_orders_v2`.`order_total` - `shipstation_orders_v2`.`shippingPaid`) AS `orderPrice`,
`shipstation_orders_v2`.`shippingpaid` AS `shippingPaid`,
SUM(`shipstation_orders_v2`.`shippingpaid`) AS `SUM shippingPaid`,
`shipstation_orders_v2`.`order_total` AS `orderTotal`,
SUM(`shipstation_orders_v2`.`order_total`) AS `SUM Total Amount Paid`,
`shipstation_orders_v2`.`qtyshipped` AS `qtyShipped`,
SUM(`shipstation_orders_v2`.`qtyshipped`) AS `SUM qtyShipped`,
`shipstation_orders_v2`.`name` AS `name`,
`shipstation_orders_v2`.`company` AS `company`
FROM
(SELECT DISTINCT
`order_keys`.`key_id` AS `key_id`,
`order_keys`.`order_key` AS `order_key`,
`shipstation_orders_v2`.`order_number` AS `order_number`,
`shipstation_orders_v2`.`order_id` AS `order_id`,
`shipstation_orders_v2`.`order_date` AS `order_date`,
`shipstation_orders_v2`.`storename` AS `storename`,
`shipstation_orders_v2`.`order_total` AS `order_total`,
`shipstation_orders_v2`.`qtyshipped` AS `qtyshipped`,
`shipstation_orders_v2`.`shippingpaid` AS `shippingpaid`,
`shipstation_orders_v2`.`name` AS `name`,
`shipstation_orders_v2`.`company` AS `company`
FROM
(`shipstation_orders_v2`
JOIN `order_keys` ON ((`order_keys`.`order_key` = `shipstation_orders_v2`.`order_id`)))) `t`)
JOIN `shipstation_fulfillments`
ON (`shipstation_orders_v2`.`order_number` = `shipstation_fulfillments`.`order_number`)) `w`
As a couple notes... As for long table names, no problem, but you can use alias references to them such as I have done via example ...ShipStation_Fulfillments SSF... the "SSF" is now an alias for shorter typing yet still makes sense of origin.
When changing column names in query via "AS", you only need the as if your column name result will change from its original as you had in the beginning such as SSF.order_date AS orderDate where you remove the "_" from the final column name, but also in "Total Paid" (yet I HATE column names with embedded spaces, let the user interface handle labeling things, but thats just me).
When typing table.column (or alias.column), doing via CamelCasing helps readability vs camelcasing slightly harder to read where the brain naturally breaks into readable words for us.
Other issue based on query. Outer query portions can't recognize aliases from inner closed queryies, only the alias of the subselect as you had with the "t" and "w" aliases.
Next, when doing JOINs, my preference is to read them in the way the tables are within the query listing the first one on the left, and whatever is joined TO on the right.
If went from Table A Join to Table B, the ON clause would be ON A.KeyID = B.KeyID vs B.KeyID = A.KeyID especially if you are going several tables... A->B, B->C, C->D
Any query with aggregates (sum, avg, count, min, max, etc) must have a "GROUP BY" clause to identify when each record should break. In your example, I would assume break on the original sales order.
Although this query IS NOT WORKING, here is a cleaned-up version of your query showing implementations from above.
SELECT
SSF.order_date AS OrderDate,
SSF.order_number AS OrderNumber,
(SSF.amount_paid + SSF.tax_paid) AS `Total Paid`,
SSF.name,
SSF.company
FROM
( SELECT
SSOv2.key_id,
SSOv2.order_number,
SSOv2.storename AS store,
SSOv2.order_total - SSOv2.shippingPaid AS OrderPrice,
SSOv2.ShippingPaid,
SSOv2.order_total AS OrderTotal,
SSOv2.QtyShipped,
SSOv2.name,
SSOv2.company,
COUNT(SSOv2.order_key) AS orderCount,
MAX(SSOv2.order_date) AS order_date,
SUM(SSOv2.shippingpaid) AS `SUM shippingPaid`,
SUM(SSOv2.order_total) AS `SUM Total Amount Paid`,
SUM(SSOv2.qtyshipped) AS `SUM qtyShipped`
FROM
( SELECT DISTINCT
OK.key_id AS key_id,
OK.order_key AS order_key,
SSOv2.order_number AS order_number,
SSOv2.order_id AS order_id,
SSOv2.order_date AS order_date,
SSOv2.storename AS storename,
SSOv2.order_total AS order_total,
SSOv2.qtyshipped AS qtyshipped,
SSOv2.shippingpaid AS shippingpaid,
SSOv2.name AS name,
SSOv2.company AS company
FROM
shipstation_orders_v2 SSOv2
JOIN order_keys
ON SSOv2.order_id = OK.order_key
JOIN shipstation_fulfillments SSF
ON SSOv2.order_number = SSF.order_number ) t
) w
Next, without seeing actual data or listed structures critical to solve the query, I will ask you edit your existing post. Create a sample table listing table, columns and sample data so we can see the basis of what you are aggregating and trying to get out of the query. Especially show where there could be multiple rows per order and fulfillment respectively and a sample answer of what you EXPECT the results to show.
I've got a SQL statement:
SELECT AVG(`totalhours`)/5 AS `average`,* FROM `report_signout` JOIN `employee` ON `employee`.`username`=`report_signout`.`username`
However it's not working. Basically I need to calculate the average total hours an employee has been on the premises in any one week, the assumption is accepted that the office is only open 5 days a week. The total hours are coming from the table report_signout which I need to join on the username of the employee table so I can produce an outcome where I can then list the Firstname and Lastname of the employee along with the average hours on a web page. That last part is done in PHP which I already know how to do. I just need to see where I'm going wrong with the SQL statement.
If someone could point out to me please or give me a bit of help it would be much appreciated, thanks!
MySQL does not really know what totalhours is. As you are referencing more than one table, a table is not assumed, so you need to declare what to average by defining the reference as:
`table name`.`column name`
This is true for most if not all MySQL built in functions. Use absolute declarations (ie those with table and column names both defined, as above) as much as possible.
SELECT AVG(`report_signout`.`totalhours`)/5 AS `average`,
`report_signout`.*, `employee`.*
FROM `report_signout`
INNER JOIN `employee` ON `employee`.`username` = `report_signout`.`username`
As a small aside, try and avoid vague JOIN referencing instead using a complete JOIN reference, which states the type of JOIN rather than an assumption.
Also try to avoiding using * selection instead stating each column you wish to call.
I am trying to display a default record in a simple query but my attempt doesn't work:
SELECT
COALESCE(suppliers.supplier_name, 'No records') AS supplier_name
FROM suppliers
LEFT JOIN suppliers_purchases USING(supplier_id)
LEFT JOIN suppliers_purchases_articles USING(supplierpurchase_id)
WHERE suppliers_purchases_articles.article_id = 150
ORDER BY suppliers_purchases.supplierpurchase_id DESC
LIMIT 1
As the query returns no rows the coalesce never kicks in - there's no value to act on, let alone NULL.
While technically it is possible to solve your problem in SQL, it would become an awfully large, ugly, unmaintainable piece of SQL. This is because you are trying to solve an issue in SQL that it was never meant to do - a display problem. SQL is meant to control absolute and strict data sets, not default to informational messages based on the lack of a result set. No records is not the name of any supplier in your database, so don't list it as one.
Long story short: don't solve presentational issues in your data layer. Your front end code should handle the lack of results and fall back to properly displaying No records instead, where it's localizable, controllable, and expected by the developer after you.
While I agree this is a presentation logic issue, I have come across times where I had to control it from the database as I couldn't alter the UI.
If that is the case, you have a couple different options. One of them is to introduce a new temporary table and use another outer join:
SELECT
COALESCE(suppliers.supplier_name, 'No records') AS supplier_name
FROM (SELECT 1 as FakeCol) t
LEFT JOIN suppliers ON suppliers_purchases_articles.article_id = 150
LEFT JOIN suppliers_purchases USING(supplier_id)
LEFT JOIN suppliers_purchases_articles USING(supplierpurchase_id)
ORDER BY suppliers_purchases.supplierpurchase_id DESC
LIMIT 1
Condensed Fiddle Demo
Note I've moved the where criteria to the join. This isn't completely necessary, I just prefer the way it reads as such. If you have to leave where criteria, you don't want to negate your outer join, so you'll need to add corresponding is null checks as well.
SELECT dbo.postst.cost as Cost2014,
case when (dbo.InvNum.OrderDate) between '2014/01/01' and '2014/12/31' then dbo.Postst.Cost end as Cost2014
from dbo.postst INNER JOIN dbo.invnum on dbo.invnum.autoindex = dbo.postst.accountLink
Here is the technique I use for tracking down an issue like this:
SELECT dbo.postst.cost as Cost2014,
dbo.InvNum.OrderDate,
case when (dbo.InvNum.OrderDate) between '2014/01/01' and '2014/12/31' then dbo.Postst.Cost end as Cost2014
from dbo.postst
left JOIN dbo.invnum on dbo.invnum.autoindex = dbo.postst.accountLink
So I add the column I am doing some kind of processing on to see the values I am returning. Then I change the inner join to a left join (temporarily). Then when I run the select, I can usually see why they may not be meeting my expectations and why my query is not retuning the correct results.
In this case, you may not have any data in the right range or the join might be incorrect and thus no records are picked up at all.
What you have is a relatively simple query, If it more complex, I often use select * instead of the sepcific columns just to see if there is something in the other columns that is affecting the results. This is often the case when you have a one-many relationship and want to get only one record but are getting duplicates in the fields you selected.