Edit2: Chose to separate the queries and collate/handle the information as a whole outside of the database's output. Taking these out in a .CSV format, and adding them into Excel where I'm going to be running the actual numbers.
Query 1 to pull out orders and desired info:
SELECT
shipstation_orders_v2.id AS SSO_id,
shipstation_orders_v2.order_number AS SSO_orderNumber,
shipstation_orders_v2.order_id AS SSO_orderID,
shipstation_orders_v2.storename AS SSO_storeName,
shipstation_orders_v2.order_date AS SSO_orderDate,
shipstation_orders_v2.order_total AS SSO_orderTotal,
shipstation_orders_v2.name AS SSO_name,
shipstation_orders_v2.company AS SSO_company
FROM shipstation_orders_v2
GROUP BY shipstation_orders_v2.id,
shipstation_orders_v2.order_number,
shipstation_orders_v2.order_id,
shipstation_orders_v2.storename,
shipstation_orders_v2.order_date,
shipstation_orders_v2.order_total,
shipstation_orders_v2.name,
shipstation_orders_v2.company
ORDER BY SSO_orderDate
Query 2 to pull out fulfillments and equivalent info:
SELECT DISTINCT
shipstation_orders_v2.id AS SSO_id,
shipstation_fulfillments.id AS SSF_id,
shipstation_fulfillments.order_number AS SSF_orderNumber,
shipstation_orders_v2.order_number AS SSO_orderNumber,
shipstation_orders_v2.order_id AS SSO_orderID,
shipstation_orders_v2.storename AS SSO_storeName,
shipstation_orders_v2.order_date AS SSO_orderDate,
shipstation_fulfillments.order_date AS SSF_orderDate,
shipstation_orders_v2.order_total AS SSO_orderTotal,
shipstation_fulfillments.amount_paid AS SSF_amountPaid,
shipstation_orders_v2.name AS SSO_name,
shipstation_orders_v2.company AS SSO_company,
shipstation_fulfillments.name AS SSF_name,
shipstation_fulfillments.company AS SSF_company
FROM shipstation_fulfillments
INNER JOIN shipstation_orders_v2
ON shipstation_fulfillments.order_number =
shipstation_orders_v2.order_number
WHERE shipstation_fulfillments.order_number =
shipstation_orders_v2.order_number
GROUP BY shipstation_orders_v2.id,
shipstation_fulfillments.id,
shipstation_fulfillments.order_number,
shipstation_orders_v2.order_number,
shipstation_orders_v2.order_id,
shipstation_orders_v2.storename,
shipstation_orders_v2.order_date,
shipstation_fulfillments.order_date,
shipstation_orders_v2.order_total,
shipstation_fulfillments.amount_paid,
shipstation_orders_v2.name,
shipstation_orders_v2.company,
shipstation_fulfillments.name,
shipstation_fulfillments.company
Edit: Question marked as answered. I figured out another way to do it that wasn't quite as harebrained. Props to DRapp for getting my brain moving.
Original Code is below Wall of Text
I'm a self-taught MySQL database user. I won't say administrator, since it's just me. I've put together a small database for work - about 60,000 rows and a maximum of 51 columns spread over three tables. I use this at work as a way to organize a fairly disparate sales data setup and make sense of it to identify trends, seasonality, all that good stuff. I work primarily with Shipstation data.
My problem is when I needed to introduce this third table. With two tables, obviously, it's just a simple JOIN. I got that working just fine. I'm having quite a bit of trouble setting up the JOINs correctly for this third table.
I'm attempting to JOIN the data from the two innermost queries to shipstation_orders_v2 and order_keys to the shipstation_fulfillments results I have in the third table.
For those of you who don't use Shipstation or aren't familiar with this element of it, fulfillments are in a different category than orders and don't use quite the same data. This is my dirty way of gluing them together so we have some decent, manipulable information on sales and shipping trends, etc.
I am making an internal query from shipstation_orders_v2 to order_keys as a way to SELECT DISTINCT the sum totals of split orders. I had problems with data duplication before I built up that subquery. With the (now) subquery and sub-subquery, the duping problem has been eliminated and with just those two tables it worked fine.
The issue is, when I'm making the SELECT from shipstation_fulfillments with a JOIN to the subquery and sub-subquery, I'm hitting a roadblock.
I've gotten several errors while working on this query. In order of occurrence and resolution:
Error 2013, lost connection to server during query (which told me I'm doing a full table read on three joined tables, since it isn't erroring out beforehand, but my rinkadink setup can't handle it). I got rid of that one.
Then, Error 1051 for an unidentified table name shipstation_fulfillments. To me I think it might be an issue for the query aliases. I am not sure.
Finally, good ole Error 1064, incorrect syntax on the first subquery after the
SELECT shipstation_fulfillments arguments.
Being self-taught, I'd virtually guarantee I'm merely missing an element of syntax somewhere that would appear fairly obvious to a well-practiced user of MySQL. Below is my current query setup.
If there needs to be any clarification, let me know.
SELECT
`shipstation_fulfillments`.`order_date` AS `orderDate`,
`shipstation_fulfillments`.`order_number` AS `orderNumber`,
(`shipstation_fulfillments`.`amount_paid` + `shipstation_fulfillments`.`tax_paid`) AS "Total Paid",
`shipstation_fulfillments`.`name` AS `name`,
`shipstation_fulfillments`.`company` AS `company`,
FROM
(
(SELECT
COUNT(`shipstation_orders_v2`.`order_key`) AS `orderCount`,
`shipstation_orders_v2`.`key_id` AS `key_id`,
`shipstation_orders_v2`.`order_number` AS `order_number`,
MAX(`shipstation_orders_v2`.`order_date`) AS `order_date`,
`shipstation_orders_v2`.`storename` AS `store`,
(`shipstation_orders_v2`.`order_total` - `shipstation_orders_v2`.`shippingPaid`) AS `orderPrice`,
`shipstation_orders_v2`.`shippingpaid` AS `shippingPaid`,
SUM(`shipstation_orders_v2`.`shippingpaid`) AS `SUM shippingPaid`,
`shipstation_orders_v2`.`order_total` AS `orderTotal`,
SUM(`shipstation_orders_v2`.`order_total`) AS `SUM Total Amount Paid`,
`shipstation_orders_v2`.`qtyshipped` AS `qtyShipped`,
SUM(`shipstation_orders_v2`.`qtyshipped`) AS `SUM qtyShipped`,
`shipstation_orders_v2`.`name` AS `name`,
`shipstation_orders_v2`.`company` AS `company`
FROM
(SELECT DISTINCT
`order_keys`.`key_id` AS `key_id`,
`order_keys`.`order_key` AS `order_key`,
`shipstation_orders_v2`.`order_number` AS `order_number`,
`shipstation_orders_v2`.`order_id` AS `order_id`,
`shipstation_orders_v2`.`order_date` AS `order_date`,
`shipstation_orders_v2`.`storename` AS `storename`,
`shipstation_orders_v2`.`order_total` AS `order_total`,
`shipstation_orders_v2`.`qtyshipped` AS `qtyshipped`,
`shipstation_orders_v2`.`shippingpaid` AS `shippingpaid`,
`shipstation_orders_v2`.`name` AS `name`,
`shipstation_orders_v2`.`company` AS `company`
FROM
(`shipstation_orders_v2`
JOIN `order_keys` ON ((`order_keys`.`order_key` = `shipstation_orders_v2`.`order_id`)))) `t`)
JOIN `shipstation_fulfillments`
ON (`shipstation_orders_v2`.`order_number` = `shipstation_fulfillments`.`order_number`)) `w`
As a couple notes... As for long table names, no problem, but you can use alias references to them such as I have done via example ...ShipStation_Fulfillments SSF... the "SSF" is now an alias for shorter typing yet still makes sense of origin.
When changing column names in query via "AS", you only need the as if your column name result will change from its original as you had in the beginning such as SSF.order_date AS orderDate where you remove the "_" from the final column name, but also in "Total Paid" (yet I HATE column names with embedded spaces, let the user interface handle labeling things, but thats just me).
When typing table.column (or alias.column), doing via CamelCasing helps readability vs camelcasing slightly harder to read where the brain naturally breaks into readable words for us.
Other issue based on query. Outer query portions can't recognize aliases from inner closed queryies, only the alias of the subselect as you had with the "t" and "w" aliases.
Next, when doing JOINs, my preference is to read them in the way the tables are within the query listing the first one on the left, and whatever is joined TO on the right.
If went from Table A Join to Table B, the ON clause would be ON A.KeyID = B.KeyID vs B.KeyID = A.KeyID especially if you are going several tables... A->B, B->C, C->D
Any query with aggregates (sum, avg, count, min, max, etc) must have a "GROUP BY" clause to identify when each record should break. In your example, I would assume break on the original sales order.
Although this query IS NOT WORKING, here is a cleaned-up version of your query showing implementations from above.
SELECT
SSF.order_date AS OrderDate,
SSF.order_number AS OrderNumber,
(SSF.amount_paid + SSF.tax_paid) AS `Total Paid`,
SSF.name,
SSF.company
FROM
( SELECT
SSOv2.key_id,
SSOv2.order_number,
SSOv2.storename AS store,
SSOv2.order_total - SSOv2.shippingPaid AS OrderPrice,
SSOv2.ShippingPaid,
SSOv2.order_total AS OrderTotal,
SSOv2.QtyShipped,
SSOv2.name,
SSOv2.company,
COUNT(SSOv2.order_key) AS orderCount,
MAX(SSOv2.order_date) AS order_date,
SUM(SSOv2.shippingpaid) AS `SUM shippingPaid`,
SUM(SSOv2.order_total) AS `SUM Total Amount Paid`,
SUM(SSOv2.qtyshipped) AS `SUM qtyShipped`
FROM
( SELECT DISTINCT
OK.key_id AS key_id,
OK.order_key AS order_key,
SSOv2.order_number AS order_number,
SSOv2.order_id AS order_id,
SSOv2.order_date AS order_date,
SSOv2.storename AS storename,
SSOv2.order_total AS order_total,
SSOv2.qtyshipped AS qtyshipped,
SSOv2.shippingpaid AS shippingpaid,
SSOv2.name AS name,
SSOv2.company AS company
FROM
shipstation_orders_v2 SSOv2
JOIN order_keys
ON SSOv2.order_id = OK.order_key
JOIN shipstation_fulfillments SSF
ON SSOv2.order_number = SSF.order_number ) t
) w
Next, without seeing actual data or listed structures critical to solve the query, I will ask you edit your existing post. Create a sample table listing table, columns and sample data so we can see the basis of what you are aggregating and trying to get out of the query. Especially show where there could be multiple rows per order and fulfillment respectively and a sample answer of what you EXPECT the results to show.
I am sure this question has already been answered, but I can't find it or the answer was too complicated. I am new to SQL and am not sure how to word this generically.
I have a mySQL database of software installed on devices. My query to pull all the data has more fields and more joins, but for brevity I just included a few. I need to add another dimension to create a report that lists every case where a device has more than one installation of software from the same product family.
sample
Right now I have code kind of like this and it is not doing what I need. I have seen some info on exists but the examples didn't account for multiple joins so the syntax escapes me. Help?
select
devices.name,
sw_inventory.product,
products.family_name,
sw_inventory.ignore_usage,
from sw_inventory
inner join products
on sw_inventory.product=products.product_name
inner join devices
on sw_inventory.device_name=devices.name
where sw_inventory.ignore=0
group by devices.name, products.family_name
There are plenty of answers out there on this topic but I definitely understand not always knowing terminology. you are looking for how to find duplicates values.
Basically this is a two step process. 1 find the duplicates 2 relate that back to the original records if you want those. Note the second part is optional.
So to literally find all of the duplicates of the query you provided
ADD HAVING COUNT(*) > 1 after group by statements. If you want to know how many duplicates add a calculated column to count them.
select
devices.name,
sw_inventory.product,
products.family_name,
sw_inventory.ignore_usage,
NumberOfDuplicates = COUNT(*)
from sw_inventory
inner join products
on sw_inventory.product=products.product_name
inner join devices
on sw_inventory.device_name=devices.name
where sw_inventory.ignore=0
group by devices.name, products.family_name
HAVING COUNT(*) > 1
The most programming I do is shell scripts or some python or perl, so I have a basic understanding after Googling for a bit. A little background on what i’m doing. We have two inventory systems: Our’s and Their’s. I need to pull information from Their’s to compare with Our’s.
The problem: We need UPC’s to do inventory while They provide SKU’s in their monthly inventory report.
The solution: Join 3 tables. I’m using phpmyadmin to manage a mysql backend on a recent install of Debain Wheezy.
The first query takes every SKU They have in Their inventory, compares it with the UPC in Their listings and, since not every ‘UPC’ is actually a UPC, compares it to the UPC in Our listings. It looks like this:
SELECT TheirListings.upc, SUM(TheirInventory.quantity)
FROM TheirInventory
JOIN TheirListings
ON TheirListings.sku = TheirInventory.sku
JOIN OurListings
ON TheirListings.upc = OurListings.Upc
GROUP BY TheirListings.upc
ORDER BY TheirListings.upc
And it seems to work well. Our system is happy with it and it makes me happy because this reduces manual entry by 96%. Now I need to get everything this didn’t catch: the 4% that does need manual entry. Our tables shortened for brevity like this:
TheirListings TheirInventory OurListings
upc sku upc
sku quantity
I need to select all the SKU’s and associated quantities:
SELECT TheirInventory.sku, SUM(TheirInventory.quantity)
FROM TheirInventory
LEFT OUTER JOIN TheirListings
ON TheirListings.sku = TheirInventory.sku
LEFT OUTER JOIN OurListings
ON TheirListings.upc = OurListings.upc
WHERE OurListings.upc IS NULL
OR TheirListings.upc IS NULL
GROUP BY TheirInventory.sku
ORDER BY TheirInventory.sku
To double check it’s catching the remainder, I did SELECT COUNT(TheirInventory.sku) for both of those queries and another to return the total of sku’s. Adding my two queries gives me exactly 1 more than expected. I’m not sure where I went wrong.
The first thing I would guess is that they're not working on the same datasets. If you ran the first one a few days ago, perhaps some data has changed causing it to show up in the second query?
If not, what I would do is
SELECT SKU
FROM TheirInventory
WHERE SKU IN (<1st query>)
AND SKU IN (<2nd query>)
This will tell you what is showing up in both, and you can diagnose from there.
I am confused on this MySQL select query, I get the correct information back except the COUNT(messages) and COUNT(project_ideas) are coming back twice as many.
SELECT
create_project.title,
image1,
create_project.description,
create_project.date,
create_project.active,
create_project.completed,
create_project.project_id,
categories.name,
messages.receiver_read,
project_ideas.project_id,
COUNT(messages.ideas_id) AS num_of_messages,
COUNT(project_ideas.ideas_id) AS num_of_ideas
FROM
create_project
LEFT JOIN project_ideas ON create_project.project_id = project_ideas.project_id
LEFT JOIN messages ON messages.project_id = create_project.project_id
JOIN categories ON create_project.category = categories.category_id
WHERE
create_project.user_id = {$_SESSION['user']['user_id']}
AND create_project.active = 1
AND create_project.completed = 1
GROUP BY project_ideas.project_id
ORDER BY create_project.date ASC
Any help would be appreciated thanks.
If there is more than one row in your create_project table that matches to a single row in your messages table, then the row in messages will show up once for each matching row in create_project. Additionally, since you have many joins, you have many places for duplicate rows to show up. If a project belongs to more than one category, for example, your join against categories will result in every row from the other tables being duplicated for each category that a project belongs to. I'd wager this is actually the source of your error. And what makes it so insidious is that the GROUP BY hides the duplication everywhere except in functions that do counting and summing.
#Wrikken's comment is correct and useful. If you remove the GROUP BY, you'll see every row included in the count. There you should see that rows from the messages table are repeated. As #Wrikken also said, you can mitigate this by using COUNT(DISTINCT ...). I would try, however, to make sure your joins are correct or that your table data is correct, before papering over the problem with a COUNT(DISTINCT ...). That is to say, make sure that COUNT(DISTINCT ...) really makes logical sense in terms of the data you are looking for.
Unrelated to your action question, I had to point out something that I see (and have done myself before I knew better). Although MySQL lets you include columns in your select list that are not in a GROUP BY or an aggregate function (e.g., COUNT()), it's bad practice to do so. The results are technically undefined (see: http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html). I think MySQL is wrong for doing this, but it's not my call. Other database systems would flag this as an error.
Try this:
COUNT(messages.ideas_id) OVER(PARTITION BY messages.project_id) AS num_of_messages,
COUNT(project_ideas.ideas_id) OVER(PARTITION BY project_ideas.project_id) AS num_of_ideas
I'm having an issue getting this SQL query to work properly.
I have the following query
SELECT apps.*,
SUM(IF(adtracking.appId = apps.id AND adtracking.id = transactions.adTrackingId, transactions.payoutAmount, 0)) AS 'revenue',
SUM(IF(adtracking.appId = apps.id AND adtracking.type = 'impression', 1, 0)) AS 'impressions'
FROM apps, adtracking, transactions
WHERE apps.userId = '$userId'
GROUP BY apps.id
Everything is working, HOWEVER for the 'impressions' column I am generating in the query, I am getting a WAY larger number than there should be. For example, one matching app for this query should only have 72 for 'Impressions' yet it is coming up with a value of over 3,000 when there aren't even that many rows in the adtracking table. Why is this? What is wrong here?
Your problem is you have no join conditions, so you are getting every row of every table being joined in your query result - called a cartesian product.
To fix, change your FROM clause to this:
FROM apps a
LEFT JOIN adtracking ad ON ad.appId = a.id
LEFT JOIN transactions t ON t.adTrackingId = ad.id
You haven't provided the schema for your tables, so I guessed the names of the relevant columns - you may have to adjust them. Also, your transaction table may join to adtracking - it's impossible to know from your question, so agin you have have to alter things slightly. Hopefully you get the idea.
Edit:
Note: your group-by clause is incorrect. You either need to list every column of apps (not recommended), or change your select to only select the id column from apps (recommended). Change your select to this:
SELECT apps.id,
-- rest of query the same
Otherwise you'll get weird, incorrect, results.