Just joined the group yesterday, and have already received great advice from the community. I have another scenario I need a bit of a push with. The advice I received yesterday resolved my first issue, but I have a 2nd scenario I would like to try implementing.
I have the following PostgreSQL query that converts a "size" value in a json column from bytes to MB:
SELECT c.name AS customer, i.customer_id, i.captured_at, i.name AS image,
i.raw_upload_complete
, pg_size_pretty((i.manifest ->> 'size')::numeric) AS image_size
FROM public.images i
JOIN public.customers c ON c.id = i.customer_id
WHERE i.raw_upload_complete = 'true' AND captured_at > date_trunc('day', now()) -
interval '2 months'
ORDER BY customer ASC
Here is an example of the output:
Customer | Customer ID | captured_at | image | raw_upload_complete | image_size
Customer 1 | 250 | 2022-05-09 | Ventures Pit | TRUE | 4044 MB
Customer 1 | 250 | 2022-06-01 | Ventures Pit | TRUE | 500 MB
Customer 2 | 85 | 2022-04-18 | Devault Quarry | TRUE | 672 MB
Customer 2 | 85 | 2022-05-02 | Talmage Quarry | TRUE | 3876 MB
The query works great, however, what I would like to do is group the result set by Customer, and SUM the image_size for each customer. So in the above example, I'd like to have the customers grouped as they are, and reflect customer 1 has a total image size of 4544 MB, and Customer 2 has a total image size of 4548 MB. Ideally I'd like each customer summarized on one line, or in some kind of high-level rolled-up fashion.
I mentioned in my first post I am fairly new at SQL so the solutions could be obvious, but I've been struggling with this one. I've tried the crosstab function to no avail, but I'm not sure if there is a better option. The thing that seems to have been giving me issues is that the image_size column that was created comes out as TEXT, not numeric, which has been causing issues. I have been using Google Data Studio and thought simply changing the data type would help, but even though it would let me change it to a number, it was still being recognized as a text field.
Thanks very much.
I'm going to work this backwards. First, let's write a query that gets the total image size by customer:
select c.id, pg_size_pretty(sum((i.manifest ->> 'size')::numeric)) as total
from images i
join customers c on i.customer_id = c.id
where [snipped]
group by c.id
Note that you want to do the sum(), and then apply pg_size_pretty. That's because the results of pg_size_pretty() will have the "MB", "GB" stuff inside, so it's actually text and not a number anymore.
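To make the ordering point concrete, here is a small sketch in Python rather than SQL. The `pretty_size` helper is a hypothetical stand-in for pg_size_pretty() that only handles whole megabytes, and the two sizes are the Customer 1 rows from the question.

```python
MB = 1024 * 1024


def pretty_size(num_bytes: int) -> str:
    """Hypothetical stand-in for pg_size_pretty(): whole megabytes only."""
    return f"{num_bytes // MB} MB"


# The two Customer 1 rows from the question, as raw byte counts.
sizes_in_bytes = [4044 * MB, 500 * MB]

# Right order: sum the raw numbers, then format once.
total_label = pretty_size(sum(sizes_in_bytes))

# Wrong order: format each row first, and the values are text that cannot be summed.
labels = [pretty_size(n) for n in sizes_in_bytes]
try:
    sum(labels)
    summing_labels_failed = False
except TypeError:
    summing_labels_failed = True

print(total_label)
print(labels, summing_labels_failed)
```

The same thing happens in a reporting tool: once the column is text like "4044 MB", relabelling it as a number does not bring the numeric value back.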
Now we can use that query as a CTE to get the result you want
with total_size as (
select c.id, pg_size_pretty(sum((i.manifest ->> 'size')::numeric)) as total
from images i
join customers c on i.customer_id = c.id
where [snipped]
group by c.id
)
SELECT c.name AS customer, i.customer_id, i.captured_at, i.name AS image,
i.raw_upload_complete
, total_size.total
FROM public.images i
JOIN public.customers c ON c.id = i.customer_id
JOIN total_size on c.id = total_size.id
WHERE i.raw_upload_complete = 'true' AND captured_at > date_trunc('day', now()) -
interval '2 months'
ORDER BY customer ASC
So if you look carefully, you'll see that we have the additional join to the "total_size" CTE, and that the SELECT clause uses the "total" value from the CTE.
The reason why you have to do this in two steps is that your desired result mixes detailed image data (e.g. "captured_at") with summary image data (the total size). That kind of mixing trips up a lot of people who are learning SQL.
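Here is a minimal runnable sketch of that two-level idea, using Python's built-in sqlite3 module instead of PostgreSQL. The simplified schema is an assumption: a plain `size_mb` column stands in for the manifest ->> 'size' value, and the date filter is left out. The data is the sample output from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema: size_mb replaces the JSON "size" value.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE images (customer_id INTEGER, name TEXT, captured_at TEXT, size_mb INTEGER);
    INSERT INTO customers VALUES (250, 'Customer 1'), (85, 'Customer 2');
    INSERT INTO images VALUES
        (250, 'Ventures Pit',   '2022-05-09', 4044),
        (250, 'Ventures Pit',   '2022-06-01',  500),
        (85,  'Devault Quarry', '2022-04-18',  672),
        (85,  'Talmage Quarry', '2022-05-02', 3876);
""")

# The CTE works at the summary level (one row per customer);
# the outer query stays at the detail level (one row per image).
rows = conn.execute("""
    WITH total_size AS (
        SELECT customer_id, SUM(size_mb) AS total_mb
        FROM images
        GROUP BY customer_id
    )
    SELECT c.name, i.captured_at, i.name, t.total_mb
    FROM images i
    JOIN customers c ON c.id = i.customer_id
    JOIN total_size t ON t.customer_id = i.customer_id
    ORDER BY c.name, i.captured_at
""").fetchall()

for row in rows:
    print(row)
```

Every detail row carries its customer's total, so the total repeats once per image. If you only want one line per customer, the CTE query on its own is already the answer.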
I am trying to get the SUMs for several columns. While doing this, I want to JOIN on the shipped field so that I only sum current orders that have not shipped. So far, every result I get has the wrong amounts in all the columns.
I believe I need to write a subquery, but I am having trouble getting there.
If I remove the JOIN, the result is perfect, but I need the join so that I only calculate non-shipped items. I believe it's pulling some other records from the joined table. Thank you in advance.
SELECT XX.order_num, XX.shipped, PP.order_num AS JON, PP.part_num AS JPT, SUM(PP.total_qty) AS QTY, SUM(PP.work_time) AS WT,SUM(PP.setup_time) AS ST,SUM(PP.scrap) AS SC
FROM PP
JOIN XX
ON XX.order_num = PP.order_num
WHERE PP.department='RIBBON'
AND PP.ribbon_type='CRIMPING' AND XX.shipped IS NULL
GROUP BY part_num
ORDER BY PP.order_num DESC
I am getting this:
so185702 6609628 8,120 92.67 HRS 1.92 HRS 0
When it should read this:
so185702 6609628 760 545 15 0
I just need help writing the subquery; I am still a beginner. Thank you.
When you mix normal columns with aggregate functions (like SUM), you need a GROUP BY clause that lists every column in the SELECT that is not wrapped in an aggregate function. In your query, those are the first four columns:
SELECT
XX.order_num,
XX.shipped,
PP.order_num AS JON,
PP.part_num AS JPT,
SUM(PP.total_qty) AS QTY,
SUM(PP.work_time) AS WT,
SUM(PP.setup_time) AS ST,
SUM(PP.scrap) AS SC
FROM PP
JOIN XX ON XX.order_num = PP.order_num
WHERE PP.department='RIBBON' AND PP.ribbon_type='CRIMPING' AND XX.shipped IS NULL
GROUP BY XX.order_num, XX.shipped, PP.order_num, PP.part_num
ORDER BY PP.order_num DESC
http://sqlfiddle.com/#!9/e6effb/1
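As a rough illustration of that rule, the sketch below runs a cut-down version of the query with Python's built-in sqlite3 module. The table contents are made up for the example (only the order number, part number and the 760 total echo the question), and only one of the summed columns is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample rows: one open order (shipped IS NULL) and one shipped order.
conn.executescript("""
    CREATE TABLE XX (order_num TEXT, shipped TEXT);
    CREATE TABLE PP (order_num TEXT, part_num TEXT, total_qty INTEGER);
    INSERT INTO XX VALUES ('so185702', NULL), ('so185001', '2015-01-05');
    INSERT INTO PP VALUES
        ('so185702', '6609628', 400),
        ('so185702', '6609628', 360),
        ('so185001', '6609628', 100);
""")

# Every non-aggregated column in the SELECT also appears in the GROUP BY.
rows = conn.execute("""
    SELECT XX.order_num, XX.shipped, PP.part_num, SUM(PP.total_qty) AS qty
    FROM PP
    JOIN XX ON XX.order_num = PP.order_num
    WHERE XX.shipped IS NULL
    GROUP BY XX.order_num, XX.shipped, PP.part_num
    ORDER BY XX.order_num DESC
""").fetchall()

print(rows)
```

With this data the shipped order is filtered out and the two open rows for the same part collapse into a single group.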
I'm trying to get a top 10 by revenue per brand for France in December.
There are 2 tables (the first table has the date, the second table has the brand, and I'm trying to join them).
I get this error "FUNCTION db_9_d870e5.SUM does not exist. Check the 'Function Name Parsing and Resolution' section in the Reference Manual"
Is my use of Inner join there correct?
It's because you had an extra space after SUM. Please change it from
SUM (o1.total_net_revenue) to SUM(o1.total_net_revenue).
For more about it, see the 'Function Name Parsing and Resolution' section of the Reference Manual that the error message points to.
Also, after correcting that, your query still had another error: you were not selecting order_id in your intermediate table i2. I edited it as follows:
SELECT o1.order_id, o1.country, i2.brand,
SUM(o1.total_net_revenue)
FROM orders o1
INNER JOIN (
SELECT i1.brand, SUM(i1.net_revenue) AS total_net_revenue,order_id
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY 1,2,3
ORDER BY 4
LIMIT 10
EDIT: stack Fan is correct that o1.total_net_revenue exists. My confusion was because the data structure duplicated three columns between the tables, including the one that was being looked for.
There were a couple of errors with your SQL statement:
1. You were referencing an invalid column in your outer-select SUM function. I believe you're actually after i2.total_net_revenue.
The table structure is terrible: the "important" columns (country, revenue, order_id) are duplicated between the two tables. I would also expect the revenue columns to share the same name if they always hold the same values. In the example, there's no difference between i1.net_revenue and o1.total_net_revenue.
2. In your inner join, you didn't reference i1.order_id, which meant that your "on" clause couldn't execute correctly.
PROTIP:
When you run into an issue like this, take all the complicated bits out of your query and get the base query working correctly first. THEN add your functions.
PROTIP:
In your GROUP BY clause, reference the actual columns, NOT the column numbers. It makes your query more robust.
This is the query I ended up with:
SELECT o1.order_id, o1.country, i2.brand,
SUM(i2.total_net_revenue) AS total_rev
FROM orders o1
INNER JOIN (
SELECT i1.order_id, i1.brand, SUM(i1.net_revenue) AS total_net_revenue
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY o1.order_id, o1.country, i2.brand
ORDER BY total_rev
LIMIT 10
I am getting wrong results in the sum of total deposits.
I want to output a report of total deposits per campaign_name
and eventually inside a date range.
SELECT IFNULL(campaign_name,'DIRECT'),
IFNULL(TotalDeposit,0)
FROM trackings
LEFT JOIN
(SELECT deposit_amount,
sum(deposit_amount) AS TotalDeposit,
uuid
FROM conversions
LEFT JOIN transactions ON conversions.trader_id = transactions.trader_id
WHERE aff_id =3
AND TYPE='deposit'
GROUP BY transactions.trader_id) AS conversions ON trackings.uuid = conversions.uuid
WHERE aff_id=3
GROUP BY campaign_name
results: missing 200 from trynow campaign??
campaign_name,TotalDeposit
DIRECT,0.00
new_campaign_name,0.00
test march,500.00
testing,0.00
trynow,800.00
expected results:
campaign_name,TotalDeposit
DIRECT,0.00
new_campaign_name,0.00
test march,500.00
testing,0.00
trynow,1000.00
I think your data isn't quite right - using the data that you've supplied, the deposit of 500 for test march is never going to be returned, as it is linked to trader_id 7506, who has no records in the conversions table.
However, the following query is simpler and easier to understand, and correctly returns 1000 for trynow:
SELECT
IFNULL(SUM(t.deposit_amount),0) AS total_deposits
, IFNULL(tr.campaign_name,'DIRECT') AS campaign
FROM
trackings tr LEFT JOIN
conversions c ON
tr.uuid = c.uuid LEFT JOIN
transactions t ON
c.trader_id = t.trader_id AND
tr.`aff_id` = t.aff_id AND
t.type = 'Deposit'
WHERE
tr.aff_id = 3 AND
tr.updated_at >= '2015-03-01' AND tr.updated_at < '2015-04-01'
GROUP BY
IFNULL(tr.campaign_name,'DIRECT')
If you can check the test data supplied or otherwise point me in the right direction, I might be able to improve the query to return exactly what you want.
For date filtering, see the addition to the WHERE clause above. Note that if you need to filter on a date in the transactions table, the date filter must go in the "on" clause instead: this table is left-joined, so filtering it in the main WHERE clause would discard the rows that have no matching transaction.
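The difference between the two placements can be seen in a small sketch using Python's built-in sqlite3 module. The schema is a simplified assumption (the conversions table is skipped, and `created_at` is a hypothetical date column on transactions); the data is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema; created_at is a hypothetical column name.
conn.executescript("""
    CREATE TABLE trackings (uuid TEXT, campaign_name TEXT);
    CREATE TABLE transactions (uuid TEXT, deposit_amount REAL, created_at TEXT);
    INSERT INTO trackings VALUES ('a', 'trynow'), ('b', 'testing');
    INSERT INTO transactions VALUES
        ('a', 800, '2015-03-10'),
        ('a', 200, '2015-03-20'),
        ('b', 50,  '2015-02-01');
""")

# Date filter in the ON clause: campaigns without a matching deposit survive with 0.
in_on = conn.execute("""
    SELECT tr.campaign_name, IFNULL(SUM(t.deposit_amount), 0)
    FROM trackings tr
    LEFT JOIN transactions t
           ON t.uuid = tr.uuid
          AND t.created_at >= '2015-03-01' AND t.created_at < '2015-04-01'
    GROUP BY tr.campaign_name
    ORDER BY tr.campaign_name
""").fetchall()

# Same filter in the WHERE clause: rows with no March deposit are removed,
# so the LEFT JOIN effectively behaves like an inner join.
in_where = conn.execute("""
    SELECT tr.campaign_name, IFNULL(SUM(t.deposit_amount), 0)
    FROM trackings tr
    LEFT JOIN transactions t ON t.uuid = tr.uuid
    WHERE t.created_at >= '2015-03-01' AND t.created_at < '2015-04-01'
    GROUP BY tr.campaign_name
    ORDER BY tr.campaign_name
""").fetchall()

print(in_on)
print(in_where)
```

In the first result the "testing" campaign is still listed with a total of 0; in the second it disappears from the report entirely.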
I have this query wherein I want to find out the sales for the current year and the sales for last year. I cannot make it into 2 separate queries since it has to be of the same item code. Meaning the item codes used in the sales for the current year must also be the item codes used for the sales last year.
The code below is working but it takes almost 8 to 9 minutes to fetch
select p.itemcode,
p.itemdescription,
( select round((SUM(SA.QUANTITY*P.SellingPrice)),2)
from sales s
join product p on s.itemcode=p.itemcode
where YEAR(s.date) = 2013
),
( select round((SUM(SA.QUANTITY * P.SellingPrice)),2)
from sales s
join product p on s.itemcode=p.itemcode
where YEAR(s.date) = 2012
)
from product p
join supplier s on p.suppliercode = s.suppliercode
join currency c on c.countrycode=s.countrycode
join country co on co.countrycode=c.countrycode
JOIN SALES SA ON SA.ITEMCODE=P.ITEMCODE
where c.countrycode = 'NZ'
group by p.itemcode
limit 10
Ideally the output should be
Itemcode Itemdescription SalesforCurrentYear SalesforLastYear
GS771516 BUBBLE PARTY MACHINE 1035300.00 2079300.00
GSNBC-025 X'MAS HOUSE 600612.25 1397163.25
GSNBC-031 BRANDENBURGER TOR 741010.75 1572207.25
Thanks!!
The query can be simplified by eliminating two joins:
select .......
.......
from product p
join supplier s on p.suppliercode = s.suppliercode
JOIN SALES SA ON SA.ITEMCODE=P.ITEMCODE
where s.countrycode = 'NZ'
group by p.itemcode
limit 10
Afterwards, two dependent subqueries in the select clause can be reduced to one outer join:
select p.itemcode,
p.itemdescription,
round((SUM( CASE WHEN YEAR(SA.date) = 2013
THEN SA.QUANTITY*P.SellingPrice
ELSE 0 END
)),2) As Sum2013,
round((SUM( CASE WHEN YEAR(SA.date) = 2012
THEN SA.QUANTITY * P.SellingPrice
ELSE 0 END
)),2) As Sum2012
from product p
join supplier s on p.suppliercode = s.suppliercode
LEFT JOIN SALES SA ON SA.ITEMCODE=P.ITEMCODE
where s.countrycode = 'NZ'
group by p.itemcode
limit 10
Please try this query and let us know how it will perform.
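For anyone who wants to see the conditional-aggregation pattern run end to end, here is a small sketch using Python's built-in sqlite3 module. The two-table schema and the data are assumptions made up for the example, and since SQLite has no YEAR() function, strftime('%Y', ...) stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, cut-down schema with invented rows.
conn.executescript("""
    CREATE TABLE product (itemcode TEXT, sellingprice REAL);
    CREATE TABLE sales (itemcode TEXT, quantity INTEGER, date TEXT);
    INSERT INTO product VALUES ('A', 2.5), ('B', 10.0);
    INSERT INTO sales VALUES
        ('A', 4, '2013-02-01'),
        ('A', 2, '2012-07-15'),
        ('A', 6, '2012-11-30'),
        ('B', 1, '2013-05-05');
""")

# One pass over sales: each CASE routes a row's amount into the matching year's sum.
rows = conn.execute("""
    SELECT p.itemcode,
           ROUND(SUM(CASE WHEN strftime('%Y', sa.date) = '2013'
                          THEN sa.quantity * p.sellingprice ELSE 0 END), 2) AS sum2013,
           ROUND(SUM(CASE WHEN strftime('%Y', sa.date) = '2012'
                          THEN sa.quantity * p.sellingprice ELSE 0 END), 2) AS sum2012
    FROM product p
    LEFT JOIN sales sa ON sa.itemcode = p.itemcode
    GROUP BY p.itemcode
    ORDER BY p.itemcode
""").fetchall()

print(rows)
```

Both yearly totals come from a single scan of the joined rows, which is where the speed-up over two correlated subqueries would be expected to come from.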
Follow any of these steps:
1. You can parse your query.
2. Remove redundant statements.
3. Use inner join or outer join.
You've got sales three times in the same scope. I'd get rid of two of those, and that should help. Also, in terms of business logic, all of these tables seem mandatory for a sale. If that's true, you should use "inner join", for compatibility with standard SQL - even though it's the same in MySQL.
Based on my research, this is a very common problem which generally has a fairly simple solution. My task is to alter several queries from get all results into get top 3 per group. At first this was going well and I used several recommendations and answers from this site to achieve this (Most Viewed Products). However, I'm running into difficulty with my last one "Best Selling Products" because of multiple joins.
Basically, I need to get all products ordered by highest number of sales per product, with a maximum of 3 products per vendor. I've got multiple tables being joined to create the original query, and each time I attempt to use variables to generate rankings it produces invalid results. The following should help explain the issue (I've removed unnecessary fields for brevity):
Product Table
productid | vendorid | approved | active | deleted
Vendor Table
vendorid | approved | active | deleted
Order Table
orderid | `status` | deleted
Order Items Table
orderitemid | orderid | productid | price
Now, my original query to get all results is as follows:
SELECT COUNT(oi.price) AS `NumSales`,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;
Finally (and here's where I'm stumped), I'm trying to alter the above statement so that I receive only the top 3 products (by # sold) per vendor. I'd add what I have so far, but I'm embarrassed to do so and this question is already a wall of text. I've tried variables but keep getting invalid results. Any help would be greatly appreciated.
Even though you specify LIMIT 100, this type of query will require a full scan and table to be built up, then every record inspected and row numbered before finally filtering for the 100 that you want to display.
select
vendorid, productid, NumSales
from
(
select
vendorid, productid, NumSales,
@r := IF(@g=vendorid,@r+1,1) RowNum,
@g := vendorid
from (select @g:=null) initvars
CROSS JOIN
(
SELECT COUNT(oi.price) AS NumSales,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY p.vendorid, p.productid
ORDER BY p.vendorid, NumSales DESC
) T
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;
The approach here is:
1. Group by to get NumSales
2. Use variables to row number the sales per vendor/product
3. Filter the numbered dataset to allow for a max of 3 per vendor
4. Order the remaining by NumSales DESC and return only 100
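The steps above can also be sketched with a window function instead of user variables. This is a different technique from the query above, not a transcription of it: ROW_NUMBER() needs MySQL 8.0 or later, and the sketch below runs it through Python's built-in sqlite3 module (SQLite 3.25 or later). The `product_sales` table is a hypothetical stand-in for the grouped NumSales subquery, with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated table: one row per product with its sales count.
conn.executescript("""
    CREATE TABLE product_sales (vendorid INTEGER, productid INTEGER, numsales INTEGER);
    INSERT INTO product_sales VALUES
        (1, 10, 50), (1, 11, 40), (1, 12, 30), (1, 13, 20),
        (2, 20, 45), (2, 21, 5);
""")

# Number the products within each vendor by sales, keep the first three,
# then order what is left by sales across all vendors.
rows = conn.execute("""
    SELECT vendorid, productid, numsales
    FROM (
        SELECT vendorid, productid, numsales,
               ROW_NUMBER() OVER (
                   PARTITION BY vendorid ORDER BY numsales DESC
               ) AS rownum
        FROM product_sales
    ) AS ranked
    WHERE rownum <= 3
    ORDER BY numsales DESC
    LIMIT 100
""").fetchall()

for row in rows:
    print(row)
```

Because the numbering is declared with PARTITION BY and ORDER BY, it does not depend on the order in which variable assignments happen to be evaluated.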
I like this elegant solution; however, when I run an adapted but similar query on my dev machine I get a non-deterministic result set. I believe this is due to the way the MySQL optimiser deals with assigning and reading user variables within the same statement.
From the docs:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.
Just adding this note here in case someone else comes across this weird behaviour.
The answer given by @RichardTheKiwi worked great and got me 99% of the way there! I am using MySQL and was only getting the first row of each group marked with a row number, while the rest of the rows remained NULL. This resulted in the query returning only the top hit for each group rather than the first three rows. To fix this, I had to initialize @r in the initvars subquery. I changed
from (select @g:=null) initvars
to
from (select @g:=null, @r:=null) initvars
You could also initialize @r to 0 and it would work the same. And for those less familiar with this type of syntax, the additional section reads through each sorted group, and if a row has the same vendorid as the previous row, which is tracked with the @g variable, it increments the row number, which is stored in the variable @r. When this process reaches the next group with a new vendorid, the IF condition no longer evaluates as true and the @r variable (and thereby the RowNum) is reset to 1.