Refine SQL Query given list of ids - mysql

I am trying to improve this query given that it takes a while to run. The difficulty is that the data is coming from one large table and I need to aggregate a few things. First I need to define the ids that I want to get data for. Then I need to aggregate total sales. Then I need to find metrics for some individual sales. This is what the final table should look like:
ID | Product Type | % of Call Sales | % of In Person Sales | Avg Price | Avg Cost | Avg Discount
A | prod 1 | 50 | 25 | 10 | 7 | 1
A | prod 2 | 50 | 75 | 11 | 4 | 2
So % of Call Sales for each product and ID adds up to 100: the column sums to 100, not the row. Likewise for % of In Person Sales. I need to define the IDs separately because the result has to be region independent. Someone could make sales in Region A or Region B, but it does not matter; we want to aggregate across regions. By aggregating in subqueries and using a where clause to get the right ids, it should cut down on the memory required.
IDs Query
select distinct ids from tableA as t where year>=2021 and team = 'Sales'
This should be a unique list of ids
Aggregate Call Sales and Person Sales
select ids
,sum(case when sale = 'call' then 1 else 0 end) as call_sales
,sum(case when sale = 'person' then 1 else 0 end) as person_sales
from tableA
where
ids in t.ids
group by ids
This will be as follows with the unique ids, but the total sales are from everything in that table, essentially ignoring the where clause from the first query.
ids| call_sales | person_sales
A | 100 | 50
B | 60 | 80
C | 100 | 200
Main Table as shown above
select ids
,prod_type
,cast(sum(case when sale = 'call' then 1 else 0 end)/CAST(call_sales AS DECIMAL(10, 2)) * 100 as DECIMAL(10,2)) as call_sales_percentage
,cast(sum(case when sale = 'person' then 1 else 0 end)/CAST(person_sales AS DECIMAL(10, 2)) * 100 as DECIMAL(10,2)) as person_sales_percentage
,avg(price) as price
,avg(cost) as cost
,avg(discount) as discount
from tableA as A
where
...conditions...
group by
...conditions...

You can combine the first two queries as:
select ids, sum(sale = 'call') as call_sales,
sum(sale = 'person') as person_sales
from tableA
group by ids
having sum(year >= 2021 and team = 'Sales') > 0;
I'm not exactly sure what the third is doing, but you can use the above as a CTE and just plug it in.
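As a rough sketch of how the combined query could be plugged in as a CTE to get the per-product percentages, here is a small Python script that uses the standard-library sqlite3 module as a stand-in for MySQL. The sample rows are invented, the totals CTE name is hypothetical, and only price is averaged to keep it short. Like MySQL, SQLite evaluates a comparison to 0 or 1, so sum(sale = 'call') counts the matching rows.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (ids TEXT, prod_type TEXT, sale TEXT,
                     year INTEGER, team TEXT, price REAL);
INSERT INTO tableA VALUES
  ('A', 'prod 1', 'call',   2021, 'Sales', 10),
  ('A', 'prod 2', 'call',   2020, 'Sales', 12),
  ('A', 'prod 1', 'person', 2021, 'Sales', 10),
  ('A', 'prod 2', 'person', 2021, 'Sales', 11),
  ('A', 'prod 2', 'person', 2021, 'Sales', 11),
  ('A', 'prod 2', 'person', 2022, 'Sales', 11),
  ('B', 'prod 1', 'call',   2019, 'Ops',    9);
""")

query = """
WITH totals AS (
  -- per-id totals over the whole table, kept only for ids that have
  -- at least one 2021+ row on the Sales team
  SELECT ids,
         SUM(sale = 'call')   AS call_sales,
         SUM(sale = 'person') AS person_sales
  FROM tableA
  GROUP BY ids
  HAVING SUM(year >= 2021 AND team = 'Sales') > 0
)
SELECT a.ids,
       a.prod_type,
       100.0 * SUM(a.sale = 'call')   / t.call_sales   AS call_sales_percentage,
       100.0 * SUM(a.sale = 'person') / t.person_sales AS person_sales_percentage,
       AVG(a.price) AS price
FROM tableA a
JOIN totals t ON t.ids = a.ids
GROUP BY a.ids, a.prod_type, t.call_sales, t.person_sales
ORDER BY a.ids, a.prod_type
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With these rows, id B drops out because it has no qualifying row, and each percentage column sums to 100 per id.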

Related

How to assign to each row a number of times a value appears in the whole table?

I'm trying to run an SQL query on Vertica but I can't find a way to get the results I need.
Let's say I have a table showing:
productID
campaignID (ID of the sales campaign)
calendarYearWeek (calendar week when the campaign was active; usually they're active for 5 days)
countryOrigin (in which country was the product sold, as it's international sales)
valueLocal (price in local currency)
What I need to do is to find products sold in different countries and compare their prices between markets.
Sometimes the campaigns are available only in one country, sometimes in more, so to avoid having hundreds of thousands of unnecessary rows that I can't compare to others, I want to distill only those products that were available in more than 1 countryOrigin.
What's important - a product can be available in different campaigns with a different price.
That's why in my SELECT statement I added a new column:
calendarYearWeek||productID||campaignID AS uniqueItem - that way I know that I'm checking the price only for a specific product in a specific campaign during a specific week of year.
The table is also joined with another table to get exchange rates etc., so it's also GROUPed BY, so in each row I have a price and average exchange rate for a given uniqueItem in a specific country.
If I run this query, it works but even just for this year it gives me several million results, most of which I don't need because these are products sold only in one country and I need to compare prices across different markets.
So what I thought I need is to assign to each row a number of times a uniqueItem value appears in the whole table. If it's 1 - then the product is sold only in one country and I don't have to care about it. If it's 2 or 3 - this is what I need. Then I can filter out the unnecessary results in the WHERE clause ( > 1) and I can work on a smaller, better data set.
I tried different combinations of COUNT, I tried row_number + OVER(PARTITION BY) (works only partially, as when a product is available in 2 or more countries it counts the rows, but still I cannot filter out "1" because then I'll lose the "first" country on the list). I thought about MATCH_RECOGNIZED, but I've never used it before and I think it's not available in Vertica.
Sorry if it's messy, but I'm not really advanced in SQL and English is not my native language.
Do you have any ideas how to get only the data I need?
What I have now is:
SELECT
a.originCountry,
a.calendarYearWeek,
a.productID,
a.campaignId,
a.valueLocal,
ROUND(AVG(b.exchange_rate),4),
a.calendarYearWeek||a.productID||a.campaignID AS uniqueItem
FROM table1 a
LEFT JOIN table2 b
ON a.reportDate = b.reportDate
AND a.originCountry = b.originCountry
WHERE a.originCountry IN ('ES', 'DE', 'FR')
GROUP BY 3, 4, 7, 1, 5, 2
ORDER BY 3, 4, 1
----------
I need some sample data - so I make up a few rows.
You need to find the identifying grouping columns of those combinations that occur more than once in a sub select or a common table expression, to join with table1.
You need to formulate the average as an OLAP function if you want the country back in the report.
WITH
-- input, don't use in final query ..
table1(originCountry,calendarYearWeek,productID,campaignId,valuelocal,reportDate) AS (
SELECT 'ES',202203,43,142,100.50, DATE '2022-01-19'
UNION ALL SELECT 'DE',202203,43,142,135.00, DATE '2022-01-19'
UNION ALL SELECT 'FR',202203,43,142, 98.75, DATE '2022-01-19'
UNION ALL SELECT 'ES',202203,44,147,198.75, DATE '2022-01-19'
UNION ALL SELECT 'DE',202203,44,147,205.00, DATE '2022-01-19'
UNION ALL SELECT 'FR',202203,44,147,198.75, DATE '2022-01-19'
UNION ALL SELECT 'es',202203,49,150, 1.25, DATE '2022-01-19'
)
,
table2(originCountry,reportDate,exchange_rate) AS (
SELECT 'ES',DATE '2022-01-19', 1
UNION ALL SELECT 'DE',DATE '2022-01-19', 1
UNION ALL SELECT 'FR',DATE '2022-01-19', 1
)
-- end of input; real query starts here, replace following comma with "WITH" ..
,
-- you need the unique ident grouping values to join with ..
selgrp AS (
SELECT
a.calendarYearWeek
, a.productID
, a.campaignId
FROM table1 a
GROUP BY
a.calendarYearWeek
, a.productID
, a.campaignId
HAVING COUNT(*) > 1
-- chk calendarYearWeek | productID | campaignId
-- chk ------------------+--------+--------
-- chk 202203 | 43 | 142
-- chk 202203 | 44 | 147
)
SELECT
a.originCountry
, a.calendarYearWeek
, a.productID
, a.campaignId
, a.valueLocal
, AVG(b.exchange_rate) OVER w::NUMERIC(9,4) AS avg_exch_rate
-- a.calendarYearWeek||a.productID||a.campaignID AS uniqueItem
FROM table1 a
JOIN selgrp USING(calendarYearWeek,productID,campaignId)
LEFT JOIN table2 b
ON a.reportDate = b.reportDate
AND a.originCountry = b.originCountry
WHERE UPPER(a.originCountry) IN ('ES', 'DE', 'FR')
WINDOW w AS (PARTITION BY a.calendarYearWeek,a.productID,a.campaignID)
ORDER BY 3, 4, 1
-- out originCountry | calendarYearWeek | productID | campaignId | valueLocal | avg_exch_rate
-- out ---------------+------------------+-----------+------------+------------+---------------
-- out DE | 202203 | 43 | 142 | 135.00 | 1.0000
-- out ES | 202203 | 43 | 142 | 100.50 | 1.0000
-- out FR | 202203 | 43 | 142 | 98.75 | 1.0000
-- out DE | 202203 | 44 | 147 | 205.00 | 1.0000
-- out ES | 202203 | 44 | 147 | 198.75 | 1.0000
-- out FR | 202203 | 44 | 147 | 198.75 | 1.0000
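The idea from the question, attaching to every row the number of times its (calendarYearWeek, productID, campaignId) combination occurs and then filtering on that number, can also be written with COUNT(*) OVER (PARTITION BY ...). A window function cannot be referenced in the WHERE clause of the same query level, so the count is computed in a subquery and filtered one level up. The sketch below runs in Python with the standard-library sqlite3 module as a stand-in for Vertica (it needs SQLite 3.25 or newer for window functions); the rows are a subset of the made-up sample above, and n_rows is a hypothetical column name.

```python
import sqlite3

# SQLite stands in for Vertica; rows are a subset of the made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (originCountry TEXT, calendarYearWeek INTEGER,
                     productID INTEGER, campaignId INTEGER, valueLocal REAL);
INSERT INTO table1 VALUES
  ('ES', 202203, 43, 142, 100.50),
  ('DE', 202203, 43, 142, 135.00),
  ('FR', 202203, 43, 142,  98.75),
  ('ES', 202203, 49, 150,   1.25);
""")

query = """
SELECT originCountry, productID, campaignId, valueLocal, n_rows
FROM (
  -- n_rows = how many rows share this week/product/campaign combination
  SELECT a.*,
         COUNT(*) OVER (
           PARTITION BY calendarYearWeek, productID, campaignId
         ) AS n_rows
  FROM table1 a
) counted
WHERE n_rows > 1
ORDER BY productID, campaignId, originCountry
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that n_rows counts rows, so it equals the number of countries only when there is one row per country for each combination, as in this sample.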

Need validation that interpretation for a Grouping Query is correct

I am running the following query. At first it appears to give the subtotals for customers and to show each customer's payment amounts by date only if the total of all that customer's payments is greater than $90,000.
SELECT
Customername,
Date(paymentDate),
CONCAT('$', Round(SUM(amount),2)) AS 'High $ Paying Customers'
FROM Payments
JOIN Customers
On payments.customernumber = customers.customernumber
Group by customername, Date(paymentDate) WITH ROLLUP
having sum(amount)> 90000;
But upon looking at the records for Dragon Souveniers, Ltd. and Euro+ Shopping Channel, it is actually showing the pay dates that have amounts individually over $90,000 as well as the subtotal for that customer as a rollup. For all other customers, their individual payment dates are not reported in the result set; only their sum is, if it is over $90,000. For example, Anna's Decorations has 4 payment records and none of them is over $90,000, but her sum is reported as the value for the total payments in the query with the rollup. Is this the correct interpretation?
The HAVING clause works correctly: it filters out every row whose total is not above 90000. It does this for the rollup totals as well.
When using GROUP BY ... WITH ROLLUP, you can detect the generated rollup lines by using the GROUPING() function.
You should add a condition so that the desired rows are not filtered out.
Simple example:
select a, sum(a), grouping(a<3)
from (select 1 as a
union
select 2
union select 3) x
group by a<3 with rollup;
output:
+---+--------+---------------+
| a | sum(a) | grouping(a<3) |
+---+--------+---------------+
| 3 | 3 | 0 |
| 1 | 3 | 0 |
| 1 | 6 | 1 |
+---+--------+---------------+
This shows that the last line (with grouping(a<3) = 1) is the rollup line containing the total over all groups.
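To make the behaviour from the question concrete, here is a hedged sketch in Python using the standard-library sqlite3 module. SQLite has no WITH ROLLUP, so the detail rows and the per-customer subtotal rows are emulated with UNION ALL, and the hand-written is_rollup column plays the role that GROUPING(Date(paymentDate)) would play in MySQL. The grand-total row that a real rollup also produces is left out. Table contents and names are invented.

```python
import sqlite3

# Emulation only: UNION ALL replaces WITH ROLLUP, is_rollup replaces GROUPING().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (customer TEXT, pay_date TEXT, amount INTEGER);
INSERT INTO payments VALUES
  ('Anna',  '2004-01-01', 30000),
  ('Anna',  '2004-02-01', 40000),
  ('Anna',  '2004-03-01', 25000),
  ('Euro',  '2004-01-01', 95000),
  ('Euro',  '2004-02-01', 10000),
  ('Small', '2004-01-01',  5000);
""")

query = """
-- detail rows: one per customer and date, filtered by the same HAVING
SELECT customer, pay_date, SUM(amount) AS total, 0 AS is_rollup
FROM payments
GROUP BY customer, pay_date
HAVING SUM(amount) > 90000
UNION ALL
-- subtotal rows: one per customer, also filtered by the HAVING
SELECT customer, NULL, SUM(amount), 1
FROM payments
GROUP BY customer
HAVING SUM(amount) > 90000
ORDER BY customer, is_rollup
"""

rows = conn.execute(query).fetchall()
# Keeping only the rollup lines gives the per-customer totals above 90000.
subtotals = [r for r in rows if r[3] == 1]
for row in rows:
    print(row)
```

Anna appears only as a subtotal because no single date exceeds 90000, while Euro appears with one detail row and its subtotal, which matches the interpretation in the question.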

Combine results from 3 sql queries to calculate running stock

I am trying to calculate the stock per product that a warehouse had over time. I have the information about today's stock, and also the quantities of products sold and purchased per day. So, the calculation for yesterday's values would be:
Yesterday_stock=Stock-yesterday_sold_quantity+yesterday_purchased_quantity. My problem is that I would have to save each day's stock somewhere in order to calculate the stock of the previous day. I found that I could do that with the SQL OVER clause with ORDER BY, but unfortunately I have SQL Server 2008, so this is not an option.
The tables are:
Prdamount which holds the current stock per product (StuPrdID ) and if it is blocked for some reason.
|-------------- |------------------|---------------
| StuPrdID | StuQAmount |prdBlockingReason
|---------------|------------------|-------------
| 12345| 16 |
|---------------|------------------|--------------
| 08889| 12 | expired
|---------------|------------------|------------
Table Moves which holds information about inserts and outputs of products. If MoveCase field has value equal 1 it is an output move, if it is a 2 it is a purchased quantity. Moves table dummy data:
|-------------- |--------------------- -|--------|-------
|MoveItemCode | MoveDate |MoveCase|MoveRealQty
|---------------|---------------------- |--------|-------
| 12345 |2018-06-24 00:00:00.000| 1 |14
|---------------|-----------------------|--------|--------
| 08889 |2018-06-24 00:00:00.000| 2 |578
|---------------|-----------------------|--------|--------
and table Product with information related with data:
|-------------- |------------------|
| PrdCode | PrdDespription |
|---------------|------------------|
| 12345| Orange juice|
|---------------|------------------|
| 08889| Chocolate|
|---------------|------------------|
I want an output like this:
|------------|-----------------------|--------|--------------|--------------|
|Prdcode     | PrdDescription        |Stock   |Stock 18/07/03|Stock 18/07/02|
|------------|-----------------------|--------|--------------|--------------|
| 12345      |Orange Juice           | 80     |50            |34            |
|------------|-----------------------|--------|--------------|--------------|
| 08889      |Chocolate              | 45     |82            |17            |
|------------|-----------------------|--------|--------------|--------------|
This query gives me the current stock:
select
product.PrdCode,
product.PrdDescr,
SUM(StuQAmount) as Stock
from prdamount
left join product on (product.PrdID=prdamount.StuPrdID)
where prdamount.prdBlockingReason=' '
group by product.PrdCode,product.PrdDescr
order by product.PrdCode asc
This query gives me the quantity sold by product per day:
select
moves.MoveItemCode,
product.PrdDescr,
moves.MoveDate,
SUM(MoveRealQty) as 'sold_quantity'
from moves
left join product on (moves.MoveItemCode=product.PrdCode)
where (moves.MoveDate>'2018-06-01' and moves.MoveCase=1)
group by moves.MoveItemCode,product.PrdDescr,moves.MoveDate
order by moves.MoveItemCode asc,moves.MoveDate asc
And this query gives me the quantity purchases by product per day:
select
moves.MoveItemCode,
product.PrdDescr,
moves.MoveDate,
SUM(MoveRealQty) as 'Purchased_Quantity'
from Moves
left join product on (moves.MoveItemCode=product.PrdCode)
where (moves.MoveDate>'2018-06-01' and moves.MoveCase=2)
group by moves.MoveItemCode,product.PrdDescr,moves.MoveDate
order by moves.MoveItemCode asc,moves.MoveDate asc
I tried to combine these 3 queries into one using subqueries, but it didn't work. So how can I accomplish the result that I want? Sorry if the question is silly, I am a beginner in SQL.
Try this:
select
product.PrdCode,
moves.MoveItemCode,
product.PrdDescr,
moves.MoveDate,
SUM( case when moves.MoveCase=1 then MoveRealQty else 0 end) as 'sold_quantity',
SUM( case when moves.MoveCase=2 then MoveRealQty else 0 end) as 'Purchased_Quantity',
(select SUM(StuQAmount) from prdamount where StuPrdID = product.PrdID and prdBlockingReason=' ') as Stock
from moves
left join product on (moves.MoveItemCode=product.PrdCode)
where (moves.MoveDate>'2018-06-01')
group by moves.MoveItemCode,product.PrdDescr,moves.MoveDate, product.PrdCode, product.PrdID
order by moves.MoveItemCode asc,moves.MoveDate asc
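The daily quantities still have to be accumulated to get a stock figure per day. One way to do that without OVER (... ORDER BY ...) is a correlated subquery that sums every move from a given date onwards. The sketch below is an illustration only, run in Python with the standard-library sqlite3 module as a stand-in for SQL Server 2008. It follows the sign convention of the formula in the question (subtract sold, add purchased), joins prdamount directly to moves on the product code as a simplifying assumption, and uses invented moves chosen to reproduce the 80 / 50 / 34 figures from the desired output. Pivoting the dates into columns is not shown.

```python
import sqlite3

# SQLite stands in for SQL Server 2008; no window functions are used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prdamount (StuPrdID TEXT, StuQAmount INTEGER);
CREATE TABLE moves (MoveItemCode TEXT, MoveDate TEXT,
                    MoveCase INTEGER, MoveRealQty INTEGER);
INSERT INTO prdamount VALUES ('12345', 80);
INSERT INTO moves VALUES
  ('12345', '2018-07-03', 1, 40),   -- MoveCase 1 = sold
  ('12345', '2018-07-03', 2, 10),   -- MoveCase 2 = purchased
  ('12345', '2018-07-02', 1, 21),
  ('12345', '2018-07-02', 2,  5);
""")

query = """
SELECT d.MoveItemCode,
       d.MoveDate,
       p.StuQAmount + (
         -- net effect of all moves on or after this date,
         -- using the question's formula: stock - sold + purchased
         SELECT SUM(CASE m.MoveCase WHEN 1 THEN -m.MoveRealQty
                                    WHEN 2 THEN  m.MoveRealQty
                                    ELSE 0 END)
         FROM moves m
         WHERE m.MoveItemCode = d.MoveItemCode
           AND m.MoveDate >= d.MoveDate
       ) AS stock_for_day
FROM (SELECT DISTINCT MoveItemCode, MoveDate FROM moves) d
JOIN prdamount p ON p.StuPrdID = d.MoveItemCode
ORDER BY d.MoveItemCode, d.MoveDate DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If a move should be undone rather than applied when stepping back in time, the two signs in the CASE expression need to be swapped.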

Count each customer once in this query

I have two tables: one is a list of store locations (with lat/long) and the other is a customer list (with address lat/long). What I need is a query that shows how many customers are within certain ranges from each store. The goal is to have each customer counted once, in the distance range of the store closest to them. For example, if they are 2 miles from one store and 5 from another, then only count them as being associated with the first store.
The query below is supposed to roll all this up so basically I can see the maximum distance all customers are from any store.
This is what my query looks like:
SELECT CASE
WHEN dist < 8046. THEN 1
WHEN dist < 16093. THEN 2
WHEN dist < 40233. THEN 3
WHEN dist < 80467. THEN 4
WHEN dist < 160934. THEN 5
END AS grp,count(*)
FROM (SELECT s.id, s.identifier, ST_Distance_Sphere(s.the_geom, c.the_geom) AS dist FROM full_data_for_testing_deid_2 c, demo_locations_table s)
AS loc_dist
GROUP BY grp
And here's the result:
| Count | grp |
|---------|------|
| 2860 | 1 |
| 4858 | 2 |
| 12735 | 3 |
| 11432 | 4 |
| 23950 | 5 |
| 1002970 | null |
There are only 32048 customers in my database, so this isn't quite working right. If it were, I'd expect the values to increase linearly, but in my results there are more customers in group 3 v. 4, which shouldn't be the case. In addition, groups 1-5 should add up to 32048, as each customer should only be counted once.
Any thoughts on how to adjust this such that each customer is only counted once?
To count each customer only once (in Postgres 9.3+):
SELECT CASE
WHEN s.dist < 8046.0 THEN 1
WHEN s.dist < 16093.0 THEN 2
WHEN s.dist < 40233.0 THEN 3
WHEN s.dist < 80467.0 THEN 4
WHEN s.dist < 160934.0 THEN 5
END AS grp
, count(*)
FROM full_data_for_testing_deid_2 c
, LATERAL (
SELECT s.id, s.identifier, ST_Distance_Sphere(s.the_geom, c.the_geom) AS dist
FROM demo_locations_table s
ORDER BY dist
LIMIT 1
) s
GROUP BY 1;
This takes every customer exactly once and finds the closest location to go with it before aggregating.
But I don't think ST_Distance_Sphere() uses a GiST index on the_geom.
Consider ST_DWithin() instead if performance is an issue. See the related question: "How to alter this PostGIS ST_distance_sphere query to give the answer for all points in the table, not just one?"
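The key step, reducing the cross join to one row per customer before bucketing, can be sketched without PostGIS. The Python script below uses the standard-library sqlite3 module; since SQLite has neither LATERAL nor ST_Distance_Sphere(), a MIN() over the cross join grouped by customer is swapped in for the LATERAL ... LIMIT 1 subquery, and ABS() on a made-up one-dimensional coordinate in meters stands in for the real distance. Table and column names are hypothetical.

```python
import sqlite3

# Stand-in sketch: MIN() per customer replaces LATERAL ... ORDER BY dist LIMIT 1,
# and ABS(c.x - s.x) replaces ST_Distance_Sphere().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (id INTEGER, x INTEGER);
CREATE TABLE customers (id INTEGER, x INTEGER);
INSERT INTO stores VALUES (1, 0), (2, 100000);
INSERT INTO customers VALUES (1, 5000), (2, 95000), (3, 50000), (4, 20000);
""")

query = """
SELECT CASE
         WHEN dist < 8046   THEN 1
         WHEN dist < 16093  THEN 2
         WHEN dist < 40233  THEN 3
         WHEN dist < 80467  THEN 4
         WHEN dist < 160934 THEN 5
       END AS grp,
       COUNT(*) AS customers
FROM (
  -- one row per customer: distance to the nearest store only
  SELECT c.id, MIN(ABS(c.x - s.x)) AS dist
  FROM customers c CROSS JOIN stores s
  GROUP BY c.id
) nearest
GROUP BY grp
ORDER BY grp
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the inner query yields exactly one row per customer, the group counts add up to the number of customers.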

MySQL aggregate function to select maximum and then select minimum price within that group

I am trying to get the maximum value out of an aggregate function, and then also get the min value out of a Price column which comes back in the results.
id | discount | price
1 | 60 | 656
2 | 60 | 454
3 | 60 | 222
4 | 30 | 335
5 | 30 | 333
6 | 10 | 232
So in the table above, I would like to separate the minimum price for each discount.
This is the result I should be seeing:
id | discount | price
3 | 60 | 222
5 | 30 | 333
6 | 10 | 232
As you can see, it has taken the discount=60 group and separated the lowest price, 222, and the same for all other discount groups.
Could someone give me the SQL for this please, something like this -
SELECT MAX(discount) AS Maxdisc
, MIN(price) as MinPrice
FROM mytable
GROUP
BY discount
However, this doesn't separate the minimum price for each group. I think I need to join this table to itself to achieve that. Also, the table contains millions of rows, so the SQL needs to be fast. It is one flat table.
This question is asked and answered with tedious regularity in SO. If only the algorithm was better at spotting duplicates. Anyway...
SELECT x.*
FROM my_table x
JOIN
( SELECT discount,MIN(price) min_price FROM my_table GROUP BY discount) y
ON y.discount = x.discount
AND y.min_price = x.price;
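To check the join-back query against the sample data from the question, here is a small runnable sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. The composite index on (discount, price) is an assumption about what is likely to help on a table with millions of rows, not something measured here.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, discount INTEGER, price INTEGER);
INSERT INTO my_table VALUES
  (1, 60, 656), (2, 60, 454), (3, 60, 222),
  (4, 30, 335), (5, 30, 333), (6, 10, 232);
-- assumed helper index for the grouped MIN() and the join back
CREATE INDEX idx_discount_price ON my_table (discount, price);
""")

query = """
SELECT x.id, x.discount, x.price
FROM my_table x
JOIN (
  -- lowest price per discount group
  SELECT discount, MIN(price) AS min_price
  FROM my_table
  GROUP BY discount
) y
  ON y.discount = x.discount
 AND y.min_price = x.price
ORDER BY x.discount DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two rows in a group share the lowest price, this form returns both of them.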
In your query, you cannot group by discount and then maximize the discount value.
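This should get you the result you are looking for on the sample data. Note that Max(ID) is not guaranteed in general to be the id of the row with the lowest price; it only coincides with it here.
SELECT Max(ID) AS ID, discount, MIN(price) as MinPrice FROM mytable GROUP BY discount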
If you do not need the id, you would do:
select discount, min(price) as minprice
from mytable t
group by discount;
If you want other columns in the row, you can either join back to the original table or use the substring_index()/group_concat() trick:
select substring_index(group_concat(id order by price), ',', 1) as id,
discount, min(price)
from mytable t
group by discount;
This will not always work because the intermediate result for group_concat() can overflow if there are too many matches within a group. This is controlled by a system parameter (group_concat_max_len), which can be made bigger if necessary.
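An alternative that avoids the string length limit is ROW_NUMBER() with a partition per discount, which MySQL supports from version 8.0. This is a different technique from the group_concat() trick, sketched here in Python with the standard-library sqlite3 module as a stand-in (SQLite 3.25 or newer). The sample rows are those from the question plus one invented tie (id 7) to show that exactly one row per group comes back.

```python
import sqlite3

# SQLite stands in for MySQL 8.0+; id 7 is an invented tie on the lowest price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, discount INTEGER, price INTEGER);
INSERT INTO mytable VALUES
  (1, 60, 656), (2, 60, 454), (3, 60, 222),
  (4, 30, 335), (5, 30, 333), (6, 10, 232), (7, 10, 232);
""")

query = """
SELECT id, discount, price
FROM (
  -- rank rows inside each discount group by price; id breaks ties
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY discount ORDER BY price, id) AS rn
  FROM mytable t
) ranked
WHERE rn = 1
ORDER BY discount DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The extra id in the ORDER BY makes the choice between tied rows deterministic; without it, either of the tied rows could be picked.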