I'm trying to run an SQL query on Vertica but I can't find a way to get the results I need.
Let's say I have a table showing:
productID
campaignID (ID of the sales campaign)
calendarYearWeek (calendar week when the campaign was active [usually they're active for 5 days])
countryOrigin (in which country was the product sold, as it's international sales)
valueLocal (price in local currency)
What I need to do is to find products sold in different countries and compare their prices between markets.
Sometimes the campaigns are available only in one country, sometimes in more, so to avoid having hundreds of thousands of unnecessary rows that I can't compare to others, I want to distill only those products that were available in more than 1 countryOrigin.
What's important - a product can be available in different campaigns with a different price.
That's why in my SELECT statement I added a new column:
calendarYearWeek||productID||campaignID AS uniqueItem - that way I know that I'm checking the price only for a specific product in a specific campaign during a specific week of year.
The table is also joined with another table to get exchange rates etc., and the result is GROUPed BY, so each row gives me the price and the average exchange rate for a given uniqueItem in a specific country.
If I run this query it works, but even for this year alone it returns several million rows, most of which I don't need, because they are products sold in only one country and I need to compare prices across different markets.
So what I thought I need is to assign to each row the number of times its uniqueItem value appears in the whole table. If it's 1, the product is sold in only one country and I don't have to care about it. If it's 2 or 3, this is what I need. Then I can filter out the unnecessary results in the WHERE clause (> 1) and work on a smaller, better data set.
I tried different combinations of COUNT, and I tried ROW_NUMBER() with OVER (PARTITION BY). That works only partially: when a product is available in 2 or more countries it numbers the rows, but I still can't filter on > 1, because then I lose the "first" country in each group. I also thought about MATCH_RECOGNIZE, but I've never used it before and I don't think it's available in Vertica.
Sorry if it's messy, but I'm not really advanced in SQL and English is not my native language.
Do you have any ideas how to get only the data I need?
What I have now is:
SELECT
a.originCountry,
a.calendarYearWeek,
a.productID,
a.campaignId,
a.valueLocal,
ROUND(AVG(b.exchange_rate),4),
a.calendarYearWeek||a.productID||a.campaignID AS uniqueItem
FROM table1 a
LEFT JOIN table2 b
ON a.reportDate = b.reportDate
AND a.originCountry = b.originCountry
WHERE a.originCountry IN ('ES', 'DE', 'FR')
GROUP BY 3, 4, 7, 1, 5, 2
ORDER BY 3, 4, 1
----------
I need some sample data, so I made up a few rows.
You need to find, in a subselect or a common table expression, the identifying grouping columns of the combinations that occur more than once, and join that back with table1.
You need to formulate the average as an OLAP function if you want the country back in the report.
WITH
-- input, don't use in final query ..
table1(originCountry,calendarYearWeek,productID,campaignId,valuelocal,reportDate) AS (
SELECT 'ES',202203,43,142,100.50, DATE '2022-01-19'
UNION ALL SELECT 'DE',202203,43,142,135.00, DATE '2022-01-19'
UNION ALL SELECT 'FR',202203,43,142, 98.75, DATE '2022-01-19'
UNION ALL SELECT 'ES',202203,44,147,198.75, DATE '2022-01-19'
UNION ALL SELECT 'DE',202203,44,147,205.00, DATE '2022-01-19'
UNION ALL SELECT 'FR',202203,44,147,198.75, DATE '2022-01-19'
UNION ALL SELECT 'es',202203,49,150, 1.25, DATE '2022-01-19'
)
,
table2(originCountry,reportDate,exchange_rate) AS (
SELECT 'ES',DATE '2022-01-19', 1
UNION ALL SELECT 'DE',DATE '2022-01-19', 1
UNION ALL SELECT 'FR',DATE '2022-01-19', 1
)
-- end of input; real query starts here, replace following comma with "WITH" ..
,
-- you need the unique ident grouping values to join with ..
selgrp AS (
SELECT
a.calendarYearWeek
, a.productID
, a.campaignId
FROM table1 a
GROUP BY
a.calendarYearWeek
, a.productID
, a.campaignId
HAVING COUNT(*) > 1
-- chk calendarYearWeek | productID | campaignId
-- chk -----------------+-----------+------------
-- chk           202203 |        43 |        142
-- chk           202203 |        44 |        147
)
SELECT
a.originCountry
, a.calendarYearWeek
, a.productID
, a.campaignId
, a.valueLocal
, (AVG(b.exchange_rate) OVER w)::NUMERIC(9,4) AS avg_exch_rate
-- a.calendarYearWeek||a.productID||a.campaignID AS uniqueItem
FROM table1 a
JOIN selgrp USING(calendarYearWeek,productID,campaignId)
LEFT JOIN table2 b
ON a.reportDate = b.reportDate
AND a.originCountry = b.originCountry
WHERE UPPER(a.originCountry) IN ('ES', 'DE', 'FR')
WINDOW w AS (PARTITION BY a.calendarYearWeek,a.productID,a.campaignID)
ORDER BY 3, 4, 1
-- out originCountry | calendarYearWeek | productID | campaignId | valueLocal | avg_exch_rate
-- out ---------------+------------------+-----------+------------+------------+---------------
-- out DE | 202203 | 43 | 142 | 135.00 | 1.0000
-- out ES | 202203 | 43 | 142 | 100.50 | 1.0000
-- out FR | 202203 | 43 | 142 | 98.75 | 1.0000
-- out DE | 202203 | 44 | 147 | 205.00 | 1.0000
-- out ES | 202203 | 44 | 147 | 198.75 | 1.0000
-- out FR | 202203 | 44 | 147 | 198.75 | 1.0000
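If you want to stay closer to the analytic attempt from the question, COUNT(*) over the same partition works where ROW_NUMBER() did not: the count is identical on every row of the partition, so filtering on > 1 keeps all the countries instead of dropping the "first" one. An analytic result can't be referenced in the WHERE clause of the same query block, so wrap it in a derived table. A minimal sketch against the same table1 / table2, assuming the GROUP BY leaves one row per country for each calendarYearWeek / productID / campaignId:
SELECT *
FROM (
    SELECT
        a.originCountry
      , a.calendarYearWeek
      , a.productID
      , a.campaignId
      , a.valueLocal
      , ROUND(AVG(b.exchange_rate), 4) AS avg_exch_rate
      , COUNT(*) OVER (PARTITION BY a.calendarYearWeek, a.productID, a.campaignId) AS country_cnt
    FROM table1 a
    LEFT JOIN table2 b
           ON  a.reportDate    = b.reportDate
           AND a.originCountry = b.originCountry
    WHERE a.originCountry IN ('ES', 'DE', 'FR')
    GROUP BY a.originCountry, a.calendarYearWeek, a.productID, a.campaignId, a.valueLocal
) x
WHERE country_cnt > 1
ORDER BY productID, campaignId, originCountry;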
I am trying to improve this query given that it takes a while to run. The difficulty is that the data is coming from one large table and I need to aggregate a few things. First I need to define the ids that I want to get data for. Then I need to aggregate total sales. Then I need to find metrics for some individual sales. This is what the final table should look like:
ID | Product Type | % of Call Sales | % of In Person Sales | Avg Price | Avg Cost | Avg Discount
A | prod 1 | 50 | 25 | 10 | 7 | 1
A | prod 2 | 50 | 75 | 11 | 4 | 2
So % of Call Sales adds up to 100 across the products of each ID (the column sums to 100, not the row), and likewise for % of In Person Sales. I need to define the IDs separately because the result has to be region-independent: someone could make sales in Region A or Region B, but it does not matter, we want to aggregate across Regions. By aggregating in subqueries and using a WHERE clause to pick only the right ids, it should cut down on the memory required.
IDs Query
select distinct ids from tableA as t where year>=2021 and team = 'Sales'
This should be a unique list of ids
Aggregate Call Sales and Person Sales
select ids
,sum(case when sale = 'call' then 1 else 0 end) as call_sales
,sum(case when sale = 'person' then 1 else 0 end) as person_sales
from tableA
where
ids in (select distinct ids from tableA where year >= 2021 and team = 'Sales')
group by ids
The result should look as follows: only the unique ids, but with total sales counted over everything in the table for those ids, essentially ignoring the year/team filter from the first query.
ids| call_sales | person_sales
A | 100 | 50
B | 60 | 80
C | 100 | 200
Main Table as shown above
select ids
,prod_type
,cast(sum(case when sale = 'call' then 1 else 0 end)/CAST(call_sales AS DECIMAL(10, 2)) * 100 as DECIMAL(10,2)) as call_sales_percentage
,cast(sum(case when sale = 'person' then 1 else 0 end)/CAST(person_sales AS DECIMAL(10, 2)) * 100 as DECIMAL(10,2)) as person_sales_percentage
,avg(price) as price
,avg(cost) as cost
,avg(discount) as discount
from tableA as A
where
...conditions...
group by
...conditions...
You can combine the first two queries as:
select ids, sum( sale = 'call') as call_sales,
sum(sale = 'person') as person_sales
from tableA
group by ids
having sum(year >= 2021 and team = 'Sales') > 0;
I'm not exactly sure what the third query is doing, but you can use the above as a CTE and just join it in.
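For instance, a sketch only, using the portable CASE form rather than the boolean shorthand, and taking prod_type / price / cost / discount from your third query; NULLIF guards against a division by zero when an id has no call or in-person sales:
with totals as (
    select ids
         , sum(case when sale = 'call'   then 1 else 0 end) as call_sales
         , sum(case when sale = 'person' then 1 else 0 end) as person_sales
    from tableA
    group by ids
    having sum(case when year >= 2021 and team = 'Sales' then 1 else 0 end) > 0
)
select a.ids
     , a.prod_type
     , cast(sum(case when a.sale = 'call'   then 1 else 0 end) * 100.0
            / nullif(t.call_sales, 0)   as decimal(10,2)) as call_sales_percentage
     , cast(sum(case when a.sale = 'person' then 1 else 0 end) * 100.0
            / nullif(t.person_sales, 0) as decimal(10,2)) as person_sales_percentage
     , avg(a.price)    as price
     , avg(a.cost)     as cost
     , avg(a.discount) as discount
from tableA a
join totals t
  on t.ids = a.ids
group by a.ids, a.prod_type, t.call_sales, t.person_sales;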
In a doctrine query, how can I group by only some of the variables in a composite primary key? In the case below, Category and Product (but not Iterator)?
Example
The table below should keep a history of price changes over time (Transactions). I am trying to write a Doctrine query to return the latest dated record that precedes a reference date:
Category (PK) | Product (PK) | Iterator (PK) | CreateDT   | Amount
1             | 1            | 1             | 2016-01-01 | 100
1             | 1            | 2             | 2016-04-01 | 150
1             | 2            | 1             | 2016-09-01 | 50
2             | 1            | 1             | 2016-01-01 | 75
2             | 1            | 2             | 2016-01-31 | 80
2             | 1            | 3             | 2016-09-01 | 90
Using the above as reference, the result I am trying to get is one record per Category/Product as below:
a). Considering Category=1 and comparing against a reference date 2016-02-01:
Category (PK) | Product (PK) | Iterator (PK) | CreateDT   | Amount
1             | 1            | 1             | 2016-01-01 | 100
1             | 2            | 1             | 2016-09-01 | 50
b). Only comparing against a reference date 2016-02-01:
Category (PK) | Product (PK) | Iterator (PK) | CreateDT   | Amount
1             | 1            | 1             | 2016-01-01 | 100
1             | 2            | 1             | 2016-09-01 | 50
2             | 1            | 2             | 2016-01-31 | 80
Code
My attempt is below, but it does not work (in an ideal world, I would like $category to be optional):
public function findActiveIterationByDate($category, $refdate)
{
return $this->getEntityManager()
->createQuery(
'SELECT b3
FROM AppBundle:Transaction t1
INNER JOIN
(SELECT t2
FROM AppBundle:Transaction t2, max(createDT) as createDT
WHERE t2.Category=$category, t2.createDT <= $refdate
GROUP BY t2.Category, t2.Product) as t3
ON t1.Category=t3.Category AND t1.Product=t3.Product'
)
->getResult();
}
Error
However, attempting this gives me the below error - I'm not sure if the problem is my INNER JOIN or whether DQL is not able to group by part of a composite primary key?
[Doctrine\ORM\Query\QueryException]
[Semantical Error] line 0, col 102 near '(SELECT t2
': Error: Class '(' is not defined.
The error is because you can't use a subquery in a JOIN clause. You can see it in the DQL syntax declaration, more specifically the Join and JoinAssociationDeclaration definitions.
I think you could move the INNER JOIN into a subquery in the WHERE clause and make the WHERE conditions more restrictive if possible, even though this would probably decrease performance.
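For example, a correlated subquery in the WHERE clause can replace the join. A sketch only, assuming the field names from your attempt (Category, Product, createDT) and binding $category and $refdate as parameters:
SELECT t1
FROM AppBundle:Transaction t1
WHERE t1.Category = :category
  AND t1.createDT <= :refdate
  AND t1.createDT = (
        SELECT MAX(t2.createDT)
        FROM AppBundle:Transaction t2
        WHERE t2.Category = t1.Category
          AND t2.Product  = t1.Product
          AND t2.createDT <= :refdate
      )
If you want $category to be optional, build this with the QueryBuilder and only add the Category condition when a category is actually passed.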
Maybe even better would be to write a native SQL query and map the result to Doctrine entities, which is relatively simple and well documented. See http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/native-sql.html#the-resultsetmapping.
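The underlying SQL for that could look like the following sketch (the table and column names are assumed to mirror the entity; :refdate and :category are bound parameters, and passing NULL for :category makes the filter optional):
SELECT t1.*
FROM transaction t1
INNER JOIN (
    SELECT Category, Product, MAX(CreateDT) AS CreateDT
    FROM transaction
    WHERE CreateDT <= :refdate
      AND (:category IS NULL OR Category = :category)
    GROUP BY Category, Product
) t3 ON  t1.Category = t3.Category
     AND t1.Product  = t3.Product
     AND t1.CreateDT = t3.CreateDT
A ResultSetMapping (or ResultSetMappingBuilder) then maps the t1.* columns back onto the Transaction entity, as described in the linked documentation.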
I'm new to mysql so please help me out with this.
I have a table containing the following columns:
nr  | date     | hour  | user   | shop | brand  | categ   | product | price | promo
183 | 02/03/14 | 17:06 | cristi | 186  | brand1 | categ 1 | prod 1  | 299   | no
184 | 02/03/14 | 17:06 | cristi | 186  | brand2 | categ 2 | prod 2  | 399   | yes
184 | 01/03/14 | 17:06 | cristi | 186  | brand3 | categ 3 | prod 3  | 199   | no
The query that I use is
SELECT *
FROM evaluari
WHERE magazin = %s HAVING MAX(date)
Where "s" is the shop ID (186).
but that return only the first row containing 02/03/14 date. How can I show both/all rows containing the same max date?
Try not to name columns with reserved words like "date"; it might cause you problems.
You can do what you want like this:
SELECT * FROM evaluari WHERE magazin = 186 AND date = (SELECT MAX(date) from evaluari WHERE magazin = 186)
Probably not optimal, but at first swing, you could do this:
SELECT * FROM evaluari
where date IN (SELECT MAX(date) FROM evaluari WHERE magazin = %s)
AND magazin = %s;
In fact, this really rubs me as nasty... going to try to figure something smoother. Stay tuned :)
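If the server is MySQL 8.0 or newer, a window function avoids scanning evaluari twice: rows that tie on the latest date all get rank 1, so every row with the max date is kept. A sketch, using the magazin and date columns from the question:
SELECT ranked.*
FROM (
    SELECT e.*,
           RANK() OVER (ORDER BY e.date DESC) AS rnk
    FROM evaluari e
    WHERE e.magazin = 186
) AS ranked
WHERE ranked.rnk = 1;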
I have a query here; can anyone help me count the total duplicated items?
SELECT *
FROM item
INNER JOIN itemgroup on item.itemgroupid = itemgroup.itemgroupid
INNER JOIN status on status.statusid = item.status
INNER JOIN owner on owner.ownerid = item.owner
INNER JOIN
(
SELECT code --, (SELECT count(*) FROM item WHERE ....) AS total_duplicateds
FROM item
GROUP BY code
HAVING count(code) > 1
) dup ON item.code = dup.code
Total items: 500
Total items with duplicated codes: 149
Now I get 149 rows returned; how can I add this count as a new column on each row?
The commented-out subquery is how I learnt to do it, but this is a little higher level for me.
Can someone help me out?
To be even more specific, what I'd like to get returned is something like:
itemid | code | itemname | itemgroup | owner | total_duplicateds
1      | 1000 | X        | 1         | 1     | 3
2      | 1000 | X        | 2         | 2     | 3
3      | 1001 | A        | 1         | 1     | 3
4      | 1000 | B        | 3         | 1     | 3
5      | 1002 | U        | 2         | 1     | 3
Add a COUNT aggregation and GROUP BY all the columns you are interested in.
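For example, the duplicate subquery you already have can carry that count, and the join then puts it on every row, so you don't need to group the outer query at all. A sketch, assuming MySQL and the column names from your query:
SELECT item.*, dup.total_duplicateds
FROM item
INNER JOIN itemgroup ON item.itemgroupid = itemgroup.itemgroupid
INNER JOIN status    ON status.statusid  = item.status
INNER JOIN owner     ON owner.ownerid    = item.owner
INNER JOIN
(
    SELECT code, COUNT(*) AS total_duplicateds   -- how many items share this code
    FROM item
    GROUP BY code
    HAVING COUNT(*) > 1
) dup ON item.code = dup.code;
That returns the per-code count on each row; if you want the single overall total (the 149) instead, COUNT(*) OVER () in the outer SELECT gives the number of rows in the whole result, but that needs MySQL 8.0+ or another database with window functions.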