have data which looks like this :
Sales Invoice #| Invoice Date|Amount|Product
75025134 | 9/20/2019 |220 | A
75025134 | 9/20/2019 |220 | A
75025135 | 9/20/2019 |222 |B
My desired output shall be Product A , Unique Invoice =1, Amount = 220 , we shall only charge customer on Unique per invoice.The system however stores Invoices# on basis of order date, ship to data. How can I normalize this in sql.
Are you simply looking for select distinct?
select distinct sales_invoice, invoice_date, amount, product
from t;
Related
Please note that I'm an absolute n00b in MySQL but somehow I managed to build some (for me) complex queries that work as they should. My main problem now is that for a many of the queries we're working on:
The querie is becoming too big and very hard to see through.
The same subqueries get repeated many times and that is adding to the complexity (and probably to the time needed to process the query).
We want to further expand this query but we are reaching a point where we can no longer oversee what we are doing. I've added one of these subqueries at the end of this post, just as an example.
!! You can fast foward to the Problem section if you want to skip the details below. I think the question can be answered also without the additional info.
What we want to do
Create a MySQL query that calculates purchase orders and forecasts for a given supplier based on:
Sales history in a given period (past [x] months = interval)
Current stock
Items already in backorder (from supplier)
Reserved items (for customers)
Supplier ID
I've added an example of a subquery at the bottom of this message. We're showing just this part to keep things simple for now. The output of the subquery is:
Part number
Units sold
Units sold (outliers removed)
Units sold per month (outliers removed)
Number of invoices with the part number in the period (interval)
It works quite OK for us, although I'm sure it can be optimised. It removes outliers from the sales history (e.g. one customer that orders 50 pcs of one product in one order). Unfortunately it can only remove outliers with substantial data, so if the first order happens to be 50 pcs then it is not considered an outlier. For that reason we take the amount of invoices into account in the main query. The amount of invoices has to exceed a certain number otherwise the system wil revert to a fixed value of "maximum stock" for that product.
As mentioned this is only a small part of the complete query and we want to expand it even further (so that it takes into account the "sales history" of parts that where used in assembled products).
For example if we were to build and sell cars, and we want to place an
order with our tyre supplier, the query calculates the amount of tyres we need to order based on the sales history of the various car models (while also taking into account the stock of the cars, reserved cars and stock of the tyres).
Problem
The query is becomming massive and incomprehensible. We are repeating the same subqueries many times which to us seems highly inefficient and it is the main cause why the query is becomming so bulky.
What we have tried
(Please note that we are on MySQL 5.5.33. We will update our server soon but for now we are limited to this version.)
Create a VIEW from the subqueries.
The main issue here is that we can't execute the view with parameters like supplier_id and interval period. Our subquery calculates the sum of the sold items for a given supplier within the given period. So even if we would build the VIEW so that it calculates this for ALL products from ALL suppliers we would still have the issue that we can't define the interval period after the VIEW has been executed.
A stored procedure.
Correct me if I'm wrong but as far as I know, MySQL only allows us to perform a Call on a stored procedure so we still can't run it against the parameters (period, supplier id...)
Even this workaround won't help us because we still can't run the SP against the parameters.
Using WITH at the beginning of the query
A common table expression in MySQL is a temporary result whose scope is confined to a single statement. You can refer this expression multiple times with in the statement.
The WITH clause in MySQL is used to specify a Common Table Expression, a with clause can have one or more comms-separated subclauses.
Not sure if this would be the solution because we can't test it. WITH is not supported untill MySQL version 8.0.
What now?
My last resort would be to put the mentioned subqueries in a temp table before starting the main query. This might not completely eliminate our problems but at least the main query will be more comprehensible and with less repetition of fetching the same data. Would this be our best option or have I overlooked a more efficient way to tackle this?
Many thanks for your kind replies.
SELECT
GREATEST((verkocht_sd/6*((100 + 0)/100)),0) as 'units sold p/month ',
GREATEST(ROUND((((verkocht_sd/6)*3)-voorraad+reserved-backorder),0),0) as 'Order based on units sold',
SUM(b.aantal) as 'Units sold in period',
t4.verkocht_sd as 'Units sold in period, outliers removed',
COUNT(*) as 'Number of invoices in period',
b.art_code as 'Part number'
FROM bongegs b -- Table that has all the sales records for all products
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code) -- Right Join stock data to also include items that are not in table bongegs (no sales history).
LEFT JOIN artcred ON (artcred.art_code = b.art_code) -- add supplier ID to the part numbers.
LEFT JOIN
(
SELECT
SUM(b.aantal) as verkocht_sd,
b.art_code
FROM bongegs b
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code)
LEFT JOIN artcred ON (artcred.art_code = b.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f" -- Selects only invoices
and artcred.vln = 1 -- 1 = Prefered supplier
and artcred.cred_nr = 9117 -- Supplier ID
and b.aantal < (select * from (SELECT AVG(b.aantal)+3*STDDEV(aantal)
FROM bongegs b
WHERE
b.bon_soort = 'f' and
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)) x)
GROUP BY b.art_code
) AS t4
ON (b.art_code = t4.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f"
and artcred.vln = 1
and artcred.cred_nr = 9117
GROUP BY b.art_code
Bongegs | all rows from sales forms (invoices F, offers O, delivery notes V)
| art_code | bon_datum | bon_soort | aantal |
|:---------|:---------: |:---------:|:------:|
| item_1 | 2021-08-21 | f | 6 |
| item_2 | 2021-08-29 | v | 3 |
| item_6 | 2021-09-03 | o | 2 |
| item_4 | 2021-10-21 | f | 6 |
| item_1 | 2021-11-21 | o | 6 |
| item_3 | 2022-01-17 | v | 6 |
| item_1 | 2022-01-21 | o | 6 |
| item_4 | 2022-01-26 | f | 6 |
Artcred | supplier ID's
| art_code | vln | cred_nr |
|:---------|:----:|:-------:|
| item_1 | 1 | 1001 |
| item_2 | 1 | 1002 |
| item_3 | 1 | 1001 |
| item_4 | 1 | 1007 |
| item_5 | 1 | 1004 |
| item_5 | 2 | 1008 |
| item_6 | 1 | 1016 |
| item_7 | 1 | 1567 |
totvrd | stock
| art_code | voorraad | reserved | backorder |
|:---------|:---------: |:--------:|:---------:|
| item_1 | 1 | 0 | 5 |
| item_2 | 0 | 0 | 0 |
| item_3 | 88 | 0 | 0 |
| item_4 | 9 | 0 | 0 |
| item_5 | 67 | 2 | 20 |
| item_6 | 112 | 9 | 0 |
| item_7 | 65 | 0 | 0 |
| item_8 | 7 | 1 | 0 |
Now, on to the query. You have LEFT JOINs to the artcred table, but then include artcred in the WHERE clause making it an INNER JOIN (required both left and right tables) in the result. Was this intended, or are you expecting more records in the bongegs table that do NOT exist in the artcred.
Well to be honest I was not fully aware that this would essentially form an INNER JOIN but in this case it doesn't really matter. A record that exists in bongegs always exists in artcred as well (every sold product must have a supplier). That doesn't work both ways since a product can be in artcred without ever being sold.
You also have RIGHT JOIN on totvrd which implies you want every record in the TotVRD table regardless of a record in the bongegs table. Is this correct?
Yes it is intended. Otherwise only products with actual sales in the period would end up in the result and we also wanted to include products with zero sales.
One simplification:
and b.aantal < ( SELECT * from ( SELECT AVG ...
-->
and b.aantal < ( SELECT AVG ...
A personal problem: my brain hurts when I see RIGHT JOIN; please rewrite as LEFT JOIN.
Check you RIGHTs and LEFTs -- that keeps the other table's rows even if there is no match; are you expecting such NULLs? That is, it looks like they can all be plain JOINs (aka INNER JOINs).
These might help performance:
b: INDEX(bon_soort, bon_datum, aantal, art_code)
totvrd: INDEX(art_code)
artcred: INDEX(vln, cred_nr, art_code)
Is b the what you keep needing? Build a temp table:
CREATE TEMPORARY TABLE tmp_b
SELECT ...
FROM b
WHERE ...;
But if you need to use tmp_b multiple times in the same query, (and since you are not yet on MySQL 8.0), you may need to make it a non-TEMPORARY table for long enough to run the query. (If you have multiple connections building the same permanent table, there will be trouble.)
Yes, 5.5.33 is rather antique; upgrade soon.
(pre
By getting what I believe are all the pieces you had, I think this query significantly simplifies the query. Lets first start with the fact that you were trying to eliminate the outliers by selecting the standard deviation stuff as what to be excluded. Then you had the original summation of all sales also from the bongegs table.
To simplify this, I have the sub-query ONCE internal that does the summation, counts, avg, stddev of all orders (f) within the last 6 months. I also computed the divide by 6 for per-month you wanted in the top.
Since the bongegs is now all pre-aggregated ONCE, and grouped per art_code, it does not need to be done one after the other. You can use the totals directly at the top (at least I THINK is similar output without all actual data and understanding of your context).
So the primary table is the product table (Voorraad) and LEFT-JOINED to the pre-query of bongegs. This allows you to get all products regardless of those that have been sold.
Since the one aggregation prequery has the avg and stddev in it, you can simply apply an additional AND clause when joining based on the total sold being less than the avg/stddev context.
The resulting query below.
SELECT
-- appears you are looking for the highest percentage?
-- typically NOT a good idea to name columns starting with numbers,
-- but ok. Typically let interface/output name the columns to end-users
GREATEST((b.verkocht_sdperMonth * ((100 + 0)/100)),0) as 'units sold p/month',
-- appears to be the total sold divided by 6 to get monthly average over 6 months query of data
GREATEST( ROUND(
( (b.verkocht_sdperMonth * 3) - v.voorraad + v.reserved - v.backorder), 0), 0)
as 'Order based on units sold',
b.verkocht_sd as 'Units sold in period',
b.AvgStdDev as 'AvgStdDeviation',
b.NumInvoices as 'Number of invoices in period',
v.art_code as 'Part number'
FROM
-- stock, master inventory, regardless of supplier
-- get all products, even though not all may be sold
Voorraad v
-- LEFT join to pre-query of Bongegs pre-grouped by the art_code which appears
-- to be basis of all other joins, std deviation and average while at it
LEFT JOIN
(select
b.arc_code,
count(*) NumInvoices,
sum( b.aantal ) verkocht_sd,
sum( b.aantal ) / 6.0 verkocht_sdperMonth,
avg( b.aantal ) AvgSale,
AVG(b.aantal) + 3 * STDDEV( b.aantal) AvgStdDev
from
bongegs b
JOIN artcred ac
on b.art_code = ac.art_code
AND ac.vln = 1
and ac.cred_nr = 9117
where
-- only for ORDERS ('f') and within last 6 months
b.bon_soort = 'f'
AND b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
group by
b.arc_code ) b
-- result is one entry per arc_code, thus preventing any Cartesian product
ON v.art_code = b.art_code
GROUP BY
v.art_code
I have a scenario where I need to display total number of attendees of an event. With the help of registration form I have already captured the details of people who are attending and my table looks like below.
ID | NAME | PHONE_NUMBER | IS_LIFE_PARTNER_ATTENDING
1 | ABC | 1234567890 | N
2 | PQR | 1234567891 | Y
3 | XYZ | 1234567892 | N
I can easily display number of registrations by using count(id). But while displaying number of attendees I have to consider as two attendees if registrant is coming with his/her partner. (identified by IS_LIFE_PARTNER_ATTEDNING column)
So, in the above case, the number of registrants are 3, but number of attendees are 4, because "PQR" is coming with his/her life partner.
How can we do this in mysql query?
You can use the following query:
SELECT
SUM( 1 + (IS_LIFE_PARTNER_ATTEDNING = 'Y')) AS totalAttendees
FROM your_table;
WORKING DEMO
Since boolean expression resolves into 0/1 in MySQL so that you can capitalize this in your case.
Note:
SUM(a=b) returns 1 only if a is equal to b otherwise it returns 0
Caution:
*Never underestimate these parentheses (IS_LIFE_PARTNER_ATTEDNING = 'Y'). If you omit them then the whole summation would result in zero(0).
* because of operator precedence
Use SUM with CASE
SELECT
Name,
SUM(CASE WHEN IS_LIFE_PARTNER_ATTEDNING='y' THEN 2 ELSE 1 END ) AS'Attendes'
FROM
table
GROUP by name
I have the following table:
JobCode | Designation | SalaryWithIncrement
----------------------------------------------
JC001 | IT | 150,000
JC001 | IT | 155,000
JC002 | Sales | 100,000
JC003 | HR | 200,000
JC003 | HR | 210,000
JC003 | HR | 220,000
Required output:
JobCode | Designation | SalaryWithIncrement
------------------------------------------------
JC001 | IT | 305,000
JC002 | Sales | 100,000
JC003 | HR | 630,000
Below is the code I used, but I don't get grand total after grouping
SELECT JobCode, designation, salaryWithIncrement
FROM table1
group by (JobCode)
Any help is appreciated.
You can use the aggregate sum function:
SELECT JobCode, designation, SUM(salaryWithIncrement)
FROM table1
GROUP BY JobCode, designation
In most cases when you have such requirement, Implement using GroupBY with SQL Aggregate function
Group the Fields based on what you want records, in youu case JobCode and designation
You can learn about Group BY here MSDN
SELECT JobCode,designation,SUM(salaryWithIncrement)
FROM Job GROUP BY JobCode,designation
Here is your sample working code SQL Fiddle
For this, you have to use the sum function in a group by statement.
select jobCode, designation, sum(salaryWithInc) from Job group by jobCode;
Check this link to see a working example.
In my opinion, you have to restructure differently your table to avoid the data redundancy (jobCode and designation). For this you can have two tables, one with the jobCode and designation and the other with the salaryWithInc.
create table Job (jobId int auto_increment, jobCode varchar(5), designation varchar(25), primary key (jobId));
create table Salary (job int, salaryWithInc decimal(5,2), foreign key (job) references Job (jobId));
insert into Job (jobCode, designation) values ("JC001", "IT"),("JC002", "Sales"), ("JC003", "HR");
insert into Salary values (1,150.00), (1,155.00),(2,100.00),(3,200.00),(3,210.00),(3,220.00);
In this case, you use this query to get the required result:
select J.jobCode, J.designation, sum(S.salaryWithInc) from Job as J join Salary as S on J.jobId=S.job group by (J.jobId);
Check this link to see a working example.
Hope it's useful!
I'm working on manipulating a product data feed, and am currently working on grouping the related products. I've almost got things where I want them, but, like a mediocre racing driver, I've run out of skill right when I need it the most.
To illustrate my problem I've created a simplified version. Here is the data structure:
CREATE TABLE `feed` (
`sku` VARCHAR(10),
`price` DECIMAL(6,2),
`groupkey` VARCHAR(10)
);
INSERT INTO `feed` (`sku`, `price`, `groupkey`) VALUES
('AAA', 10.00, NULL),
('BBB', 10.00, 'group1'),
('CCC', 12.00, 'group1'),
('DDD', 10.00, 'group2'),
('EEE', 12.00, 'group2'),
('FFF', 14.00, 'group2'),
('GGG', 20.00, NULL);
My current query is:
SELECT feed.groupkey
, group_concat(feed.sku) AS skus
, group_concat(feed.price) AS prices
, feed.price AS pprice
FROM
feed
WHERE
feed.groupkey IS NOT NULL
GROUP BY
feed.groupkey;
The query returns the following rows:
+----------+-------------+-------------------+--------+
| groupkey | skus | prices | pprice |
+----------+-------------+-------------------+--------+
| group1 | BBB,CCC | 10.00,12.00 | 10.00 |
| group2 | DDD,EEE,FFF | 10.00,12.00,14.00 | 10.00 |
+----------+-------------+-------------------+--------+
What I actually need to do is subtract pprice from each concatenated price, giving me the price difference between each sku, rather than their absolute prices. This would return the dream result:
+----------+-------------+-------------------+--------+
| groupkey | skus | prices | pprice |
+----------+-------------+-------------------+--------+
| group1 | BBB,CCC | 0.00,2.00 | 10.00 |
| group2 | DDD,EEE,FFF | 0.00,2.00,4.00 | 10.00 |
+----------+-------------+-------------------+--------+
I've spent a lot of time on this feed in general, and am really stuck on what is probably the last hurdle in the integration. I'd really appreciate some guidance to help me in the right direction.
edit: I'm using the results from this query as "virtual" product rows, to serve as parents for the products in the group.
You can just do the subtraction in the group_concat(), for something like:
SELECT feed.groupkey, group_concat(feed.sku) AS skus,
group_concat(feed.price - min(feed.price)) AS prices
min(feed.price) AS pprice
FROM feed
WHERE feed.groupkey IS NOT NULL
GROUP BY feed.groupkey
The problem is . . . which feed.price? The value returned in your original query is an arbitrary value from one of the rows in the group. Thinking that you might want the difference over the minimum, I used that value.
I think the best way to write the query is:
SELECT feed.groupkey, group_concat(feed.sku) AS skus,
group_concat(feed.price - fsum.minprice) AS prices
min(feed.price) AS pprice
FROM feed left outer join
(select groupkey, MIN(feed.price) as minprice
from feed
group by groupkey
) fsum
on feed.groupkey = fsum.groupkey
WHERE feed.groupkey IS NOT NULL
GROUP BY feed.groupkey
You CANNOT assume the ordering for hidden columns and group_concat(). The documentation is quite explicit on this point:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values the server chooses.
If you want things in a particular order, then you need to be sure the structure is queried properly. That said, it often works in practice, but there is no guarantee.
I have a db table named APPLICATION that has the following columns
applicationnumber varchar,
createddate datetime,
applicantname varchar,
material varchar,
location varchar
I am needed to write a query that would display the number of applications created for each month for the locations
Eg. The query result should be something like below
Location | Jan2012 | Feb2012 | Mar2012 | Apr2012
-----------------------------------------------------------------
London | 34322342 | 4342424 | 54353454 | 5434
Chicago| 43242345 | 9943455 | 85748294 | 544
The result is the number of applications created in each month for the specific location.
Each column will execute the same query logic, with just the month changing.
I tried using the MONTH() function, but I need the month matrix as a column and not as a row.
You can use the DATE_FORMAT function on the createddate field to only get the month and year and then GROUP BY location. Something like this should do it:
SELECT
`location`,
COUNT(`applicationnumber`),
DATE_FORMAT(`createddate`, '%M %Y') AS `date`
FROM `APPLICATION`
GROUP BY `location`;
Should give a result with the count of rows per month.