SQL - Select distinct column without excluding rows - mysql

I'm running a query that aggregates sales information by either category or subcategory within some date range. I was asked to add budget information to it for a report that displays the information fetched by this query.
SELECT
DATE_FORMAT(B.TxnDate, "%Y-%m") AS FormattedTxnDate,
SUM(B.Quantity) AS QuantitySum,
SUM(B.Quantity * B.Amount) AS Revenue,
SUM(CASE WHEN B.AverageCost = 0 THEN (B.Quantity * C.PurchaseCost) ELSE (B.Quantity *
B.AverageCost) END) AS COGS,
A.CustomerRefFullName,
SUM(D.Budget_2018) AS Budget,
D.Brand, D.Category, D.Subcategory, D.ProductManager, C.VendorRefFullName
FROM
qb_invoice_info A, qb_invoice_line_info B, qb_item_info C, qb_item_group D
WHERE
A.TxnID = B.TxnID
AND
B.Item_ListID = C.ListID
AND
C.Parent_ListID = D.ListID
AND
(C.Type = "Inventory" OR C.Type = "InventoryAssembly")
AND
B.TxnDate BETWEEN ? AND ?
GROUP
BY D.Category, YEAR(B.TxnDate), QUARTER(B.TxnDate)
ORDER
BY D.Category ASC, YEAR(B.TxnDate) ASC, QUARTER(B.TxnDate) ASC
Every subcategory has its own budget amount. The problem is that some subcategories share all of the same information except for their unique IDs. A few records might look like this within the qb_item_group table.
qb_item_group
id | Category | Subcategory | Budget
------------------------------------
1A | Lights | DMX | 4000
1B | Lights | DMX | 4000
1C | Lights | DMX | 4000
2A | Lights | Flash | 5000
3A | Lights | Bulbs | 1000
In this case, the total budget for lights would be 10,000 because we ignore two of the DMX budgets. I tried SUM(DISTINCT D.Budget_2018) AS Budget earlier today, but it failed as I expected because it only adds unique budget values. How can I adapt the query above so that I can retrieve all sales records by either category or subcategory but still get a total budget that is the sum of all unique subcategories under the parent category?

SELECT
DATE_FORMAT(B.TxnDate, "%Y-%m") AS FormattedTxnDate,
SUM(B.Quantity) AS QuantitySum,
SUM(B.Quantity * B.Amount) AS Revenue,
SUM(
CASE
WHEN B.AverageCost = 0 THEN (B.Quantity * C.PurchaseCost)
ELSE (B.Quantity * B.AverageCost)
END
) AS COGS,
A.CustomerRefFullName,
COALESCE(category_budgets.budget, 0) AS budget,
D.Brand,
D.Category,
D.Subcategory,
D.ProductManager,
C.VendorRefFullName
FROM
qb_invoice_info A,
qb_invoice_line_info B,
qb_item_info C,
qb_item_group D
LEFT JOIN
(
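SELECT
a.category,
SUM(a.budget) as budget
FROM
(
-- keep subcategory in the DISTINCT list so that two different
-- subcategories with the same budget are both counted
SELECT DISTINCT
category,
subcategory,
budget
FROM
budgets
) a
GROUP BY
a.category
) category_budgets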
ON
category_budgets.category = D.category
WHERE
A.TxnID = B.TxnID
AND
B.Item_ListID = C.ListID
AND
C.Parent_ListID = D.ListID
AND
(C.Type = "Inventory" OR C.Type = "InventoryAssembly")
AND
B.TxnDate BETWEEN ? AND ?
GROUP BY
D.Category, YEAR(B.TxnDate), QUARTER(B.TxnDate)
ORDER BY
D.Category ASC, YEAR(B.TxnDate) ASC, QUARTER(B.TxnDate) ASC
;
You can left join against a subquery that sums the distinct budgets per category. This returns all of the output rows you want; categories with no entries in the budgets table get a budget of 0. Good luck!
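As a minimal sketch of the deduplicate-then-sum step, the snippet below loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module and computes one budget per category. The table layout follows the sample in the question (with the column named Budget rather than Budget_2018), and SQLite stands in for MySQL here, so treat it as an illustration rather than the exact production query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE qb_item_group (id TEXT, Category TEXT, Subcategory TEXT, Budget INTEGER);
INSERT INTO qb_item_group VALUES
  ('1A', 'Lights', 'DMX',   4000),
  ('1B', 'Lights', 'DMX',   4000),
  ('1C', 'Lights', 'DMX',   4000),
  ('2A', 'Lights', 'Flash', 5000),
  ('3A', 'Lights', 'Bulbs', 1000);
""")

# Deduplicate on (Category, Subcategory, Budget) first, then sum per category.
category_budgets = conn.execute("""
    SELECT d.Category, SUM(d.Budget) AS Budget
    FROM (SELECT DISTINCT Category, Subcategory, Budget FROM qb_item_group) AS d
    GROUP BY d.Category
""").fetchall()

print(category_budgets)
```

Including Subcategory in the DISTINCT list is the design choice that matters: two different subcategories that happen to share a budget value are still counted separately.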

Related

Retrieving top 2 bids for each item in an auction

I am having trouble combining data from multiple tables. I have tried joins and subqueries but to no avail. I basically need to combine 2 queries into one. My tables (simplified):
Stock:
id int(9) PrimaryIndex
lot_number int(4)
description text
reserve int(9)
current_bid int(9)
current_bidder int(6)
Members:
member_id int(11) PrimaryIndex
name varchar(255)
Bids:
id int(9)
lot_id int(9)
bidder_id int(5)
max_bid int(9)
time_of_bid datetime
I'm currently using 2 separate queries, which is very inefficient with thousands of lots. 1st query:
SELECT S.id, S.lot_number, S.description, S.reserve FROM stock S ORDER BY
S.lot_number ASC
The 2nd query within a while loop then gets the bidding info:
SELECT DISTINCT B.bidder_id, B.lot_id, B.max_bid, B.time_of_bid,
M.fname, M.lname FROM bids B, members M WHERE B.lot_id=? AND
B.bidder_id=M.member_id ORDER BY B.max_bid DESC LIMIT 2
Below is what I would like as output from a single query, if possible:
Lot No. | Reserve | Current Bid | 1st Max Bid | 1st Bidder | 2nd Max Bid | 2nd Max Bidder
1 | $100 | $120 | $150 | Steve | $110 | John
2 | $500 | $650 | $900 | Tom | $600 | Paul
I have had partial success with just getting the MAX(B.bid) and then its related details (WHERE S.id=B.id), but I can't get the top 2 bids for each lot.
First assign a row number rn to rows within each group of lot_id in table bids (highest bid gets 1, 2nd highest bid gets 2 and so on). The highest bid and second highest bid will be on two different rows after the LEFT JOIN. Use GROUP BY to merge the two rows into one.
select s.lot_number, s.reserve, s.current_bid,
max( case when rn = 1 then b.max_bid end) as first_max_bid,
max( case when rn = 1 then m.name end) as first_bidder,
max( case when rn = 2 then b.max_bid end) as second_max_bid,
max( case when rn = 2 then m.name end ) as second_bidder
from
stock s
left join
(select * from
(select *,
(@rn := if(@lot_id = lot_id, @rn+1,
if( @lot_id := lot_id, 1, 1))) as rn
from bids cross join
(select @rn := 0, @lot_id := -1) param
order by lot_id, max_bid desc
) t
where rn <= 2) b
on s.lot_number = b.lot_id
left join members m
on b.bidder_id = m.member_id
group by s.lot_number, s.reserve, s.current_bid
order by s.lot_number
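The row-numbering step can also be written with a window function instead of user variables on engines that support them (MySQL 8.0 and later, or SQLite 3.25 and later). The sketch below swaps in ROW_NUMBER() for the counter trick and runs against an in-memory SQLite database through Python's sqlite3 module. The bids table is reduced to three columns, and the bidder ids and the third bid on lot 1 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bids (lot_id INTEGER, bidder_id INTEGER, max_bid INTEGER);
INSERT INTO bids VALUES (1, 10, 150), (1, 11, 110), (1, 12, 90),
                        (2, 13, 900), (2, 14, 600);
""")

# Number the bids within each lot (highest bid gets 1), keep the top two,
# then fold the two rows per lot into one row with conditional aggregation.
top_two = conn.execute("""
    SELECT lot_id,
           MAX(CASE WHEN rn = 1 THEN max_bid END) AS first_max_bid,
           MAX(CASE WHEN rn = 2 THEN max_bid END) AS second_max_bid
    FROM (SELECT lot_id, max_bid,
                 ROW_NUMBER() OVER (PARTITION BY lot_id ORDER BY max_bid DESC) AS rn
          FROM bids) AS ranked
    WHERE rn <= 2
    GROUP BY lot_id
    ORDER BY lot_id
""").fetchall()

print(top_two)
```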

Select MAX value with restriction to rows

I have 3 tables:
matchdays:
matchday_id | season_id | userid | points | matchday
----------------------------------------------------
1 | 1 | 1 | 33 | 1
2 | 1 | 2 | 45 | 1
etc
players
userid | username
-----------------
1 | user1
2 | user2
etc.
seasons
seasons_id | title | userid
----------------------------
1 | 2011 | 3
2 | 2012 | 10
3 | 2013 | 5
My query:
SELECT s.title, p.username, SUM(points) FROM matchdays m
INNER JOIN players p ON p.userid = m.userid
INNER JOIN seasons s ON m.userid = s.userid
group by s.season_id
This results in (example!):
title | username | SUM(points)
------------------------------
2011 | user3 | 3744
2012 | user10 | 3457
2013 | user5 | 3888
What it should look like is a table with the winner (max points) of every season. Right now, the title and username are correct, but the sum of the points is way too high. I couldn't figure out which sum is being calculated. Ideally, the sum is the total of every matchday of a season for each user.
Your main issue is that you group by seasons only. Thus your SUM is running on all points over a season, regardless of the player.
The whole approach is wrong anyway. The "flaw" with userid in the season table is your biggest issue, and you seem to know it.
I will explain how to calculate your rankings in the database once and for all and have them at your disposal at all times, which will save you a lot of headaches as well as some CPU and loading time.
Start by creating a new table "Rankings":
CREATE table rankings (season_id INT, userid INT, points INT, rank INT)
If you have a lot of players, index all columns but points.
Then, populate the table for each season.
This is a one-shot operation to run each time a season has ended.
So for the time being, you will have to run it once for each existing season.
The key here is to compute the rank of each player for the season, which will be very handy later. Because MySQL versions before 8.0 don't have a window function for that, we have to use an old trick: incrementing a counter.
Let me break it down.
This will compute the points of a season, and provide the ranking for that season:
SELECT season_id, userid, SUM(points) as points
FROM matchdays
WHERE season_id = 1
GROUP BY season_id, userid
ORDER BY points DESC
Now we adapt this query to add a rank column :
SELECT
season_id, userid, points,
@curRank := @curRank + 1 AS rank
FROM
(
SELECT season_id, userid, SUM(points) as points
FROM matchdays
WHERE season_id = 1
GROUP BY season_id, userid
) T,
(
SELECT @curRank := 0
) R
ORDER BY T.points DESC
That's it.
Now we can INSERT the results of this computation into our ranking table, to store it once for good :
INSERT INTO rankings
SELECT
season_id, userid, points,
@curRank := @curRank + 1 AS rank
FROM
(
SELECT season_id, userid, SUM(points) as points
FROM matchdays
WHERE season_id = 1
GROUP BY season_id, userid
) T,
(
SELECT @curRank := 0
) R
ORDER BY T.points DESC
Change the season_id = 1 and repeat for each season.
Save this query somewhere, and in the future, run it once each time a season has ended.
Now you have a proper database-computed ranking and a nice ranking table that you can query whenever you want.
You want the winner for each season? It's as simple as this:
SELECT S.title, P.username, R.points
FROM rankings R
INNER JOIN seasons S ON R.season_id=S.season_id
INNER JOIN players P ON R.userid=P.userid
WHERE R.rank = 1
You will discover over time that you can do a lot of different things very simply with your rankings table.
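On engines with window functions (MySQL 8.0 and later, SQLite 3.25 and later), the per-season ranking can be computed for all seasons in one pass instead of one run per season. The following is a sketch under that assumption, run against an in-memory SQLite database via Python's sqlite3 module. The matchday rows beyond the two shown in the question are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matchdays (season_id INTEGER, userid INTEGER, points INTEGER, matchday INTEGER);
INSERT INTO matchdays VALUES
  (1, 1, 33, 1), (1, 2, 45, 1), (1, 1, 50, 2), (1, 2, 30, 2),
  (2, 1, 20, 1), (2, 2, 25, 1);
""")

# Inner query: total points per (season, user).
# Middle query: rank users inside each season by those totals.
# Outer query: keep only the winner of each season.
winners = conn.execute("""
    SELECT season_id, userid, points
    FROM (SELECT season_id, userid, points,
                 RANK() OVER (PARTITION BY season_id ORDER BY points DESC) AS rnk
          FROM (SELECT season_id, userid, SUM(points) AS points
                FROM matchdays
                GROUP BY season_id, userid) AS totals) AS ranked
    WHERE rnk = 1
    ORDER BY season_id
""").fetchall()

print(winners)
```

RANK() gives tied players the same rank, so a season with a shared top score would return more than one winner row.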
Your join is wrong; try something like:
SELECT s.title, p.username, SUM(m.points) as points FROM matchdays m
JOIN players p ON p.userid = m.userid
JOIN seasons s ON m.season_id = s.season_id
group by s.season_id, p.userid
ORDER by points DESC;
As pointed out, userid doesn't belong in (and is not needed in) the 'seasons' table.

MySQL SELECT to Rank A Number out of a set of numbers

I have rows of data from a SELECT query with a few prices (say three for this example): our price, the competitor1 price, and the competitor2 price. I want to add a column that gives the rank of our price compared to the other two prices. If our price is the lowest, the rank would be 1; if it is the highest, the rank would equal the number of prices it is compared against.
Something like this:
Make | Model | OurPrice | Comp1Price | Comp2Price | Rank | OutOf
MFG1 MODEL1 350 100 500 2 3
MFG1 MODEL2 50 100 100 1 3
MFG2 MODEL1 100 NULL 50 2 2
MFG2 MODEL2 9999 500 NULL 2 2
Sometimes the competitor price will be NULL, as seen above, and I believe this is where my issue lies. I have tried a CASE and it works with only one competitor, but when I add an AND condition, all the ranks come out as NULL. Is there a better way of doing this through a MySQL query?
SELECT
MT.MAKE as Make,
MT.MODEL as Model,
MT.PRICE as OurPrice,
CT1.PRICE as Comp1Price,
CT2.PRICE as Comp2Price,
CASE
WHEN MT.PRICE < CT1.PRICE AND MT.PRICE < CT2.PRICE
THEN 1 END AS Rank,
(CT1.PRICE IS NOT NULL) + (CT2.PRICE IS NOT NULL) + 1 as OutOf
FROM mytable MT
LEFT JOIN competitor1table as CT1 ON CT1.MODEL = MT.MODEL
LEFT JOIN competitor2table as CT2 ON CT2.MODEL = MT.MODEL
ORDER BY CLASS
Not tested, but you can try:
SELECT
a.MAKE AS Make,
a.MODEL AS Model,
a.PRICE AS OurPrice,
MAX(CASE WHEN a.compnum = 1 THEN pricelist END) AS Comp1Price,
MAX(CASE WHEN a.compnum = 2 THEN pricelist END) AS Comp2Price,
FIND_IN_SET(a.PRICE, GROUP_CONCAT(a.pricelist ORDER BY a.pricelist)) AS Rank,
COUNT(a.pricelist) AS OutOf
FROM
(
SELECT MAKE, MODEL, PRICE, PRICE AS pricelist, 0 AS compnum
FROM mytable
UNION ALL
SELECT a.MAKE, a.MODEL, a.PRICE, CT1.PRICE, 1
FROM mytable a
LEFT JOIN competitor1table CT1 ON a.MODEL = CT1.MODEL
UNION ALL
SELECT a.MAKE, a.MODEL, a.PRICE, CT2.PRICE, 2
FROM mytable a
LEFT JOIN competitor2table CT2 ON a.MODEL = CT2.MODEL
) a
GROUP BY
a.MAKE, a.MODEL
(CT1.PRICE IS NOT NULL AND CT1.PRICE < MT.PRICE) + (CT2.PRICE IS NOT NULL AND CT2.PRICE < MT.PRICE) + 1 as Rank
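The boolean-sum expression above can be checked against the sample rows from the question. This sketch uses an in-memory SQLite database through Python's sqlite3 module, where comparisons evaluate to 0 or 1 just as they do in MySQL. The single flattened prices table and its column names are assumptions made to keep the example short; the original query joins three tables instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (make TEXT, model TEXT, our_price INTEGER, comp1 INTEGER, comp2 INTEGER);
INSERT INTO prices VALUES
  ('MFG1', 'MODEL1',  350,  100,  500),
  ('MFG1', 'MODEL2',   50,  100,  100),
  ('MFG2', 'MODEL1',  100, NULL,   50),
  ('MFG2', 'MODEL2', 9999,  500, NULL);
""")

# Rank = 1 + number of non-NULL competitor prices that undercut ours.
# OutOf = 1 + number of non-NULL competitor prices.
rows = conn.execute("""
    SELECT make, model,
           (comp1 IS NOT NULL AND comp1 < our_price)
         + (comp2 IS NOT NULL AND comp2 < our_price) + 1 AS price_rank,
           (comp1 IS NOT NULL) + (comp2 IS NOT NULL) + 1 AS out_of
    FROM prices
    ORDER BY make, model
""").fetchall()

for row in rows:
    print(row)
```

The IS NOT NULL guard is what keeps a missing competitor price from turning the whole sum into NULL, which is the failure described in the question.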

MySQL - Complex COUNT Query

I have a table called user_scores as below:
id | af_id | uid | level | record_date
----------------------------------------
1 | 1.1 | 1 | 3 | 2012-01-01
2 | 1.1 | 1 | 4 | 2012-02-01
3 | 1.2 | 1 | 3 | 2012-01-01
4 | 1.2 | 1 | 5 | 2012-03-01
...
I have another table called user_info as below:
uid | forename | surname | gender
-----------------------------------
1 | Homer | Simpson | M
2 | Marge | Simpson | F
3 | Bart | Simpson | M
4 | Lisa | Simpson | F
...
In user scores uid is the user id of a registered user on the system, af_id identifies a particular test a user submits. A user scores a level between 1 - 5 for each test, which can be submitted every month.
My problem is I need to produce an analysis at the end of the year to COUNT the number of users that have achieved each level for a particular test. The analysis is to show a gender split for male and female.
So, for example, an administrator would select test 1.1 and the system would generate stats that COUNT the users by the MAX level each of them achieved in the year, with a gender split.
Any help is much appreciated. Thank you in advance.
-
I think I need to clarify myself a bit. Because a user can complete the test multiple times throughout the year, there will be multiple scores for the same test. The query should take the highest level achieved and include this in the count. An example result would be:
Male Results:
level1 | level2 | level3 | level4 | level5
------------------------------------------
2 | 5 | 10 | 8 | 1
I am not certain I get exactly what you mean, but as always I'll have a go. As I understand it you want to know how many people from each gender reached each level in a certain year.
SELECT MaxLevel,
COUNT(CASE WHEN ui.Gender = 'M' THEN 1 END) AS Males,
COUNT(CASE WHEN ui.Gender = 'F' THEN 1 END) AS Females
FROM User_Info ui
INNER JOIN
( SELECT MAX(Level) AS MaxLevel,
UID
FROM User_Scores us
WHERE af_ID = '1.1'
AND YEAR(Record_Date) = 2012
GROUP BY UID
) AS MaxUs
ON MaxUs.uid = ui.UID
GROUP BY MaxLevel
I've put some sample data on SQL Fiddle so you can see if it is what you were after.
EDIT
To transpose the data so levels are along the top and Gender in the rows the following will work:
SELECT Gender,
COUNT(CASE WHEN MaxLevel = 1 THEN 1 END) AS Level1,
COUNT(CASE WHEN MaxLevel = 2 THEN 1 END) AS Level2,
COUNT(CASE WHEN MaxLevel = 3 THEN 1 END) AS Level3,
COUNT(CASE WHEN MaxLevel = 4 THEN 1 END) AS Level4,
COUNT(CASE WHEN MaxLevel = 5 THEN 1 END) AS Level5
FROM User_Info ui
INNER JOIN
( SELECT MAX(Level) AS MaxLevel,
UID
FROM User_Scores us
WHERE af_ID = '1.1'
AND YEAR(Record_Date) = 2012
GROUP BY UID
) AS MaxUs
ON MaxUs.uid = ui.UID
GROUP BY Gender
Note, that if there are ever more than 5 levels you will need to add more to the select statement, or start building dynamic SQL.
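To make the transposed query concrete, here is a small sketch run against an in-memory SQLite database via Python's sqlite3 module. It keeps only two level columns for brevity, and most of the score rows are invented sample data. It also filters on a date range instead of YEAR(Record_Date), which is an assumption-level variation that lets an index on record_date be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_info (uid INTEGER, forename TEXT, gender TEXT);
CREATE TABLE user_scores (af_id TEXT, uid INTEGER, level INTEGER, record_date TEXT);
INSERT INTO user_info VALUES (1, 'Homer', 'M'), (2, 'Marge', 'F'), (3, 'Bart', 'M'), (4, 'Lisa', 'F');
INSERT INTO user_scores VALUES
  ('1.1', 1, 3, '2012-01-01'), ('1.1', 1, 4, '2012-02-01'),
  ('1.1', 2, 5, '2012-01-01'), ('1.1', 3, 4, '2012-03-01'),
  ('1.1', 4, 2, '2012-01-01'), ('1.1', 4, 5, '2012-04-01'),
  ('1.2', 1, 5, '2012-03-01');
""")

# Subquery: highest level per user for test 1.1 in 2012.
# Outer query: count users per gender at each maximum level.
pivot = conn.execute("""
    SELECT ui.gender,
           COUNT(CASE WHEN mx.max_level = 4 THEN 1 END) AS level4,
           COUNT(CASE WHEN mx.max_level = 5 THEN 1 END) AS level5
    FROM user_info ui
    JOIN (SELECT uid, MAX(level) AS max_level
          FROM user_scores
          WHERE af_id = '1.1'
            AND record_date >= '2012-01-01' AND record_date < '2013-01-01'
          GROUP BY uid) AS mx ON mx.uid = ui.uid
    GROUP BY ui.gender
    ORDER BY ui.gender
""").fetchall()

print(pivot)
```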
Assuming record_date holds only dates (without time parts):
SELECT
s.maxlevel,
COUNT(NULLIF(gender, 'F')) AS M,
COUNT(NULLIF(gender, 'M')) AS F
FROM user_info u
INNER JOIN (
SELECT
uid,
MAX(level) AS maxlevel
FROM user_scores
WHERE record_date > DATE_SUB(CURDATE(), INTERVAL DAYOFYEAR(CURDATE()) DAY)
AND af_id = '1.1'
GROUP BY
uid
) s ON s.uid = u.uid
GROUP BY
s.maxlevel
That will show you only the maximum levels found in the user_scores table. If you have a Levels table where all possible levels (1 to 5) are listed, you could use that table to get a complete list of levels. If some levels are not present in the requested subset of data, the corresponding rows will show 0s in both columns.
Here's the above script with minor changes to show the complete chart of levels:
SELECT
l.level AS maxlevel,
COUNT(NULLIF(gender, 'F')) AS M,
COUNT(NULLIF(gender, 'M')) AS F
FROM user_info u
INNER JOIN (
SELECT
uid, MAX(level) AS maxlevel
FROM user_scores
WHERE record_date > DATE_SUB(CURDATE(), INTERVAL DAYOFYEAR(CURDATE()) DAY)
AND af_id = '1.1'
GROUP BY
uid
) s ON s.uid = u.uid
RIGHT JOIN Levels l ON s.maxlevel = l.level
GROUP BY
l.level
Hope this is what you're looking for!
Show number of records group by userid and gender of the max score for af_id '1.1'.
select count(*), info.uid, info.gender, max(score.level)
from user_info as info
join user_scores as score
on info.uid = score.uid
where score.af_id = '1.1'
group by info.uid, info.gender;
EDITED based on your edit.
select sum(if(a.gender="M",1,0)) Male_users, sum(if(a.gender="F",1,0)) Female_users
from myTable a where
a.level = (select max(b.level) from myTable b where a.uid=b.uid)
group by af_id
I typed this in a rush, but it should work or at least get you where you need to go. E.g. if you need to specify a time frame, add that.
You need something like
SELECT
uid,
MAX(level)
FROM user_scores
WHERE
record_date BETWEEN '2012-01-01' AND '2012-12-31'
AND af_id='1.1'
GROUP BY uid
If you need the gender splits then, depending on what stat you need per gender, you can either add a JOIN on the user_info table into this query (to get the MAX per gender) or wrap this as a sub-query and JOIN on the whole thing.

Specific MySQL issue with JOIN

I have a product table:
product_id
shop_id -> id from shop table
product_pair = there is product_id, if it is paired
Then I have a shop table:
shop_id
And finally a shipping table:
shop_id -> id from shop table
country_id -> id of country
And I want to find the products which can be shipped to country_id 60.
That's no problem if the product is not paired.
Like:
SELECT p.*, c.*, p.product_name AS score
FROM (`rcp_products` p)
JOIN `rcp_shipping` s ON `s`.`shop_id` = `p`.`shop_id` AND s.country_id = 60
JOIN `rcp_category` c ON `c`.`cat_id` = `p`.`cat_id`
WHERE `p`.`cat_id` = '7'
AND `p`.`product_price_eur` > 0
AND `p`.`product_mark_delete` = 0
ORDER BY `score` asc
LIMIT 10
(There are some additional WHERE conditions and other columns, which I don't think have any influence.)
Now, I have paired products. So, in a table with products is something like this:
product_id | product_name | product_pair | shop_id
1 | Abc | 0 | 0
2 | Def | 1 | 3
3 | Ghi | 1 | 2
So, products 2 and 3 are paired to product 1.
Now, I have no idea how to get country_id for product_id = 1 in that SQL that I posted above.
Maybe my database structure is not the best :) But how can I do it better?
Thank you.
Overall, the idea that you need to use here is self-join - that's how you can find the pairs of products. After that it's just simple WHERE conditions.
The core query (the one that just finds the pairs from a specific shop) would look like this:
SELECT DISTINCT A.product_id as P1, B.product_id as P2, A.shop_id as S1, B.shop_id as S2
FROM products A, products B
WHERE (A.product_pair = B.product_id OR A.product_pair = 0) -- find paired and non-paired
AND (A.product_id > B.product_id) -- ensure no duplicates (e.g. A&B and B&A)
AND (A.shop_id = B.shop_id) -- ensure that both can be found in the same shop
AND A.shop_id = YOUR_SHOP_ID -- filter to a specific shop
This should satisfy the conditions when products are sold in more than 1 shop, otherwise the query could probably become a bit shorter / easier.
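One possible way to apply the self-join idea to the shipping question is sketched below: join each parent product to the products paired with it, then check whether the paired product's shop ships to the target country. This is an interpretation of the question rather than the query above, run against an in-memory SQLite database via Python's sqlite3 module. The table names products and shipping and the shipping rows are assumptions for illustration; the product rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER, product_name TEXT, product_pair INTEGER, shop_id INTEGER);
CREATE TABLE shipping (shop_id INTEGER, country_id INTEGER);
INSERT INTO products VALUES (1, 'Abc', 0, 0), (2, 'Def', 1, 3), (3, 'Ghi', 1, 2);
INSERT INTO shipping VALUES (3, 60), (2, 10);
""")

# parent = the product others are paired to; child = a product paired to it.
# A parent is shippable to country 60 if any child's shop ships there.
shippable = conn.execute("""
    SELECT DISTINCT parent.product_id AS parent_id,
                    child.product_id  AS child_id,
                    child.shop_id     AS shop_id
    FROM products AS parent
    JOIN products AS child ON child.product_pair = parent.product_id
    JOIN shipping AS s ON s.shop_id = child.shop_id
    WHERE s.country_id = 60
""").fetchall()

print(shippable)
```

With this sample data only shop 3 ships to country 60, so product 1 is reachable through its paired product 2.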