Processing values in a column for each group - mysql

I have a MySQL table of customers and the shop branches they have purchased from, similar to the following:
customer_id | branch_id | is_major_branch
-----------------------------------------------
5 24 1
5 83 0
5 241 0
8 66 0
8 72 0
9 15 1
16 31 1
16 61 1
is_major_branch is 1 if that branch is a particularly large store.
How can I delete all rows where a customer has shopped in a minor branch (is_major_branch = 0), except if a customer has only ever shopped in a minor branch? Example result set:
customer_id | branch_id | is_major_branch
-----------------------------------------------
5 24 1
8 66 0
8 72 0
9 15 1
16 31 1
16 61 1
Notice that customer 8 has only ever shopped in minor branches, so their rows are excluded from the deletion.

You can delete the rows by joining against a per-customer aggregate:
delete t
from t join
(select customer_id, max(is_major_branch) as max_is_major_branch
from t
group by customer_id
) tt
on t.customer_id = tt.customer_id
where t.is_major_branch = 0 and tt.max_is_major_branch = 1;
If you just want a select query, then use exists:
select t.*
from t
where not (t.is_major_branch = 0 and
exists (select 1 from t t2 where t2.customer_id = t.customer_id and t2.is_major_branch = 1)
);
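As a quick check of the logic above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained). SQLite has no multi-table DELETE, so the delete is written with an IN subquery instead of the join; the rule it applies is the same one the answer describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (customer_id INTEGER, branch_id INTEGER, is_major_branch INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(5, 24, 1), (5, 83, 0), (5, 241, 0), (8, 66, 0), (8, 72, 0),
     (9, 15, 1), (16, 31, 1), (16, 61, 1)],
)

# Rows the SELECT from the answer keeps: everything except minor-branch rows
# of customers who also have at least one major-branch row.
expected_rows = conn.execute("""
    SELECT t.customer_id, t.branch_id, t.is_major_branch
    FROM t
    WHERE NOT (t.is_major_branch = 0 AND
               EXISTS (SELECT 1 FROM t t2
                       WHERE t2.customer_id = t.customer_id
                         AND t2.is_major_branch = 1))
    ORDER BY t.customer_id, t.branch_id
""").fetchall()

# SQLite-flavoured delete (assumption: IN subquery instead of MySQL's DELETE ... JOIN).
conn.execute("""
    DELETE FROM t
    WHERE is_major_branch = 0
      AND customer_id IN (SELECT customer_id FROM t WHERE is_major_branch = 1)
""")

remaining_rows = conn.execute(
    "SELECT customer_id, branch_id, is_major_branch FROM t "
    "ORDER BY customer_id, branch_id"
).fetchall()
```

After the delete, remaining_rows should match what the SELECT reported beforehand, with customer 8 keeping both minor-branch rows.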

Related

How do I get the MySQL average of a join with conditions?

I am trying to get the average ratings of a user by project type but include all users that have ratings regardless of type.
SELECT projects.user_id, AVG(ratings.rating) AS avg1
FROM projects
JOIN ratings
ON projects.project_id = ratings.project_id
WHERE projects.type='0'
GROUP BY projects.user_id;
Thanks in advance for the help.
The output I get for type 0 is:
user_id | avg1
-----------------
11 | 2.25
but I am trying to get:
user_id | avg1
-----------------
11 | 2.25
12 | 0
Because user 12 has a project in the ratings table, but not one of type 0, I still want that user in the output with avg1 = 0.
The output for type 1 works as expected because all users that have ratings also have type 1:
user_id | avg1
-----------------
11 | 4
12 | 2.5
Projects table is: (only the first 4 projects are in the ratings table)
project_id |user_id | type
--------------------------
51 11 0
52 12 1
53 11 0
54 11 1
55 12 1
56 13 0
57 14 1
Ratings table is:
project_id | rating
-------------------
51 0
51 1
52 4
51 5
52 2
53 3
54 4
52 1.5
Use conditional aggregation:
SELECT p.user_id,
COALESCE(AVG(CASE WHEN p.type = '0' THEN r.rating END), 0) AS avg1
FROM projects p JOIN ratings r
ON p.project_id = r.project_id
GROUP BY p.user_id;
See the demo.
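To see why conditional aggregation keeps user 12 in the result, here is a small sketch that loads the sample tables into an in-memory SQLite database (a stand-in for MySQL, chosen only so the example runs on its own). The helper name avg_by_type is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (project_id INTEGER, user_id INTEGER, type TEXT)")
conn.execute("CREATE TABLE ratings (project_id INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [(51, 11, "0"), (52, 12, "1"), (53, 11, "0"), (54, 11, "1"),
     (55, 12, "1"), (56, 13, "0"), (57, 14, "1")],
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [(51, 0), (51, 1), (52, 4), (51, 5), (52, 2), (53, 3), (54, 4), (52, 1.5)],
)

def avg_by_type(project_type):
    """Average rating per user for one project type (hypothetical helper).

    The type filter lives inside CASE rather than WHERE, so every user with
    at least one rating stays in the result; users with no ratings of the
    requested type get NULL from AVG, which COALESCE turns into 0.
    """
    sql = """
        SELECT p.user_id,
               COALESCE(AVG(CASE WHEN p.type = ? THEN r.rating END), 0) AS avg1
        FROM projects p JOIN ratings r
          ON p.project_id = r.project_id
        GROUP BY p.user_id
        ORDER BY p.user_id
    """
    return conn.execute(sql, (project_type,)).fetchall()

type0 = avg_by_type("0")
type1 = avg_by_type("1")
```

Users 13 and 14 never appear because they have no rows in the ratings table, so the inner join drops them regardless of type.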

Count occurrences of rows with a specific status without using a subquery

Here is my table
loan_id bid_id lender_id borrower_id amount interest duration loan_status
1 1 60 63 300.00 12.00 3 'completed'
2 2 61 63 300.00 12.00 3 'completed'
3 3 62 63 300.00 12.00 3 'pending'
4 1 62 63 300.00 12.00 3 'pending'
7 4 60 63 300.00 12.00 3 'completed'
I want to pull only those bid_id values for which every record has loan_status 'completed'. In other words, if any record for a bid_id has status 'pending', that bid_id should not be returned.
I am using the following query, which works fine:
SELECT bid_id
FROM loan
WHERE bid_id NOT IN (
SELECT l.bid_id
FROM loan l
WHERE l.`loan_status` = 'pending'
AND l.bid_id = bid_id
GROUP BY l.`bid_id`
HAVING COUNT(l.`bid_id`)>= 1
)
GROUP BY bid_id
Is there any other way to get the desired result without using a subquery?
You can readily do this with group by and having:
select bid_id
from loan
group by bid_id
having sum(loan_status = 'pending') = 0;
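A minimal sketch of the group by / having approach against the sample rows, run through Python's sqlite3 module (an assumption for the sake of a runnable example; like MySQL, SQLite evaluates the comparison to 0 or 1, so it can be summed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE loan (
        loan_id INTEGER, bid_id INTEGER, lender_id INTEGER, borrower_id INTEGER,
        amount REAL, interest REAL, duration INTEGER, loan_status TEXT
    )
""")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [(1, 1, 60, 63, 300.0, 12.0, 3, "completed"),
     (2, 2, 61, 63, 300.0, 12.0, 3, "completed"),
     (3, 3, 62, 63, 300.0, 12.0, 3, "pending"),
     (4, 1, 62, 63, 300.0, 12.0, 3, "pending"),
     (7, 4, 60, 63, 300.0, 12.0, 3, "completed")],
)

# Each group's SUM counts its pending rows; keep groups where that count is 0.
completed_bids = [
    row[0]
    for row in conn.execute("""
        SELECT bid_id
        FROM loan
        GROUP BY bid_id
        HAVING SUM(loan_status = 'pending') = 0
        ORDER BY bid_id
    """)
]
```

Bid 1 drops out because one of its two rows is pending, and bid 3 drops out because its only row is pending.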

SQL Join Inventory Count Table to Date table

I have a running inventory table of different products that records the inventory count after every transaction. Transactions do not happen every day, so the table does not have a running daily count.
I need to have all dates listed for each product so that I can sum and average the counts over a period of time.
inventory
DATE ID Qty Count
2014-05-13 123 12 12
2014-05-19 123 -1 11
2014-05-28 123 -1 10
2014-05-29 123 -3 7
2014-05-10 124 5 5
2014-05-15 124 -1 4
2014-05-21 124 -1 3
2014-05-23 124 -3 0
I have a table that includes dates for a Join, but I am not sure how to make the missing dates join over multiple products.
I need the query to return output like the following. It needs to return the counts over the selected period, but also include the dates in between.
DATE ID Qty Count
2013-05-01 123 0 0
2013-05-02 123 0 0
2013-05-03 123 0 0
2013-05-04 123 0 0
2013-05-05 123 0 0
2013-05-06 123 0 0
2013-05-07 123 0 0
2013-05-08 123 0 0
2013-05-09 123 0 0
2013-05-10 123 0 0
2013-05-11 123 0 0
2013-05-12 123 0 0
2014-05-13 123 12 12
2013-05-14 123 0 12
2013-05-15 123 0 12
2013-05-16 123 0 12
2013-05-17 123 0 12
2013-05-18 123 0 12
2014-05-19 123 -1 11
2013-05-20 123 0 11
2013-05-21 123 0 11
2013-05-22 123 0 11
2013-05-23 123 0 11
2013-05-24 123 0 11
2013-05-25 123 0 11
2013-05-26 123 0 11
2013-05-27 123 0 11
2014-05-28 123 -1 10
2014-05-29 123 -3 7
2013-05-30 123 0 7
2013-05-31 123 0 7
2013-05-01 124 0 0
2013-05-02 124 0 0
2013-05-03 124 0 0
2013-05-04 124 0 0
2013-05-05 124 0 0
2013-05-06 124 0 0
2013-05-07 124 0 0
2013-05-08 124 0 0
2013-05-09 124 0 0
2014-05-10 124 5 5
2014-05-11 124 0 5
2014-05-12 124 0 5
2014-05-13 124 0 5
2014-05-14 124 0 5
2014-05-15 124 -1 4
2014-05-16 124 0 4
2014-05-17 124 0 4
2014-05-18 124 0 4
2014-05-19 124 0 4
2014-05-20 124 0 4
2014-05-21 124 -1 3
2014-05-22 124 0 3
2014-05-23 124 -3 0
2014-05-24 124 0 0
2014-05-25 124 0 0
2014-05-26 124 0 0
2014-05-27 124 0 0
2014-05-28 124 0 0
2014-05-29 124 0 0
2014-05-30 124 0 0
2014-05-31 124 0 0
Use inv join inv to build up at least 31 rows and construct a table of 31 days. Then join the ids, and finally the original table.
select a.d, a.id, a.qty,
if(a.id=@lastid, @count:=@count+a.qty, @count:=a.count) `count`,
@lastid:=a.id _lastid
from (
select a.d, b.id, ifnull(c.qty, 0) qty, ifnull(c.count, 0) `count`
from (
select adddate('2014-05-01', @row) d, @row:=@row+1 i
from inv a
join inv b
join (select @row := 0) c
limit 31) a
join (
select distinct id
from inv) b
left join inv c on a.d = c.date and b.id = c.id
order by b.id, a.d) a
join (select @count := 0, @lastid := 0) b;
Here are the steps needed:
Get all dates between the two given dates.
Get the initial stock per ID. This is: get the first date on or after the given start date for that ID, read this record's stock and subtract its transaction quantity.
For every date get the previous stock. If there is a record for this date, then add its transaction quantity and compare the result with its stock quantity. Throw an error if values don't match. (This is because you store data redundantly; a record's quantity must equal the quantity of the previous record plus its own transaction quantity. But data can always be inconsistent, so better check it.) Show the new stock and the difference to the previous stock.
All this would typically be achieved with a recursive CTE for the dates, a derived table for all initial stocks at best using a KEEP DENSE_RANK function, and the LAG function to look into the previous record.
MySQL before 8.0 doesn't support recursive CTEs, or CTEs at all for that matter. You can emulate this with a big enough table and a variable.
MySQL doesn't support the KEEP DENSE_RANK function. You can work with another derived table instead to find the minimum date per ID first.
MySQL before 8.0 doesn't support the LAG function. You can work with a variable in MySQL instead.
Having said this, I suggest using a programming language instead (Java, C#, PHP, whatever). You would just select the raw data with SQL, use a loop, and do all the processing on a per-record basis. This is much more convenient (and readable) than building a very complex query that does all that's needed. You can do this in SQL, even MySQL; I just don't recommend it.
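The loop-based approach suggested above could look roughly like the following Python sketch. The function name fill_daily_counts and the tuple layout are made up for illustration, and it assumes at most one transaction per product per day, as in the sample data.

```python
from datetime import date, timedelta

# Sample rows from the question: (date, product id, transaction qty, count after it).
SAMPLE = [
    (date(2014, 5, 13), 123, 12, 12), (date(2014, 5, 19), 123, -1, 11),
    (date(2014, 5, 28), 123, -1, 10), (date(2014, 5, 29), 123, -3, 7),
    (date(2014, 5, 10), 124, 5, 5), (date(2014, 5, 15), 124, -1, 4),
    (date(2014, 5, 21), 124, -1, 3), (date(2014, 5, 23), 124, -3, 0),
]

def fill_daily_counts(transactions, start, end):
    """Return one (date, id, qty, count) row per product per day in [start, end]."""
    by_product = {}
    for day, pid, qty, count in transactions:
        by_product.setdefault(pid, {})[day] = (qty, count)

    result = []
    for pid in sorted(by_product):
        days = by_product[pid]
        in_range = sorted(d for d in days if start <= d <= end)
        if in_range:
            # Initial stock: the first record's count minus its own quantity.
            first_qty, first_count = days[in_range[0]]
            stock = first_count - first_qty
        else:
            stock = 0  # assumption: no records in range means nothing to report
        day = start
        while day <= end:
            qty = 0
            if day in days:
                qty, recorded = days[day]
                stock += qty
                # The stored count is redundant, so verify it instead of trusting it.
                if stock != recorded:
                    raise ValueError(f"inconsistent count for {pid} on {day}")
            result.append((day, pid, qty, stock))
            day += timedelta(days=1)
    return result

daily = fill_daily_counts(SAMPLE, date(2014, 5, 1), date(2014, 5, 31))
lookup = {(d, pid): (qty, count) for d, pid, qty, count in daily}
```

Days without a transaction carry the previous count forward with a quantity of 0, which is the shape of output the question asks for.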
The SQL I ended up using to resolve this question was a combination of @Fabricator's answer (which really was a correct answer) and my own edits.
I ended up using an existing table to create the date rows instead of a cross join. The cross join had poor performance for how many products I was working with.
SELECT
POSTDATE,
IF(@PROD_ID = PRODUCT_ID, @NEW := 0, @NEW := 1) AS New_Product,
(@PROD_ID := PRODUCT_ID) AS PRODUCT_ID,
QUANTITY,
IF(@NEW = 1, @INVENTORY := QUANTITY, @INVENTORY := @INVENTORY+QUANTITY) AS 'Count'
FROM (
(
SELECT
POSTDATE,
PRODUCT_ID,
QUANTITY
FROM
inventory
)
UNION ALL
(
SELECT
dateslist_sub.TransDate AS POSTDATE,
productlist_sub.PRODUCT_ID,
0 AS QUANTITY
FROM
(
SELECT
TransDate
FROM
(
SELECT
adddate('2013-05-01', @row) AS TransDate,
@row:=@row+1 i
FROM
any_table,
(SELECT @row := 0) row
) datestable
WHERE
TransDate <= CURDATE()
) dateslist_sub
cross join (
SELECT
PRODUCT_ID
FROM
products_table
ORDER BY
PRODUCT_ID ASC
) productlist_sub
ORDER BY
productlist_sub.PRODUCT_ID ASC,
dateslist_sub.TransDate ASC
)
ORDER BY
PRODUCT_ID ASC,
POSTDATE ASC
) daily_rows_sub

access query needed

I am looking for an Access query, but a SQL Server 2008 query would also be sufficient, as I can use the pass-through feature in Access.
My data looks like this:
--------------------------------------------------------------
id nameid name score diff include
--------------------------------------------------------------
1 0001 SO 100 0 0
2 0001 SO 100 0 0
3 0001 SO 100 0 0
4 0001 SO 100 0 0
5 0001 SO 100 0 0
6 0001 SO 100 0 0
7 0002 MO 10 0 0
8 0002 MO 18 0 1
9 0002 MO 20 0 0
10 0002 MO 14 0 0
11 0002 MO 100 0 0
11 0002 MO 100 0 0
12 0003 MA 10 0 0
13 0003 MA 18 0 1
14 0003 MA 20 0 0
15 0003 MA 14 0 0
16 0003 MA 100 0 1
17 0003 MA 100 0 0
What I want is to go through each row and select the rows where include = 1. That part is easy. However, I don't want just the matching row; I want to select its whole "group". The group can be identified by the nameid (or name).
So for the above I want the following result:
--------------------------------------------------------------
id nameid name score diff include
--------------------------------------------------------------
7 0002 MO 10 0 0
8 0002 MO 18 0 1
9 0002 MO 20 0 0
10 0002 MO 14 0 0
11 0002 MO 100 0 0
11 0002 MO 100 0 0
12 0003 MA 10 0 0
13 0003 MA 18 0 1
14 0003 MA 20 0 0
15 0003 MA 14 0 0
16 0003 MA 100 0 1
17 0003 MA 100 0 0
Query your table for the rows with include = 1.
Then join back to the table to get all the rows whose nameid matches one from the first query:
SELECT DISTINCT m.*
FROM myTable m
INNER JOIN myTable m2
ON m.nameid = m2.nameid
AND m2.include = 1
A join query will work better than an 'in' query for large amounts of data. You still need an index on the field 'nameid', and one on 'include' would not hurt either.
An equivalent uses 'WHERE EXISTS':
SELECT m.*
FROM myTable m
WHERE EXISTS
(
SELECT *
FROM myTable m2
WHERE m2.include = 1
AND m2.nameid = m.nameid
)
You can see the difference here:
Can an INNER JOIN offer better performance than EXISTS
And why you should use WHERE EXISTS when you have a filter with a lot of IDs:
Difference between EXISTS and IN in SQL?
I think this query identifies the nameid values you want included in your main query.
SELECT DISTINCT nameid
FROM YourTable
WHERE include = 1;
If that is true, incorporate it as a subquery and use an INNER JOIN with YourTable to return only those rows for which a nameid value is associated with include = 1 ... in any row of the table.
SELECT id, nameid, name, score, diff, include
FROM
YourTable AS y
INNER JOIN (
SELECT DISTINCT nameid
FROM YourTable
WHERE include = 1
) AS q
ON y.nameid = q.nameid;
The Access query designer will probably substitute square brackets plus a dot in place of the parentheses enclosing the subquery.
SELECT id, nameid, name, score, diff, include
FROM
YourTable AS y
INNER JOIN [
SELECT DISTINCT nameid
FROM YourTable
WHERE include = 1
]. AS q
ON y.nameid = q.nameid;
You need a subquery - as follows:
SELECT *
FROM tablename
WHERE nameid IN
(
SELECT DISTINCT nameid
FROM tablename
WHERE include = 1
)
SELECT * FROM yourTable WHERE nameid IN (SELECT DISTINCT nameid FROM yourTable WHERE include=1)
This selects every row whose nameid is returned by the subquery.
The subquery selects the nameid for rows where include=1.
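The join, EXISTS, and IN variants above can be compared side by side. The sketch below uses Python's sqlite3 module with a reduced copy of the sample data (both are assumptions made so the example is self-contained; neither Access nor SQL Server is involved). It also shows why the self-join needs DISTINCT: a group with two include = 1 rows would otherwise be returned twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER, nameid TEXT, name TEXT, "
    "score INTEGER, diff INTEGER, include INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "0001", "SO", 100, 0, 0), (2, "0001", "SO", 100, 0, 0),
     (7, "0002", "MO", 10, 0, 0), (8, "0002", "MO", 18, 0, 1),
     (9, "0002", "MO", 20, 0, 0), (12, "0003", "MA", 10, 0, 0),
     (13, "0003", "MA", 18, 0, 1), (16, "0003", "MA", 100, 0, 1)],
)

def ids(sql):
    """Run a query and return its first column as a list (illustrative helper)."""
    return [row[0] for row in conn.execute(sql)]

# Self-join with DISTINCT, as in the first answer.
join_ids = ids("""
    SELECT DISTINCT m.id FROM myTable m
    INNER JOIN myTable m2 ON m.nameid = m2.nameid AND m2.include = 1
    ORDER BY m.id
""")

# Same join without DISTINCT: group 0003 has two include = 1 rows, so it repeats.
raw_join_ids = ids("""
    SELECT m.id FROM myTable m
    INNER JOIN myTable m2 ON m.nameid = m2.nameid AND m2.include = 1
    ORDER BY m.id
""")

exists_ids = ids("""
    SELECT m.id FROM myTable m
    WHERE EXISTS (SELECT 1 FROM myTable m2
                  WHERE m2.include = 1 AND m2.nameid = m.nameid)
    ORDER BY m.id
""")

in_ids = ids("""
    SELECT id FROM myTable
    WHERE nameid IN (SELECT DISTINCT nameid FROM myTable WHERE include = 1)
    ORDER BY id
""")
```

One thing to keep in mind with the DISTINCT variant: it also collapses rows that are identical in every selected column, which the EXISTS and IN forms do not.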

MySQL Group By not producing expected result

This is my table structure:
rec_id product_id quantity quantity_in quantity_out balance stock_date status
1 2 342 NULL 17 325 2009-10-23 1
2 2 325 NULL 124 201 2009-10-23 1
3 1 156 NULL 45 111 2009-10-23 1
4 2 201 NULL 200 1 2009-10-23 1
5 2 1 NULL 1 0 2009-10-23 1
6 1 111 NULL 35 76 2009-10-23 1
All I want is the last transaction done for a given product: product_id, quantity, quantity_out and balance from this table.
For example, there are transactions for 2 products (ids 1 & 2):
final balance for product_id 2 is 0 -> stored in rec_id 5
final balance for product_id 1 is 76 -> stored in rec_id 6
Final result/output should be like this:
recid productid quantity quantityin quantityout balance stock_date status
5 2 1 NULL 1 0 2009-10-23 1
6 1 111 NULL 35 76 2009-10-23 1
You can find the latest record for each product like:
select max(rec_id) as MaxRec
from YourTable
group by product_id
Using a subquery, you can retrieve the latest rows for their product:
select *
from YourTable
where rec_id in (
select max(rec_id) as MaxRec
from YourTable
group by product_id
)
Here's a single query with no subqueries:
SELECT main.*
FROM YourTable main
LEFT JOIN YourTable newer
ON newer.product_id = main.product_id AND newer.rec_id > main.rec_id
WHERE newer.rec_id IS NULL;
You can tweak the field list however you want; just make sure you select fields from main, not newer, whose columns are all NULL in the rows that survive the WHERE clause.
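Both approaches can be checked against the sample rows. This is a minimal sketch using Python's sqlite3 module in place of MySQL (an assumption for runnability); it treats the highest rec_id per product as the latest transaction, as the answers above do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE YourTable (
        rec_id INTEGER, product_id INTEGER, quantity INTEGER, quantity_in INTEGER,
        quantity_out INTEGER, balance INTEGER, stock_date TEXT, status INTEGER
    )
""")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [(1, 2, 342, None, 17, 325, "2009-10-23", 1),
     (2, 2, 325, None, 124, 201, "2009-10-23", 1),
     (3, 1, 156, None, 45, 111, "2009-10-23", 1),
     (4, 2, 201, None, 200, 1, "2009-10-23", 1),
     (5, 2, 1, None, 1, 0, "2009-10-23", 1),
     (6, 1, 111, None, 35, 76, "2009-10-23", 1)],
)

# Variant 1: subquery picking the highest rec_id per product.
subquery_rows = conn.execute("""
    SELECT rec_id, product_id, balance
    FROM YourTable
    WHERE rec_id IN (SELECT MAX(rec_id) FROM YourTable GROUP BY product_id)
    ORDER BY rec_id
""").fetchall()

# Variant 2: anti-join. A row is the latest if no newer row exists for its product,
# which shows up as an unmatched (all-NULL) right side of the LEFT JOIN.
antijoin_rows = conn.execute("""
    SELECT main.rec_id, main.product_id, main.balance
    FROM YourTable main
    LEFT JOIN YourTable newer
      ON newer.product_id = main.product_id AND newer.rec_id > main.rec_id
    WHERE newer.rec_id IS NULL
    ORDER BY main.rec_id
""").fetchall()
```

Both variants return rec_id 5 for product 2 and rec_id 6 for product 1, matching the expected output in the question.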