Delete from a table matching one criteria where there are rows in same table matching different criteria? - mysql

Sorry for the mega title... I was trying to be descriptive enough. I've got a table that contains event attendance data that has some erroneous data in it. The table definition is kind of like this:
id (row id)
date
company_name
attendees
It ended up with some cases where for a given date, there are two entries matching a company_name and date but one has attendees=0 and the other has attendees>0. In those cases, I want to discard the ones where attendees=0.
I know you can't join on the same table while deleting, so please consider this query to be pseudocode that shows what I want to accomplish.
DELETE FROM attendance a WHERE a.attendees=0 AND a.date IN (SELECT b.date FROM attendance b WHERE b.attendees > 0 AND b.company_name = a.company_name);
I also tried to populate a temporary table with the ids of the rows I want to delete, but that query hangs because of the IN (SELECT ...) clause. My table has thousands of rows so that just maxes out the CPU and then times out.

This ugly thing should work. Wrapping the subquery in an aliased derived table avoids the "You can't specify target table for update in FROM clause" error.
DELETE FROM attendance
WHERE (attendees, date, company_name)
IN (SELECT c.a, c.d, c.c
FROM
(SELECT MIN(attendees) a, date d, company_name c
FROM attendance
GROUP BY date, company_name
-- only groups that hold both a zero row and a non-zero row
HAVING MIN(attendees) = 0 AND MAX(attendees) > 0) AS c);
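The two-step variant mentioned in the question (collect the ids first, then delete by id) can be checked on a small data set. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module as a stand-in for MySQL, the sample rows are made up, and it finds the ids with a self-join instead of IN (SELECT ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance ("
    "id INTEGER PRIMARY KEY, date TEXT, company_name TEXT, attendees INTEGER)"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO attendance (date, company_name, attendees) VALUES (?, ?, ?)",
    [
        ("2020-01-01", "Acme", 0),    # zero row with a non-zero twin: delete
        ("2020-01-01", "Acme", 12),
        ("2020-01-01", "Globex", 0),  # lone zero row: keep
        ("2020-01-02", "Acme", 7),
    ],
)

# Step 1: self-join to find zero rows that have a non-zero row
# for the same date and company.
doomed = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT z.id
        FROM attendance z
        JOIN attendance p
          ON p.date = z.date AND p.company_name = z.company_name
        WHERE z.attendees = 0 AND p.attendees > 0
        """
    )
]

# Step 2: delete by primary key.
conn.executemany("DELETE FROM attendance WHERE id = ?", [(i,) for i in doomed])

remaining = conn.execute(
    "SELECT date, company_name, attendees FROM attendance ORDER BY id"
).fetchall()
print(remaining)
```

On a real table, an index on (date, company_name) would probably matter more for speed than the exact shape of the query.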

Related

Update Column Using ROW_Number() function. But it is failing. Could Any one suggest a solution?

I know this might be a silly question, but I have not found a solution so far, so I am asking it with all the inputs and outputs I have. Could anyone provide a solution?
What I want to do: a ParcelNo can have one or more invoice numbers. I want to find how many invoice numbers a parcel has and give each one a rank. The ranking part is important because my further work depends on this column.
I have one table named TableA. It has three columns: Invoicenumber, which is the unique id; ParcelNo, which can be duplicated; and Ranking, which I want to update.
CREATE TABLE TableA
(
Invoicenumber varchar(5),
ParcelNo varchar(5),
Ranking bit,
IDate Datetime
)
INSERT INTO TableA (Invoicenumber, ParcelNo)
VALUES ('INV01', 'P0001'), ('INV02', 'P0001'),
('INV03', 'P0002'), ('INV04', 'P0002'),
('INV05', 'P0003'), ('INV06', 'P0003')
When I run the following query the output is as desired.
;WITH CTE AS
(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY PARCELNO ORDER BY INVOICENUMBER) AS RWNO
FROM
TableA
)
SELECT
T.*, C.RWNO
FROM CTE C
JOIN TableA T ON T.Invoicenumber = C.Invoicenumber
The output is below:
So, I tried to update the Ranking column in Table A.
I run this query to do so:
;WITH CTE AS
(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY PARCELNO ORDER BY INVOICENUMBER) AS RWNO
FROM
TableA
)
UPDATE T
SET Ranking = C.RWNO
FROM CTE C
JOIN TableA T ON T.Invoicenumber = C.Invoicenumber
But the output is wrong. The column is not updated as expected.
Below is the output of the updated column:
Why is the Ranking column updated incorrectly?
I want to update the column to prepare some data. This table is a sample for the explanation.
I am elaborating on my issue below:
The image below shows two tables:
Table A and Table B both have an IDate column.
I want to update the IDate column in A from B, but the dates should be unique: the first date should not be repeated. These dates are associated with invoice numbers.
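Note that Ranking is declared as bit, which can only hold 0 or 1, so SQL Server stores every non-zero row number as 1. That said, I think what you really want is a computed column (also called a calculated or generated column). I'm guessing that your parcel number should point to another table that stores information about the parcels. If that's the case, then go with: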
-- First approach
CREATE TABLE Parcels (
id int IDENTITY (1,1) NOT NULL,
ParcelNo varchar(5),
Description varchar(max)
-- Ranking is added with ALTER TABLE below, once the Invoices table exists
);
CREATE TABLE Invoices (
id int IDENTITY (1,1) NOT NULL,
InvoiceNumber varchar(5),
ParcelID int FOREIGN KEY REFERENCES Parcels(id)
);
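GO
-- SQL Server rejects a subquery inside a computed column, so the count goes in a scalar function (the name is illustrative).
CREATE FUNCTION dbo.CountInvoices (@ParcelID int) RETURNS int AS
BEGIN RETURN (SELECT COUNT(*) FROM Invoices i WHERE i.ParcelID = @ParcelID); END;
GO
ALTER TABLE Parcels ADD Ranking AS dbo.CountInvoices(id);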
INSERT INTO Parcels
(ParcelNo)
VALUES
('P0001'),
('P0002'),
('P0003');
INSERT INTO Invoices
(InvoiceNumber, ParcelID)
VALUES
('INV01', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0001')),
('INV02', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0001')),
('INV03', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0002')),
('INV04', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0002')),
('INV05', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0003')),
('INV06', (SELECT p.id FROM Parcels p WHERE p.ParcelNo = 'P0003'));
On the other hand, if you really want all the data in a single table, then try this:
-- Second approach
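CREATE TABLE TableA (
Invoicenumber varchar(5),
ParcelNo varchar(5)
);
GO
-- A computed column cannot contain a subquery, so the count lives in a scalar function (the name is illustrative).
CREATE FUNCTION dbo.CountParcelInvoices (@ParcelNo varchar(5)) RETURNS int AS
BEGIN RETURN (SELECT COUNT(*) FROM TableA a WHERE a.ParcelNo = @ParcelNo); END;
GO
ALTER TABLE TableA ADD Ranking AS dbo.CountParcelInvoices(ParcelNo);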
Some notes:
Both of my approaches assume that by ranking, you mean that you want a count of how many invoices are in a parcel.
My first approach has a circular reference, because the Invoices table has a foreign key into the Parcels table, but the Parcels table tabulates information from the Invoices table. That's why I commented out the calculated field in the first table, then added the calculated field back in after creating both tables.
Notice that I capitalized all SQL keywords (except the types such as varchar). It's easier to read SQL if you either go with all caps or no caps for an entire query.
Notice my semicolons at the end of each logical break. Semi-colons are technically optional, but a lot of folks consider using them to be good practice.
For my first approach, I'm using a foreign key. You can read more about those here.
Because my first approach split the table into 2 tables, I needed to somehow know the id of the Parcels table when populating the Invoices table, even though the ids are given by the database (so I can't know them ahead of time). Those select statements accomplish that.
My syntax should work with SQL Server, but not necessarily with any other DBMS. That's because the computed-column syntax used here is specific to SQL Server rather than ANSI standard.
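For a DBMS-neutral way to try the "count of invoices per parcel" idea, a view can play the role of the computed column. This is a minimal sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, and the view name TableAWithCount and column name InvoiceCount are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (Invoicenumber TEXT PRIMARY KEY, ParcelNo TEXT);
    INSERT INTO TableA (Invoicenumber, ParcelNo) VALUES
        ('INV01', 'P0001'), ('INV02', 'P0001'),
        ('INV03', 'P0002'), ('INV04', 'P0002'),
        ('INV05', 'P0003'), ('INV06', 'P0003');

    -- Hypothetical view: the count is recomputed on every read,
    -- so it never goes stale the way a stored column could.
    CREATE VIEW TableAWithCount AS
    SELECT t.Invoicenumber,
           t.ParcelNo,
           (SELECT COUNT(*) FROM TableA a
             WHERE a.ParcelNo = t.ParcelNo) AS InvoiceCount
    FROM TableA t;
    """
)

before = conn.execute(
    "SELECT Invoicenumber, InvoiceCount FROM TableAWithCount "
    "ORDER BY Invoicenumber"
).fetchall()

# Adding an invoice to a parcel changes the derived count for that parcel.
conn.execute("INSERT INTO TableA VALUES ('INV07', 'P0001')")
after = conn.execute(
    "SELECT DISTINCT ParcelNo, InvoiceCount FROM TableAWithCount "
    "ORDER BY ParcelNo"
).fetchall()
print(before)
print(after)
```

The trade-off is the same as with a function-backed computed column: nothing is stored, so every read pays for the count.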

SELECT data from multiple tables if a requirement is met in second table

The title doesn't describe it that well, so here is my problem:
I have 2 tables, one table for orders, the other for the product.
An order can have n products associated with it.
I want to select those orders where all associated products have a status (an attribute of the product) greater than or equal to x, so that I know every product of the order is "ready" and the order can be processed further.
Every ordered product has an OrderID.
Any tips?
Edit: I just started with SQL, so don't bash me if this is a stupid question.
It's a matter of mindset.
You have to find the 'dual' form of your question (a double negation): an order whose lines are all ready is an order for which no line exists that is not ready.
So think of the orders that have AT LEAST one line that is not ready, and exclude them.
Assuming your tables look like the usual:
Orders(ID, ...) and Order_Line(orderID, row#, status, ...), with a foreign key from orderID to Orders.
You can use this stub:
SELECT *
FROM orders O
WHERE NOT EXISTS ( SELECT * FROM order_line OL
WHERE OL.orderID = O.ID -- binding with outer query
AND OL.status < x -- x is the "ready" threshold from the question
)
SIDE NOTE: my query also returns empty orders (orders with no lines). To filter them out, add this to the outer query: AND EXISTS (SELECT * FROM order_line OE WHERE OE.orderID = O.ID)
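The double negation, together with the extra EXISTS from the side note, can be tried on a tiny data set. This is a sketch under assumptions: SQLite through Python's sqlite3 module, made-up table contents, and a threshold READY = 3 standing in for the question's x.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_line (
        order_id INTEGER REFERENCES orders(id),
        status   INTEGER
    );
    INSERT INTO orders (id) VALUES (1), (2), (3);
    -- Order 1: all lines ready. Order 2: one line not ready. Order 3: no lines.
    INSERT INTO order_line (order_id, status) VALUES (1, 3), (1, 4), (2, 3), (2, 1);
    """
)

READY = 3  # assumed threshold, the "x" from the question

ready_orders = [
    row[0]
    for row in conn.execute(
        """
        SELECT o.id
        FROM orders o
        WHERE NOT EXISTS (            -- no line below the threshold
                SELECT 1 FROM order_line ol
                WHERE ol.order_id = o.id AND ol.status < ?)
          AND EXISTS (                -- and at least one line at all
                SELECT 1 FROM order_line ol
                WHERE ol.order_id = o.id)
        ORDER BY o.id
        """,
        (READY,),
    )
]
print(ready_orders)
```

Dropping the second EXISTS would also return order 3, which has no lines and therefore no line that is "not ready".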

How to get count by combining more than 2 tables in mysql

I have a table 'A' with a status column that can have 4 values. Table A holds table B's id, and table B holds table C's id. I want to get the count per status from table 'A' by joining all these tables. The status column in table A is a foreign key to table 'D', which holds statuses such as 1-agreed, 2-not agreed, etc.
The question is missing some information that might be helpful, particularly what exactly you want to count (i.e., are you just trying to count ALL rows, or the number of rows in table A that have each status?). I'll put together an answer that assumes the latter.
I'll also assume that "id" is the primary key of each table, and that a reference to another table is stored as that table's name followed by id (Bid, Cid).
select A.statusField, count(*)
from A
join B on (A.Bid = B.id)
join C on (B.Cid = C.id)
group by A.statusField
Hope that helps.
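If the status labels from table D should appear in the result, one more join is enough. The sketch below is an assumption-laden illustration: it runs on SQLite through Python's sqlite3 module, the column names (Bid, Cid, statusField, label) follow the guesses made in this answer, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE D (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE C (id INTEGER PRIMARY KEY);
    CREATE TABLE B (id INTEGER PRIMARY KEY, Cid INTEGER REFERENCES C(id));
    CREATE TABLE A (
        id INTEGER PRIMARY KEY,
        Bid INTEGER REFERENCES B(id),
        statusField INTEGER REFERENCES D(id)
    );
    INSERT INTO D VALUES (1, 'agreed'), (2, 'not agreed');
    INSERT INTO C VALUES (1);
    INSERT INTO B VALUES (1, 1), (2, 1);
    INSERT INTO A (Bid, statusField) VALUES (1, 1), (1, 1), (2, 2);
    """
)

# One row per status label, counting the rows of A that carry it.
counts = conn.execute(
    """
    SELECT D.label, COUNT(*)
    FROM A
    JOIN B ON A.Bid = B.id
    JOIN C ON B.Cid = C.id
    JOIN D ON A.statusField = D.id
    GROUP BY D.label
    ORDER BY D.label
    """
).fetchall()
print(counts)
```

Because these are inner joins, a status that no row of A uses does not show up at all; starting from D with LEFT JOINs would list it with a count of zero.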

MYSQL query to Select the first duplicate record in JAVA

I am trying to retrieve the first row among the duplicate rows, i.e. the one that occurred first.
--Table--
Order_No Product User
1 Book Student
2 Book Student
3 Book Student
I want to get the Order_No of the first duplicate row in Java. I have used DISTINCT, DISTINCT TOP 1, etc., but nothing worked. I need help.
SELECT min(order_no), product, user
FROM `table`
GROUP BY user, product
This is basic SQL?
SELECT min(order_no), product, user FROM `table` GROUP BY product, user
See also more information on GROUP BY
Every column that is not part of your GROUP BY needs an aggregate function that decides which of the n potentially different values to pick. min() picks the lowest value (this also works for strings and dates), while max() picks the highest. Some databases, such as MS Access, also offer First() and Last() to pick a value by the order in which rows show up; MySQL has no such aggregates.
Supposing you had other values to pick from, you might see something like:
SELECT min(order_no), product, user, min(creation_date),
sum(quantity), min(billing_address)
FROM orders GROUP BY product, user
SELECT t.*
FROM `table` t
WHERE NOT EXISTS ( SELECT 1
FROM `table` t2
WHERE t2.Product = t.Product
AND t2.User = t.User
AND t2.Order_No < t.Order_No
)
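The GROUP BY and NOT EXISTS forms can be compared side by side. This is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of MySQL, a table named orders in place of the unnamed table from the question, and one extra made-up row so that there is more than one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (Order_No INTEGER PRIMARY KEY, Product TEXT, User TEXT);
    INSERT INTO orders VALUES
        (1, 'Book', 'Student'),
        (2, 'Book', 'Student'),
        (3, 'Book', 'Student'),
        (4, 'Pen',  'Teacher');   -- extra row, not in the question
    """
)

# Form 1: lowest Order_No per (Product, User) group.
by_group = conn.execute(
    """
    SELECT MIN(Order_No), Product, User
    FROM orders
    GROUP BY Product, User
    ORDER BY 1
    """
).fetchall()

# Form 2: rows for which no earlier row with the same Product and User exists.
by_not_exists = conn.execute(
    """
    SELECT t.Order_No, t.Product, t.User
    FROM orders t
    WHERE NOT EXISTS (
        SELECT 1 FROM orders t2
        WHERE t2.Product = t.Product
          AND t2.User = t.User
          AND t2.Order_No < t.Order_No)
    ORDER BY t.Order_No
    """
).fetchall()
print(by_group)
print(by_not_exists)
```

The NOT EXISTS form returns whole rows, which is convenient once the table has columns beyond the grouping keys.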

Sql Query Optimization - Remove Not In Operator

I am storing employee attendance on a table 'attendance' having the following structure:
EmpID, CDate
Every day, the attendance system inserts into this table the employee id of each employee present on that day.
I need to produce an absence statement for a particular employee. I can do this easily by selecting all distinct dates that are not among the dates on which the employee was present.
Is there any way to remove the NOT IN operator from that SQL statement? Please help.
Here is the sql query for employee with EmpId 01:
select distinct CDate
from attendance
where CDate not in (
Select CDate from attendance where EmpID='01')
The problem isn't in the NOT IN clause, it is the subquery.
You want something more like:
SELECT DISTINCT a1.CDate, if (EmpID=NULL, false,true) as Present
FROM attendance as a1
LEFT JOIN attendance as a2 USING (CDate)
WHERE a2.EmpID='01'
This is a self-join that pulls all of the dates, then joins the employee's attendance status onto them. It should be significantly faster than your subquery.
Updated, with tested code:
SELECT DISTINCT a1.CDate, IF (a2.EmpID IS NULL, false,true) as Present
FROM attendance AS a1
LEFT JOIN attendance AS a2 ON (a1.CDate = a2.CDate AND a2.EmpID='01')
My bad on the previous answer. I should have put the EmpID condition into the ON clause instead of the WHERE clause.
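To list only the absent dates, the same LEFT JOIN can be filtered on the missing match. This is a sketch under assumptions: SQLite through Python's sqlite3 module instead of MySQL, made-up attendance rows, and an IS NULL filter in place of the Present flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE attendance (EmpID TEXT, CDate TEXT);
    INSERT INTO attendance VALUES
        ('01', '2020-01-01'), ('02', '2020-01-01'),
        ('02', '2020-01-02'),
        ('01', '2020-01-03'), ('03', '2020-01-03');
    """
)

# The EmpID condition sits in the ON clause, so dates without a match
# for that employee survive the join with a2.EmpID set to NULL.
absent = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT a1.CDate
        FROM attendance AS a1
        LEFT JOIN attendance AS a2
               ON a1.CDate = a2.CDate AND a2.EmpID = ?
        WHERE a2.EmpID IS NULL
        ORDER BY a1.CDate
        """,
        ("01",),
    )
]
print(absent)
```

Moving the EmpID test from ON into WHERE would discard exactly those NULL rows and leave only the dates the employee was present.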
You could change your mechanism to store data for each employee, each day.
Yes, it'll add a lot of rows, but how can you be sure that the logged data contains every date? What if nobody is at work on a given day? Then no one would be recorded as absent.
If you'd go with:
EmpID, CDate, Present
1, {date}, 0|1
then you'd have a simpler and faster query, traded for table size:
select CDate from attendance where EmpID = 1 and Present = 0;