Finding contradiction in mysql table - mysql

I have a table of 5000 rows and 9 columns, and I am in the process of cleaning the data. I need a query that returns only the rocket names that are Active and Retired at the same time.
Below is a sample of the 2 columns I am working on:
Rocket     Status
------     ------
Sputnik    Retired
Sputnik    Active
Vanguard   Retired
Juno I     Retired
Sputnik    Retired
Vostok     Retired
So the result should look like this:
Rocket     Status
------     ------
Sputnik    Retired
Sputnik    Active
I tried DISTINCT, a self join, and GROUP BY, but I failed to achieve my goal.
-- This query will return every distinct row:
select distinct(concat(rocket,'_', rocketstatus)) as BB
from space.test_1
group by bb
-- This query will return nothing:
select a.rocket, a.rocketstatus from space.test_1 b
join space.test_1 a on a.id = b.id
where a.rocketstatus not in (select b.rocketstatus from space.test_1 b)

Maybe this variant will suit you:
select distinct a.Rocket,b.Status from
(
select Rocket,count(distinct Status) as cnt from test_1 group by Rocket
) a inner join test_1 b on b.Rocket=a.Rocket where a.cnt>1
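To check the idea against the sample rows, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for MySQL (that substitution is an assumption, as is the use of the column names rocket and rocketstatus from the question's own queries). It keeps the rockets whose COUNT(DISTINCT rocketstatus) is greater than 1 and then lists each of their statuses once.

```python
import sqlite3


def contradictory_rockets(rows):
    """Return each (rocket, status) pair for rockets that have more than one distinct status."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_1 (id INTEGER PRIMARY KEY, rocket TEXT, rocketstatus TEXT)"
    )
    conn.executemany(
        "INSERT INTO test_1 (rocket, rocketstatus) VALUES (?, ?)", rows
    )
    query = """
        SELECT DISTINCT t.rocket, t.rocketstatus
        FROM test_1 AS t
        JOIN (
            -- rockets that appear with at least two different statuses
            SELECT rocket
            FROM test_1
            GROUP BY rocket
            HAVING COUNT(DISTINCT rocketstatus) > 1
        ) AS c ON c.rocket = t.rocket
        ORDER BY t.rocket, t.rocketstatus
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample rows copied from the question.
sample = [
    ("Sputnik", "Retired"), ("Sputnik", "Active"), ("Vanguard", "Retired"),
    ("Juno I", "Retired"), ("Sputnik", "Retired"), ("Vostok", "Retired"),
]
print(contradictory_rockets(sample))
```

If the status column can hold values other than Active and Retired, this flags any mix of statuses, so an extra WHERE on the two values may be needed.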

Related

Optimization of MySQL query have millions of records

OBJECTIVE: Need query to count all "distinct" leads outside of current company that do not exist in current company. The query needs to account for millions of records between multiple tables (lead_details, domains, company)
EXAMPLE:
company 1 -> domain 1 -> lead 1 lead_details records exists.
company 2 -> domain 2 -> lead 1 lead_details records exists.
company 2 -> domain 2 -> lead 2 lead_details records exists.
company 3 -> domain 3 -> lead 2 lead_details records exists.
company 3 -> domain 3 -> lead 3 lead_details records exists.
RESULT: If I run the query for the data above on company 1, the result should be a count of (2), since lead 2 & lead 3 are unique and do not exist in company 1
domain_id  domain_name  company_id  company_name  lead_id  lead_count
"2"        "D2"         "2"         "C2"          "2"      "2"
"3"        "D3"         "3"         "C3"          "3"      "1"
Here is my query. Please let me know if anyone has a better suggestion.
SELECT al.*
FROM (
SELECT
d.id AS domain_id,
d.name AS domain_name,
c.id AS company_id,
c.name AS company_name,
ld.lead_id,
count(ld.lead_id) as lead_count
FROM domains d
INNER JOIN company c
ON (c.id = d.company_id AND c.id != 1)
INNER JOIN lead_details ld
ON (ld.domain_id = d.id)
GROUP BY ld.lead_id
) al
LEFT JOIN (
SELECT ld.lead_id FROM domains d
INNER JOIN company c
ON (c.id = d.company_id AND c.id = 1)
INNER JOIN lead_details ld
ON (ld.domain_id = d.id)
) ccl
ON al.lead_id = ccl.lead_id
WHERE ccl.lead_id IS NULL;
I have almost a million rows, so I need to figure out a better solution.
Plan A
The pattern
FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...
is inefficient, especially in older versions of MySQL. This is because neither of the subqueries has any indexes, so (in older versions) a repeated full table scan is needed of one of the subqueries.
The better method is to try to reformulate as
FROM t1 ...
JOIN t2 ... ON ...
JOIN t3 ... ON ...
LEFT JOIN t4 ... ON ...
LEFT JOIN t5 ... ON ...
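As a rough illustration of Plan A, the sketch below queries the base tables directly instead of joining two derived tables, so the indexes on them can be used. It is written in Python with the standard-library sqlite3 module standing in for MySQL, and it uses NOT EXISTS for the "does not exist in the current company" test rather than LEFT JOIN ... IS NULL. The column lists, the index names, and the function name count_foreign_leads are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
CREATE TABLE lead_details (lead_id INTEGER, domain_id INTEGER);
-- hypothetical indexes that let both lookups avoid full scans
CREATE INDEX idx_ld_lead ON lead_details (lead_id, domain_id);
CREATE INDEX idx_ld_domain ON lead_details (domain_id, lead_id);
INSERT INTO company VALUES (1, 'C1'), (2, 'C2'), (3, 'C3');
INSERT INTO domains VALUES (1, 'D1', 1), (2, 'D2', 2), (3, 'D3', 3);
-- (lead_id, domain_id) pairs from the EXAMPLE in the question
INSERT INTO lead_details VALUES (1, 1), (1, 2), (2, 2), (2, 3), (3, 3);
""")


def count_foreign_leads(conn, company_id):
    """Count distinct leads seen under other companies but never under company_id."""
    query = """
        SELECT COUNT(DISTINCT ld.lead_id)
        FROM lead_details AS ld
        JOIN domains AS d ON d.id = ld.domain_id
        WHERE d.company_id <> ?
          AND NOT EXISTS (
              SELECT 1
              FROM lead_details AS ld1
              JOIN domains AS d1 ON d1.id = ld1.domain_id
              WHERE d1.company_id = ?
                AND ld1.lead_id = ld.lead_id
          )
    """
    return conn.execute(query, (company_id, company_id)).fetchone()[0]


print(count_foreign_leads(conn, 1))
```

Whether this beats the original on a million rows depends on the real indexes and MySQL version, so it is worth comparing the EXPLAIN output of both.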
Plan B
This is closer to what you have...
CREATE TEMPORARY TABLE ccl
( INDEX(lead_id) )
SELECT ... -- the stuff that is after LEFT JOIN
Then replace that subquery with just ccl. This provides the index that is missing from the original query.
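Plan B can be sketched the same way. The example below, in Python with the standard-library sqlite3 module as a stand-in for MySQL, materializes the current company's leads into an indexed temporary table and then left-joins against it. SQLite needs a separate CREATE INDEX statement where MySQL accepts the inline ( INDEX(lead_id) ) form shown above; the index name ccl_lead_id and the trimmed-down schema are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
CREATE TABLE lead_details (lead_id INTEGER, domain_id INTEGER);
INSERT INTO domains VALUES (1, 'D1', 1), (2, 'D2', 2), (3, 'D3', 3);
INSERT INTO lead_details VALUES (1, 1), (1, 2), (2, 2), (2, 3), (3, 3);

-- Plan B: store company 1's leads once, with an index on lead_id
CREATE TEMP TABLE ccl AS
    SELECT DISTINCT ld.lead_id
    FROM domains AS d
    JOIN lead_details AS ld ON ld.domain_id = d.id
    WHERE d.company_id = 1;
CREATE INDEX temp.ccl_lead_id ON ccl (lead_id);
""")

# Leads outside company 1 with no match in ccl, plus how often each occurs.
rows = conn.execute("""
    SELECT ld.lead_id, COUNT(*) AS lead_count
    FROM domains AS d
    JOIN lead_details AS ld ON ld.domain_id = d.id
    LEFT JOIN ccl ON ccl.lead_id = ld.lead_id
    WHERE d.company_id <> 1
      AND ccl.lead_id IS NULL
    GROUP BY ld.lead_id
    ORDER BY ld.lead_id
""").fetchall()
print(rows)
```

On the question's example this yields lead 2 with a count of 2 and lead 3 with a count of 1, matching the expected result table.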
Plan C
Summary table. (This may or may not be practical for your query, since you are looking for leads that are distinct and do not exist in the current company.) Every month (or week, or whatever) calculate subtotals for the previous period and store them in another table. A query against that table will then be much faster.

Eliminate certain duplicated rows after group by

With this db:
Chef(cid,cname,age),
Recipe(rid,rname),
Cooked(orderid,cid,rid,price)
Customers(cuid,orderid,time,daytime,age)
[cid means chef id, and so on]
Given orders from customers, I need to find, for each chef, the difference between his/her age and the average age of the people who ordered his/her meals.
I wrote the following query:
select cid, Ch.age - AVG(Cu.age) as Diff
from Chef Ch NATURAL JOIN Cooked Co,Customers Cu
where Co.orderid = Cu.orderid
group by cid
This solves the problem, but if you assume that each customer has a unique id, it might not work, because one customer can then order two meals from the same chef and skew the calculation.
Now I know it can be answered with NOT EXISTS, but I'm looking for a solution that uses GROUP BY (something similar to what I wrote). So far I couldn't find one (I searched and tried many ways, from SELECT DISTINCT, to manipulation in the WHERE clause, to "having count(distinct..)").
Edit: People asked for an example. I'm coding using SQLFiddle and it crashes a lot, so I'll try my best:
cid | cuid | orderid | Cu.age
-----------------------------
1   | 333  | 1       | 20
1   | 200  | 2       | 40
1   | 200  | 5       | 40
2   | 4    | 3       | 36
Let's say Chef 1's age is 50. My query will give you 50 - (20+40+40)/3 = 16 and 2/3, although it should actually be 50 - (20+40)/2 = 20 (because the customer with id 200 ordered two recipes from our beloved Chef 1).
Assume Chef 2's age is 47. My query will result:
cid | Diff
----------
1   | 16.667
2   | 11
Another edit: I wasn't taught any particular SQL dialect, so I really have no idea what the differences are between Oracle, MySQL, and Microsoft SQL Server; I'm basically "freestyle" querying. (I hope that will be good enough in my exam as well :O )
First, you should write your query as:
select Ch.cid, Ch.age - AVG(Cu.age) as Diff
from Chef Ch join
Cooked Co
on Ch.cid = Co.cid join
Customers Cu
on Co.orderid = Cu.orderid
group by Ch.cid, Ch.age;
Two different reasons:
NATURAL JOIN is just a bug waiting to happen. List the columns that you want used for the join, lest an unexpected field or spelling difference affect the results.
Never use commas in the FROM clause. Always use explicit JOIN syntax.
Next, the answer to your question is more complicated. For each chef, we can get the average age of the customers by doing:
select cid, avg(age)
from (select distinct co.cid, cu.cuid, cu.age
from Cooked Co join
Customers Cu
on Co.orderid = Cu.orderid
) c
group by cid;
Then, for the difference, you need to bring that information in as well. One method is in the subquery:
select cid, ( cage - avg(cuage) ) as diff
from (select distinct co.cid, cu.cuid, cu.age as cuage, ch.age as cage
from Chef ch join
Cooked Co
on ch.cid = co.cid join
Customers Cu
on Co.orderid = Cu.orderid
) c
group by cid, cage;
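To see the effect of the SELECT DISTINCT step, the sketch below runs the naive query and the de-duplicated one side by side, using Python's standard-library sqlite3 module as a stand-in for the (unspecified) database. The tables are cut down to the columns the query needs, the chef names are made up, and customer 200 is given age 40 as in the question's arithmetic; all of that is assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Chef (cid INTEGER PRIMARY KEY, cname TEXT, age INTEGER);
CREATE TABLE Cooked (orderid INTEGER, cid INTEGER);
CREATE TABLE Customers (cuid INTEGER, orderid INTEGER, age INTEGER);
INSERT INTO Chef VALUES (1, 'Chef 1', 50), (2, 'Chef 2', 47);
INSERT INTO Cooked VALUES (1, 1), (2, 1), (5, 1), (3, 2);
-- customer 200 placed two orders with chef 1
INSERT INTO Customers VALUES (333, 1, 20), (200, 2, 40), (200, 5, 40), (4, 3, 36);
""")

# Naive version: every order counts, so customer 200 is averaged twice.
naive = conn.execute("""
    SELECT ch.cid, ch.age - AVG(cu.age) AS diff
    FROM Chef ch
    JOIN Cooked co ON ch.cid = co.cid
    JOIN Customers cu ON co.orderid = cu.orderid
    GROUP BY ch.cid, ch.age
    ORDER BY ch.cid
""").fetchall()

# De-duplicated version: one row per (chef, customer) before averaging.
dedup = conn.execute("""
    SELECT c.cid, c.cage - AVG(c.cuage) AS diff
    FROM (
        SELECT DISTINCT co.cid, cu.cuid, cu.age AS cuage, ch.age AS cage
        FROM Chef ch
        JOIN Cooked co ON ch.cid = co.cid
        JOIN Customers cu ON co.orderid = cu.orderid
    ) AS c
    GROUP BY c.cid, c.cage
    ORDER BY c.cid
""").fetchall()

print(naive)
print(dedup)
```

With these rows the naive query gives chef 1 a difference of about 16.667, while the de-duplicated query gives 20; chef 2 is 11 in both.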

Marking Records as duplicates in mySQL

I am not a database guy, but I have been given the "fun" job of cleaning up someone else's database. We have many duplicate records in our database, and some of our customers are getting double or triple billed every month.
Given the following database example:
Table: Customers
ID  Name      Phone       DoNotBill
1   Acme Inc  5125551212  No
2   ABC LLC   7138221661  No
3   Big Inc   4132229807  No
4   Acme      5125551212  No
5   Tree Top  2127657654  No
Is it possible to write a query that identifies all duplicate phone numbers (in this case records 1 and 4) and then marks the duplicate records by setting the DoNotBill column to Yes, while leaving the first record unmarked?
In this example we would be left with:
ID  Name      Phone       DoNotBill
1   Acme Inc  5125551212  No
2   ABC LLC   7138221661  No
3   Big Inc   4132229807  No
4   Acme      5125551212  Yes
5   Tree Top  2127657654  No
Something like this?
UPDATE
customers cust,
(SELECT
c1.ID,
c1.name,
c1.phone,
c1.DoNotBill
FROM customers c
JOIN
(SELECT
cc.ID,
cc.name,
cc.phone,
cc.DoNotBill
FROM customers cc
) as c1 on c1.phone = c.phone AND c1.ID > c.ID
) dup
SET cust.DoNotBill = 'Yes' WHERE cust.id=dup.id ;
To begin with, I assume that the DoNotBill column has only two possible values: yes and no. In that case it should be a bool instead of a varchar, meaning it would be either true or false.
Furthermore, I don't get the meaning of the DoNotBill column. Why wouldn't you just use something like this?
select distinct phone from customers
SQL SELECT DISTINCT
That would give you the phone numbers without duplicates and without the need for an extra column.
This depends on the amount of data you have.
You can do it in steps, and make use of tools like Excel along the way.
This query
SELECT a.id, b.id, a.phone FROM clients a, clients b WHERE
a.phone = b.phone
AND a.id != b.id
returns all the duplicated records.
Add
GROUP BY a.phone
and you will get 1 record for each pair of duplicates.
If those records are what you need, change the select to SELECT a.id and
use this query as a subquery in an UPDATE statement:
UPDATE clients SET billing='no' WHERE id IN ( sql goes here)
UPDATE customers c SET c.DoNotBill="Yes";
UPDATE customers c
JOIN (
SELECT MIN( ID ) ID, Phone
FROM customers
GROUP BY Phone
) u ON c.ID = u.ID AND c.Phone = u.Phone
SET c.DoNotBill="No";
That way it handles not only pairs of duplicates but any number of repeated entries for the same phone number: only the row with the lowest ID stays billable.
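The "keep the lowest ID per phone, flag the rest" rule can be tried on the sample table with the sketch below, written in Python against the standard-library sqlite3 module rather than MySQL (an assumption made so the example is self-contained; the function name mark_duplicates is hypothetical). SQLite accepts a subquery on the table being updated, whereas MySQL generally rejects that, so on MySQL the JOIN form shown above is the one to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (ID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT, DoNotBill TEXT);
INSERT INTO customers VALUES
    (1, 'Acme Inc', '5125551212', 'No'),
    (2, 'ABC LLC',  '7138221661', 'No'),
    (3, 'Big Inc',  '4132229807', 'No'),
    (4, 'Acme',     '5125551212', 'No'),
    (5, 'Tree Top', '2127657654', 'No');
""")


def mark_duplicates(conn):
    """Flag every row that is not the lowest ID for its phone number; return rows changed."""
    cur = conn.execute("""
        UPDATE customers
        SET DoNotBill = 'Yes'
        WHERE ID NOT IN (SELECT MIN(ID) FROM customers GROUP BY Phone)
    """)
    conn.commit()
    return cur.rowcount


marked = mark_duplicates(conn)
flags = conn.execute("SELECT ID, DoNotBill FROM customers ORDER BY ID").fetchall()
print(marked, flags)
```

Running a SELECT with the same WHERE clause first is a cheap way to review which rows would be flagged before changing billing data.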

Mysql select query for getting co-workers

I have a many to many relationship between some Jobs and some Workers.
Each worker has a property like age. I want to get all the Jobs and Workers that collaborate with the workers of age 22 (for example), including the workers of age 22.
For example if A and B are two workers who do the Job X and one of them is 22 years old, I want a query to return both A and B (joined with the X and its properties)
I have three tables:
Job
1 JobI
2 JobII
Workers:
A Smith 22
B John 21
C Jack 23
J-W relation
1 A
1 B
2 B
2 C
In this example I want the info for A and B and for Job I, because A is 22 years old and collaborates with B on Job I.
Something like
Select Workers.*, Jobs.*
From (Select Distinct Workers.WorkerID From WorkerJobs Join Workers on Workers.WorkerID = WorkerJobs.WorkerID and Workers.Age = 22) worker22
join WorkerJobs wj22 on wj22.WorkerID = worker22.WorkerID
join Jobs on Jobs.JobID = wj22.JobID
join WorkerJobs on WorkerJobs.JobID = Jobs.JobID
join Workers on Workers.WorkerID = WorkerJobs.WorkerID
I.e. get all the jobs with a 22 year old worker, then join back to jobs and workers to get the details.
Any 22 year old with more than one job will repeat, though, as will any job with more than one 22 year old worker.
Your question is a bit confusing. Do you want all jobs for workers of age 22, or do you want to return A and B? Still, I will try to answer both.
Let's say your jobs table has job_id, worker_id, job_description, ... and your workers table has worker_id, age, and other descriptive columns.
To get all the jobs and workers with age 22, run the following query.
SELECT jobs.*,workers.* FROM jobs,workers WHERE jobs.worker_id = workers.worker_id AND workers.age=22;
This will return all the jobs data and workers data associated with workers with age 22.
Hope this answers your question
Assuming your schema looks like this:
Table workers {
id,
age,
...
}
Table jobs {
id,
worker_id,
...
}
To answer your question, you need a query with a subquery.
The inner query
select jobs.id
from jobs
left join workers on workers.id = jobs.worker_id
where age = 22
returns all the jobs that have workers aged 22.
The outer query
select *
from jobs, workers
where jobs.worker_id = workers.id
and jobs.id in (INNER QUERY)
selects all the jobs and workers whose job appears in the inner query.
The end result:
select *
from jobs, workers
where jobs.worker_id = workers.id
and jobs.id in (select jobs.id
from jobs
left join workers on workers.id = jobs.worker_id
where age = 22)
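Because the question describes a many-to-many relation, here is a minimal sketch of the same inner/outer query idea with a separate link table, run in Python through the standard-library sqlite3 module as a stand-in for MySQL. The table and column names (Jobs, Workers, WorkerJobs, JobID, WorkerID) and the function name coworkers_of_age are assumptions, since the question does not give the real ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Workers (WorkerID TEXT PRIMARY KEY, Name TEXT, Age INTEGER);
CREATE TABLE WorkerJobs (JobID INTEGER, WorkerID TEXT);
INSERT INTO Jobs VALUES (1, 'JobI'), (2, 'JobII');
INSERT INTO Workers VALUES ('A', 'Smith', 22), ('B', 'John', 21), ('C', 'Jack', 23);
INSERT INTO WorkerJobs VALUES (1, 'A'), (1, 'B'), (2, 'B'), (2, 'C');
""")


def coworkers_of_age(conn, age):
    """Return (job, worker) rows for every job that has at least one worker of the given age."""
    query = """
        SELECT j.JobID, j.Name, w.WorkerID, w.Name, w.Age
        FROM Jobs AS j
        JOIN WorkerJobs AS wj ON wj.JobID = j.JobID
        JOIN Workers AS w ON w.WorkerID = wj.WorkerID
        WHERE j.JobID IN (
            -- inner query: jobs that have a worker of the requested age
            SELECT wj2.JobID
            FROM WorkerJobs AS wj2
            JOIN Workers AS w2 ON w2.WorkerID = wj2.WorkerID
            WHERE w2.Age = ?
        )
        ORDER BY j.JobID, w.WorkerID
    """
    return conn.execute(query, (age,)).fetchall()


print(coworkers_of_age(conn, 22))
```

For the sample data this returns Job I with workers A and B, which is the result the question asks for; the IN subquery also keeps each job's rows from being repeated when a job has several 22 year olds.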

MySQL selecting rows with a max id and matching other conditions

Using the tables below as an example and the listed query as a base query, I want to add a way to select only rows with a max id! Without having to do a second query!
TABLE VEHICLES
id vehicleName
----- --------
1 cool car
2 cool car
3 cool bus
4 cool bus
5 cool bus
6 car
7 truck
8 motorcycle
9 scooter
10 scooter
11 bus
TABLE VEHICLE NAMES
nameId vehicleName
------ -------
1 cool car
2 cool bus
3 car
4 truck
5 motorcycle
6 scooter
7 bus
TABLE VEHICLE ATTRIBUTES
nameId attribute
------ ---------
1 FAST
1 SMALL
1 SHINY
2 BIG
2 SLOW
3 EXPENSIVE
4 SHINY
5 FAST
5 SMALL
6 SHINY
6 SMALL
7 SMALL
And the base query:
select a.*
from vehicle a
join vehicle_names b using(vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
group by a.id
having count(distinct c.attribute) = 2;
So what I want to achieve is to select rows with certain attributes, that match a name but only one entry for each name that matches where the id is the highest!
So a working solution in this example would return the below rows:
id vehicleName
----- --------
2 cool car
10 scooter
That is what some sort of max on the id would give. At the moment I get all the entries for cool car and scooter.
My real world database follows a similar structure and has tens of thousands of entries in it, so a query like the one above could easily return 3000+ results. I limit the results to 100 rows to keep execution time low, as the results are used in a search on my site. The reason I have repeats of "vehicles" with the same name but a different ID is that new models are constantly added, but I keep the older ones around for those who want to dig them up! On a search by car name, however, I don't want to return the older ones, just the newest one, which is the one with the highest ID!
The correct answer would adapt the query I provided above that I'm currently using and have it only return rows where the name matches but has the highest id!
If this isn't possible, suggestions on how I can achieve what I want without massively increasing the execution time of a search would be appreciated!
If you want to keep your logic, here is what I would do:
select a.*
from vehicle a
left join vehicle a2 on (a.vehicleName = a2.vehicleName and a.id < a2.id)
join vehicle_names b on (a.vehicleName = b.vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct c.attribute) = 2;
Which yields:
+----+-------------+
| id | vehicleName |
+----+-------------+
| 2 | cool car |
| 10 | scooter |
+----+-------------+
2 rows in set (0.00 sec)
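The self-join trick above (keep a row only when no row with the same name has a higher id) can be reproduced on the sample tables with the sketch below. It uses Python's standard-library sqlite3 module in place of MySQL, which is an assumption for the sake of a self-contained example; the table names follow the base query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_names (nameId INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_attribs (nameId INTEGER, attribute TEXT);
INSERT INTO vehicle VALUES
    (1, 'cool car'), (2, 'cool car'), (3, 'cool bus'), (4, 'cool bus'),
    (5, 'cool bus'), (6, 'car'), (7, 'truck'), (8, 'motorcycle'),
    (9, 'scooter'), (10, 'scooter'), (11, 'bus');
INSERT INTO vehicle_names VALUES
    (1, 'cool car'), (2, 'cool bus'), (3, 'car'), (4, 'truck'),
    (5, 'motorcycle'), (6, 'scooter'), (7, 'bus');
INSERT INTO vehicle_attribs VALUES
    (1, 'FAST'), (1, 'SMALL'), (1, 'SHINY'), (2, 'BIG'), (2, 'SLOW'),
    (3, 'EXPENSIVE'), (4, 'SHINY'), (5, 'FAST'), (5, 'SMALL'),
    (6, 'SHINY'), (6, 'SMALL'), (7, 'SMALL');
""")

latest_matches = conn.execute("""
    SELECT a.id, a.vehicleName
    FROM vehicle AS a
    -- a2 is any newer row with the same name; none exists for the latest row
    LEFT JOIN vehicle AS a2
           ON a.vehicleName = a2.vehicleName AND a.id < a2.id
    JOIN vehicle_names AS b ON a.vehicleName = b.vehicleName
    JOIN vehicle_attribs AS c ON c.nameId = b.nameId
    WHERE c.attribute IN ('SMALL', 'SHINY')
      AND a.vehicleName LIKE '%coo%'
      AND a2.id IS NULL
    GROUP BY a.id
    HAVING COUNT(DISTINCT c.attribute) = 2
    ORDER BY a.id
""").fetchall()
print(latest_matches)
```

An index on (vehicleName, id) would probably help the self-join on a larger table, though that is worth confirming with EXPLAIN on the real data.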
As others said, normalization could be done on a few levels:
Keeping your current vehicle_names table as the primary lookup table, I would change:
update vehicle a
inner join vehicle_names b using (vehicleName)
set a.vehicleName = b.nameId;
alter table vehicle change column vehicleName nameId int;
create table attribs (
attribId int auto_increment primary key,
attribute varchar(20),
unique key attribute (attribute)
);
insert into attribs (attribute)
select distinct attribute from vehicle_attribs;
update vehicle_attribs a
inner join attribs b using (attribute)
set a.attribute=b.attribId;
alter table vehicle_attribs change column attribute attribId int;
Which leads to the following query:
select a.id, b.vehicleName
from vehicle a
left join vehicle a2 on (a.nameId = a2.nameId and a.id < a2.id)
join vehicle_names b on (a.nameId = b.nameId)
join vehicle_attribs c on (a.nameId=c.nameId)
inner join attribs d using (attribId)
where d.attribute in ('SMALL', 'SHINY')
and b.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct d.attribute) = 2;
The table does not seem normalized; however, that makes it easy to do this:
select max(id), vehicleName
from VEHICLES
group by vehicleName
having count(*)>=2;
I'm not sure I completely understand your model, but the following query satisfies your requirements as they stand. The first sub query finds the latest version of the vehicle. The second query satisfies your "and" condition. Then I just join the queries on vehiclename (which is the key?).
select a.id
,a.vehiclename
from (select a.vehicleName, max(id) as id
from vehicle a
where vehicleName like '%coo%'
group by vehicleName
) as a
join (select b.vehiclename
from vehicle_names b
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
group by b.vehiclename
having count(distinct c.attribute) = 2
) as b on (a.vehicleName = b.vehicleName);
If this "latest vehicle" logic is something you will need to do a lot, a small suggestion would be to create a view (see below) which returns the latest version of each vehicle. Then you could use the view instead of the find-max-query. Note that this is purely for ease-of-use, it offers no performance benefits.
select *
from vehicle a
where id = (select max(b.id)
from vehicle b
where a.vehiclename = b.vehiclename);
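For illustration, the view idea could look like the sketch below, where the query above is wrapped in CREATE VIEW. It runs in Python via the standard-library sqlite3 module rather than MySQL, and the view name latest_vehicle is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT);
INSERT INTO vehicle VALUES
    (1, 'cool car'), (2, 'cool car'), (3, 'cool bus'), (4, 'cool bus'),
    (5, 'cool bus'), (6, 'car'), (7, 'truck'), (8, 'motorcycle'),
    (9, 'scooter'), (10, 'scooter'), (11, 'bus');

-- hypothetical view: one row per vehicle name, the one with the highest id
CREATE VIEW latest_vehicle AS
    SELECT a.id, a.vehicleName
    FROM vehicle AS a
    WHERE a.id = (SELECT MAX(b.id)
                  FROM vehicle AS b
                  WHERE b.vehicleName = a.vehicleName);
""")

latest = conn.execute(
    "SELECT id, vehicleName FROM latest_vehicle ORDER BY id"
).fetchall()
print(latest)
```

Other queries can then join against latest_vehicle instead of repeating the max-per-name subquery; as noted above, this is a convenience and not a speed-up.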
Without going into a proper redesign of your model, you could
1) Add a column IsLatest that your application could manage.
This is not perfect but will satisfy your question (until the next problem, see the note at the end)
All you need is, when you add a new entry, to issue queries such as
UPDATE a
SET IsLatest = 0
WHERE IsLatest = 1 AND vehicleName = #name
INSERT new a
UPDATE a
SET IsLatest = 1
WHERE id = #last_inserted_id
in a transaction or a trigger
2) Alternatively you can find out the max id before you issue your query
SELECT MAX(id)
FROM a
WHERE vehicleName = #name
3) You can do it in a single SQL statement, and provided there is an index on (vehicleName, id) it should actually have decent speed with
select a.*
from vehicle a
join vehicle_names b ON a.vehicleName = b.vehicleName
join vehicle_attribs c ON b.nameId = c.nameId AND c.attribute = 'SMALL'
join vehicle_attribs d ON b.nameId = d.nameId AND d.attribute = 'SHINY'
left join vehicle notmax ON a.vehicleName = notmax.vehicleName AND a.id < notmax.id
where a.vehicleName like '%coo%'
AND notmax.id IS NULL
I have removed your GROUP BY and HAVING and replaced them with another join (assuming that each attribute appears at most once per nameId).
I have also used one of the ways to find the max per group: join the table to itself and keep only the rows for which no record with a bigger id exists for the same name.
There are other ways; search SO for 'max per group sql'. Also see here, though it is not complete.