What I am trying to do is compare different snapshots of data to calculate changes over time. It's a report on computers. I want to find whether a combination of values (computer and user) in one table matches the same combination in another table. For example:
April_Table:
Computer     User
192.168.1.1  Jim
192.168.1.2  Jerry

May_Table:
Computer     User
192.168.1.1  John
192.168.1.2  Jerry
So the query should return 192.168.1.2 Jerry.
I've done this matching on a single column before, but never on two columns, and I haven't found a way to do it.
With an INNER JOIN:
select a.*
from April_Table as a inner join May_Table as m
on m.Computer = a.Computer and m.User = a.User
or with EXISTS:
select a.*
from April_Table as a
where exists (
select 1 from May_Table as m
where m.Computer = a.Computer and m.User = a.User
)
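A minimal sketch to check both queries, using Python's built-in sqlite3 module as a stand-in for the real database. The table contents are the example rows from the question; SQLite syntax is close to, but not identical to, MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE April_Table (Computer TEXT, User TEXT);
CREATE TABLE May_Table (Computer TEXT, User TEXT);
INSERT INTO April_Table VALUES ('192.168.1.1', 'Jim'), ('192.168.1.2', 'Jerry');
INSERT INTO May_Table VALUES ('192.168.1.1', 'John'), ('192.168.1.2', 'Jerry');
""")

# INNER JOIN on both columns: a row matches only if Computer AND User agree.
join_rows = conn.execute("""
SELECT a.* FROM April_Table AS a
INNER JOIN May_Table AS m ON m.Computer = a.Computer AND m.User = a.User
""").fetchall()

# EXISTS with the same two-column condition in the correlated subquery.
exists_rows = conn.execute("""
SELECT a.* FROM April_Table AS a
WHERE EXISTS (SELECT 1 FROM May_Table AS m
              WHERE m.Computer = a.Computer AND m.User = a.User)
""").fetchall()

print(join_rows, exists_rows)
```

If May_Table contained the same (Computer, User) pair twice, the INNER JOIN version would return the matching April row twice, while the EXISTS version returns it once.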
I have an issue with data redundancy. My JOIN query in MySQL returns a very large data set (~8 MB), and a lot of that data is redundant. After analysis, I can see that the query itself is fast, but the data transfer can take several seconds. What options do I have?
For example, say that I have these two tables:
Users:
user_id  user_name
1        Alex
2        Joe
And Purchases:
user_id  purchase_id  purchase_amount
1        A            100
2        B            200
1        C            300
1        D            400
If I simply LEFT JOIN the tables with
SELECT Users.user_id, Users.user_name, purchase_id, purchase_amount
FROM Users
LEFT JOIN Purchases ON Users.user_id = Purchases.user_id
I end up with this result:
user_id  user_name  purchase_id  purchase_amount
1        Alex       A            100
2        Joe        B            200
1        Alex       C            300
1        Alex       D            400
However, as we can see, user_id 1 and user_name Alex appear in three rows. For very large result sets this can become an issue.
I'm thinking about using GROUP BY and GROUP_CONCAT to reduce the redundancy. Is this a good idea in general? My first tests seem to work, but I have to run SET SESSION group_concat_max_len = 1000000; in MySQL, which might not be a good thing since I don't know what value to set it to.
For example, I could do something like
SELECT Users.user_id, Users.user_name, GROUP_CONCAT(CONCAT(purchase_id, ':', purchase_amount))
FROM Users
LEFT JOIN Purchases ON Users.user_id = Purchases.user_id
GROUP BY Users.user_id, Users.user_name
And end up with this result:
user_id  user_name  GROUP_CONCAT...
1        Alex       A:100,C:300,D:400
2        Joe        B:200
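A rough sketch of the GROUP_CONCAT approach with client-side parsing, again using Python's sqlite3 as a stand-in for MySQL. Older SQLite versions have no CONCAT function, so the sketch uses the `||` operator instead; the function name `purchases_per_user` is hypothetical. Purchases come back as a dict per user because the order of the values inside GROUP_CONCAT is not guaranteed here.

```python
import sqlite3

def purchases_per_user(conn):
    # One row per user; purchases packed as "id:amount" pairs, like the GROUP_CONCAT query above.
    rows = conn.execute("""
        SELECT u.user_id, u.user_name,
               GROUP_CONCAT(p.purchase_id || ':' || p.purchase_amount)
        FROM Users u
        LEFT JOIN Purchases p ON u.user_id = p.user_id
        GROUP BY u.user_id, u.user_name
        ORDER BY u.user_id
    """).fetchall()
    result = {}
    for user_id, user_name, packed in rows:
        # A user with no purchases gets NULL (None) from GROUP_CONCAT.
        pairs = packed.split(",") if packed else []
        purchases = {pid: int(amount) for pid, amount in (pair.split(":") for pair in pairs)}
        result[user_id] = (user_name, purchases)
    return result

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE Purchases (user_id INTEGER, purchase_id TEXT, purchase_amount INTEGER);
INSERT INTO Users VALUES (1, 'Alex'), (2, 'Joe');
INSERT INTO Purchases VALUES (1, 'A', 100), (2, 'B', 200), (1, 'C', 300), (1, 'D', 400);
""")
print(purchases_per_user(conn))
```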
Are there any other options for me? Is this the way to go? Parsing the concatenated column is not an issue; I am trying to reduce the size of the data set being returned.
Could we use a temporary table in between?
Use Apache Spark's map/reduce operations to get the data into the desired format.
I have a relation between users and groups. Users can be in a group or not.
EDIT: Added some details to the model to make it more convenient.
Let's say I have a rule to add users to a group based on a specific town and a custom metadata value (for example, age 18).
Currently, I do this to find which users I have to add to the group of people living in Paris who are 18:
SELECT user.id AS 'id'
FROM user
LEFT JOIN
(
SELECT user_id
FROM user_has_role_group
WHERE role_group_id = 1 -- Group for Paris
)
AS T1
ON user.id = T1.user_id
WHERE
(
user.town = 'Paris' AND JSON_EXTRACT(user.custom_metadata, '$.age') = 18
)
AND T1.user_id IS NULL
It works and gives me the IDs of the users to insert into the group.
But when I have 50 groups to process, for example for 50 towns or various ages, it forces me to run 50 queries, which is very slow and inefficient for my database.
How could I generate a result for each group?
Something like:
role_group_id  user_to_add
1              1
1              2
2              1
2              3
The only way I know to do that for now is a UNION of several subqueries like the one above, but of course it's very slow.
Note that the custom_metadata field is a user defined field. I can't create specific columns or tables.
Thanks a lot for your help.
If I understood you correctly:
select user.id, grp.id
from user, role_group grp
where (user.id, grp.id) not in (select user_id, role_group_id from user_has_role_group) and user.town in ('Paris', 'Warsav')
That query lists users from one of the given towns, paired with each group they don't belong to.
To add the missing entries to user_has_role_group, you might want some mapping between those town names and their group_ids.
The example below is just using a subquery with unions for that.
But you could replace that with a select from a table.
Maybe even from role_group, if those names correlate with the user town names.
insert into user_has_role_group (user_id, role_group_id)
select u.id, g.group_id
from user u
join (
select 'Paris' as name, 1 as group_id union all
select 'Rome', 2
-- add more towns here
) g on (u.town = g.name)
left join user_has_role_group ug
on (ug.user_id = u.id and ug.role_group_id = g.group_id)
where u.town in ('Paris','Rome') -- add more towns here
and json_extract(u.custom_metadata, '$.age') = 18
and ug.user_id is null;
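A small sketch of the same single-statement approach, run in Python's sqlite3 with made-up rows. To keep it portable, it assumes a plain `age` column instead of the JSON custom_metadata field (so `u.age = 18` stands in for `json_extract(u.custom_metadata, '$.age') = 18`); the data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, town TEXT, age INTEGER);
CREATE TABLE user_has_role_group (user_id INTEGER, role_group_id INTEGER);
INSERT INTO user VALUES (1, 'Paris', 18), (2, 'Paris', 18), (3, 'Rome', 18), (4, 'Rome', 30);
-- user 1 is already in the Paris group (1)
INSERT INTO user_has_role_group VALUES (1, 1);
""")

# One statement fills every group: the derived table maps towns to group ids,
# and LEFT JOIN ... IS NULL keeps only memberships that don't exist yet.
conn.execute("""
INSERT INTO user_has_role_group (user_id, role_group_id)
SELECT u.id, g.group_id
FROM user u
JOIN (SELECT 'Paris' AS name, 1 AS group_id
      UNION ALL SELECT 'Rome', 2) g ON u.town = g.name
LEFT JOIN user_has_role_group ug
       ON ug.user_id = u.id AND ug.role_group_id = g.group_id
WHERE u.age = 18
  AND ug.user_id IS NULL
""")

rows = conn.execute(
    "SELECT role_group_id, user_id FROM user_has_role_group ORDER BY 1, 2"
).fetchall()
print(rows)
```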
With this db:
Chef(cid,cname,age),
Recipe(rid,rname),
Cooked(orderid,cid,rid,price)
Customers(cuid,orderid,time,daytime,age)
[cid means chef id, and so on]
Given orders from customers, I need to find for each chef, the difference between his age and the average of people who ordered his/her meals.
I wrote the following query:
select cid, Ch.age - AVG(Cu.age) as Diff
from Chef Ch NATURAL JOIN Cooked Co,Customers Cu
where Co.orderid = Cu.orderid
group by cid
This solves the problem, but if you assume that each customer has a unique id, it might not work, because one customer can order two meals from the same chef and skew the calculation.
Now, I know it can be answered with NOT EXISTS, but I'm looking for a solution that uses GROUP BY (something similar to what I wrote). So far I couldn't find one (I searched and tried many ways, from SELECT DISTINCT, to manipulating the WHERE clause, to "HAVING COUNT(DISTINCT ...)").
Edit: People asked for an example. I'm coding in SQLFiddle and it crashes a lot, so I'll try my best:
cid | cuid | orderid | Cu.age
-----------------------------
1   | 333  | 1       | 20
1   | 200  | 2       | 40
1   | 200  | 5       | 40
2   | 4    | 3       | 36
Let's say Chef 1's age is 50. My query gives 50 - (20+40+40)/3 = 16 2/3, although it should actually be 50 - (20+40)/2 = 20 (because the customer with id 200 ordered two recipes from our beloved Chef 1).
Assume Chef 2's age is 47. My query returns:
cid | Diff
----------
1 16.667
2 11
Another edit: I wasn't taught any particular SQL dialect, so I really have no idea what the differences are between Oracle, MySQL and Microsoft SQL Server; I'm basically "freestyle" querying. (I hope it will be good in my exam as well :O)
First, you should write your query as:
select Ch.cid, Ch.age - AVG(Cu.age) as Diff
from Chef Ch join
Cooked Co
on Ch.cid = Co.cid join
Customers Cu
on Co.orderid = Cu.orderid
group by Ch.cid, Ch.age;
Two different reasons:
NATURAL JOIN is just a bug waiting to happen. List the columns that you want used for the join, lest an unexpected field or spelling difference affect the results.
Never use commas in the FROM clause. Always use explicit JOIN syntax.
Next, the answer to your question is more complicated. For each chef, we can get the average age of the customers by doing:
select cid, avg(age)
from (select distinct Co.cid, Cu.cuid, Cu.age
from Cooked Co join
Customers Cu
on Co.orderid = Cu.orderid
) c
group by cid;
Then, for the difference, you need to bring that information in as well. One method is in the subquery:
select cid, ( cage - avg(cuage) ) as diff
from (select distinct Co.cid, Cu.cuid, Cu.age as cuage, Ch.age as cage
from Chef Ch join
Cooked Co
on Ch.cid = Co.cid join
Customers Cu
on Co.orderid = Cu.orderid
) c
group by cid, cage;
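To see the effect of the DISTINCT subquery, here is a sketch that loads the example from the question into Python's sqlite3 and runs both the original-style query and the deduplicated one. The time and daytime columns are left out, and the cname, rid and price values are made up, since they don't affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Chef (cid INTEGER, cname TEXT, age INTEGER);
CREATE TABLE Cooked (orderid INTEGER, cid INTEGER, rid INTEGER, price REAL);
CREATE TABLE Customers (cuid INTEGER, orderid INTEGER, age INTEGER);
INSERT INTO Chef VALUES (1, 'chef1', 50), (2, 'chef2', 47);
INSERT INTO Cooked VALUES (1, 1, 10, 9.5), (2, 1, 11, 9.5), (5, 1, 10, 9.5), (3, 2, 12, 9.5);
INSERT INTO Customers VALUES (333, 1, 20), (200, 2, 40), (200, 5, 40), (4, 3, 36);
""")

# Original style: customer 200 is counted once per order.
naive = conn.execute("""
SELECT Ch.cid, Ch.age - AVG(Cu.age)
FROM Chef Ch JOIN Cooked Co ON Ch.cid = Co.cid
             JOIN Customers Cu ON Co.orderid = Cu.orderid
GROUP BY Ch.cid, Ch.age ORDER BY Ch.cid
""").fetchall()

# Deduplicated: one row per (chef, customer) before averaging.
dedup = conn.execute("""
SELECT cid, cage - AVG(cuage) AS diff
FROM (SELECT DISTINCT Co.cid, Cu.cuid, Cu.age AS cuage, Ch.age AS cage
      FROM Chef Ch JOIN Cooked Co ON Ch.cid = Co.cid
                   JOIN Customers Cu ON Co.orderid = Cu.orderid) c
GROUP BY cid, cage ORDER BY cid
""").fetchall()

print(naive)
print(dedup)
```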
I have a table of machines, and a table representing the reachability of these machines over time.
machines
id name
1 machine1
2 machine2
3 machine3
4 machine4
machines_reachability
machine_id is_reachable time
1 0 (whatever)
2 1 (whatever)
3 0 (whatever)
1 1 (whatever)
2 0 (whatever)
3 0 (whatever)
1 1 (whatever)
2 1 (whatever)
3 1 (whatever)
I'm trying to find machines that have NO reachability records (i.e. machine4) using JOINs. This can be done in other ways, but I need to do it with joins to get a better understanding of them.
I tried the following:
SELECT * FROM machines m LEFT OUTER JOIN machines_reachability mr ON m.id = mr.machine_id
I understand that this should output the whole left table (i.e. machines), and that the OUTER keyword should exclude the intersection of machines and machines_reachability based on the condition m.id = mr.machine_id. But it didn't work as I expected: it showed all the rows instead of only the ones that didn't match.
So how can I write a JOIN query that shows only the rows that didn't join, selectively from either the left table or the right one?
SELECT DISTINCT m.name FROM machines m LEFT OUTER JOIN machines_reachability mr ON m.id = mr.machine_id
WHERE mr.machine_id IS NULL
Using joins:
select *
from machines m left outer join
machines_reachability mr
on m.id = mr.machine_id and
mr.is_reachable = 1
where mr.machine_id is NULL
The idea is to start with all the machines. The left join keeps all records in the first table, even those that do not match. There is a match in the second table when a machine is reachable (I assume the record has to have the flag set as well as being in the table). The final where clause keeps only machines that have no match in the second table.
What about:
SELECT * from machines where not exists
(
select machine_id from machines_reachability
where machines.id = machines_reachability.machine_id
);
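A sketch that runs the LEFT JOIN ... IS NULL pattern and the NOT EXISTS version against the example data in Python's sqlite3. It also adds one hypothetical reachability row for a nonexistent machine 9 to show the other direction asked about in the question: putting machines_reachability on the left of the LEFT JOIN finds rows in that table with no matching machine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE machines (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE machines_reachability (machine_id INTEGER, is_reachable INTEGER, time TEXT);
INSERT INTO machines VALUES (1, 'machine1'), (2, 'machine2'), (3, 'machine3'), (4, 'machine4');
INSERT INTO machines_reachability (machine_id, is_reachable) VALUES
  (1, 0), (2, 1), (3, 0), (1, 1), (2, 0), (3, 0), (1, 1), (2, 1), (3, 1),
  (9, 1);  -- hypothetical orphan row: there is no machine 9
""")

# Left-side orphans: machines with no reachability rows at all.
no_records = conn.execute("""
SELECT m.name
FROM machines m
LEFT JOIN machines_reachability mr ON m.id = mr.machine_id
WHERE mr.machine_id IS NULL
""").fetchall()

# Same result with NOT EXISTS.
no_records_exists = conn.execute("""
SELECT m.name FROM machines m
WHERE NOT EXISTS (SELECT 1 FROM machines_reachability mr WHERE mr.machine_id = m.id)
""").fetchall()

# Right-side orphans: swap the tables so machines_reachability is on the left.
orphans = conn.execute("""
SELECT DISTINCT mr.machine_id
FROM machines_reachability mr
LEFT JOIN machines m ON m.id = mr.machine_id
WHERE m.id IS NULL
""").fetchall()

print(no_records, no_records_exists, orphans)
```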
SELECT *
FROM machines m
JOIN machines_reachability mr
ON (m.id <> mr.machine_id)
GROUP BY m.id;
I'm modifying phpBB's table to have bidirectional relationships for friends. Unfortunately, people who have already added friends have created duplicate rows:
user1 user2 friend
2 3 true
3 2 true
2 4 true
So I'd like to remove rows 1 and 2 from the example above. Currently, this is the query I've built (it doesn't work at the moment):
DELETE FROM friends WHERE user1 IN (SELECT user1 FROM (SELECT f1.user1 FROM friends f1, friends f2 WHERE f1.user1=f2.user2 AND f1.user2=f2.user1 GROUP BY f1.user1) AS vtable);
It's inspired by "MySQL Duplicate Rows (Duplicate detected using 2 columns)", but the difference is that I don't have a unique ID column and I'd like to stay away from adding an extra column.
Apologies if this isn't 100% legal MySQL; I'm an MSSQL user...
DELETE F1
FROM friends F1
INNER JOIN friends F2
ON F2.user1 = F1.user2
AND F2.user2 = F1.user1
WHERE F1.user1 < F1.user2
DELETE r
FROM friends l, friends r
WHERE l.user1 = r.user2
AND l.user2 = r.user1
This deletes both entries. If you want to keep one of them, you have to add a WHERE condition like the one Will A already proposed, but I suggest using > instead of < to keep the row with the smaller user1 id. It just looks better :)
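SQLite, used here through Python's sqlite3 as a stand-in, does not support MySQL's multi-table DELETE syntax, so this sketch swaps in a correlated EXISTS subquery that should be equivalent to the self-join DELETE with the suggested `>` condition: it removes the mirrored row with the larger user1 and keeps the other. The rows are the example from the question, with `true` stored as 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (user1 INTEGER, user2 INTEGER, friend INTEGER);
INSERT INTO friends VALUES (2, 3, 1), (3, 2, 1), (2, 4, 1);
""")

# Delete a row only if its mirror (user2, user1) exists and its user1 is the larger id.
conn.execute("""
DELETE FROM friends
WHERE user1 > user2
  AND EXISTS (SELECT 1 FROM friends f2
              WHERE f2.user1 = friends.user2
                AND f2.user2 = friends.user1)
""")

remaining = conn.execute("SELECT user1, user2 FROM friends ORDER BY user1, user2").fetchall()
print(remaining)
```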