SQL LEFT JOIN on two possible columns - mysql

We are adding a table to our database schema. It has a relationship to an already existing table, which we added a foreign key for. Mind you, I didn't create this schema nor do I have permission to change much. The application has been running for a while and they are hesitant to change much.
USER_ACTIVITY_T (preexisint table - only relevant columns referred)
activity_id (pk)
username
machineid (fk - recently added)
MACHINE_T (new table)
machineid (pk - auto increment)
machinename (unique)
From the point where I added the machine table, it collects machine data; allowing users to see what machines were involved during the activity. This is useful but it only shows data from the point that it was implemented. A lead asked me to attempt to fill preexisting records by referring to the username associated with the machine. We understand that this is not 100% accurate but... yeah. Our idea was to add username to MACHINE_T and use as a way to populate the machinename in reports retroactively (which assumes that the user has only used one machine and never changed their username).
So, the new MACHINE_T table would look like:
MACHINE_T (new table)
machineid (pk - auto increment)
machinename (unique)
username
Right now, our current SQL is:
SELECT * FROM `USER_ACTIVITY_T` LEFT JOIN `MACHINE_T`
ON MACHINE_T.machineid=USER_ACTIVITY_T.machineid
Anyone have any suggestions on how to join on the username if USER_ACTIVITY_T.machineid is null but has a matching username? I'm sorry. This is an odd request that I may spend far too much time over-analyzing. Thank you for any help. I'm almost tempted to just say it can be reasonably done.

You want to select the joins from a when the joined column is not null and from b when it is null.
You dont want repeat information however so UNION may cause problems on its own.
Try only selecting the not null entries on the first join and then exclude the null entries from the second join before you union them.
So:
SELECT *
FROM `USER_ACTIVITY_T`
LEFT JOIN `MACHINE_T`
ON MACHINE_T.machineid = USER_ACTIVITY_T.machineid
UNION ALL
SELECT *
FROM `USER_ACTIVITY_T`
JOIN `MACHINE_T`
ON MACHINE_T.username = USER_ACTIVITY_T.username
WHERE USER_ACTIVITY_T.machineid IS NULL
This way you are basically using one query for the null entries and one for the not null entries and UNIONing them.

And, I just discovered the UNIION operator which will help me solve this. However, I am open to other solutions.

Related

Mysql query for select data from multiple table with comparision

I have a three tables namely profile, academic,payment and these tables having two same columns that are username and status.
my problem is how to select username from the tables where status=1 in all the tables
Typically it works like this:
SELECT * FROM profile
LEFT JOIN academic ON profile.username=academic.username
LEFT JOIN payment ON profile.username=payment.username
WHERE profile.status=1 AND academic.status=1 AND payment.status=1
As a note having username as a key is usually a bad thing, often super bad since if someone's able to change their name you need to update N other tables. You may have a circumstance where you forget to update one or more tables, then subsequently someone registers with the former name and "inherits" this data.
It's also typically very inefficient to use a string INDEX key when a user_id integer value would suffice.

Intersection of two (very) big tables

I have two tables: all_users and vip_users
all_users table has a list of all users (you don't say?) in my system and it currently has around 57k records, while vip_users table has around 37k records.
Primary key in both tables is an autoincrement id field. all_users table is big in terms of attribute count (around 20, one of them is email), while vip_users table has only (along with id) email attribute.
I wanted to query out the "nonVip" users by doing this (with help of this question here on SO):
SELECT all_users.id, all_users.email
FROM all_users
LEFT OUTER JOIN vip_users
ON (all_users.email=vip_users.email)
WHERE vip_users.email IS NULL
And now, finally coming to the problem - I ran this query in phpmyadmin and even after 20 minutes I was forced to close it and restart httpd service as it was taking too long to complete, my server load jumped over 2 and the site (which also queries the database) became useless as it was just loading too slow. So, my question is - how do I make this query? Do I make some script and run it over night - not using phpmyadmin (is this maybe where the problem lies?), or do I need to use different SQL query?
Please help with your thoughts on this.
Try indexing the fields email on both tables, that should speed up the query
CREATE INDEX useremail ON all_users(email)
CREATE INDEX vipemail ON vip_users(email)
As written, you're not getting the results you're looking for. You're looking for vip_users rows where the email matches an email in users, and is also NULL.
Is there a reason you want vip_users to have a separate id from users? If you change the vip_users id field to a fk on the users id field, yo would then change your select to:
SELECT all_users.id, all_users.email
FROM all_users
LEFT OUTER JOIN vip_users
ON (all_users.id=vip_users.id)
WHERE vip_users.email IS NULL;
There's no reason this query should take any discernible about of time. 37k records is not a very big table....
I think NOT IN is faster and used less resource than LEFT OUTER JOIN.
Can you try -
SELECT *
FROM all_users
WHERE id NOT IN (SELECT id
FROM vip_users
WHERE email IS NULL);

What is the proper way to store friendship associations in a mysql DB

I want to create a table where my users can associate a friendship between one another. Which at the same time this table will work in conjunction to what I would to be a one-to-many relation between various other tables I am attempting to work up.
Right now I am thinking of something like this
member_id, friend_id, active, date
member_id would be the column of the user making the call, friend_id would be the column of the friend they are attempting to tie to, active would be a toggle of sorts 0 = pending, 1 = active, date would just be a logged date of the last activity on that particular row.
Now my confusion is if I were to query I would typically query for member_id then base the rest of the query off of associated friend_id's to display data accordingly to the right people. So with this logic of sorts in mind, that makes me think I would have to have 2 rows per request. One where its the member_id who's requesting and the friend_id of the request inserted into the table, then one thats the opposite so I could query accordingly every time. So in essences its like double dipping for every one action requested to this particular table I need to make 2 like actions to make it work.
Which in all does not make sense to me as far as optimization goes. So in all my question is what is the proper way to handle data for relations like this? Or am I actually thinking sanely about this being an approach to handling it?
If a friendship is always mutual, then you can choose between data redundancy (i.e. both directions having a row) for the sake of simpler queries, or learn to live with slightly more complex queries. I'd personally avoid data redundancy unless there is a compelling reason otherwise - you're not just wasting space and performance, but you'll need to be careful when enforcing it - a simple CHECK is incapable of referencing other rows and depending on your DBMS a trigger may be limited in what it can do with a mutating table.
An easy way ensure to only one row per friendship is to always insert the lower value in member_id and higher value in friend_id (make a constraint CHECK (member_id < friend_id) to enforce it). Then, when you query, you'll have search in both directions - for example, finding all friends of the given person (identified by person_id) would look something like this:
SELECT *
FROM
person
WHERE
id <> :person_id
AND (
id IN (
SELECT friend_id
FROM friendship
WHERE member_id = :person_id
)
OR
id IN (
SELECT member_id
FROM friendship
WHERE friend_id = :person_id
)
)
BTW, in this scheme, you'd probably want to rename member_id and friend_id to, say, friend1_id and friend2_id...
Two ways to look at it:
WHERE ((friend_id = x AND member_id = y) OR (friend_id = y AND member_id = x))
would allow you to query by simply stating one side of the relationship. If both sides are added, this method would still work without causing duplicate rows to be returned.
Conversely, adding both sides of the relationship, so that your queries consist of
WHERE friend_id = x AND member_id = y
not only makes queries easier to write, but also easier to plan (meaning better DB performance).
My vote is for the latter option.
Beautiful - there's no problem with your table as-is.
ALSO:
I'm not sure if this cardinality is "one to many", or "many to many":
http://en.wikipedia.org/wiki/Cardinality_%28data_modeling%29
Q: I were to query I would typically query for member_id then base the
rest of the query off of associated friend_id's to display data
accordingly to the right people
A: Frankly, I don't see any problem querying "member to friend", or "friend to member" (or any other combinations - e.g. friends who share friends). Again, it looks good.
Introduce a helper table like:
users
user_id, name, ...
friendship
user_id, friend_id, ....
select u.name as user, u2.name as friend from users u
inner join friendship f on f.user_id = u.user_id
inner join users u2 on u2.user_id = f.friend_id
I think this is pretty similar to what you have, just putting a query as an example.

How to check if a given data exists in multiple tables (all of which has the same column)?

I have 3 tables, each consisting of a column called username. On the registration part, I need to check that the requested username is new and unique.
I need that single SQL that will tell me if that user exists in any of these tables, before I proceed. I tried:
SELECT tbl1.username, tbl2.username, tbl3.username
FROM tbl1,tbl2,tbl3
WHERE tbl1.username = {$username}
OR tbl2.username = {$username}
OR tbl3.username ={$username}
Is that the way to go?
select 1
from (
select username as username from tbl1
union all
select username from tbl2
union all
select username from tbl3
) a
where username = 'someuser'
In the event you honestly just want to know if a user exists:
The quickest approach is an existence query:
select
NOT EXISTS (select username from a where username = {$username}) AND
NOT EXISTS (select username from b where username = {$username}) AND
NOT EXISTS (select username from c where username = {$username});
If your username column is marked as Unique in each table, this should be the most efficient query you will be able to make to perform this operation, and this will outperform a normalized username table in terms of memory usage and, well, virtually any other query that cares about username and another column, as there are no excessive joins. If you've ever been called on to speed up an organization's database, I can assure you that over-normalization is a nightmare. In regards to the advice you've received on normalization in this thread, be wary. It's great for limiting space, or limiting the number of places you have to update data, but you have to weigh that against the maintenance and speed overhead. Take the advice given to you on this page with a grain of salt.
Get used to running a query analyzer on your queries, if for no other reason than to get in the habit of learning the ramifications of choices when writing queries -- at least until you get your sea legs.
In the event you want to insert a user later:
If you are doing this for the purpose of eventually adding the user to the database, here is a better approach, and it's worth it to learn it. Attempt to insert the value immediately. Check afterwards to see if it was successful. This way there is no room for some other database call to insert a record in between the time you've checked and the time you inserted into the database. For instance, in MySQL you might do this:
INSERT INTO {$table} (`username`, ... )
SELECT {$username} as `username`, ... FROM DUAL
WHERE
NOT EXISTS (select username from a where username = {$username}) AND
NOT EXISTS (select username from b where username = {$username}) AND
NOT EXISTS (select username from c where username = {$username});
All database API's I've seen, as well as all SQL implementations will provide you a way to discover how many rows were inserted. If it's 1, then the username didn't exist and the insertion was successful. In this case, I don't know your dialect, and so I've chosen MySQL, which provides a DUAL table specifically for returning results that aren't bound to a table, but honestly, there are many ways to skin this cat, whether you put it in a transaction or a stored procedure, or strictly limit the process and procedure that can access these tables.
Update -- How to handle users who don't complete the sign up process
As #RedFilter points out, if registration is done in multiple steps -- reserving a username, filling out details, perhaps answering an email confirmation, then you will want to at least add a column to flag this user (with a timestamp, not a boolean) so that you can periodically remove users after some time period, though I recommend creating a ToBePurged table and add new users to that, along with a timestamp. When the confirmation comes through, you remove the user from this table. Periodically you will check this table for all entries prior to some delta off your current time and simply delete them from whichever table they were originally added. My philosophy behind this is to define more clearly the responsibility of the table and to keep the number of records you are working with very lean. We certainly don't want to over-engineer our solutions, but if you get into the habit of good architectural practices, these designs will flow out as naturally as their less efficient counterparts.
No. Two processes could run your test at the same time and both would report no user and then both could insert the same user.
It sounds like you need a single table to hold ALL the users with a unique index to prevent duplicates. This master table could link to 'sub-tables' using a user ID, not user name.
Given the collation stuff, you could do this instead, if you don't want to deal with the collation mismatch:
select sum(usercount) as usercount
from (
select count(*) as usercount from tbl1 where username = 'someuser'
union all
select count(*) as usercount from tbl2 where username = 'someuser'
union all
select count(*) as usercount from tbl3 where username = 'someuser'
) as usercounts
If you get 0, there isn't a user with that username, if you get something higher, there is.
Note: Depending on how you do the insert, you could in theory get more than one user with the same username due to race conditions (see other comments about normalisation and unique keys).
1- You need to normalize your tables
See: http://databases.about.com/od/specificproducts/a/normalization.htm
2- Don't use implicit SQL '89 joins.
Kick the habit and use explicit joins
SELECT a.field1, b.field2, c.field3
FROM a
INNER JOIN b ON (a.id = b.a_id) -- JOIN criteria go here
INNER JOIN c ON (b.id = c.b_id) -- and here, nice and explicit.
WHERE ... -- filter criteria go here.
With your current set up RedFilter's answer should work fine. I thought it would be worth noting that you shouldn't have redundant or dispersed data in your database to begin with though.
You should have one and only one place to store any specific data - so in your case, instead of having a username in 3 different tables, you should have one table with username and a primary key identifier for those usernames. Your other 3 tables should then foreign-key reference the username table. You'll be able to construct much simpler and more efficient queries with this layout. You're opening a can of worms by replicating data in various locations.

Storing Friends in Database for Social Network

For storing friends relationships in social networks, is it better to have another table with columns relationship_id, user1_id, user2_id, time_created, pending or should the confirmed friend's user_id be seralized/imploded into a single long string and stored along side with the other user details like user_id, name, dateofbirth, address and limit to like only 5000 friends similar to facebook?
Are there any better methods? The first method will create a huge table! The second one has one column with really long string...
On the profile page of each user, all his friends need to be retrieved from database to show like 30 friends similar to facebook, so i think the first method of using a seperate table will cause a huge amount of database queries?
The most proper way to do this would be to have the table of Members (obviously), and a second table of Friend relationships.
You should never ever store foreign keys in a string like that. What's the point? You can't join on them, sort on them, group on them, or any other things that justify having a relational database in the first place.
If we assume that the Member table looks like this:
MemberID int Primary Key
Name varchar(100) Not null
--etc
Then your Friendship table should look like this:
Member1ID int Foreign Key -> Member.MemberID
Member2ID int Foreign Key -> Member.MemberID
Created datetime Not Null
--etc
Then, you can join the tables together to pull a list of friends
SELECT m.*
FROM Member m
RIGHT JOIN Friendship f ON f.Member2ID = m.MemberID
WHERE f.MemberID = #MemberID
(This is specifically SQL Server syntax, but I think it's pretty close to MySQL. The #MemberID is a parameter)
This is always going to be faster than splitting a string and making 30 extra SQL queries to pull the relevant data.
Separate table as in method 1.
method 2 is bad because you would have to unserialize it each time and wont be able to do JOINS on it; plus UPDATE's will be a nightmare if a user changes his name, email or other properties.
sure the table will be huge, but you can index it on Member11_id, set the foreign key back to your user table and could have static row sizes and maybe even limit the amount of friends a single user can have. I think it wont be an issue with mysql if you do it right; even if you hit a few million rows in your relationship table.