Intersection of two (very) big tables

I have two tables: all_users and vip_users
all_users table has a list of all users (you don't say?) in my system and it currently has around 57k records, while vip_users table has around 37k records.
Primary key in both tables is an autoincrement id field. all_users table is big in terms of attribute count (around 20, one of them is email), while vip_users table has only (along with id) email attribute.
I wanted to query out the "nonVip" users by doing this (with help of this question here on SO):
SELECT all_users.id, all_users.email
FROM all_users
LEFT OUTER JOIN vip_users
ON (all_users.email=vip_users.email)
WHERE vip_users.email IS NULL
Now to the problem: I ran this query in phpMyAdmin, and after 20 minutes I had to close it and restart the httpd service because it still had not completed. My server load jumped above 2, and the site (which also queries the database) became unusable because it was loading so slowly. So my question is: how do I run this query? Should I write a script and run it overnight instead of using phpMyAdmin (is that maybe where the problem lies?), or do I need a different SQL query?
Please help with your thoughts on this.

Try indexing the email field on both tables; that should speed up the query:
CREATE INDEX useremail ON all_users(email);
CREATE INDEX vipemail ON vip_users(email);
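
To see the effect of the two indexes on the original anti-join, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL. The sample emails are made up for illustration; the table and index names follow the thread. In MySQL the plan can be inspected with EXPLAIN instead of EXPLAIN QUERY PLAN.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_users (id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT);
    CREATE TABLE vip_users (id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT);
    CREATE INDEX useremail ON all_users(email);
    CREATE INDEX vipemail ON vip_users(email);
""")
conn.executemany("INSERT INTO all_users (email) VALUES (?)",
                 [("a@example.com",), ("b@example.com",), ("c@example.com",)])
conn.executemany("INSERT INTO vip_users (email) VALUES (?)",
                 [("b@example.com",)])

# The anti-join from the question: keep users with no matching VIP row.
anti_join = """
    SELECT all_users.id, all_users.email
    FROM all_users
    LEFT OUTER JOIN vip_users ON all_users.email = vip_users.email
    WHERE vip_users.email IS NULL
    ORDER BY all_users.id
"""
non_vip = conn.execute(anti_join).fetchall()
print(non_vip)

# Print how the engine plans the lookup into vip_users.
for row in conn.execute("EXPLAIN QUERY PLAN " + anti_join):
    print(row)
```

Without an index on the joined column, the engine may have to scan vip_users once per all_users row, which is the kind of work that can make a 57k x 37k join crawl.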

As written, you're not getting the results you're looking for. You're looking for vip_users rows where the email matches an email in users, and is also NULL.
Is there a reason you want vip_users to have a separate id from users? If you change the vip_users id field to a foreign key on the users id field, you would then change your select to:
SELECT all_users.id, all_users.email
FROM all_users
LEFT OUTER JOIN vip_users
ON (all_users.id=vip_users.id)
WHERE vip_users.email IS NULL;
There's no reason this query should take any discernible amount of time. 37k records is not a very big table.

I think NOT IN is faster and uses fewer resources than LEFT OUTER JOIN.
Can you try:
SELECT *
FROM all_users
WHERE email NOT IN (SELECT email
FROM vip_users
WHERE email IS NOT NULL);
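
One thing worth checking before relying on NOT IN for this anti-join is how it behaves when the subquery returns a NULL. The sketch below uses Python's built-in sqlite3 in place of MySQL, with hypothetical sample rows; the three-valued logic it demonstrates is standard SQL and applies to MySQL as well.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE vip_users (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO all_users (email) VALUES ('a@example.com'), ('b@example.com');
    INSERT INTO vip_users (email) VALUES ('b@example.com'), (NULL);
""")

# NOT IN against a list that contains NULL never evaluates to true,
# so this returns no rows at all.
not_in_rows = conn.execute("""
    SELECT email FROM all_users
    WHERE email NOT IN (SELECT email FROM vip_users)
""").fetchall()

# Filtering NULLs out of the subquery restores the expected result.
not_in_filtered = conn.execute("""
    SELECT email FROM all_users
    WHERE email NOT IN (SELECT email FROM vip_users WHERE email IS NOT NULL)
""").fetchall()

# NOT EXISTS is not affected by NULLs in vip_users.email.
not_exists_rows = conn.execute("""
    SELECT u.email FROM all_users u
    WHERE NOT EXISTS (SELECT 1 FROM vip_users v WHERE v.email = u.email)
""").fetchall()

print(not_in_rows, not_in_filtered, not_exists_rows)
```

Whether NOT IN, NOT EXISTS, or the LEFT JOIN form is fastest depends on the MySQL version and the indexes available, so it is worth comparing them with EXPLAIN on the real tables.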

Related

MySQL using two databases in same query. Do not want to combine tables

New to MySQL, searched for answers, but multiple database questions seem to be all about combining tables, that's not what I'm after.
In new database, I duplicated a table from old database, with most columns but not all.
I need to get the customer numbers from the old database where the customer name contains 'Co.'. This should return 14 or so customers with about 80 rows.
I then need to delete all orders in the second database that have those customer numbers.
Is this possible with a subquery? That's where I am stuck right now.
Thanks so much.

You can work with multiple databases in one query by qualifying each table name with its database:
Ex:
SELECT o1.col1, o2.col2
FROM database1.options AS o1, database2.options AS o2
WHERE o1.option_name = 'sort_order'

Say you have a table of SO threads (threads_table) from which you need to eliminate duplicates that you have already identified in the Problem field of another table (problem_log_table).
DELETE FROM `threads_table`
WHERE `thread_ID` IN
(SELECT `Thread_ID` from `problem_log_table`
WHERE `Problem`='Duplicate');
Edited to add:
Here's one way to do it, if I'm understanding your needs correctly. (Btw, I've assumed away the added complexity of working with tables in two different databases.)
DELETE FROM tbl2
WHERE tbl2.customer_num IN
(SELECT tbl1.customer_num from `tbl1`
WHERE tbl1.customer_name LIKE '%Co.%');
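
Putting the two answers together, the delete can reference the other database directly by qualifying the table names. This is a rough sketch using Python's built-in sqlite3, where ATTACH plays the role of MySQL's second database on the same server. The names old_db, new_db, customers, and orders are hypothetical placeholders, as are the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ATTACH gives SQLite extra schemas; in MySQL both databases would
# already exist on the server and be addressed as dbname.tablename.
conn.execute("ATTACH DATABASE ':memory:' AS old_db")
conn.execute("ATTACH DATABASE ':memory:' AS new_db")
conn.executescript("""
    CREATE TABLE old_db.customers (customer_num INTEGER, customer_name TEXT);
    CREATE TABLE new_db.orders (order_id INTEGER, customer_num INTEGER);
    INSERT INTO old_db.customers VALUES (1, 'Acme Co.'), (2, 'Jane Doe');
    INSERT INTO new_db.orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Delete orders in the new database for customers whose name in the
# old database contains 'Co.'.
conn.execute("""
    DELETE FROM new_db.orders
    WHERE customer_num IN (
        SELECT customer_num FROM old_db.customers
        WHERE customer_name LIKE '%Co.%'
    )
""")
remaining = conn.execute(
    "SELECT order_id, customer_num FROM new_db.orders").fetchall()
print(remaining)
```

Running the inner SELECT on its own first is a cheap way to confirm it returns the expected 14 or so customers before deleting anything.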

SQL LEFT JOIN on two possible columns

We are adding a table to our database schema. It has a relationship to an already existing table, which we added a foreign key for. Mind you, I didn't create this schema nor do I have permission to change much. The application has been running for a while and they are hesitant to change much.
USER_ACTIVITY_T (preexisting table - only relevant columns referred)
activity_id (pk)
username
machineid (fk - recently added)
MACHINE_T (new table)
machineid (pk - auto increment)
machinename (unique)
From the point where I added the machine table, it collects machine data; allowing users to see what machines were involved during the activity. This is useful but it only shows data from the point that it was implemented. A lead asked me to attempt to fill preexisting records by referring to the username associated with the machine. We understand that this is not 100% accurate but... yeah. Our idea was to add username to MACHINE_T and use as a way to populate the machinename in reports retroactively (which assumes that the user has only used one machine and never changed their username).
So, the new MACHINE_T table would look like:
MACHINE_T (new table)
machineid (pk - auto increment)
machinename (unique)
username
Right now, our current SQL is:
SELECT * FROM `USER_ACTIVITY_T` LEFT JOIN `MACHINE_T`
ON MACHINE_T.machineid=USER_ACTIVITY_T.machineid
Anyone have any suggestions on how to join on the username if USER_ACTIVITY_T.machineid is null but has a matching username? I'm sorry. This is an odd request that I may spend far too much time over-analyzing. Thank you for any help. I'm almost tempted to just say it can be reasonably done.

You want the machine joined on machineid when that column is not null, and joined on username when it is null.
You don't want repeated rows, however, so a UNION of two unfiltered joins would cause problems.
Restrict the first query to rows whose machineid is not null, restrict the second to rows whose machineid is null, and then UNION ALL them.
So:
SELECT *
FROM `USER_ACTIVITY_T`
LEFT JOIN `MACHINE_T`
ON MACHINE_T.machineid = USER_ACTIVITY_T.machineid
WHERE USER_ACTIVITY_T.machineid IS NOT NULL
UNION ALL
SELECT *
FROM `USER_ACTIVITY_T`
LEFT JOIN `MACHINE_T`
ON MACHINE_T.username = USER_ACTIVITY_T.username
WHERE USER_ACTIVITY_T.machineid IS NULL
This way you are basically using one query for the null entries and one for the not null entries and UNIONing them.
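
As an alternative to the UNION, the same fallback can be sketched with two LEFT JOINs and COALESCE, so each activity row appears exactly once. This is only one possible approach, shown with Python's built-in sqlite3 in place of MySQL; the sample rows and the aliases by_id and by_user are made up, and it assumes each username maps to at most one machine (otherwise rows would multiply).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MACHINE_T (
        machineid INTEGER PRIMARY KEY,
        machinename TEXT UNIQUE,
        username TEXT
    );
    CREATE TABLE USER_ACTIVITY_T (
        activity_id INTEGER PRIMARY KEY,
        username TEXT,
        machineid INTEGER
    );
    INSERT INTO MACHINE_T VALUES (1, 'pc-alice', 'alice'), (2, 'pc-bob', 'bob');
    INSERT INTO USER_ACTIVITY_T VALUES
        (100, 'alice', NULL),  -- old row: falls back to username
        (101, 'alice', 1),     -- new row: machineid is set
        (102, 'bob', 1),       -- bob on alice's machine: machineid wins
        (103, 'carol', NULL);  -- no match either way
""")

rows = conn.execute("""
    SELECT a.activity_id, a.username,
           COALESCE(by_id.machinename, by_user.machinename) AS machinename
    FROM USER_ACTIVITY_T a
    LEFT JOIN MACHINE_T by_id
           ON by_id.machineid = a.machineid
    LEFT JOIN MACHINE_T by_user
           ON a.machineid IS NULL AND by_user.username = a.username
    ORDER BY a.activity_id
""").fetchall()
for row in rows:
    print(row)
```

The username join only fires when machineid is NULL, so newer rows always use the recorded machine and only historical rows use the guess.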

And, I just discovered the UNION operator which will help me solve this. However, I am open to other solutions.

MySQL query to select data from multiple tables with comparison

I have three tables, namely profile, academic, and payment, and they share two columns: username and status.
My problem is how to select the usernames where status=1 in all three tables.

Typically it works like this:
SELECT profile.username FROM profile
INNER JOIN academic ON profile.username=academic.username
INNER JOIN payment ON profile.username=payment.username
WHERE profile.status=1 AND academic.status=1 AND payment.status=1
As a note having username as a key is usually a bad thing, often super bad since if someone's able to change their name you need to update N other tables. You may have a circumstance where you forget to update one or more tables, then subsequently someone registers with the former name and "inherits" this data.
It's also typically very inefficient to use a string INDEX key when a user_id integer value would suffice.
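
For a concrete picture of the three-table join, here is a small sketch with Python's built-in sqlite3 standing in for MySQL. The usernames and status values are invented, and the aliases p, a, and pay are illustrative only. Because every table must have status=1, the joins behave as inner joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile  (username TEXT, status INTEGER);
    CREATE TABLE academic (username TEXT, status INTEGER);
    CREATE TABLE payment  (username TEXT, status INTEGER);
    INSERT INTO profile  VALUES ('ann', 1), ('ben', 1), ('cat', 0);
    INSERT INTO academic VALUES ('ann', 1), ('ben', 0), ('cat', 1);
    INSERT INTO payment  VALUES ('ann', 1), ('ben', 1), ('cat', 1);
""")

# Only users whose status is 1 in all three tables survive the joins.
active = conn.execute("""
    SELECT p.username
    FROM profile p
    JOIN academic a  ON a.username = p.username
    JOIN payment pay ON pay.username = p.username
    WHERE p.status = 1 AND a.status = 1 AND pay.status = 1
    ORDER BY p.username
""").fetchall()
print(active)
```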

How to query 3 mysql tables and return matching results (with one to many relationships)?

I am trying to query a database to return some matching records and can't work out how to do it in the most efficient way. I have a TUsers table, a TJobsOffered table and a TJobsRequested table. The UserID is the primary key for the TUsers table and is used within the Job tables in a one to many relationship.
Ultimately I want to run a query that returns a list of all matching users based on a particular UserID (eg a matching user is one that has at least one matching record in both tables, eg if UserA has jobid 999 listed in TJobsOffered and UserB has jobid 999 listed in TJobsRequested then this is a match).
In order to try and get my head around it I've simplified it down a lot and am trying to match the records based on the jobids for the user in question, eg:
SELECT DISTINCT TJobsOffered.FUserID FROM TJobsOffered, TJobsRequested
WHERE TJobsOffered.FUserID=TJobsRequested.FUserID AND
(TJobsRequested.FJobID='12' OR TJobsRequested.FJobID='30') AND
(TJobsOffered.FJobID='86' OR TJobsOffered.FJobID='5')
This seems to work fine and returns the correct results however when I introduce the TUsers table (so I can access user information) it starts returning incorrect results. I can't work out why the following query doesn't return the same results as the one listed above as surely it's still matching the same information just with a different connector (or is the one above effectively many to many and the one below 2 sets of one to many comparisons)?
SELECT DISTINCT TUsers.Fid, TUsers.FName FROM TUsers, TJobsOffered, TJobsRequested
WHERE TUsers.Fid=TJobsRequested.FUserID AND TUsers.Fid=TJobsOffered.FUserID AND
(TJobsRequested.FJobID='12' OR TJobsRequested.FJobID='30') AND
(TJobsOffered.FJobID='86' OR TJobsOffered.FJobID='5')
If anyone could explain where I'm going wrong with the second query and how you should incorporate TUsers then that would be greatly appreciated as I can't get my head around the join. If you are able to give me any pointers as to how I can do this all in one query by just passing the user id in then that would be massively appreciated as well! :)
Thanks so much,
Dave

Try this
SELECT DISTINCT TJobsOffered.FUserID , TUsers.FName
FROM TJobsOffered
INNER JOIN TJobsRequested ON TJobsOffered.FUserID=TJobsRequested.FUserID
LEFT JOIN TUsers ON TUsers.Fid=TJobsOffered.FUserID
WHERE
TJobsRequested.FJobID IN (12, 30) AND
TJobsOffered.FJobID IN (86, 5)

You need to add "AND TJobsOffered.FUserID=TJobsRequested.FUserID" to your where clause.
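
For the "pass in one user id" part of the question, one possible reading is: find every other user who requests a job that the given user offers. The sketch below follows that reading only (the reverse direction would swap the two job tables), uses Python's built-in sqlite3 rather than MySQL, and relies on made-up sample rows and a hypothetical helper named matches_for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TUsers (Fid INTEGER PRIMARY KEY, FName TEXT);
    CREATE TABLE TJobsOffered   (FUserID INTEGER, FJobID INTEGER);
    CREATE TABLE TJobsRequested (FUserID INTEGER, FJobID INTEGER);
    INSERT INTO TUsers VALUES (1, 'Dave'), (2, 'Erin'), (3, 'Finn');
    INSERT INTO TJobsOffered   VALUES (1, 999), (1, 86);
    INSERT INTO TJobsRequested VALUES (2, 999), (3, 30), (1, 999);
""")

def matches_for(user_id):
    """Users (other than user_id) requesting a job that user_id offers."""
    return conn.execute("""
        SELECT DISTINCT u.Fid, u.FName
        FROM TJobsOffered o
        JOIN TJobsRequested r ON r.FJobID = o.FJobID
        JOIN TUsers u ON u.Fid = r.FUserID
        WHERE o.FUserID = ? AND r.FUserID <> ?
        ORDER BY u.Fid
    """, (user_id, user_id)).fetchall()

print(matches_for(1))
```

The key difference from the queries in the question is that the two job tables are joined on FJobID, not on FUserID, since a match is between two different users sharing a job.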

Storing Friends in Database for Social Network

For storing friend relationships in a social network, is it better to have a separate table with columns relationship_id, user1_id, user2_id, time_created, pending, or should the confirmed friends' user_ids be serialized/imploded into a single long string and stored alongside the other user details (user_id, name, dateofbirth, address), with a limit of something like 5000 friends, similar to Facebook?
Are there any better methods? The first method will create a huge table! The second one has one column with really long string...
On each user's profile page, his friends need to be retrieved from the database to show around 30 of them, similar to Facebook, so I think the first method of using a separate table will cause a huge number of database queries?

The most proper way to do this would be to have the table of Members (obviously), and a second table of Friend relationships.
You should never ever store foreign keys in a string like that. What's the point? You can't join on them, sort on them, group on them, or any other things that justify having a relational database in the first place.
If we assume that the Member table looks like this:
MemberID int Primary Key
Name varchar(100) Not null
--etc
Then your Friendship table should look like this:
Member1ID int Foreign Key -> Member.MemberID
Member2ID int Foreign Key -> Member.MemberID
Created datetime Not Null
--etc
Then, you can join the tables together to pull a list of friends
SELECT m.*
FROM Member m
RIGHT JOIN Friendship f ON f.Member2ID = m.MemberID
WHERE f.Member1ID = @MemberID
(This is specifically SQL Server syntax, but I think it's pretty close to MySQL. @MemberID is a parameter.)
This is always going to be faster than splitting a string and making 30 extra SQL queries to pull the relevant data.
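
To make the separate-table design concrete, here is a minimal sketch using Python's built-in sqlite3 instead of MySQL or SQL Server. It assumes each friendship is stored once (in either direction), so the lookup checks both columns; the helper friends_of, the index name, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Member (
        MemberID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL
    );
    CREATE TABLE Friendship (
        Member1ID INTEGER NOT NULL REFERENCES Member(MemberID),
        Member2ID INTEGER NOT NULL REFERENCES Member(MemberID),
        Created TEXT NOT NULL,
        PRIMARY KEY (Member1ID, Member2ID)
    );
    -- The primary key covers lookups by Member1ID; this covers Member2ID.
    CREATE INDEX friendship_member2 ON Friendship(Member2ID);
    INSERT INTO Member VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cat');
    INSERT INTO Friendship VALUES (1, 2, '2012-01-01'), (3, 1, '2012-01-02');
""")

def friends_of(member_id, limit=30):
    """Return up to `limit` friends of member_id in a single query."""
    return conn.execute("""
        SELECT m.MemberID, m.Name
        FROM Friendship f
        JOIN Member m ON m.MemberID =
             CASE WHEN f.Member1ID = :id THEN f.Member2ID
                  ELSE f.Member1ID END
        WHERE f.Member1ID = :id OR f.Member2ID = :id
        ORDER BY m.MemberID
        LIMIT :lim
    """, {"id": member_id, "lim": limit}).fetchall()

print(friends_of(1))
```

A profile page then needs one query for its 30 friends, not one query per friend.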

Separate table, as in method 1.
Method 2 is bad because you would have to unserialize it each time and won't be able to do JOINs on it; plus UPDATEs will be a nightmare if a user changes his name, email or other properties.
Sure, the table will be huge, but you can index it on Member1_id, set the foreign key back to your user table, have static row sizes, and maybe even limit the number of friends a single user can have. I don't think it will be an issue with MySQL if you do it right, even if you hit a few million rows in your relationship table.
