SQL query to select from two fairly large tables - mysql


I have two tables:
1) user, where user credentials are stored. It holds more than 1,000 user records.
ID | NAME | PASSWORD | USERTYPEID
2) user_logs, in which login details are captured. It's fairly large, i.e. more than 100,000 records.
ID | NAME | DATEOFLOGIN | USERID | LOGINTYPE
I have to find the users who did not access the system between two given dates, together with their last login date.
SELECT MAX(userlogs.dateoflogin) AS lastlogindate,
u1.id AS Id,
u1.name AS Name
FROM USER u1
LEFT OUTER JOIN user_logs userlogs ON u1.id = userlogs.userid
WHERE u1.id NOT IN
( SELECT userid
FROM user_logs userlogs2
WHERE userlogs2.logintype='Login'
AND userlogs2.dateoflogin BETWEEN '2013-05-10' AND '2013-05-20'
AND userlogs2.userid IS NOT NULL)
GROUP BY u1.id;
If the tables hold fewer records, it works well.
But in the live system, where the user table has more than 1,000 records and the user_logs table more than 100,000, the query takes a very long time and I don't know whether it ever succeeded. :)
How do I optimize the above query? This query also finds users who have never logged in.

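First off, you need to rewrite that subquery if you want to improve performance. Subqueries are convenient, but they tend to slow a query down significantly; older MySQL versions (before 5.6) in particular could re-evaluate an IN/NOT IN subquery for every row of the outer table.
Secondly, make sure the columns used in WHERE clauses and join conditions are indexed.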

This is equivalent to your query, but a LEFT JOIN with a NULL check is often more efficient than NOT IN.
SELECT MAX(userlogs.dateoflogin) as lastlogindate , u1.id as Id , u1.name as Name
FROM user u1
LEFT OUTER JOIN user_logs userlogs ON u1.id = userlogs.userid
LEFT OUTER JOIN (SELECT distinct userid
FROM user_logs
WHERE logintype='Login'
AND dateoflogin BETWEEN '2013-05-10' AND '2013-05-20'
AND userid IS NOT null) userlogs2 ON u1.id = userlogs2.userid
WHERE userlogs2.userid IS NULL
GROUP BY u1.id
Make sure you have an index on dateoflogin so the subquery will perform well.
Compare the output of EXPLAIN for both queries.
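To check that the NOT IN form and the LEFT JOIN ... IS NULL form really return the same rows, here is a minimal sketch that runs both against a tiny in-memory SQLite database. SQLite only stands in for MySQL, so this says nothing about MySQL's execution plans; the sample rows and the composite index name idx_logs_type_date_user are invented for illustration, and the filter uses the LOGINTYPE column from the question's schema.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_logs (id INTEGER PRIMARY KEY, userid INTEGER,
                        dateoflogin TEXT, logintype TEXT);
-- Assumed composite index covering the subquery's filter and join column.
CREATE INDEX idx_logs_type_date_user ON user_logs (logintype, dateoflogin, userid);
INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO user_logs (userid, dateoflogin, logintype) VALUES
  (1, '2013-05-01', 'Login'),   -- alice: logged in only before the window
  (2, '2013-05-15', 'Login'),   -- bob: logged in inside the window
  (2, '2013-04-01', 'Login');
-- carol never logged in
""")

NOT_IN = """
SELECT MAX(l.dateoflogin) AS lastlogindate, u.id, u.name
FROM user u
LEFT JOIN user_logs l ON u.id = l.userid
WHERE u.id NOT IN (SELECT userid FROM user_logs
                   WHERE logintype = 'Login'
                     AND dateoflogin BETWEEN '2013-05-10' AND '2013-05-20'
                     AND userid IS NOT NULL)
GROUP BY u.id
ORDER BY u.id
"""

# Anti-join: keep only users with no matching row in the derived table.
ANTI_JOIN = """
SELECT MAX(l.dateoflogin) AS lastlogindate, u.id, u.name
FROM user u
LEFT JOIN user_logs l ON u.id = l.userid
LEFT JOIN (SELECT DISTINCT userid FROM user_logs
           WHERE logintype = 'Login'
             AND dateoflogin BETWEEN '2013-05-10' AND '2013-05-20'
             AND userid IS NOT NULL) w ON u.id = w.userid
WHERE w.userid IS NULL
GROUP BY u.id
ORDER BY u.id
"""

not_in_rows = conn.execute(NOT_IN).fetchall()
anti_join_rows = conn.execute(ANTI_JOIN).fetchall()
print(not_in_rows)  # [('2013-05-01', 1, 'alice'), (None, 3, 'carol')]
```

carol comes back with a NULL last login date, which is how both forms report users who never logged in. On the real MySQL tables, prefixing each query with EXPLAIN shows whether the derived table uses the index.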

Related

I'm not getting correct data by joining two tables

I have two tables, users and radacct; both have a username column.
Table radacct also has an acctstoptime column, which is important here.
acctstoptime is a DATETIME column that allows NULL.
Table users has more than 50k records (usernames), while radacct varies between 12k and 20k records.
I also have an application that inserts and removes rows in radacct and updates their acctstoptime: a NULL acctstoptime means the user is active/connected, and a timestamp means the user is not connected.
Note: this is the simplest way to explain it without extra complexity. It's a FreeRADIUS application: https://en.wikipedia.org/wiki/FreeRADIUS
So when I write this query:
SELECT u.username, u.expiration
FROM users u JOIN radacct r ON u.username=r.username
WHERE r.acctstoptime IS NULL
I get the online/connected usernames, but what I want is the list of offline/not connected usernames.
When I write this query:
SELECT u.username, u.expiration
FROM users u
LEFT JOIN radacct r ON u.username=r.username
WHERE r.acctstoptime IS NOT NULL
I get offline customers, but not the full list from the users table. Some of them also have another entry in radacct whose acctstoptime is NULL, so the results are not unique, and the query only returns users that have a non-NULL row in radacct.
To explain further:
radacct has multiple entries with the same username but different acctstoptime values; only one record can be NULL, while the others hold different datetimes.
So I want every username that has no NULL acctstoptime, listed once, with only its latest non-NULL record.
https://ibb.co/Yb84T27
In really simple words, if you understand how FreeRADIUS works: I want the offline customers whose accounts are active/on/recharged.

Use not exists:
select u.*
from users u
where not exists (select 1
from radacct r
where r.username = u.username and
r.acctstoptime is null
);
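A minimal sketch of the NOT EXISTS answer against invented sample rows, with SQLite standing in for MySQL. The last_stop column (a correlated MAX over acctstoptime, which ignores NULLs) is an addition for the "latest non-NULL record" part of the question, not part of the original answer; the index name idx_radacct_user_stop is hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (username TEXT PRIMARY KEY, expiration TEXT);
CREATE TABLE radacct (username TEXT, acctstoptime TEXT);
CREATE INDEX idx_radacct_user_stop ON radacct (username, acctstoptime);
INSERT INTO users VALUES ('ann', '2030-01-01'), ('ben', '2030-01-01'), ('cat', '2030-01-01');
INSERT INTO radacct VALUES
  ('ann', '2024-01-01 10:00:00'),
  ('ann', NULL),                      -- ann has an open session: online
  ('ben', '2024-01-02 09:00:00'),
  ('ben', '2024-01-03 18:30:00');     -- ben: all sessions closed, offline
-- cat has no sessions at all
""")

OFFLINE = """
SELECT u.username,
       -- latest stop time; MAX skips NULLs, and is NULL when there are no rows
       (SELECT MAX(r.acctstoptime) FROM radacct r
        WHERE r.username = u.username) AS last_stop
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM radacct r
                  WHERE r.username = u.username AND r.acctstoptime IS NULL)
ORDER BY u.username
"""
offline = conn.execute(OFFLINE).fetchall()
print(offline)  # [('ben', '2024-01-03 18:30:00'), ('cat', None)]
```

Each username appears once because the outer query reads users directly, and users with no radacct rows at all (cat) are still listed.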

It's not a perfect script but here's a possible solution:
Get the usernames of all radacct rows whose acctstoptime is NULL and filter them out of the users table.
SELECT DISTINCT u.username, u.expiration
FROM users u
WHERE u.username NOT IN (SELECT username FROM radacct r
WHERE r.acctstoptime IS NULL AND r.username IS NOT NULL)
Otherwise, I would suggest splitting the problem into solvable pieces.

Use aggregation and set the condition in the HAVING clause:
SELECT u.username, u.expiration
FROM users u JOIN radacct r
ON u.username = r.username
GROUP BY u.username, u.expiration
HAVING MAX(r.acctstoptime IS NULL) = 0
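A minimal sketch of the HAVING answer, with SQLite standing in for MySQL and invented sample rows. Because it uses an inner join, a user with no radacct rows at all drops out. The LEFT JOIN variant is an adaptation, not part of the original answer: with no match, r.acctstoptime IS NULL is true, so the flag must also require a real radacct row. The MAX(r.acctstoptime) column is likewise an addition for the "latest record" part of the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (username TEXT PRIMARY KEY, expiration TEXT);
CREATE TABLE radacct (username TEXT, acctstoptime TEXT);
INSERT INTO users VALUES ('ann', '2030-01-01'), ('ben', '2030-01-01'), ('cat', '2030-01-01');
INSERT INTO radacct VALUES
  ('ann', '2024-01-01 10:00:00'), ('ann', NULL),   -- ann is online
  ('ben', '2024-01-02 09:00:00'), ('ben', '2024-01-03 18:30:00');
-- cat has no sessions at all
""")

# Inner join, as in the answer: users without radacct rows are not listed.
INNER = """
SELECT u.username, MAX(r.acctstoptime) AS last_stop
FROM users u JOIN radacct r ON u.username = r.username
GROUP BY u.username
HAVING MAX(r.acctstoptime IS NULL) = 0
ORDER BY u.username
"""

# Left join: only count a NULL stop time when a radacct row actually matched.
LEFT = """
SELECT u.username, MAX(r.acctstoptime) AS last_stop
FROM users u LEFT JOIN radacct r ON u.username = r.username
GROUP BY u.username
HAVING MAX(r.username IS NOT NULL AND r.acctstoptime IS NULL) = 0
ORDER BY u.username
"""

inner_rows = conn.execute(INNER).fetchall()
left_rows = conn.execute(LEFT).fetchall()
print(inner_rows)  # [('ben', '2024-01-03 18:30:00')]
print(left_rows)   # [('ben', '2024-01-03 18:30:00'), ('cat', None)]
```

Which variant fits depends on whether users who have never connected should count as offline.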

Querying a large table using mysql

I manage a property website. I have a small table of banned users and a table called advert_views, which keeps track of each listing that each user views (currently 1.3m rows and growing). The advert_views table also records the IP address for every advert viewed.
I want to get the IP addresses used by the banned users and check if any of these banned users have opened new accounts. I ran the following query:
SELECT adviews.user_id AS 'banned user_id',
adviews.client_ip AS 'IPs used by banned users',
adviews2.user_id AS 'banned users that opened a new account'
FROM banned_users
LEFT JOIN users on users.email_address = banned_users.email_address #since I don't store the user_id in banned_users
LEFT JOIN advert_views adviews ON adviews.user_id = users.id AND adviews.user_id IS NOT NULL # users may view listings when not logged in but they have restricted access to the information on the listing
LEFT JOIN (SELECT client_ip,
user_id
FROM advert_views
WHERE user_id IS NOT NULL
) adviews2
ON adviews2.client_ip = adviews.client_ip
WHERE banned_users.rec_status = 1 and adviews.user_id <> adviews2.user_id
GROUP BY adviews2.user_id
I applied indexes on the advert_views table and the users table.
My query takes half an hour to execute. Is there a way how to improve my query speed?
Thanks!
Chris

First of all: Why do you outer join the tables? Or better: Why do you try to outer join the tables? A left join is meant to get data from a table even when there is no match. But then your results could contain rows with all values null. (That doesn't happen though, because adviews.user_id <> adviews2.user_id in your where clause dismisses all outer-joined rows.) Don't give the DBMS more work to do than necessary. If you want inner joins, then don't outer join. (Though the difference in execution time won't be huge.)
Next: You select from banned_users, but you only use it to check existence. You shouldn't do this. Use an EXISTS or IN clause instead. (This is mainly for readability and in order not to produce duplicate results. This probably won't speed things up.)
SELECT av1.user_id AS 'banned user_id',
av2.client_ip AS 'IPs used by banned users',
av2.user_id AS 'banned users that opened a new account'
FROM advert_views av1
JOIN advert_views av2 ON av2.client_ip = av1.client_ip AND av2.user_id <> av1.user_id
WHERE av1.user_id IN
(
SELECT id
FROM users
WHERE email_address IN (select email_address from banned_users where rec_status = 1)
)
GROUP BY av2.user_id;
You may replace the inner IN clause with a join. It's mostly a matter of personal preference, but in the past MySQL sometimes performed poorly on IN subqueries, so many people got into the habit of joining instead.
WHERE av1.user_id IN
(
SELECT u.id
FROM users u
JOIN banned_users bu ON bu.email_address = u.email_address
WHERE bu.rec_status = 1
)
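Finally, consider removing the GROUP BY clause. It reduces your results to one row per reusing user_id, showing one of its related banned user_ids (arbitrarily chosen if there is more than one). With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7), MySQL rejects this GROUP BY outright, because the other selected columns are neither grouped nor aggregated. I don't know your tables. Are you getting many records per reusing user_id? If not, remove the clause.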
As to indexes I suggest:
banned_users(rec_status, email_address)
users(email_address, id)
advert_views(user_id, client_ip)
advert_views(client_ip, user_id)
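A minimal sketch of the self-join on client_ip, using the suggested indexes, with SQLite standing in for MySQL. The sample rows and index names are invented, and users.id is assumed to be the users key (as in the question's own join). SELECT DISTINCT replaces the GROUP BY here, an adaptation that collapses the duplicate pairs produced when a banned user viewed several listings from the same IP.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE banned_users (email_address TEXT, rec_status INTEGER);
CREATE TABLE users (id INTEGER PRIMARY KEY, email_address TEXT);
CREATE TABLE advert_views (id INTEGER PRIMARY KEY, user_id INTEGER, client_ip TEXT);
CREATE INDEX ix_bu ON banned_users (rec_status, email_address);
CREATE INDEX ix_users ON users (email_address, id);
CREATE INDEX ix_av_user_ip ON advert_views (user_id, client_ip);
CREATE INDEX ix_av_ip_user ON advert_views (client_ip, user_id);
INSERT INTO banned_users VALUES ('bad@example.com', 1);
INSERT INTO users VALUES (1, 'bad@example.com'), (2, 'new@example.com'), (3, 'ok@example.com');
INSERT INTO advert_views (user_id, client_ip) VALUES
  (1, '10.0.0.1'), (1, '10.0.0.1'),   -- banned user, two views from one IP
  (2, '10.0.0.1'),                    -- another account on the same IP
  (3, '10.0.0.2'),                    -- unrelated user
  (NULL, '10.0.0.1');                 -- anonymous view: NULL <> 1 never matches
""")

REUSED = """
SELECT DISTINCT av1.user_id AS banned_user_id,
       av2.client_ip,
       av2.user_id AS new_account_id
FROM advert_views av1
JOIN advert_views av2
  ON av2.client_ip = av1.client_ip AND av2.user_id <> av1.user_id
WHERE av1.user_id IN (SELECT u.id
                      FROM users u
                      JOIN banned_users bu ON bu.email_address = u.email_address
                      WHERE bu.rec_status = 1)
ORDER BY banned_user_id, new_account_id
"""
rows = conn.execute(REUSED).fetchall()
print(rows)  # [(1, '10.0.0.1', 2)]
```

On 1.3m rows the self-join can still produce many intermediate pairs per shared IP, which is why the (client_ip, user_id) index matters for the av2 side.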

Select & update in same query

I have two tables: users and the user login history (user_logs). I need to make a report of how many times each user has logged into the system. The log table contains millions of rows, so running a nested query to fetch each user's number of logins takes a lot of time.
I am trying to loop through all the users and update the logins column. How can I do this in one query?
The schema is like this:
users table:
id INT(10)
username VARCHAR(7)
logins INT(10)
user_logs table:
id INT(10)
userid INT(10)
login_date DATETIME(19)
http://sqlfiddle.com/#!9/dc4149
I'm running this query:
UPDATE users u
SET u.logins = (SELECT COUNT(*)
FROM user_logs
WHERE userid = u.id)
LIMIT 1
This is not working.
Is there any way to loop through the users and update their respective login counts?
I tried doing this with PHP, but as the tables are very large, doing it one by one takes a very long time.
Can I do this via command line?

An update shouldn't take that long, especially if you have proper indexes on both tables. (Note also that LIMIT 1 restricts your UPDATE to a single row.)
Try this:
UPDATE users u
INNER JOIN(SELECT ul.userid,count(1) as cnt FROM user_logs ul GROUP BY ul.userid) u2
ON(u2.userid = u.id)
SET u.logins = u2.cnt
Then make sure you have the following indexes:
users - (id,logins)
user_logs - (userid)
If that doesn't help, try doing it in two steps: build a table from the subquery results, index it, and update from it:
CREATE TABLE temp_for_update AS(
SELECT ul.userid,count(1) as cnt
FROM user_logs ul
GROUP BY ul.userid);
CREATE INDEX YourIndex
ON temp_for_update (userid,cnt);
UPDATE users u
INNER JOIN temp_for_update u2
ON(u2.userid = u.id)
SET u.logins = u2.cnt
This should be faster.

Try using an UPDATE with a JOIN, like this:
UPDATE users a
JOIN (
SELECT userid, COUNT(*) as count_login
FROM user_logs
GROUP BY userid) b ON b.userid = a.id
SET a.logins = b.count_login;
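A minimal sketch of the two-step approach (aggregate once, then update from the aggregate), with SQLite standing in for MySQL and invented sample rows. SQLite does not accept MySQL's UPDATE ... JOIN syntax, so the second step uses a correlated lookup restricted to ids present in the helper table, which reproduces the inner-join behaviour: users with no log rows are left untouched rather than set to 0.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, logins INTEGER DEFAULT 0);
CREATE TABLE user_logs (id INTEGER PRIMARY KEY, userid INTEGER, login_date TEXT);
CREATE INDEX idx_user_logs_userid ON user_logs (userid);
INSERT INTO users (id, username) VALUES (1, 'u1'), (2, 'u2'), (3, 'u3');
INSERT INTO user_logs (userid, login_date) VALUES
  (1, '2020-01-01 08:00:00'), (1, '2020-01-02 08:00:00'),
  (2, '2020-01-01 09:00:00');
""")

conn.executescript("""
-- Step 1: count logins per user once, instead of once per users row.
CREATE TABLE temp_for_update AS
  SELECT userid, COUNT(*) AS cnt FROM user_logs GROUP BY userid;
CREATE INDEX idx_temp_userid ON temp_for_update (userid);
-- Step 2: copy the counts over; only users that appear in the aggregate
-- are touched, mirroring the inner join in the MySQL version.
UPDATE users
SET logins = (SELECT cnt FROM temp_for_update t WHERE t.userid = users.id)
WHERE id IN (SELECT userid FROM temp_for_update);
""")

counts = conn.execute("SELECT id, logins FROM users ORDER BY id").fetchall()
print(counts)  # [(1, 2), (2, 1), (3, 0)]
```

User 3 keeps its previous value (0 here only because of the column default); if stale counts are possible, reset logins to 0 before running the update.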

How to check combination of records from multiple rows(MySQL)

I am writing a query that checks multiple rows at the same time. If the combination of a user's records provides all the information I need, the user is considered passed, even if no single record provides enough information on its own.
For example:
There are two tables.
One is "user", which keeps users' personal information:
id client_id first_name last_name date_of_birth ssn address
The other is "lab", which keeps users' medical test information:
id external_source_id user_id date wbc rbc hemoglobin hematocrit mcv mch mchc rdw plateletcount
One user can have only one record in the user table but could have multiple records in the labs table. What I want to do is check all of a user's lab records to see whether, combined, they provide the necessary information. If they do, the user passes, even if no single lab record provides enough information. For example, the necessary information includes cholesterol, ldl, triglycerides and glucose. If a user has two lab records, one providing cholesterol (NOT NULL) and ldl (NOT NULL) and the other providing triglycerides (NOT NULL) and glucose (NOT NULL), he is considered passed.
How do I write the query that is able to do that?
The query I currently have is like this:
SELECT users.id AS user_id, users.first_name, users.last_name, clients.name AS client,
users.social_security_number AS ssn, users.hiredate, hra.id AS hra_id, hra.date AS hra_date, hra.maileddate AS hra_maileddate,
screening.id AS screening_id, screening.date AS screening_date, screening.maileddate AS screening_maileddate
FROM users
INNER JOIN clients
ON(
users.client_id = clients.id
)
INNER JOIN hra
ON(
users.id = hra.user_id
)
LEFT JOIN labs
ON(
users.id = labs.user_id
)
WHERE users.client_id = '1879'
AND hra.date BETWEEN '2011-07-01' AND '2011-11-15'
AND hra.maileddate IS NOT NULL
AND labs.date BETWEEN '2011-05-15' AND '2011-11-15'
AND labs.maileddate IS NULL
AND labs.cholesterol IS NOT NULL
AND labs.ldl IS NOT NULL
AND labs.triglycerides IS NOT NULL
AND (labs.glucose IS NOT NULL OR labs.ha1c IS NOT NULL)
GROUP BY users.id

This will select all the users who pass in your example:
select distinct u.*  -- distinct: each join can match several lab rows per user
from user u
join lab l1 on l1.user_id = u.id and l1.wbc is not null
join lab l2 on l2.user_id = u.id and l2.hemoglobin is not null
join lab l3 on l3.user_id = u.id and l3.plateletcount is not null
-- etc for other fields that need to be not null
This will work even if the same record has more than one desired column, or if the values are spread out across rows.
If you want the lab values too, just select u.*, l1.wbc, l2.hemoglobin, ... etc
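A minimal sketch of the multi-join check against invented lab rows, with SQLite standing in for MySQL. It also shows why DISTINCT matters: a user with several complete rows matches once per combination of rows across the three joins.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE lab (id INTEGER PRIMARY KEY, user_id INTEGER,
                  wbc REAL, hemoglobin REAL, plateletcount INTEGER);
INSERT INTO user VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
INSERT INTO lab (user_id, wbc, hemoglobin, plateletcount) VALUES
  (1, 5.0, 14.0, NULL), (1, NULL, NULL, 250),   -- ann: complete only when combined
  (2, 6.0, NULL, 200), (2, 7.0, NULL, NULL),    -- ben: never has hemoglobin
  (3, 5.0, 13.0, 210), (3, 4.5, 12.0, 220);     -- cat: complete in every row
""")

PASSED = """
SELECT DISTINCT u.id, u.first_name
FROM user u
JOIN lab l1 ON l1.user_id = u.id AND l1.wbc IS NOT NULL
JOIN lab l2 ON l2.user_id = u.id AND l2.hemoglobin IS NOT NULL
JOIN lab l3 ON l3.user_id = u.id AND l3.plateletcount IS NOT NULL
ORDER BY u.id
"""
passed = conn.execute(PASSED).fetchall()

# Without DISTINCT: ann yields 1 row, cat yields 2 * 2 * 2 = 8 rows.
raw = conn.execute(PASSED.replace("DISTINCT ", "", 1)).fetchall()
print(passed)    # [(1, 'ann'), (3, 'cat')]
print(len(raw))  # 9
```

For many required columns the number of joins grows accordingly, so indexes on lab(user_id) help.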

If you want the users who PASS:
You can combine IN subqueries with AND:
Select u.* from user u
where
u.id in (select user_id from lab where wbc is not null) and
u.id in (select user_id from lab where hemoglobin is not null) and
u.id in (select user_id from lab where plateletcount is not null);
If you want the users who DON'T PASS:
Negate each condition with NOT IN (no lab row of the user has that value) and combine them with OR:
Select u.* from user u
where
u.id not in (select user_id from lab where wbc is not null) OR
u.id not in (select user_id from lab where hemoglobin is not null) OR
u.id not in (select user_id from lab where plateletcount is not null);
I hope that makes sense. :)
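A minimal sketch checking that the pass and fail queries split users cleanly, with SQLite standing in for MySQL and invented rows. It contrasts NOT IN ... IS NOT NULL (a value is missing from every row of the user) with IN ... IS NULL (some single row has a NULL), which wrongly fails ann, whose values are spread across rows, and misses dan, who has no lab rows. lab.user_id is declared NOT NULL here as an assumption, since a NULL in a NOT IN subquery would make it match nothing.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE lab (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                  wbc REAL, hemoglobin REAL, plateletcount INTEGER);
INSERT INTO user VALUES (1, 'ann'), (2, 'ben'), (3, 'cat'), (4, 'dan');
INSERT INTO lab (user_id, wbc, hemoglobin, plateletcount) VALUES
  (1, 5.0, 14.0, NULL), (1, NULL, NULL, 250),
  (2, 6.0, NULL, 200), (2, 7.0, NULL, NULL),
  (3, 5.0, 13.0, 210), (3, 4.5, 12.0, 220);
-- dan has no lab rows
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

PASS = """
SELECT u.id FROM user u
WHERE u.id IN (SELECT user_id FROM lab WHERE wbc IS NOT NULL)
  AND u.id IN (SELECT user_id FROM lab WHERE hemoglobin IS NOT NULL)
  AND u.id IN (SELECT user_id FROM lab WHERE plateletcount IS NOT NULL)
ORDER BY u.id
"""
# Logical negation of PASS.
FAIL = """
SELECT u.id FROM user u
WHERE u.id NOT IN (SELECT user_id FROM lab WHERE wbc IS NOT NULL)
   OR u.id NOT IN (SELECT user_id FROM lab WHERE hemoglobin IS NOT NULL)
   OR u.id NOT IN (SELECT user_id FROM lab WHERE plateletcount IS NOT NULL)
ORDER BY u.id
"""
# Flags any user with a NULL in some row: not the negation of PASS.
ANY_NULL_ROW = """
SELECT u.id FROM user u
WHERE u.id IN (SELECT user_id FROM lab WHERE wbc IS NULL)
   OR u.id IN (SELECT user_id FROM lab WHERE hemoglobin IS NULL)
   OR u.id IN (SELECT user_id FROM lab WHERE plateletcount IS NULL)
ORDER BY u.id
"""

passed, failed, any_null = ids(PASS), ids(FAIL), ids(ANY_NULL_ROW)
print(passed, failed, any_null)  # [1, 3] [2, 4] [1, 2]
```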

Single MySQL query which checks another table for rows

I have a Users table and a Payments table. I need a query which lists the users who DO NOT have a record in the payments table where the field PaymentCompleted=1.
These are the columns in the tables (simplified):
Users: UserID, UserName
Payments: PaymentID, UserID, PaymentCompleted
The query should select the field UserName.

select UserName
from Users left outer join Payments on Users.UserID = Payments.UserID and Payments.PaymentCompleted = 1
where Payments.UserID is NULL  -- no completed payment matched

SELECT UserName
FROM Users u
WHERE NOT EXISTS(Select 1
from Payments p
Where p.UserId = u.UserId
AND p.PaymentCompleted = 1)

select UserName from Users u where u.UserID not in (select p.UserID from Payments p where p.PaymentCompleted = 1)
One note: "not in" clauses can be computationally inefficient for large numbers of records, and if the subquery can return a NULL UserID, NOT IN matches no rows at all. If you start seeing performance issues, you may want to do some refactoring/redesign, for example rewriting it with NOT EXISTS.
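A minimal sketch comparing the approaches on invented rows, with SQLite standing in for MySQL; the index name is hypothetical. It includes a LEFT JOIN that filters PaymentCompleted in the WHERE clause, which lists a user who has both a completed and an incomplete payment (ben) through his incomplete row, next to the version that moves the filter into the ON clause.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE Payments (PaymentID INTEGER PRIMARY KEY, UserID INTEGER NOT NULL,
                       PaymentCompleted INTEGER);
CREATE INDEX ix_payments_user_completed ON Payments (UserID, PaymentCompleted);
INSERT INTO Users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat'), (4, 'dan');
INSERT INTO Payments (UserID, PaymentCompleted) VALUES
  (2, 1), (2, 0),   -- ben: one completed, one not
  (3, 0),           -- cat: only incomplete payments
  (4, 1);           -- dan: completed
-- ann has no payments
""")

QUERIES = {
    # Filter in WHERE: ben's incomplete row survives, so ben is listed.
    "left_join_where": """
        SELECT DISTINCT UserName FROM Users u
        LEFT JOIN Payments p ON u.UserID = p.UserID
        WHERE p.PaymentCompleted IS NULL OR p.PaymentCompleted != 1
        ORDER BY UserName""",
    # Filter in ON: only completed payments can match, so IS NULL means none.
    "left_join_on": """
        SELECT UserName FROM Users u
        LEFT JOIN Payments p ON u.UserID = p.UserID AND p.PaymentCompleted = 1
        WHERE p.UserID IS NULL
        ORDER BY UserName""",
    "not_exists": """
        SELECT UserName FROM Users u
        WHERE NOT EXISTS (SELECT 1 FROM Payments p
                          WHERE p.UserID = u.UserID AND p.PaymentCompleted = 1)
        ORDER BY UserName""",
    "not_in": """
        SELECT UserName FROM Users u
        WHERE u.UserID NOT IN (SELECT p.UserID FROM Payments p
                               WHERE p.PaymentCompleted = 1)
        ORDER BY UserName""",
}

results = {name: [r[0] for r in conn.execute(sql)] for name, sql in QUERIES.items()}
for name, names in results.items():
    print(name, names)
```

The last three agree (ann and cat); the WHERE-filtered join additionally returns ben.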
