Querying a large table using mysql


I manage a property website. I have a table of banned users (small) and a table called advert_views that records each listing each user views (currently 1.3 million rows and growing). The advert_views table also records the IP address for every advert viewed.
I want to get the IP addresses used by the banned users and check if any of these banned users have opened new accounts. I ran the following query:
SELECT adviews.user_id AS 'banned user_id',
adviews.client_ip AS 'IPs used by banned users',
adviews2.user_id AS 'banned users that opened a new account'
FROM banned_users
LEFT JOIN users on users.email_address = banned_users.email_address #since I don't store the user_id in banned_users
LEFT JOIN advert_views adviews ON adviews.user_id = users.id AND adviews.user_id IS NOT NULL # users may view listings when not logged in but they have restricted access to the information on the listing
LEFT JOIN (SELECT client_ip,
user_id
FROM advert_views
WHERE user_id IS NOT NULL
) adviews2
ON adviews2.client_ip = adviews.client_ip
WHERE banned_users.rec_status = 1 and adviews.user_id <> adviews2.user_id
GROUP BY adviews2.user_id
I added indexes to the advert_views and users tables.
My query takes half an hour to execute. Is there a way to improve its speed?
Thanks!
Chris

First of all: Why do you outer join the tables? Or better: Why do you try to outer join the tables? A left join is meant to get data from a table even when there is no match. But then your results could contain rows with all values null. (That doesn't happen though, because adviews.user_id <> adviews2.user_id in your where clause dismisses all outer-joined rows.) Don't give the DBMS more work to do than necessary. If you want inner joins, then don't outer join. (Though the difference in execution time won't be huge.)
Next: You select from banned_users, but you only use it to check existence. You shouldn't do this. Use an EXISTS or IN clause instead. (This is mainly for readability and in order not to produce duplicate results. This probably won't speed things up.)
SELECT av1.user_id AS 'banned user_id',
av2.client_ip AS 'IPs used by banned users',
av2.user_id AS 'banned users that opened a new account'
FROM advert_views av1
JOIN advert_views av2 ON av2.client_ip = av1.client_ip AND av2.user_id <> av1.user_id
WHERE av1.user_id IN
(
SELECT id
FROM users
WHERE email_address IN (SELECT email_address FROM banned_users WHERE rec_status = 1)
)
GROUP BY av2.user_id;
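A minimal sketch to sanity-check the logic of this rewrite, using Python's built-in sqlite3 module as a stand-in for MySQL (so it says nothing about MySQL performance). The toy rows and the helper name find_new_accounts are hypothetical; the tables follow the question, with users keyed by id. The GROUP BY is left out so every match is visible.

```python
import sqlite3

def find_new_accounts(conn):
    """Return (banned_user_id, shared_ip, other_user_id) rows, one per matching view pair."""
    sql = """
        SELECT av1.user_id, av2.client_ip, av2.user_id
        FROM advert_views av1
        JOIN advert_views av2
          ON av2.client_ip = av1.client_ip AND av2.user_id <> av1.user_id
        WHERE av1.user_id IN (
            SELECT id FROM users
            WHERE email_address IN (
                SELECT email_address FROM banned_users WHERE rec_status = 1))
        ORDER BY av1.user_id, av2.user_id, av2.client_ip
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE banned_users (email_address TEXT, rec_status INTEGER);
    CREATE TABLE users (id INTEGER PRIMARY KEY, email_address TEXT);
    CREATE TABLE advert_views (user_id INTEGER, client_ip TEXT);
    INSERT INTO banned_users VALUES ('bad@example.com', 1);
    INSERT INTO users VALUES (1, 'bad@example.com'), (2, 'new@example.com'), (3, 'ok@example.com');
    -- user 1 is banned; user 2 later views adverts from one of user 1's IPs
    INSERT INTO advert_views VALUES
        (1, '10.0.0.1'), (1, '10.0.0.2'), (2, '10.0.0.2'),
        (3, '10.0.0.9'), (NULL, '10.0.0.1');
""")
print(find_new_accounts(conn))  # [(1, '10.0.0.2', 2)]
```

The anonymous view on 10.0.0.1 (user_id NULL) drops out because NULL <> 1 is not true, so logged-out traffic does not produce false matches.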
You may replace the inner IN clause with a join. It's mostly a matter of personal preference, but in the past MySQL sometimes performed poorly on IN clauses, so many people got into the habit of joining instead.
WHERE av1.user_id IN
(
SELECT u.id
FROM users u
JOIN banned_users bu ON bu.email_address = u.email_address
WHERE bu.rec_status = 1
)
Finally, consider removing the GROUP BY clause. It reduces your results to one row per new-account user_id, showing one of its related banned user_ids (chosen arbitrarily if there is more than one). With ONLY_FULL_GROUP_BY enabled, which is part of the default SQL mode since MySQL 5.7, MySQL rejects such a query outright. I don't know your tables. Are you getting many records per new-account user_id? If not, remove the clause.
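If you want every banned/new-account pair rather than one arbitrary row per new account, one option is SELECT DISTINCT over the two user ids. This hedged sketch (SQLite through Python's sqlite3, invented data) shows how the ungrouped join repeats a pair once per matching pair of views, while DISTINCT collapses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE advert_views (user_id INTEGER, client_ip TEXT);
    -- banned user 1 and new account 2 share two IPs, each seen several times
    INSERT INTO advert_views VALUES
        (1, 'a'), (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (2, 'b');
""")

JOIN_SQL = """
    FROM advert_views av1
    JOIN advert_views av2
      ON av2.client_ip = av1.client_ip AND av2.user_id <> av1.user_id
    WHERE av1.user_id = 1  -- stands in for the banned-user IN (...) filter
"""

# Every matching pair of views: grows with the number of shared views.
all_rows = conn.execute("SELECT av1.user_id, av2.user_id" + JOIN_SQL).fetchall()
# One row per (banned user, other account) pair, with no arbitrary column picks.
pairs = conn.execute("SELECT DISTINCT av1.user_id, av2.user_id" + JOIN_SQL).fetchall()
print(len(all_rows), pairs)  # 4 [(1, 2)]
```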
As to indexes, I suggest:
banned_users(rec_status, email_address)
users(email_address, id)
advert_views(user_id, client_ip)
advert_views(client_ip, user_id)
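A rough way to check that the suggested indexes are actually picked up is to look at the query plan. The sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in; on MySQL you would run EXPLAIN and read its key column instead. The index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE banned_users (email_address TEXT, rec_status INTEGER);
    CREATE TABLE users (id INTEGER PRIMARY KEY, email_address TEXT);
    CREATE TABLE advert_views (user_id INTEGER, client_ip TEXT);
    -- the four suggested indexes (names are hypothetical)
    CREATE INDEX bu_status_email ON banned_users (rec_status, email_address);
    CREATE INDEX users_email_id ON users (email_address, id);
    CREATE INDEX av_user_ip ON advert_views (user_id, client_ip);
    CREATE INDEX av_ip_user ON advert_views (client_ip, user_id);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT av1.user_id, av2.client_ip, av2.user_id
    FROM advert_views av1
    JOIN advert_views av2
      ON av2.client_ip = av1.client_ip AND av2.user_id <> av1.user_id
    WHERE av1.user_id IN (SELECT id FROM users WHERE email_address IN
        (SELECT email_address FROM banned_users WHERE rec_status = 1))
""").fetchall()

# The fourth column of each plan row is the human-readable description.
details = [row[3] for row in plan]
for d in details:
    print(d)
```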

Related

How to rewrite UNION with LEFT JOIN more efficiently

I have two tables: one that registers users and one that checks in users. A user will always have a single entry in the register table, but may have zero or multiple entries in the check-in table. For a raffle selector, I wrote a query that picks 1 entry from the register table and then 1 entry from the check-in table. Each subquery picks a random entry as long as that userID does not exist in a third table that stores the raffle winners. After the two entries are returned, it randomly selects one of the two as the winner.
However, I believe there should be a more efficient way of writing this so that it picks only one entry, rather than picking two entries and then choosing one of them.
It took me quite a while to figure out how to write the query below correctly, as I am not proficient in MySQL at all. The query works and seems to be efficient, but I believe there is a better way to write it that also reduces the amount of query code.
Hoping someone here can help or advise.
Table note: clubUsers and clubHistory have multiple overlapping columns, but the tables are not the same:
register = clubUsers
checkins = clubHistory
winners = clubRaffleWinners
SELECT * FROM (
(SELECT ch.user_ID,ch.clID FROM clubHistory AS ch
LEFT OUTER JOIN clubRaffleWinners AS cr1 ON
ch.user_ID=cr1.user_ID
AND cr1.cID=1157
AND cr1.rafID=18
AND cr1.crID=1001
AND cr1.ceID=1167
AND cr1.chDate1='2022-06-04'
WHERE
ch.cID=1157
AND ch.crID=1001
AND ch.ceID=1167
AND ch.chDate='2022-06-04'
AND cr1.user_ID IS NULL
GROUP BY ch.user_ID ORDER BY RAND() LIMIT 1
)
UNION
(SELECT cu.user_ID,cu.clID FROM clubUsers AS cu
LEFT OUTER JOIN clubRaffleWinners AS cr2 ON
cu.user_ID=cr2.user_ID
AND cr2.cID=1157
AND cr2.rafID=18
AND cr2.crID=1001
AND cr2.ceID=1167
AND cr2.chDate1='2022-06-04'
WHERE
cu.cID=1157
AND cu.crID=1001
AND cu.ceID=1167
AND cu.calDate<='2022-06-04'
AND cr2.user_ID IS NULL
GROUP BY cu.user_ID ORDER BY RAND() LIMIT 1
)
) AS foo order by RAND() LIMIT 1 ;
UPDATE:
As @JettoMartinez points out below, my current query could in fact randomly return the same user from each table, so the final returned entry would just be that same user. I didn't realize this while struggling to get the query above to work. So my original question, asking for a more optimized query that selects a single random entry from both tables (where that user is not already in the winners table), matters for yet another reason.

There are two ways I can think of (note that since I don't fully understand the tables, I'm not using all the conditions you used in your JOIN statements, so this might need more work):
Using an exclusive subquery:
SELECT
cu.user_ID,
cu.clID,
ch.cID
FROM
clubUsers cu
LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
WHERE cu.user_ID NOT IN (
SELECT
user_ID
FROM
clubRaffleWinners
WHERE
-- other conditions
)
ORDER BY RAND() LIMIT 1;
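One caveat with the NOT IN form: if the subquery can return a NULL user_ID, NOT IN matches nothing at all, because x NOT IN (..., NULL) is never true. The sketch below (SQLite via Python's sqlite3, with trimmed tables and invented data) shows the effect and the usual NOT EXISTS alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clubUsers (user_ID INTEGER, clID INTEGER);
    CREATE TABLE clubRaffleWinners (user_ID INTEGER);
    INSERT INTO clubUsers VALUES (1, 10), (2, 20), (3, 30);
    -- user 1 already won; the NULL row simulates an incomplete record
    INSERT INTO clubRaffleWinners VALUES (1), (NULL);
""")

# 2 NOT IN (1, NULL) evaluates to NULL, so every user is filtered out.
not_in = conn.execute("""
    SELECT cu.user_ID FROM clubUsers cu
    WHERE cu.user_ID NOT IN (SELECT user_ID FROM clubRaffleWinners)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL row is harmless.
not_exists = conn.execute("""
    SELECT cu.user_ID FROM clubUsers cu
    WHERE NOT EXISTS (SELECT 1 FROM clubRaffleWinners cr
                      WHERE cr.user_ID = cu.user_ID)
    ORDER BY cu.user_ID
""").fetchall()

print(not_in, not_exists)  # [] [(2,), (3,)]
```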
Using a LEFT "OUTER" JOIN, as you asked for:
SELECT
cu.user_ID,
cu.clID,
ch.cID -- Or any relevant field from clubHistory, really
FROM
clubUsers cu
LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
LEFT JOIN clubRaffleWinners cr ON cr.user_ID = cu.user_ID
AND ... -- other conditions to ensure uniqueness
AND ... -- that could also be in the WHERE part
WHERE
cr.user_ID IS NULL -- this will filter out the INNER part of the JOIN
ORDER BY RAND() LIMIT 1;
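The LEFT JOIN ... IS NULL filter above is an anti-join: it keeps only clubUsers rows that have no winner row. A small sketch (SQLite via Python's sqlite3, where RAND() is spelled RANDOM(); tables trimmed to user_ID and data invented) draws repeatedly and records each winner, so nobody can be drawn twice. The helper draw_winner is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clubUsers (user_ID INTEGER PRIMARY KEY);
    CREATE TABLE clubRaffleWinners (user_ID INTEGER);
    INSERT INTO clubUsers VALUES (1), (2), (3), (4);
""")

DRAW = """
    SELECT cu.user_ID
    FROM clubUsers cu
    LEFT JOIN clubRaffleWinners cr ON cr.user_ID = cu.user_ID
    WHERE cr.user_ID IS NULL   -- keep only users with no winner row
    ORDER BY RANDOM()          -- MySQL: ORDER BY RAND()
    LIMIT 1
"""

def draw_winner(conn):
    """Pick one random non-winner and record them; return None once everyone has won."""
    row = conn.execute(DRAW).fetchone()
    if row is None:
        return None
    conn.execute("INSERT INTO clubRaffleWinners VALUES (?)", row)
    return row[0]

winners = [draw_winner(conn) for _ in range(5)]
print(winners)  # four distinct ids in random order, then None
```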
I don't have a dataset to properly test these queries, so please take them as a concept. I also didn't query clubHistory, since I honestly don't see the point of doing so; joining clubRaffleWinners to clubUsers seems enough to me.
EDIT
Since the user_ID in clubHistory is relevant to the raffle, I added a LEFT JOIN to it and a field from that table in the SELECT statement. Each user then repeats once per matching entry in clubHistory, or appears once (with NULLs) if they have no entries. Every remaining user therefore has max(1, number of their entries) chances out of the total number of rows left after excluding previous winners.
This logic can be applied to the first query with a subquery too, and if the added field should not appear in the output, the query can be wrapped in a CTE or a subquery.
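Because ORDER BY RAND() LIMIT 1 picks uniformly among the rows returned, each user's chance is proportional to how many rows the LEFT JOIN produces for them. This sketch (SQLite via Python's sqlite3, trimmed tables, invented data) counts those rows: a user with no check-ins still gets one NULL-extended row, and a user with three check-ins gets three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clubUsers (user_ID INTEGER PRIMARY KEY);
    CREATE TABLE clubHistory (user_ID INTEGER, chDate TEXT);
    INSERT INTO clubUsers VALUES (1), (2);
    -- user 1 never checked in; user 2 checked in three times
    INSERT INTO clubHistory VALUES (2, 'd1'), (2, 'd2'), (2, 'd3');
""")

# COUNT(*) also counts the NULL-extended row a LEFT JOIN emits when nothing matches.
rows_per_user = dict(conn.execute("""
    SELECT cu.user_ID, COUNT(*)
    FROM clubUsers cu
    LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
    GROUP BY cu.user_ID
""").fetchall())

total = sum(rows_per_user.values())
for user_id, n in sorted(rows_per_user.items()):
    print(f"user {user_id}: {n} of {total} rows")
```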

Let me make sure I understand what you are describing.
Every registered person qualifies for 1 entry.
However, each time they check in, they get 1 entry for that check-in. So someone who registered and has NEVER checked in gets 1 entry. But someone who registered and checked in 3 times gets a total of JUST the 3 check-in entries, not 4 (3 plus 1 for being registered).
Regardless of who is POSSIBLE, you want to EXCLUDE all people who have already been a winner in the raffle.
You SHOULD be able to get results from the query below. Since all three tables appear to filter on the same cID, crID, ceID and date columns, I based the primary FROM on the registered clubUsers.
From that, a LEFT JOIN to clubHistory returns a person's ID once if they are only registered, OR multiple times based on how often they checked in, as in the example.
From each user, I also LEFT JOIN directly to the raffle-winner history on the same criteria, plus rafID = 18, which appears to identify the specific raffle being drawn. The IS NULL test in the final WHERE clause then excludes a previous winner, whether they have a single entry or multiple entries.
The query returns all entries (single or multiple) for people who have not already won, shuffles them with ORDER BY RAND(), and applies a single LIMIT 1 to get the final winner. I don't know why you needed what appeared to be the clubhouse ID, when you only really care about WHO won, regardless of whether the entry came from the club history or not.
SELECT
cu.user_ID
FROM
clubUsers AS cu
LEFT JOIN clubHistory ch
on cu.user_ID = ch.user_ID
AND cu.cID = ch.cID
AND cu.crID = ch.crID
AND cu.ceID = ch.ceID
AND ch.chDate = '2022-06-04'
LEFT JOIN clubRaffleWinners AS crw
ON cu.user_ID = crw.user_ID
AND cu.cID = crw.cID
AND cu.crID = crw.crID
AND cu.ceID = crw.ceID
AND crw.chDate1 = '2022-06-04'
AND crw.rafID = 18
WHERE
cu.cID = 1157
AND cu.crID = 1001
AND cu.ceID = 1167
AND cu.calDate <= '2022-06-04'
AND crw.user_id IS NULL
order by
RAND()
LIMIT 1
For performance purposes, I would ensure the following indexes exist:
table              index
clubUsers          (cID, crID, ceID, calDate, user_ID)
clubHistory        (user_ID, cID, crID, ceID, chDate)
clubRaffleWinners  (user_ID, cID, crID, ceID, chDate1, rafID)

(Just a comment, but it needs formatting.)
I would start by trying to put these 4 values in a single table, rather than repeating them across 3 tables:
cu.cID=1157
AND cu.crID=1001
AND cu.ceID=1167
AND cu.calDate<='2022-06-04'
Please provide SHOW CREATE TABLE for each table; then I can assess whether the recommended indexes make sense.

Does JOIN or LEFT JOIN keep checking in a SELECT query?

I have a JOIN query but I need to optimize it for performance.
For example, in this query:
"SELECT id FROM users WHERE id = :id"
Since there is no LIMIT 1 at the end of the query, that select query will keep searching. If I add LIMIT 1 to the end of that query, it will select only one and stop searching for more.
Here is my question and query:
"SELECT messages.text, users.name
FROM messages
LEFT JOIN users
ON messages.from_id = users.id
WHERE messages.user_id = :user_id"
In the LEFT JOIN users ON messages.from_id = users.id part, since there is only 1 user with that ID, will it keep searching after it has found that row? If it does, how can I optimize it so that it only searches for 1 row?

SELECT id FROM users WHERE id = :id
If there is no index on id, the entire table is scanned.
If there is a UNIQUE or PRIMARY KEY on id, only one row will be checked.
If there is a plain INDEX, it will scan from the first match until it finds an id that does not match.
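The three cases can be observed in a query plan. This sketch uses SQLite through Python's sqlite3 as a stand-in: SQLite reports a full table scan as SCAN and an index lookup as SEARCH, while MySQL's EXPLAIN shows the same distinction in its type column (ALL for a full scan, const or ref for index lookups). The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_noindex (id INTEGER, name TEXT);
    CREATE TABLE users_pk (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_plain (id INTEGER, name TEXT);
    CREATE INDEX users_plain_id ON users_plain (id);
""")

def plan(table):
    """Return SQLite's plan description for a single-id lookup on the given table."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT id FROM {table} WHERE id = ?", (42,)
    ).fetchall()
    return " | ".join(r[3] for r in rows)

for t in ("users_noindex", "users_pk", "users_plain"):
    print(t, "->", plan(t))
```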
For this:
SELECT m.text, u.name
FROM messages AS m
LEFT JOIN users AS u ON m.from_id = u.id
WHERE m.user_id = :user_id
It will do a "Nested Loop Join":
Find the occurrence(s) in messages that satisfy m.user_id = :user_id (see above).
For each such row, reach into users based on the ON clause.
There may be multiple rows (again, depending on the index or lack of one).
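To make the nested loop concrete, here is a toy model in plain Python (an illustration, not how MySQL is implemented internally): a UNIQUE or PRIMARY KEY index is modeled as a dict with at most one row per key, and a plain INDEX as a dict of lists. The rows are invented.

```python
from collections import defaultdict

# Hypothetical rows: messages(user_id, from_id, text) and users(id, name).
messages = [
    {"user_id": 7, "from_id": 1, "text": "hi"},
    {"user_id": 7, "from_id": 2, "text": "yo"},
    {"user_id": 8, "from_id": 1, "text": "ignored"},
    {"user_id": 7, "from_id": 99, "text": "from a deleted user"},
]
users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]

# Unique index: one probe returns at most one row, so there is nothing more to look for.
users_by_id = {u["id"]: u for u in users}
# Plain index: one probe may return several rows.
messages_by_user = defaultdict(list)
for m in messages:
    messages_by_user[m["user_id"]].append(m)

def left_join_messages(user_id):
    """Nested loop LEFT JOIN: outer rows from messages, one probe into users per row."""
    result = []
    for m in messages_by_user.get(user_id, []):   # outer loop: WHERE m.user_id = ?
        u = users_by_id.get(m["from_id"])         # inner probe: ON m.from_id = u.id
        result.append((m["text"], u["name"] if u else None))  # LEFT JOIN keeps misses
    return result

print(left_join_messages(7))
```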
So, your question "how can I optimize it so that it only searches for 1 row" is answered:
If there can only be one row, declare it UNIQUE.
If there are sometimes more than one, then use a plain INDEX. But don't worry about checking for an extra row; it is not that costly.
You say "only 1 user with that ID", but fail to specify which id in which table.
But that is not the end of the story...
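LEFT JOIN may get turned into JOIN when the WHERE clause rejects the NULL rows that the LEFT JOIN would add for users. In that case, users may be the first table to look at: the NLJ would start with users, then reach into messages. (The Optimizer can also propagate constants through equalities, but here the constant is compared with m.user_id while the join uses m.from_id, so it cannot deduce a value for u.id.) Either way, the types of indexes are important.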
Please provide SHOW CREATE TABLE for both tables. Then I can condense the answer to the relevant parts. Please provide EXPLAIN SELECT ... for confirmation of what I am saying.

MySQL - 3 tables, is this complex join even possible?

I have three tables: users, groups and relation.
Table users with fields: usrID, usrName, usrPass, usrPts
Table groups with fields: grpID, grpName, grpMinPts
Table relation with fields: uID, gID
A user can be placed in a group in two ways:
if they collect the group's minimum number of points (users.usrPts > groups.grpMinPts ORDER BY groups.grpMinPts DESC LIMIT 1)
if their relation to the group is added manually in the relation table (user ID stored as uID and group ID as gID in the table named relation)
Can I create one single query to determine, for every user (or one specific user), which group they belong to, where the manual relation (from the relation table) takes priority over the usrPts vs grpMinPts comparison? Also, I do not want any user shown twice (once for their real group by points and once for the related group)...
Thanks in advance! :) I tried:
SELECT * FROM users LEFT JOIN (relation LEFT JOIN groups ON relation.gID = groups.grpID) ON users.usrID = relation.uID
Using this, I managed to extract the specified relations (from the relation table), but I have no idea how to include the user points while respecting the priority described above. I know how to do this with a few separate queries in PHP, which is simple, but I am curious: can it be done with one single query?
EDIT TO ADD:
Thanks to the really educational COALESCE technique @GordonLinoff provided, I managed to make this query work as I expected. So, here it goes:
SELECT o.usrID, o.usrName, o.usrPass, o.usrPts, t.grpID, t.grpName
FROM (
SELECT u.*, COALESCE(relationgroupid,groupid) AS thegroupid
FROM (
SELECT u.*, (
SELECT grpID
FROM groups g
WHERE u.usrPts > g.grpMinPts
ORDER BY g.grpMinPts DESC
LIMIT 1
) AS groupid, (
SELECT gID
FROM relation r
WHERE r.uID = u.usrID
) AS relationgroupid
FROM users u
)u
)o
JOIN groups t ON t.grpID = o.thegroupid
Also, if you are wondering, as I did, whether this approach is faster or slower than running three queries and processing the results in PHP: it is slightly faster. The average time to execute this query and show the results on a webpage is 14 ms; three simple queries, processing in PHP and showing the results took 21 ms. Both averages are based on 10 runs, and the execution time was practically constant.

Here is an approach that uses correlated subqueries to get each of the values. It then chooses the appropriate one using the precedence rule that if the relations exist use that one, otherwise use the one from the groups table:
select u.*,
coalesce(relationgroupid, groupid) as thegroupid
from (select u.*,
(select grpid from groups g where u.usrPts > g.grpMinPts order by g.grpMinPts desc limit 1
) as groupid,
(select gid from relation r where r.uID = u.usrID
) as relationgroupid
from users u
) u
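A runnable sketch of the COALESCE precedence idea, using SQLite through Python's sqlite3 with the column names from the question (uID, gID, usrID, grpMinPts) and invented data. groups is quoted because it is a keyword in newer SQL dialects (MySQL 8.0 reserves GROUPS; there you would use backticks). The LIMIT 1 on the relation subquery is an added assumption, to keep the scalar subquery valid if a user ever has several manual relations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (usrID INTEGER PRIMARY KEY, usrName TEXT, usrPts INTEGER);
    CREATE TABLE "groups" (grpID INTEGER PRIMARY KEY, grpName TEXT, grpMinPts INTEGER);
    CREATE TABLE relation (uID INTEGER, gID INTEGER);
    INSERT INTO "groups" VALUES (1, 'Bronze', 0), (2, 'Silver', 100), (3, 'Gold', 500);
    INSERT INTO users VALUES (1, 'ann', 50), (2, 'bob', 600), (3, 'cid', 150);
    -- manual override: bob is placed in Bronze despite his points
    INSERT INTO relation VALUES (2, 1);
""")

rows = conn.execute("""
    SELECT u.usrName, g.grpName
    FROM (
        SELECT u.usrID, u.usrName,
               COALESCE(
                   -- manual relation first (takes priority when present)
                   (SELECT r.gID FROM relation r WHERE r.uID = u.usrID LIMIT 1),
                   -- otherwise the highest group whose minimum the user exceeds
                   (SELECT g.grpID FROM "groups" g
                    WHERE u.usrPts > g.grpMinPts
                    ORDER BY g.grpMinPts DESC LIMIT 1)
               ) AS thegroupid
        FROM users u
    ) u
    JOIN "groups" g ON g.grpID = u.thegroupid
    ORDER BY u.usrID
""").fetchall()
print(rows)  # [('ann', 'Bronze'), ('bob', 'Bronze'), ('cid', 'Silver')]
```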

Try something like this
select users.usrName, `groups`.grpName
from `groups`
join relation on relation.gID = `groups`.grpID
join users on users.usrID = relation.uID
union
select users.usrName, g1.grpName
from `groups` g1
join `groups` g2 on g2.grpMinPts > g1.grpMinPts
join users on users.usrPts between g1.grpMinPts and g2.grpMinPts

Mysql query incredibly slow in LEFT JOIN if the given value 0 to the primary id

In this sql:
SELECT s.*,
u.id,
u.name
FROM shops s
LEFT JOIN users u ON u.id = s.user_id
OR u.id = s.owner_user_id
WHERE s.status = 1
For some reason this query takes an enormous amount of time, even though id is the primary key. It seems to have become slow after I added the OR u.id = s.owner_user_id part. owner_user_id is usually 0; only a handful of rows have a real value. But why would it take so long, apparently scanning the whole table? The users table is long and wide. I didn't design it; this is for a client, and subsequent programmers added too many fields. The table has 22k rows and dozens of fields.
*The field names are for demonstration only; the actual names are different, so don't ask me why I'm looking for owner_user_id ;) I solved the slowness by removing the "OR ..." part and instead looking up the id in the loop when it is not 0, but I would like to know why this is happening and how to speed up the query as is.

You may be able to speed it up by using IN instead of the OR but that is minor.
SELECT u.id,
u.name
FROM shops s
LEFT JOIN users u ON u.id IN ( s.user_id, s.owner_user_id )
WHERE s.status = 1
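IN (a, b) is shorthand for u.id = a OR u.id = b, so both ON clauses return the same rows. This sketch (SQLite via Python's sqlite3, trimmed tables, invented data) checks that, and also shows that a shop whose user_id and owner_user_id both match a user produces two rows. The helper shop_users is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shops (id INTEGER PRIMARY KEY, status INTEGER,
                        user_id INTEGER, owner_user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    -- owner_user_id = 0 means "no owner"; shop 12 is inactive
    INSERT INTO shops VALUES (10, 1, 1, 0), (11, 1, 1, 2), (12, 0, 2, 0);
""")

def shop_users(on_clause):
    """Run the shops/users LEFT JOIN with the given ON condition."""
    return conn.execute(f"""
        SELECT s.id, u.id FROM shops s
        LEFT JOIN users u ON {on_clause}
        WHERE s.status = 1
        ORDER BY s.id, u.id
    """).fetchall()

with_or = shop_users("u.id = s.user_id OR u.id = s.owner_user_id")
with_in = shop_users("u.id IN (s.user_id, s.owner_user_id)")
print(with_or)  # [(10, 1), (11, 1), (11, 2)]
```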
Firstly, are there any indexes on these tables? Mainly one on users.id, s.user_id or s.owner_user_id?
However, I must ask why you need a LEFT JOIN instead of a regular JOIN. A LEFT JOIN has to return every shops row even when no users row matches, which gives the optimizer fewer ways to order the join. Since I'm assuming the id should always be in either the user_id or the owner_user_id field, and that there will always be a match, a plain JOIN should speed the query up a bit.
And as Mitch said, 22k rows is tiny.

How are you going to know which user record is which? Here's how I'd do it
SELECT s.*,
u.name AS user_name,
o.name AS owner_name
FROM shops s
LEFT JOIN users u ON s.user_id = u.id
LEFT JOIN users o ON s.owner_user_id = o.id
WHERE s.status = 1
I've omitted the IDs from the user table in the SELECT as these will be part of s.* anyway.
I'm curious about the left joins too. If shops.user_id and shops.owner_user_id are required foreign keys, use inner joins instead.
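A sketch of the two-join version on invented data, using SQLite through Python's sqlite3: each shop yields exactly one row, and an owner_user_id of 0 (which matches no users row) simply comes back as NULL. Each ON clause is a single equality on the primary key, which in MySQL typically shows up in EXPLAIN as an eq_ref lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shops (id INTEGER PRIMARY KEY, status INTEGER,
                        user_id INTEGER, owner_user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    -- owner_user_id = 0 means "no owner"; no users row has id 0
    INSERT INTO shops VALUES (10, 1, 1, 0), (11, 1, 1, 2);
""")

rows = conn.execute("""
    SELECT s.id, u.name AS user_name, o.name AS owner_name
    FROM shops s
    LEFT JOIN users u ON s.user_id = u.id        -- primary-key lookup
    LEFT JOIN users o ON s.owner_user_id = o.id  -- second, independent lookup
    WHERE s.status = 1
    ORDER BY s.id
""").fetchall()
print(rows)  # [(10, 'ann', None), (11, 'ann', 'bob')]
```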

MySQL returning results from one table based on data in another table

Before delving into the issue, first I will explain the situation. I have two tables such as the following:
USERS TABLE
user_id
username
firstName
lastName
GROUPS TABLE
user_id
group_id
I want to retrieve all users whose first name is LIKE '%foo%' and who are part of a group with group_id = 'givengid'.
So, the query would look something like this:
SELECT user_id FROM users WHERE firstName LIKE '%foo'
I could make a user-defined SQL function such as ismember(user_id, group_id) that returns 1 if the user is part of the group and 0 if not, and add it to the WHERE clause of the SELECT statement above. However, this means that for every user whose first name matches the criteria, another query has to run through thousands of other records to find a matching group entry.
The users and groups tables will each have several hundred thousand records. Is it more conventional to use the user-defined function approach or a query using UNION? If the UNION approach is best, what would that query look like?
Of course, I will run benchmarks but I just want to get some perspective on the possible range of solutions for this situation and what is generally most effective/efficient.

You should use a JOIN to get users matching your two criteria.
SELECT
users.user_id
FROM
users
INNER JOIN
groups
ON groups.user_id = users.user_id
AND groups.group_id = 'givengid'
WHERE
firstName LIKE '%foo'

You don't need to use either a UNION or a user-defined function here; instead, you can use a JOIN (which lets you join one table to another one based on a set of equivalent columns):
SELECT u.user_id
FROM users AS u
JOIN groups AS g
ON g.user_id = u.user_id
WHERE g.group_id = 'givengid'
AND u.firstName LIKE '%foo'
What this query does is join rows in the groups table to rows in the users table when the user_id is the same (so if you were to use SELECT *, you would end up with a long row containing the user data and the group data for that user). If multiple groups rows exist for the user, multiple rows will be retrieved before being filtered by the WHERE clause.

Use a join:
SELECT DISTINCT users.user_id
FROM users
INNER JOIN groups ON groups.user_id = users.user_id
WHERE users.firstName LIKE '%foo'
AND groups.group_id = '23'
The DISTINCT makes sure you don't have duplicate user IDs in the result.
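A quick check of when DISTINCT matters, using SQLite through Python's sqlite3 with invented rows (groups is quoted because GROUPS is a keyword in newer SQL dialects): if a user has the same group_id listed twice, the plain join returns them twice and DISTINCT collapses the duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, firstName TEXT);
    CREATE TABLE "groups" (user_id INTEGER, group_id TEXT);
    INSERT INTO users VALUES (1, 'Bigfoo'), (2, 'Bob'), (3, 'Kungfoo');
    -- user 1 is listed twice in group 23 (e.g. a duplicate membership row)
    INSERT INTO "groups" VALUES (1, '23'), (1, '23'), (2, '23'), (3, '7');
""")

QUERY = """
    SELECT {distinct} users.user_id
    FROM users
    INNER JOIN "groups" ON "groups".user_id = users.user_id
    WHERE users.firstName LIKE '%foo'
      AND "groups".group_id = '23'
    ORDER BY users.user_id
"""
plain = conn.execute(QUERY.format(distinct="")).fetchall()
deduped = conn.execute(QUERY.format(distinct="DISTINCT")).fetchall()
print(plain, deduped)  # [(1,), (1,)] [(1,)]
```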
