Before I go on, I don't want to use ANY query which involves selecting all rows and counting the occurrences manually. I'm doing this in PHP by the way.
Basically, I have a bans table. Each new record/row is a new ban. The field titled user_name signifies which player was banned. Is there a way to count who has the most bans in this table? I really don't want to select every row and then count it out for each player. The table is pretty big, which makes that approach impractical and inefficient.
This is done with COUNT() aggregates, grouping by user_name.
Get bans by user from most to least:
SELECT
user_name,
COUNT(*) as numbans
FROM bans
GROUP BY user_name
/* ordered by number of bans from greatest to least */
ORDER BY numbans DESC
Get only the most banned user:
SELECT
user_name,
COUNT(*) as numbans
FROM bans
GROUP BY user_name
ORDER BY numbans DESC
/* adjust LIMIT for how many records you want returned -- 1 gives only the first record */
LIMIT 1
In your question, you're very concerned about not counting manually in PHP. Note that counting in application code is almost never necessary. Relational databases are designed explicitly for organizing and querying data, and are very good at doing tasks like this efficiently. Read up on GROUP BY clauses and aggregate functions in MySQL and master them; you generally shouldn't need to do this in code.
Bonus: Get all users banned 3 or more times
SELECT
user_name,
COUNT(*) as numbans
FROM bans
GROUP BY user_name
/* HAVING clause limits results of an aggregate like COUNT() (which you cannot do in the WHERE clause) */
HAVING numbans >= 3
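To try the queries above without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PHP and MySQL (SQLite accepts the same GROUP BY, HAVING, ORDER BY and LIMIT syntax for these queries). The id column and the sample names are made up, and user_name is added to ORDER BY only so that the output order is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "bans" table.
# Only user_name comes from the question; id and the names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bans (id INTEGER PRIMARY KEY, user_name TEXT NOT NULL)")
sample = ["alice"] * 4 + ["bob"] * 3 + ["carol"] * 1
conn.executemany("INSERT INTO bans (user_name) VALUES (?)", [(n,) for n in sample])

# Bans per user, most to least; user_name breaks ties so the order is stable.
by_user = conn.execute("""
    SELECT user_name, COUNT(*) AS numbans
    FROM bans
    GROUP BY user_name
    ORDER BY numbans DESC, user_name
""").fetchall()

# Only the most banned user.
top = conn.execute("""
    SELECT user_name, COUNT(*) AS numbans
    FROM bans
    GROUP BY user_name
    ORDER BY numbans DESC, user_name
    LIMIT 1
""").fetchone()

# Users banned 3 or more times: HAVING filters on the aggregate.
repeat = conn.execute("""
    SELECT user_name, COUNT(*) AS numbans
    FROM bans
    GROUP BY user_name
    HAVING COUNT(*) >= 3
    ORDER BY user_name
""").fetchall()

print(by_user)  # [('alice', 4), ('bob', 3), ('carol', 1)]
print(top)      # ('alice', 4)
print(repeat)   # [('alice', 4), ('bob', 3)]
```

The database does all the counting; the application only receives one row per user.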
Get the top 10 most banned users (remove the LIMIT line if you want to see all of them in order):
select user_name, count(user_name) as num_of_bans
from bans
group by user_name
order by num_of_bans desc
limit 10
SELECT user_name , COUNT( id )
FROM table_name
GROUP BY user_name
Related
I'm working on a project and I have a problem. I have a table named friends with three columns: id, from_email and to_email (it's a social networking site, and from_email is the person who follows to_email). I want a query that returns the top 5 friends I follow, ranked by their number of followers. I know that the query for the overall top 5 is:
SELECT
to_email,
COUNT(*) AS friendsnumber
FROM
friends
GROUP BY
to_email
ORDER BY
friendsnumber DESC
LIMIT 5
Any ideas?
I would also like to return friends with the same number of followers ordered by their name. Is it possible?
You should use COUNT(from_email) instead of COUNT(*), because you want to calculate the number of followers, which is represented by from_email.
Thus, your select clause would be something like:
SELECT to_email, COUNT(from_email) as magnitude
As for getting the most popular people that you follow, you could use an IN clause:
WHERE to_email IN (SELECT to_email FROM friends WHERE from_email='MY_EMAIL');
As for the name, you would join this query with the other table, which contains the name value.
Since you've got the essentials now, I hope you can try to compose the full query on your own =)
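One way the pieces above might be assembled, sketched with Python's sqlite3 module standing in for MySQL. The users(email, name) table is a hypothetical stand-in for "the other table which contains the name value", and the e-mail addresses are made up. Ties on the follower count are broken by name, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# friends(id, from_email, to_email) is from the question; users(email, name)
# is a hypothetical table holding the display names.
conn.executescript("""
CREATE TABLE friends (id INTEGER PRIMARY KEY, from_email TEXT, to_email TEXT);
CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT);
""")
follows = [
    ("me@x", "a@x"), ("me@x", "b@x"), ("me@x", "c@x"),  # people I follow
    ("p@x", "a@x"), ("q@x", "a@x"),                     # a@x: 3 followers
    ("p@x", "b@x"),                                     # b@x: 2 followers
    ("q@x", "c@x"),                                     # c@x: 2 followers
    ("p@x", "d@x"), ("q@x", "d@x"), ("r@x", "d@x"), ("s@x", "d@x"),  # d@x: not followed by me
]
conn.executemany("INSERT INTO friends (from_email, to_email) VALUES (?, ?)", follows)
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("a@x", "Alice"), ("b@x", "Zed"), ("c@x", "Bob"), ("d@x", "Dan")])

top5 = conn.execute("""
    SELECT f.to_email, u.name, COUNT(f.from_email) AS magnitude
    FROM friends f
    JOIN users u ON u.email = f.to_email
    WHERE f.to_email IN (SELECT to_email FROM friends WHERE from_email = ?)
    GROUP BY f.to_email, u.name
    ORDER BY magnitude DESC, u.name   -- equal follower counts sorted by name
    LIMIT 5
""", ("me@x",)).fetchall()
print(top5)  # [('a@x', 'Alice', 3), ('c@x', 'Bob', 2), ('b@x', 'Zed', 2)]
```

d@x has the most followers overall but is excluded by the IN filter because it is not someone "me@x" follows.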
Join the table to itself again for the second-tier count:
SELECT f1.to_email
FROM friends f1
JOIN friends f2 on f2.to_email = f1.to_email
WHERE f1.from_email = 'myemail'
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 5
If an index is defined on to_email, this will perform very well.
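A minimal sketch of the self-join version, again using Python's sqlite3 module in place of MySQL, with made-up rows. The index name idx_friends_to_email is hypothetical; ORDER BY adds f1.to_email as a tie-breaker so the output is deterministic, and GROUP BY names the column instead of the positional GROUP BY 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (id INTEGER PRIMARY KEY, from_email TEXT, to_email TEXT)")
# The index the answer recommends (hypothetical name).
conn.execute("CREATE INDEX idx_friends_to_email ON friends (to_email)")
follows = [
    ("me@x", "a@x"), ("me@x", "b@x"), ("me@x", "c@x"),
    ("p@x", "a@x"), ("q@x", "a@x"), ("p@x", "b@x"), ("q@x", "c@x"),
    ("p@x", "d@x"), ("q@x", "d@x"), ("r@x", "d@x"), ("s@x", "d@x"),
]
conn.executemany("INSERT INTO friends (from_email, to_email) VALUES (?, ?)", follows)

top5 = conn.execute("""
    SELECT f1.to_email, COUNT(*) AS followers
    FROM friends f1                              -- people I follow
    JOIN friends f2 ON f2.to_email = f1.to_email -- everyone who follows them
    WHERE f1.from_email = ?
    GROUP BY f1.to_email
    ORDER BY followers DESC, f1.to_email
    LIMIT 5
""", ("me@x",)).fetchall()
print(top5)  # [('a@x', 3), ('b@x', 2), ('c@x', 2)]
```

Each row of f1 (a person I follow) is paired with every f2 row pointing at that person, so COUNT(*) per group is that person's follower count.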
I have a businesses table. I want to get the name of the owner who has the most businesses. So far I only know that I need to use GROUP BY and HAVING.
The problem is I only know the most basic queries...
Maybe something like this can help:
select owner, count(*) cntx
from businesses
group by owner
order by cntx desc
limit 1
Or execute the query without the LIMIT 1 clause, and then iterate over the results until your needs are satisfied.
Use GROUP BY, order descending, and then take the top record, which is the owner with the most businesses:
select OwnerId, count(*) from businesses
group by OwnerId order by count(*) desc
limit 1
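Note that LIMIT 1 returns only one owner even when several are tied for the most businesses. Since the question mentions HAVING, here is a sketch (Python's sqlite3 standing in for MySQL, made-up rows) that compares each owner's count against the largest count and so keeps every tied owner. The nested derived table is given an alias because MySQL requires one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "owner" follows the first answer; the sample rows are made up.
conn.execute("CREATE TABLE businesses (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany("INSERT INTO businesses (owner) VALUES (?)",
                 [("ann",), ("ann",), ("ben",), ("ben",), ("cy",)])

# LIMIT 1 returns a single row even though ann and ben are tied.
one = conn.execute("""
    SELECT owner, COUNT(*) AS cntx FROM businesses
    GROUP BY owner ORDER BY cntx DESC LIMIT 1
""").fetchall()

# HAVING against the largest per-owner count keeps every tied owner.
all_top = conn.execute("""
    SELECT owner, COUNT(*) AS cntx
    FROM businesses
    GROUP BY owner
    HAVING COUNT(*) = (
        SELECT MAX(cnt)
        FROM (SELECT COUNT(*) AS cnt FROM businesses GROUP BY owner) AS counts
    )
    ORDER BY owner
""").fetchall()
print(len(one))  # 1
print(all_top)   # [('ann', 2), ('ben', 2)]
```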
I have got this query:
SELECT
t.type_id, t.product_id, u.account_id, t.name, u.username
FROM
types AS t
INNER JOIN
( SELECT user_id, username, account_id
FROM users WHERE account_id=$account_id ) AS u
ON
t.user_id = u.user_id
ORDER BY
t.type_id DESC
1st question:
It takes around 30 seconds to do this at the moment, with only 18k records in the types table.
The only indexes at the moment are the primary keys on id.
Is the long run time caused by missing indexes, or is it more to do with the structure of this query?
2nd question:
How can I add the LIMIT so I only get 100 records with the highest type_id?
Without changing the results, I think it will be a lot faster if you don't make a sub-select of your users table. It is not needed at all in this case.
You can just add LIMIT 100 to get only the first 100 results (or fewer if there aren't 100).
SELECT SQL_CALC_FOUND_ROWS /* Calculate the total number of rows, without the LIMIT */
t.type_id, t.product_id, u.account_id, t.name, u.username
FROM
types t
INNER JOIN users u ON u.user_id = t.user_id
WHERE
u.account_id = $account_id
ORDER BY
t.type_id DESC
LIMIT 100
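Then execute a second query to get the calculated total number of rows. (SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17; the MySQL manual suggests a separate COUNT(*) query without LIMIT instead.)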
SELECT FOUND_ROWS()
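A sketch of the rewritten query with LIMIT 100, plus a separate COUNT(*) query for the total, using Python's sqlite3 module in place of MySQL (SQLite has no SQL_CALC_FOUND_ROWS). The schema and data are made up from the column names in the question, and the account id is passed as a bound parameter rather than interpolated into the SQL string as $account_id is; that last part is a design choice, not something the answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, account_id INTEGER);
CREATE TABLE types (type_id INTEGER PRIMARY KEY, product_id INTEGER,
                    user_id INTEGER, name TEXT);
""")
# Made-up data: users 1 and 2 belong to account 7, user 3 to account 8.
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "u1", 7), (2, "u2", 7), (3, "u3", 8)])
conn.executemany("INSERT INTO types VALUES (?, ?, ?, ?)",
                 [(i, i * 10, i % 3 + 1, f"type{i}") for i in range(1, 251)])

account_id = 7  # bound as a parameter instead of interpolating $account_id

# Plain join instead of the sub-select, highest 100 type_ids first.
page = conn.execute("""
    SELECT t.type_id, t.product_id, u.account_id, t.name, u.username
    FROM types t
    INNER JOIN users u ON u.user_id = t.user_id
    WHERE u.account_id = ?
    ORDER BY t.type_id DESC
    LIMIT 100
""", (account_id,)).fetchall()

# Separate COUNT(*) for the total, in place of SQL_CALC_FOUND_ROWS / FOUND_ROWS().
total = conn.execute("""
    SELECT COUNT(*)
    FROM types t
    INNER JOIN users u ON u.user_id = t.user_id
    WHERE u.account_id = ?
""", (account_id,)).fetchone()[0]

print(len(page), total, page[0][0])  # 100 167 250
```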
That subselect is going to slow down your query in MySQL. I'm assuming that this
SELECT user_id, username, account_id
FROM users WHERE account_id=$account_id
doesn't return many rows at all. If that's the case, then the subselect alone won't explain the delay you're seeing.
Try adding an index on user_id in your types table. Without it, you're doing a full table scan of 18k records for each record returned by that subselect.
Inner join the users table directly, add that index, and I bet you'll see a huge increase in speed.
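A minimal sketch of adding that index and asking the planner how it would run the join, with Python's sqlite3 module standing in for MySQL (in MySQL you would run EXPLAIN on the query instead). The index name idx_types_user_id is hypothetical, and the plan text differs between SQLite versions, so treat the printed output as illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, account_id INTEGER);
CREATE TABLE types (type_id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
-- The suggested index on the join column of types (hypothetical name).
CREATE INDEX idx_types_user_id ON types (user_id);
""")

# EXPLAIN QUERY PLAN is SQLite's counterpart of MySQL's EXPLAIN: it reports
# whether each table is scanned in full or searched through an index.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT t.type_id, u.username
    FROM users u
    INNER JOIN types t ON t.user_id = u.user_id
    WHERE u.account_id = ?
""", (7,)).fetchall()
for row in plan:
    print(row[-1])  # the human-readable "detail" column of each plan step
```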
I'm a MySQL query newbie, so I'm sure this is a question with an obvious answer.
But I was looking at these two queries. Will they return different result sets? I understand that the sorting process would proceed differently, but I believe they will return the same results, with the first query being slightly more efficient?
Query 1: HAVING, then AND
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
AND user_id NOT IN (SELECT user_id FROM banned_users)
Query 2: WHERE, then HAVING
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN(SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100
Actually, the first query will be less efficient: HAVING is applied after GROUP BY, so the banned users' posts are grouped and counted before they are excluded.
UPDATE
Some pseudo code to illustrate how your queries are executed ([very] simplified version).
First query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Group, count, etc.
4. Exclude records from the first result set if they are present in the second
Second query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Exclude records from the first result set if they are present in the second
4. Group, count, etc.
The order of steps 1 and 2 is not important; MySQL can choose whichever it thinks is better. The important difference is in steps 3 and 4: HAVING is applied after GROUP BY. Grouping is usually more expensive than joining (excluding records can be considered a join operation in this case), so the fewer records there are to group, the better the performance.
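The pseudo code above can be mimicked in plain Python to see why the order of steps 3 and 4 matters. This toy sketch (made-up post counts, collections.Counter standing in for GROUP BY) is not how MySQL actually executes anything; it only shows that both orders keep the same users while the first one has to group more rows.

```python
from collections import Counter

# Toy data: one entry per forum post, holding the poster's user_id.
posts = [1] * 120 + [2] * 150 + [3] * 50 + [4] * 100
banned = {2}
THRESHOLD = 100

def group_then_exclude(posts, banned):
    """First query: group and count every post, then drop banned users."""
    counts = Counter(posts)  # groups all rows
    kept = {u for u, n in counts.items() if n >= THRESHOLD and u not in banned}
    return kept, len(posts)

def exclude_then_group(posts, banned):
    """Second query: drop banned users' posts first, then group and count."""
    remaining = [u for u in posts if u not in banned]
    counts = Counter(remaining)  # groups only the remaining rows
    kept = {u for u, n in counts.items() if n >= THRESHOLD}
    return kept, len(remaining)

r1, grouped1 = group_then_exclude(posts, banned)
r2, grouped2 = exclude_then_group(posts, banned)
print(sorted(r1), grouped1)  # [1, 4] 420
print(sorted(r2), grouped2)  # [1, 4] 270
```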
You already have answers saying that the two queries will return the same results, and various opinions on which one is more efficient.
My opinion is that there will be a difference in efficiency (speed) only if the optimizer produces different plans for the two queries. I think that in the latest MySQL versions the optimizer is smart enough to find the same plan for either query, so there will be no difference at all, but of course one can test this, either by viewing the execution plans with EXPLAIN or by running the two queries against some test tables.
I would use the second version in any case, just to be safe.
Let me add that:
COUNT(*) is usually more efficient than COUNT(notNullableField) in MySQL. Until that is fixed in future MySQL versions, use COUNT(*) where applicable.
Therefore, you can also use:
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN
( SELECT user_id FROM banned_users )
GROUP BY user_id
HAVING COUNT(*) >= 100
There are also other ways to get the same intermediate result as NOT IN before applying GROUP BY.
Using LEFT JOIN / NULL :
SELECT fp.user_id
FROM forum_posts AS fp
LEFT JOIN banned_users AS bu
ON bu.user_id = fp.user_id
WHERE bu.user_id IS NULL
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Using NOT EXISTS :
SELECT fp.user_id
FROM forum_posts AS fp
WHERE NOT EXISTS
( SELECT *
FROM banned_users AS bu
WHERE bu.user_id = fp.user_id
)
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Which of the three methods is faster depends on your table sizes and many other factors, so it is best to test with your data.
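A small sketch for checking that the forms above agree, using Python's sqlite3 module instead of MySQL and made-up post counts (ORDER BY is added only for deterministic output). It also shows a caveat worth knowing when comparing them: if banned_users.user_id ever contains a NULL, NOT IN matches nothing, while LEFT JOIN / IS NULL and NOT EXISTS are unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE banned_users (user_id INTEGER);
""")
# Made-up data: users 1..4 with 120, 150, 50 and 100 posts; user 2 is banned.
for user_id, n in [(1, 120), (2, 150), (3, 50), (4, 100)]:
    conn.executemany("INSERT INTO forum_posts (user_id) VALUES (?)", [(user_id,)] * n)
conn.execute("INSERT INTO banned_users VALUES (2)")

QUERIES = {
    "having_not_in": """
        SELECT user_id FROM forum_posts GROUP BY user_id
        HAVING COUNT(id) >= 100
           AND user_id NOT IN (SELECT user_id FROM banned_users)
        ORDER BY user_id""",
    "where_not_in": """
        SELECT user_id FROM forum_posts
        WHERE user_id NOT IN (SELECT user_id FROM banned_users)
        GROUP BY user_id HAVING COUNT(*) >= 100
        ORDER BY user_id""",
    "left_join_null": """
        SELECT fp.user_id FROM forum_posts AS fp
        LEFT JOIN banned_users AS bu ON bu.user_id = fp.user_id
        WHERE bu.user_id IS NULL
        GROUP BY fp.user_id HAVING COUNT(*) >= 100
        ORDER BY fp.user_id""",
    "not_exists": """
        SELECT fp.user_id FROM forum_posts AS fp
        WHERE NOT EXISTS (SELECT * FROM banned_users AS bu
                          WHERE bu.user_id = fp.user_id)
        GROUP BY fp.user_id HAVING COUNT(*) >= 100
        ORDER BY fp.user_id""",
}

def run_all():
    """Run every query form and return {name: [user_id, ...]}."""
    return {name: [r[0] for r in conn.execute(sql)] for name, sql in QUERIES.items()}

before = run_all()
print(before)  # every query returns [1, 4]

# Caveat: one NULL in banned_users makes "x NOT IN (...)" NULL for every
# non-banned user, so both NOT IN forms return nothing.
conn.execute("INSERT INTO banned_users VALUES (NULL)")
after = run_all()
print(after)   # NOT IN forms: [], LEFT JOIN and NOT EXISTS: still [1, 4]
```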
HAVING conditions are applied to the grouped results, and since you group by user_id, every possible user_id value will be present in the grouped result, so where you place the user_id condition does not matter.
To me, the second query is more efficient because it reduces the number of records for GROUP BY and HAVING.
Alternatively, you may try the following query to avoid using IN:
SELECT `fp`.`user_id`
FROM `forum_posts` `fp`
LEFT JOIN `banned_users` `bu` ON `fp`.`user_id` = `bu`.`user_id`
WHERE `bu`.`user_id` IS NULL
GROUP BY `fp`.`user_id`
HAVING COUNT(`fp`.`id`) >= 100
Hope this helps.
No, it does not give the same results, because the first query filters records by the COUNT(id) condition, while the other query filters records first and then applies the HAVING clause.
The second query is correctly written.
Here is my data structure
(Screenshot of the table structure: http://luvboy.co.cc/images/db.JPG)
When I try this SQL:
select rec_id, customer_id, dc_number, balance
from payments
where customer_id='IHS050018'
group by dc_number
order by rec_id desc;
something is wrong somewhere, but I don't know what.
I need:
rec_id  customer_id  dc_number  balance
2       IHS050018    DC3        -1
3       IHS050018    52         600
I want the most recent balance of the customer for each dc_number.
Thanks
There are essentially two ways to get this:
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id = ( /* = instead of IN: MySQL does not support LIMIT inside an IN (...) subquery */
select s.rec_id
from payments s
where s.customer_id='IHS050018' and s.dc_number = p.dc_number
order by s.rec_id desc
limit 1);
Also, if you want to get the last balance for each customer, you might do:
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id = ( /* scalar comparison; MySQL rejects LIMIT inside an IN (...) subquery */
select s.rec_id
from payments s
where s.customer_id=p.customer_id and s.dc_number = p.dc_number
order by s.rec_id desc
limit 1);
What I consider essentially another way is to use the fact that selecting rec_id with ORDER BY ... DESC and LIMIT 1 is equivalent to selecting MAX(rec_id) with the appropriate GROUP BY. In full:
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id IN (
select max(s.rec_id)
from payments s
group by s.customer_id, s.dc_number
);
This should be faster (if you want the last balance for every customer), since MAX is normally less expensive than a sort (with indexes it might be the same).
Also when written like this the subquery is not correlated (it need not be run for every row of the outer query) which means it will be run only once and the whole query can be rewritten as a join.
Also notice that it might be beneficial to write it as correlated query (by adding where s.customer_id = p.customer_id and s.dc_number = p.dc_number in inner query) depending on the selectivity of the outer query.
This might improve performance, if you look for the last balance of only one or few rows.
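A sketch of the correlated and the MAX() forms, using Python's sqlite3 module in place of MySQL. The rows are made up (the real table was only shown in a screenshot) and chosen so that the result matches the output the question asks for. The correlated subquery is compared with = rather than IN, since MySQL rejects LIMIT inside an IN (...) subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payments (rec_id INTEGER PRIMARY KEY,
                customer_id TEXT, dc_number TEXT, balance INTEGER)""")
# Made-up rows; OTHER0001 is a hypothetical second customer.
conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", [
    (1, "IHS050018", "DC3", 500),
    (2, "IHS050018", "DC3", -1),
    (3, "IHS050018", "52", 600),
    (4, "OTHER0001", "DC3", 75),
])

# Correlated form: latest rec_id per dc_number for one customer.
one_customer = conn.execute("""
    SELECT p.rec_id, p.customer_id, p.dc_number, p.balance
    FROM payments p
    WHERE p.rec_id = (
        SELECT s.rec_id FROM payments s
        WHERE s.customer_id = ? AND s.dc_number = p.dc_number
        ORDER BY s.rec_id DESC
        LIMIT 1)
    ORDER BY p.rec_id
""", ("IHS050018",)).fetchall()

# Uncorrelated MAX() form: latest row for every customer/dc_number pair.
every_customer = conn.execute("""
    SELECT p.rec_id, p.customer_id, p.dc_number, p.balance
    FROM payments p
    WHERE p.rec_id IN (
        SELECT MAX(s.rec_id) FROM payments s
        GROUP BY s.customer_id, s.dc_number)
    ORDER BY p.rec_id
""").fetchall()

print(one_customer)    # [(2, 'IHS050018', 'DC3', -1), (3, 'IHS050018', '52', 600)]
print(every_customer)  # adds (4, 'OTHER0001', 'DC3', 75)
```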
I don't think there is a good way to do this in SQL without having window functions (like those in Postgres 8.4). You probably have to iterate over the dataset in your code and get the recent balances that way.
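For comparison, a sketch of the window-function approach this answer mentions, with Python's sqlite3 module standing in for the database. It assumes a database with window functions (SQLite 3.25+; MySQL 8.0+ and PostgreSQL 8.4+ also have them) and uses made-up rows chosen to match the question's expected output. ROW_NUMBER() numbers each customer/dc_number group from the newest rec_id down, and the outer query keeps row 1.

```python
import sqlite3

if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("window functions need SQLite 3.25 or newer")

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payments (rec_id INTEGER PRIMARY KEY,
                customer_id TEXT, dc_number TEXT, balance INTEGER)""")
# Made-up rows; only the expected output in the question is known.
conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", [
    (1, "IHS050018", "DC3", 500),
    (2, "IHS050018", "DC3", -1),
    (3, "IHS050018", "52", 600),
])

latest = conn.execute("""
    SELECT rec_id, customer_id, dc_number, balance
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY customer_id, dc_number
                                  ORDER BY rec_id DESC) AS rn
        FROM payments p
        WHERE customer_id = ?
    ) AS ranked
    WHERE rn = 1          -- newest row of each customer/dc_number group
    ORDER BY rec_id
""", ("IHS050018",)).fetchall()
print(latest)  # [(2, 'IHS050018', 'DC3', -1), (3, 'IHS050018', '52', 600)]
```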
ORDER comes before GROUP:
select rec_id, customer_id, dc_number, balance
from payments
where customer_id='IHS050018'
order by rec_id desc
group by dc_number