SQL: Inner join on a Max Times Occurred Statement - mysql

Posed a question: What school was attended most frequently?
I came up with the following statement...
select unitid, count(unitid) as 'Times_Occurred'
from people
group by unitid
order by count(*) desc
limit 50;
I set the limit to 50 because multiple unitids had a count of 3. I have two tables, a people table and a post table, connected by unitid. I am trying to figure out how to use an inner join to get not only the top 50 unitids and how often they occur, but also the colleges that go with those IDs.
Any and all help is much appreciated!!

Perhaps your counts are being inflated and you don't know how to handle that. This is one approach using an inline view. It works because the counts are calculated and retained before the join, so the 1-M cardinality doesn't affect the counts.
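Select *
from (
select unitid, count(unitid) as 'Times_Occurred'
from people
group by unitid
order by count(*) desc
limit 50) A
INNER JOIN Post B
on A.UnitID = B.UnitID
-- the inner ORDER BY only picks the top 50; re-apply the ordering here
order by A.Times_Occurred desc;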
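To see why counting inside the inline view matters, here is a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL and uses a hypothetical post table in which unitid 1 appears in two rows. The person_id column and all row values are illustrative only.

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER PRIMARY KEY, unitid INTEGER);
CREATE TABLE post (unitid INTEGER, college TEXT);
INSERT INTO people (unitid) VALUES (1), (1), (1), (2), (2), (3);
-- unitid 1 has two post rows (the "many" side of a 1-M relationship)
INSERT INTO post VALUES (1, 'College A'), (1, 'College A'),
                        (2, 'College B'), (3, 'College C');
""")

# Join first, then count: every people row is repeated once per matching post row.
inflated = conn.execute("""
    SELECT p.unitid, COUNT(*) AS times_occurred
    FROM people p JOIN post b ON p.unitid = b.unitid
    GROUP BY p.unitid
    ORDER BY p.unitid
""").fetchall()

# Count first in an inline view, then join: the counts are fixed before the join.
# The outer ORDER BY is needed because a subquery's ORDER BY is not guaranteed
# to survive into the outer result.
counted_first = conn.execute("""
    SELECT a.unitid, a.times_occurred, b.college
    FROM (
        SELECT unitid, COUNT(*) AS times_occurred
        FROM people
        GROUP BY unitid
        ORDER BY COUNT(*) DESC
        LIMIT 50
    ) a
    JOIN post b ON a.unitid = b.unitid
    ORDER BY a.times_occurred DESC, a.unitid
""").fetchall()

print(inflated)       # unitid 1 is counted 6 times instead of 3
print(counted_first)  # unitid 1 keeps its count of 3 (on two rows)
```

The join still repeats unitid 1 once per post row, but its count stays 3 because it was computed before the join.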

Related

MySQL Spring complicated query - ways to order and query efficiency

I run this complicated query through a Spring JPA repository.
My goal is to get all info from the site table, ordering it by events severity on each site.
This is my query:
SELECT alls.* FROM sites AS alls JOIN
(
SELECT distinct ets.id FROM
(
SELECT s.id, et.`type`, et.severity_level, COUNT(et.`type`) FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
ORDER BY et.severity_level, COUNT(et.`type`) DESC
) AS ets
) as etsd ON alls.id = etsd.id
The second select (the one with "distinct") returns site_ids ordered correctly by severity.
Note that each site has several event_type + severity combinations, and I paginate the results, so I need the DISTINCT.
The problem is - the main select doesn't keep this order.
Is there any way to keep the order in one complicated query?
Another related question - one of my ideas was making two queries:
The "select distinct" query that will return me the order --> saved in a list "order list"
The main "sites" query (which becomes very simple) with WHERE id IN ("order list")
Order the second query in code by "order list".
I run the query every 10 seconds, so it is very performance-sensitive.
Which is likely to be faster here: the original complicated query or those two?
Any insight will be appreciated.
Thanks a lot.
A quirk of SQL's declarative, set-oriented syntax for us procedural programmers: ORDER BY clauses in subqueries are not carried through to the outer query, except sometimes by accident. If you want ordering at any query level, you must specify it at that level, or you will get unpredictable results. Query optimizers are usually smart enough to avoid wasting sort operations.
Your requirement: give at most one sites row for each sites.id value, ordered by the worst event. Worst: lowest event severity, and if there are more than one event with lowest severity, the largest count.
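Use this sort of thing to get the "worst" for each id, in place of DISTINCT. Note that MIN and MAX are computed independently here, so num is the largest count among all of the site's event types, not necessarily one at the lowest severity.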
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query */
) ets
GROUP BY id
This gives at most one row per sites.id value. Then your outer query is
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query */
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
Putting it all together:
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
SELECT s.id, et.severity_level, COUNT(et.`type`) num
FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
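The following is a minimal sketch of the same three-level shape. It assumes SQLite (through Python's sqlite3 module) in place of MySQL, and it collapses the users/areas/panels/event_types joins into one hypothetical site_events table so the grouping and the outer ORDER BY are easy to follow. Site names and events are made up.

```python
import sqlite3

# SQLite stands in for MySQL. The users/areas/panels joins are collapsed into a
# single hypothetical site_events table so the grouping logic is easy to follow.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE site_events (site_id INTEGER, type TEXT, severity_level INTEGER);
INSERT INTO sites VALUES (1, 'North'), (2, 'South'), (3, 'East');
INSERT INTO site_events VALUES
  (1, 'door', 2), (1, 'door', 2), (1, 'door', 2),
  (2, 'fire', 1),
  (2, 'door', 2), (2, 'door', 2), (2, 'door', 2), (2, 'door', 2), (2, 'door', 2),
  (3, 'fire', 1), (3, 'fire', 1);
""")

rows = conn.execute("""
    SELECT alls.id, alls.name, worstevents.severity_level, worstevents.num
    FROM sites alls
    JOIN (
        -- one row per site: lowest severity and largest per-type count
        SELECT id, MIN(severity_level) AS severity_level, MAX(num) AS num
        FROM (
            -- one row per site, event type and severity, with its count
            SELECT site_id AS id, severity_level, COUNT(type) AS num
            FROM site_events
            GROUP BY site_id, type, severity_level
        ) ets
        GROUP BY id
    ) worstevents ON alls.id = worstevents.id
    -- the ordering is stated in the outermost query, where it is guaranteed
    ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
""").fetchall()

for row in rows:
    print(row)
```

With this data, site 2 sorts ahead of site 3 because MAX(num) picks up its five severity-2 events, even though it has only one severity-1 event. If the tie-break should count only the lowest-severity events, num would have to be computed separately for that severity.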
An index on users.user_id will help performance for these single-user queries.
If you still have performance trouble, please read this and ask another question.

How to improve slow MySQL subquery

I know there are many questions/answers about slow queries, but I'm struggling to relate an existing answer to my example.
I have the following simple query which counts article views in a subquery:
SELECT
articles.id,
articles.views,
articles.title,
articles.slug,
articles.created_at,
(SELECT count(*) FROM tracking WHERE element_id = articles.id AND tracking_type = 'article_view') AS tracking_views
FROM articles
WHERE articles.company_id = 123
ORDER BY articles.created_at DESC
This particular company has ~250 articles, and the query takes over 12 seconds.
Is there a better/more efficient way I could be doing this?
Try joining to a GROUP BY. It's pretty hard to say without knowing how many articles, views and companies there are, though.
What you want is for SQL to do the aggregation of tracking in one go, rather than once for every row in the result, which is what the position of your tracking_views sub-select implies.
If you're lucky (I didn't check), the optimizer will be smart enough to skip counting articles that are not for the right company. If not, you can add the company filter (a join back to articles) inside the counts sub-select.
For example:
select a.*, coalesce(counts.tracking_views, 0) as tracking_views
from articles a
left join (
select element_id, count(*) as tracking_views
from tracking
where tracking_type = 'article_view'
group by element_id
) as counts on counts.element_id = a.id
where a.company_id = 123
order by a.created_at desc
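Here is a quick way to check that the grouped-join shape returns the same numbers as the correlated subquery. This sketch assumes SQLite (through Python's sqlite3 module) in place of MySQL, with the articles and tracking tables trimmed to the columns the query needs and filled with hypothetical rows. It checks results only. It says nothing about MySQL timings. A LEFT JOIN with COALESCE is used so that an article with no views keeps a count of 0, as in the original subquery. A plain inner join would drop it.

```python
import sqlite3

# SQLite stands in for MySQL; columns are trimmed to what the query needs and
# the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, company_id INTEGER, created_at TEXT);
CREATE TABLE tracking (element_id INTEGER, tracking_type TEXT);
INSERT INTO articles VALUES (1, 123, '2024-01-01'), (2, 123, '2024-02-01'),
                            (3, 999, '2024-03-01');
INSERT INTO tracking VALUES (1, 'article_view'), (1, 'article_view'),
                            (2, 'article_like'), (3, 'article_view');
""")

# Original shape: the subquery is evaluated per article row.
correlated = conn.execute("""
    SELECT a.id,
           (SELECT COUNT(*) FROM tracking
            WHERE element_id = a.id AND tracking_type = 'article_view') AS tracking_views
    FROM articles a
    WHERE a.company_id = 123
    ORDER BY a.created_at DESC
""").fetchall()

# Rewritten shape: aggregate tracking once, then join.
# LEFT JOIN + COALESCE keeps articles that have no views (count 0).
joined = conn.execute("""
    SELECT a.id, COALESCE(counts.tracking_views, 0) AS tracking_views
    FROM articles a
    LEFT JOIN (
        SELECT element_id, COUNT(*) AS tracking_views
        FROM tracking
        WHERE tracking_type = 'article_view'
        GROUP BY element_id
    ) counts ON counts.element_id = a.id
    WHERE a.company_id = 123
    ORDER BY a.created_at DESC
""").fetchall()

print(correlated)  # [(2, 0), (1, 2)]
print(joined)      # same rows as above
```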

MySQL - Get row and average of rows

First of all I'll just warn everyone that I'm something of a rookie with MySQL. Additionally I haven't tested the example queries below so they might not be perfect.
Anyway, I have a table of items, each one with a name, a category and a score. Every 12 hours the top item is taken, used and then removed.
So far I've simply been grabbing the top item with
SELECT * FROM items_table ORDER BY score DESC LIMIT 1
The only issue with this is that some categories are biased and have generally higher scores. I'd like to solve this by sorting by the score divided by the average score instead of simply sorting by the score. Something like
ORDER BY score/(GREATEST(5,averageScore))
I'm now trying to work out the best way to find averageScore. I have another table for categories so obviously I could add an averageScore column to that and run a cronjob to keep them updated and retrieve them with something like
SELECT * FROM items_table, categories_table WHERE items_table.category = categories_table.category ORDER BY items_table.score/(GREATEST(5,categories_table.averageScore)) DESC LIMIT 1
but this feels messy. I know I can find all the averages using something like
SELECT AVG(score) FROM items_table GROUP BY category
What I'm wondering is if there's some way to retrieve the averages right in the one query.
Thanks,
YM
You can join the query that calculates the averages:
SELECT i.*
FROM items_table i JOIN (
SELECT category, AVG(score) AS averageScore
FROM items_table
GROUP BY category
) t USING (category)
ORDER BY i.score/GREATEST(5, t.averageScore) DESC
LIMIT 1
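A small sketch of the normalised ordering, assuming SQLite (through Python's sqlite3 module) in place of MySQL and a few hypothetical items. SQLite has no GREATEST(). Its two-argument scalar MAX(x, y) is used instead. SQLite also divides integers as integers, so the score is multiplied by 1.0 first. MySQL's / operator does not need that.

```python
import sqlite3

# SQLite stands in for MySQL. SQLite has no GREATEST(); its two-argument
# scalar MAX(x, y) plays the same role here. Rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items_table (name TEXT, category TEXT, score INTEGER);
INSERT INTO items_table VALUES
  ('a', 'cat1', 90), ('b', 'cat1', 80),
  ('c', 'cat2', 12), ('d', 'cat2', 4);
""")

# Plain ORDER BY score favours the high-scoring category.
by_score = conn.execute(
    "SELECT name FROM items_table ORDER BY score DESC LIMIT 1"
).fetchone()

# Normalise each score by its category average (floored at 5).
# Multiplying by 1.0 avoids SQLite's integer division.
normalised = conn.execute("""
    SELECT i.name, i.score * 1.0 / MAX(5, t.averageScore) AS ratio
    FROM items_table i JOIN (
        SELECT category, AVG(score) AS averageScore
        FROM items_table
        GROUP BY category
    ) t USING (category)
    ORDER BY ratio DESC
    LIMIT 1
""").fetchone()

print(by_score)    # ('a',)
print(normalised)  # ('c', 1.5)
```

Item c wins because 12 is 1.5 times its category's average of 8. Item a scores higher in absolute terms, but it is only slightly above its category's average of 85.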

SQL query is not retrieving all the fields

I have two tables in my database. The first one (participants) looks like this:
And I have another called votes, in which I can vote for any participant.
My problem is that I'm trying to get all the votes for each participant, but when I execute my query it only retrieves four rows, sorted by the COUNT of votes, and the remaining participants do not appear:
SELECT COUNT(DISTINCT `votes`.`id`) AS count_id, participants.name
AS participant_name FROM `participants` LEFT OUTER JOIN `votes` ON
`votes`.`participant_id` = `participants`.`id` GROUP BY votes.participant_id ORDER BY
votes.participant_id DESC;
Retrieves:
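I think the problem is that you're grouping by votes.participant_id rather than participants.id. After the outer join, every participant without votes has votes.participant_id = NULL, so all of them collapse into a single group. You get one row per voted-for participant plus at most one row for everyone else, the outer join notwithstanding. Check out http://sqlfiddle.com/#!2/c5d3d/5/0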
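The NULL-group effect is easy to reproduce. This sketch assumes SQLite (through Python's sqlite3 module) in place of MySQL, with hypothetical participants and votes: four participants, only two of whom have votes.

```python
import sqlite3

# SQLite stands in for MySQL; the participant names and votes are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE votes (id INTEGER PRIMARY KEY, participant_id INTEGER);
INSERT INTO participants VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
INSERT INTO votes (participant_id) VALUES (1), (1), (2);
""")

# Grouping by the outer-joined column: Cid and Dee both have
# votes.participant_id = NULL, so they fall into a single group.
by_vote_column = conn.execute("""
    SELECT COUNT(DISTINCT votes.id) AS count_id, participants.name
    FROM participants LEFT OUTER JOIN votes
      ON votes.participant_id = participants.id
    GROUP BY votes.participant_id
""").fetchall()

# Grouping by the participants key keeps one row per participant.
by_participant = conn.execute("""
    SELECT participants.name AS participant_name, COUNT(votes.id) AS count_id
    FROM participants LEFT OUTER JOIN votes
      ON votes.participant_id = participants.id
    GROUP BY participants.id, participants.name
    ORDER BY count_id DESC, participants.id
""").fetchall()

print(len(by_vote_column))  # 3 rows for 4 participants
print(by_participant)       # [('Ann', 2), ('Bob', 1), ('Cid', 0), ('Dee', 0)]
```

COUNT(votes.id) counts only non-NULL vote ids, which is why participants without votes show 0 instead of 1.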
From your query, I understand that you are counting distinct ids from the votes table, and I assume that your id column is not an identity (auto-increment) column. It would be better if it were one. If so, here is my answer: replace your SELECT with this:
select count(votes.id) as count_id, participants.name as participant_name
from participants left join votes
on participants.id = votes.participant_id
group by participants.id, participants.name
order by count_id desc
Just let me know if it works.
Cheers

MySQL Group By and HAVING

I'm a MySQL query noobie so I'm sure this is a question with an obvious answer.
But, I was looking at these two queries. Will they return different result sets? I understand that the sorting process would commence differently, but I believe they will return the same results with the first query being slightly more efficient?
Query 1: HAVING, then AND
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
AND user_id NOT IN (SELECT user_id FROM banned_users)
Query 2: WHERE, then HAVING
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN(SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100
Actually the first query will be less efficient (HAVING applied after WHERE).
UPDATE
Some pseudo code to illustrate how your queries are executed ([very] simplified version).
First query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Group, count, etc.
4. Exclude records from the first result set if they are present in the second
Second query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Exclude records from the first result set if they are present in the second
4. Group, count, etc.
The order of steps 1 and 2 is not important; MySQL can choose whichever it thinks is better. The important difference is in steps 3 and 4: HAVING is applied after GROUP BY. Grouping is usually more expensive than joining (excluding records can be considered a join operation in this case), so the fewer records MySQL has to group, the better the performance.
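A quick check of both claims, assuming SQLite (through Python's sqlite3 module) in place of MySQL and hypothetical post counts per user. The two queries return the same users. In the logical order sketched above, the second query feeds fewer rows into GROUP BY. A real optimizer may or may not push the filter down on its own, so EXPLAIN on your MySQL version is the authority there.

```python
import sqlite3

# SQLite stands in for MySQL; post counts per user are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE banned_users (user_id INTEGER);
INSERT INTO banned_users VALUES (2);
""")
posts_per_user = {1: 120, 2: 100, 3: 99, 4: 150}
conn.executemany(
    "INSERT INTO forum_posts (user_id) VALUES (?)",
    [(user,) for user, n in posts_per_user.items() for _ in range(n)],
)

query1 = """
    SELECT user_id FROM forum_posts
    GROUP BY user_id
    HAVING COUNT(id) >= 100
       AND user_id NOT IN (SELECT user_id FROM banned_users)
    ORDER BY user_id
"""
query2 = """
    SELECT user_id FROM forum_posts
    WHERE user_id NOT IN (SELECT user_id FROM banned_users)
    GROUP BY user_id
    HAVING COUNT(id) >= 100
    ORDER BY user_id
"""
result1 = conn.execute(query1).fetchall()
result2 = conn.execute(query2).fetchall()
print(result1, result2)  # [(1,), (4,)] [(1,), (4,)]

# Rows the GROUP BY would process in each case (logical order):
# query 1 groups all posts, query 2 only the posts left after WHERE.
total = conn.execute("SELECT COUNT(*) FROM forum_posts").fetchone()[0]
after_where = conn.execute(
    "SELECT COUNT(*) FROM forum_posts "
    "WHERE user_id NOT IN (SELECT user_id FROM banned_users)"
).fetchone()[0]
print(total, after_where)  # 469 369
```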
You have already answers that the two queries will show same results and various opinions for which one is more efficient.
My opinion is that there will be a difference in efficiency (speed) only if the optimizer produces different plans for the two queries. I think the optimizers in recent MySQL versions are smart enough to find the same plan for either query, so there will be no difference at all. But of course you can check, either by comparing the execution plans with EXPLAIN or by running the two queries against some test tables.
I would use the second version in any case, just to play safe.
Let me add that:
COUNT(*) is usually more efficient than COUNT(notNullableField) in MySQL. Until that is fixed in future MySQL versions, use COUNT(*) where applicable.
Therefore, you can also use:
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN
( SELECT user_id FROM banned_users )
GROUP BY user_id
HAVING COUNT(*) >= 100
There are also other ways to get the same result as NOT IN before applying GROUP BY.
Using LEFT JOIN / IS NULL:
SELECT fp.user_id
FROM forum_posts AS fp
LEFT JOIN banned_users AS bu
ON bu.user_id = fp.user_id
WHERE bu.user_id IS NULL
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Using NOT EXISTS:
SELECT fp.user_id
FROM forum_posts AS fp
WHERE NOT EXISTS
( SELECT *
FROM banned_users AS bu
WHERE bu.user_id = fp.user_id
)
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Which of the 3 methods is faster depends on your table sizes and a lot of other factors, so best is to test with your data.
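Before timing the three forms, it helps to confirm that they return the same rows on your data. Here is a minimal sketch of such a check, assuming SQLite (through Python's sqlite3 module) in place of MySQL and hypothetical post counts. It also shows the one case where they differ. If banned_users.user_id ever contains NULL, then `x NOT IN (...)` evaluates to unknown for every unmatched x under SQL's three-valued logic, so the NOT IN form returns no rows while the other two are unaffected.

```python
import sqlite3

# SQLite stands in for MySQL; post counts are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE banned_users (user_id INTEGER);
INSERT INTO banned_users VALUES (2);
""")
posts_per_user = {1: 120, 2: 100, 3: 99, 4: 150}
conn.executemany(
    "INSERT INTO forum_posts (user_id) VALUES (?)",
    [(user,) for user, n in posts_per_user.items() for _ in range(n)],
)

QUERIES = {
    "not_in": """
        SELECT user_id FROM forum_posts
        WHERE user_id NOT IN (SELECT user_id FROM banned_users)
        GROUP BY user_id HAVING COUNT(*) >= 100 ORDER BY user_id""",
    "left_join": """
        SELECT fp.user_id FROM forum_posts AS fp
        LEFT JOIN banned_users AS bu ON bu.user_id = fp.user_id
        WHERE bu.user_id IS NULL
        GROUP BY fp.user_id HAVING COUNT(*) >= 100 ORDER BY fp.user_id""",
    "not_exists": """
        SELECT fp.user_id FROM forum_posts AS fp
        WHERE NOT EXISTS (SELECT * FROM banned_users AS bu
                          WHERE bu.user_id = fp.user_id)
        GROUP BY fp.user_id HAVING COUNT(*) >= 100 ORDER BY fp.user_id""",
}

def run_all():
    """Run every query variant and return {name: rows}."""
    return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}

before_null = run_all()
# A NULL in banned_users.user_id makes "x NOT IN (...)" unknown for every
# unmatched x, so the NOT IN form returns no rows; the other two are unaffected.
conn.execute("INSERT INTO banned_users VALUES (NULL)")
after_null = run_all()

print(before_null)
print(after_null)
```

If banned_users.user_id is declared NOT NULL, the three forms are interchangeable, and the choice comes down to the plans MySQL picks for your data.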
HAVING conditions are applied to the grouped results. Since you group by user_id, every user_id value is still present in the grouped result, so where you place the user_id condition does not change the result.
To me, the second query is more efficient because it reduces the number of records that GROUP BY and HAVING have to process.
Alternatively, you may try the following query to avoid using IN:
SELECT `fp`.`user_id`
FROM `forum_posts` `fp`
LEFT JOIN `banned_users` `bu` ON `fp`.`user_id` = `bu`.`user_id`
WHERE `bu`.`user_id` IS NULL
GROUP BY `fp`.`user_id`
HAVING COUNT(`fp`.`id`) >= 100
Hope this helps.
No, they do not give the same results,
because the first query filters records by the count(id) condition,
while the other query filters records first and then applies the HAVING clause.
The second query is correctly written.