I have a MySQL table t1 with a text field f1. I have this query to find the top 100 most common values of f1 along with their frequency:
SELECT COUNT(*) AS c, f1 FROM t1 GROUP BY f1 ORDER BY c DESC LIMIT 100;
What I need now is a query to find the longest values of f1 that occur most often. That is, I want to first order the records of the table by frequency (like the query above does), then order them by length and grab the top 100. I tried doing that with this query, but it doesn't return what I want; it simply returns the records with the longest values of f1 (most of them with only 1 occurrence):
SELECT f1, LENGTH(f1) AS l, COUNT(*) AS c FROM t1 GROUP BY f1, LENGTH(f1) ORDER BY l DESC, c DESC LIMIT 100;
My table has more than 44M records in case that matters.
Thanks.
You said you want to order by the frequency then the length, but you ask for the order by length then frequency. Reverse your ORDER BY clause.
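A minimal sketch of the reversed ORDER BY, run through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; the sample values and the helper name top_values are made up). It sorts by frequency first and uses length only to break ties among equally frequent values.

```python
import sqlite3


def top_values(rows, limit=100):
    """Return (f1, length, count) tuples ordered by count, then by length."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute("CREATE TABLE t1 (f1 TEXT)")
    conn.executemany("INSERT INTO t1 (f1) VALUES (?)", [(r,) for r in rows])
    result = conn.execute(
        """
        SELECT f1, LENGTH(f1) AS l, COUNT(*) AS c
        FROM t1
        GROUP BY f1
        ORDER BY c DESC, l DESC   -- frequency first, length second
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data: "aa" and "bbbb" are equally frequent.
sample = ["aa", "aa", "aa", "bbbb", "bbbb", "bbbb", "cccccccc", "d", "d"]
print(top_values(sample))
```

With this ordering the long but rare value ends up last, which is the opposite of what the query in the question produced.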
Suppose I have four tables: tbl1 ... tbl4. Each has a unique numerical id field. tbl1, tbl2 and tbl3 each has a foreign key field for the next table in the sequence. E.g. tbl1 has a tbl2_id foreign key field, and so on. Each table also has a field order (and other fields not relevant to the question).
It is straightforward to join all four tables to return all rows of tbl1 together with corresponding fields from the other three tables. It is also easy to order this result set by a specific ORDER BY combination of the order fields. It is also easy to return just the row that corresponds to some particular id in tbl1, e.g. WHERE tbl1.id = 7777.
QUESTION: what query most efficiently returns (e.g.) 100 rows, starting from the row corresponding to id=7777, in the order determined by the specific combination of order fields?
Using ROW_NUMBER (or an emulation of it in MySQL versions before 8) to get the position of the id=7777 row, and then using that position in a second run of the same query to set the offset in the LIMIT clause, would be one approach (with a read lock in between). But can it be done in a single query?
# FIRST QUERY: get row number of result row where tbl1.id = 7777
SELECT x.row_number
FROM
(SELECT @row_number:=@row_number+1 AS row_number, tbl1.id AS id
FROM (SELECT @row_number:=0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <some conditions>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
WHERE id=7777;
Store the row number from the above query and bind :offset in the following query to that value minus 1 (the row numbers start at 1, but LIMIT offsets start at 0).
# SECOND QUERY : Get 100 rows starting from the one with id=7777
SELECT x.field1, x.field2, <etc.>
FROM
(SELECT @row_number:=@row_number+1 AS row_number, field1, field2
FROM (SELECT @row_number:=0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <same conditions as before>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
LIMIT :offset, 100;
Clarify question
In the general case, you won't ask for WHERE id1 > 7777. Instead, you have a tuple of (11,22,33,44) and you want to "continue where you left off".
Two discussions, with
That is messy, but not impossible. See Iterating through a compound key. It gives an example of doing it with 2 columns; 4 columns coming from 4 tables is an extension of that.
A variation
Here is another discussion of such: https://dba.stackexchange.com/questions/164428/should-i-store-data-pre-ordered-rather-than-ordering-on-the-fly/164755#164755
In actually implementing such, I have found that letting the "100" (LIMIT) be flexible can be easier to think through. The idea is: reach forward 100 rows (with LIMIT 100,1). Let's say you get (111,222,333,444). If you are currently at (111, ...), then deal with id2/3/4. If it is, say, (113, ...), then do WHERE id1 < 113 and leave off any specification of id2/3/4. This means fetching less than 100 rows, but it lands you just shy of starting id1=113.
That is, it involves constructing a WHERE clause with between 1 and 4 conditions.
In all cases, your query says ORDER BY id1, id2, id3, id4. And the only use for LIMIT is in the probe to figure out how far ahead the 100th row is (with LIMIT 100,1).
I think I can dig out some old Perl code for that.
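The basic "continue where you left off" idea can be sketched for a two-column compound key as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module instead of MySQL, a hypothetical table named items with columns id1 and id2, and the plain fixed-size page rather than the flexible-LIMIT variation described above.

```python
import sqlite3


def next_page(conn, last_key, page_size):
    """Fetch the rows that sort strictly after last_key = (id1, id2)."""
    id1, id2 = last_key
    return conn.execute(
        """
        SELECT id1, id2
        FROM items
        WHERE id1 > ?                   -- any later id1
           OR (id1 = ? AND id2 > ?)     -- same id1, later id2
        ORDER BY id1, id2
        LIMIT ?
        """,
        (id1, id1, id2, page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.execute(
    "CREATE TABLE items (id1 INTEGER, id2 INTEGER, PRIMARY KEY (id1, id2))"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)],
)

# Resume after the last row already seen, (1, 2), and read three more rows.
page = next_page(conn, (1, 2), 3)
print(page)
```

Each extra key column adds one more OR branch of the same shape, which is why the four-column case gets messy. No OFFSET is involved, so the cost of a page does not grow with how far into the result set you are, provided an index covers the key columns.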
I'm trying to fetch a MIN() and MAX() value from a query so that I can use the resulting values in a PHP function but can't seem to work out how to do it because there is a LIMIT involved.
SELECT MIN(ID) AS MinID, MAX(ID) AS MaxID
FROM parts_listing
WHERE BaseGroup = 0
LIMIT 0,50;
This is a dynamically-generated LIMIT used by a pagination function, so in this example it should give MinID = 1 and MaxID = 50, but instead it gives MinID = 1 and MaxID = 129, which is the total number of records in BaseGroup 0. If I use any of the other BaseGroup values, it also gives the total values. If I change the starting record of LIMIT to, for example, LIMIT 10,50, I get nothing whatsoever.
I realize that there are similar questions here, but they did not help in this specific case. Any ideas?
LIMIT is applied after processing the MIN/MAX, when there's only a single row left.
You need to move it to a Derived Table:
SELECT MIN(ID) AS MinID, MAX(ID) AS MaxID
FROM
(
SELECT ID
FROM parts_listing
WHERE BaseGroup = 0
LIMIT 0,50
) as dt
And you probably need an ORDER BY, too.
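A hedged sketch of the derived-table approach with the ORDER BY added, again using Python's sqlite3 module as a stand-in for MySQL. The 129 rows are made up to match the numbers in the question, the helper name page_min_max is hypothetical, and LIMIT ... OFFSET ... is used in place of LIMIT offset,count.

```python
import sqlite3


def page_min_max(conn, offset, page_size):
    """Return (MinID, MaxID) for one page of BaseGroup 0."""
    return conn.execute(
        """
        SELECT MIN(ID) AS MinID, MAX(ID) AS MaxID
        FROM (
            SELECT ID
            FROM parts_listing
            WHERE BaseGroup = 0
            ORDER BY ID          -- makes the page contents deterministic
            LIMIT ? OFFSET ?
        ) AS dt
        """,
        (page_size, offset),
    ).fetchone()


conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.execute(
    "CREATE TABLE parts_listing (ID INTEGER PRIMARY KEY, BaseGroup INTEGER)"
)
conn.executemany(
    "INSERT INTO parts_listing VALUES (?, 0)", [(i,) for i in range(1, 130)]
)

first_page = page_min_max(conn, 0, 50)
second_try = page_min_max(conn, 10, 50)
print(first_page, second_try)
```

Because the LIMIT now runs inside the derived table, the aggregate only sees the 50 rows of the requested page, and a non-zero offset no longer produces an empty result.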
LIMIT does not restrict which records MySQL uses to calculate the MIN or MAX functions. ALL RECORDS that match the WHERE criteria are used to calculate the results.
In other words, LIMIT has no effect on the input of aggregate functions.
LIMIT does not see the query, it only sees the result set; if the query outputs more than one row, LIMIT works on those rows.
You need to use the WHERE clause to "limit" the rows used in aggregate functions.
Say I have a table that stores each and every time someone does something (let's say jumps)
The table has a JumpNumber (auto-increments each time there's an insert, so there's one for every jump rather than this being a total). It also records the member who jumped as MemberID, and the time they jumped at.
I would like to make a query that finds the most occurring member then gives their ID and every time at which they've ever jumped.
However, if there's 2 or more members with the most jumps (so a tie) it should still display each of them, with their jump times.
So I couldn't just do a descending order and limit to 1. I'm also confused as to how I should find the most reoccurring member, I'm guessing a COUNT but not 100% sure how.
Well it would be something like:
SELECT USER_ID
FROM YOURTABLE A
WHERE JUMPS = (SELECT MAX(JUMPS)
FROM YOURTABLE B)
This will return all USER_ID values with the most jumps; then you can select all records belonging to the selected user(s).
If you store the jump count, use the variant by Xavjer.
If you do not store the jump count, you first have to find the max count:
select user_id, count(*) as c from TABLE group by user_id order by c desc limit 1
After that, you have to do the same grouping again, select all user_id values with that count, and left join the original table for the other fields.
select A.* from (
select user_id from
(select user_id, count(*) as c from TABLE group by user_id) as tempB
where tempB.c=(
select count(*) as c from TABLE group by user_id order by c desc limit 1
)
) as join_table1
LEFT JOIN TABLE as A on A.user_id=join_table1.user_id;
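For the case where only the individual jumps are stored, here is a minimal runnable sketch of the same idea: find the highest per-member count, keep every member that reaches it, then join back for the jump times. It uses Python's sqlite3 module rather than MySQL, and the table name jumps, the column name JumpTime and the sample rows are assumptions for illustration.

```python
import sqlite3

# Members whose jump count equals the highest count, with all their jumps.
QUERY = """
SELECT j.MemberID, j.JumpTime
FROM jumps AS j
JOIN (
    SELECT MemberID
    FROM jumps
    GROUP BY MemberID
    HAVING COUNT(*) = (
        SELECT COUNT(*) AS c
        FROM jumps
        GROUP BY MemberID
        ORDER BY c DESC
        LIMIT 1
    )
) AS top_members ON top_members.MemberID = j.MemberID
ORDER BY j.MemberID, j.JumpTime
"""

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.execute(
    "CREATE TABLE jumps ("
    "JumpNumber INTEGER PRIMARY KEY AUTOINCREMENT, "
    "MemberID INTEGER, JumpTime TEXT)"
)
conn.executemany(
    "INSERT INTO jumps (MemberID, JumpTime) VALUES (?, ?)",
    [(1, "10:00"), (2, "10:05"), (1, "10:10"), (3, "10:15"), (2, "10:20")],
)

# Members 1 and 2 are tied with two jumps each; member 3 has only one.
top_jumps = conn.execute(QUERY).fetchall()
print(top_jumps)
```

The LIMIT 1 only applies to the scalar subquery that finds the highest count, so a tie between members is preserved in the outer result.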
Okay, let's say I have the following MySQL query:
SELECT table1.*, COUNT(table2.link_id) AS count
FROM table1
LEFT JOIN table2 on (table1.key = table2.link_id)
GROUP BY table1.key
ORDER BY table1.name ASC
LIMIT 20
Simple right? It returns the table1 info, with the number of times each row is linked in table2.
However, you'll notice that it limits the resulting rows to 20... and sorts the resulting rows by table1.name. What this does is return the top 20 results in alphabetical order.
What I was wondering is whether there is a way to limit to the top 20 results based on count in descending order, while ALSO getting those 20 results back in alphabetical order. I know I can simply sort the returned array in follow-up code, but I'm wondering if there is a way to do this in a single query.
Use subselect for limit, and sort in the outer select
SELECT * FROM (SELECT table1.*, COUNT(table2.link_id) AS count
FROM table1
LEFT JOIN table2 on (table1.key = table2.link_id)
GROUP BY table1.key
ORDER BY count DESC
LIMIT 20 ) t
ORDER BY name ASC
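A small sketch of the subselect approach, run with Python's sqlite3 module as a stand-in for MySQL. The rows are invented, the limit is a parameter so the effect is visible on four rows, and the alias link_count replaces count purely for readability; none of these come from the original post.

```python
import sqlite3


def top_linked(conn, n):
    """Top n rows of table1 by link count, returned in alphabetical order."""
    return conn.execute(
        """
        SELECT * FROM (
            SELECT table1.key, table1.name,
                   COUNT(table2.link_id) AS link_count
            FROM table1
            LEFT JOIN table2 ON (table1.key = table2.link_id)
            GROUP BY table1.key
            ORDER BY link_count DESC    -- inner query picks the top n
            LIMIT ?
        ) AS t
        ORDER BY name ASC               -- outer query re-sorts those n rows
        """,
        (n,),
    ).fetchall()


conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.execute("CREATE TABLE table1 (key INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE table2 (link_id INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "delta"), (2, "alpha"), (3, "charlie"), (4, "bravo")],
)
# delta is linked three times, charlie twice, bravo once, alpha never.
conn.executemany(
    "INSERT INTO table2 VALUES (?)", [(1,), (1,), (1,), (3,), (3,), (4,)]
)

top_two = top_linked(conn, 2)
print(top_two)
```

The two sorts do different jobs: the inner ORDER BY decides which rows survive the LIMIT, and the outer ORDER BY only decides the order in which they are returned.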
Is there a better way to do what the following SQL query does? I have the feeling that table1 will be searched twice, and maybe that can be avoided with some trick to increase the efficiency of the query, but I just can't figure out how ;( Here is the query (in MySQL):
SELECT a, SUM(count)
FROM table1
GROUP BY a
HAVING SUM(count) = (SELECT SUM(count) as total FROM table1 GROUP BY a ORDER BY total DESC LIMIT 1)
The goal is to return the value(s) of a with the largest accumulated total, along with that total.
table1 is a two-field table like:
a,count
1,10
1,30
1,0
2,1
2,100
2,4
3,10
4,50
4,55
The result with that data sample is:
2,105
4,105
Thanks in advance.
SELECT a, total FROM
(SELECT a AS a, SUM(count) AS total
FROM table1
GROUP BY a) AS xyz
WHERE total = (SELECT MAX(total) FROM
(SELECT SUM(count) AS total FROM table1 GROUP BY a) AS totals)
Hope this will work for you
This sub-query is executed only once, and you don't have to bother with creating any pre-query as other answers may suggest (although doing so is still correct, just not needed). The database engine will realise that the sub-query does not use any variable dependent on the other part of the query. You can use EXPLAIN to see how the query is executed.
More on the topic in this answer:
https://stackoverflow.com/a/658954/1821029
I think you could probably do it by moving your HAVING sub-select query into its own prequery. Since it will always return a single row, you won't require any "JOIN" condition, and it does not have to keep recomputing the maximum total every time the HAVING is applied. Do it once, then the rest:
SELECT
a,
SUM(count)
FROM
table1,
( SELECT SUM(count) as total
FROM table1
GROUP BY a
ORDER BY total DESC
LIMIT 1 ) PreQuery
GROUP BY
a
HAVING
SUM(count) = PreQuery.Total
This query returns one row with two columns:
1- a comma-separated list of the values of the "a" column that have the biggest "Total"
2- the biggest Total value
select group_concat(a), Total
from
(select a, sum(count) as Total
from table1
group by a) OnTableQuery
group by Total
order by Total desc
limit 1
Note that it queries table1 just one time. The query was already tested.
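As a quick check, the query above can be run against the sample data from the question. The sketch below uses Python's sqlite3 module instead of MySQL (SQLite also provides GROUP_CONCAT, but behaviour on a real MySQL server is not verified here). The order of values inside the concatenated string is not guaranteed, so the hypothetical helper biggest_totals sorts them before returning.

```python
import sqlite3

# Sample rows (a, count) copied from the question.
SAMPLE = [(1, 10), (1, 30), (1, 0), (2, 1), (2, 100), (2, 4),
          (3, 10), (4, 50), (4, 55)]


def biggest_totals(rows):
    """Return (sorted list of a values with the biggest total, that total)."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute("CREATE TABLE table1 (a INTEGER, count INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    members, total = conn.execute(
        """
        SELECT GROUP_CONCAT(a), Total
        FROM (SELECT a, SUM(count) AS Total
              FROM table1
              GROUP BY a) AS OnTableQuery
        GROUP BY Total
        ORDER BY Total DESC
        LIMIT 1
        """
    ).fetchone()
    conn.close()
    # GROUP_CONCAT returns a string such as "2,4"; element order may vary.
    return sorted(int(value) for value in members.split(",")), total


winners, best_total = biggest_totals(SAMPLE)
print(winners, best_total)
```

The trade-off of this form is that the tied values arrive packed into one string, so the caller has to split them, whereas the HAVING-based queries return one row per value.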