Alternative to "ntile" for MySQL version lower than 8?

I am trying the code below, which analyses and scores customers based on the recency, frequency and monetary value of their transactions.
select customer_id, rfm_recency, rfm_frequency, rfm_monetary
from
(
select customer_id,
ntile(4) over (order by last_order_date) as rfm_recency,
ntile(4) over (order by count_order) as rfm_frequency,
ntile(4) over (order by sum_amount) as rfm_monetary
from
(
select customer_id,
max(local_date) as last_order_date,
count(*) as count_order,
sum(amount) as sum_amount
from transaction
group by customer_id) as T
) as P
However, NTILE is not available in my MySQL version (5.x): it is a window function, and window functions are only supported from MySQL 8.0 onwards.
I can't find a working alternative to this function. I am very new to SQL, so I'm having a hard time figuring it out myself.
Is there an NTILE alternative that I can use? The code works fine if I remove the NTILE part.

You should really upgrade to MySQL 8.0 if you need features from MySQL 8.0. The native features are bound to be easier to use and better optimized.
I found a way to simulate the NTILE example query shown in the documentation:
SELECT
val,
ROW_NUMBER() OVER w AS 'row_number',
NTILE(2) OVER w AS 'ntile2',
NTILE(4) OVER w AS 'ntile4'
FROM numbers
WINDOW w AS (ORDER BY val);
Here's a solution:
-- @r counts rows in the order MySQL reads them, which is not guaranteed to be val order
SELECT val, @r:=@r+1 AS rownum,
FLOOR((@r-1)*2/9)+1 AS ntile2,
FLOOR((@r-1)*4/9)+1 AS ntile4
FROM (SELECT @r:=0) AS _init, numbers
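The factors 2 and 4 correspond to NTILE(2) and NTILE(4). The 9 is the number of rows in this example table, so you must know the row count before you run this query (or compute it with a subquery such as (SELECT COUNT(*) FROM numbers)). The solution also requires user-defined variables, which are always kind of tricky: MySQL does not guarantee the order in which expressions involving them are evaluated. Also note that the formula matches NTILE for these 9 rows, but for some row counts it sizes the buckets differently: for 10 rows, NTILE(4) produces buckets of 3, 3, 2 and 2 rows, while the formula produces 3, 2, 3 and 2.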
Result:
+------+--------+--------+--------+
| val | rownum | ntile2 | ntile4 |
+------+--------+--------+--------+
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 1 |
| 2 | 3 | 1 | 1 |
| 3 | 4 | 1 | 2 |
| 3 | 5 | 1 | 2 |
| 3 | 6 | 2 | 3 |
| 4 | 7 | 2 | 3 |
| 4 | 8 | 2 | 4 |
| 5 | 9 | 2 | 4 |
+------+--------+--------+--------+
I'll leave it as an exercise for you to adapt this technique to your query and your table, or to decide that it's time to upgrade to MySQL 8.0.
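To check how closely the FLOOR formula tracks real NTILE, here is a small Python sketch (Python is used only as a calculator; none of these functions exist in MySQL). `ntile_reference` follows the NTILE rule that, when the rows don't divide evenly, the first rows % buckets buckets each get one extra row. `ntile_exact` is a hypothetical closed-form alternative that could be ported to SQL with DIV, MOD and a CASE expression over @r and the row count.

```python
def ntile_reference(rows, buckets):
    """Bucket number of each row 1..rows under NTILE(buckets):
    the first (rows % buckets) buckets each get one extra row."""
    size, extra = divmod(rows, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        result.extend([bucket] * (size + (1 if bucket <= extra else 0)))
    return result


def ntile_floor_formula(r, rows, buckets):
    """FLOOR((@r-1)*buckets/rows)+1 from the answer above; r is 1-based."""
    return (r - 1) * buckets // rows + 1


def ntile_exact(r, rows, buckets):
    """Closed form reproducing NTILE; r is 1-based."""
    size, extra = divmod(rows, buckets)
    big_rows = extra * (size + 1)  # rows that land in the larger buckets
    if r <= big_rows:
        return (r - 1) // (size + 1) + 1
    # only reached when size > 0, so the division is safe
    return extra + (r - 1 - big_rows) // size + 1


for rows, buckets in [(9, 2), (9, 4), (10, 4)]:
    approx = [ntile_floor_formula(r, rows, buckets) for r in range(1, rows + 1)]
    exact = [ntile_exact(r, rows, buckets) for r in range(1, rows + 1)]
    print(rows, buckets,
          approx == ntile_reference(rows, buckets),
          exact == ntile_reference(rows, buckets))
```

Run as-is, it prints `9 2 True True`, `9 4 True True` and `10 4 False True`: the FLOOR formula agrees with NTILE for the 9-row example but not for 10 rows in 4 buckets, while the closed form agrees in all three cases.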

You can enumerate rows and use arithmetic. Unfortunately, you'll need to do this three times, once for each score:
-- once the middle derived table is built, @rn holds the total row count,
-- so (seqnum - 1) * 4 / @rn maps rows 1..N onto quartiles 1..4
select floor((seqnum - 1) * 4 / @rn) + 1 as ntile_recency, t.*
from (select (@rn := @rn + 1) as seqnum, t.*
from (select customer_id, max(local_date) as last_order_date, count(*) as count_order,
sum(amount) as sum_amount
from transaction
group by customer_id
order by last_order_date
) t cross join
(select @rn := 0) params
) t;
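If user-defined variables feel too fragile, the sequence number can also be computed without them by counting rows with a correlated subquery, a technique that works in MySQL 5.x as well. The sketch below runs it on Python's built-in sqlite3 module as a stand-in for MySQL; the `summary` table and its values are hypothetical and play the role of the per-customer GROUP BY subquery. SQLite's `/` truncates when both operands are integers; in MySQL you would wrap the division in FLOOR(...) or use DIV instead.

```python
import sqlite3

# Hypothetical per-customer summary standing in for the GROUP BY subquery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summary (customer_id INTEGER PRIMARY KEY, sum_amount REAL)")
conn.executemany(
    "INSERT INTO summary VALUES (?, ?)",
    [(1, 50.0), (2, 10.0), (3, 80.0), (4, 20.0),
     (5, 70.0), (6, 30.0), (7, 60.0), (8, 40.0)],
)

# seqnum = 1-based position when ordered by sum_amount (customer_id breaks ties),
# counted with a correlated subquery instead of a user variable.
QUERY = """
SELECT customer_id,
       ((seqnum - 1) * 4) / total + 1 AS rfm_monetary
FROM (
    SELECT s.customer_id,
           (SELECT COUNT(*) FROM summary s2
             WHERE s2.sum_amount < s.sum_amount
                OR (s2.sum_amount = s.sum_amount AND s2.customer_id <= s.customer_id)) AS seqnum,
           (SELECT COUNT(*) FROM summary) AS total
    FROM summary s
) ranked
ORDER BY customer_id
"""
buckets = dict(conn.execute(QUERY).fetchall())
print(buckets)
```

The correlated COUNT is quadratic in the number of customers, so an index on the ordering column matters for larger tables, and the same pattern would be repeated for recency and frequency.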

Related

How to get maximum appearance count of number from comma separated number string from multiple rows in MySQL?

My MySQL table has a column containing comma-separated numbers. See the example below:
| style_ids |
| ---------- |
| 5,3,10,2,7 |
| 1,5,12,9 |
| 6,3,5,9,4 |
| 8,3,5,7,12 |
| 7,4,9,3,5 |
So the expected result is the top 5 numbers by appearance count, as 5 rows in descending order:
| number | appearance_count_in_all_rows |
| -------|----------------------------- |
| 5 | 5 |
| 3 | 4 |
| 9 | 3 |
| 7 | 3 |
| 4 | 2 |
Is it possible to get the above result with a MySQL query?
As already alluded to in the comments, storing comma-separated lists is a really bad idea. But here is one way of doing it (the recursive CTEs require MySQL 8.0 or later):
WITH RECURSIVE seq (n) AS (
SELECT 1 UNION ALL SELECT n+1 FROM seq WHERE n < 20
), tbl (style_ids) AS (
SELECT '5,3,10,2,7' UNION ALL
SELECT '1,5,12,9' UNION ALL
SELECT '6,3,5,9,4' UNION ALL
SELECT '8,3,5,7,12' UNION ALL
SELECT '7,4,9,3,5'
)
SELECT seq.n, COUNT(*) appearance_count_in_all_rows
FROM seq
JOIN tbl ON FIND_IN_SET(seq.n, tbl.style_ids)
GROUP BY seq.n
ORDER BY appearance_count_in_all_rows DESC
LIMIT 5;
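Just replace the tbl CTE with your table. Note that seq only generates the numbers 1 to 20, so raise that limit if your numbers can be larger.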
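The recursive-CTE query can be tried out locally with Python's sqlite3 module. SQLite has no FIND_IN_SET, so this sketch swaps in an equivalent membership test that wraps both the list and the number in commas and uses instr(); it also adds seq.n as a tie-breaker, since 4 and 12 both appear twice in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
QUERY = """
WITH RECURSIVE seq (n) AS (
    SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 20
), tbl (style_ids) AS (
    SELECT '5,3,10,2,7' UNION ALL
    SELECT '1,5,12,9' UNION ALL
    SELECT '6,3,5,9,4' UNION ALL
    SELECT '8,3,5,7,12' UNION ALL
    SELECT '7,4,9,3,5'
)
SELECT seq.n, COUNT(*) AS appearance_count_in_all_rows
FROM seq
-- stand-in for FIND_IN_SET: look for ',n,' inside ',list,'
JOIN tbl ON instr(',' || tbl.style_ids || ',', ',' || seq.n || ',') > 0
GROUP BY seq.n
ORDER BY appearance_count_in_all_rows DESC, seq.n
LIMIT 5
"""
top5 = conn.execute(QUERY).fetchall()
print(top5)
```

With these sample rows, 7 appears three times (rows 1, 4 and 5), the same as 9.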
As already pointed out, you should fix the data if possible.
For further details, read "Is storing a delimited list in a database column really that bad?".
You could use the answer below, which is well explained here; a working fiddle can be found here.
Try:
-- assumes a helper table numbers(n) holding 1, 2, 3, ... up to the longest list length
select distinct_nr, count(distinct_nr) as appearance_count_in_all_rows
from ( select substring_index(substring_index(style_ids, ',', n), ',', -1) as distinct_nr
from test
join numbers on char_length(style_ids) - char_length(replace(style_ids, ',', '')) >= n - 1
) x
group by distinct_nr
order by appearance_count_in_all_rows desc
limit 5;
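The nested SUBSTRING_INDEX call picks the n-th element of the list: the inner call keeps everything left of the n-th comma, and the outer call keeps what follows the last remaining comma. The join condition only generates n values up to the number of elements in each list. The Python sketch below models that behaviour; `substring_index` is a hypothetical Python re-implementation of the MySQL function, and `max_n=10` stands in for the assumed contents of the numbers helper table.

```python
from collections import Counter


def substring_index(s, delim, count):
    """Python model of MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])  # right of the count-th delimiter from the end
    return ""


def split_with_numbers(style_ids, max_n=10):
    """Mimic the join against a numbers table holding 1..max_n."""
    items = []
    for n in range(1, max_n + 1):
        # join condition: number of commas >= n - 1
        if len(style_ids) - len(style_ids.replace(",", "")) >= n - 1:
            items.append(substring_index(substring_index(style_ids, ",", n), ",", -1))
    return items


rows = ["5,3,10,2,7", "1,5,12,9", "6,3,5,9,4", "8,3,5,7,12", "7,4,9,3,5"]
counts = Counter(item for row in rows for item in split_with_numbers(row))
print(counts.most_common(5))
```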

Mysql update a column with removing differences

I don't know how to explain this in words, so let me give an example.
Suppose the items table, sorted by the order column:
| id | name | order |
| 5 | x | 1 |
| 2 | y | 3 |
| 3 | z | 4 |
| 7 | p | 8 |
I want to update the order column so that each value differs by exactly 1 from the next row's value, while keeping the order.
Desired result:
| id | name | order |
| 5 | x | 1 |
| 2 | y | 2 |
| 3 | z | 3 |
| 7 | p | 4 |
Edit:
Selecting row_number() isn't a solution for me, as I want to actually change the stored order values, not just get the row number.
In MySQL 8.0, just use row_number():
select t.*,
row_number() over(order by ord) as new_ord
from mytable t
This shows that the information can easily be computed on the fly when needed, which suggests that storing such derived information might not be a good idea: it is tedious to keep up to date when rows are added or deleted.
Instead, you can use the above query, or put it in a view:
create view myview as
select t.*,
row_number() over(order by ord) as new_ord
from mytable t
Note: order is a reserved word in SQL, so I used ord instead.
If you really need an update, for a one-time task for example:
update mytable t
inner join (
select id, row_number() over(order by ord) as new_ord from mytable
) t1 on t1.id = t.id
set t.ord = t1.new_ord
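I would suggest using a view for such a requirement, as also mentioned in the other answer.
If this is a one-time activity and order is unique for each record, you can renumber the rows by counting, for each row, how many rows have an order value less than or equal to its own. MySQL does not allow a subquery to read the table being updated (error 1093), so the count is computed in a derived table and joined back:
UPDATE your_table t
JOIN (SELECT a.id, COUNT(*) AS new_order
FROM your_table a JOIN your_table b ON b.`order` <= a.`order`
GROUP BY a.id) x ON x.id = t.id
SET t.`order` = x.new_order;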
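The counting idea can be sanity-checked with Python's sqlite3 module on the example data. SQLite has no MySQL-style multi-table UPDATE ... JOIN, so this sketch computes the derived table first and then applies it row by row; the column is named ord here because order is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, ord INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                 [(5, "x", 1), (2, "y", 3), (3, "z", 4), (7, "p", 8)])

# Derived table: each row's new position = number of rows with ord <= its own ord.
new_positions = conn.execute("""
    SELECT a.id, COUNT(*) AS new_ord
    FROM items a JOIN items b ON b.ord <= a.ord
    GROUP BY a.id
""").fetchall()

# Apply the precomputed positions, so the table is never read while being updated.
conn.executemany("UPDATE items SET ord = ? WHERE id = ?",
                 [(new_ord, item_id) for item_id, new_ord in new_positions])

result = conn.execute("SELECT id, name, ord FROM items ORDER BY ord").fetchall()
print(result)
```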

Leaderboard position SQL optimization

I'm offering an experience leaderboard for a Discord bot I actively develop, with features such as profile cards showing a user's rank. The SQL query I'm currently using works flawlessly; however, it takes a rather long time to process.
SELECT id,
discord_id,
discord_tag,
xp,
level
FROM (SELECT @rank := @rank + 1 AS id,
discord_id,
discord_tag,
xp,
level
FROM profile_xp,
(SELECT @rank := 0) r
ORDER BY xp DESC) t
WHERE discord_id = '12345678901';
The table isn't very big (roughly 20k unique records), but this query takes anywhere between 300 and 450 ms on average, which piles up quickly with a lot of concurrent requests.
I was wondering whether this query can be optimized to improve performance. I've isolated the slowness to this query; the rest of the MySQL server is responsive and swift.
I'd be happy about any hint and thanks in advance! :)
You're scanning 20,000 rows to assign "row numbers" then selecting exactly one row from it. You can use aggregation instead:
SELECT *, (
SELECT COUNT(*)
FROM profile_xp AS x
WHERE xp > profile_xp.xp
) + 1 AS rnk
FROM profile_xp
WHERE discord_id = '12345678901'
This gives you the player's rank. For a dense rank, use COUNT(DISTINCT xp). Create an index on the xp column if necessary.
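A quick way to see the difference between the COUNT(*) rank and the COUNT(DISTINCT xp) dense rank is to run the correlated subquery on a few tied players. This sketch uses Python's sqlite3 module as a stand-in for MySQL, with hypothetical player IDs and XP values; the outer table is aliased p for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile_xp (discord_id TEXT PRIMARY KEY, xp INTEGER)")
conn.execute("CREATE INDEX idx_profile_xp_xp ON profile_xp (xp)")  # the suggested index on xp
conn.executemany("INSERT INTO profile_xp VALUES (?, ?)",
                 [("a", 500), ("b", 400), ("c", 400), ("d", 300)])

RANK_SQL = """
SELECT discord_id, xp,
       (SELECT COUNT(*) FROM profile_xp AS x WHERE x.xp > p.xp) + 1 AS rnk,
       (SELECT COUNT(DISTINCT x.xp) FROM profile_xp AS x WHERE x.xp > p.xp) + 1 AS dense_rnk
FROM profile_xp AS p
WHERE discord_id = ?
"""
print(conn.execute(RANK_SQL, ("d",)).fetchone())
```

Player d has three players above it but only two distinct higher XP values, so rnk and dense_rnk differ.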
Not an answer; too long for a comment:
I usually write this kind of thing exactly the same way you have, because it's quick and easy, but there's actually a technical flaw with this method, although it only becomes apparent in certain situations.
By way of illustration, consider the following:
DROP TABLE IF EXISTS ints;
CREATE TABLE ints (i INT NOT NULL PRIMARY KEY);
INSERT INTO ints VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Your query:
SELECT a.*
, @i:=@i+1 `rank`
FROM ints a
JOIN (SELECT @i:=0) vars
ORDER
BY RAND() DESC;
+---+------+
| i | rank |
+---+------+
| 3 | 4 |
| 2 | 3 |
| 5 | 6 |
| 1 | 2 |
| 7 | 8 |
| 9 | 10 |
| 4 | 5 |
| 6 | 7 |
| 8 | 9 |
| 0 | 1 |
+---+------+
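Look: the rows are shuffled, but rank always equals i + 1. The variable is incremented as the rows are read, before ORDER BY RAND() sorts them, so the numbering does not follow the final sort order.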
Now compare that with the following:
SELECT a.*
, @i:=@i+1 `rank`
FROM
( SELECT * FROM ints ORDER by RAND() DESC) a
JOIN (SELECT @i:=0) vars;
+---+------+
| i | rank |
+---+------+
| 5 | 1 |
| 2 | 2 |
| 8 | 3 |
| 7 | 4 |
| 4 | 5 |
| 6 | 6 |
| 0 | 7 |
| 1 | 8 |
| 3 | 9 |
| 9 | 10 |
+---+------+
Assuming discord_id is the primary key for the table, and you're just trying to get one entry's "rank", you should be able to take a different approach.
SELECT px.discord_id, px.discord_tag, px.xp, px.level
, 1 + COUNT(leaders.xp) AS `rank`
, 1 + COUNT(DISTINCT leaders.xp) AS altRank
FROM profile_xp AS px
LEFT JOIN profile_xp AS leaders ON px.xp < leaders.xp
WHERE px.discord_id = '12345678901'
GROUP BY px.discord_id, px.discord_tag, px.xp, px.level
;
Note that I have both "rank" and "altRank". rank should give you a position similar to what you were originally looking for, except that where your results could fluctuate for ties, this rank always puts tied players at the highest position of the tie. If 3 records tie for 2nd place, each of them (queried separately with this) will show 2nd place, and the next xp down would show 5th place (assuming 1 player in 1st, players 2, 3 and 4 tied in 2nd, and player 5 in 5th). altRank "closes the gaps", putting player 5 in the 3rd-place group.
I would also recommend an index on xp to speed this up further.
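The tie behaviour described above can be reproduced with the same LEFT JOIN query on hypothetical data (one leader, three players tied for second, one below them), again using Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile_xp (discord_id TEXT PRIMARY KEY, discord_tag TEXT, "
             "xp INTEGER, level INTEGER)")
# Hypothetical data: 1 player in 1st, three tied for 2nd, one more below them.
conn.executemany("INSERT INTO profile_xp VALUES (?, ?, ?, ?)", [
    ("p1", "one", 900, 9), ("p2", "two", 800, 8), ("p3", "three", 800, 8),
    ("p4", "four", 800, 8), ("p5", "five", 700, 7),
])

RANK_SQL = """
SELECT px.discord_id,
       1 + COUNT(leaders.xp) AS "rank",
       1 + COUNT(DISTINCT leaders.xp) AS altRank
FROM profile_xp AS px
LEFT JOIN profile_xp AS leaders ON px.xp < leaders.xp
WHERE px.discord_id = ?
GROUP BY px.discord_id, px.discord_tag, px.xp, px.level
"""
for player in ("p1", "p3", "p5"):
    print(conn.execute(RANK_SQL, (player,)).fetchone())
```

COUNT(leaders.xp) ignores the NULLs produced by the LEFT JOIN, which is why the leader still gets rank 1.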

Select max version number for each group consisting of multiple columns

I have the following tables:
Apps
TYPE_ID | BUILD_ID | CONFIG_ID | VERSION_ID | (All foreign keys to the respective tables)
1 | 1 | 1 | 1 |
1 | 1 | 1 | 2 |
2 | 2 | 3 | 3 |
2 | 2 | 3 | 4 |
Versions
ID | major | minor | patch
1 | 1 |0 |1
2 | 2 |0 |0
3 | 3 |0 |3
4 | 4 |0 |0
I need to select the highest-version row from the Apps table for each unique combination of TYPE_ID, BUILD_ID and CONFIG_ID.
The version number should be calculated as major * 1000000 + minor * 1000 + patch from the Versions table, and the row with the MAX of that value should be selected.
So from the given example of the Apps table the result would be:
TYPE_ID | BUILD_ID | CONFIG_ID | VERSION_ID |
1 | 1 | 1 | 2 |
2 | 2 | 3 | 4 |
I have tried something like this:
SELECT p1.* FROM Apps p1
INNER JOIN (
SELECT max(VERSION_ID) MaxVersion, CONFIG_ID
FROM Apps
GROUP BY CONFIG_ID
) p2
ON p1.CONFIG_ID = p2.CONFIG_ID
AND p1.VERSION_ID = p2.MaxVersion
GROUP BY `TYPE_ID`, `BUILD_ID`, `CONFIG_ID`
But MAX is applied to VERSION_ID, whereas I need MAX to be applied to the combination of major, minor and patch.
MySQL Version 15.1 distribution 5.5.56-MariaDB
Any help would be appreciated.
Cheers!
You can compute the maximum version per type_id, build_id and config_id using the formula described in your question, then use the same formula again to locate the matching version:
SELECT sq.type_id, sq.build_id, sq.config_id, versions.id AS version_id_max
FROM (
SELECT type_id, build_id, config_id, MAX(major * 1000000 + minor * 1000 + patch) AS max_version
FROM apps
INNER JOIN versions ON apps.version_id = versions.id
GROUP BY type_id, build_id, config_id
) sq
INNER JOIN versions ON max_version = major * 1000000 + minor * 1000 + patch
+---------+----------+-----------+----------------+
| type_id | build_id | config_id | version_id_max |
+---------+----------+-----------+----------------+
| 1 | 1 | 1 | 2 |
| 2 | 2 | 3 | 4 |
+---------+----------+-----------+----------------+
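The query above can be run unchanged against the sample data with Python's sqlite3 module (standing in for MariaDB). The sketch also shows the assumption baked into the formula: major * 1000000 + minor * 1000 + patch only preserves version order while minor and patch stay below 1000; `encode` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE versions (id INTEGER PRIMARY KEY, major INTEGER, minor INTEGER, patch INTEGER);
CREATE TABLE apps (type_id INTEGER, build_id INTEGER, config_id INTEGER, version_id INTEGER);
INSERT INTO versions VALUES (1, 1, 0, 1), (2, 2, 0, 0), (3, 3, 0, 3), (4, 4, 0, 0);
INSERT INTO apps VALUES (1, 1, 1, 1), (1, 1, 1, 2), (2, 2, 3, 3), (2, 2, 3, 4);
""")

QUERY = """
SELECT sq.type_id, sq.build_id, sq.config_id, versions.id AS version_id_max
FROM (
    SELECT type_id, build_id, config_id,
           MAX(major * 1000000 + minor * 1000 + patch) AS max_version
    FROM apps
    INNER JOIN versions ON apps.version_id = versions.id
    GROUP BY type_id, build_id, config_id
) sq
INNER JOIN versions ON max_version = major * 1000000 + minor * 1000 + patch
ORDER BY sq.type_id
"""
result = conn.execute(QUERY).fetchall()
print(result)


def encode(major, minor, patch):
    return major * 1000000 + minor * 1000 + patch


# Once a component reaches 1000 the encoding collides: 1.1000.0 looks like 2.0.0.
print(encode(1, 1000, 0) == encode(2, 0, 0))
```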
Try this (note that it returns the encoded maximum version number rather than the matching VERSION_ID):
select type_id, build_id, config_id,
max(1000000*v.major+1000*v.minor+v.patch) as version
from apps a left join versions v on a.version_id=v.id
group by type_id, build_id, config_id
This uses nested derived tables and a somewhat hacky way of identifying the VERSION_ID corresponding to the maximum VERSION_NO.
First, a derived table computes the VERSION_NO for each row in the Apps table.
Then, using that derived table as the source, we group by TYPE_ID, BUILD_ID and CONFIG_ID, and use GROUP_CONCAT (ordered by VERSION_NO descending) plus SUBSTRING_INDEX to take the first VERSION_ID of each group, i.e. the one with the maximum VERSION_NO.
Try the following:
SELECT nest.TYPE_ID,
nest.BUILD_ID,
nest.CONFIG_ID,
SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT nest.VERSION_ID
ORDER BY nest.VERSION_NO DESC
SEPARATOR ','), ',', 1) AS VERSION_ID
FROM (
SELECT A.TYPE_ID,
A.BUILD_ID,
A.CONFIG_ID,
A.VERSION_ID,
(V.major*1000000 + V.minor*1000 + V.patch) AS VERSION_NO
FROM Apps AS A
INNER JOIN Versions AS V ON V.ID = A.VERSION_ID
) AS nest
GROUP BY nest.TYPE_ID, nest.BUILD_ID, nest.CONFIG_ID
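Try this (it compares the actual versions, major then minor then patch, rather than the VERSION_ID values):
SELECT a1.type_id, a1.build_id, a1.config_id, a1.version_id
FROM apps a1
JOIN versions v1 ON v1.id = a1.version_id
WHERE NOT EXISTS (
SELECT 'NEXT'
FROM apps a2
JOIN versions v2 ON v2.id = a2.version_id
WHERE a2.type_id = a1.type_id
AND a2.build_id = a1.build_id
AND a2.config_id = a1.config_id
AND (v2.major, v2.minor, v2.patch) > (v1.major, v1.minor, v1.patch))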
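Here is a NOT EXISTS variant that compares the actual versions instead of VERSION_ID, sketched with Python's sqlite3 module. The lexicographic comparison is spelled out with OR/AND rather than row constructors so it also runs on older SQLite builds. The extra group 3 is hypothetical: its larger VERSION_ID (5) points at an older version (1.5.0), which is exactly the case that comparing IDs gets wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE versions (id INTEGER PRIMARY KEY, major INTEGER, minor INTEGER, patch INTEGER);
CREATE TABLE apps (type_id INTEGER, build_id INTEGER, config_id INTEGER, version_id INTEGER);
INSERT INTO versions VALUES (1, 1, 0, 1), (2, 2, 0, 0), (3, 3, 0, 3), (4, 4, 0, 0), (5, 1, 5, 0);
INSERT INTO apps VALUES (1, 1, 1, 1), (1, 1, 1, 2), (2, 2, 3, 3), (2, 2, 3, 4),
                        (3, 3, 3, 5), (3, 3, 3, 4);
""")

LATEST_SQL = """
SELECT a1.type_id, a1.build_id, a1.config_id, a1.version_id
FROM apps a1
JOIN versions v1 ON v1.id = a1.version_id
WHERE NOT EXISTS (
    SELECT 1
    FROM apps a2
    JOIN versions v2 ON v2.id = a2.version_id
    WHERE a2.type_id = a1.type_id
      AND a2.build_id = a1.build_id
      AND a2.config_id = a1.config_id
      -- "v2 is newer than v1", compared major, then minor, then patch
      AND (v2.major > v1.major
           OR (v2.major = v1.major AND (v2.minor > v1.minor
               OR (v2.minor = v1.minor AND v2.patch > v1.patch))))
)
ORDER BY a1.type_id
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)
```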
Try this query. It imitates the well-known ROW_NUMBER() OVER (PARTITION BY Type_id, Build_id, Config_id ORDER BY major DESC, minor DESC, patch DESC) using user-defined variables.
select @type_id_lag := 0, @build_id_lag := 0, @config_id_lag := 0, @rn := 0;
select type_id, build_id, config_id, major, minor, patch from (
select case when @type_id_lag = type_id and
@build_id_lag = build_id and
@config_id_lag = config_id then @rn := @rn + 1 else @rn := 1 end rn,
@type_id_lag := type_id type_id,
@build_id_lag := build_id build_id,
@config_id_lag := config_id config_id,
v.major, v.minor, v.patch
from Apps a
left join Versions v on a.version_id = v.id
order by a.type_id, a.build_id, a.config_id,
v.major desc, v.minor desc, v.patch desc
) a where rn = 1;
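To make the variable logic easier to follow, here is a Python model of what the query does row by row (`first_per_group` is a hypothetical name). It only gives the right answer if the rows reach the variable assignments in the ORDER BY order; MySQL does not guarantee the evaluation order of expressions involving user variables, which is the main risk of this technique.

```python
def first_per_group(rows):
    """Mimic the @-variable query. Rows must already be sorted by the
    partition columns, then by version descending."""
    type_lag = build_lag = config_lag = None
    rn = 0
    kept = []
    for type_id, build_id, config_id, major, minor, patch in rows:
        # same partition as the previous row? keep counting, else restart at 1
        if (type_id, build_id, config_id) == (type_lag, build_lag, config_lag):
            rn += 1
        else:
            rn = 1
        type_lag, build_lag, config_lag = type_id, build_id, config_id
        if rn == 1:  # the outer "where rn = 1"
            kept.append((type_id, build_id, config_id, major, minor, patch))
    return kept


apps = [(1, 1, 1, 1), (1, 1, 1, 2), (2, 2, 3, 3), (2, 2, 3, 4)]
versions = {1: (1, 0, 1), 2: (2, 0, 0), 3: (3, 0, 3), 4: (4, 0, 0)}
joined = [(t, b, c) + versions[v] for t, b, c, v in apps]
# ORDER BY type_id, build_id, config_id, major DESC, minor DESC, patch DESC
joined.sort(key=lambda r: (r[0], r[1], r[2], -r[3], -r[4], -r[5]))
print(first_per_group(joined))
```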

What is SQL to select a property and the max number of occurrences of a related property?

I have a table like this:
Table: p
+----------------+
| id | w_id |
+---------+------+
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 6 | 5 |
| 6 | 8 |
| 6 | 10 |
| 6 | 10 |
| 7 | 8 |
| 7 | 10 |
+----------------+
What is the best SQL to get the following result?
+-----------------------------+
| id | most_used_w_id |
+---------+-------------------+
| 5 | 8 |
| 6 | 10 |
| 7 | 8 |
+-----------------------------+
In other words, I want the most frequent related w_id for each id.
Note that in the example above, id 7 is related to 8 once and to 10 once,
so either (7, 8) or (7, 10) will do as a result. If it is not possible to
pick one, then having both (7, 8) and (7, 10) in the result set is fine.
I have come up with something like:
select counters2.p_id as id, counters2.w_id as most_used_w_id
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters2
join (
select p_id, max(count_of_w_ids) as max_counter_for_w_ids
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters
group by p_id
) as p_max
on p_max.p_id = counters2.p_id
and p_max.max_counter_for_w_ids = counters2.count_of_w_ids
;
But I am not at all sure whether this is the best way to do it, and I had to repeat the same subquery twice.
Any better solution?
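If you can use MySQL 8.0 (or MariaDB 10.2+), a common table expression removes the repeated subquery from your query without changing its logic. The sketch below runs that CTE form with Python's sqlite3 module on the sample data; on MySQL 5.x you would still need to repeat the subquery or materialize it into a temporary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (id INTEGER, w_id INTEGER)")
conn.executemany("INSERT INTO p VALUES (?, ?)", [
    (5, 8), (5, 10), (5, 8), (5, 10), (5, 8),
    (6, 5), (6, 8), (6, 10), (6, 10),
    (7, 8), (7, 10),
])

MOST_USED_SQL = """
WITH counters AS (
    SELECT id AS p_id, w_id, COUNT(w_id) AS count_of_w_ids
    FROM p
    GROUP BY id, w_id
)
SELECT c.p_id AS id, c.w_id AS most_used_w_id
FROM counters c
JOIN (SELECT p_id, MAX(count_of_w_ids) AS max_count
      FROM counters
      GROUP BY p_id) m
  ON m.p_id = c.p_id AND m.max_count = c.count_of_w_ids
ORDER BY id, most_used_w_id
"""
most_used = conn.execute(MOST_USED_SQL).fetchall()
print(most_used)
```

As with the original query, ties are kept, so id 7 appears with both 8 and 10.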
Try using user-defined variables:
select id,w_id
FROM
( select T.*,
if(@id<>id,1,0) as row,
@id:=id FROM
(
select id,W_id, Count(*) as cnt FROM p Group by ID,W_id
) as T,(SELECT @id:=0) as T1
ORDER BY id,cnt DESC
) as T2
WHERE Row=1
Formal SQL
In fact, your solution is correct in terms of standard SQL, because you have to join values from the original data back to the grouped data; thus your query cannot really be simplified. MySQL allows mixing non-grouped columns with aggregate functions, but the values returned for the non-grouped columns are indeterminate, so I do not recommend relying on that behavior.
MySQL
Since you're using MySQL, you can use variables. I'm not a big fan of them, but for your case they may be used to simplify things:
SELECT
c.*,
IF(@id!=id, @i:=1, @i:=@i+1) AS num,
@id:=id AS gid
FROM
(SELECT id, w_id, COUNT(w_id) AS w_count
FROM p
GROUP BY id, w_id
ORDER BY id DESC, w_count DESC) AS c
CROSS JOIN (SELECT @i:=-1, @id:=-1) AS init
HAVING
num=1;
So for your data result will look like:
+------+------+---------+------+------+
| id | w_id | w_count | num | gid |
+------+------+---------+------+------+
| 7 | 8 | 1 | 1 | 7 |
| 6 | 10 | 2 | 1 | 6 |
| 5 | 8 | 3 | 1 | 5 |
+------+------+---------+------+------+
This gives you each id with its corresponding w_id. The idea is to count the rows within each id and enumerate them, relying on the ordering done in the subquery; we then need only the first row per id, because it represents the w_id with the highest count.
This could be replaced with a single GROUP BY id, but, again, the server is free to choose any row in that case (in practice it takes the first row, but the documentation guarantees nothing about that in general).
One nice thing about this approach is its flexibility: you can select, for example, the 2nd or 3rd most frequent w_id by changing num=1.
Performance
To improve performance, you can create an index on (id, w_id), which can be used for the grouping. The variables and HAVING, however, will still produce a row-by-row scan of the set derived by the inner GROUP BY. That isn't as bad as a full scan of the original data, but it is still a drawback of doing this with variables. On the other hand, the JOIN-and-subquery approach in your query won't be much different, because it also creates a temporary table for the subquery result set.
But to be certain, you'll have to test. And keep in mind that you already have a valid solution, which, by the way, isn't tied to DBMS-specific features and is sound standard SQL.
Try this query
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having max(ccc)
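Here is the SQLFiddle link.
You can also use the following code if you do not want to rely on the first record of the non-grouped columns. Note, however, that both queries still refer to non-aggregated columns, so they depend on MySQL's nonstandard GROUP BY handling and are rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5).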
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having ccc=max(ccc);