MySQL Natural Sort (like OSX Finder) - mysql

I've searched for this for a long time, but the solutions I've found aren't working as I need.
Let me explain: I have a table containing a couple of thousand products, each one with an alphanumeric SKU, which is also used for sorting.
This SKU consists of:
Category Code (variable number of alphabetic characters),
Product Number (integer),
Product Model Variation (optional, variable number of alphabetic characters)
For example: MANT 12 CL (without spaces)
Now, I need to get them ordered like this (and if these were filenames, OSX Finder would order them perfectly):
MANT1
MANT2
MANT2C
MANT2D
MANT2W
MANT3
MANT4C
MANT9
MANT12
MANT12C
MANT12CL
MANT12P
MANT13
MANT21
MANT24
MANT24D
MANT29
Of course ORDER BY sku is plainly wrong:
MANT1
MANT12
MANT12C
MANT12CL
MANT12P
MANT13
MANT2
MANT21
MANT24
MANT24D
MANT29
MANT2C
MANT2D
MANT2W
MANT3
MANT4C
MANT9
And ORDER BY LENGTH(sku), sku has problems sorting the model variations:
MANT1
MANT2
MANT3
MANT9
MANT12
MANT13
MANT21
MANT24
MANT29
MANT2C
MANT2D
MANT2W
MANT4C
MANT12C
MANT12P
MANT24D
MANT12CL
So, is there a way to sort this stuff like Finder would?
(Also, once sorted, is there a way to get the next and previous product? I don't mind using several queries: at this point elegance is the last of my problems...)
Thanks everybody in advance.
One last thing: during my searches I've found this answer to a similar question
but I have no idea how to use it in PHP, so I don't know if it works and is actually an answer to my question.

Are you using PHP when fetching the data?
If so, try a natural sort function such as PHP's natsort() to sort in memory after the data has been loaded.
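If the sort has to happen in the query itself, one option is to split the SKU into its three parts (category code, product number, variation) and order by each part in turn. What follows is only a sketch: it assumes MySQL 8.0 or later (for REGEXP_SUBSTR and REGEXP_REPLACE), and `products` is a made-up table name since the question does not give one.

```sql
-- Assumptions: MySQL 8.0+; `products` is a hypothetical table with a `sku` column.
SELECT sku
FROM products
ORDER BY
    REGEXP_SUBSTR(sku, '^[A-Za-z]+'),                -- category code, e.g. MANT
    CAST(REGEXP_SUBSTR(sku, '[0-9]+') AS UNSIGNED),  -- product number, compared as a number
    REGEXP_REPLACE(sku, '^[A-Za-z]+[0-9]+', '');     -- variation, '' when there is none
```

With the SKUs from the question, MANT2 sorts as ('MANT', 2, '') and MANT12CL as ('MANT', 12, 'CL'), which gives the Finder-like order. None of these expressions can use an index, so the whole table is sorted at query time; for a couple of thousand rows that is probably acceptable.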

The order is not 'plainly wrong'; it simply depends on which collation you use. In your case, you might try a binary collation, for example 'latin1_bin'.
The following example shows ORDER BY using COLLATE for UTF8 data:
mysql> SELECT c1 FROM t1 ORDER BY c1;
+------+
| c1 |
+------+
| a1 |
| a12 |
| a13c |
| a2 |
| a21 |
+------+
mysql> SELECT c1 FROM t1 ORDER BY c1 COLLATE 'utf8_bin';
+------+
| c1 |
+------+
| a1 |
| a12 |
| a2 |
| a21 |
| a13c |
+------+
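For the second part of the question (the next and previous product), window functions can look one row backwards and forwards in a sorted result. This is a sketch only, assuming MySQL 8.0 or later and a hypothetical table named `products`; the ORDER BY splits each SKU into category code, product number and variation so that the number is compared numerically.

```sql
-- Assumptions: MySQL 8.0+ (window functions, REGEXP_SUBSTR, REGEXP_REPLACE);
-- `products` is a hypothetical table with a `sku` column.
SELECT prev_sku, next_sku
FROM (
    SELECT
        sku,
        LAG(sku)  OVER w AS prev_sku,   -- NULL for the first product
        LEAD(sku) OVER w AS next_sku    -- NULL for the last product
    FROM products
    WINDOW w AS (
        ORDER BY
            REGEXP_SUBSTR(sku, '^[A-Za-z]+'),
            CAST(REGEXP_SUBSTR(sku, '[0-9]+') AS UNSIGNED),
            REGEXP_REPLACE(sku, '^[A-Za-z]+[0-9]+', '')
    )
) AS ordered
WHERE sku = 'MANT12';
```

With the SKU list from the question this would give MANT9 as the previous product and MANT12C as the next one. The filter has to sit in the outer query: putting it inside would leave a single row for LAG and LEAD to look at.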

Related

MySql Regexp result word part of known word

Been struggling with this for a while.
Is there a way to find all rows in my table where the word in the column 'word' is a part of a search word?
+---------+-----------------+
| id_word | word |
+---------+-----------------+
| 177041 | utvälj |
| 119270 | fonders |
| 39968 | flamländarens |
| 63567 | hänvisningarnas |
| 61244 | hovdansers |
+---------+-----------------+
I want to extract the row 119270, fonders. I want to do this by passing in the word 'plafonders'.
SELECT * FROM words WHERE word REGEXP 'plafonders$'
That query will of course not work in this case, would've been perfect if it had been the other way around.
Does anyone know a solution to this?
SELECT * FROM words WHERE 'plafonders' REGEXP concat(word, '$')
should accomplish what you want. Your regex:
plafonders$
is looking for plafonders at the end of the column. This is looking for everything the column has until its end, e.g. the regexp is fonders$ for 119270.
See https://regex101.com/r/Ytb3kg/1/ compared to https://regex101.com/r/Ytb3kg/2/.
MySQL's REGEXP does not handle accented letters very well. Perhaps it will work OK in your limited situation.
Here's a slightly faster approach (though it still requires a table scan):
SELECT * FROM words
WHERE RIGHT('PLAutvälj', CHAR_LENGTH(word)) = word;
(To check the accents, I picked a different word from your table.)
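Another way to write the same suffix test is with LIKE, which avoids regular expressions altogether. A small sketch against the `words` table from the question; it assumes the stored words never contain the LIKE wildcards % or _, since those would be interpreted as patterns.

```sql
-- Matches every row whose `word` is a suffix of the search term.
-- Assumption: no stored word contains % or _ (both are LIKE wildcards).
SELECT id_word, word
FROM words
WHERE 'plafonders' LIKE CONCAT('%', word);
```

For the sample rows this should return only 119270, fonders. Like the other variants it still needs a full table scan, and a row with an empty `word` would match any search term.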

Mysql query like number greater than x

I have a field for comments used to store the title of the item sold on the site as well as the bid number (bid_id). Unfortunately, the bid_id is not stored on its own in that table.
I want to query items that have a number (the bid_id) greater than 4,000 for example.
So, what I have is:
select * from mysql_table_name where comment like '< 4000'
I know this won't work, but I need something similar that works.
Thanks a lot!
Just get your bid_id column cleaned up. Then index it.
create table `prior`
( id int auto_increment primary key,
comments text not null
);
insert `prior` (comments) values ('asdfasdf adfas d d 93827363'),('mouse cat 12345678');
alter table `prior` add column bid_id int; -- add a nullable int column
select * from `prior`; -- bid_id is null atm btw
update `prior` set bid_id=right(comments,8); -- this will auto-cast to an int
select * from `prior`;
+----+-----------------------------+----------+
| id | comments | bid_id |
+----+-----------------------------+----------+
| 1 | asdfasdf adfas d d 93827363 | 93827363 |
| 2 | mouse cat 12345678 | 12345678 |
+----+-----------------------------+----------+
Create the index:
CREATE INDEX `idxBidId` ON `prior` (bid_id); -- or unique index
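Once the column is populated and indexed, the original question becomes a plain numeric comparison. A short sketch using the `prior` table from above; whether the optimizer actually picks `idxBidId` depends on the data, so EXPLAIN is included to check rather than assume.

```sql
-- Plain numeric filter on the extracted column.
SELECT * FROM `prior` WHERE bid_id > 4000;

-- Shows whether the index on bid_id is considered for this query.
EXPLAIN SELECT * FROM `prior` WHERE bid_id > 4000;
```

With the two sample rows both bid_ids are above 4000, so both rows come back.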
select * from mysql_table_name where cast(substring(comment, start, length) as signed) < 4000
This will work (start and length stand for the position and size of the number inside your comments), but I suggest creating a new column, putting the bid value in it, and comparing against that.
To fill the new column you can use
update mysql_table_name set newcol = substring(comment, start, length)
Hope this helps.
There is nothing ready that works like that.
You could write a custom function or loadable UDF, but it would be significant work, with a significant impact on the database. Then you could run WHERE GET_BID_ID(comment) < 4000.
What you can do more easily is devise some way of extracting the bid_id using available string functions.
For example if the bid_id is always in the last ten characters, you can extract those, and replace all characters that are not digits with nil. What is left is the bid_id, and that you can compare.
Of course you need a complex expression with LENGTH(), SUBSTRING(), and REPLACE(). If the bid_id is between easily recognizable delimiters, then perhaps SUBSTRING_INDEX() is more your friend.
But better still... add an INTEGER column, initialize it to null, then store there the extracted bid_id. Or zero if you're positive there's no bid_id. Having data stored in mixed contexts is evil (and a known SQL antipattern to boot). Once you have the column available, you can select every few seconds a small number of items with new_bid_id still NULL and subject those to extraction, thereby gradually amending the database without overloading the system.
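The gradual backfill could look roughly like this. Everything named here is an assumption made for illustration: the table `items`, the text column `comment`, the new column `new_bid_id`, the batch size of 500, and above all the idea that the bid_id is the last space-separated token of the comment.

```sql
-- Hypothetical names: table `items`, text column `comment`, new column `new_bid_id`.
ALTER TABLE items ADD COLUMN new_bid_id INT UNSIGNED NULL;

-- Run repeatedly (for example from a scheduled job) until 0 rows are affected.
UPDATE items
SET new_bid_id = CASE
        -- last token is 1 to 9 digits, so it always fits in INT UNSIGNED
        WHEN SUBSTRING_INDEX(`comment`, ' ', -1) REGEXP '^[0-9]{1,9}$'
            THEN CAST(SUBSTRING_INDEX(`comment`, ' ', -1) AS UNSIGNED)
        ELSE 0   -- no bid_id found in this row
    END
WHERE new_bid_id IS NULL
LIMIT 500;
```

Each run touches at most 500 rows that have not been processed yet, so the work is spread out instead of locking the table in one large update. Rows without a recognizable bid_id get 0, which keeps them from being picked up again.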
In practice
This is the same approach one would use with more complicated cases. We start by checking what we have (this is a test table)
SELECT commento FROM arti LIMIT 3;
+-----------------------------------------+
| commento |
+-----------------------------------------+
| This is the first comment 100 200 42500 |
| Another 7 Q 32768 |
| And yet another 200 15 55332 |
+-----------------------------------------+
So we need the last characters:
SELECT SUBSTRING(commento, LENGTH(commento)-5) FROM arti LIMIT 3;
+-----------------------------------------+
| SUBSTRING(commento, LENGTH(commento)-5) |
+-----------------------------------------+
| 42500 |
| 32768 |
| 55332 |
+-----------------------------------------+
This looks good but it is not: SUBSTRING is 1-based, so starting at LENGTH(commento)-5 returns six characters, and there's an extra space left before the ID. No matter; we just use 4.
...and we're done.
mysql> SELECT commento FROM arti WHERE SUBSTRING(commento, LENGTH(commento)-4) < 40000;
+-------------------+
| commento |
+-------------------+
| Another 7 Q 32768 |
+-------------------+
mysql> SELECT commento FROM arti WHERE SUBSTRING(commento, LENGTH(commento)-4) BETWEEN 35000 AND 55000;
+-----------------------------------------+
| commento |
+-----------------------------------------+
| This is the first comment 100 200 42500 |
+-----------------------------------------+
The problem is if you have a number not of the same length (e.g. 300 and 131072). Then you need to take a slice large enough for the larger number, and if the number is short, you will get maybe "1 5 300" in your slice. That's where SUBSTRING_INDEX comes to the rescue: by capturing seven characters, from " 131072" to "1 5 300", the ID will always be in the last space separated token of the slice.
In this last case, when the numbers are not of the same length, you will find another problem. The extracted IDs are not numbers at all: to MySQL, they are strings. This means that when they are compared with each other or sorted, they are handled in lexicographic, not numerical, order; "17534" is considered smaller than "202", just like "Alice" comes before "Bob". To overcome this you need to cast the string to an unsigned integer, which further slows down the operations.
WHERE CAST( SUBSTRING(...) AS UNSIGNED) < 4000
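To make the difference concrete, here is a small sketch using the `arti` test table from above. It assumes the ID is always the last space-separated token and that comments have no trailing spaces.

```sql
-- String comparison is lexicographic: '1' sorts before '2'.
SELECT '17534' < '202';                                      -- 1 (true)

-- After casting, the comparison is numeric.
SELECT CAST('17534' AS UNSIGNED) < CAST('202' AS UNSIGNED);  -- 0 (false)

-- Last space-separated token of the comment, compared as a number.
SELECT commento
FROM arti
WHERE CAST(SUBSTRING_INDEX(commento, ' ', -1) AS UNSIGNED) < 40000;
```

Against the three test rows the last query returns only "Another 7 Q 32768", the same result as the fixed-width version, but it keeps working when the IDs have different lengths.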

MySQL - GROUP_CONCAT if value is not a substring

I have a column called "Permissions" in my table. The permissions are strings which can be:
"r","w","x","rw","wx","rwx","xwr"
etc. Please note the order of characters in the string is not fixed. I want to GROUP_CONCAT() on the "Permissions" column of my table. However this causes very large strings.
Example: "r","wr","wx" group concatenated is "r,wr,wx" but should be "r,w,x" or "rwx". Using distinct() clause doesn't seem to help much. I am thinking that if I could check if a permission value is a substring of the other column then I should not concatenate it, but I don't seem to find a way to accomplish that.
Any column-based approach using solely string functions would also be appreciated.
EDIT:
Here is some sample data:
+---------+
| perm |
+---------+
| r,x,x,r |
| x |
| w,rw |
| rw |
| rw |
| x |
| w |
| x,x,r |
| r,x |
+---------+
The concatenated result should be:
+---------+
| perm |
+---------+
| r,w,x |
+---------+
I don't have control over the source of data and would like not to create new tables ( because of restricted privileges and memory constraints). I am looking for a post-processing step that converts each column value to the desired format.
A good idea would be to first normalize your data.
You could, for example try this way (I assume your source table is named Files):
Create simple table called PermissionCodes with only column named Code (type of string).
Put r, w, and x as values into PermissionCodes (three rows total).
In a subquery join Files to PermissionCodes on a condition that Code exists as a substring in Permissions.
Perform your GROUP_CONCAT aggregation on the result of the subquery.
If it is the case here that, for the same logical entries in Files, there exist multiple permission sets that overlap (i.e. for some file there is a row with rw and another row with w), then you would limit your subquery to distinct combinations of Files' keys and Code.
Here's a fiddle to demonstrate the idea:
http://sqlfiddle.com/#!9/6685d6/4
You can try something like:
SELECT user_id, GROUP_CONCAT(DISTINCT perm)
FROM Permissions AS p
INNER JOIN (SELECT 'r' AS perm UNION ALL
SELECT 'w' UNION ALL
SELECT 'x') AS x
ON p.permission LIKE CONCAT('%', x.perm, '%')
GROUP BY user_id
You can include any additional permission code in the UNION ALL of the derived table used to JOIN with Permissions table.
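For the sample data in the question, which has no user_id and should collapse to a single row, the same idea can be written without any GROUP BY and without creating a table. A sketch, assuming the data sits in a table called `perms` (a made-up name) with the `perm` column shown above:

```sql
-- Hypothetical table `perms` holding the `perm` column from the sample data.
SELECT GROUP_CONCAT(DISTINCT c.code ORDER BY c.code SEPARATOR ',') AS perm
FROM perms AS p
INNER JOIN (SELECT 'r' AS code UNION ALL
            SELECT 'w' UNION ALL
            SELECT 'x') AS c
        ON LOCATE(c.code, p.perm) > 0;   -- code occurs anywhere in the row's value
```

Each row is matched against every single-letter code it contains, and DISTINCT then keeps one copy of each code, so the sample data yields r,w,x. The derived table lives only for the duration of the query, which fits the restriction on creating new tables.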

MySQL Fulltext search present me inaccurate result

Let's say that I have a database that looks like this (MyISAM):
+------------+-------------------+------------------+
| student_id | student_firstname | student_lastname |
+------------+-------------------+------------------+
| 30 | Patrik | Andersson |
| 79 | Patrik | Svensson |
+------------+-------------------+------------------+
And I perform this query:
SELECT s.student_firstname, s.student_lastname FROM students s
WHERE MATCH (student_firstname, student_lastname)
AGAINST
('+Patrik Svensson*' IN BOOLEAN mode)
This generates both of the above rows. Why do I not get 1 row in my result? Is it because the last three letters in the student_lastname are the same? Is there any way to make FULLTEXT more precise?
Have you tried reading the MySQL documentation?
http://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html
And I quote:
By default (when neither + nor - is specified) the word is optional,
but the rows that contain it are rated higher.
And:
'+apple macintosh'
Find rows that contain the word “apple”, but rank rows higher if they
also contain “macintosh”.
I have tested it; this query gives the right result:
SELECT s.student_firstname, s.student_lastname FROM students s
WHERE MATCH (student_firstname, student_lastname)
AGAINST
('+Patrik +Svensson*' IN BOOLEAN mode)
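To see why the original query returned both rows, it can help to select the relevance value that MATCH computes. A sketch against the `students` table from the question; the alias `score` is only a name chosen for the example.

```sql
-- 'Patrik' is required (+), 'Svensson*' is optional and only affects the score.
SELECT s.student_firstname, s.student_lastname,
       MATCH (student_firstname, student_lastname)
       AGAINST ('+Patrik Svensson*' IN BOOLEAN MODE) AS score
FROM students s
WHERE MATCH (student_firstname, student_lastname)
      AGAINST ('+Patrik Svensson*' IN BOOLEAN MODE)
ORDER BY score DESC;
```

Both students match because only Patrik is mandatory; per the documentation quoted above, the row that also contains Svensson should be rated higher. Boolean mode does not sort by relevance on its own, hence the explicit ORDER BY.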

Max occurrences of a given value in a table

I have a table (pretty big one) with lots of columns, two of them being "post" and "user".
For a given "post", I want to know which "user" posted the most.
I was first thinking about getting all the entries WHERE (post='wanted_post') and then throw a PHP hack to find which "user" value I get the most, but given the large size of my table, and my poor knowledge of MySQL subtle calls, I am looking for a pure-MySQL way to get this value (the "user" id that posted the most on a given "post", basically).
Is it possible ? Or should I fall back on the hybrid SQL-PHP solution ?
Thanks,
Cystack
It sounds like this is what you want... am I missing something?
SELECT user
FROM myTable
WHERE post='wanted_post'
GROUP BY user
ORDER BY COUNT(*) DESC
LIMIT 1;
EDIT: Explanation of what this query does:
Hopefully the first three lines make sense to anyone familiar with SQL. It's the last three lines that do the fun stuff.
GROUP BY user -- This collapses rows with identical values in the user column. If this was the last line in the query, we might expect output something like this:
+-------+
| user  |
+-------+
| bob   |
| alice |
| joe   |
+-------+
ORDER BY COUNT(*) DESC -- COUNT(*) is an aggregate function that works along with the previous GROUP BY clause. It tallies all of the rows that are "collapsed" by the GROUP BY for each user. It might be easier to understand what it's doing with a slightly modified statement and its potential output:
SELECT user,COUNT(*)
FROM myTable
WHERE post='wanted_post'
GROUP BY user;
+-------+-------+
| user  | count |
+-------+-------+
| bob   |     3 |
| alice |     1 |
| joe   |     8 |
+-------+-------+
This is showing the number of posts per user.
However, it's not strictly necessary to actually output the value of an aggregate function in this case--we can just use it for the ordering, and never actually output the data. (Of course if you want to know how many posts your top-poster posted, maybe you do want to include it in your output, as well.)
The DESC keyword tells the database to sort in descending order, rather than the default of ascending order.
Naturally, the sorted output would look something like this (assuming we leave the COUNT(*) in the SELECT list):
+-------+-------+
| user  | count |
+-------+-------+
| joe   |     8 |
| bob   |     3 |
| alice |     1 |
+-------+-------+
LIMIT 1 -- This is probably the easiest to understand, as it just limits how many rows are returned. Since we're sorting the list from most-posts to fewest-posts, and we only want the top poster, we just need the first result. If you wanted the top 3 posters, you might instead use LIMIT 3.
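Putting those variations together, here is a sketch that returns the top three posters along with their counts. The alias `post_count`, the tie-break on user, and the index name `idx_post_user` are additions for illustration and not part of the original answer.

```sql
-- Top 3 posters for one post, with their counts.
SELECT `user`, COUNT(*) AS post_count
FROM myTable
WHERE post = 'wanted_post'
GROUP BY `user`
ORDER BY post_count DESC, `user` ASC   -- tie-break so the result is deterministic
LIMIT 3;

-- On a large table, a composite index may let MySQL answer this from the index alone.
CREATE INDEX idx_post_user ON myTable (post, `user`);
```

Without the tie-break, two users with the same count can come back in either order, which matters when LIMIT cuts the list off.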