Why is this select ordering not working as expected? - mysql

My select statement is
SELECT styleNo , right( concat( '######', trim( styleNo ) ) , 6 ) AS keyn
FROM `styles` WHERE 1 ORDER BY keyn
StyleNo is varchar(6) and contains mostly numeric strings from 3 to 6 characters long, some of which include a leading or trailing capital letter. The aim is to show the styles in magnitude order.
I want 100 to 999 first, then A00 to Z99, then 1000 to 9999 and A000 to Z999 etc.
I get these results.
101 ###101
180 ###180
105 ###105
104 ###104
102 ###102
123 ###123
124 ###124
432 ###432
1004 ##1004
1001 ##1001
1002 ##1002
1003 ##1003
1006 ##1006
1234 ##1234
1231 ##1231
1255 ##1255
1288 ##1288
2005 ##2005
2006 ##2006
2007 ##2007
2008 ##2008
Why are the results not in ascending order? keyn is what I expect.
I am using MySQL 5.1.36-community with UTF-8 character sets everywhere. None of the styleNo values shown contain anything other than the characters '0' to '9'.
Some further information: I changed the sql statement to
SELECT styleNo , right( concat(repeat('#',5), trim( styleNo ) ) , 6 ) AS keyn
FROM `styles` ORDER BY keyn ASC
and tried with different values for the 5. 3 and larger should all produce suitable values for keyn.
However, 3, 4, 5, 6 and 7 all produce different orders of the results, all wrong.
A value of 8 works for the first 75 lines in my test file.
Without understanding why, I don't feel confident this is correct.

It seems you want to order by the length of the (trimmed) field, so you could also say:
SELECT styleNo FROM `styles`
WHERE 1
ORDER BY LENGTH(TRIM(`styleNo`)),TRIM(`styleNo`)
potentially saving you a headache as well when styleNo is changed one day to be longer than 6 chars.
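To see why ordering by length first and then by value gives the "magnitude" order asked for, here is a minimal Python sketch of the same two-part sort key. The sample values and the function name `style_sort_key` are made up for illustration. Python compares strings by code point and `strip()` removes all whitespace, so this only approximates what a MySQL collation and `TRIM` would do.

```python
def style_sort_key(style_no: str) -> tuple[int, str]:
    """Mirror ORDER BY LENGTH(TRIM(styleNo)), TRIM(styleNo)."""
    trimmed = style_no.strip()
    # Shorter codes sort first; codes of equal length sort as plain strings.
    return (len(trimmed), trimmed)


# Hypothetical sample data, not taken from the real styles table.
styles = ["1004", "180", "A12", "101", "Z999", "2005", "B07", "1001"]
ordered = sorted(styles, key=style_sort_key)
print(ordered)
# ['101', '180', 'A12', 'B07', '1001', '1004', '2005', 'Z999']
```

No padding character is involved, so the result does not depend on how the collation treats '#'.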

Related

Get the average of values in every specific epoch ranges in unix timestamp which returns -1 in specific condition in MySQL

I have a MySQL table which has some records as follows:
unix_timestamp value
1001 2
1003 3
1012 1
1025 5
1040 0
1101 3
1105 4
1130 0
...
I want to compute the average for every 10 epochs to see the following results:
unix_timestamp_range avg_value
1001-1010 2.5
1011-1020 1
1021-1030 5
1031-1040 0
1041-1050 -1
1051-1060 -1
1061-1070 -1
1071-1080 -1
1081-1090 -1
1091-1100 -1
1101-1110 3.5
1111-1120 -1
1121-1130 0
...
I saw some similar answers, but they are not a solution for my specific question. How can I get the above results?
The easiest way to do this is to use a calendar table. Consider this approach:
SELECT
CONCAT(CAST(cal.ts AS CHAR(50)), '-', CAST(cal.ts + 9 AS CHAR(50))) AS unix_timestamp_range,
CASE WHEN COUNT(t.value) > 0 THEN AVG(t.value) ELSE -1 END AS avg_value
FROM
(
SELECT 1001 AS ts UNION ALL
SELECT 1011 UNION ALL
SELECT 1021 UNION ALL
...
) cal
LEFT JOIN yourTable t
ON t.unix_timestamp BETWEEN cal.ts AND cal.ts + 9
GROUP BY
cal.ts
ORDER BY
cal.ts;
In practice, if you have the need to do this sort of query often, instead of the inline subquery labelled as cal above, you might want to have a full dedicated table representing all timestamp ranges.
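The logic of the calendar-table join can be sketched outside the database as well. The following Python snippet is an illustration only: the function name `bucket_averages` and its parameters are assumptions, and the rows are the sample rows from the question. It walks every 10-wide range between a start and an end value and reports -1 for ranges with no rows, which is what the LEFT JOIN plus CASE expression does.

```python
from collections import defaultdict


def bucket_averages(rows, start, end, width=10):
    """Average `value` per fixed-width range; empty ranges yield -1."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in rows:
        if start <= ts <= end:
            # Lower bound of the range this timestamp falls into.
            bucket = start + ((ts - start) // width) * width
            sums[bucket] += value
            counts[bucket] += 1

    result = []
    # Iterating over every range plays the role of the calendar table.
    for lo in range(start, end + 1, width):
        n = counts.get(lo, 0)
        avg = sums[lo] / n if n else -1
        result.append((f"{lo}-{lo + width - 1}", avg))
    return result


rows = [(1001, 2), (1003, 3), (1012, 1), (1025, 5),
        (1040, 0), (1101, 3), (1105, 4), (1130, 0)]
averages = bucket_averages(rows, start=1001, end=1130)
for label, avg in averages:
    print(label, avg)
```

The important design point is the same as in the SQL: the list of ranges is generated independently of the data, so gaps in the data still show up in the output.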

MySQL WHERE RAND() Behaviour

Consider the following table foo:
a b v
0 9 1
10 19 2
20 29 3
30 39 4
40 49 5
50 59 6
60 69 7
70 79 8
80 89 9
90 100 10
a and b are the lower and upper boundaries of a certain value v. Example:
x = 79 => v = 8
This can be done with the following statement:
SELECT `v`
FROM `foo`
WHERE 79 BETWEEN `a` AND `b`
Which MySQL correctly returns:
v
8
For each number between 0 and 100 provided as input, MySQL will correctly return one, and only one number between 1 and 10.
Issue
However, if the input number is substituted with a random number generator, the behaviour is somewhat different:
SELECT `v`
FROM `foo`
WHERE ROUND(RAND()*100) BETWEEN `a` AND `b`
Instead of returning only one number, MySQL might return anything from empty result set up to 3 numbers!
Question 1
Is this the expected behaviour of the RAND() statement? What is the reasoning for this apparently weird behavior?
Question 2
Considering the intended purpose, is the statement correct? What would be the correct one? How to correct this behaviour?
An ANSI-SQL-friendly solution with a single query would look like
SELECT x.rnd, `v`
FROM yourTable y
INNER JOIN (SELECT RAND()*100 rnd) x
WHERE x.rnd BETWEEN y.`a` AND y.`b`;
It generates the random value just once and then uses it in the joined query.
Demo: http://rextester.com/YOIW49684
The query and the base table are kindly borrowed from Tim Biegeleisen
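The difference between the two behaviours can be simulated in a few lines of Python. This is a sketch under the assumption that the WHERE clause calls RAND() again for every row it tests, which is what the original query does in MySQL. The names `lookup_per_row` and `lookup_once` are invented for the illustration.

```python
import random

# (a, b, v) rows of the table foo from the question.
FOO = [(0, 9, 1), (10, 19, 2), (20, 29, 3), (30, 39, 4), (40, 49, 5),
       (50, 59, 6), (60, 69, 7), (70, 79, 8), (80, 89, 9), (90, 100, 10)]


def lookup_per_row(rng):
    """A fresh random number is drawn for each row being tested."""
    return [v for a, b, v in FOO if a <= round(rng.random() * 100) <= b]


def lookup_once(rng):
    """The random number is drawn once and reused for every row."""
    x = round(rng.random() * 100)
    return [v for a, b, v in FOO if a <= x <= b]


rng = random.Random(0)  # fixed seed so the run is repeatable
per_row_sizes = {len(lookup_per_row(rng)) for _ in range(1000)}
once_sizes = {len(lookup_once(rng)) for _ in range(1000)}
print("per row:", sorted(per_row_sizes))
print("once:   ", sorted(once_sizes))
```

With a draw per row, each row matches independently of the others, so zero, one, or several rows can pass the filter. With a single draw, exactly one row matches every time.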
If you want the same random value to be applied to every row of the query, then one option is to use a session variable:
SET @rnd = RAND()*100;
SELECT v
FROM foo
WHERE ROUND(@rnd) BETWEEN a AND b;

Custom number sequence formatting

The system I am working with has a numbering system where the numbers 0-999 are represented by the usual 0-999, but 1000 is represented by A00, followed by A01, A02, A03, etc, 1100 being B00 etc.
I can't think of a way to handle this in T-SQL without resorting to inspecting individual digits with huge case statements, and there must be a better way than that. I had thought about using Hexadecimal but that's not right.
DECLARE @startint int = 1,
@endint int = 9999;
;WITH numbers(num)
AS
(
SELECT @startint AS num
UNION ALL SELECT num+1 FROM numbers
WHERE num+1 <= @endint
)
SELECT num, convert(varbinary(8), num) FROM [numbers] N
OPTION(MAXRECURSION 0)
With this 999 is now 3E7, where it should just be 999.
This currently produces this:
Number Sequence
0 0x00000000
1 0x00000001
...
10 0x0000000A
...
100 0x00000064
...
999 0x000003E7
1000 0x000003E8
What I'm looking for:
Number Sequence
0 000
1 001
...
10 010
11 011
12 012
...
999 999
1000 A00
1001 A01
...
1099 A99
1100 B00
1101 B01
1200 C00
I need this to work in SQL Server 2008.
You can use integer division and modulo to separate the hundreds part from the tens.
After that, you can add 64 to the quotient to get an ASCII value starting from A.
create function dbo.fn_NumToThreeLetters(@num integer)
RETURNS nchar(3)
AS
begin
RETURN (SELECT (case
when @num/1000 >0 then
CHAR(( (@num-900)/100) +64)
+ right('0' + cast( @num %100 as nvarchar(2)), 2)
else right('00' + cast(@num as nvarchar(3)), 3)
end)
)
END
select dbo.fn_NumToThreeLetters(1100)
-------
B00
select dbo.fn_NumToThreeLetters(999)
-------
999
The first when clause ensures that the conversion is applied only if the number is 1000 or above. If it is, subtract 900 then divide by 100, so we get a number that starts from 1 for 1000, 2 for 1100, etc.
Add 64 to it to get an ASCII code starting from A and convert it back to a character with CHAR.
The remainder just needs to be converted to a string and left-padded with 0 to two digits. Numbers below 1000 are left-padded with 0 to three digits in the same way.
This will work only up to 3599 (Z99). The question doesn't specify what should be done with larger numbers.
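The arithmetic is easy to check outside SQL Server. Here is a small Python sketch of the same mapping. The function name `num_to_three_chars` is made up, and raising an error above 3599 is an assumption, since the question does not say what should happen there.

```python
def num_to_three_chars(num: int) -> str:
    """Map 0-999 to '000'-'999' and 1000-3599 to 'A00'-'Z99'."""
    if not 0 <= num <= 3599:
        # Assumption: values outside the representable range are rejected.
        raise ValueError("only 0..3599 can be represented")
    if num < 1000:
        return f"{num:03d}"
    # 1000-1099 -> 1 -> 'A', 1100-1199 -> 2 -> 'B', and so on.
    letter = chr((num - 900) // 100 + 64)
    return f"{letter}{num % 100:02d}"


for n in (0, 10, 999, 1000, 1005, 1099, 1100, 1200, 3599):
    print(n, num_to_three_chars(n))
```

The case 1005 is worth looking at: the remainder 5 has to become "05", so the padding must go on the left.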

"ORDER BY 'column_name' DESC", but numbers before letters requires two statements?

I want to select data in my database and order it alphabetically but numbers first and then letters.
Sample of my table:
watch_date
474 1442437 2181257 2015-12-20 Modern Family 3 13 Little Bo Bleep
475 1442437 2189485 2015-12-21 Modern Family 3 14 Me? Jealous?
476 1442437 2209159 2015-12-22 Modern Family 3 15 Aunt Mommy
477 1442437 2237415 2015-12-22 Modern Family 3 16 Virgin Territory
478 1442437 2244103 2015-12-22 Modern Family 3 17 Leap Day
479 1442437 2239817 2015-12-24 Modern Family 3 18 Send Out the Clowns
480 1442437 2305713 2015-12-24 Modern Family 3 19 Election Day
481 1442437 2305715 2016-01-02 Modern Family 3 20 The Last Walt
482 1442437 2247703 2016-01-03 Modern Family 3 21 Planes, Trains and Cars
483 1553656 2656658 unknown Under the Dome 1 1 Pilot
484 1553656 2779640 unknown Under the Dome 1 2 The Fire
485 1553656 2821206 unknown Under the Dome 1 3 Manhunt
I want to order it using the fourth column (the column with dates and the text 'unknown' (it's called 'watch_date'))
The data in this column is either the text unknown or a date with the format YYYY-MM-DD
To do this I use the following query:
SELECT * FROM `watched_episodes` ORDER BY `watch_date` + 0 DESC, `watch_date` DESC
I thought the + 0 part added a 0 to all results, yet when I do not add the second ORDER BY statement:
`watch_date` DESC
it does put the numbers (dates) before the letters but it does not seem to sort the dates at all.
Hmmm, your query is simply wrong. It should be giving an error because of the use of single quotes. I'll assume you intend:
SELECT *
FROM watched_episodes
ORDER BY watch_date + 0 DESC, watch_date DESC;
This seems like a strange method. If the value is always 'unknown' or a string in YYYY-MM-DD format, I would go for:
ORDER BY (watch_date = 'unknown'), -- USE `DESC` to put `unknown` first
watch_date DESC
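The two-level ordering can be tried out in Python to see what it produces. This is only a sketch with made-up sample values. It relies on the comparison `d == "unknown"` being False (0) for dates and True (1) for the placeholder, the same way the boolean expression in the ORDER BY evaluates to 0 or 1 in MySQL.

```python
# Hypothetical watch_date values in arbitrary order.
dates = ["2015-12-20", "unknown", "2016-01-03",
         "2015-12-24", "unknown", "2016-01-02"]

# Python's sort is stable, so apply the secondary key first (date, newest
# first) and the primary key second (is the value 'unknown'?).
by_date_desc = sorted(dates, reverse=True)
ordered = sorted(by_date_desc, key=lambda d: d == "unknown")

print(ordered)
# ['2016-01-03', '2016-01-02', '2015-12-24', '2015-12-20', 'unknown', 'unknown']
```

Because YYYY-MM-DD strings sort the same way as the dates they represent, no numeric conversion such as `+ 0` is needed.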
Use "union"
SELECT * FROM (
SELECT * FROM `watched_episodes` WHERE `watch_date` != 'unknown'
ORDER BY `watch_date` DESC
) table_alias
UNION SELECT * FROM `watched_episodes` WHERE `watch_date` = 'unknown';
You can order by whatever you want in the subquery and still have the "unknown" rows after the actual dates.
Tip: use DATE as the column type so you can use date functions without converting, and store the unknown values as NULL.

SQL Order By but invert for one element

I've got an SQL query which returns the following
Code int1 int2 int3 S
C12 21 22 14 1
C33 43 56 2 3
C34 23 2 1 3
C55 33 92 12 5
CB56 45 66 10 5
MA10 10 11 12 1
This is the result of using ORDER BY on Code.
However, I do not want it ordered alphabetically,
but by the number after the letters, for example 1 in M1 and 33 in C33.
In some cases the number after the letters may be 3 digits, like E344.
What I want it to look like
Code int1 int2 int3 S
MA10 10 11 12 1
C12 21 22 14 1
C323 43 56 2 3
C325 43 56 2 3
C34 23 2 1 3
C525 33 92 12 5
CB56 45 66 10 5
What I need is
M Should always show on top if Present
Then Sort According to Number on first place
Then Sort it according to the number on the Second place
Then sort it according to the number on the third place
The field 's' will always consist of the first digit from the Code
Step 1: split up the column into 2 columns, one containing the letters, the other containing the numbers:
SELECT
substring(code,0,PATINDEX('%[0-9]%', Code)) as letters,
substring(code,PATINDEX('%[0-9]%', Code)) as numbers,
fields
FROM table
Step 2: convert the numbers to integer
CONVERT(substring(code,PATINDEX('%[0-9]%', Code)),UNSIGNED INTEGER) as numbers
Step 3: sort
Order by field1 asc, field2 desc... etc
It might be easier to use a subquery:
select * from
(SELECT
substring(code,0,PATINDEX('%[0-9]%', Code)) as letters,
CONVERT(substring(code,PATINDEX('%[0-9]%', Code)),UNSIGNED INTEGER) as numbers,
fields
FROM table) T
order by numbers asc, letters desc
This is conceptually simple: you want to order first by whether the code starts or not with the letter 'M', and then by the numeric portion of the code. You say in a comment:
extracting the first digit then order and then extracting the second
digit and then order and then extracting the third digit and then
order
This is exactly how alphabetical order has always worked. You order by first character; when it is the same you order by the second character, etc. so you need no special treatment for that case. Just get the numeric part of the code as a string, then order by it.
At this point, the only problem left to resolve is how to extract the numeric part of the code. That would be easy with PATINDEX() (as Alfons pointed out) but unfortunately MySQL does not support PATINDEX() as far as I know.
Now, what follows is extremely ugly, but it does work. Basically we get the non-numeric part of the string by removing all the numeric characters from it, then use the length of the non-numeric part to extract the numeric part.
SELECT mytable.* FROM mytable
INNER JOIN (
SELECT
code,
RIGHT(code, LENGTH(code) - LENGTH(letters)) numbers
FROM (
SELECT
code,
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
code, 0, ''), 1, ''), 2, ''), 3, ''), 4, '')
, 5, ''), 6, ''), 7, ''), 8, ''), 9, '') letters
FROM mytable
) letters_table
) numbers_table
ON numbers_table.code = mytable.code
ORDER BY (numbers_table.code like 'M%') DESC,
numbers_table.numbers ASC
This solution is probably inefficient. However, I don't think you can get any acceptable efficiency anyway unless you store the numeric part on a separate column that you can index.
As you are guaranteeing that s is the value of the first digit in the code, this can be used to find the start of the code and from there, get the numeric part of the code. As you want to have all the codes starting with 'M' first, this results in the following ORDER BY clause:
... ORDER BY IF(SUBSTR(code, 1, 1) = 'M', 0, 1), SUBSTR(code, LOCATE(s, code))
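As a quick check of the ordering rule itself (codes starting with 'M' first, then the numeric part compared as a string), here is a Python sketch. It uses a regular expression to pull out the digits instead of the nested REPLACE calls or LOCATE(s, code), so it illustrates the sort key rather than either query. The function name `code_sort_key` is invented, and the codes are the ones from the desired output in the question.

```python
import re


def code_sort_key(code: str) -> tuple[int, str]:
    """'M' codes first, then the digits of the code compared as text."""
    match = re.search(r"\d+", code)
    digits = match.group() if match else ""
    starts_with_m = 0 if code.startswith("M") else 1
    # Comparing the digits as a string orders by first digit, then second,
    # then third, which is what the question asks for.
    return (starts_with_m, digits)


codes = ["C12", "C323", "C325", "C34", "C525", "CB56", "MA10"]
ordered = sorted(codes, key=code_sort_key)
print(ordered)
# ['MA10', 'C12', 'C323', 'C325', 'C34', 'C525', 'CB56']
```

Note that "323" sorts before "34" here because the comparison is character by character, not numeric.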