Sorting strings with numbers and text in Rails - mysql

In my database I have table with a name column containing grades, like 1. grade, 2. grade, and so on. When the numbers have reached 10 or more, the sorting doesn't work as I would like, as 10. grade comes before 2. grade in the sorted recordset. I know this is because string sorting is different from integer sorting. The question is how to sort these strings in a numeric way.
Because the grade records are part of a tree built with the ancestry plugin, I have to put the whole sorting code inside :order => "(some code that sorts the results)".
I have tried :order => "CAST(SUBSTRING_INDEX(name, '.') AS SIGNED)". But this doesn't work.
I use SQLite in my development environment and MySQL in the production environment.

Try this:
Replace the constant value '. grade' in your column with an empty string to get the numeric part, then cast that to an integer. (MySQL's CAST takes SIGNED or UNSIGNED as the target type rather than INT.)
order by cast(replace(name,'. grade','') as signed)
EDIT:
As per your comment, if the suffix is not always 'grade', then try
order by cast(left(name,LOCATE('.',name,1)-1) as UNSIGNED)
SQL fiddle demo
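Since the question mentions SQLite in development, where the MySQL functions above may not be available, here is a minimal sketch of the same idea against an in-memory SQLite database. It relies on SQLite's documented behaviour that CAST(text AS INTEGER) reads the longest leading integer prefix and ignores the rest. The table name `grades` and the helper `sorted_grades` are hypothetical, chosen only for illustration.

```python
import sqlite3


def sorted_grades(names):
    """Return the given names ordered by their leading number (SQLite only)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the real one from the question.
    conn.execute("CREATE TABLE grades (name TEXT)")
    conn.executemany("INSERT INTO grades (name) VALUES (?)", [(n,) for n in names])
    # CAST(... AS INTEGER) keeps only the leading integer prefix, e.g. '10. grade' -> 10.
    rows = conn.execute(
        "SELECT name FROM grades ORDER BY CAST(name AS INTEGER)"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(sorted_grades(["10. grade", "2. grade", "1. grade"]))
# ['1. grade', '2. grade', '10. grade']
```

In a Rails :order string the equivalent fragment would be the ORDER BY expression alone; whether one expression can serve both SQLite and MySQL depends on the functions used, so it is worth testing against both.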

Related

MySQL Query conditional find nth element in column string

I have a MySQL table setup where one column's values are a string of comma-separated True/False values (1s or 0s). For example, in the column, one field's value may be "0,1,0,0,0,0,1,1,0" and another may be "1,0,0,1,1,1,0,0,0" (note: these are NOT 9 separate columns, but a string in one column). I need to QUERY the MySQL table for elements that are "true"(1) for the "nth element" of that column's value/string.
So, if I was looking for rows where the 3rd element of that column's value was 1, it would produce a list of results. In this case, I would only be searching for "1" in the 3rd place of the string (X,X,1,X,X,X,X,X,X,X). How can I query this?
This is a crude example of what I am trying to do ...
"SELECT tfcolumn FROM mytable WHERE substr({tfcolumn}, 0, 5)=1"
{tfcolumn} represents the column value
5 represents the 5th position of the string
=1 represents what I need that position to equal to.
Please help. Thanks
You can't. Once you put a serialized data type into a column in SQL (like comma separated lists, or JSON objects) you are preventing yourself from performing any query on the data in those columns. You have to pull the data in a different way and then use a program like python, VB, etc to get the comma separated values you are looking for.
Unless you want to deal with trying to make this mess of a query work...
I would recommend changing your table structure before it's too late. Although it is possible, it is not optimized in a format that a DBMS recognizes. Because of that the DBMS will spend a significant amount of time going through every record to parse the csv values which is something that it was not meant to be doing. Doing the query in SQL will take as much time (if not more time) than just pulling all the records and searching with a tool that can do it properly.
If the column contains values exactly like the ones you posted, then the Nth element is at the 2 * N - 1 position in the comma separated list.
So do this:
SELECT tfcolumn
FROM tablename
WHERE substr(tfcolumn, 2 * 5 - 1, 1) = '1'
Replace 5 with the index that you search for.
See the demo.
Or remove all commas and get the Nth char:
SELECT tfcolumn
FROM tablename
WHERE substr(replace(tfcolumn, ',', ''), 5, 1) = '1'
See the demo.
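The position arithmetic above can be checked with a small sketch. It runs the first query against an in-memory SQLite database (SQLite also provides substr), using the two sample strings from the question. The helper name `rows_with_nth_true` is hypothetical, and the approach assumes every element is exactly one character long.

```python
import sqlite3


def rows_with_nth_true(values, n):
    """Return the strings whose n-th (1-based) comma-separated flag is '1'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (tfcolumn TEXT)")
    conn.executemany("INSERT INTO mytable (tfcolumn) VALUES (?)", [(v,) for v in values])
    # With one-character elements, element n sits at character position 2*n - 1.
    rows = conn.execute(
        "SELECT tfcolumn FROM mytable "
        "WHERE substr(tfcolumn, 2 * ? - 1, 1) = '1' ORDER BY rowid",
        (n,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


samples = ["0,1,0,0,0,0,1,1,0", "1,0,0,1,1,1,0,0,0"]
print(rows_with_nth_true(samples, 5))
# ['1,0,0,1,1,1,0,0,0']
```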
Try this
if substring_index(substring_index('0,1,0,0,0,0,1,1,0',',',3),',',-1)='1'
The first argument can be your column name. The second argument (',') tells the function that the string is comma-separated. The third argument takes the first 3 elements of the string. So, the output of inner substring_index is '0,1,0'.
The outer substring_index has -1 as its last argument, so it counts from the right and takes only the last element.
For example, if the value in a particular row is '2,682,7003,14,185', then the value of substring_index(substring_index('2,682,7003,14,185',',',3),',',-1) is '7003'.
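To make the nested call easier to follow, here is a rough Python emulation of substring_index. It is only an illustration of the behaviour described above, not MySQL's implementation; it assumes a non-empty delimiter, and the function names are hypothetical.

```python
def substring_index(s, delim, count):
    """Emulate MySQL's SUBSTRING_INDEX for a non-empty delimiter (sketch)."""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the |count|-th delimiter from the end.
        return delim.join(parts[count:])
    return ""


def nth_element(s, n):
    """Return the n-th (1-based) comma-separated element of s."""
    return substring_index(substring_index(s, ",", n), ",", -1)


print(substring_index("0,1,0,0,0,0,1,1,0", ",", 3))  # 0,1,0
print(nth_element("2,682,7003,14,185", 3))           # 7003
```

Unlike the fixed-position substr approach, this also works when the elements have different lengths.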

Select statement returns data although given value in the where clause is false

I have a table in my MySQL database named membertable. The table consists of two fields, memberid and membername. The memberid field is an integer and uses auto_increment starting from 2001. The membername field is a varchar.
The membertable has two records with the same order as described above. The records look like this :
memberid : 2001
membername : john smith
memberid : 2002
membername : will smith
I found something weird when I ran a SELECT statement against the memberid field. Running the following statement :
SELECT * FROM `membertable` WHERE `memberid` = '2001somecharacter'
It returned the first data.
Why did that happen? There's no record with memberid = 2001somecharacter. It looks like MySQL only searches for the first 4 characters (2001) and, once it finds matching data (the row returned above), ignores the remaining characters.
How could this happen? And is there any way to turn off this behavior?
--
membertable uses innodb engine
This happens because mysql tries to convert "2001somecharacter" into a number which returns 2001.
Since you're comparing a number to a string, you should use
SELECT * FROM `membertable` WHERE CONVERT(`memberid`,CHAR) = '2001somecharacter';
to avoid this behavior.
Or, to do it properly, do NOT put your search value in quotes, so that it has to be a number (otherwise the query fails with a syntax error), and make sure in the front end that it is a number before passing it into the query.
sqlfiddle
Your finding is expected MySQL behaviour.
MySQL converts a varchar to an integer starting from the beginning. As long as there are numeric characters which can be converted, they are included in the conversion. When it reaches a letter, the conversion stops and returns the integer value of the numeric string read so far.
Here's some description of this behavior on the MySQL documentation Site. Unfortunately, it's not mentioned directly in the text, but there's an example which exactly shows this behaviour.
MySQL is very liberal in converting string values to numeric values when evaluated in numeric context.
As a demonstration, adding 0 causes the string to be evaluated in a numeric context:
SELECT '2001foo' + 0 --> 2001
, '01.2-3E' + 0 --> 1.2
, 'abc567g' + 0 --> 0
When a string is evaluated in a numeric context, MySQL reads the string character by character, until it encounters a character where the string can no longer be interpreted as a numeric value, or until it reaches the end of the string.
I don't know of a way to "turn off" or disable this behavior. (There may be a setting of sql_mode that changes it, but that change would likely affect other SQL statements that currently work and might stop working.)
Typically, this kind of check of the arguments is done in the application.
But if you need to do this in the SELECT statement, one option would be cast/convert the column as a character string, and then do the comparison.
But that can have some significant performance consequences. If we do a cast or convert (or any function) on a column that's in a condition in the WHERE clause, MySQL will not be able to use a range scan operation on a suitable index. We're forcing MySQL to perform the cast/convert operation on every row in the table, and compare the result to the literal.
So, that's not the best pattern.
If I needed to perform a check like that within the SQL statement, I would do something like this:
WHERE t.memberid = '2001foo' + 0
AND CAST('2001foo' + 0 AS CHAR) = '2001foo'
The first line is doing the same thing as the current query. And that can take advantage of a suitable index.
The second condition is converting the same value to a numeric, then casting that back to character, and then comparing the result to the original. With the values shown here, it will evaluate to FALSE, and the query will not return any rows.
This will also not return a row if the string value has a leading space, ' 2001'. The second condition is going to evaluate as FALSE.
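For the application-side check mentioned above, a minimal sketch could look like the following. It mirrors the round-trip idea of the SQL condition: the input is accepted only if converting it to an integer and back yields the same string. The function name `parse_member_id` is hypothetical.

```python
def parse_member_id(raw):
    """Return raw as an int if it is a canonical non-negative integer, else None."""
    # Reject anything that is not purely ASCII digits ('2001foo', ' 2001', '').
    if not isinstance(raw, str) or not raw.isascii() or not raw.isdigit():
        return None
    value = int(raw)
    # Round-trip check: also rejects forms such as '0200' that do not
    # convert back to the same string.
    return value if str(value) == raw else None


print(parse_member_id("2001"))               # 2001
print(parse_member_id("2001somecharacter"))  # None
```

A caller would run the query only when the result is not None and pass the integer as a bound parameter, so the comparison stays numeric and can still use an index.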
When comparing an INT to a 'string', the string is converted to a number.
Converting a string to a number takes as many of the leading characters as it can and still be a number. So '2001character' is treated as the number 2001.
If you want non-numeric characters in member_id, make it VARCHAR.
If you want only numeric ids, then reject '200.1character'

Hive: Count non zero characters in substrings of a string

I have a string like the following in the column of a hive external table
<id>^<count>^<distinct_count>|<id>^<count>^<distinct_count>|...
There are two delimiters. | on an entity level and ^ on sub-entity level
I have a metric defined as the number of entities with a non-zero distinct_count or count. That is, given a string, I have to check for each entity whether the distinct count (or the count; I can check either) is non-zero and, if it is, set a flag to 1. The metric is then sum(flags). I have to store this metric in an aggregated table in the next step.
Please suggest a way for me to do this in Hive
I think it's not possible. Ended up using an external python mapper for the same.
If you want to count number of non-zero count in a string s, it seems to be solved by
length(
  regexp_replace(
    regexp_replace(s, "[^^|]*\\^0\\^[^^|]*\\|?", ""),
    "[^^|]*\\^[^^|]*\\^[^^|]*\\|?",
    "1"
  )
)
First regexp_replace removes parts with zero count, second regexp_replace replaces remaining parts with single symbols (it should not necessarily be "1", any symbol would suffice), and hence length returns number of parts with non-zero count.
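For comparison, the same count can be sketched outside Hive in plain Python, for instance inside the external mapper mentioned earlier. This is only an illustration under stated assumptions: each entity has exactly three ^-separated fields, the count field is an integer, and the function name `count_nonzero_entities` is hypothetical.

```python
def count_nonzero_entities(s):
    """Count entities in 'id^count^distinct_count|...' whose count is non-zero."""
    total = 0
    for entity in s.split("|"):
        if not entity:
            continue  # skip empty pieces, e.g. from a trailing '|'
        fields = entity.split("^")
        # Expected layout: [id, count, distinct_count]; malformed entities are skipped.
        if len(fields) == 3 and int(fields[1]) != 0:
            total += 1
    return total


print(count_nonzero_entities("a^3^2|b^0^0|c^5^1"))  # 2
```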

Selecting and compare one value with values in 2 columns of the DB

I have one value, that represents a zip code.
I need to see which city belongs to this zip code by comparing the given value with the values in two columns of the DB.
This is my _cities table:
city (name of the city, VARCHAR)
zipcode_start (the first zip code available, VARCHAR )
zipcode_end ( the last zip code available, VARCHAR ).
Where I live, zip codes are numbered in sequence.
So if I have for example: city = Rome, zipcode_start = 00118 and zipcode_end = 00199 and the given zipcode is 00119, how do I get the city Rome from the DB?
00119 in this case is included in the sequence 00118 - 00199, so the DB should return Rome.
I can do this in many ways with PHP, but I am looking for an elegant way to do it directly with a SQL statement.
Is this possible?
Thanks for any help
I'd use BETWEEN, or if you don't like that, you can use >= and <=. And if you prefer joins over where clauses, you can even join on an inequality sometimes.
For example,
select city
from my_cities
where zipcode between zipcode_start and zipcode_end
You have to deal with getting the variable into that statement, but I get the impression that you can handle that part already. Is this what you are trying to do? I sort of doubt it, but that's all I get out of your question.
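As a sketch of how the given value could be passed into that statement, the following uses a bound parameter against an in-memory SQLite database. The column names come from the question; the helper `find_cities` and the second sample row are made up for illustration. Because the columns are VARCHAR, the comparison is a string comparison, which only matches numeric order when all zip codes have the same length and are zero-padded.

```python
import sqlite3


def find_cities(zipcode):
    """Return the cities whose zip code range contains the given zip code."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE _cities (city TEXT, zipcode_start TEXT, zipcode_end TEXT)"
    )
    # Illustrative sample rows only.
    conn.executemany(
        "INSERT INTO _cities VALUES (?, ?, ?)",
        [("Rome", "00118", "00199"), ("Milan", "20121", "20162")],
    )
    # The given zip code is bound as a parameter and compared to both columns.
    rows = conn.execute(
        "SELECT city FROM _cities WHERE ? BETWEEN zipcode_start AND zipcode_end",
        (zipcode,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(find_cities("00119"))  # ['Rome']
```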

Sort by hierarchyid in SQL Server

I have a hierarchyid column defined on a table in SQL Server 2008
Let us say that in the first row, the hierarchyid is '/1/7/1/'
Let us say that in the second row, the hierarchyid is '/1/10/1/'
If I sort by hierarchyid ASC, then I see the second row first and then the first row. (The rows are sorted as strings, and '10' < '7'.)
However, for compatibility with a different system, I want to see the first row first and then the second row (i.e. sort as integers, where 7 < 10).
I have solved the problem, by defining a second hierarchyid column, and then setting it to be the same as the first hierarchyid column, but replacing all inside slashes with dots, and then doing a sort by this.
I just wondered if there was a more elegant way.
I know this is a fairly old question, but it was the first result in Google, so I thought I'd add an actual answer in case someone else comes across this. Without seeing the SQL being used it's hard to be 100% sure, but I suspect that the OP is returning the hierarchy id as a string and sorting on that, rather than sorting on the hierarchy id itself:
E.g.:
declare @results table (Id int, Hierarchy hierarchyId)
-- :
-- Add your data here
-- :
-- This will not work as it's ordering a string
select Id, Hierarchy.ToString() as SortOrder from @results order by SortOrder
-- This will work as it's ordering the hierarchy id
select Id, Hierarchy.ToString() as SortOrder from @results order by Hierarchy
You would need to isolate what's between two "/" characters and order by it.
You can use this function: http://www.sqlusa.com/bestpractices2005/nthindex/
to get the nth index of a character in a string, so
declare @aux_str varchar(50)
set @aux_str='/1/7/3/'
select dbo.fnNthIndex(@aux_str,'/',2)
returns 3. Then you have to find the position of the third "/" and take what's between the two.
It's not hard, but it's quite a lot of work.
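If the paths have already been fetched as strings, the numeric ordering the question asks for can also be sketched in application code by comparing the segments as integers. This assumes every segment is a plain integer (no dotted labels such as /1.1/); the helper name `hierarchy_key` is hypothetical.

```python
def hierarchy_key(path):
    """Turn '/1/10/1/' into (1, 10, 1) so paths compare segment by segment as ints."""
    return tuple(int(part) for part in path.split("/") if part)


paths = ["/1/10/1/", "/1/7/1/"]
print(sorted(paths))                     # ['/1/10/1/', '/1/7/1/']  (string order)
print(sorted(paths, key=hierarchy_key))  # ['/1/7/1/', '/1/10/1/']  (numeric order)
```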