I'm looking for an elegant way (in terms of syntax, not necessarily efficient) to get the frequency distribution of a decimal range.
For example, I have a table with ratings column which can be a negative or positive. I want to get the frequency of rows with a rating of certain range.
- ...
- [-140.00 to -130.00): 5
- [-130.00 to -120.00): 2
- [-120.00 to -110.00): 1
- ...
- [120.00 to 130.00): 17
- and so on.
[i to j) means i inclusive to j exclusive.
Thanks in advance.
You could get pretty close using 'select floor(rating / 10), count(*) from (table) group by 1'
I was thinking of seomthing that could do many levels like
DELIMITER $$
CREATE PROCEDURE populate_stats()
BEGIN
DECLARE range_loop INT Default 500 ;
simple_loop: LOOP
SET the_next = range_loop - 10;
Select sum(case when range between range_loop and the_next then 1 else 0 end) from table,
IF the_next=-500 THEN
LEAVE simple_loop;
END IF;
END LOOP simple_loop;
END $$
usage: call populate_stats();
Would handle 100 ranges from 500-490, 490-480, ... -480 - -490, -490 - -500
assuming a finite number of ranges.
Select
sum(case when val between -140 to -130 then 1 else 0 end) as sum-140_to_-130,
sum(Case when val between -130 to -120 then 1 else 0 end) as sum-130_to_-140,
...
FROM table
and if not, you could use dynamic SQL to generate the select allowing a number of ranges however you may run into a column limitation.
Just put your desired ranges into a table, and use that to discriminate the values.
-- SET search_path='tmp';
DROP TABLE measurements;
CREATE TABLE measurements
( zval INTEGER NOT NULL PRIMARY KEY
);
INSERT INTO measurements (zval)
SELECT generate_series(1,1000);
DELETE FROM measurements WHERE random() < 0.20 ;
DROP TABLE ranges;
CREATE TABLE ranges
( zmin INTEGER NOT NULL PRIMARY KEY
, zmax INTEGER NOT NULL
);
INSERT INTO ranges(zmin,zmax) VALUES
(0, 100), (100, 200), (200, 300), (300, 400), (400, 500),
(500, 600), (600, 700), (700, 800), (800, 900), (900, 1000)
;
SELECT ra.zmin,ra.zmax
, COUNT(*) AS zcount
FROM ranges ra
JOIN measurements me
ON me.zval >= ra.zmin AND me.zval < ra.zmax
GROUP BY ra.zmin,ra.zmax
ORDER BY ra.zmin
;
Results:
zmin | zmax | zcount
------+------+--------
0 | 100 | 89
100 | 200 | 76
200 | 300 | 76
300 | 400 | 74
400 | 500 | 86
500 | 600 | 78
600 | 700 | 75
700 | 800 | 75
800 | 900 | 80
900 | 1000 | 82
(10 rows)
I'm trying to calculate various time period returns (monthly, quarterly, yearly etc.) for each unique member (identified by Code in the example below) of a data set. The data set will contain monthly pricing information for a 20 year period for approximately 500 stocks. An example of the data is below:
Date Code Price Dividend
1 2005-01-31 xyz 1000.00 20.0
2 2005-01-31 abc 1.00 0.1
3 2005-02-28 xyz 1030.00 20.0
4 2005-02-28 abc 1.01 0.1
5 2005-03-31 xyz 1071.20 20.0
6 2005-03-31 abc 1.03 0.1
7 2005-04-30 xyz 1124.76 20.0
I am fairly new to R, but thought that there would be a more efficient solution than looping through each Code and then each Date as shown here:
uniqueDates <- unique(data$Date)
uniqueCodes <- unique(data$Code
for (date in uniqueDates) {
for (code in uniqueCodes) {
nextDate <- seq.Date(from=stock_data$Date[i], by="3 months",length.out=2)[2]
curPrice <- data$Price[data$Date == date]
futPrice <- data$Price[data$Date == nextDate]
data$ret[(data$Date == date) & (data$Code == code)] <- (futPrice/curPrice)-1
}
}
This method in itself has an issue in that seq.Date does not always return the final day in the month.
Unfortunately the data is not uniform (the number of companies/codes varies over time) so using a simple row offset won't work. The calculation must match the Code and Date with the desired date offset.
I had initially tried selecting the future dates by using the seq.Date function
data$ret = (data[(data$Date == (seq.Date(from = data$Date, by="3 month", length.out=2)[2])), "Price"] / data$Price) - 1
But this generated an error as seq.Date requires a single entry.
> Error in seq.Date(from = stock_data$Date, by = "3 month", length.out =
> 2) : 'from' must be of length 1
I thought that R would be well suited to this type of calculation but perhaps not. Since all the data is in a mysql database I am now thinking that it might be faster/easier to do this calc directly in the database.
Any suggestions would be greatly appreciated.
Load data:
tc='
Date Code Price Dividend
2005-01-31 xyz 1000.00 20.0
2005-01-31 abc 1.00 0.1
2005-02-28 xyz 1030.00 20.0
2005-02-28 abc 1.01 0.1
2005-03-31 xyz 1071.20 20.0
2005-03-31 abc 1.03 0.1
2005-04-30 xyz 1124.76 20.0'
df = read.table(text=tc,header=T)
df$Date=as.Date(df$Date,"%Y-%m-%d")
First I would organize the data by date:
library(plyr)
pp1=reshape(df,timevar='Code',idvar='Date',direction='wide')
Then you would like to obtain monthly, quarterly, yearly, etc returns.
For that there are several options, one could be:
Make the data zoo or xts class. i.e
library(xts)
pp1[2:ncol(pp1)] = as.xts(pp1[2:ncol(pp1)],order.by=pp1$Date)
#let's create a function for calculating returns.
rets<-function(x,lag=1){
return(diff(log(x),lag))
}
Since this database is monthly, the lags for the returns will be:
monthly=1, quaterly=3, yearly =12. for instance let's calculate monthly return
for xyz.
lagged=1 #for monthly
This calculates Monthly returns for xyz
pp1$returns_xyz= c(NA,rets(pp1$Price.xyz,lagged))
To get all the returns:
#create matrix of returns
pricelist= ls(pp1)[grep('Price',ls(pp1))]
returnsmatrix = data.frame(matrix(rep(0,(nrow(pp1)-1)*length(pricelist)),ncol=length(pricelist)))
j=1
for(i in pricelist){
n = which(names(pp1) == i)
returnsmatrix[,j] = rets(pp1[,n],1)
j=j+1
}
#column names
codename= gsub("Price.", "", pricelist, fixed = TRUE)
names(returnsmatrix)=paste('ret',codename,sep='.')
returnsmatrix
You can do this very easily with the quantmod and xts packages. Using the data in AndresT's answer:
library(quantmod) # loads xts too
pp1 <- reshape(df,timevar='Code',idvar='Date',direction='wide')
# create an xts object
x <- xts(pp1[,-1], pp1[,1])
# only get the "Price.*" columns
p <- getPrice(x)
# run the periodReturn function on each column
r <- apply(p, 2, periodReturn, period="monthly", type="log")
# merge prior result into a multi-column object
r <- do.call(merge, r)
# rename columns
names(r) <- paste("monthly.return",
sapply(strsplit(names(p),"\\."), "[", 2), sep=".")
Which leaves you with an r xts object containing:
monthly.return.xyz monthly.return.abc
2005-01-31 0.00000000 0.000000000
2005-02-28 0.02955880 0.009950331
2005-03-31 0.03922071 0.019608471
2005-04-30 0.04879016 NA
I want to sort the following data items in the order they are presented below (numbers 1-12):
1
2
3
4
5
6
7
8
9
10
11
12
However, my query - using order by xxxxx asc sorts by the first digit above all else:
1
10
11
12
2
3
4
5
6
7
8
9
Any tricks to make it sort more properly?
Further, in the interest of full disclosure, this could be a mix of letters and numbers (although right now it is not), e.g.:
A1
534G
G46A
100B
100A
100JE
etc....
Thanks!
update: people asking for query
select * from table order by name asc
People use different tricks to do this. I Googled and find out some results each follow different tricks. Have a look at them:
Alpha Numeric Sorting in MySQL
Natural Sorting in MySQL
Sorting of numeric values mixed with alphanumeric values
mySQL natural sort
Natural Sort in MySQL
Edit:
I have just added the code of each link for future visitors.
Alpha Numeric Sorting in MySQL
Given input
1A 1a 10A 9B 21C 1C 1D
Expected output
1A 1C 1D 1a 9B 10A 21C
Query
Bin Way
===================================
SELECT
tbl_column,
BIN(tbl_column) AS binray_not_needed_column
FROM db_table
ORDER BY binray_not_needed_column ASC , tbl_column ASC
-----------------------
Cast Way
===================================
SELECT
tbl_column,
CAST(tbl_column as SIGNED) AS casted_column
FROM db_table
ORDER BY casted_column ASC , tbl_column ASC
Natural Sorting in MySQL
Given input
Table: sorting_test
-------------------------- -------------
| alphanumeric VARCHAR(75) | integer INT |
-------------------------- -------------
| test1 | 1 |
| test12 | 2 |
| test13 | 3 |
| test2 | 4 |
| test3 | 5 |
-------------------------- -------------
Expected Output
-------------------------- -------------
| alphanumeric VARCHAR(75) | integer INT |
-------------------------- -------------
| test1 | 1 |
| test2 | 4 |
| test3 | 5 |
| test12 | 2 |
| test13 | 3 |
-------------------------- -------------
Query
SELECT alphanumeric, integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric
Sorting of numeric values mixed with alphanumeric values
Given input
2a, 12, 5b, 5a, 10, 11, 1, 4b
Expected Output
1, 2a, 4b, 5a, 5b, 10, 11, 12
Query
SELECT version
FROM version_sorting
ORDER BY CAST(version AS UNSIGNED), version;
Just do this:
SELECT * FROM table ORDER BY column `name`+0 ASC
Appending the +0 will mean that:
0,
10,
11,
2,
3,
4
becomes :
0,
2,
3,
4,
10,
11
I hate this, but this will work
order by lpad(name, 10, 0) <-- assuming maximum string length is 10
<-- you can adjust to a bigger length if you want to
I know this post is closed but I think my way could help some people. So there it is :
My dataset is very similar but is a bit more complex. It has numbers, alphanumeric data :
1
2
Chair
3
0
4
5
-
Table
10
13
19
Windows
99
102
Dog
I would like to have the '-' symbol at first, then the numbers, then the text.
So I go like this :
SELECT name, (name = '-') boolDash, (name = '0') boolZero, (name+0 > 0) boolNum
FROM table
ORDER BY boolDash DESC, boolZero DESC, boolNum DESC, (name+0), name
The result should be something :
-
0
1
2
3
4
5
10
13
99
102
Chair
Dog
Table
Windows
The whole idea is doing some simple check into the SELECT and sorting with the result.
This works for type of data:
Data1,
Data2, Data3 ......,Data21. Means "Data" String is common in all rows.
For ORDER BY ASC it will sort perfectly, For ORDER BY DESC not suitable.
SELECT * FROM table_name ORDER BY LENGTH(column_name), column_name ASC;
I had some good results with
SELECT alphanumeric, integer FROM sorting_test ORDER BY CAST(alphanumeric AS UNSIGNED), alphanumeric ASC
This type of question has been asked previously.
The type of sorting you are talking about is called "Natural Sorting".
The data on which you want to do sort is alphanumeric.
It would be better to create a new column for sorting.
For further help check
natural-sort-in-mysql
If you need to sort an alpha-numeric column that does not have any standard format whatsoever
SELECT * FROM table ORDER BY (name = '0') DESC, (name+0 > 0) DESC, name+0 ASC, name ASC
You can adapt this solution to include support for non-alphanumeric characters if desired using additional logic.
This should sort alphanumeric field like:
1/ Number only, order by 1,2,3,4,5,6,7,8,9,10,11 etc...
2/ Then field with text like: 1foo, 2bar, aaa11aa, aaa22aa, b5452 etc...
SELECT MyField
FROM MyTable
order by
IF( MyField REGEXP '^-?[0-9]+$' = 0,
9999999999 ,
CAST(MyField AS DECIMAL)
), MyField
The query check if the data is a number, if not put it to 9999999999 , then order first on this column, then order on data with text
Good luck!
Instead of trying to write some function and slow down the SELECT query, I thought of another way of doing this...
Create an extra field in your database that holds the result from the following Class and when you insert a new row, run the field value that will be naturally sorted through this class and save its result in the extra field. Then instead of sorting by your original field, sort by the extra field.
String nsFieldVal = new NaturalSortString(getFieldValue(), 4).toString()
The above means:
- Create a NaturalSortString for the String returned from getFieldValue()
- Allow up to 4 bytes to store each character or number (4 bytes = ffff = 65535)
| field(32) | nsfield(161) |
a1 300610001
String sortString = new NaturalSortString(getString(), 4).toString()
import StringUtils;
/**
* Creates a string that allows natural sorting in a SQL database
* eg, 0 1 1a 2 3 3a 10 100 a a1 a1a1 b
*/
public class NaturalSortString {
private String inStr;
private int byteSize;
private StringBuilder out = new StringBuilder();
/**
* A byte stores the hex value (0 to f) of a letter or number.
* Since a letter is two bytes, the minimum byteSize is 2.
*
* 2 bytes = 00 - ff (max number is 255)
* 3 bytes = 000 - fff (max number is 4095)
* 4 bytes = 0000 - ffff (max number is 65535)
*
* For example:
* dog123 = 64,6F,67,7B and thus byteSize >= 2.
* dog280 = 64,6F,67,118 and thus byteSize >= 3.
*
* For example:
* The String, "There are 1000000 spots on a dalmatian" would require a byteSize that can
* store the number '1000000' which in hex is 'f4240' and thus the byteSize must be at least 5
*
* The dbColumn size to store the NaturalSortString is calculated as:
* > originalStringColumnSize x byteSize + 1
* The extra '1' is a marker for String type - Letter, Number, Symbol
* Thus, if the originalStringColumn is varchar(32) and the byteSize is 5:
* > NaturalSortStringColumnSize = 32 x 5 + 1 = varchar(161)
*
* The byteSize must be the same for all NaturalSortStrings created in the same table.
* If you need to change the byteSize (for instance, to accommodate larger numbers), you will
* need to recalculate the NaturalSortString for each existing row using the new byteSize.
*
* #param str String to create a natural sort string from
* #param byteSize Per character storage byte size (minimum 2)
* #throws Exception See the error description thrown
*/
public NaturalSortString(String str, int byteSize) throws Exception {
if (str == null || str.isEmpty()) return;
this.inStr = str;
this.byteSize = Math.max(2, byteSize); // minimum of 2 bytes to hold a character
setStringType();
iterateString();
}
private void setStringType() {
char firstchar = inStr.toLowerCase().subSequence(0, 1).charAt(0);
if (Character.isLetter(firstchar)) // letters third
out.append(3);
else if (Character.isDigit(firstchar)) // numbers second
out.append(2);
else // non-alphanumeric first
out.append(1);
}
private void iterateString() throws Exception {
StringBuilder n = new StringBuilder();
for (char c : inStr.toLowerCase().toCharArray()) { // lowercase for CASE INSENSITIVE sorting
if (Character.isDigit(c)) {
// group numbers
n.append(c);
continue;
}
if (n.length() > 0) {
addInteger(n.toString());
n = new StringBuilder();
}
addCharacter(c);
}
if (n.length() > 0) {
addInteger(n.toString());
}
}
private void addInteger(String s) throws Exception {
int i = Integer.parseInt(s);
if (i >= (Math.pow(16, byteSize)))
throw new Exception("naturalsort_bytesize_exceeded");
out.append(StringUtils.padLeft(Integer.toHexString(i), byteSize));
}
private void addCharacter(char c) {
//TODO: Add rest of accented characters
if (c >= 224 && c <= 229) // set accented a to a
c = 'a';
else if (c >= 232 && c <= 235) // set accented e to e
c = 'e';
else if (c >= 236 && c <= 239) // set accented i to i
c = 'i';
else if (c >= 242 && c <= 246) // set accented o to o
c = 'o';
else if (c >= 249 && c <= 252) // set accented u to u
c = 'u';
else if (c >= 253 && c <= 255) // set accented y to y
c = 'y';
out.append(StringUtils.padLeft(Integer.toHexString(c), byteSize));
}
#Override
public String toString() {
return out.toString();
}
}
For completeness, below is the StringUtils.padLeft method:
public static String padLeft(String s, int n) {
if (n - s.length() == 0) return s;
return String.format("%0" + (n - s.length()) + "d%s", 0, s);
}
The result should come out like the following
-1
-a
0
1
1.0
1.01
1.1.1
1a
1b
9
10
10a
10ab
11
12
12abcd
100
a
a1a1
a1a2
a-1
a-2
áviacion
b
c1
c2
c12
c100
d
d1.1.1
e
MySQL ORDER BY Sorting alphanumeric on correct order
example:
SELECT `alphanumericCol` FROM `tableName` ORDER BY
SUBSTR(`alphanumericCol` FROM 1 FOR 1),
LPAD(lower(`alphanumericCol`), 10,0) ASC
output:
1
2
11
21
100
101
102
104
S-104A
S-105
S-107
S-111
This is from tutorials point
SELECT * FROM yourTableName ORDER BY
SUBSTR(yourColumnName FROM 1 FOR 2),
CAST(SUBSTR(yourColumnName FROM 2) AS UNSIGNED);
it is slightly different from another answer of this thread
For reference, this is the original link
https://www.tutorialspoint.com/mysql-order-by-string-with-numbers
Another point regarding UNSIGNED is written here
https://electrictoolbox.com/mysql-order-string-as-int/
While this has REGEX too
https://www.sitepoint.com/community/t/how-to-sort-text-with-numbers-with-sql/346088/9
SELECT length(actual_project_name),actual_project_name,
SUBSTRING_INDEX(actual_project_name,'-',1) as aaaaaa,
SUBSTRING_INDEX(actual_project_name, '-', -1) as actual_project_number,
concat(SUBSTRING_INDEX(actual_project_name,'-',1),SUBSTRING_INDEX(actual_project_name, '-', -1)) as a
FROM ctts.test22
order by
SUBSTRING_INDEX(actual_project_name,'-',1) asc,cast(SUBSTRING_INDEX(actual_project_name, '-', -1) as unsigned) asc
This is a simple example.
SELECT HEX(some_col) h
FROM some_table
ORDER BY h
order by len(xxxxx),xxxxx
Eg:
SELECT * from customer order by len(xxxxx),xxxxx
Try this For ORDER BY DESC
SELECT * FROM testdata ORDER BY LENGHT(name) DESC, name DESC
SELECT
s.id, s.name, LENGTH(s.name) len, ASCII(s.name) ASCCCI
FROM table_name s
ORDER BY ASCCCI,len,NAME ASC;
Assuming varchar field containing number, decimal, alphanumeric and string, for example :
Let's suppose Column Name is "RandomValues" and Table name is "SortingTest"
A1
120
2.23
3
0
2
Apple
Zebra
Banana
23
86.Akjf9
Abtuo332
66.9
22
ABC
SELECT * FROM SortingTest order by IF( RandomValues REGEXP '^-?[0-9,.]+$' = 0,
9999999999 ,
CAST(RandomValues AS DECIMAL)
), RandomValues
Above query will do sorting on number & decimal values first and after that all alphanumeric values got sorted.
This will always put the values starting with a number first:
ORDER BY my_column REGEXP '^[0-9]' DESC, length(my_column + 0), my_column ";
Works as follows:
Step1 - Is first char a digit? 1 if true, 0 if false, so order by this DESC
Step2 - How many digits is the number? Order by this ASC
Step3 - Order by the field itself
Input:
('100'),
('1'),
('10'),
('0'),
('2'),
('2a'),
('12sdfa'),
('12 sdfa'),
('Bar nah');
Output:
0
1
2
2a
10
12 sdfa
12sdfa
100
Bar nah
Really problematic for my scenario...
select * from table order by lpad(column, 20, 0)
My column is a varchar, but has numeric input (1, 2, 3...) , mixed numeric (1A, 1B, 1C) and too string data (INT, SHIP)