I want to parse a column, and get the key-value pair as column
Input:
I have a dataframe (called df) with the following structure:
ID data
A1 {"userMatch": "{"match":{"phone":{"name":{"score":1}},"name":{"score":1}}}"}
A2 {"userMatch": "{"match":{"phone":{"name":{"score":0.934}},"name":{"score":0.952}}}"}
Expected Output:
I wanted to create new column called 'score' and get the value from the key value pair
ID score1 score2
A1 1 1
A2 0.934 0.952
Attempted Solution:
data_json = df['data'].transform(lambda x: json.loads(x))
df['score1'] = data_json.str.get('userMatch').str.get('match').str.get('phone').str.get('name').str.get('score')
df['score2'] = data_json.str.get('userMatch').str.get('match').str.get('phone').str.get('name').str.get('name').str.get('score')
Error:
TypeError: the JSON object must be str, bytes or bytearray, not Series
Notes:
I am not even sure how to get the next score2
Using mu previous though regarding using regex, this is how I would approach your problem:
import re
def getOffset(row, offset):
vals = re.findall(r"[-+]?\d*\.\d+|\d+", row.data['userMatch'])
if len(vals)> offset:
return vals[offset]
return None
df['score1'] = df.apply(lambda row: getOffset(row, 0), axis= 1)
df['score2'] = df.apply(lambda row: getOffset(row, 1), axis = 1)
df.drop(['data'], axis= 1, inplace=True)
This yields a dataframe of the form:
ID score1 score2
0 A1 1 1
1 A2 0.934 0.952
This isn't pretty, but works with split(). Couldn't get a dictionary to be read, kept getting invalid syntax or missing delimiter.
df = pd.read_csv(io.StringIO('''ID data
A1 {"userMatch": "{"match":{"phone":{"name":{"score":1}},"name":{"score":1}}}"}
A2 {"userMatch": "{"match":{"phone":{"name":{"score":0.934}},"name":{"score":0.952}}}"}'''), sep=' ', engine='python')
df['score1'] = df['data'].apply(lambda x: x.split('{"userMatch": "{"match":{"phone":{"name":{"score":')[1].split('}', 1)[0])
df['score2'] = df['data'].apply(lambda x: x.split('{"userMatch": "{"match":{"phone":{"name":{"score":')[1].split(',"name":{"score":')[1].split('}', 1)[0])
Output:
ID data score1 score2
0 A1 {"userMatch": "{"match":{"phone":{"name":{"score":1}},"name":{"score":1}}}"} 1 1
1 A2 {"userMatch": "{"match":{"phone":{"name":{"score":0.934}},"name":{"score":0.952}}}"} 0.934 0.952
I have a data frame with a column called identifiers which contains product identifiers data as a string which is a list of dictionaries.
test_data <- data.frame(
identifiers = c(
"[{\"type\":\"ISBN\",\"value\":\"9781231027073\"}]",
"[{\"type\":\"EAN\",\"value\":\"5055266202847\"},{\"type\":\"EAN\",\"value\":\"4053162095984\"}]"),
id = c(1,2), stringsAsFactors = FALSE)
> test_data
identifiers id
1 [{"type":"ISBN","value":"9781231027073"}] 1
2 [{"type":"EAN","value":"5055266202847"},{"type":"EAN","value":"4053162095984"}] 2
What I would like to achieve is:
output_test_data <- data.frame(
type = c("ISBN", "EAN", "EAN"),
value = c("9781231027073","5055266202847","4053162095984"),
id = c(1,2,2), stringsAsFactors = FALSE)
> output_test_data
type value id
1 ISBN 9781231027073 1
2 EAN 5055266202847 2
3 EAN 4053162095984 2
The closest I got to the solution is to apply the fomJSON function from jsonlite.
jsonlite::fromJSON(test_data$identifiers[1])
or with a loop like this:
for (i in test_data$identifiers) {
print(jsonlite::fromJSON(i))
}
However I am struggling to:
1) get it applied to all rows.
2) preserve the information about id, from original data into the results.
Could anyone help with this?
You could do this:
df_result <- apply(test_data,1,function(x){
id_tmp <- x[2]
df_out <- jsonlite::fromJSON(x[1])
df_out$id <- id_tmp
return(df_out)
})
df_result <- do.call("rbind",df_result)
I have a Data Frame which has 3 columns like this:
---------------------------------------------
| x(string) | date(date) | value(int) |
---------------------------------------------
I want to SELECT all the the rows [i] that satisfy all 4 conditions:
1) row [i] and row [i - 1] have the same value in column 'x'
AND
2) 'date' at row [i] == 'date' at row [i - 1] + 1 (two consecutive days)
AND
3) 'value' at row [i] > 5
AND
4) 'value' at row [i - 1] <= 5
I think maybe I need a For loop, but don't know how exactly! Please help me!
Every help is much appreciated!
It can be very easily done with Window functions, look at lag function:
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import sqlContext.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
// test data
val list = Seq(
("x", "2016-12-13", 1),
("x", "2016-12-14", 7)
);
val df = sc.parallelize(list).toDF("x", "date", "value");
// add lags - so read previous value from dataset
val withPrevs = df
.withColumn ("prevX", lag('x, 1).over(Window.orderBy($"date")))
.withColumn ("prevDate", lag('date, 1).over(Window.orderBy($"date")))
.withColumn ("prevValue", lag('value, 1).over(Window.orderBy($"date")))
// filter values and select only needed fields
withPrevs
.where('x === 'prevX)
.where('value > lit(5))
.where('prevValue < lit(5))
.where('date === date_add('prevDate, 1))
.select('x, 'date, 'value)
.show()
Note that without order, i.e. by date, this cannot be done. Dataset has none meaningful order, you must specify order explicity
If you have a DataFrame created, then all you need to do is to call a filter function on DataFrame will all your conditions.
For example:
df1.filter($"Column1" === 2 || $"Column2" === 3)
You can pass as many conditions as you want. It will return you a new DataFrame with filtered data.
I want to sort the following data items in the order they are presented below (numbers 1-12):
1
2
3
4
5
6
7
8
9
10
11
12
However, my query - using order by xxxxx asc sorts by the first digit above all else:
1
10
11
12
2
3
4
5
6
7
8
9
Any tricks to make it sort more properly?
Further, in the interest of full disclosure, this could be a mix of letters and numbers (although right now it is not), e.g.:
A1
534G
G46A
100B
100A
100JE
etc....
Thanks!
update: people asking for query
select * from table order by name asc
People use different tricks to do this. I Googled and find out some results each follow different tricks. Have a look at them:
Alpha Numeric Sorting in MySQL
Natural Sorting in MySQL
Sorting of numeric values mixed with alphanumeric values
mySQL natural sort
Natural Sort in MySQL
Edit:
I have just added the code of each link for future visitors.
Alpha Numeric Sorting in MySQL
Given input
1A 1a 10A 9B 21C 1C 1D
Expected output
1A 1C 1D 1a 9B 10A 21C
Query
Bin Way
===================================
SELECT
tbl_column,
BIN(tbl_column) AS binray_not_needed_column
FROM db_table
ORDER BY binray_not_needed_column ASC , tbl_column ASC
-----------------------
Cast Way
===================================
SELECT
tbl_column,
CAST(tbl_column as SIGNED) AS casted_column
FROM db_table
ORDER BY casted_column ASC , tbl_column ASC
Natural Sorting in MySQL
Given input
Table: sorting_test
-------------------------- -------------
| alphanumeric VARCHAR(75) | integer INT |
-------------------------- -------------
| test1 | 1 |
| test12 | 2 |
| test13 | 3 |
| test2 | 4 |
| test3 | 5 |
-------------------------- -------------
Expected Output
-------------------------- -------------
| alphanumeric VARCHAR(75) | integer INT |
-------------------------- -------------
| test1 | 1 |
| test2 | 4 |
| test3 | 5 |
| test12 | 2 |
| test13 | 3 |
-------------------------- -------------
Query
SELECT alphanumeric, integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric
Sorting of numeric values mixed with alphanumeric values
Given input
2a, 12, 5b, 5a, 10, 11, 1, 4b
Expected Output
1, 2a, 4b, 5a, 5b, 10, 11, 12
Query
SELECT version
FROM version_sorting
ORDER BY CAST(version AS UNSIGNED), version;
Just do this:
SELECT * FROM table ORDER BY column `name`+0 ASC
Appending the +0 will mean that:
0,
10,
11,
2,
3,
4
becomes :
0,
2,
3,
4,
10,
11
I hate this, but this will work
order by lpad(name, 10, 0) <-- assuming maximum string length is 10
<-- you can adjust to a bigger length if you want to
I know this post is closed but I think my way could help some people. So there it is :
My dataset is very similar but is a bit more complex. It has numbers, alphanumeric data :
1
2
Chair
3
0
4
5
-
Table
10
13
19
Windows
99
102
Dog
I would like to have the '-' symbol at first, then the numbers, then the text.
So I go like this :
SELECT name, (name = '-') boolDash, (name = '0') boolZero, (name+0 > 0) boolNum
FROM table
ORDER BY boolDash DESC, boolZero DESC, boolNum DESC, (name+0), name
The result should be something :
-
0
1
2
3
4
5
10
13
99
102
Chair
Dog
Table
Windows
The whole idea is doing some simple check into the SELECT and sorting with the result.
This works for type of data:
Data1,
Data2, Data3 ......,Data21. Means "Data" String is common in all rows.
For ORDER BY ASC it will sort perfectly, For ORDER BY DESC not suitable.
SELECT * FROM table_name ORDER BY LENGTH(column_name), column_name ASC;
I had some good results with
SELECT alphanumeric, integer FROM sorting_test ORDER BY CAST(alphanumeric AS UNSIGNED), alphanumeric ASC
This type of question has been asked previously.
The type of sorting you are talking about is called "Natural Sorting".
The data on which you want to do sort is alphanumeric.
It would be better to create a new column for sorting.
For further help check
natural-sort-in-mysql
If you need to sort an alpha-numeric column that does not have any standard format whatsoever
SELECT * FROM table ORDER BY (name = '0') DESC, (name+0 > 0) DESC, name+0 ASC, name ASC
You can adapt this solution to include support for non-alphanumeric characters if desired using additional logic.
This should sort alphanumeric field like:
1/ Number only, order by 1,2,3,4,5,6,7,8,9,10,11 etc...
2/ Then field with text like: 1foo, 2bar, aaa11aa, aaa22aa, b5452 etc...
SELECT MyField
FROM MyTable
order by
IF( MyField REGEXP '^-?[0-9]+$' = 0,
9999999999 ,
CAST(MyField AS DECIMAL)
), MyField
The query check if the data is a number, if not put it to 9999999999 , then order first on this column, then order on data with text
Good luck!
Instead of trying to write some function and slow down the SELECT query, I thought of another way of doing this...
Create an extra field in your database that holds the result from the following Class and when you insert a new row, run the field value that will be naturally sorted through this class and save its result in the extra field. Then instead of sorting by your original field, sort by the extra field.
String nsFieldVal = new NaturalSortString(getFieldValue(), 4).toString()
The above means:
- Create a NaturalSortString for the String returned from getFieldValue()
- Allow up to 4 bytes to store each character or number (4 bytes = ffff = 65535)
| field(32) | nsfield(161) |
a1 300610001
String sortString = new NaturalSortString(getString(), 4).toString()
import StringUtils;
/**
* Creates a string that allows natural sorting in a SQL database
* eg, 0 1 1a 2 3 3a 10 100 a a1 a1a1 b
*/
public class NaturalSortString {
private String inStr;
private int byteSize;
private StringBuilder out = new StringBuilder();
/**
* A byte stores the hex value (0 to f) of a letter or number.
* Since a letter is two bytes, the minimum byteSize is 2.
*
* 2 bytes = 00 - ff (max number is 255)
* 3 bytes = 000 - fff (max number is 4095)
* 4 bytes = 0000 - ffff (max number is 65535)
*
* For example:
* dog123 = 64,6F,67,7B and thus byteSize >= 2.
* dog280 = 64,6F,67,118 and thus byteSize >= 3.
*
* For example:
* The String, "There are 1000000 spots on a dalmatian" would require a byteSize that can
* store the number '1000000' which in hex is 'f4240' and thus the byteSize must be at least 5
*
* The dbColumn size to store the NaturalSortString is calculated as:
* > originalStringColumnSize x byteSize + 1
* The extra '1' is a marker for String type - Letter, Number, Symbol
* Thus, if the originalStringColumn is varchar(32) and the byteSize is 5:
* > NaturalSortStringColumnSize = 32 x 5 + 1 = varchar(161)
*
* The byteSize must be the same for all NaturalSortStrings created in the same table.
* If you need to change the byteSize (for instance, to accommodate larger numbers), you will
* need to recalculate the NaturalSortString for each existing row using the new byteSize.
*
* #param str String to create a natural sort string from
* #param byteSize Per character storage byte size (minimum 2)
* #throws Exception See the error description thrown
*/
public NaturalSortString(String str, int byteSize) throws Exception {
if (str == null || str.isEmpty()) return;
this.inStr = str;
this.byteSize = Math.max(2, byteSize); // minimum of 2 bytes to hold a character
setStringType();
iterateString();
}
private void setStringType() {
char firstchar = inStr.toLowerCase().subSequence(0, 1).charAt(0);
if (Character.isLetter(firstchar)) // letters third
out.append(3);
else if (Character.isDigit(firstchar)) // numbers second
out.append(2);
else // non-alphanumeric first
out.append(1);
}
private void iterateString() throws Exception {
StringBuilder n = new StringBuilder();
for (char c : inStr.toLowerCase().toCharArray()) { // lowercase for CASE INSENSITIVE sorting
if (Character.isDigit(c)) {
// group numbers
n.append(c);
continue;
}
if (n.length() > 0) {
addInteger(n.toString());
n = new StringBuilder();
}
addCharacter(c);
}
if (n.length() > 0) {
addInteger(n.toString());
}
}
private void addInteger(String s) throws Exception {
int i = Integer.parseInt(s);
if (i >= (Math.pow(16, byteSize)))
throw new Exception("naturalsort_bytesize_exceeded");
out.append(StringUtils.padLeft(Integer.toHexString(i), byteSize));
}
private void addCharacter(char c) {
//TODO: Add rest of accented characters
if (c >= 224 && c <= 229) // set accented a to a
c = 'a';
else if (c >= 232 && c <= 235) // set accented e to e
c = 'e';
else if (c >= 236 && c <= 239) // set accented i to i
c = 'i';
else if (c >= 242 && c <= 246) // set accented o to o
c = 'o';
else if (c >= 249 && c <= 252) // set accented u to u
c = 'u';
else if (c >= 253 && c <= 255) // set accented y to y
c = 'y';
out.append(StringUtils.padLeft(Integer.toHexString(c), byteSize));
}
#Override
public String toString() {
return out.toString();
}
}
For completeness, below is the StringUtils.padLeft method:
public static String padLeft(String s, int n) {
if (n - s.length() == 0) return s;
return String.format("%0" + (n - s.length()) + "d%s", 0, s);
}
The result should come out like the following
-1
-a
0
1
1.0
1.01
1.1.1
1a
1b
9
10
10a
10ab
11
12
12abcd
100
a
a1a1
a1a2
a-1
a-2
áviacion
b
c1
c2
c12
c100
d
d1.1.1
e
MySQL ORDER BY Sorting alphanumeric on correct order
example:
SELECT `alphanumericCol` FROM `tableName` ORDER BY
SUBSTR(`alphanumericCol` FROM 1 FOR 1),
LPAD(lower(`alphanumericCol`), 10,0) ASC
output:
1
2
11
21
100
101
102
104
S-104A
S-105
S-107
S-111
This is from tutorials point
SELECT * FROM yourTableName ORDER BY
SUBSTR(yourColumnName FROM 1 FOR 2),
CAST(SUBSTR(yourColumnName FROM 2) AS UNSIGNED);
it is slightly different from another answer of this thread
For reference, this is the original link
https://www.tutorialspoint.com/mysql-order-by-string-with-numbers
Another point regarding UNSIGNED is written here
https://electrictoolbox.com/mysql-order-string-as-int/
While this has REGEX too
https://www.sitepoint.com/community/t/how-to-sort-text-with-numbers-with-sql/346088/9
SELECT length(actual_project_name),actual_project_name,
SUBSTRING_INDEX(actual_project_name,'-',1) as aaaaaa,
SUBSTRING_INDEX(actual_project_name, '-', -1) as actual_project_number,
concat(SUBSTRING_INDEX(actual_project_name,'-',1),SUBSTRING_INDEX(actual_project_name, '-', -1)) as a
FROM ctts.test22
order by
SUBSTRING_INDEX(actual_project_name,'-',1) asc,cast(SUBSTRING_INDEX(actual_project_name, '-', -1) as unsigned) asc
This is a simple example.
SELECT HEX(some_col) h
FROM some_table
ORDER BY h
order by len(xxxxx),xxxxx
Eg:
SELECT * from customer order by len(xxxxx),xxxxx
Try this For ORDER BY DESC
SELECT * FROM testdata ORDER BY LENGHT(name) DESC, name DESC
SELECT
s.id, s.name, LENGTH(s.name) len, ASCII(s.name) ASCCCI
FROM table_name s
ORDER BY ASCCCI,len,NAME ASC;
Assuming varchar field containing number, decimal, alphanumeric and string, for example :
Let's suppose Column Name is "RandomValues" and Table name is "SortingTest"
A1
120
2.23
3
0
2
Apple
Zebra
Banana
23
86.Akjf9
Abtuo332
66.9
22
ABC
SELECT * FROM SortingTest order by IF( RandomValues REGEXP '^-?[0-9,.]+$' = 0,
9999999999 ,
CAST(RandomValues AS DECIMAL)
), RandomValues
Above query will do sorting on number & decimal values first and after that all alphanumeric values got sorted.
This will always put the values starting with a number first:
ORDER BY my_column REGEXP '^[0-9]' DESC, length(my_column + 0), my_column ";
Works as follows:
Step1 - Is first char a digit? 1 if true, 0 if false, so order by this DESC
Step2 - How many digits is the number? Order by this ASC
Step3 - Order by the field itself
Input:
('100'),
('1'),
('10'),
('0'),
('2'),
('2a'),
('12sdfa'),
('12 sdfa'),
('Bar nah');
Output:
0
1
2
2a
10
12 sdfa
12sdfa
100
Bar nah
Really problematic for my scenario...
select * from table order by lpad(column, 20, 0)
My column is a varchar, but has numeric input (1, 2, 3...) , mixed numeric (1A, 1B, 1C) and too string data (INT, SHIP)
I have an object that looks like this one:
object =
title : 'an object'
properties :
attribute1 :
random_number: 2
attribute_values:
a: 10
b: 'irrelevant'
attribute2 :
random_number: 4
attribute_values:
a: 15
b: 'irrelevant'
some_random_stuff: 'random stuff'
I want to extract the sum of the 'a' values on attribute1 and attribute2.
What would be the best way to do this in Coffeescript?
(I have already found one way to do it but that just looks like Java-translated-to-coffee and I was hoping for a more elegant solution.)
Here is what I came up with (edited to be more generic based on comment):
sum_attributes = (x) =>
sum = 0
for name, value of object.properties
sum += value.attribute_values[x]
sum
alert sum_attributes('a') # 25
alert sum_attributes('b') # 0irrelevantirrelevant
So, that does what you want... but it probably doesn't do exactly what you want with strings.
You might want to pass in the accumulator seed, like sum_attributes 0, 'a' and sum_attributes '', 'b'
Brian's answer is good. But if you wanted to bring in a functional programming library like Underscore.js, you could write a more succinct version:
sum = (arr) -> _.reduce arr, ((memo, num) -> memo + num), 0
sum _.pluck(object.properties, 'a')
total = (attr.attribute_values.a for key, attr of obj.properties).reduce (a,b) -> a+b
or
sum = (arr) -> arr.reduce((a, b) -> a+b)
total = sum (attr.attribute_values.a for k, attr of obj.properties)