data type mismatch in SQL Spark - mysql

I'm trying to extract a small part of an array. I cast the array to a string and then use split/split_part to extract the data, but Jupyter keeps saying that the column, which I have already cast from array to string, cannot be resolved due to a data type mismatch.
Here's my SQL code:
TRIM(SPLIT(CAST(SPLIT(CAST(log as STRING),' ',4) as STRING),'OpenLevel39',2)) as server_launch_date
Another line of code uses the same method:
datediff('day', DATE(TRIM(SPLIT(SPLIT(CAST(log as STRING),' ',4),'OpenLevel39',2))), current_date) as server_create_in_days
And here's the error:
AnalysisException: cannot resolve 'trim(split(CAST(split(CAST(spark_catalog.jxm.timeframe.log AS STRING), ' ', 4) AS STRING), 'OpenLevel39', 2))' due to data type mismatch: argument 1 requires string type, however, 'split(CAST(split(CAST(spark_catalog.jxm.timeframe.log AS STRING), ' ', 4) AS STRING), 'OpenLevel39', 2)' is of array type.; line 20 pos 0;
Can anyone help me with this problem? Much appreciated.

Spark's split returns an array, whose length is at most limit, as stated in the documentation.
On the other hand, trim requires the first parameter to be of type string; you are passing an array.
You can try to cast the array to string first, then use trim, as below:
trim(CAST(split(CAST(split(CAST(spark_catalog.jxm.timeframe.log AS STRING), ' ', 4) AS STRING), 'OpenLevel39', 2) as STRING))
However, this does not really make sense: the string form of an array has no leading or trailing spaces, so trim has nothing to remove.
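The type mismatch is easier to see if you model what each call returns. The sketch below is Python, not Spark. It assumes Spark's documented semantics: split(str, regex, limit) returns an array of at most limit elements whose last element holds the rest of the input, and (in Spark 3.x) casting an array to a string renders it as [a, b, c]. The helper names and the sample log line are hypothetical, because the question doesn't show the real log format, and the sketch treats log as a plain string for simplicity.

```python
import re


def spark_split(s: str, regex: str, limit: int = -1) -> list[str]:
    """Mimic Spark SQL split(str, regex, limit): with limit > 0 the result has
    at most `limit` elements and the last one holds the unsplit remainder."""
    maxsplit = limit - 1 if limit > 0 else 0  # for re.split, 0 means "no limit"
    return re.split(regex, s, maxsplit=maxsplit)


def spark_cast_array_to_string(arr: list[str]) -> str:
    """Approximate Spark 3.x CAST(array<string> AS STRING), e.g. '[a, b]'."""
    return "[" + ", ".join(arr) + "]"


# Hypothetical log line; the real format is not shown in the question.
log = "a b c x OpenLevel39 2022-11-30"

# What the original query does: the outer split still returns an array,
# so trim() would receive an array, hence the type mismatch.
inner = spark_cast_array_to_string(spark_split(log, " ", 4))
outer = spark_split(inner, "OpenLevel39", 2)
print(type(outer).__name__, outer)

# Indexing into each array (0-based, like split(...)[i] in Spark SQL)
# yields plain strings, which trim() accepts.
launch_date = spark_split(spark_split(log, " ", 4)[3], "OpenLevel39", 2)[1].strip()
print(launch_date)
```

Even if trim accepted the cast value, the cast route leaves the brackets and ", " separators in the data. In Spark SQL the indexing idea would read roughly TRIM(SPLIT(SPLIT(CAST(log AS STRING), ' ', 4)[3], 'OpenLevel39', 2)[1]), though the indices depend on the actual log format.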
Good luck!


MySQL JSON string field returns encoded value

This is my first week dealing with a MySQL database and JSON field types, and I can't figure out why values are encoded automatically and then returned in encoded form.
Given the following SQL:
-- create a multiline string with a tab example
SET @str ="Line One
Line 2 Tabbed out
Line 3";
-- encode it
SET @j = JSON_OBJECT("str", @str);
-- extract the value by name
SET @strOut = JSON_EXTRACT(@J, "$.str");
-- show the object and attribute value.
SELECT @j, @strOut;
You end up with what appears to be a fully formed JSON object with a single encoded attribute.
@j = {"str": "Line One\n\tLine 2\tTabbed out\n\tLine 3"}
But when I use JSON_EXTRACT to get the attribute value, I get the encoded version, including the outer quotes.
@strOut = "Line One\n\tLine 2\tTabbed out\n\tLine 3"
I would expect to get my original string back, with \n and \t unescaped to the original characters and no outer quotes, like this:
Line One
Line 2 Tabbed out
Line 3
I can't seem to find any JSON_DECODE or JSON_UNESCAPE or similar functions.
I did find a JSON_ESCAPE() function but that appears to be used to manually build a JSON object structure in a string.
What am I missing to extract the values to the original format?
I like to use the handy ->> operator for this.
It was introduced in MySQL 5.7.13, and basically combines JSON_EXTRACT() and JSON_UNQUOTE():
SET @strOut = @J ->> '$.str';
You are looking for the JSON_UNQUOTE function:
SET @strOut = JSON_UNQUOTE( JSON_EXTRACT(@J, "$.str") );
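To see what JSON_UNQUOTE (and therefore ->>) does to the extracted value, here is a small Python model using the standard json module. It is a rough analogue, not MySQL itself. It assumes that a quoted JSON string is decoded (outer quotes dropped, escapes such as \n and \t resolved) and that any other JSON text passes through unchanged. json_unquote is a hypothetical helper name.

```python
import json


def json_unquote(doc: str) -> str:
    """Rough analogue of MySQL JSON_UNQUOTE(): a JSON string scalar is decoded
    (quotes removed, escapes such as \\n and \\t resolved); any other JSON text
    is returned unchanged."""
    if len(doc) >= 2 and doc.startswith('"') and doc.endswith('"'):
        return json.loads(doc)
    return doc


# JSON_OBJECT("str", @str) stores the escaped form ...
original = "Line One\n\tLine 2\tTabbed out\n\tLine 3"
j = json.dumps({"str": original})

# ... JSON_EXTRACT(@j, '$.str') returns a JSON document, still quoted and escaped ...
extracted = json.dumps(json.loads(j)["str"])
print(extracted)

# ... and unquoting turns it back into the original text.
print(json_unquote(extracted) == original)
```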
The result of JSON_EXTRACT() is intentionally a JSON document, not a string.
A JSON document may be:
An object enclosed in { }
An array enclosed in [ ]
A scalar string value enclosed in " "
A scalar number or boolean value
A null. This is not an SQL NULL but a JSON null, which leads to confusing cases: you can extract a JSON field whose JSON value is null, yet in an SQL expression it fails IS NULL tests and is not equal to the SQL string 'null' either, because it has a JSON type, not a scalar SQL type.
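The difference between a JSON null and an SQL NULL can be modelled in a few lines of Python. This is an illustration under stated assumptions, not MySQL code. SQL NULL is represented by None, and the JSON literal null by a separate sentinel object, JSON_NULL. The sketch also assumes, per MySQL's documentation, that JSON_EXTRACT returns SQL NULL only when the path locates nothing. The class and function names are hypothetical.

```python
import json
from typing import Any


class JsonNull:
    """Stand-in for the JSON literal null: a value in its own right, distinct
    from SQL NULL (modelled as None) and from the SQL string 'null'."""

    def __repr__(self) -> str:
        return "JSON_NULL"


JSON_NULL = JsonNull()


def json_extract(doc: str, key: str) -> Any:
    """Rough analogue of MySQL JSON_EXTRACT(doc, '$.key') for a top-level key.
    Returns None (SQL NULL) only when the key is missing."""
    obj = json.loads(doc)
    if key not in obj:
        return None  # path not found -> SQL NULL
    value = obj[key]
    return JSON_NULL if value is None else value


doc = '{"a": null}'
present = json_extract(doc, "a")
missing = json_extract(doc, "b")

print(present is None)    # False: a JSON null does not pass an IS NULL test
print(present == "null")  # False: nor does it equal the string 'null'
print(missing is None)    # True: only a missing path gives SQL NULL
```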

Reading a .csv file with comma decimal separators using CSV.jl

I am trying to read some data into a Julia data frame to work with it. A minimal example of the .csv file could look like this:
A; B; C; D
ab; 1,23; 4; 9,2
ab; 3,4; 7; 1,1
ba; 6; 2,3; 8,6
I load the following two packages and read the data:
using DataFrames
using CSV
d = CSV.read( "test.csv", delim=";")
Julia recognizes the following types:
eltypes(d)
CategoricalArrays.CategoricalString{UInt32}
String
String
String
How can I now convert whole columns to floats, with the comma replaced by a dot? My first idea was to use:
float(d[1,2])
But I did not find an option to tell Julia to replace the comma with a dot.
My next idea was to first replace the comma and then convert it:
float(replace(d[1,2], ",", "."))
That works fine on a single cell but not on a whole column:
float(replace(d[:,2], ",", "."))
MethodError: no method matching
replace(::WeakRefStrings.WeakRefStringArray{WeakRefString{UInt8},1,Union{}},
::String, ::String)
I also tried:
d = CSV.read( "test.csv", delim=";", decimal=",")
which also just gives an error ...
Any ideas on how to handle this problem and how to read the data into Julia efficiently?
Thanks a lot!
Best regards.
One straightforward way is to read the file to string, replace the comma decimal separators by dots and then create the DataFrame from it:
s = replace(readstring("test.csv"), ",", ".")
CSV.read(IOBuffer(s); delim=';', types=[String, Float64, Float64, Float64])
Note that you can use the types keyword to specify the column types (the string entries are then parsed implicitly).
EDIT: According to this GitHub issue, CSV.jl's read method supports a decimal keyword (from version v0.2.0 on), which allows you to do:
CSV.read("test.csv"; delim=';', decimal=',', types=[String, Float64, Float64, Float64])
EDIT: Removed hint to alternatively use readtable from DataFrames.jl because it seems to be deprecated in favor of CSV.read.
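The "replace the decimal comma, then parse" idea from the first suggestion can be sketched outside Julia too. Below is a Python version using only the standard csv module, shown for illustration rather than as CSV.jl's API. It assumes ';' separates fields, that the first column is text and the rest are numeric, and it replaces commas per numeric cell instead of across the whole file, so commas inside text columns are left alone. parse_decimal_comma is a hypothetical helper name.

```python
import csv
import io

# Same sample as in the question (';' separates fields, ',' is the decimal mark).
data = """A; B; C; D
ab; 1,23; 4; 9,2
ab; 3,4; 7; 1,1
ba; 6; 2,3; 8,6
"""


def parse_decimal_comma(cell: str) -> float:
    """Swap the decimal comma for a dot before converting to float."""
    return float(cell.replace(",", "."))


# skipinitialspace drops the blank that follows each ';' in the sample.
reader = csv.reader(io.StringIO(data), delimiter=";", skipinitialspace=True)
header = next(reader)
rows = [[r[0]] + [parse_decimal_comma(c) for c in r[1:]] for r in reader]

# Column-wise view, comparable to a data-frame column of Float64 values.
columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}
print(columns["B"])
```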

How to convert JsonAST.JInt to Int

I am attempting to learn Scala, and I'm trying to parse a JSON file. I have two lines of code:
var jVal:JValue = parse(json);
val totalCount:Int = (jVal \\ "totalCount").asInstanceOf[Int];
However, (jVal \\ "totalCount") returns a JInt instead of an int. If I print it as a string, it looks like "JInt(38)".
How on earth do I convert this to a regular int? My current code throws an exception saying that
net.liftweb.json.JsonAST$JInt cannot be cast to java.lang.Integer
I've scoured the internet, but I can't find any answers. I would really prefer not to manually parse and remove the "JInt()" part of the string just to get it as an integer.
Surely I am missing a simple way to do this?
Since JInt is a case class, a convenient way to extract the value is using an extractor expression, either in a match:
myJValue match {
case JInt(x) => /* do something with x */
case JString(s) => /* do something with s */
/* etc. */
}
or just an assignment statement, when you know what type to expect:
val JInt(totalCount) = (jVal \\ "totalCount")
This will define totalCount to be the value of "totalCount" in your JSON. Note that it will be of type BigInt. If you want to, you can convert your BigInt to an Int with the toInt method. But if your number is too big for an Int, toInt does not raise an error; it silently returns a different number, because only the low-order 32 bits are kept. So if huge numbers are at all a possibility, check first with isValidInt.
You can also get the value using the num field or values method, but in your code that's harder to work with. To use num, you'd have to do a cast of your JValue to JInt. And if you don't cast to JInt, you won't know the type of the result of values.

PostgreSQL Nested JSON Querying

On PostgreSQL 9.3.4, I have a JSON type column called "person" and the data stored in it is in the format {dogs: [{breed: <>, name: <>}, {breed: <>, name: <>}]}. I want to retrieve the breed of dog at index 0. Here are the two queries I ran:
Doesn't work
db=> select person->'dogs'->>0->'breed' from people where id = 77;
ERROR: operator does not exist: text -> unknown
LINE 1: select person->'dogs'->>0->'bree...
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Works
select (person->'dogs'->>0)::json->'breed' from es_config_app_solutiondraft where id = 77;
?column?
-----------
"westie"
(1 row)
Why is the type casting necessary? Isn't it inefficient? Am I doing something wrong or is this necessary for postgres JSON support?
This is because the ->> operator gets the JSON array element as text. You need a cast to convert its result back to JSON.
You can eliminate this redundant cast by using the -> operator instead:
select person->'dogs'->0->'breed' from people where id = 77;
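The type flow behind the error can be modelled in Python. The sketch is a simplification, not PostgreSQL. It assumes -> returns a JSON value, ->> returns text, and applying -> to text is rejected, as in the error message in the question. The Text marker class, the arrow/arrow_text helpers and the sample row are all hypothetical.

```python
import json


class Text(str):
    """Marks a value whose SQL type is text rather than json."""


def arrow(value, key):
    """Toy model of PostgreSQL ->: returns a JSON value (a Python object)."""
    if isinstance(value, Text):
        # Mirrors: ERROR: operator does not exist: text -> unknown
        raise TypeError("operator does not exist: text -> unknown")
    try:
        return value[key]
    except (KeyError, IndexError, TypeError):
        return None  # absent key or index: PostgreSQL returns NULL


def arrow_text(value, key):
    """Toy model of ->>: same lookup, but the result is text."""
    v = arrow(value, key)
    if v is None:
        return None
    return Text(v) if isinstance(v, str) else Text(json.dumps(v))


# Hypothetical value of the "person" column.
person = {"dogs": [{"breed": "westie", "name": "Rex"}, {"breed": "pug", "name": "Bo"}]}
dogs = arrow(person, "dogs")

try:
    arrow(arrow_text(dogs, 0), "breed")  # person->'dogs'->>0->'breed'
except TypeError as err:
    print("error:", err)

print(arrow(json.loads(arrow_text(dogs, 0)), "breed"))  # (...->>0)::json->'breed'
print(arrow(arrow(dogs, 0), "breed"))                   # ...->0->'breed', no cast
```

The cast version has to serialize the element to text and parse it again, which is the overhead the -> chain avoids. If you want the final breed as plain text rather than the JSON value "westie", ending the chain with ->>'breed' does that.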

Argument data type numeric is invalid for argument 1 of substring function

I get the error message in the title with this code:
case when substring(New_Limit,11,1)=' ' then '0'+substring(New_Limit,1,10)
The THEN branch is meant to concatenate the 0 and the substring.
Any help?
This means that your New_Limit variable is a numeric value. You may want to put a CAST to (n)varchar around it.
Try casting it to a string type (varchar) first:
SUBSTRING(CAST(New_Limit AS varchar(38)), 11, 1)
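Why the cast fixes the error can be shown with a small Python model of a 1-based SUBSTRING that, in this sketch, accepts only character data. It is not T-SQL. sql_substring and the sample value are hypothetical, and str() stands in for CAST(... AS varchar(38)).

```python
from decimal import Decimal


def sql_substring(expression, start: int, length: int) -> str:
    """Toy model of T-SQL SUBSTRING(expression, start, length) for start >= 1:
    1-based positions; only character data is accepted in this sketch."""
    if not isinstance(expression, str):
        raise TypeError(
            f"Argument data type {type(expression).__name__} is invalid "
            "for argument 1 of substring function"
        )
    return expression[start - 1 : start - 1 + length]


new_limit = Decimal("12345678901")  # hypothetical numeric value

try:
    sql_substring(new_limit, 11, 1)  # numeric passed directly: rejected
except TypeError as err:
    print(err)

as_text = str(new_limit)                    # stands in for CAST(New_Limit AS varchar(38))
print(sql_substring(as_text, 11, 1))        # 1
print("0" + sql_substring(as_text, 1, 10))  # 01234567890 ('+' concatenates strings)
```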