I'm using a regular dictionary to store matrices, converting that dict to a pandas Series, and writing it out to a CSV. I then use pd.read_csv() on the CSV file, but the returned items are all strings: literally a string of the entire matrix of values. Is there any way to make them floats?
The datatypes to read in are an argument to read_csv. From the function help:
dtype : Type name or dict of column -> type
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
so you'd make a read call like:
import numpy as np
import pandas as pd

mypd = pd.read_csv('my.csv', dtype={'a': np.float64, 'b': np.int32})
where a, b, etc. match the column names in your input file.
You can also cast the type of a column after it's been read into a DataFrame.
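For example, a minimal sketch of casting after the fact, assuming a column named 'a':

mypd['a'] = mypd['a'].astype(float)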
I want to convert Julia Dictionary Keys that are Strings to Integers
JSON3 converts the keys of my Dictionary into Strings. My understanding is that JSON keys are only strings.
using JSON3
a1 = Dict(1 => "one", 2 => "two", 3 => "three")
a1_json = JSON3.write(a1)
"{\"2\":\"two\",\"3\":\"three\",\"1\":\"one\"}"
a2 = JSON3.read(a1_json, Dict{Int64,String})
ERROR: MethodError: no method matching Int64(::String)
Is there any way to keep the keys in Int?
From the JSON3.jl README:

Declaring my type is JSON3.ObjectType() means it should map to a JSON object of unordered key-value pairs, where keys are Symbol or String, and values are any other type (or Any).

So, in the parse step, you'll get Symbol keys even if the original keys were numbers.
With that said, you can recover the original dict with:

a2 = JSON3.read(a1_json)  # keys come back as Symbols
Dict(parse(Int, string(k)) => v for (k, v) in pairs(a2))
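Running this on the example above should give back the original mapping (entry order may vary):

Dict{Int64, String} with 3 entries:
  2 => "two"
  3 => "three"
  1 => "one"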
I created a config file in JSON format, and I want to use KDB to read it in as a dictionary.
In Python, it's so easy:
with open('data.json') as f:
    data = json.load(f)
Is there a similar function in KDB?
To read your JSON file into kdb+, you should use read0. This returns the lines of the file as a list of strings.
q)read0`:sample.json
,"{"
"\"name\":\"John\","
"\"age\":30,"
"\"cars\":[ \"Ford\", \"BMW\", \"Fiat\" ]"
,"}"
kdb+ allows for the de-serialisation (and serialisation) of JSON objects to dictionaries using the .j namespace. The inbuilt .j.k expects a single string of characters containing JSON and converts it into a dictionary, so raze should be used to flatten the list of strings:
q)raze read0`:sample.json
"{\"name\":\"John\",\"age\":30,\"cars\":[ \"Ford\", \"BMW\", \"Fiat\" ]}"
Finally, using .j.k on this string yields the dictionary:
q).j.k raze read0`:sample.json
name| "John"
age | 30f
cars| ("Ford";"BMW";"Fiat")
For a particularly large JSON file, it may be more efficient to use read1 rather than raze read0 on your file, e.g.
q).j.k read1`:sample.json
name| "John"
age | 30f
cars| ("Ford";"BMW";"Fiat")
If you're interested in the reverse operation, you can use .j.j to convert a dictionary into a JSON string, and 0: to save it to file.
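For example (a sketch, with a hypothetical output file name):

q)`:sample_out.json 0: enlist .j.j `name`age!("John";30f)
`:sample_out.json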
Further information on the .j namespace can be found in the Kx documentation, which also has more examples of read0, read1 and 0:.
Working with JSON is handled by the .j namespace, where .j.j serialises and .j.k deserialises the messages. Note that you will need to use raze to convert the JSON into a single string first.
There is more information available on the Kx wiki, where the following example is presented:
q).j.k "{\"a\":[0,1],\"b\":[\"hello\",\"world\"]}"
a| 0 1
b| "hello" "world"
When using .j.j, both kdb+ symbols and strings are encoded as JSON strings; going the other way, .j.k decodes JSON strings to kdb+ strings, except for keys, which become symbols.
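For example, a round trip with a symbol and a string value:

q).j.j `name`city!(`John;"London")
"{\"name\":\"John\",\"city\":\"London\"}"
q).j.k .j.j `name`city!(`John;"London")
name| "John"
city| "London"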
To decode JSON into a kdb+ table, an array of objects with identical keys should be sent; kdb+ likewise encodes tables as arrays of objects.
q).j.k "[{\"a\":1,\"b\":2},{\"a\":3,\"b\":4}]"
a b
---
1 2
3 4
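And in the other direction, encoding a table (a minimal sketch):

q).j.j ([] a:1 2; b:3 4)
"[{\"a\":1,\"b\":3},{\"a\":2,\"b\":4}]"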
When encoding, q uses the value of \P (display precision) to choose the precision. The default is 7, which can lead to unwanted rounding. It can be changed to 0, meaning maximum precision, although the final digits are then unreliable, as shown below. See https://code.kx.com/q/ref/cmdline/#-p-display-precision for more info.
q).j.j 1.000001 1.0000001f
"[1.000001,1]"
q)\P 0
q).j.j 1.000001 1.0000001f
"[1.0000009999999999,1.0000001000000001]"
When I read JSON (without a schema) into a DataFrame, all the numeric types end up as Long. Is there a way to enforce an Integer type without giving a fully specified JSON schema?
You can convert the DataFrame into a Dataset with a case class:
import spark.implicits._  // assumes an active SparkSession named spark

val df = Seq((1, "ab"), (3, "ba")).toDF("A", "B")
case class test(A: Int, B: String)
df.as[test]
Alternatively, you can duplicate a column and recast it:
import org.apache.spark.sql.types.StringType
df.withColumn("newA", 'A.cast(StringType))
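For the Long-to-Integer case in the question, the same pattern applies (a sketch, assuming a Long column named "A"):

import org.apache.spark.sql.types.IntegerType
df.withColumn("A", 'A.cast(IntegerType))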
My requirement is to pass dataframe as input parameter to a scala class which saves the data in json format to hdfs.
The input parameter looks like this:
case class ReportA(
  parm1: String,
  parm2: String,
  parm3: Double,
  parm4: Double,
  parm5: DataFrame
)
I have created a JSON object for this parameter like:
def write(xx: ReportA) = JsObject(
  "field1" -> JsString(xx.parm1),
  "field2" -> JsString(xx.parm2),
  "field3" -> JsNumber(xx.parm3),
  "field4" -> JsNumber(xx.parm4),
  "field5" -> JsArray(xx.parm5)
)
parm5 is a DataFrame that I want to convert to a JSON array. How can I do that?
Thank you for your help!!!
A DataFrame can be seen as the equivalent of a plain old table in a database, with rows and columns. You can't get a simple array from it; the closest you would come to an array is the following structure:
[
  {"col1": val1, "col2": val3, "col3": val5},
  {"col1": val2, "col2": val4, "col3": val6},
  ...
]
To achieve a similar structure, you could use the toJSON method of the DataFrame API to get an RDD[String] (a Dataset[String] in Spark 2+) and then do collect on it (be careful of any OutOfMemory exceptions).
You now have an Array[String], which you can simply transform in a JsonArray depending on the JSON library you are using.
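For example, with spray-json (which the write method above appears to use), a minimal sketch:

import spray.json._

// one JSON object per row; collect pulls everything onto the driver
val jsonRows: Array[String] = xx.parm5.toJSON.collect()
val field5: JsArray = JsArray(jsonRows.map(_.parseJson).toVector)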
Beware though: this seems like a really bizarre way to use Spark. You generally don't output and transform an RDD or a DataFrame directly into one of your objects; you usually spill it out onto a storage solution.
Is there a way to automatically convert JSON data into Data.Map or just a list of tuples?
Say, if I have:
{"Name": "Stitch", "Age": 3, "Friend": "Lilo"}
I'd like it to be converted into:
fromList [("Name","Stitch"), ("Age",3), ("Friend","Lilo")]
... without defining a Stitch data type. I am happy for integers to be parsed as strings in the resulting map; I can just read them back into integers later.
You can use aeson. See Decoding a mixed-type object in its documentation's tutorial:
>>> import qualified Data.ByteString.Lazy.Char8 as BS
>>> :m +Data.Aeson
>>> let foo = BS.pack "{\"Name\" : \"Stitch\", \"Age\" : 3, \"Friend\": \"Lilo\"}"
>>> decode foo :: Maybe Object
Just (fromList [("Friend",String "Lilo"),("Name",String "Stitch"),("Age",Number 3.0)])
An Object is just a HashMap from Text keys to Value values (in aeson >= 2.0 it is a KeyMap keyed by Key), the Value type being a sum-type representation of JS values.
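To get the plain list of tuples the question asks for, you can call toList on that map (a sketch for aeson < 2.0, where Object is a HashMap; entry order may vary):

>>> import qualified Data.HashMap.Strict as HM
>>> maybe [] HM.toList (decode foo :: Maybe Object)
[("Friend",String "Lilo"),("Name",String "Stitch"),("Age",Number 3.0)]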