Haskell: Parsing JSON data into a Map or a list of tuples?

Is there a way to automatically convert JSON data into Data.Map or just a list of tuples?
Say, if I have:
{"Name": "Stitch", "Age": 3, "Friend": "Lilo"}
I'd like it to be converted into:
fromList [("Name","Stitch"), ("Age",3), ("Friend","Lilo")]
.. without defining a Stitch data type.
I am happy to parse integers into strings in the resulting map. I can just read them into integers later.

You can use aeson. See Decoding a mixed-type object in its documentation's tutorial:
>>> import qualified Data.ByteString.Lazy.Char8 as BS
>>> :m +Data.Aeson
>>> let foo = BS.pack "{\"Name\" : \"Stitch\", \"Age\" : 3, \"Friend\": \"Lilo\"}"
>>> decode foo :: Maybe Object
Just fromList [("Friend",String "Lilo"),("Name",String "Stitch"),("Age",Number 3.0)]
An Object is just a HashMap from Text keys to Value values (from aeson 2.0 onwards it is a KeyMap keyed by Key instead). The Value type is a sum type representing JSON values.
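If you specifically want a Data.Map, as the question asks, a minimal sketch is to decode straight into `Map Text Value`, which aeson provides a FromJSON instance for. This assumes the aeson, bytestring, containers and text packages are available; the numbers stay as `Number` values rather than strings.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, decode)
import qualified Data.ByteString.Lazy.Char8 as BS
import qualified Data.Map as Map
import Data.Text (Text)

main :: IO ()
main = do
  let input = BS.pack "{\"Name\": \"Stitch\", \"Age\": 3, \"Friend\": \"Lilo\"}"
  -- Decode the top-level JSON object into an ordinary Data.Map.
  case decode input :: Maybe (Map.Map Text Value) of
    Nothing -> putStrLn "input is not a JSON object"
    Just m -> do
      -- Look up a single key, then dump the whole map as a list of tuples.
      print (Map.lookup "Name" m)
      print (Map.toList m)
```

`Map.toList` gives the list-of-tuples form; the values are still aeson `Value`s, so you pattern-match on `String` or `Number` to get at the contents.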

Related

How can I access JSON values from GHCI?

I'm trying to navigate JSON values using Haskell, in GHCI. I can get a JSON payload from an API, with something like this:
import Network.HTTP.Simple
baseURL <- parseRequest "https://www.googleapis.com/books/v1/volumes"
let queryString = B8.pack $ unpack q
let request = setRequestQueryString [("q", Just queryString)] $ baseURL
resp <- httpJSON request
let body = getResponseBody resp :: Object
And that gives me an Object. That object (a HashMap) contains the key "items", whose value is an Array of Objects. I want to get the first object from that array, then get its volumeInfo, then its industryIdentifiers, then its ISBN.
In Python I would do:
identifiers = body['items'][0]['volumeInfo']['industryIdentifiers']
isbn = [id['identifier'] for id in identifiers if id['type'] == 'ISBN_10'][0]
Or in other words, just chain accessors. How can I do this in Haskell? I've tried something like (((body ! "items") !! 0) ! "volumeInfo") but I keep getting errors like Couldn't match expected type ‘[a]’ with actual type ‘Value’.
All the tutorials I can find just say to model the data by creating a complete picture of the data as a Haskell data structure, then writing a decoder to turn that JSON data into a Haskell data object. That seems like massive overkill in this case, when the data structure I'm getting from the API is way bigger than the bit that I need, which is just the ISBN.
How does one normally drill down through a big data structure in Haskell?
The preferred way is not to deal with JSON values manually, but to parse them into a suitable Haskell type and then index into that. This is much safer: if the input doesn't conform to the expected format, you get a clear parse error telling you where in the data structure something is missing, instead of an obscure missing-key error somewhere deep in your code.
{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}
import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)
data GoogleBooksVolumes = GoogleBooksVolumes
  { items :: [GoogleBooksVolume]
  , ...
  } deriving (Generic, FromJSON, ToJSON)
data GoogleBooksVolume = GoogleBooksVolume
  { ...
  , industryIdentifiers :: [IndustryIdentifier]
  , ...
  } deriving (Generic, FromJSON, ToJSON)
...
If you are going to index into the JSON object ad hoc, your best bet is the aeson-lens package. It lets you do something very similar to the Python version, and similarly unsafe.
{-# LANGUAGE OverloadedStrings #-}
Just identifiers = Just body ^. key "items" . nth 0 . key "volumeInfo" . key "industryIdentifiers"
Just isbn = head [ Just idf ^. key "identifier"
                 | idf <- identifiers
                 , Just idf ^. key "type" == Just (String "ISBN_10") ]
TBH this is even worse than in Python, because if a key fails to match you just get a Nothing result, with no information about what went wrong.
A safer option is to manually pattern-match at every decision where something could go wrong, but that is a lot of boilerplate.
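A middle ground, sketched here under the assumption that the response has the shape items[0].volumeInfo.industryIdentifiers[*] with "type" and "identifier" fields, is to write a small aeson Parser for just the path you need. The function name `isbn10` and the sample payload are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Data.Aeson.Types (Parser, parseEither)
import qualified Data.ByteString.Lazy as BL
import Data.Text (Text)

-- Walk only the fields we care about; everything else in the payload is ignored.
isbn10 :: Value -> Parser Text
isbn10 = withObject "response" $ \body -> do
  items <- body .: "items"
  case items of
    [] -> fail "no items in response"
    (first : _) -> do
      info <- first .: "volumeInfo"
      ids <- info .: "industryIdentifiers"
      -- Collect (type, identifier) pairs, then pick the ISBN_10 entry.
      pairs <- mapM (\o -> (,) <$> o .: "type" <*> o .: "identifier") ids
      case lookup ("ISBN_10" :: Text) pairs of
        Just i -> pure i
        Nothing -> fail "no ISBN_10 identifier"

main :: IO ()
main = do
  -- Hypothetical, heavily trimmed sample payload.
  let sample = "{\"items\":[{\"volumeInfo\":{\"industryIdentifiers\":[{\"type\":\"ISBN_10\",\"identifier\":\"0000000000\"}]}}]}" :: BL.ByteString
  print (eitherDecode sample >>= parseEither isbn10)
```

With the `body :: Object` from the question you would call `parseEither isbn10 (Object body)`. Unlike the lens version, a missing key produces a `Left` with a message naming the key that failed.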

how to convert os.stat_result to a JSON that is an object?

I know that json.dumps can be used to convert variables into a JSON representation. Sadly, for Python 3's os.stat_result class the result is a string containing an array of the instance's values.
>>> import json
>>> import os
>>> json.dumps(os.stat('/'))
'[16877, 256, 24, 1, 0, 0, 268, 1554977084, 1554976849, 1554976849]'
I would, however, much prefer to have the os.stat_result converted to a JSON object. How can I achieve this?
It seems that the trouble is that os.stat_result does not have a .__dict__ attribute.
Seeing the result of this:
>>> import os
>>> str(os.stat('/'))
'os.stat_result(st_mode=16877, st_ino=256, st_dev=24, st_nlink=1, st_uid=0, st_gid=0, st_size=268, st_atime=1554977084, st_mtime=1554976849, st_ctime=1554976849)'
makes me hope there is a swift way to turn a Python class instance (e.g. `os.stat_result`) into a JSON representation that is an object.
which while is JSON, but the results are
As gst mentioned, doing it manually would look like this:
def stat_to_json(fp: str) -> dict:
    s_obj = os.stat(fp)
    return {k: getattr(s_obj, k) for k in dir(s_obj) if k.startswith('st_')}
I would however much prefer to have it convert the os.stat_result being converted to an JSON being an object. How can I achieve this?
If by JSON you mean a dict with keys st_mode, st_ino, etc., then the answer is: manually.
With a predefined list of keys:
# http://thepythoncorner.com/dev/writing-a-fuse-filesystem-in-python/
def dict_of_lstat(lstat_res):
    lstat_keys = ['st_atime', 'st_ctime', 'st_gid', 'st_mode', 'st_mtime', 'st_nlink', 'st_size', 'st_uid', 'st_blocks']
    return dict((k, getattr(lstat_res, k)) for k in lstat_keys)
lstat_dict = dict_of_lstat(os.lstat(path))
def dict_of_statvfs(statvfs_res):
    statvfs_keys = ['f_bavail', 'f_bfree', 'f_blocks', 'f_bsize', 'f_favail', 'f_ffree', 'f_files', 'f_flag', 'f_frsize', 'f_namemax']
    return dict((k, getattr(statvfs_res, k)) for k in statvfs_keys)
statvfs_dict = dict_of_statvfs(os.statvfs(path))
This is longer but faster than a k.startswith('st_') filter.

sparksql Convert dataframe to json

My requirement is to pass dataframe as input parameter to a scala class which saves the data in json format to hdfs.
The input parameter looks like this:
case class ReportA(
parm1: String,
parm2: String,
parm3: Double,
parm4: Double,
parm5: DataFrame
)
I have created a JSON object for this parameter like:
def write(xx: ReportA) = JsObject(
"field1" -> JsString(xx.parm1),
"field2" -> JsString(xx.parm2),
"field3" -> JsNumber(xx.parm3),
"field4" -> JsNumber(xx.parm4),
"field5" -> JsArray(xx.parm5)
)
parm5 is a DataFrame, and I want to convert it to a JSON array.
How can I convert the DataFrame to a JSON array?
Thank you for your help!!!
A DataFrame can be seen as the equivalent of a plain old table in a database, with rows and columns. You can't just get a simple array from it; the closest you would come to an array is the following structure:
{
  "col1": [val1, val2, ..],
  "col2": [val3, val4, ..],
  "col3": [val5, val6, ..]
}
To achieve a similar structure, you could use the toJSON method of the DataFrame API to get an RDD[String] (a Dataset[String] in Spark 2.x and later) and then call collect on it (be careful of OutOfMemory exceptions).
You now have an Array[String], which you can transform into a JSON array with whatever JSON library you are using.
Beware, though: this is a rather unusual way to use Spark. You generally don't transform an RDD or a DataFrame directly into one of your objects; you usually write it out to a storage system.

How to use Haskell "json" package to parse to type [Map String String]?

I've got some sample JSON data like this:
[{
"File:FileSize": "104 MB",
"File:FileModifyDate": "2015:04:11 10:39:00-07:00",
"File:FileAccessDate": "2016:01:17 22:37:23-08:00",
"File:FileInodeChangeDate": "2015:04:26 07:50:50-07:00"
}]
and I'm trying to parse the data using the json package (not aeson):
import qualified Data.Map.Lazy as M
import Text.JSON
content <- readFile "file.txt"
decode content :: Result [M.Map String String]
This gives me an error:
Error "readJSON{Map}: unable to parse array value"
I can get as far as this:
fmap
(map (M.fromList . fromJSObject))
(decode content :: Result [JSObject String])
but it seems like an awfully manual way to do it. Surely the JSON data could be parsed directly into a type [Map String String]. Pointers?
Without the MAP_AS_DICT switch, the JSON (Map a b) instance will be:
instance (Ord a, JSON a, JSON b) => JSON (M.Map a b) where
showJSON = encJSArray M.toList
readJSON = decJSArray "Map" M.fromList
So only a JSON array can be parsed to a Data.Map; for any other value it calls mkError and you get the Error result shown in the question.
Because the json package already defines an instance for JSON (Map a b), you won't be able to write your own instance for that type, so your current workaround may be the best solution.
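One possible variation, offered as a sketch rather than something the json package documents, is to wrap the map in a newtype and give the wrapper its own JSON instance. The name `StringMap` is hypothetical; the instance reuses the same `fromJSObject` conversion as the workaround in the question.

```haskell
import qualified Data.Map.Lazy as M
import Text.JSON

-- Hypothetical wrapper; StringMap is not part of the json package.
newtype StringMap = StringMap (M.Map String String)
  deriving Show

instance JSON StringMap where
  -- Read a JSON object and turn its key/value pairs into a Map.
  readJSON v = StringMap . M.fromList . fromJSObject <$> readJSON v
  -- Write the Map back out as a JSON object.
  showJSON (StringMap m) = showJSON (toJSObject (M.toList m))

main :: IO ()
main = print (decode "[{\"File:FileSize\": \"104 MB\"}]" :: Result [StringMap])
```

The cost is that you have to unwrap `StringMap` at the use sites, so for a one-off script the `fmap (map (M.fromList . fromJSObject))` version from the question is probably simpler.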

Finding a value in ByteString (which is actually JSON)

A web service returns a response as ByteString
req <- parseUrl "https://api.example.com"
res <- withManager $ httpLbs $ configReq req
case (HashMap.lookup "result" $ responseBody res) of .... -- error - responseBody returns ByteString
where
configReq r = --......
To be more specific, responseBody returns data in ByteString, although it's actually valid JSON. I need to find a value in it. Obviously, it would be easier to find it if it was JSON and not ByteString.
If that's the case, how do I convert it to JSON?
UPDATE:
decode $ responseBody resp :: IO (Either String Aeson.Value)
error:
Couldn't match expected type `IO (Either String Value)'
with actual type `Maybe a0'
You'll find several resources for converting bytestring to JSON. The simplest use cases are on the hackage page itself, and the rest you can infer using type signatures of the entities involved.
https://hackage.haskell.org/package/aeson-0.7.0.6/docs/Data-Aeson.html
But here's a super quick intro to JSON with Aeson:
In most languages, you have things like this:
someString = '{ "name" : ["value1", 2] }'
aDict = json.loads(someString)
This is obviously great, because JSON has a nearly one-to-one mapping with a fundamental data structure of the language. Containers in most dynamic languages can contain values of any type, so moving from JSON to a data structure is a single step.
However, that is not the case with Haskell. You can't put values of arbitrary types into a container type such as a list or a dictionary.
So Aeson does a neat thing. It defines an intermediate Haskell type for you that maps directly to JSON.
The fundamental unit in Aeson is a Value. A Value can hold many things: a number, a string, a boolean, null, an array, or an object.
https://hackage.haskell.org/package/aeson-0.7.0.6/docs/Data-Aeson.html#t:Value
An aeson Array is a Vector (like a list, but with efficient indexing) of Values, and an aeson Object is a HashMap from Text to Value.
The next interesting step is that you can define functions that will convert an Aeson value to your Haskell type. This completes the loop. ByteString to Value to a custom type.
So all you do is implement parseJSON and toJSON functions that convert aeson Values to your type and vice-versa. The bit that converts a bytestring into a valid aeson value is implemented by aeson. So the heavy lifting is all done.
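As a rough illustration of that loop, here is a hand-written pair of instances for a made-up `Person` record. The type, its field names and the sample JSON are assumptions for the example only; `withObject`, `.:`, `object` and `.=` are aeson's own helpers.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Data.Text (Text)

-- Hypothetical record, used only for illustration.
data Person = Person
  { personName :: Text
  , personAge :: Int
  } deriving Show

-- Value -> Person: fail unless we get an object with both fields.
instance FromJSON Person where
  parseJSON = withObject "Person" $ \o ->
    Person <$> o .: "name" <*> o .: "age"

-- Person -> Value: build a JSON object from the record fields.
instance ToJSON Person where
  toJSON (Person n a) = object ["name" .= n, "age" .= a]

main :: IO ()
main = do
  -- ByteString -> Value -> Person happens inside eitherDecode.
  print (eitherDecode "{\"name\": \"Stitch\", \"age\": 3}" :: Either String Person)
  -- And back again: Person -> Value -> ByteString.
  print (encode (Person "Lilo" 6))
```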
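One important thing to note: the ByteString that Aeson decodes is a lazy ByteString, so you might need some helpers to convert to and from it.
import qualified Data.ByteString.Char8
import qualified Data.ByteString.Lazy
import Data.ByteString.Lazy (ByteString)
-- Pack a String into a strict ByteString, then wrap it as a single lazy chunk.
stringToLazy :: String -> ByteString
stringToLazy x = Data.ByteString.Lazy.fromChunks [Data.ByteString.Char8.pack x]
-- Concatenate the lazy chunks into one strict ByteString, then unpack it.
lazyToString :: ByteString -> String
lazyToString x = Data.ByteString.Char8.unpack $ Data.ByteString.Char8.concat $ Data.ByteString.Lazy.toChunks x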
That should be enough to get started with Aeson.
--
Common decoding functions with Aeson:
decode :: ByteString -> Maybe YourType
eitherDecode :: ByteString -> Either String YourType
In your case, you're looking for eitherDecode. Both functions are pure, which is why the IO annotation in your update does not type-check.
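For the question's concrete case, a minimal sketch could decode the body into a `Map Text Value` and look up "result" there. The helper name `lookupResult` and the sample input are assumptions; in the real program the argument would be `responseBody res`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, eitherDecode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map as Map
import Data.Text (Text)

-- Decode the raw bytes as a JSON object, then look up the "result" key.
-- Both a parse failure and a missing key end up as a Left with a message.
lookupResult :: BL.ByteString -> Either String Value
lookupResult raw = do
  obj <- eitherDecode raw :: Either String (Map.Map Text Value)
  maybe (Left "missing key \"result\"") Right (Map.lookup "result" obj)

main :: IO ()
main = print (lookupResult "{\"result\": [1, 2, 3]}")
```

Decoding into `Map Text Value` rather than `Object` is a deliberate choice here, since it avoids depending on whether your aeson version represents `Object` as a HashMap or a KeyMap.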