Parsing problematic JSON with Aeson

I am trying to parse JSON objects, which are generally of the form
{
"objects": [a bunch of records that can assume a few different forms],
"parameters": [same deal],
"values": {
"k1": "v1",
"k2": "v2",
...
}
}
using Haskell's Aeson library. Part of this task is simple in the sense that the parameters and values fields need no custom parsing whatsoever (and so seem to need only a generically derived instance of FromJSON), and most of the records contained within the array associated to objects also need no special parsing. However, there are some parts of parsing the records within the array of objects that, when considered separately, have documented solutions, but together present problems that I haven't figured out how to address.
Now, the possible variants of record inside the objects and parameters arrays are finite in number and often share keys; for example, all of them have a "name" key and an "id" key. But many of them also have a "type" key. Since type is a reserved keyword in Haskell, it cannot be used as a record field name, so these records cannot be parsed with a generically derived instance as-is. This is the first problem.
The second problem is that one of the possible variants of record inside objects can have a key -- "depends" let's say -- whose value may assume different types. It can either be a single record
{
  "objects": [
    {
      "depends": {
        "reference": "r1"
      },
      ...
    },
    ...
  ],
  ...
}
or a list of records
{
  "objects": [
    {
      "depends": [
        {"reference": "r1"},
        {"reference": "r2"},
        etc.
      ],
      ...
    },
    ...
  ],
  ...
}
and it happens that this is the one field that I would like to manipulate in a custom fashion after converting to a Haskell object (eventually I want to represent the collection of such "depends" references as a Data.Graph graph).
My initial attempt was to create one huge record type that subsumes all of the possible keys in the elements of the objects and parameters arrays. Something like this:
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}
import Data.Aeson
import GHC.Generics
data Ref = Ref
  { ref :: String
  } deriving (Show, Generic, FromJSON, ToJSON)
data Reference
  = Reference Ref
  | References [Ref]
  deriving (Show, Generic, FromJSON, ToJSON)
type MString = Maybe String -- I'm writing this a lot using this approach
data PObject = PObject
  -- Each of the object/parameter records have these keys
  { _name :: String
  , _id :: String
  -- Other keys that might appear in a given object/parameter record
  , _type :: MString
  , _role :: MString
  , _depends :: Maybe Reference
  -- A bunch more
  } deriving Show
instance FromJSON PObject where
  parseJSON = withObject "PObject" $ \o -> do
    _name <- o .: "name"
    _id <- o .: "id"
    _type <- o .:? "type"
    _role <- o .:? "role"
    _depends <- o .:? "depends"
    -- etc.
    return PObject{..}
And then finally, the whole JSON object would be represented like
data MyJSONObject = MyJSONObject
  { objects :: Maybe [PObject]
  , parameters :: Maybe [PObject]
  , values :: Maybe Object
  } deriving (Show, Generic, FromJSON)
This works until it tries to parse a "depends" field, reporting that
"Error in $.objects[2].depends: key \"tag\" not present"
There are no "tag" keys, so I'm not sure what this means. I suspect it has to do with the generic instances of FromJSON for Ref and Reference.
My questions:
What does this error indicate? So far in my learning of Haskell, the errors have always been very helpful. This one is not. Do I need to do something special for the "depends" key in my parseJSON function?
All of this boilerplate is really because of two keys -- "type" and "depends". Is there a more elegant way to deal with these keys?
Relatedly, this is part of my first real Haskell project, so I have a more general design question. Experienced Haskellers and Aeson users, how would you lay out your types and instances for this type of JSON? I tried listing out each possible variant of objects/parameters record as its own separate type, and only writing custom FromJSON instances for those that have a "depends" or "type" key, but this produced a lot more boilerplate code and in any case doesn't solve any of the other issues I have. General pointers on "best practices", idiomatic usage, etc. would be extremely useful and appreciated.

There are no "tag" keys, so I'm not sure what this means. I suspect it has to do with the generic instances of FromJSON for Ref and Reference.
That's spot on. By default, aeson uses defaultTaggedObject as the sumEncoding for sum types. Reference is a sum type, so the generic instance expects a "tag" field to distinguish the constructors. You can try that with a short example:
ghci> data Example = A () | B deriving (Generic,ToJSON)
ghci> encode B
"{\"tag\":\"B\",\"contents\":[]}"
When you use _depends <- o .:? "depends", the Reference parser does not find its tag. You have to write some parsing code there yourself.
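As a rough sketch of what that hand-written parsing code might look like: the instance below inspects the JSON value and picks the constructor from its shape, so no "tag" key is needed. It assumes the JSON key is "reference", as in the sample in the question (the generic instance for Ref would look for a key named "ref" instead). The helper refs is a hypothetical addition for flattening both cases into one list.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson

newtype Ref = Ref { ref :: String }
  deriving (Show, Eq)

-- Hand-written so that the JSON key "reference" maps to the field `ref`.
instance FromJSON Ref where
  parseJSON = withObject "Ref" $ \o -> Ref <$> o .: "reference"

data Reference
  = Reference Ref
  | References [Ref]
  deriving (Show, Eq)

-- Choose the constructor from the shape of the value instead of a "tag" key:
-- a JSON array becomes References, anything else is parsed as a single Ref.
instance FromJSON Reference where
  parseJSON v@(Array _) = References <$> parseJSON v
  parseJSON v           = Reference <$> parseJSON v

-- Hypothetical helper: treat both cases uniformly as a list of references.
refs :: Reference -> [Ref]
refs (Reference r)   = [r]
refs (References rs) = rs
```

With these instances in scope, the line _depends <- o .:? "depends" in the PObject parser should work unchanged, and refs gives a single list to feed into whatever graph-building code comes later.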

All of this boilerplate is really because of two keys -- "type" and
"depends". Is there a more elegant way to deal with these keys?
You could keep the underscores in the field names and use fieldLabelModifier in the Options data type to strip them for parsing purposes.
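A minimal sketch of that suggestion, using a trimmed-down version of the PObject record from the question (the "depends" field is left out here to keep the example self-contained). The name pobjectOptions is made up for illustration.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import GHC.Generics (Generic)

data PObject = PObject
  { _name :: String
  , _id   :: String
  , _type :: Maybe String
  , _role :: Maybe String
  } deriving (Show, Eq, Generic)

-- Drop the leading underscore, so the field _type is read from the key "type".
pobjectOptions :: Options
pobjectOptions = defaultOptions { fieldLabelModifier = drop 1 }

instance FromJSON PObject where
  parseJSON = genericParseJSON pobjectOptions
```

This replaces the hand-written field-by-field parser: the reserved word never appears as a Haskell identifier, and Maybe fields whose keys are missing from the JSON come back as Nothing.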

Related

How can I access JSON values from GHCI?

I'm trying to navigate JSON values using Haskell, in GHCI. I can get a JSON payload from an API, with something like this:
import Network.HTTP.Simple
baseURL <- parseRequest "https://www.googleapis.com/books/v1/volumes"
let queryString = B8.pack $ unpack q
let request = setRequestQueryString [("q", Just queryString)] $ baseURL
resp <- httpJSON request
let body = getResponseBody resp :: Object
And that gives me an Object. That object (a HashMap) contains the key "items" whose value is an Array of Objects. I want to get the first object from that array, then get its volumeInfo, then its industryIdentifiers, then its ISBN.
In Python I would do:
identifiers = body['items'][0]['volumeInfo']['industryIdentifiers']
isbn = [id['identifier'] for id in identifiers if id['type'] == 'ISBN_10'][0]
Or in other words, just chain accessors. How can I do this in Haskell? I've tried something like ((body ! "items") !! 0) ! "volumeInfo" but I keep getting errors like Couldn't match expected type ‘[a]’ with actual type ‘Value’.
All the tutorials I can find just say to model the data by creating a complete picture of the data as a Haskell data structure, then writing a decoder to turn that JSON data into a Haskell data object. That seems like massive overkill in this case, when the data structure I'm getting from the API is way bigger than the bit that I need, which is just the ISBN.
How does one normally drill down through a big data structure in Haskell?
The preferred way is to not manually deal with JSON values, but instead parse them into a suitable Haskell type and then index into that, which is much safer: if the input doesn't conform to the expected format, you get a clear parsing error showing at which location in the data structure something is missing, instead of an obscure key-missing error somewhere deep in your code.
{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}
data GoogleBooksVolumes = GoogleBooksVolumes
  { items :: [GoogleBooksVolume]
  , ...
  } deriving (Generic, FromJSON, ToJSON)
data GoogleBooksVolume = GoogleBooksVolume
  { ...
  , industryIdentifiers :: [IndustryIdentifier]
  , ...
  } deriving (Generic, FromJSON, ToJSON)
...
If you're going to ad-hoc index into the JSON object, your best bet is the aeson-lens package. That allows you to do something very similar – and similarly unsafe – as in Python.
{-# LANGUAGE OverloadedStrings #-}
Just identifiers = Just body ^. key "items" . nth 0 . key "industryIdentifiers"
Just isbn = head [ Just idf ^. key "identifier"
                 | idf <- identifiers
                 , Just idf ^. key "type" == Just (String "ISBN_10") ]
TBH this is even worse than in Python, because if a key fails to match you just get a Nothing result without any information at all about what went wrong.
A safer option is to manually pattern-match at every decision where something could go wrong, but that is a lot of boilerplate.
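One possible middle ground, sketched here with aeson's own Parser rather than a full data model or a lens library: drill down with .: inside a parser function, so only the keys you need are mentioned and a missing key still produces an error message. The function name firstIsbn10 is made up for this example, and the key names are taken from the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import Data.Aeson.Types (Parser, parseEither)
import Data.Text (Text)

-- Walk items[0].volumeInfo.industryIdentifiers and return the first
-- identifier whose "type" is "ISBN_10".
firstIsbn10 :: Value -> Parser Text
firstIsbn10 = withObject "response" $ \body -> do
  items <- body .: "items"
  case items of
    []         -> fail "no items in response"
    (item : _) -> do
      info        <- item .: "volumeInfo"
      identifiers <- info .: "industryIdentifiers"
      matches     <- traverse isbn10 identifiers
      case [i | Just i <- matches] of
        (i : _) -> pure i
        []      -> fail "no ISBN_10 identifier"
  where
    -- Just the identifier for ISBN_10 entries, Nothing for other kinds.
    isbn10 :: Object -> Parser (Maybe Text)
    isbn10 o = do
      ty <- o .: "type"
      if ty == ("ISBN_10" :: Text)
        then Just <$> o .: "identifier"
        else pure Nothing
```

With body :: Object as in the question, this would be run as parseEither firstIsbn10 (Object body), which yields either the ISBN or a message saying what was missing.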

Can aeson handle JSON with imprecise types?

I have to deal with JSON from a service that sometimes gives me "123" instead of 123 as the value of a field. Of course this is ugly, but I cannot change the service. Is there an easy way to derive an instance of FromJSON that can handle this? The standard instances derived by means of deriveJSON (https://hackage.haskell.org/package/aeson-1.5.4.1/docs/Data-Aeson-TH.html) cannot do that.
One low-hanging (although perhaps not so elegant) option is to define the property as an Aeson Value. Here's an example:
{-# LANGUAGE DeriveGeneric #-}
module Q65410397 where
import GHC.Generics
import Data.Aeson
data JExample = JExample { jproperty :: Value } deriving (Eq, Show, Generic)
instance ToJSON JExample where
instance FromJSON JExample where
Aeson can decode a JSON value with a number:
*Q65410397> decode "{\"jproperty\":123}" :: Maybe JExample
Just (JExample {jproperty = Number 123.0})
It also works if the value is a string:
*Q65410397> decode "{\"jproperty\":\"123\"}" :: Maybe JExample
Just (JExample {jproperty = String "123"})
Granted, by defining the property as Value this means that at the Haskell side, it could also hold arrays and other objects, so you should at least have a path in your code that handles that. If you're absolutely sure that the third-party service will never give you, say, an array in that place, then the above isn't the most elegant solution.
On the other hand, if it gives you both 123 and "123", there's already some evidence that maybe you shouldn't trust the contract to be well-typed...
Assuming you want to avoid writing FromJSON instances by hand as much as possible, perhaps you could define a newtype over Int with a hand-crafted FromJSON instance—just for handling that oddly parsed field:
{-# LANGUAGE TypeApplications #-}
import Control.Applicative
import Data.Aeson
import Data.Text (Text)
import Data.Text.Read (decimal)
newtype SpecialInt = SpecialInt { getSpecialInt :: Int } deriving (Show, Eq, Ord)
instance FromJSON SpecialInt where
  parseJSON v =
    let fromInt = parseJSON @Int v
        fromStr = do
          str <- parseJSON @Text v
          case decimal str of
            Right (i, _) -> pure i
            Left errmsg -> fail errmsg
    in SpecialInt <$> (fromInt <|> fromStr)
You could then derive FromJSON for records which have a SpecialInt as a field.
Making the field a SpecialInt instead of an Int only for the sake of the FromJSON instance feels a bit intrusive though. "Needs to be parsed in an odd way" is a property of the external format, not of the domain.
In order to avoid this awkwardness and keep our domain types clean, we need a way to tell GHC: "hey, when deriving the FromJSON instance for my domain type, please treat this field as if it were a SpecialInt, but return an Int at the end". That is, we want to deal with SpecialInt only when deserializing. This can be done using the "generic-data-surgery" library.
Consider this type
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics
data User = User { name :: String, age :: Int } deriving (Show,Generic)
and imagine we want to parse "age" as if it were a SpecialInt. We can do it like this:
{-# LANGUAGE DataKinds #-}
import Generic.Data.Surgery (toOR', modifyRField, fromOR, Data)
instance FromJSON User where
  parseJSON v = do
    r <- genericParseJSON defaultOptions v
    -- r is a synthetic Data which we must tweak in the OR and convert to User
    let surgery = fromOR . modifyRField @"age" @1 getSpecialInt . toOR'
    pure (surgery r)
Putting it to work:
{-# LANGUAGE OverloadedStrings #-}
main :: IO ()
main = do
  print $ eitherDecode' @User $ "{ \"name\" : \"John\", \"age\" : \"123\" }"
  print $ eitherDecode' @User $ "{ \"name\" : \"John\", \"age\" : 123 }"
One limitation is that "generic-data-surgery" works by tweaking Generic representations, so this technique won't work with deserializers generated using Template Haskell.

Finding a value in ByteString (which is actually JSON)

A web service returns a response as ByteString
req <- parseUrl "https://api.example.com"
res <- withManager $ httpLbs $ configReq req
case (HashMap.lookup "result" $ responseBody res) of .... -- error - responseBody returns ByteString
where
configReq r = --......
To be more specific, responseBody returns data in ByteString, although it's actually valid JSON. I need to find a value in it. Obviously, it would be easier to find it if it was JSON and not ByteString.
If that's the case, how do I convert it to JSON?
UPDATE:
decode $ responseBody resp :: IO (Either String Aeson.Value)
error:
Couldn't match expected type `IO (Either String Value)'
with actual type `Maybe a0'
You'll find several resources for converting bytestring to JSON. The simplest use cases are on the hackage page itself, and the rest you can infer using type signatures of the entities involved.
https://hackage.haskell.org/package/aeson-0.7.0.6/docs/Data-Aeson.html
But here's a super quick intro to JSON with Aeson:
In most languages, you have things like this:
someString = '{ "name" : ["value1", 2] }'
aDict = json.loads(someString)
This is obviously great, because JSON has a nearly one to one mapping with a fundamental data-structure of the language. Containers in most dynamic languages can contain values of any type, and so moving from JSON to data structure is a single step.
However, that is not the case with Haskell. You can't put things of arbitrary types into a container-like type (a list or a dictionary).
So Aeson does a neat thing. It defines an intermediate Haskell type for you, that maps directly to JSON.
The fundamental unit in Aeson is a Value. A Value can contain many things, such as a number, a string, an array, or an object.
https://hackage.haskell.org/package/aeson-0.7.0.6/docs/Data-Aeson.html#t:Value
An aeson array is a Vector (like a list but better) of Values, and an aeson object is a HashMap of Text to Values.
The next interesting step is that you can define functions that will convert an Aeson value to your Haskell type. This completes the loop. ByteString to Value to a custom type.
So all you do is implement parseJSON and toJSON functions that convert aeson Values to your type and vice-versa. The bit that converts a bytestring into a valid aeson value is implemented by aeson. So the heavy lifting is all done.
One important note: the ByteString that Aeson decodes is a lazy ByteString, so you might need helpers like these to convert to and from it.
stringToLazy :: String -> ByteString
stringToLazy x = Data.ByteString.Lazy.fromChunks [Data.ByteString.Char8.pack x]
lazyToString :: ByteString -> String
lazyToString x = Data.ByteString.Char8.unpack $ Data.ByteString.Char8.concat $ Data.ByteString.Lazy.toChunks x
That should be enough to get started with Aeson.
--
Common decoding functions with Aeson:
decode :: ByteString -> Maybe YourType
eitherDecode :: ByteString -> Either String YourType
In your case, you're looking for eitherDecode.
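For the lookup in the question, a small sketch along these lines might do: decode the lazy ByteString into an Object, then read the "result" key with a parser. The function name lookupResult is made up for illustration. Note that eitherDecode is pure, so its result is an Either String value rather than an IO action, which is what the type error in the update is about.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import Data.Aeson.Types (parseEither)
import qualified Data.ByteString.Lazy as BL

-- Decode the response body as a JSON object and extract its "result" key.
-- Both a malformed body and a missing key end up as a Left with a message.
lookupResult :: BL.ByteString -> Either String Value
lookupResult bytes = do
  obj <- eitherDecode bytes
  parseEither (.: "result") obj
```

In the original snippet this would be applied as lookupResult (responseBody res), followed by a case on the Either.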

FromJSON custom for custom type

The newest version of Data.Aeson changed the way that ToJSON and FromJSON work for simple types like:
data Permission = Read | Write
It used to be that the generic call:
instance ToJSON Permission where
...Would create JSON that looked like {"Read":[]} or {"Write":[]}.
But now it creates:
{tag:"Read",contents:"[]"}
Which makes sense but breaks code I have written. I wrote a toJSON part by hand to give the correct looking stuff but writing the fromJSON is confusing me.
Any ideas?
Thanks
You can control how a datatype whose constructors are all nullary is encoded with the allNullaryToStringTag field of aeson's Options record. Set it to True and each constructor is encoded simply as a string.
{-# LANGUAGE TemplateHaskell #-}
import Data.Aeson.TH (deriveToJSON)
import Data.Aeson.Types (Options (..), defaultOptions)
data Permission = Read | Write
$(deriveToJSON (defaultOptions {allNullaryToStringTag = True}) ''Permission)
Take a look at Options definition, it contains other handy fields.
Since the value contained in the Object constructor for Data.Aeson.Value is just a strict HashMap, we can extract the keys from it and make a decision based on that. I tried this and it worked pretty well.
{-# LANGUAGE OverloadedStrings #-}
module StackOverflow where
import Data.Aeson
import Control.Monad
import Data.HashMap.Strict (keys)
data Permission = Read | Write
instance FromJSON Permission where
  parseJSON (Object v) =
    let ks = keys v
    in case ks of
         ["Read"] -> return Read
         ["Write"] -> return Write
         _ -> mzero
  parseJSON _ = mzero
You can test it with decode "{\"Read\": []}" :: Maybe Permission. The mzero in parseJSON ensures that if something else is passed in, it'll just return Nothing. Since you seem to want to only check if there is a single key matching one of your two permissions, this is pretty straightforward and will properly return Nothing on all other inputs.

parsing complicated jsons with Aeson

I'm trying to parse the response of an API call into a Haskell record type using the Aeson library.
I'm using Wikipedia pages, and parsing them to the title and a list of links.
A sample would be this:
{"query":{"pages":{"6278041":{"pageid":6278041,"ns":0,"title":"Lass","links":[{"ns":0,"title":"Acronym"},{"ns":0,"title":"Dead Like Me"},{"ns":0,"title":"Donna Lass"},{"ns":0,"title":"George Lass"},{"ns":0,"title":"Girl"},{"ns":0,"title":"Lassana Diarra"},{"ns":0,"title":"Lightning Lass"},{"ns":0,"title":"Real Madrid"},{"ns":0,"title":"Shadow Lass"},{"ns":0,"title":"Solway Lass"},{"ns":0,"title":"Szymon Lass"},{"ns":0,"title":"The Bonnie Lass o' Fyvie"},{"ns":0,"title":"The Tullaghmurray Lass"},{"ns":0,"title":"Woman"},{"ns":12,"title":"Help:Disambiguation"}]}}}}
and I would like to parse it to the title and a list of links in a data type like this.
data WikiPage = WikiPage { title :: String,
                           links :: String }
What code I currently have is this,
instance FromJSON WikiPage where
  parseJSON j = do
    o <- parseJSON j
    let id = head $ o .: "query" .: "pages"
    let name = o .: "query" .: "pages" .: id .: "title"
    let links = mapM (.: "title") (o .: "query".: "pages" .: id .: "links")
    return $ WikiPage name links
I'm getting the error,
Couldn't match expected type `Data.Text.Internal.Text'
with actual type `[Char]'
In the second argument of `(.:)', namely `"title"'
I don't really get what's going on. I feel like there must be a problem with how I'm mapping over the links, but I'm not sure exactly what has to be done. I also don't get how I'm supposed to use id in the second query string, as it's a parser (I'm sure I need to use applicative in here somewhere but I'm not sure how). I haven't found any examples that decompose more complicated JSON like this.
I'm also trying to figure out aeson. I had the same problem you were having, and I solved it by adding {-# LANGUAGE OverloadedStrings #-} at the top of my source file. I'm very new to Haskell, but as I understand it, this enables a GHC language extension that lets a string literal be used as any type with an IsString instance, such as Text, instead of only String.
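Beyond the pragma, the parser in the question also chains .: on Parser values, which does not typecheck: each lookup has to be bound with <- before the next one. A sketch of one way to write it, assuming links is meant to be a list of titles and that only the first page under "pages" is wanted. The page id is a dynamic key, so "pages" is decoded into a Map here and its first element is taken.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import Data.Aeson.Types (Parser)
import qualified Data.Map.Strict as Map
import Data.Text (Text)

data WikiPage = WikiPage
  { title :: Text
  , links :: [Text]
  } deriving (Show, Eq)

instance FromJSON WikiPage where
  parseJSON = withObject "response" $ \o -> do
    query <- o .: "query"
    -- The keys under "pages" are page ids, unknown in advance,
    -- so read the whole object as a Map and take its first entry.
    pages <- query .: "pages" :: Parser (Map.Map Text Object)
    case Map.elems pages of
      []         -> fail "no pages in response"
      (page : _) -> do
        t  <- page .: "title"
        ls <- page .: "links"
        -- Each link is an object; keep only its "title".
        WikiPage t <$> traverse (.: "title") ls
```

Decoding the sample response with eitherDecode at type Either String WikiPage should then give the page title together with the list of link titles.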