Parsing JSON string into record in Haskell - json

I'm still a bit new to Haskell, and I'm finding the documentation for the Text.JSON package a little confusing. Basically I have this record type:
data Tweet = Tweet
    {
      from_user :: String,
      to_user_id :: String,
      profile_image_url :: String,
      created_at :: String,
      id_str :: String,
      source :: String,
      to_user_id_str :: String,
      from_user_id_str :: String,
      from_user_id :: String,
      text :: String,
      metadata :: String
    }
and I have some tweets in JSON format that conform to the structure of this type. The thing that I'm struggling with is how to map the above to what gets returned from the following code
decode tweet :: Result JSValue
into the above datatype. I understand that I'm supposed to write an instance JSON Tweet, but I don't know where to go from there.
Any pointers would be greatly appreciated, thanks!

I'd recommend that you use the new aeson package instead of the json package, as the former performs much better. Here's how you'd convert a JSON object to a Haskell record, using aeson:
{-# LANGUAGE OverloadedStrings #-}
module Example where
import Control.Applicative
import Control.Monad
import Data.Aeson
data Tweet = Tweet {
    from_user :: String,
    to_user_id :: String,
    profile_image_url :: String,
    created_at :: String,
    id_str :: String,
    source :: String,
    to_user_id_str :: String,
    from_user_id_str :: String,
    from_user_id :: String,
    text :: String,
    metadata :: String
  }
instance FromJSON Tweet where
    parseJSON (Object v) =
        Tweet <$> v .: "from_user"
              <*> v .: "to_user_id"
              <*> v .: "profile_image_url"
              <*> v .: "created_at"
              <*> v .: "id_str"
              <*> v .: "source"
              <*> v .: "to_user_id_str"
              <*> v .: "from_user_id_str"
              <*> v .: "from_user_id"
              <*> v .: "text"
              <*> v .: "metadata"
    -- A non-Object value is of the wrong type, so use mzero to fail.
    parseJSON _ = mzero
Then use Data.Aeson.json to get an attoparsec parser that converts a ByteString into a Value. Then call fromJSON on the Value to attempt to parse it into your record. Note that there are two different parsers involved in these two steps: a Data.Attoparsec.Parser parser for converting the ByteString into a generic JSON Value, and a Data.Aeson.Types.Parser parser for converting the JSON value into a record. Both steps can fail:
The first parser can fail if the ByteString isn't a valid JSON value.
The second parser can fail if the (valid) JSON value doesn't contain one of the fields you mentioned in your parseJSON implementation.
The aeson package prefers the new Unicode type Text (defined in the text package) to the more old school String type. The Text type has a much more memory efficient representation than String and generally performs better. I'd recommend that you change the Tweet type to use Text instead of String.
If you ever need to convert between String and Text, use the pack and unpack functions defined in Data.Text. Note that such conversions require O(n) time, so avoid them as much as possible (i.e. always use Text).
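The two failure points described above can be sketched as follows. This is a minimal sketch and not the original author's code: MiniTweet and parseTweet are hypothetical names, the record is cut down to two Text fields, and eitherDecode is used for the first step in place of running the attoparsec parser by hand.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

-- Hypothetical two-field version of the record, using Text instead of String.
data MiniTweet = MiniTweet
  { fromUser :: Text
  , body     :: Text
  } deriving (Show, Eq)

instance FromJSON MiniTweet where
  parseJSON = withObject "MiniTweet" $ \v ->
    MiniTweet <$> v .: "from_user"
              <*> v .: "text"

-- Step 1: bytes -> generic Value (fails on invalid JSON).
-- Step 2: Value -> record (fails on a missing or mistyped field).
parseTweet :: BL.ByteString -> Either String MiniTweet
parseTweet bytes = do
  value <- eitherDecode bytes :: Either String Value
  case fromJSON value of
    Error msg -> Left msg
    Success t -> Right t

main :: IO ()
main = do
  print (parseTweet "{\"from_user\":\"alice\",\"text\":\"hi\"}")
  print (parseTweet "{\"from_user\":\"alice\"}")  -- valid JSON, missing field
  print (parseTweet "not json")                   -- not valid JSON at all
```

In practice eitherDecode can produce the record directly; the two steps are separated here only to make both failure modes visible.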

You need to write showJSON and readJSON methods for your type, which convert between your Haskell values and the JSON format. The JSON package will take care of parsing the raw string into a JSValue for you.
Your tweet will be a JSObject containing a map of strings, most likely.
Use show to look at the JSObject, to see how the fields are laid out.
You can look up each field using get_field on the JSObject.
You can use fromJSString to get a regular Haskell string from a JSString.
Broadly, you'll need something like,
{-# LANGUAGE RecordWildCards #-}
import Text.JSON
import Text.JSON.Types
-- Only readJSON is shown; a complete instance also needs showJSON.
instance JSON Tweet where
    readJSON (JSObject o) = return $ Tweet { .. }
      where from_user         = grab o "from_user"
            to_user_id        = grab o "to_user_id"
            profile_image_url = grab o "profile_image_url"
            created_at        = grab o "created_at"
            id_str            = grab o "id_str"
            source            = grab o "source"
            to_user_id_str    = grab o "to_user_id_str"
            from_user_id_str  = grab o "from_user_id_str"
            from_user_id      = grab o "from_user_id"
            text              = grab o "text"
            metadata          = grab o "metadata"
grab o s = case get_field o s of
    Nothing            -> error ("Invalid field " ++ show s)
    Just (JSString s') -> fromJSString s'
    Just _             -> error ("Field is not a string: " ++ show s)
Note, I'm using the rather cool RecordWildCards language extension.
Without an example of the JSON encoding, there's not much more I can advise.
Related
You can find example instances for the JSON encoding via instances in the source, for simple types. Or in other packages that depend on json.
An instance for AUR messages is here, as a (low level) example.

Import Text.JSON.Generic and Data.Data, then add deriving (Data) to your record type (this needs the DeriveDataTypeable extension), and then try using decodeJSON on the tweet.

I support the answer by @tibbe.
However, I would like to add how you can supply a default value in case a field is missing from the JSON provided.
In tibbe's answer you can do the following:
Tweet <$> v .: "from_user"
      <*> v .:? "to_user_id" .!= "some user here"
      <*> v .:? "profile_image_url" .!= "url to image"
      <*> v .: "created_at"
      <*> v .:? "id_str" .!= "232131"
      <*> v .: "source"
      -- ... remaining fields as in tibbe's answer
This causes the default values to be used when the corresponding keys are absent from the JSON.
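A self-contained sketch of the same idea, assuming a hypothetical two-field record Mention rather than the full Tweet type. The .:? operator yields Nothing when the key is absent (or its value is null), and .!= replaces that Nothing with the given default.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

-- Hypothetical record used only for illustration.
data Mention = Mention
  { fromUser :: Text
  , toUserId :: Text
  } deriving (Show, Eq)

instance FromJSON Mention where
  parseJSON = withObject "Mention" $ \v ->
    Mention <$> v .:  "from_user"                -- required key
            <*> v .:? "to_user_id" .!= "nobody"  -- optional key with a default

decodeMention :: BL.ByteString -> Either String Mention
decodeMention = eitherDecode

main :: IO ()
main = do
  print (decodeMention "{\"from_user\":\"alice\",\"to_user_id\":\"42\"}")
  print (decodeMention "{\"from_user\":\"alice\"}")  -- default is used
  print (decodeMention "{\"to_user_id\":\"42\"}")    -- required key missing
```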

Related

Aeson does not find a key that I believe is present

I'm trying to parse a JSON blob that looks like this:
"{\"order_book\":{\"asks\":[[\"0.06777\",\"0.00006744\"],[\"0.06778\",\"0.01475361\"], ... ]],\"bids\":[[\"0.06744491\",\"1.35\"],[\"0.06726258\",\"0.148585363\"], ...]],\"market_id\":\"ETH-BTC\"}}"
Those lists of pairs of numbers are actually much longer; I've replaced their tails with ellipses.
Here's my code:
{-# LANGUAGE OverloadedStrings #-}
module Demo where
import Data.Aeson
import Data.ByteString.Lazy hiding (putStrLn)
import Data.Either (fromLeft)
import Network.HTTP.Request
data OrderBook = OrderBook
  { orderBook_asks :: [[(Float,Float)]]
  , orderBook_bids :: [[(Float,Float)]]
  , orderBook_marketId :: String
  }
instance FromJSON OrderBook where
  parseJSON = withObject "order_book" $ \v -> OrderBook
    <$> v .: "asks"
    <*> v .: "bids"
    <*> v .: "market_id"
demo :: IO ()
demo = do
  r <- get "https://www.buda.com/api/v2/markets/eth-btc/order_book"
  let d = eitherDecode $ fromStrict $ responseBody r :: Either String OrderBook
  putStrLn $ "Here's the parse error:"
  putStrLn $ fromLeft undefined d
  putStrLn $ "\n\nAnd here's the data:"
  putStrLn $ show $ responseBody r
Here's what running demo gets me:
Here's the parse error:
Error in $: key "asks" not found
And here's the data:
"{\"order_book\":{\"asks\":[[\"0.06777\",\"0.00006744\"],[\"0.06778\",\"0.01475361\"], ... ]],\"bids\":[[\"0.06744491\",\"1.35\"],[\"0.06726258\",\"0.148585363\"], ...]],\"market_id\":\"ETH-BTC\"}}"
The "asks" key looks clearly present to me -- it's the first one nested under the "order_book" key.
The key is present, but it's wrapped inside another nested object, so you have to unwrap the outer object before you can parse the keys.
The smallest-diff way to do this is probably just inline:
instance FromJSON OrderBook where
  parseJSON = withObject "order_book" $ \outer -> do
    v <- outer .: "order_book"
    OrderBook
      <$> v .: "asks"
      <*> v .: "bids"
      <*> v .: "market_id"
Though you might want to consider introducing another wrapping type instead. This would really depend on the semantics of the data format you have.
I guess you were probably assuming that this is what withObject "order_book" would do, but that's not what it does. The first parameter of withObject is just a human-readable name of the object being parsed, used to create error messages. Customarily that parameter should name the type that is being parsed - i.e. withObject "OrderBook". See the docs.
Separately, I think your asks and bids fields are mistyped.
First, your JSON input looks like they are supposed to be arrays of tuples, but your Haskell type says doubly nested arrays of tuples. So this will fail to parse.
Second, your JSON input has strings as elements of those tuples, but your Haskell type says Float. This will also fail to parse.
The correct type, according to your JSON input, should be:
{ orderBook_asks :: [(String,String)]
, orderBook_bids :: [(String,String)]
Alternatively, if you really want the floats, you'll have to parse them from strings:
instance FromJSON OrderBook where
  parseJSON = withObject "order_book" $ \outer -> do
    v <- outer .: "order_book"
    OrderBook
      <$> (map parseTuple <$> v .: "asks")
      <*> (map parseTuple <$> v .: "bids")
      <*> v .: "market_id"
    where
      parseTuple (a, b) = (read a, read b)
(note that this ☝️ code is not to be copy&pasted: I'm using read for parsing strings into floats, which will crash at runtime if the strings are malformatted; in a real program you should use a better way of parsing)
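One possible "better way", sketched under assumptions: readMaybe from Text.Read replaces read, and a malformed number fails the aeson Parser instead of crashing at runtime. The helper names parseFloat and parsePair are hypothetical, and the record uses the corrected [(Float, Float)] field types.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson
import Data.Aeson.Types (Parser)
import qualified Data.ByteString.Lazy.Char8 as BL
import Text.Read (readMaybe)

data OrderBook = OrderBook
  { orderBook_asks     :: [(Float, Float)]
  , orderBook_bids     :: [(Float, Float)]
  , orderBook_marketId :: String
  } deriving (Show, Eq)

-- Hypothetical helper: read a decimal string, or fail the parse with a message.
parseFloat :: String -> Parser Float
parseFloat s = maybe (fail ("not a number: " ++ show s)) pure (readMaybe s)

-- Hypothetical helper: convert one ["price", "amount"] pair.
parsePair :: (String, String) -> Parser (Float, Float)
parsePair (a, b) = (,) <$> parseFloat a <*> parseFloat b

instance FromJSON OrderBook where
  parseJSON = withObject "OrderBook" $ \outer -> do
    v <- outer .: "order_book"
    OrderBook
      <$> (v .: "asks" >>= traverse parsePair)
      <*> (v .: "bids" >>= traverse parsePair)
      <*> v .: "market_id"

sample, broken :: BL.ByteString
sample = "{\"order_book\":{\"asks\":[[\"0.5\",\"2\"]],\"bids\":[],\"market_id\":\"ETH-BTC\"}}"
broken = "{\"order_book\":{\"asks\":[[\"abc\",\"2\"]],\"bids\":[],\"market_id\":\"ETH-BTC\"}}"

main :: IO ()
main = do
  print (eitherDecode sample :: Either String OrderBook)
  print (eitherDecode broken :: Either String OrderBook)  -- Left, no crash
```

Float is kept only to match the question; for prices, a type such as Scientific or Rational may be a better fit.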
withObject "order_book" does not look into the value at key "order_book". In fact, the "order_book" argument is ignored apart from appearing in the error message; actually you should have withObject "OrderBook" there.
All withObject does is confirm that what you have is an object. Then it proceeds using that object to look for the keys "asks", "bids" and "market_id" – but the only key that's there at this level is order_book.
The solution is to only use this parser with the {"asks":[["0.06777"...]...]...} object. The "order_book" key carries no information anyway, unless there are other keys present there as well. You can represent that outer object with another Haskell type and its own FromJSON instance.
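A minimal sketch of that wrapper approach. OrderBookResponse and getOrderBook are hypothetical names, and the asks and bids fields are typed as pairs of strings to match the JSON shown in the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson
import qualified Data.ByteString.Lazy.Char8 as BL

data OrderBook = OrderBook
  { orderBook_asks     :: [(String, String)]
  , orderBook_bids     :: [(String, String)]
  , orderBook_marketId :: String
  } deriving (Show, Eq)

-- Parses only the inner object: {"asks": ..., "bids": ..., "market_id": ...}
instance FromJSON OrderBook where
  parseJSON = withObject "OrderBook" $ \v -> OrderBook
    <$> v .: "asks"
    <*> v .: "bids"
    <*> v .: "market_id"

-- Hypothetical wrapper for the outer object: {"order_book": {...}}
newtype OrderBookResponse = OrderBookResponse { getOrderBook :: OrderBook }
  deriving (Show, Eq)

instance FromJSON OrderBookResponse where
  parseJSON = withObject "OrderBookResponse" $ \outer ->
    OrderBookResponse <$> outer .: "order_book"

sample :: BL.ByteString
sample = "{\"order_book\":{\"asks\":[[\"0.06777\",\"0.00006744\"]],\"bids\":[],\"market_id\":\"ETH-BTC\"}}"

main :: IO ()
main = print (getOrderBook <$> (eitherDecode sample :: Either String OrderBookResponse))
```

The inner instance stays reusable wherever the bare object appears, while the wrapper handles the API's envelope.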

Reading nested JSON data encoded as a nested string with Aeson

I have this weird JSON to parse, containing nested JSON encoded as a string. So instead of
"{\"title\": \"Lord of the rings\", \"author\": {\"666\": \"Tolkien\"}}"
I have
"{\"title\": \"Lord of the rings\", \"author\": \"{\\\"666\\\": \\\"Tolkien\\\"}\"}"
Here's my (failed) attempt to parse the nested string using decode, inside an instance of FromJSON :
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Maybe
import GHC.Generics
import Data.Aeson
import qualified Data.Map as M
type Authors = M.Map Int String
data Book = Book
  {
    title :: String,
    author :: Authors
  }
  deriving (Show, Generic)
decodeAuthors x = fromJust (decode x :: Maybe Authors)
instance FromJSON Book where
  parseJSON = withObject "Book" $ \v -> do
    t <- v .: "title"
    a <- decodeAuthors <?> v .: "author"
    return $ Book t a
jsonTest = "{\"title\": \"Lord of the rings\", \"author\": \"{\\\"666\\\": \\\"Tolkien\\\"}\"}"
test = decode jsonTest :: Maybe Book
Is there a way to decode the whole JSON in a single pass ? Thanks !
A couple problems here.
First, your use of <?> is nonsensical. I'm going to assume it's a typo, and what you actually meant was <$>.
Second, the type of decodeAuthors is ByteString -> Authors, which means its parameter is of type ByteString, which means that the expression v .: "author" must be of type Parser ByteString, which means that there must be an instance FromJSON ByteString, but no such instance exists (for reasons that escape me at the moment).
What you actually want is for v .: "author" to return a Parser String (or perhaps Parser Text), and then have decodeAuthors accept a String and convert it to ByteString (using pack) before passing to decode:
import Data.ByteString.Lazy.Char8 (pack)
decodeAuthors :: String -> Authors
decodeAuthors x = fromJust (decode (pack x) :: Maybe Authors)
(Also note: it's a good idea to give your declarations the type signatures you think they should have. This lets the compiler point out errors earlier.)
Edit:
As @DanielWagner correctly points out, pack may garble Unicode text. If you want to handle it correctly, use Data.ByteString.Lazy.UTF8.fromString from utf8-string to do the conversion:
import Data.ByteString.Lazy.UTF8 (fromString)
decodeAuthors :: String -> Authors
decodeAuthors x = fromJust (decode (fromString x) :: Maybe Authors)
But in that case you should also be careful about the type of jsonTest: the way your code is written, its type would be ByteString, but any non-ASCII characters that may be inside would be cut off because of the way IsString works. To preserve them, you need to use the same fromString on it:
jsonTest = fromString "{\"title\": \"Lord of the rings\", \"author\": \"{\\\"666\\\": \\\"Tolkien\\\"}\"}"
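A variation on the above, offered as a sketch rather than as the answer's own method: the field is read as Text, encoded to UTF-8 with encodeUtf8 (from the text package), and handed to eitherDecodeStrict, so that a malformed inner document fails the surrounding Parser instead of crashing in fromJust. The inner string is still decoded by a second call to the JSON parser.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson
import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.Map as M
import Data.Text.Encoding (encodeUtf8)

type Authors = M.Map Int String

data Book = Book
  { title  :: String
  , author :: Authors
  } deriving (Show, Eq)

instance FromJSON Book where
  parseJSON = withObject "Book" $ \v -> do
    t   <- v .: "title"
    raw <- v .: "author"  -- the nested JSON document, still a Text value
    -- Decode the inner document; a decoding error becomes a Parser failure.
    a   <- either fail pure (eitherDecodeStrict (encodeUtf8 raw))
    pure (Book t a)

jsonTest :: BL.ByteString
jsonTest = "{\"title\": \"Lord of the rings\", \"author\": \"{\\\"666\\\": \\\"Tolkien\\\"}\"}"

badInner :: BL.ByteString
badInner = "{\"title\": \"Lord of the rings\", \"author\": \"oops\"}"

main :: IO ()
main = do
  print (eitherDecode jsonTest :: Either String Book)
  print (eitherDecode badInner :: Either String Book)  -- Left, no exception
```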

Haskell - Aeson : Getting "Nothing" when trying to decode JSON URL Req

I'm relatively new to haskell and right now I'm trying to get a deeper understanding and trying to get used to different popular libraries.
Right now I'm trying "aeson".
What I want to do is parse MSFT quote request from https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=MSFT&apikey=demo
This is what it looks like
{
    "Global Quote": {
        "01. symbol": "MSFT",
        "02. open": "105.3500",
        "03. high": "108.2400",
        "04. low": "105.2700",
        "05. price": "107.6000",
        "06. volume": "23308066",
        "07. latest trading day": "2018-10-11",
        "08. previous close": "106.1600",
        "09. change": "1.4400",
        "10. change percent": "1.3564%"
    }
}
This is what I've got so far
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import qualified Data.ByteString.Lazy as B
import GHC.Exts
import GHC.Generics
import Network.HTTP
import Network.URI
jsonURL :: String
jsonURL = "http://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=MSFT&apikey=demo"
getRequest_ :: HStream ty => String -> Request ty
getRequest_ s = let Just u = parseURI s in defaultGETRequest_ u
jsonReq = getRequest_ jsonURL
data Quote = Quote {
    quote :: String,
    symbol :: String,
    open :: Float,
    high :: Float,
    low :: Float,
    price :: Float,
    volume :: Float,
    ltd :: String,
    previousClose :: Float,
    change :: Float,
    changePerct :: Float
  } deriving (Show, Generic)
instance FromJSON Quote
instance ToJSON Quote
main :: IO ()
main = do
  d <- simpleHTTP jsonReq
  body <- getResponseBody d
  print (decode body :: Maybe Quote)
What am I doing wrong?
Edit: Fixed version in the answers.
First off: Aeson is not the easiest library for a beginner. There are more difficult ones, sure, but it supposes you already know a fair number of things about the language. You didn't pick the "simplest task" to begin with. I know this can be surprising, and you might think that parsing JSON should be simple, but parsing JSON with strong type guarantees is actually not that simple.
But here's what I can tell you to help you a bit:
First, use eitherDecode rather than decode: you will get an error message rather than simply Nothing, which will help you a bit.
Deriving through Generic is neat and very often a time saver, but it's not magic either. The names of the object keys and the names of your datatype fields have to match exactly. Sadly, this is not the case here, and due to Haskell syntax you couldn't name your fields like the keys of the object. Your best solution is to implement FromJSON manually (see the recommended link below). A good way to see what is expected by the generic FromJSON is to also derive ToJSON, create a dummy Quote and look at the result of encode.
Your first field (quote) is not a key of the object itself, but rather the name of this object. So you have dynamic keys ("Global Quote" being one here). Once again, this is typically a case where you want to write the FromJSON instance manually.
I recommend you read this famous tutorial written by Artyom Kazak on Aeson. This will help you tremendously and is probably the best advice I can give.
For your manual instance, supposing it was exactly the document you want to parse and you had only the "Global Quote" to deal with, it would look more or less like this:
instance FromJSON Quote where
  parseJSON = withObject "Document" $
    \d -> do
      glob <- d .: "Global Quote"
      withObject "Quote" (\gq ->
        Quote <$> pure "Global Quote"
              <*> gq .: "01. symbol"
              <*> gq .: "02. open"
              <*> gq .: "03. high"
              -- ... and so on
        ) glob
(It's not the most pretty way, nor the best way, to write it, but it should be one possible way).
Also note that, as an astute commenter wrote, the types of your fields are not always aligned with the types in your example JSON document. "volume" is an Int (bounded, fixed-width), potentially an Integer ("mathematical" integer, no bound), but not a Float. Your "ltd" can be parsed as a string, but it probably should be a date (Day from Data.Time would be the first choice; it already has a FromJSON instance, so chances are it is parseable as is). Change percent is most likely not parseable as a Float like that; you'll need to write a dedicated parser for this type (and decide how you want to store it; Ratio is a potential solution).
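As an illustration of what such a dedicated parser might look like, here is a sketch assuming a hypothetical newtype Percent that stores the number as a Double (the answer leaves the storage type open). It uses withText, strips the trailing "%" and reads the rest with readMaybe.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson
import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.Text as T
import Text.Read (readMaybe)

-- Hypothetical wrapper for values such as "1.3564%".
newtype Percent = Percent Double
  deriving (Show, Eq)

instance FromJSON Percent where
  parseJSON = withText "Percent" $ \t ->
    -- Drop the trailing '%' and read what is left as a number.
    case T.stripSuffix "%" t >>= readMaybe . T.unpack of
      Just x  -> pure (Percent x)
      Nothing -> fail ("not a percentage: " ++ T.unpack t)

decodePercent :: BL.ByteString -> Either String Percent
decodePercent = eitherDecode

main :: IO ()
main = do
  print (decodePercent "\"1.3564%\"")
  print (decodePercent "\"1.3564\"")  -- missing '%', so this is a Left
```

A field of type Percent can then be pulled out of the object with the usual gq .: "10. change percent".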
@Raveline's answer above pointed me in the right direction. I was able to solve all those issues; here's the final product!
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}
module Test where
import Data.Aeson
import qualified Data.ByteString.Lazy as B
import GHC.Exts
import GHC.Generics
import Network.HTTP.Conduit (simpleHttp)
jsonURL :: String
jsonURL = "https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=MSFT&apikey=demo"
getJSON :: IO B.ByteString
getJSON = simpleHttp jsonURL
data Quote = Quote {
    symbol :: String,
    open :: String,
    high :: String,
    low :: String,
    price :: String,
    volume :: String,
    ltd :: String,
    previousClose :: String,
    change :: String,
    changePercent :: String
  } deriving (Show, Generic)
instance FromJSON Quote where
  parseJSON = withObject "Global Quote" $
    \o -> do
      globalQuote <- o .: "Global Quote"
      symbol <- globalQuote .: "01. symbol"
      open <- globalQuote .: "02. open"
      high <- globalQuote .: "03. high"
      low <- globalQuote .: "04. low"
      price <- globalQuote .: "05. price"
      volume <- globalQuote .: "06. volume"
      ltd <- globalQuote .: "07. latest trading day"
      previousClose <- globalQuote .: "08. previous close"
      change <- globalQuote .: "09. change"
      changePercent <- globalQuote .: "10. change percent"
      return Quote {..}
main :: IO ()
main = do
  d <- (eitherDecode <$> getJSON) :: IO (Either String Quote)
  case d of
    Left e -> print e
    Right qt -> print (read (price qt) :: Float)

Understanding the Data.Aeson FromJSON typeclass

I recently started using Data.Aeson for one of my projects, and I am fairly new to Haskell as well. So I am trying to figure out how the implementation of the parseJSON function in the FromJSON typeclass works.
Here is some code from my codebase.
data MyProfile = MyProfile { name :: String, age :: Int } deriving Show
instance FromJSON MyProfile where
  parseJSON (Object m) = MyProfile <$>
    m .: "name" <*>
    m .: "age"
  parseJSON x = fail ("not an object: " ++ show x)
And the YAML file I am trying to read is pretty simple as well.
profile:
  name: "Foo"
  age: 16
I am trying to understand the working of that applicative functor. I browsed through the Data.Aeson module and found that (.:) returns a Parser (FromJSON a).
So the facts that I have understood is,
Object m holds the profile: section of the yaml
MyProfile corresponds to profile
name in ParseJSON is trying to get the value for the key in the JSON object m
Similar with age as well
And each <*> returns a Parser (FromJSON) which is then applied over to the next <*>
What I am not understanding is,
How does MyProfile get mapped to the profile section? What if I have a huge yaml file and multiple data types defined in my program?
In the code MyProfile <$> m .: "name", shouldn't the first argument of <$> be a function? I perceive that <$> is similar to fmap and hence the first argument must be a function (which is applied to the second argument). But MyProfile is a data type! Confusing!
How are the yaml values, in this case Foo and 16, added to the MyProfile data?
Please correct me if any of my understandings is wrong.
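On the second point, a sketch may help. A data constructor is an ordinary function, here of type String -> Int -> MyProfile, so it is a valid first argument to <$>. The example below uses only base and makes two loud assumptions: Maybe stands in for aeson's Parser, and lookup on a hypothetical key/value list stands in for (.:). The shape of the expression is the same as in the parseJSON body.

```haskell
module Main where

import Text.Read (readMaybe)

data MyProfile = MyProfile { name :: String, age :: Int }
  deriving (Show, Eq)

-- The constructor is just a two-argument function.
mkProfile :: String -> Int -> MyProfile
mkProfile = MyProfile

-- Hypothetical stand-in for parseJSON: Maybe plays the role of Parser,
-- and lookup plays the role of (.:).
profileFrom :: [(String, String)] -> Maybe MyProfile
profileFrom kv =
  MyProfile <$> lookup "name" kv
            <*> (lookup "age" kv >>= readMaybe)

main :: IO ()
main = do
  print (profileFrom [("name", "Foo"), ("age", "16")])
  -- prints: Just (MyProfile {name = "Foo", age = 16})
  print (profileFrom [("name", "Foo")])
  -- prints: Nothing
```

Each <*> feeds one more successfully extracted value to the constructor; if any lookup fails, the whole result fails, which is how Foo and 16 end up inside the MyProfile value.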

Error checking with Aeson

This code parses a recursive JSON structure into a haskell object that I made. I'm using the Aeson library. The problem that I'm encountering is that I want to be able to do error checking easily, even with a recursive call. Right now I use a dummy value (ayyLmao) whenever an error occurs. However I would like to leverage the error checking I get from the Parser monad. How can I do this and possibly clean up my code in the process? If necessary I can also post some sample JSON.
EDIT: I'd like to point out that I'd like to get rid of "ayyLmao" (hence the stupid name), and somehow use 'mzero' for the Parser monad for my error checking instead.
type Comments = Vector Comment
data Comment = Comment
  { author :: Text
  , body :: Text
  , replies :: Comments
  } deriving Show
-- empty placeholder value (only should appear when errors occur)
ayyLmao :: Comment
ayyLmao = Comment "Ayy" "Lmao" V.empty
parseComment :: Object -> Maybe Comments
parseComment obj = flip parseMaybe obj $ \listing -> do
    -- go through intermediate objects
    comments <- listing .: "data" >>= (.: "children")
    -- parse every comment in an array
    return $ flip fmap comments $ \commentData -> case commentData of
      -- if the data in the array is an object, parse the comment
      -- (using a dummy value on error)
      Object v -> fromMaybe ayyLmao (parseMaybe parseComment' v)
      -- use a dummy value for errors (we should only get objects in
      -- the array)
      _ -> ayyLmao
  where
    parseComment' :: Object -> Parser Comment
    parseComment' v = do
      -- get all data from the object
      comment <- v .: "data"
      authorField <- comment .: "author"
      bodyField <- comment .: "body"
      replyObjs <- comment .: "replies"
      return $ case replyObjs of
        -- if there are more objects, then parse recursively
        Object more -> case parseComment more of
          -- errors use the dummy value again
          Just childReplies -> Comment authorField bodyField childReplies
          Nothing -> ayyLmao
        -- otherwise, we've reached the last comment in the
        -- tree
        _ -> Comment authorField bodyField V.empty
EDIT: The code in the answer below is correct, but I'd like to add my modified solution. The solution given assumes that "null" indicates no more replies, but for some reason the API designers decided that that should be represented by the empty string.
instance FromJSON Comment where
  parseJSON = withObject "Comment" $ \obj -> do
    dat <- obj .: "data"
    commReplies <- dat .: "replies"
    Comment
      <$> dat .: "author"
      <*> dat .: "body"
      <*> case commReplies of
        Object _ -> getComments <$> dat .: "replies"
        String "" -> return V.empty
        _ -> fail "Expected more comments or the empty string"
You hit the mark with "Or I could have a list of Parsers and then fold it into one larger parser". This is exactly how you would propagate errors from nested parsers. The minimum change to your code to remove ayyLmao would be:
parseComment :: Object -> Maybe Comments
parseComment obj = flip parseMaybe obj $ \listing -> do
    -- go through intermediate objects
    comments <- listing .: "data" >>= (.: "children")
    -- parse every comment in the array; V.sequence fails the whole
    -- parse if any single comment fails
    V.sequence $ flip fmap comments $ \commentData -> case commentData of
      -- if the data in the array is an object, parse the comment
      Object v -> parseComment' v
      -- anything else is an error (we should only get objects in
      -- the array)
      _ -> mzero
  where
    parseComment' :: Object -> Parser Comment
    parseComment' v = do
      -- get all data from the object
      comment <- v .: "data"
      authorField <- comment .: "author"
      bodyField <- comment .: "body"
      replyObjs <- comment .: "replies"
      case replyObjs of
        -- if there are more objects, then parse recursively
        Object more -> case parseComment more of
          -- a failed recursive parse fails this comment as well
          Just childReplies -> return $ Comment authorField bodyField childReplies
          Nothing -> mzero
        -- otherwise, we've reached the last comment in the
        -- tree
        _ -> return $ Comment authorField bodyField V.empty
This uses mzero for the error cases and propagates errors from the list of replies with V.sequence. sequence is exactly the thing that takes a list of parsers (or, in this case, a vector) and folds it into a single parser that either succeeds or fails.
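The folding behaviour of V.sequence can be seen in isolation with a much smaller sketch. The function parseInts is hypothetical and has nothing to do with comments: it parses every element of a vector as an Int, and a single bad element makes the combined parser fail.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson
import Data.Aeson.Types (Parser, parseMaybe)
import qualified Data.Vector as V

-- Build one parser per element, then fold them into a single parser.
parseInts :: V.Vector Value -> Parser (V.Vector Int)
parseInts = V.sequence . fmap parseJSON

main :: IO ()
main = do
  -- every element parses, so the combined parser succeeds
  print (parseMaybe parseInts (V.fromList [Number 1, Number 2]))
  -- one element is a string, so the combined parser fails as a whole
  print (parseMaybe parseInts (V.fromList [Number 1, String "oops"]))
```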
However, the above is not a very good way to use aeson. It's usually better to derive an instance of the FromJSON type-class and work from there. I would implement the above as
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Vector as V
import Data.Vector (Vector)
import Data.Text (Text)
import Data.Aeson
import Data.Maybe (fromMaybe)
import Control.Applicative
type Comments = Vector Comment
data Comment = Comment
  { author :: Text
  , body :: Text
  , replies :: Comments
  } deriving Show
newtype CommentList = CommentList { getComments :: Comments }
instance FromJSON Comment where
  parseJSON = withObject "Comment" $ \obj -> do
    dat <- obj .: "data"
    Comment
      <$> dat .: "author"
      <*> dat .: "body"
      <*> (fromMaybe V.empty . fmap getComments <$> dat .: "replies")
instance FromJSON CommentList where
  parseJSON = withObject "CommentList" $ \obj -> do
    dat <- obj .: "data"
    CommentList <$> dat .: "children"
This introduces a wrapper type CommentList which is used to fetch the obj.data.children attribute from the JSON. This takes advantage of the existing FromJSON instance for Vector, so you don't have to manually loop through the replies and parse them separately.
The expression
fromMaybe V.empty . fmap getComments <$> dat .: "replies"
assumes that the replies attribute in the JSON contains either a null value or a valid CommentList so it tries to parse a Maybe CommentList value (null is parsed to Nothing) and then replaces a Nothing value with an empty vector using fromMaybe.