Parsing a JSON string in Haskell

I'm working on a simple Haskell program that fetches a JSON string from a server, parses it, and does something with the data. The specifics are not really pertinent for the moment; the trouble I'm having is with parsing the JSON that is returned.
I get the JSON string back from the server as an IO String and can't seem to figure out how to parse it into a JSON object.
Any help would be much appreciated :)
Here is my code thus far.
import Data.Aeson
import Network.HTTP
main :: IO ()
main = do
  src <- openURL "http://www.reddit.com/user/chrissalij/about.json"
  -- JSON parsing code goes here
  return ()
openURL :: String -> IO String
openURL url = getResponseBody =<< simpleHTTP (getRequest url)
Note: I'm using Data.Aeson in the example as that is what seems to be recommended; however, I'd be more than willing to use another library.
Also any and all of this code can be changed. If getting the

Data.Aeson is designed to be used with Attoparsec, so it only gives you a Parser that you must then run with Attoparsec. (Newer versions of aeson also provide decode and decodeStrict, which run the parser for you.) Also, Attoparsec prefers to work on ByteString, so you have to alter the way the request is made slightly to get a ByteString result instead of a String.
This seems to work:
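import Data.Aeson.Parser (json)           -- aeson's attoparsec parser for a JSON Value
import Data.Attoparsec.ByteString (parse)
import Data.ByteString (ByteString)
import Data.Maybe (fromJust)
import Network.HTTP
import Network.URI (parseURI)
main :: IO ()
main = do
  src <- openURL "http://www.reddit.com/user/chrissalij/about.json"
  -- Run the json parser over the raw response bytes and show the Result
  print $ parse json src
-- Build the request by hand so the body comes back as a strict ByteString
openURL :: String -> IO ByteString
openURL url = getResponseBody =<< simpleHTTP (mkRequest GET (fromJust $ parseURI url))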
Here I've just parsed the JSON as a plain Value, but you'll probably want to create your own data type and write a FromJSON instance for it to handle the conversion neatly.
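As a rough sketch of that suggestion, the record below pulls a few fields out of the response with a hand-written FromJSON instance. The field names (name, link_karma, comment_karma) and the top-level "data" wrapper are assumptions about the shape of about.json, and RedditUser and parseUser are hypothetical names. decodeStrict is used because openURL above returns a strict ByteString.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (FromJSON (..), decodeStrict, withObject, (.:))
import qualified Data.ByteString.Char8 as BC

-- Hypothetical record; these fields are assumed, not a documented schema.
data RedditUser = RedditUser
  { userName     :: String
  , linkKarma    :: Int
  , commentKarma :: Int
  } deriving (Show, Eq)

instance FromJSON RedditUser where
  -- Assumes the interesting fields live under a top-level "data" object.
  parseJSON = withObject "RedditUser" $ \o -> do
    d <- o .: "data"
    RedditUser <$> d .: "name"
               <*> d .: "link_karma"
               <*> d .: "comment_karma"

-- Nothing if the bytes are not valid JSON or a field is missing or mistyped.
parseUser :: BC.ByteString -> Maybe RedditUser
parseUser = decodeStrict
```

In main, this would replace the raw parse call with something like print (parseUser src).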

Haskell building simple JSON parser

I'm getting my feet wet with building things, and since I couldn't get Aeson to work properly, I decided my new project would be building a JSON parser myself. The code is fairly abstract, so it wouldn't make sense to post all of it here.
The ByteString library lets me do what I need: remove characters, replace things, and so on. But I have a very hard time reconstructing the data exactly the way I took it apart. Data.Text seems more appropriate for the job, but when I tried it, it generated a lot of noise such as \" and \n.
What would be the best and fastest way to clear a file of all the rubbish and turn the remaining parts back into useful text? A very small part is below. Remarks on the code are welcome; I'm learning here.
import qualified Data.ByteString as B
import Data.Char (ord)
import Data.Word (Word8)
-- Byte values of the characters to strip out
word8QuoteMark, word8Newline, word8Colon :: Word8
word8QuoteMark = fromIntegral (ord '"')
word8Newline   = fromIntegral (ord '\n')
word8Colon     = fromIntegral (ord ':')
-- Drop every quote mark, newline and colon byte from the input
filterJson :: B.ByteString -> B.ByteString
filterJson = B.filter (`notElem` [word8QuoteMark, word8Newline, word8Colon])
importJson :: IO ()
importJson = do
  jsonData <- B.readFile "local.json"
  let output = filterJson jsonData
  print output
Now the downside is that if someone is called, e.g., François, the name is now returned as Fran\195\167ois. I think I would need a lot more steps to do this in Data.Text, but correct me if I'm wrong...
Note: I saw in a post that Daniel Wagner strongly advises against using ByteString for text, but this is just for the sake of argument.
JSON is, by definition, a Unicode string that represents a data structure. What you get from B.readFile, though, is a raw byte string that you must first decode to get a Unicode string. To do that, you need to know what encoding was used to create the file. Assuming the file uses UTF-8 encoding, you can do something like
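import qualified Data.ByteString as B
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8)
-- Read the raw bytes, then decode them as UTF-8 (throws on invalid UTF-8)
importJson :: FilePath -> IO Text
importJson name = do
  jsonData <- B.readFile name
  return (decodeUtf8 jsonData)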
Once you have a Text value, you can parse that into some data structure according to the JSON grammar.
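To make the decode-first approach concrete, here is a sketch that filters after decoding, so it works on characters rather than bytes and a multi-byte character such as ç is never split. It uses decodeUtf8', which returns a Left instead of throwing on invalid UTF-8. cleanJson and printCleaned are hypothetical names, and the removed characters simply mirror the question's filter (quote, newline, colon). The Fran\195\167ois in the question comes from print, which goes through show and escapes non-ASCII bytes; writing the Text with Data.Text.IO.putStrLn avoids that.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8')
import Data.Text.Encoding.Error (UnicodeException)

-- Decode the bytes as UTF-8 first, then drop the same characters the
-- question's filterJson removes. Multi-byte characters survive intact.
cleanJson :: B.ByteString -> Either UnicodeException T.Text
cleanJson bytes = T.filter (`notElem` ['"', '\n', ':']) <$> decodeUtf8' bytes

-- Read a file and print the cleaned text. TIO.putStrLn writes the characters
-- themselves; print would go through show and escape them again.
printCleaned :: FilePath -> IO ()
printCleaned path = do
  bytes <- B.readFile path
  either (\err -> putStrLn ("invalid UTF-8: " ++ show err)) TIO.putStrLn (cleanJson bytes)
```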

decoding a complex string in haskell

I have made a JSON-to-Haskell parser that works correctly; the parser is:
decodeToMaybeValue :: BLC.ByteString -> Maybe Value
decodeToMaybeValue = decode
main = do
  interact (show . decodeToMaybeValue . BLC.pack)
This works correctly when I compile and run it directly, but when I try to store the string in a variable and decode it, I get the error below. I was trying this:
x = "{\"apiVersion\": \"2.0\",\"data\": {\"updated\": \"2010-01-07T19:58:42.949Z\",\"totalItems\": 800,\"startIndex\": 1,\"itemsPerPage\": 1,\"items\": [{\"id\": \"hYB0mn5zh2c\",\"uploaded\":\"2007-06-05T22:07:03.000Z\",\"updated\": \"2010-01-07T13:26:50.000Z\",\"uploader\": \"GoogleDeveloperDay\",\"category\": \"News\",\"title\": \"Google Developers Day US - Maps API Introduction\",\"description\": \"Google Maps API Introduction ...\",\"tags\": [\"GDD07\",\"GDD07US\",\"Maps\"],\"duration\": 2840,\"aspectRatio\": \"widescreen\",\"rating\": 4.63,\"ratingCount\": 68,\"viewCount\": 220101,\"favoriteCount\":201,\"commentCount\": 22 }]}}"
y = BLC.pack x
Invalid type signature: decode y :: Maybe Value
-- Should be of form <variable> :: <type>
Does anyone have an idea about this?
You won't need BLC.pack to convert your String to a ByteString if you put {-# LANGUAGE OverloadedStrings #-} on the first line of the file: string literals can then be used directly as lazy ByteStrings.
Is this what you want?
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import qualified Data.ByteString.Lazy.Char8 as BLC
decodeToMaybeValue :: BLC.ByteString -> Maybe Value
decodeToMaybeValue = decode
theData :: BLC.ByteString
theData = "{\"a\": \"b\"}" -- Put in the data here.
main :: IO ()
main = print (decodeToMaybeValue theData)
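The error in the question comes from writing decode y :: Maybe Value as a line of its own at the top level of a source file: GHC reads that as a type signature rather than an expression. A sketch of the fix is to bind the result to a name (or use it inside main). The payload below is a shortened version of the question's string, and rawJson, decoded and totalItems are hypothetical names; parseMaybe shows one way to pull a single field out of the decoded Value.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, decode, withObject, (.:))
import Data.Aeson.Types (parseMaybe)
import qualified Data.ByteString.Lazy.Char8 as BLC

-- Shortened payload (assumption: only a few of the question's fields kept).
rawJson :: BLC.ByteString
rawJson = "{\"apiVersion\": \"2.0\", \"data\": {\"totalItems\": 800, \"startIndex\": 1}}"

-- Bind the decoded result to a name; a bare "decode y :: Maybe Value"
-- line at the top level would be parsed as a type signature instead.
decoded :: Maybe Value
decoded = decode rawJson

-- Reach into the decoded Value and read data.totalItems as an Int.
totalItems :: Value -> Maybe Int
totalItems = parseMaybe $ withObject "root" $ \o -> do
  d <- o .: "data"
  d .: "totalItems"
```

With these definitions, decoded >>= totalItems evaluates to Just 800.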

Parsing HTTP Response in Python

I want to manipulate the information at this URL (see the code below). I can successfully open it and read its contents. But what I really want to do is throw out all the stuff I don't want, and manipulate the stuff I want to keep.
Is there a way to convert the string into a dict so I can iterate over it? Or do I just have to parse it as is (str type)?
from urllib.request import urlopen
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)
print(response.read())  # returns bytes, e.g. b'{"a":1,...'
When I printed response.read(), I noticed that b was prepended to the string (e.g. b'{"a":1,..). The "b" stands for bytes and marks the type of the object you're handling. Since I knew that a string could be converted to a dict with json.loads('string'), I just had to convert the bytes to a string. I did this by decoding the response as UTF-8 with decode('utf-8'). Once it was a string, my problem was solved and I could easily iterate over the dict. (Since Python 3.6, json.loads also accepts bytes directly.)
I don't know if this is the fastest or most 'pythonic' way of writing this, but it works, and there's always time later for optimization and improvement! Full code for my solution:
from urllib.request import urlopen
import json
# Get the dataset
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)
# Convert bytes to string type and string type to dict
string = response.read().decode('utf-8')
json_obj = json.loads(string)
print(json_obj['source_name']) # prints the string with 'source_name' key
You can also use python's requests library instead.
import requests
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = requests.get(url)
data = response.json()  # parsed JSON; avoid naming it dict, which shadows the builtin
Now you can manipulate data like any Python dictionary.
json works with Unicode text in Python 3 (the JSON format itself is defined only in terms of Unicode text), and therefore you need to decode the bytes received in the HTTP response. r.headers.get_content_charset('utf-8') gets you the character encoding:
#!/usr/bin/env python3
import io
import json
from urllib.request import urlopen
with urlopen('https://httpbin.org/get') as r, \
     io.TextIOWrapper(r, encoding=r.headers.get_content_charset('utf-8')) as file:
    result = json.load(file)
print(result['headers']['User-Agent'])
It is not necessary to use io.TextIOWrapper here:
#!/usr/bin/env python3
import json
from urllib.request import urlopen
with urlopen('https://httpbin.org/get') as r:
    result = json.loads(r.read().decode(r.headers.get_content_charset('utf-8')))
print(result['headers']['User-Agent'])
TL;DR: When you get data from a server, it is typically sent as bytes. The rationale is that these bytes need to be 'decoded' by the recipient, who should know how to interpret the data. You should decode the binary data upon arrival so that you get a string instead of a 'b' (bytes) object.
Use case:
import requests
def get_data_from_url(url):
    response = requests.get(url)
    response_data_split_by_line = response.content.decode('utf-8').splitlines()
    return response_data_split_by_line
In this example, I decode the content I received as UTF-8. For my purposes, I then split it by line, so I can loop through each line with a for loop.
I guess things have changed in Python 3.4. This worked for me:
print("resp:" + json.dumps(resp.json()))

FromJSON custom for custom type

The newest version of Data.Aeson changed the way that ToJSON and FromJSON work for simple types like:
data Permission = Read | Write
It used to be that the generic call:
instance ToJSON Permission where
...would create JSON that looked like {"Read":[]} or {"Write":[]}.
But now it creates:
{tag:"Read",contents:"[]"}
This makes sense but breaks code I have written. I wrote the toJSON part by hand to produce the old format, but writing the fromJSON part is confusing me.
Any ideas?
Thanks
You can control how a datatype whose constructors are all nullary is encoded using the allNullaryToStringTag field of aeson's Options record. Set it to True and each constructor will be encoded simply as a string.
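{-# LANGUAGE TemplateHaskell #-}
import Data.Aeson.TH (deriveToJSON)
import Data.Aeson.Types (Options (..), defaultOptions)
data Permission = Read | Write
-- Encode each nullary constructor as a plain JSON string, e.g. "Read"
$(deriveToJSON (defaultOptions {allNullaryToStringTag = True}) ''Permission)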
Take a look at the Options definition; it contains other handy fields.
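Since the value contained in the Object constructor of Data.Aeson.Value is just a strict HashMap, we can extract the keys from it and make a decision based on that. I tried this and it worked pretty well. (This holds for aeson versions before 2.0; from 2.0 on, Object wraps a Data.Aeson.KeyMap.KeyMap, which has its own keys function.)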
{-# LANGUAGE OverloadedStrings #-}
module StackOverflow where
import Data.Aeson
import Control.Monad
import Data.HashMap.Strict (keys)
data Permission = Read | Write deriving (Show)
instance FromJSON Permission where
  -- Accept an object whose only key names the constructor, e.g. {"Read": []}
  parseJSON (Object v) =
    let ks = keys v
    in case ks of
         ["Read"] -> return Read
         ["Write"] -> return Write
         _ -> mzero
  parseJSON _ = mzero
You can test it with decode "{\"Read\": []}" :: Maybe Permission. The mzero in parseJSON ensures that if anything else is passed in, decode just returns Nothing. Since you seem to want to check only whether there is a single key matching one of your two permissions, this is straightforward and will properly return Nothing on all other inputs.
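If the goal is to get the old {"Read":[]} shape back in both directions without hand-written instances, one option (a sketch swapped in for the two approaches above, assuming an aeson version with generic deriving) is to turn allNullaryToStringTag off and choose the ObjectWithSingleField sum encoding. permissionOptions is a hypothetical name.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (FromJSON (..), ToJSON (..), decode, encode, genericParseJSON, genericToJSON)
import Data.Aeson.Types (Options (..), SumEncoding (..), defaultOptions)
import GHC.Generics (Generic)

data Permission = Read | Write
  deriving (Show, Eq, Generic)

-- Do not collapse nullary constructors to plain strings; instead encode each
-- constructor as an object with a single field named after it.
permissionOptions :: Options
permissionOptions = defaultOptions
  { allNullaryToStringTag = False
  , sumEncoding = ObjectWithSingleField
  }

instance ToJSON Permission where
  toJSON = genericToJSON permissionOptions

instance FromJSON Permission where
  parseJSON = genericParseJSON permissionOptions
```

With these options, encode Read produces {"Read":[]}, and decode accepts the same shape back, so the hand-written toJSON is no longer needed.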

Return JSON from yesod handler

I'm trying to write the simplest possible JSON response from a Yesod handler, but I'm getting what is apparently a really silly error. My handler code is this:
-- HelloYesod/Handler/Echo.hs
module Handler.Echo where
import Data.Aeson (object, (.=))
import qualified Data.Aeson as J
import Data.Text (pack)
import Import
import Yesod.Core.Json (returnJson)
getEchoR :: String -> Handler RepJson
getEchoR theText = do
  let json = object $ ["data" .= "val"]
  return json
The error is this:
Handler/Echo.hs:12:10:
    Couldn't match expected type `RepJson' with actual type `Value'
    In the first argument of `return', namely `json'
    In a stmt of a 'do' block: return json
    In the expression:
      do { let json = object $ ...;
           return json }
Build failure, pausing...
I got caught by this one too: you just have to change your type signature and it will work:
getEchoR :: String -> Handler Value
My understanding is that the whole Rep system is deprecated in Yesod 1.2, so handlers now return Html and Value rather than RepHtml and RepJson.
Hope this helps!