I'm new to Haskell and in order to learn the language I am working on a project that involves dealing with JSON. I am currently getting the feeling Haskell is the wrong language for the job, but that isn't the point here.
I've been struggling to understand how this works for a few days. I have searched and everything I have found does not seem to work. Here's the issue:
I have some JSON in the following format:
>>>less "path/to/json"
{
  "stringA1_stringA2": {"stringA1":floatA1,
                        "stringA2":floatA2},
  "stringB1_stringB2": {"stringB1":floatB1,
                        "stringB2":floatB2}
  ...
}
Here floatX1 and floatX2 are actually strings of the form "0.535613567", "1.221362183" etc. What I want to do is parse this into the following data type:
data Mydat = Mydat { name :: String, num :: Float} deriving (Show)
where name would correspond to "stringX1_stringX2" and num to floatX1 for X = A,B,...
So far I have reached a 'solution' which feels fairly hackish and convoluted and doesn't work properly.
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
import Data.Functor
import Data.Monoid
import Data.Aeson
import Data.List
import Data.Text
import Data.Map (Map)
import qualified Data.HashMap.Strict as DHM
--import qualified Data.HashMap as DHM
import qualified Data.ByteString.Lazy as LBS
import System.Environment
import GHC.Generics
import Text.Read
data Mydat = Mydat {name :: String, num :: Float} deriving (Show)
test s = do
  d <- LBS.readFile s
  let v = decode d :: Maybe (DHM.HashMap String Object)
  case v of
    -- Just v -> print v
    Just v -> return $ Prelude.map dataFromList $ DHM.toList $ DHM.map (DHM.lookup "StringA1") v
good = ['1','2','3','4','5','6','7','8','9','0','.']
f x = elem x good
dataFromList :: (String, Maybe Value) -> Mydat
dataFromList (a,b) = Mydat a (read (Prelude.filter f (show b)) :: Float)
Now I can compile this and run
test "path/to/json"
in ghci, and it prints a list of Mydat values, but only in the case where "stringX1"="stringA1" for all X. In reality there are two distinct values for "stringX1", so aside from the hackiness this is not satisfactory. There must be a better way to do this. I get that I probably need to write my own parser, but I am confused about how this works, so any suggestions would be great. Thanks in advance.
The structure of your JSON is pretty nasty, but here's a basic working solution:
#!/usr/bin/env stack
-- stack --resolver lts-11.5 script --package containers --package aeson
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as Map
import qualified Data.Aeson as Aeson
data Mydat = Mydat { name :: String
                   , num :: Float
                   } deriving (Show)
instance Eq Mydat where
  (Mydat _ x1) == (Mydat _ x2) = x1 == x2
instance Ord Mydat where
  (Mydat _ x1) `compare` (Mydat _ x2) = x1 `compare` x2
type MydatRaw = Map.Map String (Map.Map String String)
processRaw :: MydatRaw -> [Mydat]
processRaw = Map.foldrWithKey go []
  where go key value accum =
          accum ++ (Mydat key . read <$> Map.elems value)
main :: IO ()
main =
  do let json = "{\"stringA1_stringA2\":{\"stringA1\":\"0.1\",\"stringA2\":\"0.2\"}}"
     print $ fmap processRaw (Aeson.eitherDecode json)
Note that read is partial and generally not a good idea. But I'll leave it to you to flesh out a safer version :)
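One possible shape for that safer version, as a minimal sketch rather than a definitive implementation: it swaps read for readMaybe from Text.Read and reports the first unparsable value as a Left. The names processRawSafe and decodeMydats are made up for this illustration, and Mydat derives Eq here only so that results can be compared.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Aeson as Aeson
import qualified Data.ByteString.Lazy as LBS
import qualified Data.Map as Map
import Text.Read (readMaybe)

data Mydat = Mydat { name :: String, num :: Float } deriving (Show, Eq)

type MydatRaw = Map.Map String (Map.Map String String)

-- Hypothetical helper: like processRaw, but total.
-- Returns Left with a message for the first value that is not a Float literal.
processRawSafe :: MydatRaw -> Either String [Mydat]
processRawSafe raw = traverse toMydat
  [ (key, str) | (key, inner) <- Map.toList raw, str <- Map.elems inner ]
  where
    toMydat (key, str) =
      maybe (Left ("not a number under " ++ key ++ ": " ++ show str))
            (Right . Mydat key)
            (readMaybe str)

-- Hypothetical helper: decode the JSON, then convert every string to a Float.
decodeMydats :: LBS.ByteString -> Either String [Mydat]
decodeMydats bytes = Aeson.eitherDecode bytes >>= processRawSafe
```

Both a JSON syntax error and a malformed number end up in the same Either String, so the caller only has one failure case to handle.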
As I commented, the best thing would probably be to make your JSON file well-formed, in the sense that the float fields should really be floats, not strings.
If that's not an option, I would recommend that you write down the type the JSON file actually represents as simply as possible (but without dynamic Objects), and then convert that to the type you actually want.
import Data.Aeson (decode)
import qualified Data.ByteString.Lazy as LBS
import Data.Map (Map)
import qualified Data.Map as Map
type GarbledJSON = Map String (Map String String)
-- ^ you could also stick with hash maps for everything, but
--   usually `Map` is actually more sensible in Haskell.
data MyDat = MyDat {name :: String, num :: Float} deriving (Show)
-- Characters allowed in a number, as in the question.
good :: String
good = "0123456789."
test :: FilePath -> IO [MyDat]
test s = do
  d <- LBS.readFile s
  case decode d :: Maybe GarbledJSON of
    Just v -> return [ MyDat iName ( read . filter (`elem` good)
                                        $ iVals Map.! valKey )
                     | (iName, iVals) <- Map.toList v
                     , let valKey = takeWhile (/='_') iName ]
    Nothing -> fail "could not decode the JSON file"
Note that this will crash completely if any of the items don't contain the first part of the name as a string of float format, and likely give bogus items when you filter out characters that aren't good. If you just want to ignore any malformed items (which is also not a very clean approach...), you can do it this way:
test :: FilePath -> IO [MyDat]
test s = do
  d <- LBS.readFile s
  return $ case decode d :: Maybe GarbledJSON of
    Just v -> [ MyDat iName iVal
              | (iName, iVals) <- Map.toList v
              , let valKey = takeWhile (/='_') iName
              , Just iValStr <- [iVals Map.!? valKey]
              , [(iVal,"")] <- [reads iValStr] ]
    Nothing -> []
Related
How can I set up cassava to ignore missing columns/fields and fill the respective data type with a default value? Consider this example:
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Lazy.Char8
import Data.Csv
import Data.Vector
import GHC.Generics
data Foo = Foo {
    a :: String
  , b :: Int
  } deriving (Eq, Show, Generic)
instance FromNamedRecord Foo
decodeAndPrint :: ByteString -> IO ()
decodeAndPrint csv = do
  print $ (decodeByName csv :: Either String (Header, Vector Foo))
main :: IO ()
main = do
  decodeAndPrint "a,b,ignore\nhu,1,pu" -- [1]
  decodeAndPrint "ignore,b,a\npu,1,hu" -- [2]
  decodeAndPrint "ignore,b\npu,1" -- [3]
[1] and [2] work perfectly fine, but [3] fails with
Left "parse error (Failed reading: conversion error: no field named \"a\") at \"\""
How could I make decodeAndPrint capable of handling this incomplete input?
I could of course manipulate the input bytestring, but maybe there is a more elegant solution.
A Solution thanks to the input of Daniel Wagner below:
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative
import Data.ByteString.Lazy.Char8
import Data.Csv
import Data.Vector
import GHC.Generics
data Foo = Foo {
    a :: Maybe String
  , b :: Maybe Int
  } deriving (Eq, Show, Generic)
instance FromNamedRecord Foo where
  parseNamedRecord rec = pure Foo
    <*> ((Just <$> Data.Csv.lookup rec "a") <|> pure Nothing)
    <*> ((Just <$> Data.Csv.lookup rec "b") <|> pure Nothing)
decodeAndPrint :: ByteString -> IO ()
decodeAndPrint csv = do
  print $ (decodeByName csv :: Either String (Header, Vector Foo))
main :: IO ()
main = do
  decodeAndPrint "a,b,ignore\nhu,1,pu" -- [1]
  decodeAndPrint "ignore,b,a\npu,1,hu" -- [2]
  decodeAndPrint "ignore,b\npu,1" -- [3]
(Warning: completely untested! Code is for idea transmission only, not suitable for any use, etc. etc.)
The Parser type demanded by FromNamedRecord is an Alternative, so just toss a default on with (<|>).
instance FromNamedRecord Foo where
  -- lookup here is Data.Csv.lookup, not the Prelude's lookup
  parseNamedRecord rec = pure Foo
    <*> (lookup rec "a" <|> pure "missing")
    <*> (lookup rec "b" <|> pure 0)
If you want to know later whether the field was there or not, make your fields rich enough to record that:
data RichFoo = RichFoo
  { a :: Maybe String
  , b :: Maybe Int
  }
instance FromNamedRecord RichFoo where
  parseNamedRecord rec = pure RichFoo
    <*> ((Just <$> lookup rec "a") <|> pure Nothing)
    <*> ((Just <$> lookup rec "b") <|> pure Nothing)
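For reference, the default-value idea can be put together into a loadable module roughly as follows. This is a sketch under a few assumptions: it uses cassava's (.:) operator in place of lookup to avoid the clash with the Prelude, it imports the ByteString and Vector modules qualified, and decodeFoos is a helper name invented for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (FromNamedRecord (..), decodeByName, (.:))
import qualified Data.Vector as V

data Foo = Foo { a :: String, b :: Int } deriving (Eq, Show)

instance FromNamedRecord Foo where
  -- Each field falls back to a default when its lookup fails.
  parseNamedRecord r =
    Foo <$> (r .: "a" <|> pure "missing")
        <*> (r .: "b" <|> pure 0)

-- Hypothetical helper: drop the header and return the rows as a list.
decodeFoos :: BL.ByteString -> Either String [Foo]
decodeFoos csv = V.toList . snd <$> decodeByName csv
```

One design consequence to keep in mind: (<|>) recovers from any failure of the left-hand parser, so a column that is present but does not convert (for example a non-numeric b) also silently becomes the default.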
An Http server returns data in this JSON format:
{
some_value: "fdsafsafdsafs"
}
It is an object with a single key and value.
I want to parse data returned in that format, and I haven't been able to. I don't want to create a special data type for it.
Instead I want to parse or destructure/pattern match it and get the value of "some_value".
Code:
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Aeson as Aeson
func1 :: IO (Either MyError BS.ByteString)
func1 = do
resp <- sendRequestAndReturnJsonBody
-- [.........]
I've tried:
1)
case Aeson.decode resp of
  Just (Aeson.Object obj) -> -- how to extract "some_value" from "obj" now?
  _ -> _
2)
let (Aeson.Object ("some_value", String s)) = resp
-- [......]
3)
case resp of
  (Object obj) ->
    case (lookup "some_value" obj) of
      Just (String s) -> pure $ Right s
      _ -> undefined
All of these attempts fail.
How do I do it?
Most likely your third attempt used the Prelude's lookup rather than the lookup from the Data.HashMap.Strict module of the unordered-containers package. You should also enable the OverloadedStrings extension so that string literals can have type Text. (This applies to aeson versions before 2.0, where Object is a HashMap Text Value.) You can then implement this as:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.HashMap.Strict as HM
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Aeson as Aeson
-- The matched string `s` is a Text, so the result type uses T.Text.
func1 :: IO (Either MyError T.Text)
func1 = do
  resp <- sendRequestAndReturnJsonBody
  case Aeson.decode resp of
    Just (Aeson.Object obj) -> case (HM.lookup "some_value" obj) of
      Just (Aeson.String s) -> pure (Right s)
      _ -> undefined
    _ -> undefined
If we construct a function:
f :: Applicative f => ByteString -> f (Either a Text)
f resp = case Aeson.decode resp of
  Just (Aeson.Object obj) -> case (HM.lookup "some_value" obj) of
    Just (Aeson.String s) -> pure (Right s)
    _ -> undefined
  _ -> undefined
Given that resp is a (lazy) ByteString, this returns an Applicative f => f (Either a Text). So if resp in your case is indeed a ByteString, the function can be used at type IO (Either MyError Text).
For objects that contain one element, we can use the OverloadedLists extension, and thus make use of that to pattern match on a list pattern for that HashMap:
{-# LANGUAGE OverloadedLists, OverloadedStrings #-}
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Aeson as Aeson
-- The matched string `s` is a Text, so the result type uses T.Text.
func1 :: IO (Either MyError T.Text)
func1 = do
  resp <- sendRequestAndReturnJsonBody
  case Aeson.decode resp of
    Just (Aeson.Object [("some_value", Aeson.String s)]) -> pure (Right s)
    _ -> undefined
For more items, this will not match. Trying this for more items can fail, since the order of the items with toList is unspecified, and thus can depend on implementation details.
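An alternative that avoids both the HashMap import and a custom data type is to run a small aeson parser directly with parseMaybe. The sketch below assumes the response body is a lazy ByteString; someValue is a name chosen for this example. Because it only uses withObject and (.:), it does not depend on how Object is represented internally.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (decode, withObject, (.:))
import Data.Aeson.Types (parseMaybe)
import qualified Data.ByteString.Lazy as LBS
import Data.Text (Text)

-- Hypothetical helper: Nothing if the body is not JSON, is not an object,
-- or has no string-valued "some_value" key.
someValue :: LBS.ByteString -> Maybe Text
someValue body =
  decode body >>= parseMaybe (withObject "response" (.: "some_value"))
```

Extra keys in the object are ignored, so unlike the list pattern this keeps working if the server later adds fields.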
Even though you said you didn't want to create a custom data type, this is still the most straightforward way of getting the let some_pattern = result syntax that you want. Note that you don't need to use the data type for anything other than parsing. Think of it as the "usual" Aeson method for creating a new pattern that you can match the result on.
You can either use generics to define the data type or write a custom FromJSON instance to avoid cluttering your namespace with a some_value field:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import Data.Aeson
newtype SomeValue = SomeValue String
instance FromJSON SomeValue where
  parseJSON = withObject "SomeValue" $ \o -> SomeValue <$> o .: "some_value"
myjson :: ByteString
myjson = "{ \"some_value\": \"fdsafsafdsafs\" }"
main = do
  case decodeStrict myjson of
    Just (SomeValue v) -> print v
    _ -> error "didn't work!"
I need to serialize a record in Haskell, and am trying to do it with Aeson. The problem is that some of the fields are ByteStrings, and I can't work out from the examples how to encode them. My idea is to first convert them to text via base64. Here is what I have so far (I put 'undefined' where I didn't know what to do):
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.Aeson as J
import qualified Data.ByteString as B
import qualified Data.ByteString.Base64 as B64
import qualified Data.Text as T
import qualified Data.Text.Encoding as E
import qualified GHC.Generics as G
data Data = Data
  { number :: Int
  , bytestring :: B.ByteString
  } deriving (G.Generic, Show)
instance J.ToJSON Data where
  toEncoding = J.genericToEncoding J.defaultOptions
instance J.FromJSON Data
instance J.FromJSON B.ByteString where
  parseJSON = undefined
instance J.ToJSON B.ByteString where
  toJSON = undefined
byteStringToText :: B.ByteString -> T.Text
byteStringToText = E.decodeUtf8 . B64.encode
textToByteString :: T.Text -> B.ByteString
textToByteString txt =
  case B64.decode . E.encodeUtf8 $ txt of
    Left err -> error err
    Right bs -> bs
encodeDecode :: Data -> Maybe Data
encodeDecode = J.decode . J.encode
main :: IO ()
main = print $ encodeDecode $ Data 1 "A bytestring"
It would be good if it was not necessary to manually define new instances of ToJSON and FromJSON for every record, because I have quite a few different records with bytestrings in them.
parseJSON needs to return a value of type Parser B.ByteString, so you just need to call pure on the successful result of B64.decode.
import Control.Monad
-- Generalized to any MonadPlus instance, not just Either String
textToByteString :: MonadPlus m => T.Text -> m B.ByteString
textToByteString x = case B64.decode (E.encodeUtf8 x) of
  Left _ -> mzero
  Right bs -> pure bs
instance J.FromJSON B.ByteString where
  parseJSON (J.String x) = textToByteString x
  parseJSON _ = mzero
Here, I've chosen to return mzero both if you try to decode anything other than a JSON string and if there is a problem with the base-64 decoding.
Likewise, toJSON just needs to encode the Text value you create from the base64-encoded ByteString.
instance J.ToJSON B.ByteString where
  toJSON = J.toJSON . byteStringToText
You might want to consider using a newtype wrapper instead of defining the ToJSON and FromJSON instances on B.ByteString directly.
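The newtype suggestion could look something like this. It is only a sketch: the wrapper name Base64 and its accessor unBase64 are invented here, and it assumes the base64-bytestring package's B64.decode returning Either String ByteString. The decoding error message is passed on with fail instead of being dropped with mzero.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Aeson as J
import qualified Data.ByteString as B
import qualified Data.ByteString.Base64 as B64
import qualified Data.Text.Encoding as E

-- Hypothetical wrapper: a ByteString that is serialized as base64 text.
newtype Base64 = Base64 { unBase64 :: B.ByteString } deriving (Eq, Show)

instance J.ToJSON Base64 where
  -- Base64 output is plain ASCII, so decodeUtf8 cannot fail here.
  toJSON = J.toJSON . E.decodeUtf8 . B64.encode . unBase64

instance J.FromJSON Base64 where
  -- Rejects non-string JSON values and invalid base64 input.
  parseJSON = J.withText "Base64" $ \t ->
    either fail (pure . Base64) (B64.decode (E.encodeUtf8 t))
```

A record whose fields have type Base64 instead of B.ByteString can then keep its generically derived ToJSON and FromJSON instances, with no orphan instances for ByteString itself.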
I have JSON date data in the following form:
{"date": "2015-04-12"}
and a corresponding haskell type:
data Date = Date {
year :: Int
, month :: Int
, day :: Int
}
How can I write the custom FromJSON and ToJSON functions for the
Aeson library?
Deriving the instances does not work because of the formatting.
Why reinvent the wheel? The time package already has a semi-standard representation for what you call Date: it is called Day. Better still, that package provides utilities for parsing a Day from the format you have, and aeson already has ToJSON and FromJSON instances for Day:
ghci> :set -XOverloadedStrings
ghci> import Data.Time.Calendar
ghci> import Data.Aeson
ghci> fromJSON "2015-04-12" :: Result Day
Success 2015-04-12
ghci> toJSON (fromGregorian 2015 4 12)
String "2015-04-12"
If you really want to extract the days, months, and years, you can always use toGregorian :: Day -> (Integer, Int, Int). Sticking to the standard abstraction is probably a good long-term choice though.
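To handle the surrounding {"date": ...} object from the question while still relying on the existing Day instances, a thin wrapper is enough. This is a sketch; the name DateField is made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Data.Time.Calendar (Day, fromGregorian)

-- Hypothetical wrapper for an object of the form {"date": "YYYY-MM-DD"}.
newtype DateField = DateField Day deriving (Eq, Show)

instance FromJSON DateField where
  -- Delegates parsing of the string itself to aeson's instance for Day.
  parseJSON = withObject "DateField" $ \o -> DateField <$> o .: "date"

instance ToJSON DateField where
  toJSON (DateField d) = object ["date" .= d]
```

With this, decode "{\"date\": \"2015-04-12\"}" yields a DateField wrapping fromGregorian 2015 4 12, and invalid dates are rejected by the Day parser rather than by hand-written checks.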
You have to convert y/m/d to and from a string yourself:
{-# LANGUAGE OverloadedStrings #-}
{-# OPTIONS_GHC -fno-warn-tabs #-}
import Control.Monad
import Data.Aeson
import qualified Data.Text as T
import Text.Read (readMaybe)
-- import qualified Data.Attoparsec.Text as A
data Date = Date Int Int Int deriving (Read, Show)
instance ToJSON Date where
  toJSON (Date y m d) = object
      [ "date" .= T.pack (str 4 y ++ "-" ++ str 2 m ++ "-" ++ str 2 d) ]
    where
      -- zero-pad the number to n digits
      str n = pad . show
        where pad s = replicate (n - length s) '0' ++ s
instance FromJSON Date where
  parseJSON = withObject "date" $ \v -> do
      str <- v .: "date"
      let ps@(~[y, m, d]) = T.split (== '-') str
      guard (length ps == 3)
      Date <$> readNum y <*> readNum m <*> readNum d
    where
      readNum = maybe (fail "not num") return . readMaybe . T.unpack
-- -- or with attoparsec
-- parseJSON = withObject "date" $ \v -> do
-- str <- v .: "date"
-- [y, m, d] <- either fail return $
-- A.parseOnly (A.decimal `A.sepBy` A.char '-') str
-- return $ Date y m d
I'm using aeson / attoparsec and conduit / conduit-http connected by conduit-attoparsec to parse JSON data from a file / webserver. My problem is that my pipeline always throws this exception...
ParseError {errorContexts = ["demandInput"], errorMessage = "not enough bytes", errorPosition = 1:1}
...once the socket closes or we hit EOF. Parsing and passing on the resulting data structures through the pipeline etc. works just fine, but it always ends with the sinkParser throwing this exception. I invoke it like this...
j <- CA.sinkParser json
...inside of my conduit that parses ByteStrings into my message structures.
How can I have it just exit the pipeline cleanly once there is no more data (no more top-level expressions)? Is there any decent way to detect / distinguish this exception without having to look at error strings?
Thanks!
EDIT: Example:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Applicative
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Conduit.Attoparsec as CA
import Data.Aeson
import Data.Conduit
import Data.Conduit.Binary
import Control.Monad.IO.Class
data MyMessage = MyMessage String deriving (Show)
parseMessage :: (MonadIO m, MonadResource m) => Conduit B.ByteString m B.ByteString
parseMessage = do
  j <- CA.sinkParser json
  let msg = fromJSON j :: Result MyMessage
  yield $ case msg of
    Success r -> B8.pack $ show r
    Error s -> error s
  parseMessage
main :: IO ()
main =
  runResourceT $ do
    sourceFile "./input.json" $$ parseMessage =$ sinkFile "./out.txt"
instance FromJSON MyMessage where
  parseJSON j =
    case j of
      (Object o) -> MyMessage <$> o .: "text"
      _ -> fail $ "Expected Object - " ++ show j
Sample input (input.json):
{"text":"abc"}
{"text":"123"}
Outputs:
out: ParseError {errorContexts = ["demandInput"], errorMessage = "not enough bytes", errorPosition = 3:1}
and out.txt:
MyMessage "abc"MyMessage "123"
This is a perfect use case for conduitParserEither:
parseMessage :: (MonadIO m, MonadResource m) => Conduit B.ByteString m B.ByteString
parseMessage =
  CA.conduitParserEither json =$= awaitForever go
  where
    go (Left s) = error $ show s
    go (Right (_, msg)) = yield $ B8.pack $ show msg ++ "\n"
If you're on FP Haskell Center, you can clone my solution into the IDE.