How to handle variability of JSON objects in Haskell? - json

Some REST services return variable JSON: some fields can appear or disappear depending on the parameters of the request, the structure itself may change, nesting may vary, and so on.
This leads to avalanche-like growth in the number of types (along with their FromJSON instances). The options are to:
1. wrap a lot of fields in Maybe (but this does not help much with variability in the structure itself)
2. introduce a lot of types
3. create different phantom types (not really different from the previous option)
Option 1 has the drawback that if a call with some fixed parameters always returns well-known fields, you still have to handle the Nothing cases, so the code becomes more complex. Options 2 and 3 are tiring.
What is the simplest/most convenient way to handle such variability in Haskell (assuming you use Aeson; another option, of course, is to avoid Aeson)?

A possible solution to the existing/non-existing fields problem using type-level computation.
Some required extensions and imports:
{-# LANGUAGE DeriveGeneric, ScopedTypeVariables, DataKinds, KindSignatures,
TypeApplications, TypeFamilies, TypeOperators, FlexibleContexts #-}
import Data.Aeson
import Data.Proxy
import GHC.Generics
import GHC.TypeLits
Here's a data type (to be used promoted) that indicates whether a field is absent or present, together with a type family that maps the types of absent fields to ():
data Presence = Present
              | Absent

type family Encode p v :: * where
    Encode Present v = v
    Encode Absent v = ()
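For example, in GHCi (the exact :kind! output layout may differ slightly between GHC versions):
ghci> :kind! Encode 'Present Int
Encode 'Present Int :: * = Int
ghci> :kind! Encode 'Absent Int
Encode 'Absent Int :: * = ()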
Now we can define a parameterized record containing all possible fields, like this:
data Foo (a :: Presence)
         (b :: Presence)
         (c :: Presence) = Foo {
    field1 :: Encode a Int,
    field2 :: Encode b Bool,
    field3 :: Encode c Char
  } deriving Generic

instance (FromJSON (Encode a Int),
          FromJSON (Encode b Bool),
          FromJSON (Encode c Char)) => FromJSON (Foo a b c)
One problem: writing the full type for each combination of occurrences/absences would be tedious, especially if only a few fields are present each time. But perhaps we could define an auxiliary type synonym FooWith that lets us mention only those fields that are present:
type family Mentioned (ns :: [Symbol]) (n :: Symbol) :: Presence where
    Mentioned '[] _ = Absent
    Mentioned (n ': _) n = Present
    Mentioned (_ ': ns) n = Mentioned ns n

-- the field names are repeated as symbols, how to avoid this?
type FooWith (ns :: [Symbol]) = Foo (Mentioned ns "field1")
                                    (Mentioned ns "field2")
                                    (Mentioned ns "field3")
Example of use:
ghci> :kind! FooWith '["field2","field3"]
FooWith '["field2","field3"] :: * = Foo 'Absent 'Present 'Present
Another problem: for each request, we must repeat the list of required fields twice: once in the URL ("fields=a,b,c...") and again in the expected type. It would be better to have a single source of truth.
We can deduce the term-level list of fields to be added to the URL from the type-level list of fields, by using an auxiliary type class Demote:
class Demote (ns :: [Symbol]) where
    demote :: Proxy ns -> [String]

instance Demote '[] where
    demote _ = []

instance (KnownSymbol n, Demote ns) => Demote (n ': ns) where
    demote _ = symbolVal (Proxy @n) : demote (Proxy @ns)
For example:
ghci> demote (Proxy @'["field2","field3"])
["field2","field3"]

Related

How to parse row-polymorphic records with SimpleJSON in PureScript?

I wrote a utility type and function that is meant to aid in parsing certain row-polymorphic types (specifically, in my case, anything that extends BaseIdRows):
type IdTypePairF r = (identifier :: Foreign, identifierType :: Foreign | r)

readIdTypePair :: forall r. Record (IdTypePairF r) -> F Identifier
readIdTypePair idPairF = do
  id <- readNEStringImpl idPairF.identifier
  idType <- readNEStringImpl idPairF.identifierType
  pure $ {identifier: id, identifierType: idType}
When I try to use it, however, I get this type error (in my larger code base, things were working fine before I implemented the readIdTypePair function):
No type class instance was found for
Prim.RowList.RowToList ( identifier :: Foreign
, identifierType :: Foreign
| t3
)
t4
The instance head contains unknown type variables. Consider adding a type annotation.
while applying a function readJSON'
of type ReadForeign t2 => String -> ExceptT (NonEmptyList ForeignError) Identity t2
to argument jsStr
while checking that expression readJSON' jsStr
has type t0 t1
in value declaration readRecordJSON
where t0 is an unknown type
t1 is an unknown type
t2 is an unknown type
t3 is an unknown type
t4 is an unknown type
I have a live gist that demonstrates my issue.
But, here is the complete example as it stands, for posterity:
module Main where
import Control.Monad.Except (except, runExcept)
import Data.Array.NonEmpty (NonEmptyArray, fromArray)
import Data.Either (Either(..))
import Data.HeytingAlgebra ((&&), (||))
import Data.Lazy (Lazy, force)
import Data.Maybe (Maybe(..))
import Data.Semigroup ((<>))
import Data.String.NonEmpty (NonEmptyString, fromString)
import Data.Traversable (traverse)
import Effect (Effect(..))
import Foreign (F, Foreign, isNull, isUndefined)
import Foreign as Foreign
import Prelude (Unit, bind, pure, ($), (>>=), unit)
import Simple.JSON as JSON
main :: Effect Unit
main = pure unit
type ResourceRows = (
    identifiers :: Array Identifier
  )

type Resource = Record ResourceRows

type BaseIdRows r = (
    identifier :: NonEmptyString
  , identifierType :: NonEmptyString
  | r
  )

type Identifier = Record (BaseIdRows ())
-- Utility type for parsing
type IdTypePairF r = (identifier :: Foreign, identifierType :: Foreign | r)
readNEStringImpl :: Foreign -> F NonEmptyString
readNEStringImpl f = do
  str :: String <- JSON.readImpl f
  except $ case fromString str of
    Just nes -> Right nes
    Nothing -> Left $ pure $ Foreign.ForeignError
      "Nonempty string expected."
readIdTypePair :: forall r. Record (IdTypePairF r) -> F Identifier
readIdTypePair idPairF = do
  id <- readNEStringImpl idPairF.identifier
  idType <- readNEStringImpl idPairF.identifierType
  pure $ {identifier: id, identifierType: idType}
readRecordJSON :: String -> Either Foreign.MultipleErrors Resource
readRecordJSON jsStr = runExcept do
  recBase <- JSON.readJSON' jsStr
  --foo :: String <- recBase.identifiers -- Just comment to check inferred type
  idents :: Array Identifier <- traverse readIdTypePair recBase.identifiers
  pure $ recBase { identifiers = idents }
Your problem is that recBase is not necessarily of type Resource.
The compiler has two points of reference for determining the type of recBase: (1) the fact that recBase.identifiers is used with readIdTypePair and (2) the return type of readRecordJSON.
From the first point the compiler can conclude that:
recBase :: { identifiers :: Array (Record (IdTypePairF r)) | p }
for some unknown r and p. The fact that it has (at least) a field named identifiers comes from the dot-syntax, and the type of that field comes from readIdTypePair's parameter combined with the fact that idents is an Array. But there could be more fields besides identifiers (which is represented by p), and every element of identifiers is a partial record (which is represented by r).
From the second point the compiler can conclude that:
recBase :: { identifiers :: a }
Wait, what? Why a and not Array Identifier? Doesn't the definition of Resource clearly specify that identifiers :: Array Identifier?
Well, yes, it does, but here's the trick: the type of recBase doesn't have to be Resource. The return type of readRecordJSON is Resource, but between recBase and return type of readRecordJSON stands a record update operation recBase { identifiers = idents }, which can change the type of the field.
Yes, record updates in PureScript are polymorphic. Check this out:
> x = { a: 42 }
> y = x { a = "foo" }
> y
{ a: "foo" }
See how the type of x.a changed? Here x :: { a :: Int }, but y :: { a :: String }.
And so it is in your code: recBase.identifiers :: Array (Record (IdTypePairF r)) for some unknown r, but (recBase { identifiers = idents }).identifiers :: Array Identifier.
The return type of readRecordJSON is satisfied, but the row r is still unknown.
To fix, you have two options. Option 1 - make readIdTypePair take a full record, not a partial one:
readIdTypePair :: Record (IdTypePairF ()) -> F Identifier
Option 2 - specify the type of recBase explicitly:
recBase :: { identifiers :: Array (Record (IdTypePairF ())) } <- JSON.readJSON' jsStr
Separately, I feel the need to comment on your weird way of specifying records: you first declare a row and then make a record out of it. FYI it can be done directly with curly braces, for example:
type Resource = {
    identifiers :: Array Identifier
  }
In case you're doing it this way for aesthetic reasons, I have no objections. But in case you didn't know - now you know :-)

Get Column in Haskell CSV and infer the column type

I'm exploring a CSV file in an interactive GHCi session (in a Jupyter notebook):
import Text.CSV
import Data.List
import Data.Maybe
dat <- parseCSVFromFile "/home/user/data.csv"
headers = head dat
records = tail dat
-- define a way to get a particular row by index
indexRow :: [[Field]] -> Int -> [Field]
indexRow csv index = csv !! index
indexRow records 1
-- this works!
-- Now, define a way to get a particular column by index
indexField :: [[Field]] -> Int -> [Field]
indexField records index = map (\x -> x !! index) records
While this works if I know in advance the type of column 3:
map (\x -> read x :: Double) $ indexField records 3
How can I ask read to infer what the type might be when, for example, my columns could contain strings or numbers? I'd like it to try for me, but:
map read $ indexField records 3
fails with
Prelude.read: no parse
I don't care whether they are strings or numbers, I just need them all to be the same, and I am failing to find a way to say that generally, at least with the read function.
Weirdly, if I define a mean function like so:
mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean [x] = Just x
mean xs = Just (sum(xs) / (fromIntegral (length xs)))
This works:
mean $ map read $ indexField records 2
Just 13.501359655240003
But without the mean, this still fails:
map read $ indexField records 2
Prelude.read: no parse
Unfortunately, read is at the end of its wits when it comes to situations like this. Let's revisit read:
read :: Read a => String -> a
As you can see, a doesn't depend on the input, but solely on the output, and therefore on the context in which our function is used. If you use read a + read b, then the additional Num context will limit the types to Integer or Double due to defaulting rules. Let's see it in action:
> :set +t
> read "1234"
*** Exception: Prelude.read: no parse
> read "1234" + read "1234"
2468
it :: (Num a, Read a) => a
Ok, a is still not helpful. Is there any type that we can read without additional context? Sure, unit:
> read "()"
()
it :: Read a => a
That's still not helpful at all, so let's enable the monomorphism restriction:
> :set -XMonomorphismRestriction
> read "1234" + read "1234"
2468
it :: Integer
Aha. In the end, we had an Integer. Due to +, we had to decide on a type. Now, with the MonomorphismRestriction enabled, what happens on read "1234" without additional context?
> read "1234"
<interactive>:20:1
No instance for (Read a0) arising from a use of 'read'
The type variable 'a0' is ambiguous
Now GHCi doesn't pick any (default) type and forces you to choose one, which makes the underlying error much clearer.
So how do we fix this? Since a CSV file can contain arbitrary fields at run time, while all types are determined statically, we have to cheat by introducing something like
data CSVField = CSVString String | CSVNumber Double | CSVUnknown
and then write
parse :: Field -> CSVField
After all, our type needs to cover all possible fields.
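A sketch of what such a parse function could look like (the classification rules below are just an illustration; Field from Text.CSV is a String):
import Text.Read (readMaybe)

-- Try to read the field as a number; otherwise classify it as a string,
-- treating an empty field as unknown. The rules here are illustrative only.
parse :: Field -> CSVField
parse f = case readMaybe f :: Maybe Double of
    Just d              -> CSVNumber d
    Nothing | null f    -> CSVUnknown
            | otherwise -> CSVString f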
However, in your case, we can just restrict read's type:
myRead :: String -> Double
myRead = read
But that's not wise, as we can still end up with errors if the column doesn't contain Doubles to begin with. So instead, let's use readMaybe (from Text.Read) and mapM:
columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe
That way, the type is fixed, and we're forced to check whether we have Just something or Nothing:
mean <$> columnAsNumbers (indexField records 2)
If you find yourself using columnAsNumbers often, create an operator, though:
(!!$) :: [[Field]] -> Int -> Maybe [Double]
records !!$ index = columnAsNumbers $ indexField records index
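With that in place, a column mean becomes, for example (just a usage sketch; >>= is used because the mean from the question already returns a Maybe):
records !!$ 2 >>= mean   -- :: Maybe Double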

Recursively change a JSON data structure in Haskell

I am trying to write a function that will take a JSON object, make a change to every string value in it and return a new JSON object. So far my code is:
applyContext :: FromJSON a => a -> a
applyContext x =
  case x of
    Array _ -> map applyContext x
    Object _ -> map applyContext x
    String _ -> parseValue x
    _ -> x
However, the compiler complains about the second case line:
Couldn't match expected type `[b0]' with actual type `a'
`a' is a rigid type variable bound by
the type signature for:
applyContext :: forall a. FromJSON a => a -> a
at app\Main.hs:43:17
I'm guessing that is because map is meant to work on lists, but I would have naively expected it to use Data.HashMap.Lazy.map instead, since that is what the type actually is in that case. If I explicitly use that function I get
Couldn't match expected type `HashMap.HashMap k0 v20' with actual type `a'
which also makes sense, since I haven't constrained a to that extent because then it wouldn't work for the other cases. I suspect that if I throw enough explicit types at this I could make it work but it feels like it should be a lot simpler. What is an idiomatic way of writing this function, or if this is good then what would be the simplest way of getting the types right?
First of all, what does FromJSON a => a mean? It is the type of a value that can be of any type, as long as that type belongs to the class FromJSON. The types in that class can be constructed in very different ways, so you cannot pattern match on such a value; you can only use what the FromJSON class declaration provides, which is essentially the single method parseJSON :: FromJSON a => Value -> Parser a.
Secondly, you should work with a concrete representation of the JSON document; the type Value is a good one. So you can do the main work in a function of type Value -> Value, and then compose that function with toJSON and fromJSON to generalise the argument and result types.
Like this:
change :: Value -> Value
change (Array x) = Array . fmap change $ x
change (Object x) = Object . fmap change $ x
change (String x) = String . parseValue $ x  -- parseValue assumed to be a Text -> Text helper, see the sketch below
change x = x
apply :: (ToJSON a, FromJSON b) => (Value -> Value) -> a -> Result b
apply change = fromJSON . change . toJSON

unsafeApply :: (ToJSON a, FromJSON b) => (Value -> Value) -> a -> b
unsafeApply change x = case apply change x of
    Success x -> x
    Error msg -> error $ "unsafeApply: " ++ msg

applyContext :: (ToJSON a, FromJSON b) => a -> b
applyContext = unsafeApply change
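The question never shows parseValue; to make the sketch above compile, a stand-in such as the following (purely hypothetical, upper-casing every string) would do:
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for the asker's parseValue: upper-case every string.
parseValue :: Text -> Text
parseValue = T.toUpper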
You can write more complicated transformations like Value -> Value with lens and lens-aeson. For example:
import Control.Lens
import Control.Monad.State
import Data.Aeson
import Data.Aeson.Lens
import Data.Text.Lens
import Data.Char
change :: Value -> Value
change = execState go
  where
    go = do
      zoom values go
      zoom members go
      _String . _Text . each %= toUpper
      _Bool %= not
      _Number *= 10

main = print $ json & _Value %~ change
  where json = "{\"a\":[1,\"foo\",false],\"b\":\"bar\",\"c\":{\"d\":5}}"
Output will be:
"{\"a\":[10,\"FOO\",true],\"b\":\"BAR\",\"c\":{\"d\":50}}"

Streaming parsing of JSON in Haskell with Pipes.Aeson

The Pipes.Aeson library exposes the following function:
decode :: (Monad m, FromJSON a) => Parser ByteString m (Either DecodingError a)
If I use evalStateT with this parser and a file handle as an argument, a single JSON object is read from the file and parsed.
The problem is that the file contains several objects (all of the same type) and I'd like to fold or reduce them as they are read.
Pipes.Parse provides:
foldAll :: Monad m => (x -> a -> x) -> x -> (x -> b) -> Parser a m b
but as you can see this returns a new parser - I can't think of a way of supplying the first parser as an argument.
It looks like a Parser is actually a Producer in a StateT monad transformer. I wondered whether there's a way of extracting the Producer from the StateT so that evalStateT can be applied to the foldAll Parser, and the Producer from the decode Parser.
This is probably completely the wrong approach though.
My question, in short:
When parsing a file using Pipes.Aeson, what's the best way to fold all the objects in the file?
Instead of using decode, you can use the decoded parsing lens from Pipes.Aeson.Unchecked. It turns a producer of ByteString into a producer of parsed JSON values.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.Aeson as A
import qualified Pipes.Aeson.Unchecked as AU
import qualified Data.ByteString as B
import Control.Lens (view)
byteProducer :: Monad m => Producer B.ByteString m ()
byteProducer = yield "1 2 3 4"
intProducer :: Monad m => Producer Int m (Either (A.DecodingError, Producer B.ByteString m ()) ())
intProducer = view AU.decoded byteProducer
The return value of intProducer is a bit scary, but it only means that intProducer finishes either with a parsing error and the unparsed bytes after the error, or with the return value of the original producer (which is () in our case).
We can ignore the return value:
intProducer' :: Monad m => Producer Int m ()
intProducer' = intProducer >> return ()
And plug the producer into a fold from Pipes.Prelude, like sum:
main :: IO ()
main = do
  total <- P.sum intProducer'
  putStrLn $ show total
In ghci:
λ :main
10
Note also that the functions purely and impurely let you apply folds defined in the foldl package to producers.
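For instance, a minimal sketch of that (assuming the foldl package, imported qualified as L):
import qualified Control.Foldl as L

-- Run a pure Fold from the foldl package over the producer of Ints.
total :: IO Int
total = L.purely P.fold L.sum intProducer'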

Int and Num type of haskell

I have the code below to take the args and set some offset time.
setOffsetTime :: (Ord a, Num b)=>[a] -> b
setOffsetTime [] = 200
setOffsetTime (x:xs) = read x::Int
But the compiler says "Could not deduce (b ~ Int) from the context (Ord a, Num b) bound by the type signature for setOffsetTime :: (Ord a, Num b) => [a] -> b".
Also, I found I could not use 200.0 if I want a float as the default value. The compiler says "Could not deduce (Fractional b) arising from the literal `200.0'".
Could anyone show me some code for a function (not in the Prelude) that takes an argument and stores it in some variable, so that I can use it in another function? I can do this in main = do, but I hope to use an elegant function to achieve this.
Is there any kind of global constant in Haskell? I googled it, but it seems there is not.
I want to use Haskell to replace some of my Python scripts, although that is not easy.
I think this type signature doesn't quite mean what you think it does:
setOffsetTime :: (Ord a, Num b)=>[a] -> b
What that says is "if you give me a value of type [a], for any type a you choose that is a member of the Ord type class, I will give you a value of type b, for any type b that you choose that is a member of the Num type class". The caller gets to pick the particular types a and b that are used each time setOffsetTime is called.
So trying to return a value of type Int (or Float, or any particular type) doesn't make sense. Int is indeed a member of the type class Num, but it is not an arbitrary member of the type class Num. According to that type signature, I should be able to make a brand new instance of Num that you've never seen before, import setOffsetTime from your module, and call it to get a value of my new type.
To come up with an acceptable return value, you can only use functions that likewise return an arbitrary Num. You can't use any functions of particular concrete types.
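To make that concrete, here is a small illustration (MyNum is invented for this example, not something from the question):
newtype MyNum = MyNum Integer deriving (Show, Eq)

instance Num MyNum where
    MyNum a + MyNum b = MyNum (a + b)
    MyNum a * MyNum b = MyNum (a * b)
    abs (MyNum a)     = MyNum (abs a)
    signum (MyNum a)  = MyNum (signum a)
    negate (MyNum a)  = MyNum (negate a)
    fromInteger       = MyNum

-- With the signature (Ord a, Num b) => [a] -> b, a caller may demand
--   setOffsetTime ["whatever"] :: MyNum
-- so the implementation cannot secretly commit to returning an Int.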
Existential types are essentially a mechanism for allowing the callee to choose the value for a type variable (and then the caller has to be written to work regardless of what that type is), but that's not really something you want to be getting into while you're still learning.
If you are convinced that the implementation of your function is correct, i.e., that it should interpret the first element in its input list as the number to return and return 200 if there is no such argument, then you only need to make sure that the type signature matches that implementation (which it does not do, right now).
To do so, you could, for example, remove the type signature and ask ghci to infer the type:
$ ghci
GHCi, version 7.6.2: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :{
Prelude| let setOffsetTime [] = 200
Prelude| setOffsetTime (x : xs) = read x :: Int
Prelude| :}
Prelude> :t setOffsetTime
setOffsetTime :: [String] -> Int
Prelude> :q
Leaving GHCi.
$
And indeed,
setOffsetTime :: [String] -> Int
setOffsetTime [] = 200
setOffsetTime (x : xs) = read x :: Int
compiles fine.
If you want a slightly more general type, you can drop the ascription :: Int from the second case. The above method then tells you that you can write
setOffsetTime :: (Num a, Read a) => [String] -> a
setOffsetTime [] = 200
setOffsetTime (x : xs) = read x
From the comment that you added to your question, I understand that you want your function to return a floating-point number. In that case, you can write
setOffsetTime :: [String] -> Float
setOffsetTime [] = 200.0
setOffsetTime (x : xs) = read x
or, more general:
setOffsetTime :: (Fractional a, Read a) => [String] -> a
setOffsetTime [] = 200.0
setOffsetTime (x : xs) = read x
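A quick usage check in GHCi, using the concrete [String] -> Float version (the more general version behaves the same under GHCi's default rules):
ghci> setOffsetTime []
200.0
ghci> setOffsetTime ["3.5"]
3.5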