Is there an established standard for this format of data? - html

This is a bit of a strange question, bear with me: I stumbled upon some data in a machine-readable format in the source code of an HTML page (inside a comment above the opening <html> tag), but I have never seen data that looked anything like this format.
Can anyone identify this data format? Does anyone know if there is an established/documented standard for transmitting/storing data like this? (my hope is that, if this is a standard data format, I can find pre-existing libraries for parsing it and save myself from reinventing the wheel)
Here's the raw data (I omitted some of the data just to keep the post short):
<!--
Fin :: 0
ErrorMsg ::
MoreErrors ::
MFErrorArray :: ARRAY[2 * 120]
[1]
[0:ErrorCode]{ }
[1:ArrayIndex]{ }
MFErrorArray2 :: ARRAY[3 * 60]
[1]
[0:ErrorCode2]{ }
[1:Substitution]{ }
[2:ArrayIndex2]{ }
NotUsed ::
AllControlNumber ::
Datu ::
Pgm :: BXS2BL40
VlNumbHous ::
NmStrt ::
NmBoro ::
VlBin ::
VlNumbZip ::
VlTaxBlock ::
VlTaxLot ::
VlCensTract ::
VlHlthArea ::
HseLo ::
HseHi ::
GlJobType ::
GlPageN :: 0001
GlRecCountN :: 0000000517
FoilIndicator ::
GlMax ::
DebugMsg ::
VlLicnType :: B
NmLicnType :: ELECTRICAL FIRM
VlLicn :: ARRAY[13 * 70]
[1]
[0:NmLicn]{}
[1:VlNumbLIcn]{B001572}
[2:StLicn]{INACTIVE}
[3:DtLicnExp]{12312050}
[4:NmBusn1]{A & A ELEC. CONTRACTING}
[5:NmBusn2]{}
[6:NbIsn]{0000023530}
[7:FirmIsn]{}
[8:FirmLicenseNumber]{}
[9:JobCount]{0000000000}
[10:LLicenseClass]{}
[11:LLicenseClassType]{}
[12:GreenFlag]{N}
[2]
[0:NmLicn]{}
[1:VlNumbLIcn]{B002944}
[2:StLicn]{ACTIVE}
[3:DtLicnExp]{12312050}
[4:NmBusn1]{A & A ELEC'L CONTR'G CORP}
[5:NmBusn2]{}
[6:NbIsn]{0000024858}
[7:FirmIsn]{}
[8:FirmLicenseNumber]{}
[9:JobCount]{0000000000}
[10:LLicenseClass]{}
[11:LLicenseClassType]{}
[12:GreenFlag]{N}
[3]
[0:NmLicn]{}
[1:VlNumbLIcn]{B000014}
[2:StLicn]{INACTIVE}
[3:DtLicnExp]{12312050}
[4:NmBusn1]{A & A ELECTRIC INC.}
[5:NmBusn2]{}
[6:NbIsn]{0000021979}
[7:FirmIsn]{}
[8:FirmLicenseNumber]{}
[9:JobCount]{0000000000}
[10:LLicenseClass]{}
[11:LLicenseClassType]{}
[12:GreenFlag]{N}
*** I've removed entries 4 through 67 in this array for sake of brevity ***
[68]
[0:NmLicn]{}
[1:VlNumbLIcn]{B003051}
[2:StLicn]{ACTIVE}
[3:DtLicnExp]{12312050}
[4:NmBusn1]{A.L. ELECTRICAL CORP.}
[5:NmBusn2]{}
[6:NbIsn]{0000024954}
[7:FirmIsn]{}
[8:FirmLicenseNumber]{}
[9:JobCount]{0000000000}
[10:LLicenseClass]{}
[11:LLicenseClassType]{}
[12:GreenFlag]{N}
[69]
[0:NmLicn]{}
[1:VlNumbLIcn]{B002419}
[2:StLicn]{ACTIVE}
[3:DtLicnExp]{12312050}
[4:NmBusn1]{A.M. ELECTRIC CORP. OF NY}
[5:NmBusn2]{}
[6:NbIsn]{0000024375}
[7:FirmIsn]{}
[8:FirmLicenseNumber]{}
[9:JobCount]{0000000000}
[10:LLicenseClass]{}
[11:LLicenseClassType]{}
[12:GreenFlag]{N}
[70]
[0:NmLicn]{}
[1:VlNumbLIcn]{B003863}
[2:StLicn]{ACTIVE}
[3:DtLicnExp]{12312050}
[4:NmBusn1]{A.M.A HOLDINGS INC.D/B/A}
[5:NmBusn2]{}
[6:NbIsn]{0000028205}
[7:FirmIsn]{}
[8:FirmLicenseNumber]{}
[9:JobCount]{0000000000}
[10:LLicenseClass]{}
[11:LLicenseClassType]{}
[12:GreenFlag]{N}
-->

This data excerpt seems to contain licensing records: the sample shows electrical firm licenses with a license number, status, expiry date, and business name per entry.
I have not personally seen data formatted exactly this way before, but if you came across it on a commercial website it is likely either a highly specialized data standard for that industry or some ad hoc solution thrown together by that company in lieu of a better standard. Perhaps if you could share a link to the site we could dig into it further, in context.
It looks fairly straightforward, though. Why not just write a parser for it?
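As a rough illustration of how small such a parser can be, here is a minimal sketch in Haskell. It assumes only the three line shapes visible in the excerpt: `Key :: Value` scalars, `Key :: ARRAY[r * c]` headers followed by `[n]` record markers, and `[i:Name]{value}` fields. The names `Entry`, `Line`, and `parseEntries` are made up for this example, and the real format may have cases the excerpt does not show.

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (dropWhileEnd, isPrefixOf)

-- Hypothetical result type: a scalar, or an array of records (name/value pairs).
data Entry
  = Scalar String String
  | Array String [[(String, String)]]
  deriving (Eq, Show)

-- One classified input line.
data Line = ScalarL String String | ArrayL String | RecordL | FieldL String String | OtherL

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- "[12:GreenFlag]{N}" -> Just ("GreenFlag", "N")
parseField :: String -> Maybe (String, String)
parseField ('[' : rest) =
  case break (== ':') rest of
    (idx, ':' : afterColon)
      | not (null idx), all isDigit idx ->
          case break (== ']') afterColon of
            (name, ']' : '{' : body)
              | not (null body), last body == '}' -> Just (name, trim (init body))
            _ -> Nothing
    _ -> Nothing
parseField _ = Nothing

-- "[3]" marks the start of a record inside an array.
isRecordHeader :: String -> Bool
isRecordHeader ('[' : rest) = case span isDigit rest of
  (d, "]") -> not (null d)
  _        -> False
isRecordHeader _ = False

-- Split "Key :: Value" at the first "::".
splitKeyValue :: String -> Maybe (String, String)
splitKeyValue = go ""
  where
    go acc (':' : ':' : rest) = Just (trim (reverse acc), trim rest)
    go acc (c : cs)           = go (c : acc) cs
    go _ []                   = Nothing

classify :: String -> Line
classify raw
  | Just (n, v) <- parseField s = FieldL n v
  | isRecordHeader s            = RecordL
  | Just (k, v) <- splitKeyValue s =
      if "ARRAY[" `isPrefixOf` v then ArrayL k else ScalarL k v
  | otherwise                   = OtherL
  where s = trim raw

-- Group classified lines into entries; unrecognized lines are skipped.
parseEntries :: [String] -> [Entry]
parseEntries = go . map classify
  where
    go [] = []
    go (ScalarL k v : ls) = Scalar k v : go ls
    go (ArrayL k : ls) =
      let (body, rest) = span isArrayBody ls
      in Array k (records body) : go rest
    go (_ : ls) = go ls

    isArrayBody RecordL      = True
    isArrayBody (FieldL _ _) = True
    isArrayBody _            = False

    records [] = []
    records (RecordL : ls) =
      let (fs, rest) = span isField ls
      in [(n, v) | FieldL n v <- fs] : records rest
    records (_ : ls) = records ls

    isField (FieldL _ _) = True
    isField _            = False
```

Feeding it `lines` of the comment body yields one `Scalar` per `Key :: Value` line and one `Array` per `ARRAY[...]` block, with each record as a list of field name/value pairs.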


Trouble with IO objects Haskell

For uni I have a project where I need to program a simple game in Haskell. Right now I'm facing the following problem:
instance Renderable Player where
  render (MkPlayer pos rad bults _) = do
    playerpic <- displayimg pos rad "./images/player.bmp"
    bulletpics <- ...
    return $ pictures (playerpic:bulletpics)
At the ... I need a function f :: [Bullet] -> IO [Picture],
where the function producing a picture for a Bullet is:
render :: Bullet -> IO Picture
Is there a way to create the function I need? I've been toying around on paper with monads and functors but cannot find a way to get this done. Any help at all with this is greatly appreciated!
You can use mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b) for this:
instance Renderable Player where
  render (MkPlayer pos rad bults _) = do
    playerpic <- displayimg pos rad "./images/player.bmp"
    bulletpics <- mapM render bults
    return $ pictures (playerpic:bulletpics)
You can use traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b). In your code that looks like this:
instance Renderable Player where
  render (MkPlayer pos rad bults _) = do
    playerpic <- displayimg pos rad "./images/player.bmp"
    bulletpics <- traverse render bults
    return $ pictures (playerpic:bulletpics)
The do notation solutions provided are quite normal, and easy for a beginner to understand. But with more experience, you might also consider using applicative style, to make it clearer (to both the reader and the compiler) that the handling of the player and the bullets are independent:
instance Renderable Player where
  render (MkPlayer pos rad bults _) = liftA2 go
      (displayimg pos rad "./images/player.bmp")
      (traverse render bults)
    where go playerpic bulletpics = pictures $ playerpic : bulletpics
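To see the type shape in isolation, here is a minimal self-contained sketch. `Bullet`, `Picture`, and `renderBullet` are hypothetical stand-ins for the game's own types and its `render` method; only the signatures matter.

```haskell
-- Hypothetical stand-ins for the game's types.
newtype Bullet = Bullet Int
newtype Picture = Picture String deriving (Eq, Show)

-- Plays the role of render :: Bullet -> IO Picture.
renderBullet :: Bullet -> IO Picture
renderBullet (Bullet n) = pure (Picture ("bullet " ++ show n))

-- traverse runs the action on every element, in order, and collects the results.
renderBullets :: [Bullet] -> IO [Picture]
renderBullets = traverse renderBullet
```

For lists in `IO`, `mapM renderBullet` behaves the same way; `traverse` simply asks for `Applicative` rather than `Monad`.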

Beginner Haskell: Making a last function with reverse

I'm attempting to make a function that generates the last item in a list. I want to use reverse and !!. This is what I have so far:
myLast :: [a] -> [a] -> Int -> a
myLast xs = (reverse xs) !! 1
I know the problem lies somewhere within the type, but I'm having trouble identifying how to fix it.
A function's type signature has nothing to do with what you use in the function, it only describes how other people can use this function you're defining. So by writing
myLast :: [a] -> [a] -> Int -> a
you're saying that users need to supply two lists and an integer, just to get the last element of one of the lists. That doesn't make sense.
You surely mean
myLast :: [a] -> a
You should generally write that down before even thinking about how you're going to implement that function.
With that signature, you can write various implementations:
myLast :: [a] -> a
myLast xs = head $ reverse xs
myLast' :: [a] -> a
myLast' [l] = l
myLast' (_:xs) = myLast' xs
myLast'' :: [a] -> a
myLast'' = fix $ \f (x:xs) -> maybe x id . teaspoon $ f xs
or whatever weird implementation you choose, it has nothing to do with the signature.
On an unrelated note: though last is actually a standard function from the Prelude, it's the kind of function avoided in modern Haskell: last [] gives an error, because there is no value of type a to be found in the empty list! Errors are bad. Hence the “ideal” way to write it is actually
myLast :: [a] -> Maybe a
myLast [] = Nothing
myLast [x] = Just x
myLast (_:xs) = myLast xs
I would recommend not using !! at all, but to use head.
myLast xs = head (reverse xs)
head returns the first element of the list it is given as its argument.
If you insist on using !!, note that Haskell lists are zero-indexed: !! 0 gets the first element, !! 1 the second, etc. So reverse xs !! 1 returns the second-to-last element, not the last.
As for the type: myLast takes a list of some type and returns one item of that same type. That is denoted as follows:
myLast :: [a] -> a
@leftaroundabout covered this way better in his answer.
Based on @leftaroundabout's answer, here's an implementation that should do what you want:
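Putting the corrected signature and the zero-based index together gives the version the question was aiming for. This is a small sketch; the name `myLastIdx` is chosen here only to keep it apart from the other definitions, and like `last` it still fails on an empty list.

```haskell
-- Last element via reverse and (!!). Index 0 is the first element of the
-- reversed list, i.e. the last element of the original.
-- Partial: calling this on [] raises an index error.
myLastIdx :: [a] -> a
myLastIdx xs = reverse xs !! 0
```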
safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x:_) = Just x
myLast :: [a] -> Maybe a
myLast [] = Nothing
myLast xs = safeHead $ reverse xs
The Maybe type is constructed as follows (from Hackage):
data Maybe a = Nothing | Just a
  deriving (Eq, Ord)
myLast [1, 2, 3, 4], for example, will return Just 4. If you want to use the value 4, you can use the fromJust function from the Data.Maybe module (fromJust (Just 4) returns 4). fromJust is defined like this:
-- | The 'fromJust' function extracts the element out of a 'Just' and
-- throws an error if its argument is 'Nothing'.
--
-- ==== __Examples__
--
-- Basic usage:
--
-- >>> fromJust (Just 1)
-- 1
--
-- >>> 2 * (fromJust (Just 10))
-- 20
--
-- >>> 2 * (fromJust Nothing)
-- *** Exception: Maybe.fromJust: Nothing
--
fromJust :: Maybe a -> a
fromJust Nothing = error "Maybe.fromJust: Nothing" -- yuck
fromJust (Just x) = x
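Since fromJust throws on Nothing, it brings back the same error the Maybe version was meant to avoid. A hedged alternative is to handle both cases explicitly, either with a default value or with a case expression. The helper names `safeLast`, `lastOr`, and `describeLast` below are invented for this sketch.

```haskell
import Data.Maybe (fromMaybe)

-- Total version of last: Nothing for the empty list.
safeLast :: [a] -> Maybe a
safeLast []       = Nothing
safeLast [x]      = Just x
safeLast (_ : xs) = safeLast xs

-- Fall back to a caller-supplied default instead of crashing.
lastOr :: a -> [a] -> a
lastOr def = fromMaybe def . safeLast

-- Or branch on the result and handle the empty case explicitly.
describeLast :: Show a => [a] -> String
describeLast xs = case safeLast xs of
  Nothing -> "empty list"
  Just x  -> "last element: " ++ show x
```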

Rewriting an uncurried function haskell

I've been learning about uncurrying and applying $ in functions in Haskell, but I'm still having issues converting an uncurried function to something less mysterious.
The function I'm given is
apple = map $ uncurry $ flip ($)
and I realize that this takes a list of tuples and applies the function in each tuple to the value next to it. So I'm trying to rewrite it as
apple ls = foldr function _ ls
  where function (a,b) c = (uncurry b) (a,c)
I get a parse error for _ and I have no idea which starting value to use. I need to make this polymorphic, and I'm realizing that this most likely will not be the way to make it less mysterious. Any ideas? They'd be greatly appreciated.
Apple has the type
apple :: [(a, a->b)] -> [b]
We could rewrite it as
apple ls = map (\(a, f) -> f a) ls
So writing this with foldr is very doable,
apple ls = foldr (\(a, f) rest -> f a : rest) [] ls
Or, we can rewrite this to pointfree
apple = foldr ( (:) . (uncurry . flip $ ($)) ) []
The reason for the parse error is that _ is the special pattern syntax for "variables I don't care about". This lets you write things like
foo _ _ _ _ a = a
and not get an error about repeated variables. Basically we just filled in _ with the starting empty list and fixed function so that it prepends its result to c rather than passing c to the function as an argument.
If I wanted to write this in the clearest way possible, then the original
apple = map . uncurry . flip $ ($)
is quite nice.
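As a quick check that these rewrites really are the same function, here is a small sketch that puts three of the variants side by side under one signature. The names `appleMap`, `appleFold`, and `applePointFree` are only labels for this example.

```haskell
-- Three spellings of the same function: apply each tuple's function to its value.
appleMap, appleFold, applePointFree :: [(a, a -> b)] -> [b]

-- Explicit lambda with map.
appleMap = map (\(a, f) -> f a)

-- The same thing as a right fold, starting from the empty list.
appleFold = foldr (\(a, f) rest -> f a : rest) []

-- flip ($) takes the value first and the function second;
-- uncurry turns that two-argument function into one that takes a pair.
applePointFree = map (uncurry (flip ($)))
```

For example, applying any of them to `[(1, (+ 1)), (2, (* 3))]` gives `[2, 6]`.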
The key for understanding is removing complexity.
Thus I would suggest you deal with a single tuple first. Write the following function:
tapp :: (a, a -> b) -> b
in terms of ($) and flip and uncurry.
(To make it even easier, you could first do it for a tuple (a -> b, a).)
Next, make clear to yourself how map works: if you have a function f :: (a -> b), then map f will be a function [a] -> [b]. Hence map tapp does what you want.
You can now replace tapp in map (tapp) by its definition (this is one of the benefits of referential transparency).
And this should take you back to your original expression. More or less so, because, for example:
f $ g h
can be written
f (g h)
or
(f . g) h

json parsing in haskell part 2 - Non-exhaustive patterns in lambda

This is actually in continuation of the question I asked a few days back. I took the applicative functors route and made my own instances.
I need to parse a huge number of json statements all in a file, one line after the other. An example json statement is something like this -
{"question_text": "How can NBC defend tape delaying the Olympics when everyone has
Twitter?", "context_topic": {"followers": 21, "name": "NBC Coverage of the London
Olympics (July & August 2012)"}, "topics": [{"followers": 2705,
"name": "NBC"},{"followers": 21, "name": "NBC Coverage of the London
Olympics (July & August 2012)"},
{"followers": 17828, "name": "Olympic Games"},
{"followers": 11955, "name": "2012 Summer Olympics in London"}],
"question_key": "AAEAABORnPCiXO94q0oSDqfCuMJ2jh0ThsH2dHy4ATgigZ5J",
"__ans__": true, "anonymous": false}
Sorry for the JSON formatting; it got mangled.
I have about 10000 such JSON statements and I need to parse them. The code I have written is something like this -
parseToRecord :: B.ByteString -> Question
parseToRecord bstr = (\(Ok x) -> x) decodedObj where decodedObj = decode (B.unpack bstr) :: Result Question
main :: IO()
main = do
  -- my first line in the file tells how many json statements
  -- are there followed by a lot of other irrelevant info...
  ts <- B.getContents >>= return . fst . fromJust . B.readInteger . head . B.lines
  json_text <- B.getContents >>= return . tail . B.lines
  let training_data = take (fromIntegral ts) json_text
  let questions = map parseToRecord training_data
  print $ questions !! 8922
This code gives me a runtime error: Non-exhaustive patterns in lambda. The error refers to \(Ok x) -> x in the code. By trial and error, I came to the conclusion that the program works fine up to index 8921 and fails at index 8922.
I checked the corresponding JSON statement and tried to parse it standalone by calling the function on it, and it works. However, it doesn't work when I call map. I don't really understand what is going on. Having learnt a little bit of Haskell from "Learn You a Haskell for Great Good", I wanted to dive into a real-world programming project but seem to have got stuck here.
EDIT :: complete code is as follows
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 -optc-O2 #-}
{-# OPTIONS_GHC -fno-warn-incomplete-uni-patterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Maybe
import NLP.Tokenize
import Control.Applicative
import Control.Monad
import Text.JSON

data Topic = Topic
  { followers :: Integer,
    name :: String
  } deriving (Show)

data Question = Question
  { question_text :: String,
    context_topic :: Topic,
    topics :: [Topic],
    question_key :: String,
    __ans__ :: Bool,
    anonymous :: Bool
  } deriving (Show)

(!) :: (JSON a) => JSObject JSValue -> String -> Result a
(!) = flip valFromObj

instance JSON Topic where
  -- Keep the compiler quiet
  showJSON = undefined
  readJSON (JSObject obj) =
    Topic <$>
    obj ! "followers" <*>
    obj ! "name"
  readJSON _ = mzero

instance JSON Question where
  -- Keep the compiler quiet
  showJSON = undefined
  readJSON (JSObject obj) =
    Question <$>
    obj ! "question_text" <*>
    obj ! "context_topic" <*>
    obj ! "topics" <*>
    obj ! "question_key" <*>
    obj ! "__ans__" <*>
    obj ! "anonymous"
  readJSON _ = mzero

isAnswered (Question _ _ _ _ status _) = status
isAnonymous (Question _ _ _ _ _ status) = status

parseToRecord :: B.ByteString -> Question
parseToRecord bstr = handle decodedObj
  where handle (Ok k) = k
        handle (Error e) = error (e ++ "\n" ++ show bstr)
        decodedObj = decode (B.unpack bstr) :: Result Question
--parseToRecord bstr = (\(Ok x) -> x) decodedObj where decodedObj = decode (B.unpack bstr) :: Result Question

main :: IO()
main = do
  ts <- B.getContents >>= return . fst . fromJust . B.readInteger . head . B.lines
  json_text <- B.getContents >>= return . tail . B.lines
  let training_data = take (fromIntegral ts) json_text
  let questions = map parseToRecord training_data
  let correlation = foldr (\x acc -> if (isAnonymous x == isAnswered x) then (fst acc + 1, snd acc + 1) else (fst acc, snd acc + 1)) (0,0) questions
  print $ fst correlation
Here's the data which can be given as input to the executable. I'm using GHC 7.6.3. If the program is named ans.hs, I followed these steps:
$ ghc --make ans.hs
$ ./ans < path/to/the/file/sample/answered_data_10k.in
Thanks a lot!
The lambda function (\(Ok x) -> x) is partial in that it will only be able to match objects that were successfully decoded. If you are experiencing this, it indicates that your JSON parser is failing to parse a record, for some reason.
Making the parseToRecord function more informative would help you find the error. Try actually reporting the error, rather than reporting a failed pattern match.
parseToRecord :: B.ByteString -> Question
parseToRecord bstr = handle decodedObj
  where handle (Ok k) = k
        handle (Error e) = error e
        decodedObj = decode (B.unpack bstr) :: Result Question
If you want more help, it might be useful to include the parser code.
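One way to go a step further than calling error is to keep the failure as a value and record which input line it came from, so that a single bad record does not abort the whole run. The sketch below uses only base. `decodeLine` is a hypothetical stand-in for the real decoder (it just reads an Int) and `Either String` plays the role of Text.JSON's Result, so treat it as an outline of the idea rather than drop-in code.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical stand-in for the real JSON decoder: succeeds only on a plain integer.
decodeLine :: String -> Either String Int
decodeLine s = case reads s of
  [(n, "")] -> Right n
  _         -> Left ("cannot parse " ++ show s)

-- Decode every line, pairing each failure with its 1-based line number.
-- Returns (failures, successfully decoded values).
decodeAll :: [String] -> ([(Int, String)], [Int])
decodeAll ls = partitionEithers (zipWith tag [1 ..] ls)
  where
    tag i l = case decodeLine l of
      Left err -> Left (i, err)
      Right v  -> Right v
```

Printing the first component of the result shows every offending line at once, instead of stopping at the first one.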
Update
Based on your code and sample JSON, it looks like your code is first failing
when it encounters a null in the context_topic field of your JSON.
Your current code cannot handle a null, so it fails to parse. My fix would
be something like the following, but you could come up with other ways to
handle it.
data Nullable a = Null
                | Full a
  deriving (Show)

instance JSON a => JSON (Nullable a) where
  showJSON Null = JSNull
  showJSON (Full a) = showJSON a
  readJSON JSNull = Ok Null
  readJSON c = Full `fmap` readJSON c

data Question = Question
  { question_text :: String,
    context_topic :: Nullable Topic,
    topics :: [Topic],
    question_key :: String,
    __ans__ :: Bool,
    anonymous :: Bool
  } deriving (Show)
It also seems to fail on line 9002, where there is a naked value of "1000" on
that line, and it seems that several JSON values after that line lack the
'__ans__' field.
I would suggest using Maybe to parse the null values:
data Question = Question
  { question_text :: String
  , context_topic :: Maybe Topic
  , topics :: [Topic]
  , question_key :: String
  , __ans__ :: Bool
  , anonymous :: Bool
  } deriving (Show)
And then change the readJSON function as follows (in addition, the missing ans-fields can be fixed by returning False on an unsuccessful parsing attempt):
instance JSON Question where
  -- Keep the compiler quiet
  showJSON = undefined
  readJSON (JSObject obj) = Question <$>
    obj ! "question_text" <*>
    (fmap Just (obj ! "context_topic") <|> return Nothing) <*>
    obj ! "topics" <*>
    obj ! "question_key" <*>
    (obj ! "__ans__" <|> return False) <*>
    obj ! "anonymous"
  readJSON _ = mzero
After getting rid of the 1000 in line 9000-something (like sabauma mentioned), I got 4358 as result. So maybe these slight changes are enough?

Weeding duplicates from a list of functions

Is it possible to remove the duplicates (as in nub) from a list of functions in Haskell?
Basically, is it possible to add an instance for (Eq (Integer -> Integer))
In ghci:
let fs = [(+2), (*2), (^2)]
let cs = concat $ map subsequences $ permutations fs
nub cs
<interactive>:31:1:
No instance for (Eq (Integer -> Integer))
arising from a use of `nub'
Possible fix:
add an instance declaration for (Eq (Integer -> Integer))
In the expression: nub cs
In an equation for `it': it = nub cs
Thanks in advance.
...
Further, based on larsmans' answer, I am now able to do this
> let fs = [AddTwo, Double, Square]
> let css = nub $ concat $ map subsequences $ permutations fs
in order to get this
> css
[[],[AddTwo],[Double],[AddTwo,Double],[Square],[AddTwo,Square],[Double,Square],[AddTwo,Double,Square],[Double,AddTwo],[Double,AddTwo,Square],[Square,Double],[Square,AddTwo],[Square,Double,AddTwo],[Double,Square,AddTwo],[Square,AddTwo,Double],[AddTwo,Square,Double]]
and then this
> map (\cs-> call <$> cs <*> [3,4]) css
[[],[5,6],[6,8],[5,6,6,8],[9,16],[5,6,9,16],[6,8,9,16],[5,6,6,8,9,16],[6,8,5,6],[6,8,5,6,9,16],[9,16,6,8],[9,16,5,6],[9,16,6,8,5,6],[6,8,9,16,5,6],[9,16,5,6,6,8],[5,6,9,16,6,8]]
, which was my original intent.
No, this is not possible. Functions cannot be compared for equality.
There are two reasons for this:
First, pointer comparison makes very little sense for Haskell functions, since then the equality of id and \x -> id x would change based on whether the latter form is optimized into id.
Second, extensional comparison of functions is impossible, since it would require a positive solution to the halting problem (both functions having the same halting behavior is a necessary requirement for equality).
The workaround is to represent functions as data:
data Function = AddTwo | Double | Square deriving Eq
call AddTwo = (+2)
call Double = (*2)
call Square = (^2)
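For reference, here is the same idea as a self-contained module, with two small assumptions added on top of the snippet above: a `Show` instance (so GHCi can print lists of `Function` values) and an explicit signature for `call` fixed at `Integer`, matching the question.

```haskell
import Data.List (nub)

-- Functions represented as plain data, so Eq can be derived.
-- Show is added here so values can be printed in GHCi.
data Function = AddTwo | Double | Square
  deriving (Eq, Show)

-- Interpret a tag as the function it stands for.
call :: Function -> Integer -> Integer
call AddTwo = (+ 2)
call Double = (* 2)
call Square = (^ 2)

-- Duplicates can now be removed with the ordinary nub.
dedupe :: [Function] -> [Function]
dedupe = nub
```

Deduplication happens on the tags, and `call` is applied only at the point where the actual functions are needed.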
No, it's not possible to do this for Integer -> Integer functions.
However, it is possible if you're also OK with a more general type signature Num a => a -> a, as your example indicates! One naïve (and not safe) way would go like this:
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE NoMonomorphismRestriction #-}
data NumResLog a = NRL { runNumRes :: a, runNumResLog :: String }
  deriving (Eq, Show)
instance (Num a) => Num (NumResLog a) where
  fromInteger n = NRL (fromInteger n) (show n)
  NRL a alog + NRL b blog
    = NRL (a+b) ( "("++alog++ ")+(" ++blog++")" )
  NRL a alog * NRL b blog
    = NRL (a*b) ( "("++alog++ ")*(" ++blog++")" )
  ...
instance (Num a) => Eq (NumResLog a -> NumResLog a) where
  f == g = runNumResLog (f arg) == runNumResLog (g arg)
    where arg = NRL 0 "THE ARGUMENT"
unlogNumFn :: (NumResLog a -> NumResLog c) -> (a->c)
unlogNumFn f = runNumRes . f . (`NRL`"")
which works basically by comparing a "normalised" version of the functions' source code. Of course this fails when you compare e.g. (+1) == (1+), which are equivalent numerically but yield "(THE ARGUMENT)+(1)" vs. "(1)+(THE ARGUMENT)" and thus are reported as non-equal. However, since functions Num a => a->a are essentially constrained to be polynomials (yeah, abs and signum make it a bit more difficult, but it's still doable), you can find a data type that properly handles those equivalences.
The stuff can be used like this:
> let fs = [(+2), (*2), (^2)]
> let cs = concat $ map subsequences $ permutations fs
> let ncs = map (map unlogNumFn) $ nub cs
> map (map ($ 1)) ncs
[[],[3],[2],[3,2],[1],[3,1],[2,1],[3,2,1],[2,3],[2,3,1],[1,2],[1,3],[1,2,3],[2,1,3],[1,3,2],[3,1,2]]