Can't define an exception only in a mli file

Ok, this is mostly about curiosity but I find it too strange.
Let's suppose I have this code
sig.mli
type t = A | B
main.ml
let f =
let open Sig in
function A | B -> ()
If I compile, everything will work.
Now, let's try to modify sig.mli
sig.mli
type t = A | B
exception Argh
and main.ml
main.ml
let f =
let open Sig in
function
| A -> ()
| B -> raise Argh
And let's try to compile it :
> ocamlc -o main sig.mli main.ml
File "main.ml", line 1:
Error: Error while linking main.cmo:
Reference to undefined global `Sig'
Well, is it just because I added the exception ? Maybe it means that exceptions are like functions or modules, you need a proper implementation.
But then, what if I write
main.ml
let f =
let open Sig in
function A | B -> ()
And try to compile ?
> ocamlc -o main sig.mli main.ml
>
It worked ! If I don't use the exception, it compiles !
There is no reason for this behaviour, right? (I tested it with several compiler versions, 3.12.0, 4.00.0, 4.02.3 and 4.03.0, and all of them gave the same error.)

Unlike a variant, an exception is not a pure type and requires an implementation in a .ml file. Compile the following code with ocamlc -dlambda -c x.ml:
let x = Exit
-- the output --
(setglobal X!
(seq (opaque (global Pervasives!))
(let (x/1199 = (field 2 (global Pervasives!)))
(pseudo _none_(1)<ghost>:-1--1 (makeblock 0 x/1199)))))
You can see (let (x/1199 = (field 2 (global Pervasives!))) ..., which binds x to the value stored in field 2 of the Pervasives module. This is the value of Exit. Exceptions have run-time values and therefore need a .ml file.
Variants do not require an implementation, because their values can be constructed purely from type information: the constructors' integer tags. We cannot assign integer tags to exceptions (or to their generalization, open type constructors) because they are openly defined. Instead, they define values in the .ml file that serve to identify them.

To get an implementation of the exception, you need sig.ml. A .mli file is an interface file, a .ml file is an implementation file.
For this simple example you could just rename sig.mli to sig.ml:
$ cat sig.ml
type t = A | B
exception Argh
$ cat main.ml
let f =
let open Sig in
function
| A -> ()
| B -> raise Argh
$ ocamlc -o main sig.ml main.ml
I don't see a problem with this behavior, though it would be nice not to have to duplicate types and exceptions between .ml and .mli files. The current setup has the advantage of being simple and explicit. (I'm not a fan of compilers being too clever and doing things behind my back.)

Get Column in Haskell CSV and infer the column type

I'm exploring a csv file in an interactive ghci session (in a jupyter notebook):
import Text.CSV
import Data.List
import Data.Maybe
dat <- parseCSVFromFile "/home/user/data.csv"
headers = head dat
records = tail dat
-- define a way to get a particular row by index
indexRow :: [[Field]] -> Int -> [Field]
indexRow csv index = csv !! index
indexRow records 1
-- this works!
-- Now, define a way to get a particular column by index
indexField :: [[Field]] -> Int -> [Field]
indexField records index = map (\x -> x !! index) records
While this works if I know in advance the type of column 3:
map (\x -> read x :: Double) $ indexField records 3
How can I ask read to infer what the type might be when for example my columns could contain strings or num? I'd like it to try for me, but:
map read $ indexField records 3
fails with
Prelude.read: no parse
I don't care whether they are string or num, I just need that they are all the same and I am failing to find a way to specify that generally with the read function at least.
Weirdly, if I define a mean function like so:
mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean [x] = Just x
mean xs = Just (sum(xs) / (fromIntegral (length xs)))
This works:
mean $ map read $ indexField records 2
Just 13.501359655240003
But without the mean, this still fails:
map read $ indexField records 2
Prelude.read: no parse
Unfortunately, read is at the end of its wits when it comes to situations like this. Let's revisit read:
read :: Read a => String -> a
As you can see, a doesn't depend on the input but solely on the output, and therefore on the context in which read is used. If you write read a + read b, the additional Num constraint will limit the type to Integer or Double due to the defaulting rules. Let's see it in action:
> :set +t
> read "1234"
*** Exception: Prelude.read: no parse
> read "1234" + read "1234"
2468
it :: (Num a, Read a) => a
Ok, a is still not helpful. Is there any type that we can read without additional context? Sure, unit:
> read "()"
()
it :: Read a => a
That's still not helpful at all, so let's enable the monomorphism restriction:
> :set -XMonomorphismRestriction
> read "1234" + read "1234"
2468
it :: Integer
Aha. In the end, we had an Integer. Due to +, we had to decide on a type. Now, with the MonomorphismRestriction enabled, what happens on read "1234" without additional context?
> read "1234"
<interactive>:20:1
No instance for (Read a0) arising from a use of 'read'
The type variable 'a0' is ambiguous
Now GHCi doesn't pick any (default) type and forces you to choose one, which makes the underlying error much clearer.
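To make the point about context concrete, here is a minimal sketch in which the same call to read is given three different result types by annotation. The names asInt, asDouble and asInts are made up for this illustration.

```haskell
-- The result type of read is chosen by the annotation, not by the input string.
asInt :: Int
asInt = read "1234"

asDouble :: Double
asDouble = read "1234"

-- The element type of the list fixes the type used by every read call.
asInts :: [Int]
asInts = map read ["1", "2", "3"]

main :: IO ()
main = do
  print asInt     -- 1234
  print asDouble  -- 1234.0
  print asInts    -- [1,2,3]
```

Without any such annotation (or another use that pins the type down), the compiler has nothing to choose from, which is the ambiguity reported above.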
So how do we fix this? As CSV can contain arbitrary fields at run-time and all types are determined statically, we have to cheat by introducing something like
data CSVField = CSVString String | CSVNumber Double | CSVUnknown
and then write
parse :: Field -> CSVField
After all, our type needs to cover all possible fields.
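One possible shape for parse, as a rough sketch only: it tries readMaybe for a Double and falls back to a string. Mapping the empty field to CSVUnknown is an assumption made for this example, and Field is redefined locally as a String synonym so that the snippet needs nothing beyond base.

```haskell
import Text.Read (readMaybe)

-- Local stand-in for Text.CSV's Field (a String synonym), so the sketch is self-contained.
type Field = String

data CSVField = CSVString String | CSVNumber Double | CSVUnknown
  deriving (Show, Eq)

-- Assumption: an empty field counts as unknown; anything that parses
-- as a Double is a number; everything else is kept as a string.
parse :: Field -> CSVField
parse field
  | null field = CSVUnknown
  | otherwise  = maybe (CSVString field) CSVNumber (readMaybe field)

main :: IO ()
main = print (map parse ["13.5", "abc", ""])
-- prints [CSVNumber 13.5,CSVString "abc",CSVUnknown]
```

A real data set may need more cases (dates, integers, quoted numbers), so the constructors here are only a starting point.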
However, in your case, we can just restrict read's type:
myRead :: String -> Double
myRead = read
But that's not wise, as we can still end up with errors if the column doesn't contain Doubles to begin with. So instead, let's use readMaybe (from Text.Read) and mapM:
columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe
That way, the type is fixed, and we're forced to check whether we have Just something or Nothing:
mean <$> columnAsNumbers (indexField records 2)
If you find yourself using columnAsNumbers often, you could define an operator:
(!!$) :: [[Field]] -> Int -> Maybe [Double]
records !!$ index = columnAsNumbers $ indexField records index
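As a small illustration of how mapM readMaybe behaves, the sketch below runs it on hard-coded sample columns rather than a real CSV file. The helper columnMean is a hypothetical name introduced here; it uses >>= so that the result is a single Maybe instead of the nested Maybe (Maybe Double) that mean <$> produces.

```haskell
import Text.Read (readMaybe)

-- Local stand-in for Text.CSV's Field (a String synonym).
type Field = String

mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean xs = Just (sum xs / fromIntegral (length xs))

-- Succeeds only if every field in the column parses as a Double.
columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe

-- Hypothetical helper: parse the column, then average it.
columnMean :: [Field] -> Maybe Double
columnMean column = columnAsNumbers column >>= mean

main :: IO ()
main = do
  print (columnMean ["1.0", "2.0", "3.0"])               -- Just 2.0
  print (columnMean ["1.0", "n/a"])                      -- Nothing
  print (mean <$> columnAsNumbers ["1.0", "2.0", "3.0"]) -- Just (Just 2.0)
```

A single unparsable field turns the whole column into Nothing, which is the "all the same type" check asked for in the question.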

In a Haskell script, how does one programmatically obtain the type signature of a function?

In Haskell (GHC), how can one obtain the type signature of the list of functions shown below?
[tail,init,reverse]
I unsuccessfully tried using the typeOf function of the Data.Typeable module. Specifically, I try to run the following Haskell script:
import Data.Typeable
import Test.HUnit
myTest = TestCase
( assertEqual "\n\nShould have been \"[[a] -> [a]]\""
"[[a] -> [a]]"
(show ( typeOf [tail,init,reverse] )) )
tests = TestList [ (TestLabel "myTest" myTest) ]
However, GHC responds with the following error:
C:\>ghci my_script.hs
GHCi, version 8.0.2: http://www.haskell.org/ghc/ :? for help
[1 of 1] Compiling Main ( my_script.hs, interpreted )
my_script.hs:7:21: error:
* No instance for (Typeable a0) arising from a use of `typeOf'
* In the first argument of `show', namely
`(typeOf [tail, init, reverse])'
In the third argument of `assertEqual', namely
`(show (typeOf [tail, init, reverse]))'
In the first argument of `TestCase', namely
`(assertEqual
"\n\
\\n\
\Should have been \"[[a] -> [a]]\""
"[[a] -> [a]]"
(show (typeOf [tail, init, reverse])))'
Failed, modules loaded: none.
Prelude>
Update: The following HUnit test case isn't quite what I wanted, but I did get it to pass (based on David Young's suggestion). This test case at least forces the compiler to confirm that [tail,init,reverse] is of type [ [a] -> [a] ].
import Data.Typeable
import Test.HUnit
myTest = TestCase
( assertEqual "\n\nShould have been 3"
3
( length ( [tail,init,reverse] :: [[a]->[a]] ) ) )
tests = TestList [ (TestLabel "myTest" myTest) ]
C:\>my_script.hs
GHCi, version 8.0.2: http://www.haskell.org/ghc/ :? for help
[1 of 1] Compiling Main ( my_script.hs, interpreted )
Ok, modules loaded: Main.
*Main> runTestTT tests
Cases: 1 Tried: 1 Errors: 0 Failures: 0
You don't need a unit test to check a function's type. A unit test runs after the code has been compiled; it's a dynamic test. Type checking, however, is a static test: all types are checked during the compilation of your program. Therefore, we can use GHC as a minimal static type checker and reduce your program to:
main :: IO ()
main = return ()
where
tailInitReverseAreListFunctions :: [[a] -> [a]]
tailInitReverseAreListFunctions = [tail, init, reverse]
You don't even need that test anymore the moment you actually test your functions with real data, because that application will (statically) test the function's type too.
Remember, Haskell is a statically typed language. The types are checked during compilation, before your code is run. Any type checking unit-test is therefore more or less a code-smell, because it can only pass.
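As for the original error: typeOf needs a Typeable instance for a concrete type, and the a in [[a] -> [a]] is left ambiguous, hence "No instance for (Typeable a0)". If a run-time type representation is still wanted, a minimal sketch is to fix the element type first. Int is an arbitrary choice here, and listFunctions is a name made up for the example.

```haskell
import Data.Typeable (typeOf)

-- The annotation is itself the static check: this only compiles if
-- tail, init and reverse all fit the type [Int] -> [Int].
listFunctions :: [[Int] -> [Int]]
listFunctions = [tail, init, reverse]

main :: IO ()
main = do
  -- With a monomorphic type, the Typeable constraint can be resolved.
  print (typeOf listFunctions)
  -- Applying the functions exercises them with real data as well.
  print (map ($ [1, 2, 3]) listFunctions)
```

Note that the printed representation mentions Int rather than a, since a polymorphic type has no TypeRep of its own.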

Haskell - Pass arbitrary function and argument list as arguments to another function

My main goal is to redirect stderr to a file.
I got hold of the following code snippet...
catchOutput :: IO a -> IO (res, String)
catchOutput f = do
tmpd <- getTemporaryDirectory
(tmpf, tmph) <- openTempFile tmpd "haskell_stderr"
stderr_dup <- hDuplicate stderr
hDuplicateTo tmph stderr
hClose tmph
res <- f
hDuplicateTo stderr_dup stderr
str <- readFile tmpf
removeFile tmpf
return (res, str)
I hoped to make this more general and pass any function and argument list to catchOutput and get the function result as well as message written to stderr (if any).
I thought that an argument list of type [Data.Dynamic] might work but I failed to retrieve the function result with
res <- Data.List.foldl (f . fromDyn) Nothing $ args
Is this even possible? Help will be greatly appreciated.
There is no reason to use Data.Dynamic. You already know the return type of f: it's a, so you can use just that, i.e.
catchOutput :: IO a -> IO (a, String)
Note, though, that there are some significant issues with your approach:
By redirecting stderr to a file, this will also affect all other concurrent threads. So you could possibly get unrelated data sent to the temporary file.
If an exception is thrown while stderr is redirected, the original stderr will not be restored. Any operation between the two hDuplicateTo lines (hClose and f in this case) could possibly throw an exception, or the thread may receive an asynchronous exception. For this reason, you have to use something like bracket to make your code exception safe.
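The exception-safety point could be addressed along the following lines. This is a sketch rather than a vetted implementation: it wraps the redirect and restore steps in bracket, forces the file contents with evaluate before the temporary file is deleted, and uses finally for the cleanup. The helper names body, redirect and restore are made up here, and the concern about other threads writing to stderr still applies.

```haskell
import Control.Exception (bracket, evaluate, finally)
import GHC.IO.Handle (hDuplicate, hDuplicateTo)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (hClose, hFlush, hPutStrLn, openTempFile, stderr)

catchOutput :: IO a -> IO (a, String)
catchOutput action = do
  tmpd <- getTemporaryDirectory
  (tmpf, tmph) <- openTempFile tmpd "haskell_stderr"
  -- Delete the temporary file whether or not the body throws.
  body tmpf tmph `finally` removeFile tmpf
  where
    body tmpf tmph = do
      -- bracket runs restore even if action throws.
      res <- bracket (redirect tmph) restore (const action)
      str <- readFile tmpf
      -- readFile is lazy: force the whole string before the file is removed.
      _ <- evaluate (length str)
      return (res, str)
    redirect tmph = do
      saved <- hDuplicate stderr
      hDuplicateTo tmph stderr
      hClose tmph
      return saved
    restore saved = do
      hFlush stderr
      hDuplicateTo saved stderr
      hClose saved

main :: IO ()
main = do
  (n, err) <- catchOutput (hPutStrLn stderr "oops" >> return (42 :: Int))
  print (n, err)
```

Because the argument is just an IO a, any function with its arguments already applied can be passed in, which is why no argument list or Data.Dynamic is needed.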

couldn't match type 'ByteString o0 m0 Value' Vs 'ByteString Data.Void.Void IO Value'

I am trying the haskell-json-service. When I run the code, it throws error here:
app req sendResponse = handle (sendResponse . invalidJson) $ do
value <- sourceRequestBody req $$ sinkParser json
newValue <- liftIO $ modValue value
sendResponse $ responseLBS
status200
[("Content-Type", "application/json")]
$ encode newValue
Error is,
Couldn't match type ‘conduit-1.2.4:Data.Conduit.Internal.Conduit.ConduitM
ByteString o0 m0 Value’
with ‘conduit-1.2.4.1:Data.Conduit.Internal.Conduit.ConduitM
ByteString Data.Void.Void IO Value’
NB: ‘conduit-1.2.4:Data.Conduit.Internal.Conduit.ConduitM’
is defined in ‘Data.Conduit.Internal.Conduit’
in package ‘conduit-1.2.4’
‘conduit-1.2.4.1:Data.Conduit.Internal.Conduit.ConduitM’
is defined in ‘Data.Conduit.Internal.Conduit’
in package ‘conduit-1.2.4.1’
Expected type: conduit-1.2.4.1:Data.Conduit.Internal.Conduit.Sink
ByteString IO Value
Actual type: conduit-1.2.4:Data.Conduit.Internal.Conduit.ConduitM
ByteString o0 m0 Value
In the second argument of ‘($$)’, namely ‘sinkParser json’
In a stmt of a 'do' block:
value <- sourceRequestBody req $$ sinkParser json
What does double dollar do? And what is this type - ByteString o0 m0 Value?
This appears to be the problem:
conduit-1.2.4:...
conduit-1.2.4.1:...
Your code is using the ConduitM type from two different versions of the conduit library (1.2.4 and 1.2.4.1). From GHC's point of view, these two types are unrelated: for instance, you cannot pass a value of the first type to a library function that expects the second.
A likely cause is a library X that was compiled against the older conduit and a library Y that was compiled against the newer version. If your program imports both X and Y, you will get into trouble when passing conduits from X to Y or vice versa. I have no idea what X or Y actually are.
Maybe you can recompile X or Y so that they use the same version of conduit.

Printing stack traces

I have a very short test file:
let print_backtrace () = try raise Not_found with
Not_found -> Printexc.print_backtrace stdout;;
let f () = print_backtrace (); Printf.printf "this is to make f non-tail-recursive\n";;
f ();
I compile and run:
% ocamlc -g test.ml
% OCAMLRUNPARAM=b ./a.out
Raised at file "test.ml", line 1, characters 35-44
this is to make f non-tail-recursive
Why isn't f listed in the stack trace? How can I write a function that will print a stack trace of the location it's called from?
The documentation for Printexc.print_backtrace says:
The backtrace lists the program locations where the most-recently raised exception was raised and where it was propagated through function calls.
It actually seems to be doing the right thing. The exception hasn't been propagated back through f.
If I move the call to Printexc.print_backtrace outside the call to f, I see a full backtrace.
$ cat test2.ml
let print_backtrace () = raise Not_found
let f () = let res = print_backtrace () in res ;;
try f () with Not_found -> Printexc.print_backtrace stdout
$ /usr/local/ocaml312/bin/ocamlc -g test2.ml
$ OCAMLRUNPARAM=b a.out
Raised at file "test2.ml", line 1, characters 31-40
Called from file "test2.ml", line 3, characters 21-39
Called from file "test2.ml", line 5, characters 4-8
Here is the code to do what I suggested. I recommend using ocamldebug if at all possible, because this code is much too tricky. But it works on my system for this simple example.
let print_backtrace () =
match Unix.fork () with
| 0 -> raise Not_found
| pid -> let _ = Unix.waitpid [] pid in ()
let f () =
begin
print_backtrace ();
Printf.printf "after the backtrace\n";
end
;;
f ()
Here is a test run.
$ /usr/local/ocaml312/bin/ocamlc unix.cma -g test3.ml
$ OCAMLRUNPARAM=b a.out
Fatal error: exception Not_found
Raised at file "test3.ml", line 3, characters 17-26
Called from file "test3.ml", line 8, characters 4-22
Called from file "test3.ml", line 14, characters 0-4
after the backtrace
I realized that because of the uncaught exception, you don't really have any control over the way the child process exits. That's one reason this code is much too tricky. Please don't blame me if it doesn't work for you, but I hope it does prove useful.
I tested the code on Mac OS X 10.6.8 using OCaml 3.12.0.
Best regards,