Say I'm going to open a file and parse its contents, and I want to do that lazily:
parseFile :: FilePath -> IO [SomeData]
parseFile path = openBinaryFile path ReadMode >>= parse' where
    parse' handle = hIsEOF handle >>= \eof -> do
        if eof then hClose handle >> return []
            else do
                first <- parseFirst handle
                rest <- unsafeInterleaveIO $ parse' handle
                return (first : rest)
The above code is fine if no error occurs during the whole reading process. But if an exception is thrown, there would be no chance to execute hClose, and the handle won't be correctly closed.
Usually, if the IO process isn't lazy, exception handling is easily solved with catch or bracket. In this case, however, the normal exception handling methods will cause the file handle to be closed before the actual reading process starts. That is of course not acceptable.
So what is the common way to release resources that need to be kept alive outside their scope because of laziness, as I'm doing here, while still ensuring exception safety?
Instead of using openBinaryFile, you could use withBinaryFile:
parseFile :: FilePath -> ([SomeData] -> IO a) -> IO a
parseFile path f = withBinaryFile path ReadMode $ \h -> do
    values <- parse' h
    f values
  where
    parse' = ... -- same as now
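To make the callback shape concrete, here is a minimal, self-contained sketch of the same idea. It is only an illustration under some assumptions: SomeData is replaced by String, parseFirst is a hypothetical parser that treats each line as one record, and the names withParsedFile and firstRecords are made up for this example.

```haskell
import Control.Exception (evaluate)
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical record parser: treats every line as one record.
parseFirst :: Handle -> IO String
parseFirst = hGetLine

-- The handle lives exactly as long as the callback runs; withBinaryFile
-- closes it afterwards, whether the callback returns or throws.
withParsedFile :: FilePath -> ([String] -> IO a) -> IO a
withParsedFile path f = withBinaryFile path ReadMode $ \h -> parse' h >>= f
  where
    parse' h = do
      eof <- hIsEOF h
      if eof
        then return []
        else do
          first <- parseFirst h
          rest  <- unsafeInterleaveIO (parse' h)  -- read lazily, on demand
          return (first : rest)

-- Example callback: force everything that is needed before returning,
-- because the lazy tail cannot be read once the handle is closed.
firstRecords :: Int -> FilePath -> IO [String]
firstRecords n path =
  withParsedFile path $ \records -> do
    let wanted = take n records
    _ <- evaluate (length (concat wanted))  -- force the spine and contents
    return wanted
```

The thing to watch for with this design is that the callback must not let an unevaluated part of the list escape; anything still lazy when the callback returns would try to read from a closed handle.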
However, I'd strongly recommend you consider using a streaming data library instead, as they are designed to work with this kind of situation and handle exceptions properly. For example, with conduit, your code would look something like:
parseFile :: MonadResource m => FilePath -> Producer m SomeData
parseFile path = bracketP
    (openBinaryFile path ReadMode)
    hClose
    loop
  where
    loop handle = do
        -- the Handle functions live in IO, so they are lifted with liftIO
        eof <- liftIO $ hIsEOF handle
        if eof
            then return ()
            else liftIO (parseFirst handle) >>= yield >> loop handle
And if you instead rewrite your parseFirst function to use conduit itself and not drop down to the Handle API, this glue code would be shorter, and you wouldn't be tied directly to Handle, which makes it easier to use other data sources and perform testing.
The conduit tutorial is available on the School of Haskell.
UPDATE: One thing I forgot to mention is that, while the question focuses on exceptions preventing the file from being closed, even non-exceptional situations will leave it open if you don't completely consume the input. For example, if your file has more than one record and you only force evaluation of the first one, the file will not be closed until the garbage collector is able to reclaim the handle. That is yet another reason to use either withBinaryFile or a streaming data library.
I'm reading FP and I have two basic questions:
FP says a function should take one input and give a single output. So what should I do with void methods? It doesn't return anything right?
FP says a function should have a single responsibility, then how do we handle log statements inside the method? That doesn't violate the rule?
Wish to know how they handle these things in Scala, Haskell.
Thanks in advance.
I'm assuming you're reading a book called "Functional Programming", although it would help to know who the author is as well. In any case, these questions are relatively easy to answer and I'll give my answers with respect to Haskell because I don't know Scala.
So what should I do with void methods? It doesn't return anything right?
There are no void methods in a pure functional language like Haskell. A pure function has no side effects, so a pure function without a return value is meaningless. Something like
f :: Int -> ()
f x = let y = x * x + 3 in ()
won't do any computation: y is never calculated, and every input you give returns the same value. However, if you have an impure function, such as one that writes a file or prints something to the screen, then it must exist in a monadic context. If you don't understand monads yet, don't worry. They take a bit to get used to, but they're a very powerful and useful abstraction that can make a lot of problems easier. IO is one such monad, and in Haskell it takes a type parameter to indicate the type of the value produced inside this context. So you can have something like
putStrLn :: String -> IO ()
Or
-- FYI: FilePath is an alias for String
writeFile :: FilePath -> String -> IO ()
These have side effects, denoted by the return type IO something, and when that something is () it means there is no meaningful result from the operation. In Python 3, for example, the print function returns None because there isn't anything meaningful to return after printing a value to the screen. An IO action can also produce a meaningful value, such as in readFile or getLine:
getLine :: IO String
readFile :: FilePath -> IO String
When writing your main function, you could do something like
main = do
    putStrLn "Enter a filename:"
    fname <- getLine -- fname has type String
    writeFile fname "This text will be in a file"
    contents <- readFile fname
    putStrLn "I wrote the following text to the file:"
    putStrLn contents
FP says a function should have a single responsibility, then how do we handle log statements inside the method? That doesn't violate the rule?
Most functions don't need logging inside them. I know that sounds weird, but it's true. In Haskell and most other functional languages, you'll write a lot of small, easily testable functions that each do one step. It's very common to have lots of 1 or 2 line functions in your application.
When you actually do need to do logging, say you're building a web server, there are a couple of different approaches you can take. There is a monad called Writer that lets you aggregate values as you perform operations. These operations don't have to be impure and do IO; they can be entirely pure. However, a true logging framework that one might use for a web server or large application would likely come with its own monad. This is so that you can set up logging to the screen, to files, network locations, email, and more. This monad will wrap the IO monad so that it can perform these side effects. A more advanced one would probably use more advanced tools like monad transformers or extensible effects. These let you "combine" different monads together so you can use utilities for both at the same time. You might see code like
type MyApp a = LogT IO a
-- log :: Monad m => LogLevel -> String -> LogT m ()
getConnection :: Socket -> MyApp Connection
getConnection sock = do
    log DEBUG "Waiting for next connection"
    conn <- liftIO $ acceptConnection sock
    log INFO $ "Accepted connection from IP: " ++ show (connectionIP conn)
    return conn
I'm not expecting you to understand this code fully, but I hope you can see that it has logging and network operations mixed together. The liftIO function is a common one with monad transformers that "transforms" an IO operation into a new monad that wraps IO.
This may sound pretty confusing, and it can be at first if you're used to languages like Python, Java, or C++. I certainly was! But now that I've gotten used to thinking about problems in this different way, I wish I had these constructs in OOP languages all the time.
I can answer from a Haskell perspective.
FP says a function should take one input and give a single output. So what should I do with void methods? It doesn't return anything right?
Because that's what functions actually are! In mathematics, every function takes some input and gives you some output. You cannot expect some output without giving any input. The void methods you see in other languages don't make sense in a mathematical way. In reality, void methods in other languages perform some kind of IO operation, which is abstracted as the IO monad in Haskell.
how do we handle log statements inside the method
You can use a monad transformer stack and lift your IO log operations into it. In fact, the Writer monad can do log operations purely, without any IO activity.
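As a rough illustration of pure logging, here is a small hand-rolled stand-in for the Writer monad that needs nothing beyond base. The names Logged, runLogged, logMsg, and squarePlusThree are invented for this sketch; in real code you would more likely use Writer from a library such as mtl.

```haskell
-- A hand-rolled stand-in for the Writer monad: a result plus a log.
newtype Logged a = Logged (a, [String])

runLogged :: Logged a -> (a, [String])
runLogged (Logged p) = p

instance Functor Logged where
  fmap f (Logged (a, w)) = Logged (f a, w)

instance Applicative Logged where
  pure a = Logged (a, [])
  Logged (f, w1) <*> Logged (a, w2) = Logged (f a, w1 ++ w2)

instance Monad Logged where
  Logged (a, w1) >>= k =
    let Logged (b, w2) = k a
    in Logged (b, w1 ++ w2)

-- Append one message to the log; no IO is involved.
logMsg :: String -> Logged ()
logMsg msg = Logged ((), [msg])

-- Hypothetical pure computation that records what it did.
squarePlusThree :: Int -> Logged Int
squarePlusThree x = do
  logMsg ("squaring " ++ show x)
  let y = x * x
  logMsg ("adding 3 to " ++ show y)
  return (y + 3)
```

Here runLogged (squarePlusThree 4) evaluates to (19, ["squaring 4", "adding 3 to 16"]). The function is still pure: the log is just part of its return value, and the caller decides whether to print it, write it to a file, or discard it.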
I can't get my database access to work with lwt. Should I include it in a thread? How? Or make a new thread which returns a 'a lwt value? If so, what do I do with that value?
The same goes for Printf.eprintf, which also seems to be blocked by lwt. So I use Lwt_io instead. But why would lwt block regular io?
What I have is a simple db request like Db.update session. It is within an Lwt_main.run main function. All this is within a CGI script (should not matter, database access works fine until I start with the lwt commands).
I can give you more code if needed.
Regards
Olle
Edit
let main sock env =
  (* code omitted *)
  Gamesession.update_game_session env#db game_session_connected;
  (* code omitted *)
Lwt_main.run (main sock_listen env)
Edit 2
This was the solution:
Lwt_preemptive.detach (fun () -> Db.call) ()
Printf.eprintf is not "blocked", it's just that the buffering parameters are changed and often messages do not display before the end of the program. You should try eprintf "something\n%!" (%! means "flush"), but yes it's better to use Lwt_io.
For the database, I don't know, you did not say which library you're using (at least the one called ocaml-mysql is not Lwt-friendly, so it may require using Lwt_preemptive).
Edit
Your:
Lwt_preemptive.detach (fun () -> Db.call) ()
This call creates a thread that, once executed, immediately returns the function Db.call itself without ever calling it. So, basically in that case Lwt_preemptive.detach does nothing :)
I don't know ocaml-mysql but if:
Db.call: connection_params -> connection_handle
you would have
let lwt_db_call connection_params =
  Lwt_preemptive.detach Db.call connection_params
I had made a daemon that used a very primitive form of ipc (telnet and send a String that had certain words in a certain order). I snapped out of it and am now using JSON to pass messages to a Yesod server. However, there were some things I really liked about my design, and I'm not sure what my choices are now.
Here's what I was doing:
buildManager :: Phase -> IO ()
buildManager phase = do
    let buildSeq = findSeq phase
        jid = JobID $ pack "8"
        config = MkConfig $ Just jid
    flip C.catch exceptionHandler $
        runReaderT (sequence_ $ buildSeq <*> stages) config
    -- ^^ I would really like to keep the above line of code, or something like it.
    return ()
Each function in buildSeq looked like this:
foo :: Stage -> ReaderT Config IO ()
data Config = MkConfig (Either JobID Product) BaseDir JobMap
JobMap is a TMVar Map that tracks information about current jobs.
So now what I have are Handlers that all look like this:
foo :: Handler RepJson
foo represents a command for my daemon, each handler may have to process a different JSON object.
What I would like to do is send one JSON object that represents success, and another JSON object that expresses information about some exception.
I would like foo's helper function to be able to return an Either, but I'm not sure how I get that, plus the ability to terminate evaluation of my list of actions, buildSeq.
Here's the only choice I see
1) Make sure exceptionHandler is in Handler. Put JobMap in the App record. Using getYesod, alter the appropriate value in JobMap to indicate details about the exception, which can then be accessed by foo.
Is there a better way?
What are my other choices?
Edit: For clarity, I will explain the role of Handler RepJson. The server needs some way to accept commands such as build, stop, and report. The client needs some way of knowing the results of these commands. I have chosen JSON as the medium with which the server and client communicate with each other. I'm using the Handler type just to manage the JSON in/out and nothing more.
Philosophically speaking, in the Haskell/Yesod world you want to pass the values forward, rather than return them backwards. So instead of having the handlers return a value, have them call forwards to the next step in the process, which may be to generate an exception.
Remember that you can bundle any amount of future actions into a single object, so you can pass a continuation object to your handlers and foos that basically tells them, "After you are done, run this blob of code." That way they can be void and return nothing.
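A minimal sketch of that "pass it forward" style, using only base, might look like the following. Everything here is hypothetical: Reply stands in for the two JSON objects, and runStage and runStages are names invented for the example rather than anything from Yesod.

```haskell
import Control.Exception (SomeException, try)

-- Hypothetical reply type standing in for the success and error JSON objects.
data Reply = Success String | Failure String
  deriving (Eq, Show)

-- Run one stage, then hand control forward: to onError if the stage threw,
-- otherwise to next. The stage itself returns nothing.
-- (Catching SomeException is a simplification for the sketch.)
runStage :: IO () -> (String -> IO r) -> IO r -> IO r
runStage stage onError next = do
  outcome <- try stage
  case outcome of
    Left e   -> onError (show (e :: SomeException))
    Right () -> next

-- Chain the stages; the first failure skips all remaining stages.
runStages :: [IO ()] -> IO Reply
runStages = foldr step (return (Success "all stages finished"))
  where
    step stage rest = runStage stage (return . Failure) rest
```

Each stage stays "void", the remaining work is passed along as the continuation, and a handler would only need to turn the final Reply into JSON.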
I'm writing a REST service in Erlang and need to verify the received data before passing it to other internal functions for further processing; in order to do that, I'm currently using nested case expressions like this:
case all_args_defined(Args) of
    true ->
        ActionSuccess = action(Args),
        case ActionSuccess of
            {ok, _} -> ...;
            {fail, Reason} -> {fail, Reason}
        end;
    _ ->
        {fail, "args not defined"}
end,
...
I realize this is kind of ugly, but this way I can provide detailed error messages. Additionally, I don't think the usual "let it crash" philosophy is applicable here: I don't want my REST service to crash and be restarted every time somebody throws invalid arguments at it.
However, I'm considering abandoning all those cases in favor of an umbrella try/catch block catching any badmatch errors - would this work?
F = fun() ->
    true = all_args_defined(Args),
    {ok, _} = action(Args)
end,
%% somewhere else
catch F().
Since what you want to achieve is error reporting, you should structure the thing around the execution of actions and reporting of the result. Perhaps something like this:
execute(Action, Args) ->
    try
        check_args(Args),
        Result = action(Action, Args),
        send_result(Result)
    catch
        throw:{fail, Reason} ->
            report_error(Reason);
        ExceptionClass:Term:Trace ->
            %% catch-all for all other unexpected exceptions
            %% (Class:Term:Trace needs OTP 21+; older code called
            %% erlang:get_stacktrace() inside the clause instead)
            report_error({crash, ExceptionClass, Term, Trace})
    end.
%% all of these throw {fail, Reason} if they detect something fishy
%% and otherwise they return some value as result (or just crash)
action(foo, [X1, X2]) -> ...;
action(foo, Args) -> throw({fail, {bad_arity, foo, 2, Args}});
action(...) -> ...
%% this handles the formatting of all possible errors
report_error({bad_arity, Action, Arity, Args}) ->
    send_error(io_lib:format("wrong number of arguments for ~w: "
                             "expected ~w, but got ~w",
                             [Action, Arity, length(Args)]));
report_error(...) -> ...;
report_error({crash, Class, Term, Trace}) ->
    send_error(io_lib:format("internal error: "
                             "~w:~w~nstacktrace:~n~p~n",
                             [Class, Term, Trace])).
I've had this problem while developing an application that creates users.
I first came up with a solution like this:
insert() ->
    try
        check_1(), % the check functions throw an exception on error.
        check_2(),
        check_3(),
        do_insert()
    catch
        throw:Error1 ->
            handle_error_1();
        throw:Error2 ->
            handle_error_2();
        _:Error ->
            internal_error()
    end.
The problem with this solution is that you lose the stack trace with the try...catch block.
Instead of this, a better solution is:
insert() ->
    case catch execute() of
        ok -> all_ok;
        {'EXIT', Error} ->  % must come before the generic 2-tuple clause
            internal_error(Error);
        {FuncName, Error} ->
            handle_error(FuncName, Error)
    end.
execute() ->
    check_1(), % the check functions throw an exception on error.
    check_2(),
    check_3(),
    do_insert().
This way you have the full error stack on Error.
I have faced exactly the same question when writing my own REST services.
Let's start with the philosophy:
I like to think of my applications like a box. On the inside of the box are all of the parts I built and have direct control over. If something breaks here, it's my fault, it should crash, and I should read about it in an error log. On the edge of the box are all of the connection points to the outside world - these are not to be trusted. I avoid exception handling in the inside parts and use it as needed for the outer edge.
On similar projects I have worked on:
I usually have about a dozen checks on the user input. If something looks bad, I log it and return an error to the user. Having a stack trace isn't particularly meaningful to me - if the user forgot a parameter there is nothing in my code to hunt down and fix. I'd rather see a text log that says something like: “at 17:35, user X accessed path Y but was missing parameter Z”.
I organize my checks into functions that return ok or {error, string()}. The main function just iterates over the checks and returns ok if they all pass, otherwise it returns the first error, which is then logged. Inside of my check functions I use exception handling as needed because I can't possibly consider all of the ways users can screw up.
As suggested by my colleagues, you can alternatively have each check throw an exception instead of using a tuple.
As for your implementation, I think your idea of using a single exception handler is a good one if you only have the single check. If you end up needing more checks you may want to implement something like I described so that you can have more specific logging.
One last question for the evening, I'm building the main input function of my Haskell program and I have to check for the args that are brought in
so I use
args <- getArgs
case length args of
    0 -> putStrLn "No Arguments, exiting"
    otherwise -> { other methods here}
Is there an intelligent way of setting up other methods, or is it in my best interest to write a function that the other case is thrown to within the main?
Or is there an even better solution to the issue of cases. I've just got to take in one name.
args <- getArgs
case length args of
    0 -> putStrLn "No Arguments, exiting"
    otherwise -> do
        other
        methods
        here
Argument processing should be isolated in a separate function.
Beyond that it's hard to generalize, because there are so many different ways of handling arguments.
Here are some type signatures that are worth considering:
exitIfNonempty :: [Arg] -> IO [Arg] -- return args unless empty
processOptions :: [Arg] -> (OptionRecord, [Arg]) -- convert options to record,
-- return remaining args
processOptionsBySideEffect :: [Arg] -> State OptionRecord [Arg] -- update state from options,
-- return remaining args
callFirstArgAsCommand :: [(Name, [Arg] -> IO ())] -> [Arg] -> IO ()
And a couple sketches of implementations (none of this code has been anywhere near a compiler):
exitIfNonempty [] = putStrLn "No arguments; exiting" >> exitFailure -- exitFailure comes from System.Exit
exitIfNonempty args = return args
callFirstArgAsCommand commands [] = fail "Missing command name"
callFirstArgAsCommand commands (f:as) =
    case lookup f commands of
        Just cmd -> cmd as
        Nothing -> fail (f ++ " is not the name of any command")
I'll leave the others to your imagination.
Is it in my best interest to write a function that the other case is thrown to within the main?
Yes. Moreover, you should build up a library of combinators that you can call on to process command-line arguments easily, for a variety of programs. Such libraries undoubtedly already exist on Hackage, but this is one of those cases where it may be easier to roll your own than to learn somebody else's API (and it will definitely be more fun).
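For the simple case in the question (exactly one name), a sketch of that separation could look like this. The Action type and the names parseArgs, runAction, and run are made up for illustration, and the greeting is just a placeholder for whatever the program really does with the name.

```haskell
import System.Environment (getArgs)
import System.Exit (die)

type Arg = String

-- What the program decided to do, kept separate from actually doing it.
data Action = Greet String | Usage String
  deriving (Eq, Show)

-- Pure argument processing: easy to test without touching IO.
parseArgs :: [Arg] -> Action
parseArgs []     = Usage "No arguments; exiting"
parseArgs [name] = Greet name
parseArgs _      = Usage "Expected exactly one name"

runAction :: Action -> IO ()
runAction (Greet name) = putStrLn ("Hello, " ++ name)
runAction (Usage msg)  = die msg  -- prints to stderr and exits with failure

-- Entry point wiring (named run here; a real program would call it main).
run :: IO ()
run = getArgs >>= runAction . parseArgs
```

Pattern matching on the list itself also avoids length args, which says nothing about what the arguments are.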
View Patterns might be helpful here.