haskell word searching program development - function

hello I am making some word searching program
for example
when "text.txt" file contains "foo foos foor fo.. foo fool"
and search "foo"
then only number 2 printed
and search again and again
but I am haskell beginner
my code is here
:module +Text.Regex.Posix
putStrLn "type text file"
filepath <- getLine
data <- readFile filepath
--1. this makes <interactive>:1:1: parse error on input `data' how to fix it?
parsedData =~ "[^- \".,\n]+" :: [[String]]
--2. I want to make function and call it again and again
searchingFunc = do putStrLn "search for ..."
search <- getLine
result <- map (\each -> if each == search then count = count + 1) data
putStrLn result
searchingFunc
}
sorry for very very poor code
my development environment is Windows XP SP3 WinGhci 1.0.2
I started the haskell several hours ago sorry
thank you very much for reading!
edit: here's original scheme code
thanks!
#lang scheme/gui
(define count 0)
(define (search str)
(set! count 0)
(map (λ (each) (when (equal? str each) (set! count (+ count 1)))) data)
(send msg set-label (format "~a Found" count)))
(define path (get-file))
(define port (open-input-file path))
(define data '())
(define (loop [line (read-line port)])
(when (not (eof-object? line))
(set! data (append data
(regexp-match* #rx"[^- \".,\n]+" line)))
(loop)))
(loop)
(define (cb-txt t e) (search (send t get-value)))
(define f (new frame% (label "text search") (min-width 300)))
(define txt (new text-field% (label "type here to search") (parent f) (callback (λ (t e) (cb-txt t e)))))
(define msg (new message% (label "0Found ") (parent f)))
(send f show #t)

I should start by iterating what everyone would (and should) say: Start with a book like Real World Haskell! That said, I'll post a quick walkthrough of code that compiles, and hopefully does something close to what you originally intended. Comments are inline, and hopefully should illustrate some of the shortcomings of your approach.
import Text.Regex.Posix
-- Let's start by wrapping your first attempt into a 'Monadic Action'
-- IO is a monad, and hence we can sequence 'actions' (read as: functions)
-- together using do-notation.
attemptOne :: IO [[String]]
-- ^ type declaration of the function 'attemptOne'
-- read as: function returning value having type 'IO [[String]]'
attemptOne = do
putStrLn "type text file"
filePath <- getLine
fileData <- readFile filePath
putStrLn fileData
let parsed = fileData =~ "[^- \".,\n]+" :: [[String]]
-- ^ this form of let syntax allows us to declare that
-- 'wherever there is a use of the left-hand-side, we can
-- substitute it for the right-hand-side and get equivalent
-- results.
putStrLn ("The data after running the regex: " ++ concatMap concat parsed)
return parsed
-- ^ return is a monadic action that 'lifts' a value
-- into the encapsulating monad (in this case, the 'IO' Monad).
-- Here we show that given a search term (a String), and a body of text to
-- search in, we can return the frequency of occurrence of the term within the
-- text.
searchingFunc :: String -> [String] -> Int
searchingFunc term
= length . filter predicate
where
predicate = (==)term
-- ^ we use function composition (.) to create a new function from two
-- existing ones:
-- filter (drop any elements of a list that don't satisfy
-- our predicate)
-- length: return the size of the list
-- Here we build a wrapper-function that allows us to run our 'pure'
-- searchingFunc on an input of the form returned by 'attemptOne'.
runSearchingFunc :: String -> [[String]] -> [Int]
runSearchingFunc term parsedData
= map (searchingFunc term) parsedData
-- Here's an example of piecing everything together with IO actions
main :: IO ()
main = do
results <- attemptOne
-- ^ run our attemptOne function (representing IO actions)
-- and save the result
let searchResults = runSearchingFunc "foo" results
-- ^ us a 'let' binding to state that searchResults is
-- equivalent to running 'runSearchingFunc'
print searchResults
-- ^ run the IO action that prints searchResults
print (runSearchingFunc "foo" results)
-- ^ run the IO action that prints the 'definition'
-- of 'searchResults'; i.e. the above two IO actions
-- are equivalent.
return ()
-- as before, lift a value into the encapsulating Monad;
-- this time, we're lifting a value corresponding to 'null/void'.
To load this code, save it into a .hs file (I saved it into 'temp.hs'), and run the following from ghci. Note: the file 'f' contains a few input words:
*Main Text.Regex.Posix> :l temp.hs
[1 of 1] Compiling Main ( temp.hs, interpreted )
Ok, modules loaded: Main.
*Main Text.Regex.Posix> main
type text file
f
foo foos foor fo foo foo
The data after running the regex: foofoosfoorfofoofoo
[1,0,0,0,1,1]
[1,0,0,0,1,1]
There is a lot going on here, from do notation to Monadic actions, 'let' bindings to the distinction between pure and impure functions/values. I can't stress the value of learning the fundamentals from a good book!

Here is what I made of it. It doesn't does any error checking and is as basic as possible.
import Text.Regex.Posix ((=~))
import Control.Monad (when)
import Text.Printf (printf)
-- Calculates the number of matching words
matchWord :: String -> String -> Int
matchWord file word = length . filter (== word) . concat $ file =~ "[^- \".,\n]+"
getInputFile :: IO String
getInputFile = do putStrLn "Enter the file to search through:"
path <- getLine
readFile path -- Attention! No error checking here
repl :: String -> IO ()
repl file = do putStrLn "Enter word to search for (empty for exit):"
word <- getLine
when (word /= "") $
do print $ matchWord file word
repl file
main :: IO ()
main = do file <- getInputFile
repl file

Please start step by step. IO in Haskell is hard, so you shouldn't start with file manipulation. I would suggest to write a function that works properly on a given String. That way you can learn about syntax, pattern matching, list manipulation (maps, folds) and recursion without beeing distracted by the do notation (which kinda looks imperative, but isn't, and really needs a deeper understanding).
You should check out Learn you a Haskell or Real World Haskell to get a sound foundation. What you do now is just stumbling in the dark - which may work if you learn languages that are similar to the ones you know, but definitely not for Haskell.

Related

Get Column in Haskell CSV and infer the column type

I'm exploring a csv file in an interactive ghci session (in a jupyter notebook):
import Text.CSV
import Data.List
import Data.Maybe
dat <- parseCSVFromFile "/home/user/data.csv"
headers = head dat
records = tail dat
-- define a way to get a particular row by index
indexRow :: [[Field]] -> Int -> [Field]
indexRow csv index = csv !! index
indexRow records 1
-- this works!
-- Now, define a way to get a particular column by index
indexField :: [[Field]] -> Int -> [Field]
indexField records index = map (\x -> x !! index) records
While this works if I know in advance the type of column 3:
map (\x -> read x :: Double) $ indexField records 3
How can I ask read to infer what the type might be when for example my columns could contain strings or num? I'd like it to try for me, but:
map read $ indexField records 3
fails with
Prelude.read: no parse
I don't care whether they are string or num, I just need that they are all the same and I am failing to find a way to specify that generally with the read function at least.
Weirdly, if I define a mean function like so:
mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean [x] = Just x
mean xs = Just (sum(xs) / (fromIntegral (length xs)))
This works:
mean $ map read $ indexField records 2
Just 13.501359655240003
But without the mean, this still fails:
map read $ indexField records 2
Prelude.read: no parse
Unfortunately, read is at the end of its wits when it comes to situations like this. Let's revisit read:
read :: Read a => String -> a
As you can see, a doesn't depend on the input, but solely on the output, and therefore of the context of our function. If you use read a + read b, then the additional Num context will limit the types to Integer or Double due to default rules. Let's see it in action:
> :set +t
> read "1234"
*** Exception: Prelude.read: no parse
> read "1234" + read "1234"
2468
it :: (Num a, Read a) => a
Ok, a is still not helpful. Is there any type that we can read without additional context? Sure, unit:
> read "()"
()
it :: Read a => a
That's still not helpful at all, so let's enable the monomorphism restriction:
> :set -XMonomorphismRestriction
> read "1234" + read "1234"
2468
it :: Integer
Aha. In the end, we had an Integer. Due to +, we had to decide on a type. Now, with the MonomorphismRestriction enabled, what happens on read "1234" without additional context?
> read "1234"
<interactive>:20:1
No instance for (Read a0) arising from a use of 'read'
The type variable 'a0' is ambiguous
Now GHCi doesn't pick any (default) type and forces you to chose one. Which makes the underlying error much more clear.
So how do we fix this? As CSV can contain arbitrary fields at run-time and all types are determined statically, we have to cheat by introducing something like
data CSVField = CSVString String | CSVNumber Double | CSVUnknown
and then write
parse :: Field -> CSVField
After all, our type needs to cover all possible fields.
However, in your case, we can just restrict read's type:
myRead :: String -> Double
myRead = read
But that's not wise, as we can still end up with errors if the column doesn't contain Doubles to begin with. So instead, let's use readMaybe and mapM:
columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe
That way, the type is fixed, and we're forced to check whether we have Just something or Nothing:
mean <$> columnAsNumbers (indexFields records 2)
If you find yourself often using columnAsNumbers create an operator, though:
(!!$) :: [[Field]] -> Maybe [Double]
records !!$ index = columnAsNumbers $ indexFields records index

Haskell Errors Binding a CSV File to a Handle

So I was writing a small toybox of utility functions for CSV files and throughout testing the functions I'd been binding the file by hand with
table' <- parseCSVFromFile filepath
but (from Text.CSV)
parseCSVFromFile :: FilePath -> IO (Either parsec-3.1.9:Text.Parsec.Error.ParseError CSV)
and so I had to write a quick line to strip off this either error csv crap
stripBare csv = head $ rights $ flip (:) [] csv
and redefine it as table = stripBare table'. After this, list functions work on the csv files contents and life goes on
(Digression: Surprisingly there isn't a direct Either a b -> b function in Data.Either. So I used Data.Either.rights :: [Either a b] -> [b])
I wanted to do the work of undressing the csv type and binding it to a handle in one shot. Something like
table = stripBare $ table' <- parseCSVFromFile filepath
but this gives a parse error on (<-) saying I may be missing a do... then
table = stripBare $ do table' <- parseCSVFromFile filepath
yells at me saying that the last statement in a do block must be an expression.
What am I doing wrong?
As a separate curiosity, I saw here that
do notation in Haskell desugars in a pretty simple way.
do
x <- foo
e1
e2
...
turns into
foo >>= \x ->
do
e1
e2
I find this attractive and tried the following line which gave me a type error
*Toy> :type (parseCSVFromFile "~/.csv") >>= \x -> x
<interactive>:1:52: error:
* Couldn't match type `Either
parsec-3.1.9:Text.Parsec.Error.ParseError'
with `IO'
Expected type: IO CSV
Actual type: Either parsec-3.1.9:Text.Parsec.Error.ParseError CSV
* In the expression: x
In the second argument of `(>>=)', namely `\ x -> x'
In the expression:
(parseCSVFromFile "~/.csv") >>= \ x -> x
Code such as
head $ rights $ flip (:) [] csv
is dangerous. You are exploiting the partiality of head to hide the fact that csv might be a Left something. We can rewrite it as
= head $ rights $ (:) csv []
= head $ rights [csv]
= case csv of
Left _ -> error "Left found!"
Right x -> x
Usually it's better to handle the Left case directly in the do IO block. Something like: (pseudo code follows)
foo :: String -> IO ()
foo filepath = do
table' <- parseCSVFromFile filepath
case table' of
Left err -> do
putStrLn "Error in parsing CSV"
print err
moreErrorHandlingHere
Right table -> do
putStrLn "CSV loaded!"
use table

Haskell - Pass arbitrary function and argument list as arguments to another function

My main goal is to redirect stderr to a file.
I got hold of the following code snippet...
catchOutput :: IO a -> IO (res, String)
catchOutput f = do
tmpd <- getTemporaryDirectory
(tmpf, tmph) <- openTempFile tmpd "haskell_stderr"
stderr_dup <- hDuplicate stderr
hDuplicateTo tmph stderr
hClose tmph
res <- f
hDuplicateTo stderr_dup stderr
str <- readFile tmpf
removeFile tmpf
return (res, str)
I hoped to make this more general and pass any function and argument list to catchOutput and get the function result as well as message written to stderr (if any).
I thought that an argument list of type [Data.Dynamic] might work but I failed to retrieve the function result with
res <- Data.List.foldl (f . fromDyn) Nothing $ args
Is this even possible? Help will be greatly appreciated.
There is not reason to use Data.Dynamic. You already know type the return type of f, it's a so you can use just that, i.e.
catchOutput :: IO a -> IO (a, String)
Note though, that there some significant issues with your approach:
By redirecting stderr to a file, this will also affect all other concurrent threads. So you could possibly get unrelated data sent to the temporary file.
If an exception is thrown while stderr is redirected, the original stderr will not be restored. Any operation between the two hDuplicateTo lines (hClose and f in this case) could possibly throw an exception, or the thread may receive an asynchronous exception. For this reason, you have to use something like bracket to make your code exception safe.

Interaction between optimizations and testing for error calls

I have a function in a module that looks something like this:
module MyLibrary (throwIfNegative) where
throwIfNegative :: Integral i => i -> String
throwIfNegative n | n < 0 = error "negative"
| otherwise = "no worries"
I could of course return Maybe String or some other variant, but I think it's fair to say that it's a programmer error to call this function with a negative number so using error is justified here.
Now, since I like having my test coverage at 100% I want to have a test case that checks this behavior. I have tried this
import Control.Exception
import Test.HUnit
import MyLibrary
case_negative =
handleJust errorCalls (const $ return ()) $ do
evaluate $ throwIfNegative (-1)
assertFailure "must throw when given a negative number"
where errorCalls (ErrorCall _) = Just ()
main = runTestTT $ TestCase case_negative
and it sort of works, but it fails when compiling with optimizations:
$ ghc --make -O Test.hs
$ ./Test
### Failure:
must throw when given a negative number
Cases: 1 Tried: 1 Errors: 0 Failures: 1
I'm not sure what's happening here. It seems like despite my use of evaluate, the function does not get evaluated. Also, it works again if I do any of these steps:
Remove HUnit and call the code directly
Move throwIfNegative to the same module as the test case
Remove the type signature of throwIfNegative
I assume this is because it causes the optimizations to be applied differently. Any pointers?
Optimizations, strictness, and imprecise exceptions can be a bit tricky.
The easiest way to reproduce this problem above is with a NOINLINE on throwIfNegative (the function isn't being inlined across module boundaries either):
import Control.Exception
import Test.HUnit
throwIfNegative :: Int -> String
throwIfNegative n | n < 0 = error "negative"
| otherwise = "no worries"
{-# NOINLINE throwIfNegative #-}
case_negative =
handleJust errorCalls (const $ return ()) $ do
evaluate $ throwIfNegative (-1)
assertFailure "must throw when given a negative number"
where errorCalls (ErrorCall _) = Just ()
main = runTestTT $ TestCase case_negative
Reading the core, with optimizations on, the GHC inlines evaluate properly (?):
catch#
# ()
# SomeException
(\ _ ->
case throwIfNegative (I# (-1)) of _ -> ...
and then floats out the call to throwIfError, outside of the case scrutinizer:
lvl_sJb :: String
lvl_sJb = throwIfNegative lvl_sJc
lvl_sJc = I# (-1)
throwIfNegative =
\ (n_adO :: Int) ->
case n_adO of _ { I# x_aBb ->
case <# x_aBb 0 of _ {
False -> lvl_sCw; True -> error lvl_sCy
and strangely, at this point, no other code now calls lvl_sJb, so the entire test becomes dead code, and is stripped out -- GHC has determined that it is unused!
Using seq instead of evaluate is happy enough:
case_negative =
handleJust errorCalls (const $ return ()) $ do
throwIfNegative (-1) `seq` assertFailure "must throw when given a negative number"
where errorCalls (ErrorCall _) = Just ()
or a bang pattern:
case_negative =
handleJust errorCalls (const $ return ()) $ do
let !x = throwIfNegative (-1)
assertFailure "must throw when given a negative number"
where errorCalls (ErrorCall _) = Just ()
so I think we should look at the semantics of evaluate:
-- | Forces its argument to be evaluated to weak head normal form when
-- the resultant 'IO' action is executed. It can be used to order
-- evaluation with respect to other 'IO' operations; its semantics are
-- given by
--
-- > evaluate x `seq` y ==> y
-- > evaluate x `catch` f ==> (return $! x) `catch` f
-- > evaluate x >>= f ==> (return $! x) >>= f
--
-- /Note:/ the first equation implies that #(evaluate x)# is /not/ the
-- same as #(return $! x)#. A correct definition is
--
-- > evaluate x = (return $! x) >>= return
--
evaluate :: a -> IO a
evaluate a = IO $ \s -> let !va = a in (# s, va #) -- NB. see #2273
That #2273 bug is a pretty interesting read.
I think GHC is doing something suspicious here, and recommend not using evalaute (instead, use seq directly). This needs more thinking about what GHC is doing with the strictness.
I've filed a bug report to help get a determination from GHC HQ.

Why pattern matching does not throw exception in Maybe monad

My question is simple. Why wrong pattern matching does not throw exception in Maybe monad. For clarity :
data Task = HTTPTask {
getParams :: [B.ByteString],
postParams :: [B.ByteString],
rawPostData :: B.ByteString
} deriving (Show)
tryConstuctHTTPTask :: B.ByteString -> Maybe Task
tryConstuctHTTPTask str = do
case decode str of
Left _ -> fail ""
Right (Object trie) -> do
Object getP <- DT.lookup (pack "getParams") trie
Object postP <- DT.lookup (pack "postParams") trie
String rawData <- DT.lookup (pack "rawPostData") trie
return $ HTTPTask [] [] rawData
Look at tryConstuctHTTPTask function. I think that when the pattern does not match (for example "Object getP") we must get something like "Prelude.Exception", instead i get the "Nothing". I like this behavior but i am not understand why.
Thanks.
Doing pattern <- expression in a do-block, will call fail when the pattern does not match. So it is equivalent to doing
expression >>= \x ->
case x of
pattern -> ...
_ -> fail
Since fail is defined as Nothing in the Maybe monad, you get Nothing for failed pattern matches using <-.