Read An Input.md file and output a .html file Haskell - html

I had a question concerning some basic transformations in Haskell.
Basically, I have a written Input file, named Input.md. This contains some markdown text that is read in my project file, and I want to write a few functions to do transformations on the text. After completing these functions under a function called convertToHTML, I have output the file as an .html file in the correct format.
module Main
(
convertToHTML,
main
) where
import System.Environment (getArgs)
import System.IO
import Data.Char (toLower, toUpper)
process :: String -> String
process s = head $ lines s
convertToHTML :: String -> String
convertToHTML str = do
x <- str
if (x == '#')
then "<h1>"
else return x
--convertToHTML x = map toUpper x
main = do
args <- getArgs -- command line args
let (infile,outfile) = (\(x:y:ys)->(x,y)) args
putStrLn $ "Input file: " ++ infile
putStrLn $ "Output file: " ++ outfile
contents <- readFile infile
writeFile outfile $ convertToHTML contents
So,
How would I read through my input file, and transform any line that starts with a # to an html tag
How would I read through my input file once more and transform any WORD that is surrounded by _word_ (1 underscore) to another html tag
Replace any Character with an html string.
I tried using such functions such as Map, Filter, ZipWith, but could not figure out how to iterate through the text and transform each text. Please if anybody has any suggestions. I've been working on this for 2 days straight and have a bunch of failed code to show for a couple of weeks and have a bunch of failed code to show it.

I tried using such functions such as Map, Filter, ZipWith, but could not figure out how to iterate through the text and transform each text.
Because they work on appropriate element collection. And they don't really "iterate"; you simply have to feed the appropriate data. Let's tackle the # problem as an example.
Our file is one giant String, and what we'd like is to have it nicely split in lines, so [String]. What could do it for us? I have no idea, so let's just search Hoogle for String -> [String].
Ah, there we go, lines function! Its counterpart, unlines, is also going to be useful. Now we can write our line wrapper:
convertHeader :: String -> String
convertHeader [] = [] -- that prevents us from calling head on an empty line
convertHeader x = if head x == '#' then "<h1>" ++ x ++ "</h1>"
else x
and so:
convertHeaders :: String -> String
convertHeaders = unlines . map convertHeader . lines
-- ^String ^[String] ^[String] ^String
As you can see the function first converts the file to lines, maps convertHeader on each line, and the puts the file back together.
See it live on Ideone
Try now doing the same with words to replace your formatting patterns. As a bonus exercise, change convertHeader to count the number of # in front of the line and output <h1>, <h2>, <h3> and so on accordingly.

Related

Read a file in R with mixed character encodings

I'm trying to read tables into R from HTML pages that are mostly encoded in UTF-8 (and declare <meta charset="utf-8">) but have some strings in some other encodings (I think Windows-1252 or ISO 8859-1). Here's an example. I want everything decoded properly into an R data frame. XML::readHTMLTable takes an encoding argument but doesn't seem to allow one to try multiple encodings.
So, in R, how can I try several encodings for each line of the input file? In Python 3, I'd do something like:
with open('file', 'rb') as o:
for line in o:
try:
line = line.decode('UTF-8')
except UnicodeDecodeError:
line = line.decode('Windows-1252')
There do seem to be R library functions for guessing character encodings, like stringi::stri_enc_detect, but when possible, it's probably better to use the simpler determinstic method of trying a fixed set of encodings in order. It looks like the best way to do this is to take advantage of the fact that when iconv fails to convert a string, it returns NA.
linewise.decode = function(path)
sapply(readLines(path), USE.NAMES = F, function(line) {
if (validUTF8(line))
return(line)
l2 = iconv(line, "Windows-1252", "UTF-8")
if (!is.na(l2))
return(l2)
l2 = iconv(line, "Shift-JIS", "UTF-8")
if (!is.na(l2))
return(l2)
stop("Encoding not detected")
})
If you create a test file with
$ python3 -c 'with open("inptest", "wb") as o: o.write(b"This line is ASCII\n" + "This line is UTF-8: I like π\n".encode("UTF-8") + "This line is Windows-1252: Müller\n".encode("Windows-1252") + "This line is Shift-JIS: ハローワールド\n".encode("Shift-JIS"))'
then linewise.decode("inptest") indeed returns
[1] "This line is ASCII"
[2] "This line is UTF-8: I like π"
[3] "This line is Windows-1252: Müller"
[4] "This line is Shift-JIS: ハローワールド"
To use linewise.decode with XML::readHTMLTable, just say something like XML::readHTMLTable(linewise.decode("http://example.com")).

CSV Parsing Issue with Attoparsec

Here is my code that does CSV parsing, using the text and attoparsec
libraries:
import qualified Data.Attoparsec.Text as A
import qualified Data.Text as T
-- | Parse a field of a record.
field :: A.Parser T.Text -- ^ parser
field = fmap T.concat quoted <|> normal A.<?> "field"
where
normal = A.takeWhile (A.notInClass "\n\r,\"") A.<?> "normal field"
quoted = A.char '"' *> many between <* A.char '"' A.<?> "quoted field"
between = A.takeWhile1 (/= '"') <|> (A.string "\"\"" *> pure "\"")
-- | Parse a block of text into a CSV table.
comma :: T.Text -- ^ CSV text
-> Either String [[T.Text]] -- ^ error | table
comma text
| T.null text = Right []
| otherwise = A.parseOnly table text
where
table = A.sepBy1 record A.endOfLine A.<?> "table"
record = A.sepBy1 field (A.char ',') A.<?> "record"
This works well for a variety of inputs but is not working in case that there
is a trailing \n at the end of the input.
Current behaviour:
> comma "hello\nworld"
Right [["hello"],["world"]]
> comma "hello\nworld\n"
Right [["hello"],["world"],[""]]
Wanted behaviour:
> comma "hello\nworld"
Right [["hello"],["world"]]
> comma "hello\nworld\n"
Right [["hello"],["world"]]
I have been trying to fix this issue but I ran out of idaes. I am almost
certain that it will have to be something with A.endOfInput as that is the
significant anchor and the only "bonus" information we have. Any ideas on how
to work that into the code?
One possible idea is to look at the end of the string before running the
Attoparsec parser and removing the last character (or two in case of \r\n)
but that seems to be a hacky solution that I would like avoid in my code.
Full code of the library can be found here: https://github.com/lovasko/comma

Julia - Rewriting a CSV

Complete Julia newbie here.
I'd like to do some processing on a CSV. Something along the lines of:
using CSV
in_file = CSV.Source('/dir/in.csv')
out_file = CSV.Sink('/dir/out.csv')
for line in CSV.eachline(in_file)
replace!(line, "None", "")
CSV.writeline(out_file, line)
end
This is in pseudocode, those aren't existing functions.
Idiomatically, should I iterate on 1:CSV.countlines(in_file)? Do a while and check something?
If all you want to do is replace a string in the line, you do not need any CSV parsing utilities. All you do is read the file line by line, replace, and write. So:
infile = "/path/to/input.csv"
outfile = "/path/to/output.csv"
out = open(outfile, "w+")
for line in readlines(infile)
newline = replace(line, "a", "b")
write(out, newline)
end
close(out)
This will replicate the pseudocode you have in your question.
If you need to parse and read the csv field by field, use the readcsv function in base.
data=readcsv(infile)
typeof(data) #Array{Any,2}
This will return the data in the file as a 2 dimensional array. You can process this data any way you want, and write it back using the writecsv function.
for i in 1:size(data,1) #iterate by rows
data[i, 1] = "This is " * data[i, 1] # Add text to first column
end
writecsv(outfile, data)
Documentation for these functions:
http://docs.julialang.org/en/release-0.5/stdlib/io-network/?highlight=readcsv#Base.readcsv
http://docs.julialang.org/en/release-0.5/stdlib/io-network/?highlight=readcsv#Base.writecsv

octave/matlab read text file line by line and save only numbers into matrix

I have a question regarding octave or matlab data post processing.
I have files exported from fluent like below:
"Surface Integral Report"
Mass-Weighted Average
Static Temperature (k)
crossplane-x-0.001 1242.9402
crossplane-x-0.025 1243.0017
crossplane-x-0.050 1243.2036
crossplane-x-0.075 1243.5321
crossplane-x-0.100 1243.9176
And I want to use octave/matlab for post processing.
If I read first line by line, and save only the lines with "crossplane-x-" into a new file, or directly save the data in those lines into a matrix. Since I have many similar files, I can make plots by just calling their titles.
But I go trouble on identify lines which contain the char "crossplane-x-". I am trying to do things like this:
clear, clean, clc;
% open a file and read line by line
fid = fopen ("h20H22_alongHGpath_temp.dat");
% save full lines into a new file if only chars inside
txtread = fgetl (fid)
num_of_lines = fskipl(fid, Inf);
char = 'crossplane-x-'
for i=1:num_of_lines,
if char in fgetl(fid)
[x, nx] = fscanf(fid);
print x
endif
endfor
fclose (fid);
Would anybody shed some light on this issue ? Am I using the right function ? Thank you.
Here's a quick way for your specific file:
>> S = fileread("myfile.dat"); % collect file contents into string
>> C = strsplit(S, "crossplane-x-"); % first cell is the header, rest is data
>> M = str2num (strcat (C{2:end})) % concatenate datastrings, convert to numbers
M =
1.0000e-03 1.2429e+03
2.5000e-02 1.2430e+03
5.0000e-02 1.2432e+03
7.5000e-02 1.2435e+03
1.0000e-01 1.2439e+03

Haskell print the first line into Browser Tab [duplicate]

This question already has an answer here:
Return the first line of a String in Haskell
(1 answer)
Closed 8 years ago.
Just a simple question, my code is complete. It takes an input file, breaks it into lines, reads the file line by line, does the conversions, which is in this case, turns certain things into HTML format (ex: #This is a line into a line with H1 HTML tags, formatting it into a header). The only thing I have left is to take the First line of code, and print that code into the browser tab. Also, the body, or tail must be printed into the window, not the tab. So the first line of my .txt file is The Title! which I want to show in the tab of the web browser. Here is something I have for that:
formatToHTML :: String -> String
formatToHTML [] = []
formatToHTML x
| head x == --any char = "<title>" ++ head ++ "</title>"
| tail x == --rest of file = "<body>" ++ tail ++ "</tail>"
| otherwise = null
or
formatToHTML :: [String] -> String
formatToHTML = unlines. map (show) "<title>" ++ head ++ </title>" $ lines
I dont want to, or I think even need to use guards here, but I cant think of a shorter way to do my task.
I would call this from my main method before I output my file to html.
Also, I know its a amateur haskell question. but how would I represent any char. Say, I want to say, if the head of x exists, print the head with the title tags. print tail with body tags. Help? Thank You
My guess of what you want is:
formatHtml :: [String] -> String
formatHtml [] = ""
formatHtml (x:xs) = unlines theLines
where theLines = [ "<title>" ++ ...convert x to html... ++ "</title>",
"<body>" ] ++ map toHtml xs ++ [ "</body>" ]
toHtml :: String -> String
toHmtl str = ...converts str to HTML...
Example:
formatHtml [ "the title", "body line 1", "body line2" ]
results in:
<title>the title</title>
<body>
body line 1
body line 2
</body>
You still have to define the toHtml function and decide how to convert the first line to the inner html of the tag.