Write to file from string and array - csv

I'm trying to write a csv file where some of the values come from arrays.
let handle = "thetitle"
let title = "The Title"
let body = "ipsum lorem"
let mutable variantPrice = [|1000,2000,3000,4000|]
let mutable variantComparePrice = [|2000,4000,6000,8000|]
let mutable storlek = ["50x50","50x60","50x70","50x80"]
let Header = [|
(handle, title, body,variantPrice, variantComparePrice, storlek)
|]
let lines = Header |> Array.map (fun (h, t, vp,vcp,b,s) -> sprintf "Handle\tTitle\tStorlek\tVariantPrice\tVariantComparePrice\tBody\n %s\t%s\t%s\t%s"h t s vp vcp b)
File.WriteAllLines( "data\\test.csv", lines, Encoding.UTF8)
But the problem is that the expression in lines expects a string, but I'm passing in a string[].
Ideally, the csv file would look something like this:
|handle|title|body|variantPrice|variantComparePrice|storlek|
|thetitle|The Title|ipsum lorem|1000|2000|50x50|
|thetitle| | |2000|4000|50x60|
|thetitle| | |3000|6000|50x70|
|thetitle| | |4000|8000|50x80|

The first issue is that your variables storing data like variantPrice are currently arrays containing just a single element, which is a tuple - this is because you've separated elements using , rather than ;. Most likely, you'll want something like:
let variantPrice = [|1000;2000;3000;4000|]
let variantComparePrice = [|2000;4000;6000;8000|]
let storlek = [|"50x50";"50x60";"50x70";"50x80"|]
With this, you can then use Array.zip3 to get a single array with all the data (one item per row).
let data = Array.zip3 variantPrice variantComparePrice storlek
Now you can use Array.map to format the individual lines. The following is my guess based on your sample:
let lines = data |> Array.map (fun (vp, vcp, s) ->
    sprintf "|%s| | |%d|%d|%s|" handle vp vcp s)
This is an array of lines represented as strings. Finally, you can append the header to the lines and write this to a file:
let header = "|handle|title|body|variantPrice|variantComparePrice|storlek|"
System.IO.File.WriteAllLines("c:/temp/test.csv",
    Array.append [| header |] lines, System.Text.Encoding.UTF8)
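Putting the pieces together, the whole answer reads as one small script (a sketch; the output file name is illustrative):

```fsharp
open System.IO
open System.Text

let handle = "thetitle"

// Corrected arrays: elements separated by ';', not ','
let variantPrice = [|1000; 2000; 3000; 4000|]
let variantComparePrice = [|2000; 4000; 6000; 8000|]
let storlek = [|"50x50"; "50x60"; "50x70"; "50x80"|]

// One tuple per output row
let data = Array.zip3 variantPrice variantComparePrice storlek

// Format each row; the title and body columns stay blank as in the sample
let lines =
    data |> Array.map (fun (vp, vcp, s) ->
        sprintf "|%s| | |%d|%d|%s|" handle vp vcp s)

let header = "|handle|title|body|variantPrice|variantComparePrice|storlek|"
File.WriteAllLines("test.csv", Array.append [| header |] lines, Encoding.UTF8)
```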

Related

How to decode a heterogenous array with remaining values as a list

I want to decode a JSON string like the one below
"[[\"aaa\",1,2,3,4],[\"bbb\",1,2,3]]"
and decode it to an Elm tuple list:
[("aaa",[1,2,3,4]),("bbb",[1,2,3])] : List (String, List Int)
How can I decode it?
jsdecode =
    index 0 string
        |> andThen xxxxxxx??
This isn't straightforward to do, but before I jump straight into how to do it, let me collect a series of thoughts about the data we are trying to decode:
We are decoding a list of lists
Each list should be composed of a starting string and a series of values
But there might actually be an empty list, no initial string but some values, or an initial string and no values
So in my mind the difficulty of building the right decoder reflects the complexity of handling all these edge cases. But let's start by defining the data we would like to have:
type alias Record =
    ( String, List Int )

type alias Model =
    List Record

jsonString : String
jsonString =
    "[[\"aaa\",1,2,3,4],[\"bbb\",1,2,3]]"

decoder : Decoder Model
decoder =
    Decode.list recordDecoder
Now we need to define a type that represents that the list could contain either strings or ints
type EntryFlags
    = EntryId String
    | EntryValue Int

type alias RecordFlags =
    List EntryFlags
And now for our decoder
recordDecoder : Decoder Record
recordDecoder =
    Decode.list
        (Decode.oneOf
            [ Decode.map EntryId Decode.string
            , Decode.map EntryValue Decode.int
            ]
        )
        |> Decode.andThen buildRecord
So buildRecord takes this list of EntryId String or EntryValue Int and builds the record we are looking for.
buildRecord : List EntryFlags -> Decoder Record
buildRecord list =
    case list of
        [] ->
            Decode.fail "No values were passed"

        [ x ] ->
            Decode.fail "Only key passed, but no values"

        x :: xs ->
            case buildRecordFromFlags x xs of
                Nothing ->
                    Decode.fail "Could not build record"

                Just value ->
                    Decode.succeed value
As you can see, we are dealing with a lot of edge cases in our decoder. Now for the last bit let's check out buildRecordFromFlags:
buildRecordFromFlags : EntryFlags -> List EntryFlags -> Maybe Record
buildRecordFromFlags idEntry valueEntries =
    let
        maybeId =
            case idEntry of
                EntryId value ->
                    Just value

                _ ->
                    Nothing

        maybeEntries =
            List.map
                (\valueEntry ->
                    case valueEntry of
                        EntryValue value ->
                            Just value

                        _ ->
                            Nothing
                )
                valueEntries
                |> Maybe.Extra.combine
    in
    case ( maybeId, maybeEntries ) of
        ( Just id, Just entries ) ->
            Just ( id, entries )

        _ ->
            Nothing
In this last bit, we are using a function from maybe-extra to verify that all the values following the initial EntryId are indeed all of the EntryValue type.
You can check out a working example here: https://ellie-app.com/3SwvFPjmKYFa1
There are two subproblems here: 1. decoding the list, and 2. transforming it to the shape you need. You could do it as @SimonH suggests by decoding to a list of JSON values, post-processing it and then (or during the post-processing) decoding the inner values. I would instead prefer to decode it fully into a custom type first, and then do the post-processing entirely in the realm of Elm types.
So, step 1, decoding:
type JsonListValue
    = String String
    | Int Int

decodeListValue : Decode.Decoder JsonListValue
decodeListValue =
    Decode.oneOf
        [ Decode.string |> Decode.map String
        , Decode.int |> Decode.map Int
        ]

decoder : Decode.Decoder (List (List JsonListValue))
decoder =
    Decode.list (Decode.list decodeListValue)
This is a basic pattern you can use to decode any heterogenous array. Just use oneOf to try a list of decoders in order, and map each decoded value to a common type, typically a custom type with a simple constructor for each type of value.
Then onto step 2, the transformation:
extractInts : List JsonListValue -> List Int
extractInts list =
    list
        |> List.foldr
            (\item acc ->
                case item of
                    Int n ->
                        n :: acc

                    _ ->
                        acc
            )
            []

postProcess : List JsonListValue -> Result String ( String, List Int )
postProcess list =
    case list of
        (String first) :: rest ->
            Ok ( first, extractInts rest )

        _ ->
            Err "first item is not a string"
postProcess will match the first item to a String, run extractInts on the rest, which should all be Ints, then put them together into the tuple you want. If the first item is not a String it will return an error.
extractInts folds over each item and adds it to the list if it is an Int and ignores it otherwise. Note that it does not return an error if an item is not an Int, it just doesn't include it.
Both of these functions could have been written to either fail if the values don't conform to the expectations, like postProcess, or to handle it "gracefully", like extractInts. I chose to do one of each just to illustrate how you might do both.
Then, step 3, is to put it together:
Decode.decodeString decoder json
    |> Result.mapError Decode.errorToString
    |> Result.andThen
        (List.map postProcess >> Result.Extra.combine)
Here Result.mapError is used to make the error from decoding conform to the error type we get from postProcess. Result.Extra.combine is a function from elm-community/result-extra which turns a List of Results into a Result of a List, which comes in very handy here.

F# merge CSV files with different columns

I'm fairly new to F# but I'm fascinated by it and want to apply it to some applications. Currently, I have multiple csv files, each just a timestamp and some sensors' values; the timestamps are unique but the columns differ between files.
For example I have two csv file
csv1:
timestamp, sensor1
time1, 1.0
csv2:
timestamp, sensor1, sensor2
time2, 2.0, 3.0
The result I want is
timestamp, sensor1, sensor2
time1, 1.0,
time2, 2.0, 3.0
I wonder if there is an easy way to do this in F#. Thanks
UPDATE 1:
Here is my current solution, which involves using LumenWorks.Framework.IO.Csv (https://www.nuget.org/packages/LumenWorksCsvReader) to parse each csv into a Data.DataTable, and Deedle (https://www.nuget.org/packages/Deedle) to convert the Data.DataTable to a Frame and use its SaveCsv method to save to a csv file.
open System.IO
open System
open LumenWorks.Framework.IO.Csv
open Deedle
// get list of csv files
let filelist = expression_to_get_list_of_csv_file_path
// func to readCsv from path and return Data.DataTable
let funcReadCSVtoDataTable (path:string) =
    use csv = new CachedCsvReader(new StreamReader(path), true)
    let tmpdata = new Data.DataTable()
    tmpdata.Load(csv)
    tmpdata

// map list of file paths to get list of datatables
let allTables = List.map funcReadCSVtoDataTable filelist

// create allData table to iterate over the list
let allData = new Data.DataTable()
List.iter (fun (x:Data.DataTable) -> allData.Merge(x)) allTables

// convert datatable to Deedle Frame and save to csv file
let df = Frame.ReadReader (allData.CreateDataReader())
df.SaveCsv("./final_csv.csv")
The reason for using LumenWorks.Framework.IO.Csv is because I need to parse a few thousands of files at the same time, and according to this article (https://www.codeproject.com/Articles/11698/A-Portable-and-Efficient-Generic-Parser-for-Flat-F) LumenWorks.Framework.IO.Csv is the fastest.
UPDATE 2: FINAL SOLUTION
Thanks to Tomas for the row-key mapping solution (see his answer below); I adapted his code for the case of a list of files
// get list of csv files
let filelist = expression_to_get_list_of_csv_file_path
// function to merge two Frames
let domerge (df0:Frame<int,string>) (df1:Frame<int,string>) =
    df1
    |> Frame.mapRowKeys (fun k -> k + df0.Rows.KeyCount)
    |> Frame.merge df0

// read filelist into a Frame list
let dflist = filelist |> List.map (fun (x:string) -> Frame.ReadCsv x)

// use List.fold to "fold" through the list with dflist.[0] as the initial state
let dffinal = List.tail dflist |> List.fold domerge (List.head dflist)
dffinal.SaveCsv("./final_csv.csv")
Now the code looks "functional"; however, I get a small warning that Frame.ReadCsv is not intended for use from F#, but it works anyway.
If you are happy to use an external library, then you can do this very easily using the data frame manipulation library called Deedle. Deedle lets you read data frames from CSV files and when you merge data frames, it makes sure to align column and row keys for you:
open Deedle
let f1 = Frame.ReadCsv("c:/temp/f1.csv")
let f2 = Frame.ReadCsv("c:/temp/f2.csv")
let merged =
    f2
    |> Frame.mapRowKeys (fun k -> k + f1.Rows.KeyCount)
    |> Frame.merge f1
merged.SaveCsv("c:/temp/merged.csv")
The one tricky thing that we have to do here is to use mapRowKeys. When you read the frames, Deedle automatically generates ordinal row keys for your data and so merging would fail because you have two rows with a key 0. The mapRowKeys function lets us transform the keys so that they are unique and the frames can be merged. (Saving the CSV file does not automatically write the row keys to the output, so the result of this is exactly what you wanted.)
If you do a lot of processing like this, you should look into the CSV type provider and parser, or my favorite, FileHelpers.
If you don't want to use any third party libraries, here's a quick step-by-step process to read, re-assemble and write out the file:
open System.IO
open System
let csv1path = @"E:\tmp\csv1.csv"
let csv2path = @"E:\tmp\csv2.csv"

/// Read the file, split it up, and remove the header from the first csv file
let csv1 =
    File.ReadAllLines(csv1path)
    |> Array.map (fun x -> x.Split(','))
    |> Array.tail

let csv2 =
    File.ReadAllLines(csv2path)
    |> Array.map (fun x -> x.Split(','))

/// Split the header and data in the second csv file
let header', data = (csv2.[0], Array.tail csv2)
let header = String.Join(",", header')

/// Put the data back together; this is an array of arrays
let csv3 =
    Array.append csv1 data

/// Sort the combined file, put it back together as a csv and add back the header
let csv4 =
    csv3
    |> Array.sort
    |> Array.map (fun x -> String.Join(",", x))
    |> Array.append [|header|]

/// Write it out
File.WriteAllLines(@"E:\tmp\combined.csv", csv4)

F# CSV Type Provider : how to ignore some rows?

I am a beginner and starting to use the FSharp.Data library
http://fsharp.github.io/FSharp.Data/library/CsvProvider.html
let rawfile = CsvFile.Load("mydata.csv")
for row in rawfile.Rows do
    let d = System.DateTime.Parse (row.GetColumn("Date"))
    let p = float (row.GetColumn("Close Price"))
    printfn "%A %A" d p
    price_table.[BTC].Add (d,p)
I have a csv file whose last lines I would like to ignore, because they are something like "this data was produced by ....".
By the way, even if I delete those lines and save the file, when I reopen it those cells reappear... sticky ones!!!
There's an overload for CsvFile.Load that takes a TextReader-derived parameter.
If you know how many lines to skip, you can create a StreamReader on the file and skip lines with ReadLine.
use reader = new StreamReader("mydata.csv")
reader.ReadLine() |> ignore
reader.ReadLine() |> ignore
let rawfile = CsvFile.Load(reader)
If you're OK with loading the whole CSV into memory, then you can simply reverse your rows, skip what you want, and then (optionally) reverse back.
Example:
let skipNLastValues (n:int) (xs:seq<'a>) =
    xs
    |> Seq.rev
    |> Seq.skip n
    |> Seq.rev

for i in (skipNLastValues 2 {1..10}) do
    printfn "%A" i
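Applied back to the CSV question, the helper can be combined with CsvFile like this (a sketch; it assumes CsvFile from FSharp.Data as in the question, and that exactly two trailer lines should be dropped):

```fsharp
open FSharp.Data

let skipNLastValues (n:int) (xs:seq<'a>) =
    xs |> Seq.rev |> Seq.skip n |> Seq.rev

let rawfile = CsvFile.Load("mydata.csv")

// Iterate over all rows except the trailing "this data was produced by ..." ones
for row in skipNLastValues 2 rawfile.Rows do
    printfn "%s" (row.GetColumn("Date"))
```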

Pass Function to reduce duplicate code

I'm trying to learn F# and I feel like I can rewrite this block of code to be more "idiomatic" F#, but I just can't figure out how to accomplish it.
My simple program will be loading values from 2 csv files: a list of Skyrim potion effects, and a list of Skyrim ingredients. An ingredient has 4 effects. Once I have the ingredients, I can write something to process them - right now, I just want to write the CSV load in a way that makes sense.
Code
Here are my types:
type Effect(name:string, id, description, base_cost, base_mag, base_dur, gold_value) =
    member this.Name = name
    member this.Id = id
    member this.Description = description
    member this.Base_Cost = base_cost
    member this.Base_Mag = base_mag
    member this.Base_Dur = base_dur
    member this.GoldValue = gold_value

type Ingredient(name:string, id, primary, secondary, tertiary, quaternary, weight, value) =
    member this.Name = name
    member this.Id = id
    member this.Primary = primary
    member this.Secondary = secondary
    member this.Tertiary = tertiary
    member this.Quaternary = quaternary
    member this.Weight = weight
    member this.Value = value
Here is where I parse an individual comma-separated string, per type:
let convertEffectDataRow (csvLine:string) =
    let cells = List.ofSeq(csvLine.Split(','))
    match cells with
    | name::id::effect::cost::mag::dur::value::_ ->
        let effect = new Effect(name, id, effect, Decimal.Parse(cost), Int32.Parse(mag), Int32.Parse(dur), Int32.Parse(value))
        Success effect
    | _ -> Failure "Incorrect data format!"

let convertIngredientDataRow (csvLine:string) =
    let cells = List.ofSeq(csvLine.Split(','))
    match cells with
    | name::id::primary::secondary::tertiary::quaternary::weight::value::_ ->
        Success (new Ingredient(name, id, primary, secondary, tertiary, quaternary, Decimal.Parse(weight), Int32.Parse(value)))
    | _ -> Failure "Incorrect data format!"
So I feel like I should be able to build a function that accepts one of these functions or chains them or something, so that I can recursively go through the lines in the CSV file and pass those lines to the correct function above. Here is what I've tried so far:
type csvTypeEnum = effect=1 | ingredient=2

let rec ProcessStuff lines (csvType:csvTypeEnum) =
    match csvType, lines with
    | csvTypeEnum.effect, [] -> []
    | csvTypeEnum.effect, currentLine::remaining ->
        let parsedLine = convertEffectDataRow2 currentLine
        let parsedRest = ProcessStuff remaining csvType
        parsedLine :: parsedRest
    | csvTypeEnum.ingredient, [] -> []
    | csvTypeEnum.ingredient, currentLine::remaining ->
        let parsedLine = convertIngredientDataRow2 currentLine
        let parsedRest = ProcessStuff remaining csvType
        parsedLine :: parsedRest
    | _, _ -> Failure "Error in pattern matching"
But this (predictably) has a compile error on the second instance of recursion and on the last pattern. Specifically, the second time parsedLine :: parsedRest shows up it does not compile. This is because the function is attempting to return both an Effect and an Ingredient, which obviously won't do.
Now, I could just write 2 entirely different functions to handle the different CSVs, but that feels like extra duplication. This might be a harder problem than I'm giving it credit for, but it feels like this should be rather straightforward.
Sources
The CSV parsing code I took from chapter 4 of this book: https://www.manning.com/books/real-world-functional-programming
Since the line types aren't interleaved into the same file and they refer to different csv file formats, I would probably not go for a Discriminated Union and instead pass the processing function to the function that processes the file line by line.
In terms of doing things idiomatically, I would use a Record rather than a standard .NET class for this kind of simple data container. Records provide automatic equality and comparison implementations which are useful in F#.
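As a small illustration of that point (not from the original post): two record values with the same field values compare equal, and ordering works too, with no extra code:

```fsharp
type Point = { X : int; Y : int }

let a = { X = 1; Y = 2 }
let b = { X = 1; Y = 2 }

// Structural equality and comparison are generated automatically
printfn "%b" (a = b)                 // true
printfn "%b" (a < { b with Y = 3 })  // true
```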
You can define them like this:
type Effect = {
    Name : string; Id : string; Description : string; BaseCost : decimal;
    BaseMag : int; BaseDuration : int; GoldValue : int
}

type Ingredient = {
    Name : string; Id : string; Primary : string; Secondary : string; Tertiary : string;
    Quaternary : string; Weight : decimal; GoldValue : int
}
That requires a change to the conversion function, e.g.
let convertEffectDataRow (csvLine:string) =
    let cells = List.ofSeq(csvLine.Split(','))
    match cells with
    | name::id::effect::cost::mag::dur::value::_ ->
        Success { Name = name; Id = id; Description = effect; BaseCost = Decimal.Parse(cost);
                  BaseMag = Int32.Parse(mag); BaseDuration = Int32.Parse(dur); GoldValue = Int32.Parse(value) }
    | _ -> Failure "Incorrect data format!"
Hopefully it's obvious how to do the other one.
Finally, cast aside the enum and simply replace it with the appropriate line function (I've also swapped the order of the arguments).
let rec processStuff f lines =
    match lines with
    | [] -> []
    | current::remaining -> f current :: processStuff f remaining
The argument f is just a function that is applied to each string line. Suitable f values are the functions we created above, e.g. convertEffectDataRow. So you can simply call processStuff convertEffectDataRow to process an effect file and processStuff convertIngredientDataRow to process an ingredients file.
However, now that we've simplified the processStuff function, we can see it has type f:('a -> 'b) -> lines:'a list -> 'b list. This is the same as the built-in List.map function, so we can actually remove this custom function entirely and just use List.map.
let processEffectLines lines = List.map convertEffectDataRow lines
let processIngredientLines lines = List.map convertIngredientDataRow lines
(optional) Convert Effect and Ingredient to records, as s952163 suggested.
Think carefully about the return types of your functions. ProcessStuff returns a list from one case, but a single item (Failure) from the other case, hence the compilation error.
You haven't shown your Success and Failure definitions. Instead of a generic success, you could define the result as
type Result =
    | Effect of Effect
    | Ingredient of Ingredient
    | Failure of string
And then the following code compiles correctly:
let convertEffectDataRow (csvLine:string) =
    let cells = List.ofSeq(csvLine.Split(','))
    match cells with
    | name::id::effect::cost::mag::dur::value::_ ->
        let effect = new Effect(name, id, effect, Decimal.Parse(cost), Int32.Parse(mag), Int32.Parse(dur), Int32.Parse(value))
        Effect effect
    | _ -> Failure "Incorrect data format!"

let convertIngredientDataRow (csvLine:string) =
    let cells = List.ofSeq(csvLine.Split(','))
    match cells with
    | name::id::primary::secondary::tertiary::quaternary::weight::value::_ ->
        Ingredient (new Ingredient(name, id, primary, secondary, tertiary, quaternary, Decimal.Parse(weight), Int32.Parse(value)))
    | _ -> Failure "Incorrect data format!"

type csvTypeEnum = effect=1 | ingredient=2

let rec ProcessStuff lines (csvType:csvTypeEnum) =
    match csvType, lines with
    | csvTypeEnum.effect, [] -> []
    | csvTypeEnum.effect, currentLine::remaining ->
        let parsedLine = convertEffectDataRow currentLine
        let parsedRest = ProcessStuff remaining csvType
        parsedLine :: parsedRest
    | csvTypeEnum.ingredient, [] -> []
    | csvTypeEnum.ingredient, currentLine::remaining ->
        let parsedLine = convertIngredientDataRow currentLine
        let parsedRest = ProcessStuff remaining csvType
        parsedLine :: parsedRest
    | _, _ -> [Failure "Error in pattern matching"]
The csvTypeEnum type looks fishy, but I'm not sure what you were trying to achieve, so I've just fixed the compilation errors.
Now you can refactor your code to reduce duplication by passing functions as parameters when needed. But always start with types!
You can certainly pass a function to another function and use a DU as a return type, for example:
type CsvWrapper =
    | CsvA of string
    | CsvB of int

let csvAfunc x =
    CsvA x

let csvBfunc x =
    CsvB x

let csvTopFun f x =
    f x

csvTopFun csvBfunc 5
csvTopFun csvAfunc "x"
As for the type definitions, you can just use records; it will save you some typing:
type Effect = {
    name : string
    id : int
    description : string
}

let eff = { name = "X"; id = 9; description = "blah" }

How to read value of property depending on an argument

How can I get the value of a property given a string argument?
I have an object of type CsvProvider.Row which has attributes a, b, c.
I want to get the attribute's value depending on the property given as a string argument.
I tried something like this:
let getValue (tuple, name: string) =
    snd tuple |> Seq.averageBy (fun (y: CsvProvider<"s.csv">.Row) -> y.``name``)
but it gives me the following error:
Unexpected reserved keyword in lambda expression. Expected incomplete
structured construct at or before this point or other token.
A simple invocation of the function should look like this:
getValue(tuple, "a")
and it should be equivalent to the following function:
let getValue (tuple) =
    snd tuple |> Seq.averageBy (fun (y: CsvProvider<"s.csv">.Row) -> y.a)
Is something like this is even possible?
Thanks for any help!
The CSV type provider is great if you are accessing data by column names statically, because you get nice auto-completion with type inference and checking.
However, for dynamic access, it might be easier to use the underlying CsvFile (also a part of F# Data) directly, rather than using the type provider:
// Read the given file
let file = CsvFile.Load("c:/test.csv")

// Look at the parsed headers and find the index of column "A"
let aIdx = file.Headers.Value |> Seq.findIndex (fun k -> k = "A")

// Iterate over rows and print A values
for r in file.Rows do
    printfn "%A" (r.Item(aIdx))
The only unfortunate thing is that the items are accessed by index, so you need to build some lookup table if you want to easily access them by their name.
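One way to build such a lookup is a Map from header name to column index (a sketch that continues the snippet above; it assumes the file really has headers, i.e. file.Headers is Some):

```fsharp
// Build a name -> index lookup from the parsed headers
let columnIndex =
    file.Headers.Value
    |> Seq.mapi (fun i name -> name, i)
    |> Map.ofSeq

// Columns can now be read by name for any row
for r in file.Rows do
    printfn "%A" (r.Item(columnIndex.["A"]))
```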