Beginner in Elixir: parse CSV file

Hello, I'm a beginner in Elixir and I want to parse a CSV file and store it in an Elixir data structure.
But it displays this error:
** (FunctionClauseError) no function clause matching in anonymous fn/1 in Siren.parseCSV/0
The following arguments were given to anonymous fn/1 in Siren.parseCSV/0:
# 1
["41", "5", "59", "N", "80", "39", "0", "W", "Youngstown", "OH"]
anonymous fn/1 in Siren.parseCSV/0
(elixir 1.10.3) lib/stream.ex:482: anonymous fn/4 in Stream.filter/2
(elixir 1.10.3) lib/stream.ex:1449: Stream.do_element_resource/6
(elixir 1.10.3) lib/stream.ex:1609: Enumerable.Stream.do_each/4
(elixir 1.10.3) lib/enum.ex:959: Enum.find/3
(mix 1.10.3) lib/mix/task.ex:330: Mix.Task.run_task/3
(mix 1.10.3) lib/mix/cli.ex:82: Mix.CLI.run_task/2
Here is my code:
defmodule Siren do
def parseCSV do
IO.puts("Let's parse CSV file...")
File.stream!("../name.csv")
|> Stream.map(&String.trim(&1))
|> Stream.map(&String.split(&1, ","))
|> Stream.filter(fn
["LatD" | _] -> false
end)
|> Enum.find(fn State -> String
[LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State] ->
IO.puts("find -> #{State}")
true
end)
end
end
And the csv file:
LatD,LatM,LatS,NS,LonD,LonM,LonS,EW,City,State
41,5,59,N,80,39,0,W,Youngstown,OH
42,52,48,N,97,23,23,W,Yankton,SD
46,35,59,N,120,30,36,W,Yakima,WA
42,16,12,N,71,48,0,W,Worcester,MA
43,37,48,N,89,46,11,W,WisconsinDells,WI
36,5,59,N,80,15,0,W,Winston-Salem,NC
49,52,48,N,97,9,0,W,Winnipeg,MB
39,11,23,N,78,9,36,W,Winchester,VA
34,14,24,N,77,55,11,W,Wilmington,NC
39,45,0,N,75,33,0,W,Wilmington,DE
48,9,0,N,103,37,12,W,Williston,ND
41,15,0,N,77,0,0,W,Williamsport,PA
37,40,48,N,82,16,47,W,Williamson,WV
33,54,0,N,98,29,23,W,WichitaFalls,TX
37,41,23,N,97,20,23,W,Wichita,KS
40,4,11,N,80,43,12,W,Wheeling,WV
26,43,11,N,80,3,0,W,WestPalmBeach,FL
47,25,11,N,120,19,11,W,Wenatchee,WA
41,25,11,N,122,23,23,W,Weed,CA

The first issue is here:
|> Stream.filter(fn
["LatD" | _] -> false
end)
Every line is passed to this function, but only the first one (the header) matches the given clause, so the first data line raises a FunctionClauseError. This would fix the issue:
|> Stream.filter(fn
["LatD" | _] -> false
_ -> true
end)
or
|> Stream.reject(&match?(["LatD" | _], &1))
The Enum.find(fn State -> String part that follows looks unclear and would surely be the next issue. I failed to understand what you were trying to achieve there.
The general advice would be: don't reinvent the wheel and use NimbleCSV, written by José Valim, to parse CSVs, because there are a lot of corner cases (such as commas inside quoted fields) that the library handles properly.
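To see the corrected header filter in isolation, here is a minimal sketch. It assumes a hypothetical in-memory list `lines` standing in for what File.stream!/1 would yield, so it runs without the CSV file on disk:

```elixir
# Hypothetical stand-in for the lines File.stream!("../name.csv") would yield.
lines = [
  "LatD,LatM,LatS,NS,LonD,LonM,LonS,EW,City,State\n",
  "41,5,59,N,80,39,0,W,Youngstown,OH\n",
  "42,52,48,N,97,23,23,W,Yankton,SD\n"
]

rows =
  lines
  |> Stream.map(&String.trim/1)
  |> Stream.map(&String.split(&1, ","))
  # Drop the header row; every other row passes through untouched.
  |> Stream.reject(&match?(["LatD" | _], &1))
  |> Enum.to_list()

IO.inspect(rows, label: "data rows")
```

Splitting on "," by hand only works here because no field is quoted; for real-world files the NimbleCSV advice above still applies.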

Aleksei Matiushkin gave you the right answer but also you have this function:
fn
State ->
String
[LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State] ->
IO.puts("find -> #{State}")
true
end
It accepts two possible values: either State, which is an atom, or a list of 10 specific atoms.
What you want is to use variables. Variables in Elixir start with a lowercase letter, or with an underscore if the value is to be ignored.
fn
state ->
String
[latd, latm, lats, ns, lond, lonm, lons, ew, city, state] ->
IO.puts("find -> #{state}")
true
end
But in this case, the first clause of the function will always match anything because it acts like a catch-all clause.
What you probably want is:
fn
[_latd, _latm, _lats, _ns, _lond, _lonm, _lons, _ew, _city, state] ->
IO.puts("find -> #{state}")
# here decide if you want to return true or false,
# for instance `state == "NC"`
true
end
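As a rough sketch of how that last function behaves inside Enum.find/2, assuming a small hypothetical `rows` list copied from the CSV in the question and a search for the state "NC":

```elixir
# Rows as they would look after splitting each CSV line on ",".
rows = [
  ["41", "5", "59", "N", "80", "39", "0", "W", "Youngstown", "OH"],
  ["36", "5", "59", "N", "80", "15", "0", "W", "Winston-Salem", "NC"],
  ["34", "14", "24", "N", "77", "55", "11", "W", "Wilmington", "NC"]
]

found =
  Enum.find(rows, fn
    # Lowercase names are variables; the leading underscore marks unused ones.
    [_latd, _latm, _lats, _ns, _lond, _lonm, _lons, _ew, _city, state] ->
      state == "NC"

    # Catch-all so a malformed row does not raise FunctionClauseError.
    _other ->
      false
  end)

IO.inspect(found, label: "first NC row")
```

Enum.find/2 stops at the first row for which the function returns a truthy value, so only the Winston-Salem row is returned here.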

Related

Seeding the Database from a CSV file in Phoenix/Elixir

When I try to run: mix run priv/repo/seeds.exs, I have a problem: (FunctionClauseError) no function clause matching in anonymous fn/1 in :elixir_compiler_1.__FILE__/1 The following arguments were given to anonymous fn/1 in :elixir_compiler_1.__FILE__/1:
This is my seeds.exs file:
alias FlightsList.Repo
alias FlightsList.Management.Flights
File.stream!("C:/Users/vukap/phx_projects/flights_list/priv/repo/flights.csv")
|> Stream.drop(1)
|> CSV.decode(headers: [:Id, :Origin, :Destination, :DepartureDate, :DepartureTime, :ArrivalDate, :ArrivalTime, :Number])
|> Enum.each(fn {:ok, map} ->
Flights.changeset(
%Flights{},
%{Id: String.to_integer(map[:Id]), Origin: map[:Origin], Destination: map[:Destination], DepartureDate: String.to_integer(map[:DepartureDate]), DepartureTime: String.to_integer(map[:DepartureTime]), ArrivalDate: String.to_integer(map[:ArrivalDate]), ArrivalTime: String.to_integer(map[:ArrivalTime]), Number: map[:Number]})
|> Repo.insert!()
end)
How can I fix it?
It's impossible to answer precisely until you specify which CSV library you used, or at least what the error actually says after The following arguments were given to anonymous fn/1. Still, the issue is definitely that CSV.decode/2 returns something other than the {:ok, map} your next clause expects.
To fix this and similar issues, add a catch-all clause to the processing and examine the outcome.
...
|> Enum.each(fn
{:ok, map} -> Flights.changeset(...)
other -> IO.inspect(other, label: "Unexpected")
end)
Check what the above would print out and fix it accordingly.
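The same idea can be tried without any CSV library. The sketch below assumes a hypothetical list `decoded` shaped like the tuples the decoder is expected to emit; the error tuple and its message are invented purely for illustration:

```elixir
# Hypothetical decoder output: one good row and one unexpected element.
decoded = [
  {:ok, %{Id: "1", Origin: "BEG"}},
  {:error, "hypothetical decoder message"}
]

# Separate the rows the happy-path clause can handle from everything else.
{oks, others} = Enum.split_with(decoded, &match?({:ok, _}, &1))

Enum.each(others, &IO.inspect(&1, label: "Unexpected"))
IO.puts("#{length(oks)} row(s) ready to insert")
```

Whatever shows up under "Unexpected" is what the single-clause function failed to match.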
I guess you are missing a separator option in the CSV.decode call. Here is an example of how I do it; you can call stream_csv from the seed file.
def store_it(row) do
{:ok, result} = row
%Segments{
id: result.id,
name: result.name
} |> Repo.insert!
end
def stream_csv do
Path.expand("~/Project/segmments.csv")
|> File.stream!
|> CSV.decode(separator: ?;, headers: [:id, :name])
|> Enum.each(&store_it/1)
end

Elixir : how to display a structured data element?

I'm trying to parse a CSV file. Actually I have this code :
alias NimbleCSV.RFC4180, as: CSV
defmodule Siren do
def parseCSV do
IO.puts("Let's parse CSV file!")
stream = File.stream!("name.csv")
original_line = CSV.parse_stream(stream)
filter_line = Stream.filter(original_line, fn
["JeremyGuthrie" | _] -> true
_ -> false
end)
map = Stream.map(filter_line,
fn [name, team, position, height, weight, age] ->
%{name: name, team: team, position: position,
height: String.to_integer(height),
weight: String.to_integer(weight),
age: Float.parse(age) |> elem(0)
}
end)
end
end
As I understand it, I build a stream that handles each line of my name.csv file. With the NimbleCSV library I parse each line and skip the header line. Then I filter the lines to keep only the one corresponding to JeremyGuthrie. Finally, I store the line's elements in a map. But now, how do I print just the name of my filtered line (here, JeremyGuthrie)?
I also have another question: I'm having trouble filtering my stream by a number such as age, height or weight.
Here I apply Aleksei's advice with different code:
NimbleCSV.define(MyParser, separator: ";", escape: "\"")
defmodule Siren do
def parseCSV do
IO.puts("Let's parse CSV file!")
"ActeursEOF.csv"
|> File.stream!()
|> MyParser.parse_stream()
|> Stream.filter(fn
["RAZEL BEC" | _] -> true
["" | _] -> false
_ -> false
end)
|> Stream.map(fn [name, description, enr_competences] ->
%{name: name, description: description, enr_competences: enr_competences}
end)
|> Enum.to_list()
|> IO.inspect()
end
end
My output:
Compiling 1 file (.ex)
Let's parse CSV file!
[%{description: "Génie Civil", enr_competences: "Oui", name: "RAZEL BEC"}]
But now, to close this subject, I would like to access and store just the description, for instance, and finally display it. I don't see how to do that.
Producing intermediate variables is redundant. In Elixir we have Kernel.|>/2, aka the pipe operator, which passes the output of one function as the first argument of the next function.
"name.csv"
|> File.stream!()
|> CSV.parse_stream()
|> Stream.filter(fn
["JeremyGuthrie" | _] -> true
_ -> false
end)
|> Stream.map(fn
[name, team, position, height, weight, age] ->
%{name: name, team: team, position: position,
height: String.to_integer(height),
weight: String.to_integer(weight),
age: Float.parse(age) |> elem(0)
}
end)
|> Enum.to_list() # THIS
Note the last line in the chain. Streams must be terminated to retrieve the result. Until termination happens, the stream is only lazily constructed and not evaluated at all. That makes it possible, for example, to produce and operate on infinite streams.
Any greedy function from the Enum module would do: Enum.take/2 or, as I pointed out above, Enum.to_list/1.
For future reference, once you feel fully familiar with Elixir, you might use Flow instead of Stream to parallelize the mapping. For now (and for relatively small files), Stream is good enough.
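A small sketch of both points, laziness and reading a single field once the stream is terminated. It assumes hypothetical already-parsed `rows` (the second row is invented) in place of the parsed CSV stream:

```elixir
# Hypothetical parsed rows; the second one is made up for the example.
rows = [
  ["RAZEL BEC", "Génie Civil", "Oui"],
  ["", "n/a", "Non"]
]

stream =
  rows
  |> Stream.filter(&match?(["RAZEL BEC" | _], &1))
  |> Stream.map(fn [name, description, enr_competences] ->
    %{name: name, description: description, enr_competences: enr_competences}
  end)

# Nothing has been evaluated so far; Enum.to_list/1 terminates the stream.
[actor] = Enum.to_list(stream)

# Once you hold the map, a field is read with the dot syntax.
description = actor.description
IO.puts(description)

# Laziness also allows infinite sources, as long as a greedy call bounds them.
first_three =
  Stream.iterate(1, &(&1 + 1))
  |> Stream.map(&(&1 * 10))
  |> Enum.take(3)

IO.inspect(first_three)
```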

In Elixir, How can I extract a lambda to a named function when the lambda is in a closure?

I have the following closure:
def get!(Item, id) do
Enum.find(
@items,
fn(item) -> item.id == id end
)
end
As I believe this looks ugly and difficult to read, I'd like to give this a name, like:
def get!(Item, id) do
defp has_target_id?(item), do: item.id = id
Enum.find(@items, has_target_id?/1)
end
Unfortunately, this results in:
== Compilation error in file lib/auction/fake_repo.ex ==
** (ArgumentError) cannot invoke defp/2 inside function/macro
(elixir) lib/kernel.ex:5238: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:4155: Kernel.define/4
(elixir) expanding macro: Kernel.defp/2
lib/auction/fake_repo.ex:28: Auction.FakeRepo.get!/2
Assuming it is possible, what is the correct way to do this?
The code you posted has an enormous amount of syntax errors/glitches. I would suggest you start with getting accustomed to the syntax, rather than trying to make Elixir better by inventing the things that nobody uses.
Here is the correct version that does what you wanted. The task can be accomplished with a named anonymous function, although I hardly see a reason to make perfectly good idiomatic Elixir look ugly.
defmodule Foo do
@items [%{id: 1}, %{id: 2}, %{id: 3}]
def get!(id) do
has_target_id? = fn item -> item.id == id end
Enum.find(@items, has_target_id?)
end
end
Foo.get! 1
#⇒ %{id: 1}
Foo.get! 4
#⇒ nil
You can do this:
def get!(Item, id) do
Enum.find(
@items,
&compare_ids(&1, id)
)
end
defp compare_ids(%Item{}=item, id) do
item.id == id
end
But, that's equivalent to:
Enum.find(
@items,
fn item -> compare_ids(item, id) end
)
and may not pass your "looks ugly and difficult to read" test.
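For reference, a complete module along these lines might look like the sketch below. The module name FakeRepoSketch, the helper has_id?/2 and the sample @items are made up for illustration; plain maps are used instead of an Item struct to keep it self-contained:

```elixir
defmodule FakeRepoSketch do
  # Hypothetical in-memory data, stored in a module attribute.
  @items [%{id: 1}, %{id: 2}, %{id: 3}]

  # The capture closes over `id`, so the named private function
  # receives both the current item and the target id.
  def get!(id), do: Enum.find(@items, &has_id?(&1, id))

  defp has_id?(item, id), do: item.id == id
end

IO.inspect(FakeRepoSketch.get!(2))
IO.inspect(FakeRepoSketch.get!(9))
```

The named function has to live at module level; only the capture (or an anonymous function) can sit inside get!/1.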
I was somehow under the impression Elixir supports nested functions?
Easy enough to test:
defmodule A do
def go do
def greet do
IO.puts "hello"
end
greet()
end
end
Same error:
$ iex a.ex
Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
** (ArgumentError) cannot invoke def/2 inside function/macro
(elixir) lib/kernel.ex:5150: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:3906: Kernel.define/4
(elixir) expanding macro: Kernel.def/2
a.ex:3: A.go/0
wouldn't:
defp compare_ids(item, id), do: item.id == id
be enough? Is there any advantage to including %Item{} or making
separate functions for returning both true and false conditions?
What you gain by specifying the first parameter as:
func(%Item{} = item, target_id)
is that only an Item struct will match the first parameter. Here is an example:
defmodule Item do
defstruct [:id, :name, :description]
end
defmodule Dog do
defstruct [:id, :name, :owner]
end
defmodule A do
def go(%Item{} = item), do: IO.inspect(item.id, label: "id: ")
end
In iex:
iex(1)> item = %Item{id: 1, name: "book", description: "old"}
%Item{description: "old", id: 1, name: "book"}
iex(2)> dog = %Dog{id: 1, name: "fido", owner: "joe"}
%Dog{id: 1, name: "fido", owner: "joe"}
iex(3)> A.go item
id: : 1
1
iex(4)> A.go dog
** (FunctionClauseError) no function clause matching in A.go/1
The following arguments were given to A.go/1:
# 1
%Dog{id: 1, name: "fido", owner: "joe"}
a.ex:10: A.go/1
iex(4)>
You get a function clause error if you call the function with a non-Item, and the earlier an error occurs, the better, because it makes debugging easier.
Of course, by preventing the function from accepting other structs, you make the function less general--but because it's a private function, you can't call it from outside the module anyway. On the other hand, if you wanted to call the function on both Dog and Item structs, then you could simply specify the first parameter as:
|
V
func(%{}=thing, target_id)
then both an Item and a Dog would match--but not non-maps.
What you gain by specifying the first parameter as:
|
V
func(%Item{id: id}, target_id)
is that you let erlang's pattern matching engine extract the data you need, rather than calling item.id as you would need to do with this definition:
func(%Item{}=item, target_id)
In erlang, pattern matching in a parameter list is the most efficient/convenient/stylish way to write functions. You use pattern matching to extract the data that you want to use in the function body.
Going even further, if you write the function definition like this:
same variable name
| |
V V
func(%Item{id: target_id}, target_id)
then erlang's pattern matching engine not only extracts the value for the id field from the Item struct, but also checks that the value is equal to the value of the target_id variable in the 2nd argument.
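A minimal sketch of that last form, using hypothetical ItemSketch and MatchSketch modules. The second clause is an addition of mine so that a non-matching id returns false instead of raising:

```elixir
defmodule ItemSketch do
  defstruct [:id, :name]
end

defmodule MatchSketch do
  # `target_id` appears twice, so this clause matches only when the
  # struct's id and the second argument are the same value.
  def has_id?(%ItemSketch{id: target_id}, target_id), do: true

  # Fallback for any other ItemSketch; non-structs still raise.
  def has_id?(%ItemSketch{}, _target_id), do: false

  # Small demo kept inside the module.
  def demo do
    book = %ItemSketch{id: 1, name: "book"}
    {has_id?(book, 1), has_id?(book, 2)}
  end
end

IO.inspect(MatchSketch.demo())
```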
Defining multiple function clauses is a common idiom in erlang, and it is considered good style because it takes advantage of pattern matching rather than logic inside the function body. Here's an erlang example:
get_evens(List) ->
get_evens(List, []).
get_evens([Head|Tail], Results) when Head rem 2 == 0 ->
get_evens(Tail, [Head|Results]);
get_evens([Head|Tail], Results) when Head rem 2 =/= 0 ->
get_evens(Tail, Results);
get_evens([], Results) ->
lists:reverse(Results).
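The same multi-clause style carries over to Elixir almost one to one. Here is a rough translation, with EvensSketch as a made-up module name:

```elixir
defmodule EvensSketch do
  # Public entry point starts with an empty accumulator.
  def get_evens(list), do: get_evens(list, [])

  # Even head: keep it.
  defp get_evens([head | tail], results) when rem(head, 2) == 0,
    do: get_evens(tail, [head | results])

  # Odd head: skip it.
  defp get_evens([_head | tail], results), do: get_evens(tail, results)

  # Empty list: the accumulator was built backwards, so reverse it.
  defp get_evens([], results), do: Enum.reverse(results)
end

IO.inspect(EvensSketch.get_evens([1, 2, 3, 4, 5]))
```

Each clause handles exactly one case, and the guard replaces an if/else in the function body.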

Erlang: Writing mnesia table to .csv one record per line

I have got a Mnesia table which consists of the following format:
-record(state, {key, tuple, state, timestamp, fin_from}).
The entries look like follows (read with ets:tab2list(Tab)):
[{state,{80,43252,tcp,tcp_syn_received,{192,168,101,5},{192,168,101,89}},
{80,43252,tcp,{192,168,101,5},{192,168,101,89}},
tcp_syn_received,1463850419221,undefined},
{state,{80,41570,tcp,tcp_syn_received,{192,168,101,5},{192,168,101,89}},
{80,41570,tcp,{192,168,101,5},{192,168,101,89}},
tcp_syn_received,1463850403214,undefined},
...]
I would like to write these data to a .csv file with one entry per line - preferred with the following format:
state,80,43252,tcp,tcp_syn_received,192.168.101.5,192.168.101.89,80,43252,tcp,192.168.101.5,192.168.101.89,tcp_syn_received,1463850419221,undefined
state,80,41570,tcp,tcp_syn_received,192.168.101.5,192.168.101.89,80,41570,tcp,192.168.101.5,192.168.101.89,tcp_syn_received,1463850419221,undefined
There should be a line break after undefined.
I tried using the following code (where Content = ets:tab2list(states)):
do_logging_async(File, Format, Content, Append)->
F = fun() ->
file:write_file(File, io_lib:fwrite(Format, [Content]), [Append])
end,
spawn(F).
However, I cannot get anything similar to my output.
The data should afterwards be evaluated with R.
UPDATE: The key was to read the table entry by entry and format it with ~w rather than ~p. I ended up with the following solution (which produces slightly different output; however, there is less redundant data):
do_state_logging(File, EtsAsList) ->
% write header (columnnames)
file:write_file(File, io_lib:fwrite("~w,~w,~w,~w,~w,~w,~w,~w~n", [record,dstPort,srcPort,proto,dstIP,srcIP,state,timestamp]),[append]),
case EtsAsList of
[] ->
ok;
_ ->
F = fun({Record,_Key,
[P1, P2, Proto, {D_Ip_1,D_Ip_2,D_Ip_3,D_Ip_4}, {S_Ip_1,S_Ip_2,S_Ip_3,S_Ip_4}],
State, Timestamp, _}) ->
file:write_file(File, io_lib:fwrite("~w,", [Record]),[append]),
file:write_file(File, io_lib:fwrite("~w,~w,~w,", [P1,P2,Proto]),[append]),
file:write_file(File, io_lib:fwrite("~w.~w.~w.~w,", [D_Ip_1,D_Ip_2,D_Ip_3,D_Ip_4]), [append]),
file:write_file(File, io_lib:fwrite("~w.~w.~w.~w,", [S_Ip_1,S_Ip_2,S_Ip_3,S_Ip_4]), [append]),
file:write_file(File, io_lib:fwrite("~w,", [State]),[append]),
file:write_file(File, io_lib:fwrite("~w", [Timestamp]),[append]),
file:write_file(File, ["\n"],[append])
end,
lists:foreach(F, EtsAsList)
end,
io:format("Finished logging of statetable to file: ~p~n" , [File]).
Thanks to the answer that pushed me toward this idea.
Assuming you change your ETS record values to be lists rather than tuples, you can use this code to write your ETS table to a file.
do_logging_async(File, EtsAsList) ->
F = fun({Key, Value}) ->
file:write_file(File, [atom_to_list(Key) ++ ","],[append]),
write_value(File,lists:flatten(Value)),
file:write_file(File, ["\n"],[append])
end,
lists:foreach(F,EtsAsList).
write_value(_File, []) -> ok;
write_value(File, [H|T]) ->
case is_integer(H) of
true -> file:write_file(File, [integer_to_list(H)],[append]);
false -> file:write_file(File, [atom_to_list(H)],[append])
end,
case T=:=[] of
true -> ok;
false -> file:write_file(File, [","],[append])
end,
write_value(File,T).
do_logging_async/2 takes every {Key, Value} pair. It first writes the Key to the file, then runs write_value/2 on the Value, and at the end of each pair it writes \n.
write_value/2 takes the flattened value list (assuming it is a flat list that contains only integers and atoms) and writes it to the file.
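For comparison with the Elixir threads above, here is a hedged sketch of the same per-entry formatting written in Elixir rather than Erlang. StateCsvSketch and to_line/1 are hypothetical names, and the tuple layout is taken from the tab2list output shown in the question (tuples, not lists). It builds the whole line in memory instead of appending piece by piece:

```elixir
defmodule StateCsvSketch do
  # One table entry becomes one CSV line with the 8 columns of the
  # UPDATE's header; the key and fin_from are left out.
  def to_line({record, _key, {p1, p2, proto, dst_ip, src_ip}, state, timestamp, _fin_from}) do
    [record, p1, p2, proto, ip(dst_ip), ip(src_ip), state, timestamp]
    |> Enum.map_join(",", &to_string/1)
  end

  # {192, 168, 101, 5} -> "192.168.101.5"
  defp ip({a, b, c, d}), do: Enum.join([a, b, c, d], ".")
end

entry =
  {:state, {80, 43252, :tcp, :tcp_syn_received, {192, 168, 101, 5}, {192, 168, 101, 89}},
   {80, 43252, :tcp, {192, 168, 101, 5}, {192, 168, 101, 89}}, :tcp_syn_received,
   1_463_850_419_221, :undefined}

line = StateCsvSketch.to_line(entry)
IO.puts(line)
```

Joining all lines with "\n" and writing the result with a single file write avoids reopening the file for every field.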

Parsing file with multiple JSON entries in Scala

I have a JSON file that I am trying to parse using Scala. I have figured out how to use Scala JSON parsing library to parse 1 entry in this format:
{"name":"John","number":"005","fav_colour":"blue"}
this is the code that works:
val result = JSON.parseFull("""{"name":"John","number":"005","fav_colour":"blue"}""")
result match {
case Some(e) => println(e)
case None => println("Failed.")
}
This prints Map(name -> John, number -> 005, fav_colour -> blue)
The code is based off of this: https://gist.github.com/takezoe/1540223
However, I am working with a file like this:
""" {"name":"John","number":"005","fav_colour":"blue"}
{"name":"Mary","number":"010","fav_colour":"yellow"}
{"name":"Anna","number":"007","fav_colour":"pink"}
{"name":"Dave","number":"003","fav_colour":"purple"}
"""
Note, I also tried separating with commas and still it did not work.
I am just wondering if I have to write a function to separate each {bracketed entry} or if there is some functionality of the JSON library that I am missing. So far, when I pass in my file it returns None instead of Some(valid information).
Thanks!
You don't have a valid JSON file. This would be valid:
[
{"name":"John","number":"005","fav_colour":"blue"},
{"name":"Mary","number":"010","fav_colour":"yellow"},
{"name":"Anna","number":"007","fav_colour":"pink"},
{"name":"Dave","number":"003","fav_colour":"purple"}
]
Result:
Some(List(Map(name -> John, number -> 005, fav_colour -> blue), Map(name -> Mary, number -> 010, fav_colour -> yellow), Map(name -> Anna, number -> 007, fav_colour -> pink), Map(name -> Dave, number -> 003, fav_colour -> purple)))
http://www.scalakata.com/522bdbfeebb25c7f5d823c7d
The format you use is convenient for gathering information over time, e.g. keeping logs.
You can parse it by reusing the parser combinators!
For example:
import scala.util.parsing.json.JSON
val parseResult = JSON.rep1(JSON.root)(new JSON.lexical.Scanner("{\"a\": 1} {\"b\": 2}"))
parseResult match {case JSON.Success (result, _) => result; case _ => Nil}
returns
List({"a" : 1.0}, {"b" : 2.0})