Parsing file with multiple JSON entries in Scala

I have a JSON file that I am trying to parse using Scala. I have figured out how to use the Scala JSON parsing library to parse one entry in this format:
{"name":"John","number":"005","fav_colour":"blue"}
This is the code that works:
val result = JSON.parseFull("""{"name":"John","number":"005","fav_colour":"blue"}""")
result match {
  case Some(e) => println(e)
  case None => println("Failed.")
}
This prints Map(name -> John, number -> 005, fav_colour -> blue)
The code is based off of this gist: https://gist.github.com/takezoe/1540223
However, I am working with a file like this:
""" {"name":"John","number":"005","fav_colour":"blue"}
{"name":"Mary","number":"010","fav_colour":"yellow"}
{"name":"Anna","number":"007","fav_colour":"pink"}
{"name":"Dave","number":"003","fav_colour":"purple"}
"""
Note: I also tried separating the entries with commas, and it still did not work.
I am just wondering if I have to write a function to separate each {bracketed entry} or if there is some functionality of the JSON library that I am missing. So far, when I pass in my file it returns None instead of Some(valid information).
Thanks!

You don't have a valid JSON file. This would be valid:
[
{"name":"John","number":"005","fav_colour":"blue"},
{"name":"Mary","number":"010","fav_colour":"yellow"},
{"name":"Anna","number":"007","fav_colour":"pink"},
{"name":"Dave","number":"003","fav_colour":"purple"}
]
Result:
Some(List(Map(name -> John, number -> 005, fav_colour -> blue), Map(name -> Mary, number -> 010, fav_colour -> yellow), Map(name -> Anna, number -> 007, fav_colour -> pink), Map(name -> Dave, number -> 003, fav_colour -> purple)))
http://www.scalakata.com/522bdbfeebb25c7f5d823c7d

The format you use is convenient for gathering information over time, e.g. keeping logs.
You can parse it by reusing the parser combinators!
For example:
import scala.util.parsing.json.JSON
val parseResult = JSON.rep1(JSON.root)(new JSON.lexical.Scanner("{\"a\": 1} {\"b\": 2}"))
parseResult match { case JSON.Success(result, _) => result; case _ => Nil }
returns
List({"a" : 1.0}, {"b" : 2.0})
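If you keep the one-object-per-line format, another option is to read the file line by line and feed each line to parseFull, collecting the successes. A minimal sketch, assuming the records live in a hypothetical people.json:
import scala.io.Source
import scala.util.parsing.json.JSON

// parseFull returns an Option, so flatMap keeps the parsed Maps
// and silently drops blank or malformed lines.
val entries: List[Any] =
  Source.fromFile("people.json").getLines()
    .flatMap(line => JSON.parseFull(line))
    .toList

entries.foreach(println)
// Map(name -> John, number -> 005, fav_colour -> blue)
// Map(name -> Mary, number -> 010, fav_colour -> yellow)
// ...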

Related

Elixir: how to display a structured data element?

I'm trying to parse a CSV file. Currently I have this code:
alias NimbleCSV.RFC4180, as: CSV

defmodule Siren do
  def parseCSV do
    IO.puts("Let's parse CSV file!")
    stream = File.stream!("name.csv")
    original_line = CSV.parse_stream(stream)

    filter_line =
      Stream.filter(original_line, fn
        ["JeremyGuthrie" | _] -> true
        _ -> false
      end)

    map =
      Stream.map(filter_line, fn [name, team, position, height, weight, age] ->
        %{name: name, team: team, position: position,
          height: String.to_integer(height),
          weight: String.to_integer(weight),
          age: Float.parse(age) |> elem(0)}
      end)
  end
end
As I understand it, I build a stream that handles each line of my name.csv file. With the NimbleCSV library I parse each line and skip the header line. Then I filter the lines to keep only the one corresponding to JeremyGuthrie. Finally, I store the line's elements in a structured map. But now, how do I print just the name from my filtered line, here JeremyGuthrie?
I also have another question: I'm having trouble filtering my stream on a number such as an age, height, or weight.
Here I applied Aleksei's advice to different code:
NimbleCSV.define(MyParser, separator: ";", escape: "\"")

defmodule Siren do
  def parseCSV do
    IO.puts("Let's parse CSV file!")

    "ActeursEOF.csv"
    |> File.stream!()
    |> MyParser.parse_stream()
    |> Stream.filter(fn
      ["RAZEL BEC" | _] -> true
      ["" | _] -> false
      _ -> false
    end)
    |> Stream.map(fn [name, description, enr_competences] ->
      %{name: name, description: description, enr_competences: enr_competences}
    end)
    |> Enum.to_list()
    |> IO.inspect()
  end
end
My output:
Compiling 1 file (.ex)
Let's parse CSV file!
[%{description: "Génie Civil", enr_competences: "Oui", name: "RAZEL BEC"}]
But now, to close this subject, I would like to access and store just the description, for instance, and then display it. I don't see how to do that.
Producing intermediate variables is redundant; in Elixir we have Kernel.|>/2, a.k.a. the pipe operator, to pipe a function's output into the first argument of the next function.
"name.csv"
|> File.stream!()
|> CSV.parse_stream()
|> Stream.filter(fn
  ["JeremyGuthrie" | _] -> true
  _ -> false
end)
|> Stream.map(fn [name, team, position, height, weight, age] ->
  %{name: name, team: team, position: position,
    height: String.to_integer(height),
    weight: String.to_integer(weight),
    age: Float.parse(age) |> elem(0)}
end)
|> Enum.to_list() # THIS
Note the last line in the chain. A stream must be terminated to retrieve its result. Until that termination happens, it is lazily constructed but not evaluated at all, which is what makes it possible to produce and operate on infinite streams.
Any greedy function from the Enum module would do: Enum.take/2 or, as I pointed out above, Enum.to_list/1.
For the sake of reference: in the future, when you feel fully familiar with Elixir, you might use Flow instead of Stream to parallelize the mapping. For now (and for relatively small files) Stream is good enough.
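To answer the follow-up questions: once the stream has been terminated, each element is a plain map, so projecting out a single field or filtering on a numeric field is ordinary Enum work. A minimal sketch, reusing the CSV alias and the map shape from above:
rows =
  "name.csv"
  |> File.stream!()
  |> CSV.parse_stream()
  |> Stream.filter(&match?(["JeremyGuthrie" | _], &1))
  |> Stream.map(fn [name, _team, _position, height, weight, age] ->
    %{name: name,
      height: String.to_integer(height),
      weight: String.to_integer(weight),
      age: Float.parse(age) |> elem(0)}
  end)
  |> Enum.to_list()

# Project a single field out of each map and print it:
Enum.each(rows, fn row -> IO.puts(row.name) end)

# Numeric filters work on the already-converted values:
Enum.filter(rows, &(&1.age > 30.0))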

Beginner in Elixir: parse CSV file

Hello, I'm a beginner in Elixir and I want to parse a CSV file and store it in an Elixir data structure.
But it displays this:
** (FunctionClauseError) no function clause matching in anonymous fn/1 in Siren.parseCSV/0
The following arguments were given to anonymous fn/1 in Siren.parseCSV/0:
# 1
["41", "5", "59", "N", "80", "39", "0", "W", "Youngstown", "OH"]
anonymous fn/1 in Siren.parseCSV/0
(elixir 1.10.3) lib/stream.ex:482: anonymous fn/4 in Stream.filter/2
(elixir 1.10.3) lib/stream.ex:1449: Stream.do_element_resource/6
(elixir 1.10.3) lib/stream.ex:1609: Enumerable.Stream.do_each/4
(elixir 1.10.3) lib/enum.ex:959: Enum.find/3
(mix 1.10.3) lib/mix/task.ex:330: Mix.Task.run_task/3
(mix 1.10.3) lib/mix/cli.ex:82: Mix.CLI.run_task/2
Here is my code:
defmodule Siren do
  def parseCSV do
    IO.puts("Let's parse CSV file...")

    File.stream!("../name.csv")
    |> Stream.map(&String.trim(&1))
    |> Stream.map(&String.split(&1, ","))
    |> Stream.filter(fn
      ["LatD" | _] -> false
    end)
    |> Enum.find(fn State -> String
      [LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State] ->
        IO.puts("find -> #{State}")
        true
    end)
  end
end
And the CSV file:
LatD,LatM,LatS,NS,LonD,LonM,LonS,EW,City,State
41,5,59,N,80,39,0,W,Youngstown,OH
42,52,48,N,97,23,23,W,Yankton,SD
46,35,59,N,120,30,36,W,Yakima,WA
42,16,12,N,71,48,0,W,Worcester,MA
43,37,48,N,89,46,11,W,WisconsinDells,WI
36,5,59,N,80,15,0,W,Winston-Salem,NC
49,52,48,N,97,9,0,W,Winnipeg,MB
39,11,23,N,78,9,36,W,Winchester,VA
34,14,24,N,77,55,11,W,Wilmington,NC
39,45,0,N,75,33,0,W,Wilmington,DE
48,9,0,N,103,37,12,W,Williston,ND
41,15,0,N,77,0,0,W,Williamsport,PA
37,40,48,N,82,16,47,W,Williamson,WV
33,54,0,N,98,29,23,W,WichitaFalls,TX
37,41,23,N,97,20,23,W,Wichita,KS
40,4,11,N,80,43,12,W,Wheeling,WV
26,43,11,N,80,3,0,W,WestPalmBeach,FL
47,25,11,N,120,19,11,W,Wenatchee,WA
41,25,11,N,122,23,23,W,Weed,CA
The first issue is here:
|> Stream.filter(fn
  ["LatD" | _] -> false
end)
All the lines go through this filter, but only the first (header) line matches the given clause; every other line raises the FunctionClauseError you see. This would fix the issue:
|> Stream.filter(fn
  ["LatD" | _] -> false
  _ -> true
end)
or
|> Stream.reject(&match?(["LatD" | _], &1))
The Enum.find(fn State -> String that follows looks unclear and would surely be the next issue; I failed to understand what you tried to achieve there.
The general advice would be: don't reinvent the wheel, and use NimbleCSV, written by José Valim, to parse CSVs, because there are a lot of corner cases (like commas inside quoted fields) that it already handles properly.
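For illustration, a minimal sketch of reading this file with NimbleCSV (MyParser is an arbitrary name; parse_stream skips the header row by default, so the LatD filter becomes unnecessary):
NimbleCSV.define(MyParser, separator: ",", escape: "\"")

rows =
  "../name.csv"
  |> File.stream!()
  |> MyParser.parse_stream()
  |> Enum.to_list()
Each element of rows is then a plain list of string fields, ready for pattern matching.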
Aleksei Matiushkin gave you the right answer, but you also have this function:
fn
  State ->
    String
  [LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State] ->
    IO.puts("find -> #{State}")
    true
end
It accepts two possible values: either State, which is an atom, or a list of 10 specific atoms.
What you want to do is use variables, and variables in Elixir start with a lowercase letter, or with an underscore if the value is to be ignored.
fn
  state ->
    String
  [latd, latm, lats, ns, lond, lonm, lons, ew, city, state] ->
    IO.puts("find -> #{state}")
    true
end
But in this case, the first clause of the function will match anything, because it acts as a catch-all clause and makes the second clause unreachable.
What you probably want is:
fn
  [_latd, _latm, _lats, _ns, _lond, _lonm, _lons, _ew, _city, state] ->
    IO.puts("find -> #{state}")
    # here decide if you want to return true or false,
    # for instance `state == "NC"` (the CSV values are strings)
    true
end
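Putting the two answers together, a sketch of a corrected parseCSV, keeping the original String.split approach (the state == "NC" condition is only an example predicate):
defmodule Siren do
  def parseCSV do
    IO.puts("Let's parse CSV file...")

    File.stream!("../name.csv")
    |> Stream.map(&String.trim/1)
    |> Stream.map(&String.split(&1, ","))
    |> Stream.reject(&match?(["LatD" | _], &1))
    |> Enum.find(fn [_latd, _latm, _lats, _ns, _lond, _lonm, _lons, _ew, _city, state] ->
      IO.puts("find -> #{state}")
      state == "NC"
    end)
  end
end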

Scala - How to handle key not found in a Map when you need to skip non-existing keys without defaults?

I have a set of Strings and I am using them as keys to get JValues from a Map:
val keys: Set[String] = Set("Metric_1", "Metric_2", "Metric_3", "Metric_4")
val logData: Map[String, JValue] = Map("Metric_1" -> JInt(0), "Metric_2" -> JInt(1), "Metric_3" -> null)
In the method below I'm parsing values for each metric: first getting all the values, then filtering out the null values, and finally transforming the remaining values to booleans.
val metricsMap: Map[String, Boolean] = keys
  .map(k => k -> logData(k).extractOpt[Int]).toMap
  .filter(_._2.isDefined)
  .collect {
    case (str, Some(0)) => str -> false
    case (str, Some(1)) => str -> true
  }
I've run into a problem when one of the keys is not found in the logData Map, so I'm getting a java.util.NoSuchElementException: key not found: Metric_4.
Here I'm using extractOpt to extract a value from the JSON, and I don't need default values, so extractOrElse probably won't help: I only need to get values for existing keys and skip the non-existing ones.
What could be a correct approach to handle a case when a key is not present in the logData Map?
UPD: I've achieved the desired result with .map(k => k -> logData.getOrElse(k, null).extractOpt[Int]).toMap. However, I'm still not sure that it's the best approach.
That the values are JSON is a red herring: it's the missing key that's throwing the exception. There's a method called get which retrieves a value from a Map wrapped in an Option. If we use Ints as the values, we have:
val logData = Map("Metric_1" -> 1, "Metric_2" -> 0, "Metric_3" -> null)
keys.flatMap(k => logData.get(k).map(k -> _)).toMap
> Map(Metric_1 -> 1, Metric_2 -> 0, Metric_3 -> null)
Using flatMap instead of map unwraps the Some results and drops the Nones. Now, if we go back to your actual example, we have another layer, and that flatMap will eliminate the Metric_3 -> null item:
keys.flatMap(k => logData.get(k).flatMap(_.extractOpt[Int]).map(k -> _)).toMap
You can also rewrite this using a for comprehension:
(for {
  k  <- keys
  jv <- logData.get(k)
  v  <- jv.extractOpt[Int]
} yield k -> v).toMap
I used Success and Failure in place of the JSON values to avoid having to set up a shell with json4s to make an example:
import scala.util.{Success, Failure}
val logData = Map("Metric_1" -> Success(1), "Metric_2" -> Success(0), "Metric_3" -> Failure(new RuntimeException()))
scala> for {
| k <- keys
| v <- logData.get(k)
| r <- v.toOption
| } yield k -> r
res2: scala.collection.immutable.Set[(String, Int)] = Set((Metric_1,1), (Metric_2,0))
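Tying this back to the original goal: the missing-key guard, the null guard, and the 0/1-to-boolean mapping can be combined in one pass. A minimal sketch, assuming json4s with an implicit DefaultFormats in scope:
import org.json4s._
implicit val formats: Formats = DefaultFormats

val metricsMap: Map[String, Boolean] =
  keys.flatMap { k =>
    for {
      jv <- logData.get(k)                          // skip keys missing from the map
      i  <- Option(jv).flatMap(_.extractOpt[Int])   // guard nulls, drop unparsable values
      b  <- i match {                               // keep only 0/1 readings
              case 0 => Some(false)
              case 1 => Some(true)
              case _ => None
            }
    } yield k -> b
  }.toMap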

F# JsonValue example doesn't work

In the F# Data: JSON Parser documentation there is an example showing how to extract data:
let info =
  JsonValue.Parse("""
    { "name": "Tomas", "born": 1985,
      "siblings": [ "Jan", "Alexander" ] } """)
open FSharp.Data.JsonExtensions
// Print name and birth year
let n = info?name
printfn "%s (%d)" (info?name.AsString()) (info?born.AsInteger())
// Print names of all siblings
for sib in info?siblings do
  printfn "%s" (sib.AsString())
I copied and pasted this code to try it out, but it won't compile; I get the error:
Error 53 The field, constructor or member 'AsString' is not defined
Is there something missing in the example code?
This can't work with VS 2012 because it lacks the ability to handle F# extension members.

Converting epgsql results to JSON

I am a total beginner with Erlang and functional programming in general. For fun, to get me started, I am converting an existing Ruby Sinatra REST(ish) API that queries PostgreSQL and returns JSON.
On the Erlang side I am using Cowboy, Epgsql and Jiffy as the JSON library.
Epgsql returns results in the following format:
{ok, [{column,<<"column_name">>,int4,4,-1,0}], [{<<"value">>}]}
But Jiffy expects the following format when encoding to JSON:
{[{<<"column_name">>,<<"value">>}]}
The following code works to convert epgsql output into suitable input for jiffy:
Assuming Data is the Epgsql output and Key is the name of the JSON object being created:
{_, C, R} = Data,
Columns = [X || {_, X, _, _, _, _} <- C],
Rows = tuple_to_list(hd(R)),
Result = {[{atom_to_binary(Key, utf8), {lists:zip(Columns, Rows)}}]}.
However, I am wondering if this is efficient Erlang?
I've looked into the documentation for Epgsql and Jiffy and can't see any more obvious ways to perform the conversion.
Thank you.
Yes, you need to parse it. For example, a function to parse the result:
parse_result({error, #error{code = <<"23505">>, extra = Extra}}) ->
    {match, [Column]} =
        re:run(proplists:get_value(detail, Extra),
               "Key \\(([^\\)]+)\\)", [{capture, all_but_first, binary}]),
    throw({error, {non_unique, Column}});
parse_result({error, #error{message = Msg}}) ->
    throw({error, Msg});
parse_result({ok, Cols, Rows}) ->
    to_map(Cols, Rows);
parse_result({ok, Counts, Cols, Rows}) ->
    {ok, Counts, to_map(Cols, Rows)};
parse_result(Result) ->
    Result.
And a function to convert the result to maps:
to_map(Cols, Rows) ->
    [maps:from_list(lists:zipwith(fun(#column{name = N}, V) -> {N, V} end,
                                  Cols, tuple_to_list(Row)))
     || Row <- Rows].
Then encode it to JSON. You can adapt my code to produce a proplist as output instead.
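For completeness, a sketch of how the pieces might fit together; handle_query/2 is a hypothetical name, and it assumes the parse_result/1 and to_map/2 helpers above plus a jiffy version that encodes maps directly:
%% Run a query, normalize the result, encode it as JSON.
handle_query(Conn, Sql) ->
    Rows = parse_result(epgsql:squery(Conn, Sql)),
    jiffy:encode(Rows).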