How to parse a text file to a CSV file using Perl

I am learning Perl and would like to parse a text file into a CSV file using Perl. I have a loop that generates the following text file:
# This loop is what writes the text file
for my $row (@$data) {
    while (my ($key, $value) = each %$row) {
        print "${key}=${value}, ";
    }
    print "\n";
}
Text File Output:
name=Mary, id=231, age=38, weight=130, height=5.05, speed=26.233, time=30,
time=25, name=Jose, age=30, id=638, weight=150, height=6.05, speed=20.233,
age=40, weight=130, name=Mark, id=369, speed=40.555, height=5.07, time=30
CSV File Desired Output:
name,age,weight,height,speed,time
Mary,38,130,5.05,26.233,30
Jose,30,150,6.05,20.233,25
Mark,40,130,5.07,40.555,30
Any good feedback is welcome!

The key part here is how to manipulate your data so as to extract what needs to be printed for each line. Then you are best off using a module to produce valid CSV, and Text::CSV is very good.
A program using an array of small hashrefs, mimicking the data in the question:
use strict;
use warnings;
use feature 'say';
use Text::CSV;
my @data = (
    { name => 'A', age => 1, weight => 10 },
    { name => 'B', age => 2, weight => 20 },
);
my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 });
my $outfile = 'test.csv';
open my $ofh, '>', $outfile or die "Can't open $outfile: $!";
# Header, also used below for order of values for fields
my @hdr = qw(name age weight);
$csv->say($ofh, \@hdr);
foreach my $href (@data) {
    $csv->say($ofh, [ @{$href}{@hdr} ]);
}
close $ofh or die "Can't close $outfile: $!";
The values from the hashrefs are extracted in the desired order using a hashref slice @{$href}{@hdr}, which in general is
@{ expression returning hash reference } { list of keys }
This returns a list of values for the given list of keys, from the hashref that the expression in the block {} must return. That list is then used to build an arrayref (an anonymous array here, using []), which is what the module's say method needs in order to make and print a string of comma-separated values† from that list of values.
Note that a block evaluating to a hash reference is used in place of the hash name that a slice of a plain hash would use. This follows a general rule, stated in perlref:
Anywhere you'd put an identifier (or chain of identifiers) as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type.
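Python has no slice syntax for dictionaries; the closest analog to a hash(ref) slice is probably operator.itemgetter. A small sketch mirroring the Perl example data, for comparison only (the names data, hdr and pick are just illustrative):

```python
from operator import itemgetter

data = [
    {"name": "A", "age": 1, "weight": 10},
    {"name": "B", "age": 2, "weight": 20},
]
hdr = ["name", "age", "weight"]

# itemgetter(*hdr) builds a callable that looks up every key in hdr, in that
# order, much like the Perl slice @{$href}{@hdr}.
pick = itemgetter(*hdr)
rows = [list(pick(row)) for row in data]
print(rows)  # [['A', 1, 10], ['B', 2, 20]]
```

One caveat: with a single key, itemgetter returns the bare value rather than a one-element tuple, so a one-column header would need special handling.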
Some further comments
Look over the attributes that the constructor supports; there are many goodies.
For very simple data you can simply join the fields with a comma and print:
say $ofh join ',', @{$href}{@hdr};
But it is far safer to use a module to construct a valid CSV record. With the right choice of attributes in the constructor, it can handle whatever is legal to embed in fields (some of which can take quite a bit of work to do correctly by hand), and it flags things that aren't.
I list the column names explicitly. Alternatively, you can fetch the keys and then sort them in a desired order, but this will again need a hard-coded list for the sorting.
The program creates the file test.csv and prints to it the expected header and data lines.
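On the point about fetching the keys and sorting them into a desired order: a Python sketch of that idea, where the hard-coded ORDER list is exactly the part that cannot be avoided. The function name ordered_keys and the choice to put unknown keys last, alphabetically, are assumptions for illustration:

```python
# Hard-coded preferred order; this list is the unavoidable "hard-coded" part.
ORDER = ["name", "age", "weight", "height", "speed", "time"]


def ordered_keys(record, order=ORDER):
    """Return the record's keys sorted by their position in `order`.

    Keys not listed in `order` are placed last, alphabetically.
    """
    rank = {key: i for i, key in enumerate(order)}
    return sorted(record, key=lambda k: (rank.get(k, len(order)), k))


record = {"time": 25, "name": "Jose", "age": 30, "id": 638}
print(ordered_keys(record))  # ['name', 'age', 'time', 'id']
```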
† But separating those "values" with commas may involve a whole lot more than what the acronym "CSV" spells out. A variety of things may come between those commas, including commas, newlines, and whatnot. This is why one is best advised to always use a library. Looking over the constructor's options is informative.
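To make the footnote concrete, a short Python sketch (the standard csv module standing in for Text::CSV, so the exact quoting rules are Python's defaults, not necessarily Text::CSV's) showing what a CSV writer does with fields that contain commas, newlines and quotes, and that the result reads back intact:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # defaults: QUOTE_MINIMAL, doubled quotes, '\r\n' line ends
writer.writerow(["plain", "has,comma", "two\nlines", 'say "hi"'])
print(repr(buf.getvalue()))

# Reading it back recovers the original four fields, embedded newline included.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows)
```

A naive join on "," would have produced six fields from those four, with a line break in the middle of the record.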
The following commentary referred to the initial question. In the meantime, the problems it addresses were corrected in the OP's code and the question was updated. I'm still leaving this text for some general comments that may be useful.
As for the code in the question and its output, there is almost certainly an issue with how the data is processed to produce @data, judging by the presence of keys HASH(address) in the output.
That string HASH(0x...) is output when one prints a variable that is a hash reference (which shows none of the hash's contents). Perl handles such a print by stringifying the reference in that way, that is, by producing a printable string out of something more complex.
There is no good reason to have a hash reference as a hash key. So I'd suggest that you review your data and its processing and see how that comes about. (Or briefly show this here, or post another question about it if it isn't feasible to add it to this one.)
One measure you can use to bypass that is to use only a list of keys that you know are valid, as I show above; however, you may then be leaving some outright error unhandled. So I'd rather suggest finding what is wrong.
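For the question's literal task, going from the key=value text output straight to CSV, here is a minimal Python sketch (a comparison only, not the Perl approach above). The helper names parse_line and text_to_csv are hypothetical, the column list is taken from the desired output, and splitting on "," assumes no value itself contains a comma:

```python
import csv
import io

# Column order for the output, taken from the desired CSV in the question.
COLUMNS = ["name", "age", "weight", "height", "speed", "time"]


def parse_line(line):
    """Turn 'name=Mary, id=231, ...,' into a dict; empty pieces are skipped."""
    record = {}
    for piece in line.split(","):
        piece = piece.strip()
        if not piece:
            continue  # tolerates the trailing ", " the loop prints
        key, _, value = piece.partition("=")
        record[key.strip()] = value.strip()
    return record


def text_to_csv(lines, out):
    """Write a header plus one CSV row per non-blank input line."""
    # extrasaction="ignore" drops keys such as "id" that are not in COLUMNS.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if line.strip():
            writer.writerow(parse_line(line))


text = """name=Mary, id=231, age=38, weight=130, height=5.05, speed=26.233, time=30,
time=25, name=Jose, age=30, id=638, weight=150, height=6.05, speed=20.233,
age=40, weight=130, name=Mark, id=369, speed=40.555, height=5.07, time=30"""

buf = io.StringIO()
text_to_csv(text.splitlines(), buf)
print(buf.getvalue())
```

Because each line is turned into a dict first, the order in which the keys appear in the text file does not matter; the column order comes only from COLUMNS.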

Related

How to marshal a predicate from JSON in Prolog?

In Python it is common to marshal objects from JSON. I am seeking similar functionality in Prolog, either SWI-Prolog or Scryer Prolog.
For instance, if we have JSON stating
{'predicate':
{'mortal(X)', ':-', 'human(X)'}
}
I'm hoping to find something like load_predicates(j) and have that data immediately consulted. A version of json.dumps() and loads() would also be extremely useful.
EDIT: For clarity, this will allow interoperability with client applications which will be collecting rules from users. That application is probably not in Prolog, but something like React.js.

I agree with the commenters that it would be easier to convert the JSON data to a .pl file in the proper format first and then load that.
However, you can load the predicates from JSON directly, convert them to a representation that Prolog understands, and use assertz to add them to the knowledge base.
If indeed the data contains all the syntax needed for a predicate (as is the case in the example data in the question), then converting the representation is fairly simple: you just need to concatenate the elements of the list into a string and then create a term out of the string. Note that this assumption skips step 2 in the first comment by Guy Coder.
Note that the Prolog JSON library is rather strict in which format it accepts: only double quotes are valid as string delimiters, and lists with singleton values (i.e., not key-value pairs) need to use the notation [a,b,c] instead of {a,b,c}. So first the example data needs to be rewritten:
{"predicate":
["mortal(X)", ":-", "human(X)"]
}
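Since the OP mentions json.dumps, a hedged aside: a Python client serializing the rule with the standard json module produces this format (double quotes, a JSON array) only if the clause parts are held in a list. Read as Python, the question's {'mortal(X)', ':-', 'human(X)'} is a set literal, which does not keep the order of the parts and cannot be serialized at all:

```python
import json

# A set has no defined order and json.dumps refuses to serialize it.
try:
    json.dumps({"predicate": {"mortal(X)", ":-", "human(X)"}})
except TypeError as err:
    print("rejected:", err)

# A list keeps the clause parts in order and becomes a JSON array,
# with strings always written in double quotes.
payload = json.dumps({"predicate": ["mortal(X)", ":-", "human(X)"]})
print(payload)  # {"predicate": ["mortal(X)", ":-", "human(X)"]}
```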
Then you can load it in SWI-Prolog. Minimal working example:
:- use_module(library(http/json)).
% example fact for testing
human(aristotle).
load_predicate(J) :-
    % open the file
    open(J, read, JSONstream, []),
    % parse the JSON data
    json_read(JSONstream, json(L)),
    % close the stream once the data has been read
    close(JSONstream),
    % check for an occurrence of the predicate key with value L2
    member(predicate=L2, L),
    % concatenate the list into a string
    atomics_to_string(L2, S),
    % create a term from the string
    term_string(T, S),
    % add to knowledge base
    assertz(T).
Example run:
?- consult('mwe.pl').
true.
?- load_predicate('example_predicate.json').
true.
?- mortal(X).
X = aristotle.
Detailed explanation:
The predicate json_read stores the data in the following form:
json([predicate=['mortal(X)', :-, 'human(X)']])
This is a list inside a json term with one element for each key-value pair. The element has the syntax key=value. In the call to json_read you can already strip the json() term and store the list directly in the variable L.
Then member/2 is used to search for the compound term predicate=L2. If you have more than one predicate in the JSON file, you should turn this into a foreach loop or a recursive call to process all predicates in the list.
Since the list L2 already contains a syntactically well-formed Prolog predicate, it can just be concatenated, turned into a term using term_string/2, and asserted. Note that if the predicate is not yet in the required format, you can construct a predicate out of the various pieces using the built-in predicate manipulation functionality; see https://www.swi-prolog.org/pldoc/doc_for?object=copy_predicate_clauses/2 for some pointers.

Processing JSON from a .txt file and converting to a DataFrame in Julia

Cross posting from Julia Discourse in case anyone here has any leads.
I’m just looking for some insight into why the code below returns a DataFrame containing just the first line of my JSON file. If you’d like to try working with the file I’m working with, you can download the aminer_papers_0.zip from the Microsoft Open Academic Graph site; I’m using the first file in that group of files.
using JSON3, DataFrames, CSV
file_name = "path/aminer_papers_0.txt"
json_string = read(file_name, String)
js = JSON3.read(json_string)
df = DataFrame([js])
The resulting DataFrame has just one line, but the column titles are correct, as is the first line. To me the mystery is why the rest isn’t getting processed. I think I can rule out that read() is only reading the first JSON object, because I can index into the resulting object and see many JSON objects.
My first guess was that the newline \n might be causing escape issues, so I tried to use chomp to get rid of the newlines, but couldn’t get it to work.
Anyway - any help would be greatly appreciated!

I think the problem is that the file is in JSON Lines format, and the JSON3 library only returns the first valid JSON value that it finds at the start of a string unless told otherwise.
tl;dr
Call JSON3.read with the keyword argument jsonlines=true.
Why?
By default, JSON3 interprets a string passed to its read function as a single "JSON text", defined by RFC 8259 section 1.3.2:
A JSON text is a serialized value....
(My emphasis on the use of the indefinite singular article "a.") A "JSON value" is defined in section 1.3.3:
A JSON value MUST be an object, array, number, or string, or one of the following three literal names: false, null, true.
A string with multiple JSON values in it is technically multiple "JSON texts." It is up to the parser to determine what part of the string argument you give it is a JSON text, and the authors of JSON3 chose as the default behavior to parse from the start of the string to the end of the first valid JSON value.
In order to get JSON3 to read the string as multiple JSON values, you have to give it the keyword option jsonlines=true, which is documented as:
jsonlines: A Bool indicating that the json_str contains newline delimited JSON strings, which will be read into a JSON3.Array of the JSON values. See jsonlines for reference. [default false]
Example
Take for example this simple string:
two_values = "3.14\n2.72"
Each one of these lines is a valid JSON serialization of a number. However, when passed to JSON3.read, only the first is parsed:
using JSON3
@assert JSON3.read(two_values) == 3.14
Using jsonlines=true, both values are parsed and returned as a JSON3.Array struct:
@assert JSON3.read(two_values, jsonlines=true) == [3.14, 2.72]
Other Packages
The JSON.jl library, which people might use by default given the name, does not implement parsing of JSON Lines strings at all, leaving it up to the caller to properly split the string as needed:
using JSON
JSON.parse(two_values)
# ERROR: Expected end of input
# Line: 1
# Around: ...3.14 2.72...
# ^
A simple way to implement reading multiple values is to use eachline:
@assert [JSON.parse(line) for line in eachline(IOBuffer(two_values))] == [3.14, 2.72]
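For comparison, Python's standard json module exhibits the same three behaviors discussed above: rejecting trailing data, parsing only the first value, and reading line-delimited values one by one. This is only an analogy; it says nothing about how JSON3 or JSON.jl are implemented:

```python
import json

two_values = "3.14\n2.72"

# json.loads insists on exactly one JSON text, like JSON.parse in JSON.jl.
try:
    json.loads(two_values)
except json.JSONDecodeError as err:
    print("loads failed:", err.msg)  # Extra data

# raw_decode parses one value from the start and reports where it stopped,
# similar to JSON3's default of reading up to the first complete value.
value, end = json.JSONDecoder().raw_decode(two_values)
print(value, end)  # 3.14 4

# JSON Lines style: one value per non-blank line.
values = [json.loads(line) for line in two_values.splitlines() if line.strip()]
print(values)  # [3.14, 2.72]
```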

perl decode and encode json preserving order

I have a JSON-encoded chart configuration stored in a text database field, in the form:
{"Name":[[1,1],[1,2],[2,1]],"Name2":[[3,2]]}
The first number of each of these ID pairs is a primary key of another table. I'd like to remove those entries with a trigger when that row is deleted. A PL/Perl function would be good, except that it does not preserve the order of the hash, and the order is important in this project. What can I do (without changing the format of the JSON-encoded config)? Note: the chart name can contain any characters, so it's hard to do this with a regex.

You need to use a streaming JSON decoder, such as JSON::Streaming::Reader. You could then store your JSON as an array of key/value pairs, instead of a hash.
The actual implementation of how you might do this is highly dependent on the structure of your data, but given the simple example provided, here's a simple implementation.
use strict;
use warnings;
use JSON::Streaming::Reader;
use JSON 'to_json';
my $s = '{"Name":[[1,1],[1,2],[2,1]],"Name2":[[3,2]]}';
my $jsonr = JSON::Streaming::Reader->for_string($s);
my @data;
while (my $token = $jsonr->get_token) {
    my ($key, $value) = @$token;
    if ($key eq 'start_property') {
        push @data, { $value => $jsonr->slurp };
    }
}
print to_json(\@data);
The output of this script is always:
[{"Name":[[1,1],[1,2],[2,1]]},{"Name2":[[3,2]]}]

Well, I managed to solve my problem, but it's not a general solution, so it will probably not help the casual reader. Anyway, I got the order of the keys with the help of the database; I called my function like this:
SELECT remove_from_chart(
chart_config,
array(select * from json_object_keys(chart_config::json)),
id);
Then I walked through the keys in the order given by the second parameter, put the results in a new tied (Tie::IxHash) hash, and JSON-encoded it.
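For comparison only, a Python sketch of the same trigger logic. There, json.loads builds dicts, and dicts keep insertion order (guaranteed since Python 3.7), so the chart names come back out in document order without a tied hash. The function name mirrors the SQL call above but is otherwise hypothetical, and since the question does not say whether a chart left with no entries should be dropped, this sketch keeps it:

```python
import json


def remove_from_chart(chart_config, deleted_id):
    """Drop [id, n] pairs whose first element equals deleted_id (hypothetical helper)."""
    config = json.loads(chart_config)  # keys stay in document order
    for name, pairs in config.items():
        # Replacing values of existing keys while iterating is allowed.
        config[name] = [pair for pair in pairs if pair[0] != deleted_id]
    # Compact separators to match the stored format.
    return json.dumps(config, separators=(",", ":"))


cfg = '{"Name":[[1,1],[1,2],[2,1]],"Name2":[[3,2]]}'
print(remove_from_chart(cfg, 1))  # {"Name":[[2,1]],"Name2":[[3,2]]}
```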
It's pretty sad that there is no Perl JSON decoder that can preserve the key order, when everything else I work with, at least on this project, does (PHP, Postgres, Firefox, Chrome).

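JSON objects are unordered. You will have to encode the desired order into your data somehow, for example with an extra key that lists the order:
{"Name":[[1,1],[1,2],[2,1]],"Name2":[[3,2]], "__order__":["Name","Name2"]}
or by using an array of single-key objects instead of one object:
[{"Name":[[1,1],[1,2],[2,1]]},{"Name2":[[3,2]]}]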
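A minimal Python sketch of the "__order__" option, so that the consumer never relies on the decoder's key order. The key name comes from the example above and assumes no real chart is called "__order__"; the helper names are hypothetical:

```python
import json


def dump_with_order(mapping, order):
    """Serialize `mapping` plus an explicit "__order__" list of its keys."""
    return json.dumps({**mapping, "__order__": list(order)})


def load_in_order(text):
    """Return (key, value) pairs in the order recorded under "__order__"."""
    data = json.loads(text)
    order = data.pop("__order__")
    return [(key, data[key]) for key in order]


text = dump_with_order({"Name2": [[3, 2]], "Name": [[1, 1]]}, ["Name", "Name2"])
print(load_in_order(text))  # [('Name', [[1, 1]]), ('Name2', [[3, 2]])]
```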

Maybe you want a streaming JSON decoder, like a SAX parser. If so, see JSON::Streaming::Reader or JSON::SL.

Determining if a perl scalar is a string or integer

From what I've read, Perl doesn't have real types for integers or strings; to Perl, any $variable is a scalar.
Recently I had a lot of trouble with an old script that was generating JSON objects required by another process, for which the values inside the JSON must be integers. After some debugging, I found that because of a simple print statement,
JSON::encode_json
was generating a string instead of an integer. Here's my example:
use strict;
use warnings;
use JSON;
my $score1 = 998;
my $score2 = 999;
print "score1: ".$score1."\n";
my %hash_object = ( score1 => $score1, score2 => $score2 );
my $json_str = encode_json(\%hash_object); # This will work now
print "$json_str";
And it outputs:
score1: 998
{"score1":"998","score2":999}
Somehow Perl variables do have a type, or at least that is how JSON::encode_json sees it.
Is there a way to find this type programmatically, and why does the type change after an operation like the concatenation above?

First, that premise is somewhat incorrect. Yes, every value is a scalar, but scalars have separate numeric and string values. That's all you need to know for this particular question, so I'll leave the details out.
What you really need is to take a look at the MAPPING / PERL -> JSON / simple scalars section of the documentation for the JSON module:
JSON::XS and JSON::PP will encode undefined scalars as JSON null
values, scalars that have last been used in a string context before
encoding as JSON strings, and anything else as number value.
(The docs are actually slightly wrong here. It is not "last been used in string context"; it should be "was ever used in string context". Once a scalar obtains a string value, it won't go away until you explicitly write a new number into it.)
You can force the type to be a string by stringifying it:
my $x = 3.1; # some variable containing a number
"$x"; # stringified
$x .= ""; # another, more awkward way to stringify
print $x; # perl does it for you, too, quite often
Incidentally, that last comment explains exactly why your 998 turned into a string.
You can force the type to be a number by numifying it:
my $x = "3"; # some variable containing a string
$x += 0; # numify it, ensuring it will be dumped as a number
$x *= 1; # same thing, the choice is yours.

Perl scalars may hold both a string and a numeric value simultaneously. Either of those is valid as a JSON value, so the string value is always used if there is one.
This is a rare problem, as most software should accept either a quoted or an unquoted representation of a number. Everything in a JSON document is originally a string, and significant data may be lost by requiring the module to apply an arbitrary string → float conversion before it hands you the data.
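The point about data loss in a string → float conversion can be illustrated with Python's json module (a comparison only, not a statement about the Perl modules): by default a long decimal literal is squeezed into a binary float, while the parse_float hook can hand the original digits to Decimal instead. The sample number is arbitrary:

```python
import json
from decimal import Decimal

doc = '{"amount": 12345678901234567.1}'

# Default: the number becomes a binary float and the trailing digits are lost.
as_float = json.loads(doc)["amount"]
print(as_float)

# parse_float receives the original digit string, so Decimal keeps it exactly.
as_decimal = json.loads(doc, parse_float=Decimal)["amount"]
print(as_decimal)  # 12345678901234567.1
```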
But if you really need to write a numeric value without quotation marks then you can coerce a scalar to behave as a numeric type: just add zero to it.
Likewise, if a scalar variable holds only a numeric value, then you can force Perl to generate and store its equivalent string just by using it as a string. The most obvious method for me is to enclose it in quotation marks.
This code appears to do what you want:
use strict;
use warnings 'all';
use feature 'say';
use JSON;
my ($score1, $score2) = (998, 999);
print "score1: $score1\n";
my %data = ( score1 => 0+$score1, score2 => 0+$score2 );
say encode_json(\%data);
output
score1: 998
{"score1":998,"score2":999}

CGI::Application::Plugin::JSON - json_body returns backwards

I was wondering if anyone knew why this return is backwards with CGI::Application::Plugin::JSON
sub {
    my ($self) = @_;
    my $q = $self->query;
    return $self->json_body({ result => '1', message => 'I should be AFTER result'} );
}
The Output is as follows:
{"message":"I should be AFTER result","result":"1"}
I would assume it would format the JSON left to right from the key/value pairs. Remembering that it will be backwards would be okay, but I have a lot of returns to handle, and the validation on the client side is done with the 'result' value, so if I am just missing something, I would like to have it output the pairs just as they are input.
EDIT:
Also, I just noticed it is not returning a JSON Boolean type, as "result":"1" will deserialize as a string object and not a JSON Boolean. Is there a way to have it output "result":1?
Thanks for any help I can get with this one.

I would assume it would format the JSON left to right from the key/value pairs
You're confusing the list you assigned to the hash with the hash itself. Hashes don't have a left and a right; they have an array of linked lists.
You're getting the order in which the elements are found in the hash. You can't control that order as long as you use a hash.
If you really do need to have the fields in a specific order (which would be really weird), you could try using something that looks like a hash but remembers insertion order (like Tie::IxHash).
remembering it will be backwards is okay
Not only are they not "backwards", the order isn't even predictable.
$ perl -MJSON -E'say encode_json {a=>1,b=>2,c=>3} for 1..3'
{"b":2,"c":3,"a":1}
{"c":3,"a":1,"b":2}
{"a":1,"c":3,"b":2}
Is there a way to have it output "result":1
Use a number instead of the string '1':
result => 1
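Note that "result":1 is a JSON number, not a JSON Boolean. The three cases are easiest to tell apart in a language with separate types; a Python sketch for comparison (Perl would need a dedicated true/false value from its JSON module to emit a real Boolean, which is outside the scope of this answer):

```python
import json

# The same key with three different Python types.
as_string = json.dumps({"result": "1"})    # JSON string
as_number = json.dumps({"result": 1})      # JSON number
as_boolean = json.dumps({"result": True})  # JSON boolean

for encoded in (as_string, as_number, as_boolean):
    print(encoded)
```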
