If JSON contains $myid then (Perl) - json

I'm trying to loop through JSON:
my $cards = $json_obj->decode( $jsoncards->content );
foreach my $card ( @$cards )
{
print Dumper $card->{idMembers};
if ( $card->{idMembers} =~ $myid )
{
print $card->{name} . "\n";
}
}
The output from print Dumper $card->{idMembers}; is:
$VAR1 = [
'50e442a195105cde670743e4',
'50fd66804825017002070285',
'50f71f02a30d2a8c0d07d10d'
];
How do I compare to those ids?

The bind operator =~ treats its LHS as a string and the RHS as a pattern. The stringification of an arrayref looks like ARRAY(0x12ABF14), so this isn't useful.
We have two possibilities to match the $myid against each member of the array:
The grep EXPR, LIST builtin. Selects all elements where the expression returns a true value. If the count of the returned items is ≥ 1, then a matching element was found.
if ( grep $myid eq $_, @{ $card->{idMembers} }) { do stuff }
# or: grep /\Q$myid/, ... if you don't want string equality
Use the smartmatch operator ~~ in the member-of meaning:
if ( $myid ~~ $card->{idMembers} ) { do stuff }
This is subject to multiple caveats: (1) It is only usable since Perl v5.10.1, so code using smartmatch should at least use 5.010001. (2) Smartmatch was re-labeled as experimental in Perl 5.18, and may change without much notice. (3) If the idMembers entry is not an array, smartmatch may hide the error.
Smartmatch depends on the type of both operands. If you want to select all entries that contain $myid as a substring, you should probably pass it as a regex object: qr/\Q$myid/ ~~ .... Otherwise, it will likely test for equality.
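For completeness, here is a minimal sketch of the grep approach wrapped in a small helper. The name has_member and the card name are made up for illustration; the ids are the ones from the Dumper output in the question. The explicit ref check is one way to avoid caveat (3), assuming you would rather get "no match" than a silent surprise when idMembers is not an array.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the number of
# elements in the array referenced by $members that are string-equal to $id.
sub has_member {
    my ($id, $members) = @_;
    return 0 unless ref($members) eq 'ARRAY';   # a non-array entry never matches
    return scalar grep { $_ eq $id } @$members;
}

my $card = {
    name      => 'Example card',                # made-up name
    idMembers => [
        '50e442a195105cde670743e4',
        '50fd66804825017002070285',
        '50f71f02a30d2a8c0d07d10d',
    ],
};
my $myid = '50fd66804825017002070285';

print "$card->{name}\n" if has_member($myid, $card->{idMembers});
```

Because grep runs in scalar context here, the return value is a count, so it can be used directly as a boolean.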

Related

Using Text::CSV on a String Containing Quotes

I have pored over this site (and others) trying to glean the answer for this but have been unsuccessful.
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1, auto_diag => 1 } );
$line = q(data="a=1,b=2",c=3);
my $csvParse = $csv->parse($line);
my @fields = $csv->fields();
for my $field (@fields) {
print "FIELD ==> $field\n";
}
Here's the output:
# CSV_XS ERROR: 2034 - EIF - Loose unescaped quote @ rec 0 pos 6 field 1
FIELD ==>
I am expecting 2 array elements:
data="a=1,b=2"
c=3
What am I missing?
You may get away with using Text::ParseWords. Since you are not using real csv, it may be fine. Example:
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
my $line = q(data="a=1,b=2",c=3);
my @fields = quotewords(',', 1, $line);
print Dumper \@fields;
This will print
$VAR1 = [
'data="a=1,b=2"',
'c=3'
];
As you requested. You may want to test further on your data.
Your input data isn't "standard" CSV, at least not the kind that Text::CSV expects and not the kind that things like Excel produce. An entire field has to be quoted or not at all. The "standard" encoding of that would be "data=""a=1,b=2""",c=3 (which you can see by asking Text::CSV to print your expected data using say).
If you pass the allow_loose_quotes option to the Text::CSV constructor, it won't error on your input, but it won't consider the quotes to be "protecting" the comma, so you will get three fields: data="a=1, b=2", and c=3.

getting specific filename from bash

So I have a perl module that uses a bash command to obtain the file(s) with certain "table" names. In my specific case, it is looking for tables with the name "event", but I need this to work with all names too.
Currently, I have the following code in my perl script to obtain MYI files with the name table, and I am receiving not only event_* but also event_extra_data_* as well. For my example, I only need the 2nd table that exists in my database for event_. As my test info, I have, currently,
event_1459161160_0
event_1459182760_0
event_extra_data_1459182745_0
event_extra_data_1459182760_0
which are partitioned tables from tables "event" and event_extra_data which is the value that the $table variable sees below.
Anyways, my question is, how do i limit this to only receiving event_1459182760_0.MYI and not event_extra_data_1459182760_0.MYI which it is currently getting?
elsif ($sql =~ /\{LAST\}/i )
{
$cmd = 'ls -1 /var/lib/mysql/sfsnort/'.$table.'_*MYI | grep -v template | tail -n1 | cut -d"/" -f6 | cut -d"." -f1';
$value = `$cmd`;
print "Search Value: $value\n";
if ($value eq "")
{
$sql = ""; # same as with FIRST
}
else
{
$sql =~ s/\{LAST\}/$value/g;
}
}
Don't parse ls - there's no point, and it's prone to causing problems.
I would point out this - the glob function within perl lets you use a limited number of "regex-like" patterns. (Note - they aren't regexes, so don't get them mixed up).
foreach my $filename ( glob "event_[0-9]*" ) {
#do something with $filename
}
If you're just after the last - when sorted numerically:
my ( $last ) = reverse sort glob "event_[0-9]*";
Given you have a single path, then you should be able to:
my ( $last ) = reverse sort glob "/var/lib/mysql/sfsnort/event_[0-9]*.MYI";
Note - that this works, assuming you're working with time() numeric values - it's doing an alphanumeric sort (and on directory names too).
If that isn't a valid assumption, you'll need a custom sort - which is quite easy, you can feed sort a subroutine to sort by.
Either:
sort { my ($a1) = $a =~ /(\d+)/; my ($b1) = $b =~ /(\d+)/; $b1 <=> $a1 }
To extract the first 'string of digits' from the path. (note - also includes directories).
Or use the -M file test:
sort { -M $a <=> -M $b }
Which will read modification time from the file (technically -M is age in days).
You can remove the reverse if you custom sort, just by swapping $a and $b.
Though I think this would be better done all in perl, to answer your specific question about how to get event_* but not event_extra*: you could of course add a pattern to your grep to filter those out, or you could use a narrower glob, like ${table}_[0-9]*, if there's always an _ and then a digit after the table name. (Inside a Perl string, write ${table}_ rather than $table_, which perl would read as a variable named $table_.)
In perl you could do it something like the following though:
opendir( DIR, '/var/lib/mysql/sfsnort/' );
my @files = sort grep { /^\Q$table\E_\d/ } readdir( DIR ); # \Q...\E keeps the _ out of the variable name
closedir( DIR );
$files[$#files] =~ /(^[^.]+)/;
my $value = $1;
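Putting the pieces above together, here is a rough sketch of picking the newest partition for a table by comparing the embedded timestamps numerically. The helper name latest_partition is hypothetical, and the sketch assumes the files are named <table>_<timestamp>_<n>.MYI, as in the listing from the question (with the .MYI extension added).

```perl
use strict;
use warnings;

# Hypothetical helper: from a list of file names, return the base name of the
# partition of $table with the largest timestamp, or nothing if none match.
sub latest_partition {
    my ($table, @names) = @_;
    my @hits =
        map  { $_->[1] }                       # back to plain file names
        sort { $a->[0] <=> $b->[0] }           # numeric sort on the timestamp
        map  { /^\Q$table\E_(\d+)_\d+\.MYI$/ ? [ $1, $_ ] : () } @names;
    return unless @hits;
    (my $base = $hits[-1]) =~ s/\.MYI$//;      # strip the extension
    return $base;
}

my @names = qw(
    event_1459161160_0.MYI
    event_1459182760_0.MYI
    event_extra_data_1459182745_0.MYI
    event_extra_data_1459182760_0.MYI
);

print latest_partition('event', @names), "\n";   # event_1459182760_0
```

In the real script the list of names would come from readdir or glob on the data directory instead of a hard-coded list. Anchoring the pattern on the table name followed by _ and digits is what keeps event from also matching event_extra_data.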

Why does the JSON module quote some numbers but not others?

We recently switched to the new JSON2 perl module.
I thought all and everything gets returned quoted now.
But i encountered some cases in which a number (250) got returned as unquoted number in the json string created by perl.
Out of curiosity:
Does anyone know why such cases exist and how the json module decides if to quote a value?
It will be unquoted if it's a number. Without getting too deeply into Perl internals, something is a number if it's a literal number or the result of an arithmetic operation, and it hasn't been stringified since its numeric value was produced.
use v5.10; # enables say
use JSON::XS;
my $json = JSON::XS->new->allow_nonref;
say $json->encode(42); # 42
say $json->encode("42"); # "42"
my $x = 4;
say $json->encode($x); # 4
my $y = "There are $x lights!";
say $json->encode($x); # "4"
$x++; # modifies the numeric value of $x
say $json->encode($x); # 5
Note that printing a number isn't "stringifying it" even though it produces a string representation of the number to output; print $x doesn't cause a number to be a string, but print "$x" does.
Anyway, all of this is a bit weird, but if you want a value to be reliably unquoted in JSON then put 0 + $value into your structure immediately before encoding it, and if you want it to be reliably quoted then use "" . $value or "$value".
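As a small sketch of that last piece of advice, here is the 0 + $value and "" . $value trick applied right before encoding. It uses JSON::PP (a core module) instead of JSON::XS so that it runs without extra installs; the variable and key names are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;   # core module, used here in place of JSON::XS

# canonical sorts the keys so the output is stable
my $json = JSON::PP->new->canonical;

my $from_text = "250";   # e.g. a value that arrived as a string
my $from_math = 250;     # a plain number

my $out = $json->encode({
    as_number => 0 + $from_text,    # forced numeric: emitted unquoted
    as_string => "" . $from_math,   # forced string: emitted quoted
});

print "$out\n";   # {"as_number":250,"as_string":"250"}
```

The coercion produces a fresh value each time, so it does not matter how $from_text or $from_math were used earlier in the program.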
You can force it into a string by doing something like this:
$number_str = '' . $number;
For example:
perl -MJSON -le 'print encode_json({foo=>123, bar=>"".123})'
{"bar":"123","foo":123}
It looks like older versions of JSON have autoconvert functionality that can be set. Did you not have $JSON::AUTOCONVERT set to a true value?

HTML parser using perl

I'm trying to parse the html file using perl script. I'm trying to grep all the text with html tag p. If I view the source code the data is written in this format.
<p> Metrics are all virtualization specific and are prioritized and grouped as follows: </p>
Here is the following code.
use HTML::TagParser();
use URI::Fetch;
//my @list = $html->getElementsByTagName( "p" );
foreach my $elem ( @list ) {
my $tagname = $elem->tagName;
my $attr = $elem->attributes;
my $text = $elem->innerText;
push (@array,"$text");
foreach $_ (@array) {
# print "$_\n";
print $html_fh "$_\n";
chomp ($_);
push (@array1, "$_");
}
}
}
$end = $#array1+1;
print "Elements in the array: $end\n";
close $html_fh;
The problem that I'm facing is that the output which is generated is 4.60 Mb and lot of the array elements are just repetition sentences. How can I avoid such repetition? Is there any other efficient way to grep the lines which I'm interested. Can anybody help me out with this issue?
The reason you are seeing duplicated lines is that you are printing your entire array once for every element in it.
foreach my $elem ( @list ) {
my $tagname = $elem->tagName;
my $attr = $elem->attributes;
my $text = $elem->innerText;
push (@array,"$text"); # this array is printed below
foreach $_ (@array) { # This is inside the other loop
# print "$_\n";
print $html_fh "$_\n"; # here comes the print
chomp ($_);
push (@array1, "$_");
}
}
So for example, if you have an array "foo", "bar", "baz", it would print:
foo # first iteration
foo # second
bar
foo # third
bar
baz
So, to fix your duplication errors, move the second loop outside the first one.
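A minimal sketch of that fix, using a hard-coded list in place of the $elem->innerText values and STDOUT in place of $html_fh (both stand-ins, not from the original code):

```perl
use strict;
use warnings;

my @texts = ('foo', 'bar', 'baz');   # stand-ins for the innerText values

# First loop: only collect the text.
my @array;
for my $text (@texts) {
    push @array, $text;
}

# Second loop, now outside the first: each element is handled exactly once.
my @array1;
for my $line (@array) {
    print "$line\n";
    push @array1, $line;
}

print "Elements in the array: " . @array1 . "\n";
```

With the loops separated, the output has one line per element instead of the growing repetition shown above.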
Some other notes:
You should always use these two pragmas:
use strict;
use warnings;
They will provide more help than any other single thing that you can do. The short learning curve associated with fixing the errors that appear more than makes up for the massively reduced time spent debugging.
//my @list = $html->getElementsByTagName( "p" );
Comments in perl start with #. Not sure if this is a typo, because you use this array below.
foreach my $elem ( @list ) {
You don't need to actually store the tags into an array unless you need an array. This is an intermediate variable only in this case. You can simply do the following (note that for and foreach are exactly the same):
for my $elem ($html->getElementsByTagName("p")) {
These variables are also intermediate, and two of them unused.
my $tagname = $elem->tagName;
my $attr = $elem->attributes;
my $text = $elem->innerText;
push (@array,"$text");
Also note that you never have to quote a variable this way. You can simply do this:
push @array, $elem->innerText;
foreach $_ (@array) {
The $_ variable is used by default, no need to specify it explicitly.
print $html_fh "$_\n";
chomp ($_);
push (@array1, "$_");
I'm not sure why you chomp the variable after printing it but before storing it in the second array; that doesn't make sense to me. Also, the second array will contain exactly the same elements as the first, only duplicated.
$end = $#array1+1;
This is another intermediate variable, and also it can be simplified. The $# sigil will give you the index of the last element, but the array itself in scalar context will give you the size of it:
$end = @array1; # size = last index + 1
But you can do this in one go:
print "Elements in the array: " . @array1 . "\n";
Note that using the concatenation operator . here enforces scalar context on the array. If you had used the comma operator , it would have list context, and the array would have been expanded into a list of its elements. This is a typical way to manipulate by context.
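A short sketch of how context changes what the array yields; the variable names other than @array1 are made up for illustration:

```perl
use strict;
use warnings;

my @array1 = ('foo', 'bar', 'baz');

my $count      = @array1;                    # scalar context: number of elements
my $last_index = $#array1;                   # index of the last element
my $with_dot   = "n=" . @array1;             # . forces scalar context
my $with_comma = join '', "n=", @array1;     # , keeps list context

print "$count $last_index\n";   # 3 2
print "$with_dot\n";            # n=3
print "$with_comma\n";          # n=foobarbaz
```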
close $html_fh;
Explicitly closing a file handle is not required, as it will be closed automatically when the script ends.
If you use Web::Scraper instead, your code gets even simpler and clearer (as long as you are able to construct CSS selectors or XPath queries):
#!/usr/bin/env perl
use strict;
use warnings qw(all);
use URI;
use Web::Scraper;
my $result = scraper {
process 'p',
'paragraph[]' => 'text';
}->scrape(URI->new('http://www.perl.org/'));
for my $test (@{$result->{paragraph}}) {
print "$test\n";
}
print "Elements in the array: " . (scalar @{$result->{paragraph}});
Here is another way to get all the content from between <p> tags, this time using Mojo::DOM part of the Mojolicious project.
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10; # say
use Mojo::DOM;
my $html = <<'END';
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<div>Should not find this</div>
<p>Paragraph 3</p>
END
my $dom = Mojo::DOM->new($html);
my @paragraphs = $dom->find('p')->map('text')->each; # map('text') replaces pluck('text'), which newer Mojolicious releases removed
say for @paragraphs;

Perl: HTML::PrettyPrinter - Handling self-closing tags

I am a newcomer to Perl (Strawberry Perl v5.12.3 on Windows 7), trying to write a script to aid me with a repetitive HTML formatting task. The files need to be hand-edited in future and I want them to be human-friendly, so after processing using the HTML package (HTML::TreeBuilder etc.), I am writing the result to a file using HTML::PrettyPrinter. All of this works well and the output from PrettyPrinter is very nice and human-readable. However, PrettyPrinter is not handling self-closing tags well; basically, it seems to treat the slash as an HTML attribute. With input like:
<img />
PrettyPrinter returns:
<img /="/" >
Is there anything I can do to avoid this other than preprocessing with a regex to remove the slash?
Not sure it will be helpful, but here is my setup for the pretty printing:
my $hpp = HTML::PrettyPrinter->new('linelength' => 120, 'quote_attr' => 1);
$hpp->allow_forced_nl(1);
my $output = new FileHandle ">output.html";
if (defined $output) {
$hpp->select($output);
my $linearray_ref = $hpp->format($internal);
undef $output;
$hpp->select(undef);
}
You can print formatted human readable html with TreeBuilder method:
$h = HTML::TreeBuilder->new_from_content($html);
print $h->as_HTML('',"\t");
But if you still prefer this buggy PrettyPrinter, try removing the problem tags; no idea why someone would need ...
$h = HTML::TreeBuilder->new_from_content($html);
while(my $n = $h->look_down(_tag => 'img', 'src' => undef)) { $n->delete }
UPD:
Well... then we can fix PrettyPrinter. It's a pure-Perl module, so let's see...
No idea where Perl modules live on Windows; for me it's /usr/local/share/perl/5.10.1/HTML/PrettyPrinter.pm
Maybe not an elegant solution, but it will work, I hope.
This sub formats attribute/value pairs; with a little fix it will add a single '/' at the end.
~line 756 in PrettyPrinter.pm
I've marked the lines that I added with ###<<<<<< at the end
#
# format the attributes
#
sub _attributes {
my ($self, $e) = @_;
my @result = (); # list of ATTR="value" strings to return
my $self_closing = 0; ###<<<<<<
my @attrs = $e->all_external_attr(); # list (name0, val0, name1, val1, ...)
while (@attrs) {
my ($a,$v) = (shift @attrs,shift @attrs); # get current name, value pair
if($a eq '/') { ###<<<<<<
$self_closing=1; ###<<<<<<
next; ###<<<<<<
} ###<<<<<<
# string for output: 1. attribute name
my $s = $self->uppercase? "\U$a" : $a;
# value part, skip for boolean attributes if desired
unless ($a eq lc($v) &&
$self->min_bool_attr &&
exists($HTML::Tagset::boolean_attr{$e->tag}) &&
(ref($HTML::Tagset::boolean_attr{$e->tag})
? $HTML::Tagset::boolean_attr{$e->tag}{$a}
: $HTML::Tagset::boolean_attr{$e->tag} eq $a)) {
my $q = '';
# quote value?
if ($self->quote_attr || $v =~ tr/a-zA-Z0-9.-//c) {
# use single quote if value contains double quotes but no single quotes
$q = ($v =~ tr/"// && $v !~ tr/'//) ? "'" : '"'; # catch emacs ");
}
# add value part
$s .= '='.$q.(encode_entities($v,$q.$self->entities)).$q;
}
# add string to resulting list
push @result, $s;
}
push @result,'/' if $self_closing; ###<<<<<<
return @result; # return list ('attr="val"','attr="val"',...);
}