I have a problem in my script and I have no idea how to resolve it.
When I launch the script:
boak@boak-LX:~/Documents$ perl dl-sound.pl --url http://soundcloud.com/alexorion/bigger-room-radio-015
Missing or empty input at dl-sound.pl line 100.
The script:
sub fetch_music_info {
my ($self, $music_url) = @_;
$music_url ||= $self->{url};
my $page = $self->_get_content($music_url);
my $jsmusic = $1 if ($page =~ m{window.SC.bufferTracks.push\((.*)}i);
$jsmusic =~ s/;//g if defined($jsmusic);
$jsmusic =~ s/\)//g if defined($jsmusic);
my $music_info = JSON::Tiny::decode_json($jsmusic);
return $music_info;
}
It is unlikely to be the cause of your problem, but you shouldn't make declarations conditional. The behaviour is undefined, and it can lead to all sorts of nonsense.
So this
my $jsmusic = $1 if $page =~ m{window.SC.bufferTracks.push\((.*)}i;
should be
my $jsmusic;
$jsmusic = $1 if $page =~ /window\.SC\.bufferTracks\.push\(([^)]*)/i;
Note also that I have changed the regex to escape the dots and to capture only the characters up to the next closing parenthesis. That means the following substitutions shouldn't be necessary, if I understand your data properly.
Update
In fact, looking again, the whole purpose of your subroutine is invalidated if the pattern doesn't match, so you should write it more like
sub fetch_music_info {
my ($self, $music_url) = @_;
$music_url ||= $self->{url};
my $page = $self->_get_content($music_url);
if ($page =~ /window\.SC\.bufferTracks\.push\(([^)]*)/i) {
return JSON::Tiny::decode_json($1);
}
else {
die "Music Info not found";
}
}
I guess that $jsmusic is not defined here:
my $music_info = JSON::Tiny::decode_json($jsmusic);
Change to:
my $music_info;
$music_info = JSON::Tiny::decode_json($jsmusic) if defined $jsmusic;
return $music_info;
I agree with Borodin, but there's a more Perlish way to write this:
my ($jsmusic) = ($page =~ m{window.SC.bufferTracks.push\((.*)}i);
That's because, in list context, the =~ operator returns a list of the captured groups ($1, $2, etc.).
In this case, $jsmusic will always be declared, but it will be undefined if the regexp doesn't match. You need the parentheses around ($jsmusic) to put the match in list context.
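To make the list-context behaviour concrete, here is a minimal sketch. The helper name extract_push_arg and the sample strings are made up for illustration; only the regex comes from the answers above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the text
# captured after "push(" or undef when the pattern does not match.
sub extract_push_arg {
    my ($page) = @_;
    # In list context a successful match returns its captures; a failed
    # match returns an empty list, which leaves $arg undefined.
    my ($arg) = $page =~ /window\.SC\.bufferTracks\.push\(([^)]*)/i;
    return $arg;
}

my $found   = extract_push_arg('window.SC.bufferTracks.push({"id":1});');
my $missing = extract_push_arg('<html>no track data</html>');

print defined $found   ? "found: $found\n"   : "not found\n";
print defined $missing ? "found: $missing\n" : "not found\n";
```

Checking defined() on the result before handing it to decode_json avoids the "Missing or empty input" error from the question.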
I used an anonymous hash to pass value from two different subroutines to a new subroutine. But, now I'm not able to perform calculations using the passed variables.
use warnings;
use strict;
use feature 'say';
use DBI;
use autodie;
use Data::Dumper;
use CGI;
print "Enter sequence";
my $seq = <STDIN>;
chomp $seq;
$len = length $seq;
my $f = nuc($seq);
perc({ len => $len });
sub nuc {
my ($c) = @_;
chomp $c;
my $len = length $c;
for (my $i = 0; $i< = $len; $i++) {
my $seq2 = substr($c, $i, 1);
$nuc=$nuc . $seq2;
chomp $nuc;
}
my $l = perc({nuc => $nuc});
}
sub perc {
my $params = shift;
my $k = $params->{nuc};
my $w = $params->{len};
my $db = "hnf1a";
my $user = "root";
my $password = "";
my $host = "localhost";
my $dbh = DBI->connect("DBI:mysql:database=$db:$host",$user,$password);
my $sth = $dbh->prepare('SELECT COUNT(*) FROM mody where nm = ?');
for (1..100) {
$sth->execute(int(rand(10)));
}
chomp (my $input = $k);
my @num = split /':'/, $input;
for my $num(@num) {
say "rows matching input nuc <$num>:";
$sth->execute($num);
my $count = $sth->fetchrow_array;
say "$count";
$u += $count;
}
}
$h = $u / $w;
print $h;
I passed the variables : $nuc and $len to the last subroutine 'perc' by declaring an anonymous hash.
When I use these variables to perform calculations I don't get a proper answer.
For the above division performed I got a statement as 'Illegal division'.
Please help me out. Thanks in advance.
You are making two separate calls to perc, each with only one of the required values in the hash. You can't do that: the subroutine won't "remember" a value passed to it across separate calls unless you write the code to do that.
You need to collect all the values and pass them in a single call to perc.
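As a rough sketch of what "a single call" could look like: the sub name ratio and the key total are invented for this example and stand in for perc and whatever value it computes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for perc(): both values arrive in one hash ref,
# so nothing has to be remembered between calls.
sub ratio {
    my ($params) = @_;
    my $total = $params->{total};
    my $len   = $params->{len};
    die "len must be non-zero\n" unless $len;
    return $total / $len;
}

my $seq = 'ACGT';
print ratio({ total => 2, len => length($seq) }), "\n";
```

The guard on $len is there because dividing by an undefined or zero length is exactly what produces the "Illegal division" message in the question.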
There are rather a lot of misunderstandings here. Let's go through your code.
use CGI;
Using CGI.pm is a bit dated, but it's not a terrible idea if you're writing a CGI program. But this isn't a CGI program, so this isn't necessary.
print "Enter sequence";
my $seq = <STDIN>;
chomp $seq;
$len = length $seq;
my $f = nuc($seq);
This looks OK. You prompt the user, get some input, remove the newline from the end of the input, get the length of the input and then pass your input into nuc().
So, let's look at nuc() - which could probably have a better name!
sub nuc {
my ($c) = @_;
chomp $c;
my $len = length $c;
for (my $i = 0; $i< = $len; $i++) {
my $seq2 = substr($c, $i, 1);
$nuc=$nuc . $seq2;
chomp $nuc;
}
my $l = perc({nuc => $nuc});
}
You get the parameter that has been passed in and remove the newline from the end of it (which does nothing as this is $seq which has already had its newline removed). You then get the length of this string (again!).
Then it gets very strange. Firstly, there's a syntax error (< = should be <=). Then you use a C-style for loop together with substr() to... well, basically you just copy $c to $nuc in a really inefficient manner. So this subroutine could be written as:
sub nuc {
my ($c) = @_;
$nuc = $c;
my $l = perc({ nuc => $nuc });
}
Oh, and I don't know why you chomp($nuc) each time round the loop.
Two more strange things. Firstly, you don't declare $nuc anywhere, and you have use strict in your code. Which means that this code doesn't even compile. (Please don't waste our time with code that doesn't compile!) And secondly, you don't explicitly return a value from nuc(), but you store the return value in $f. Because of the way Perl works, this subroutine will return the value in $l. But it's best to be explicit.
Then there's your perc() subroutine.
sub perc {
my $params = shift;
my $k = $params->{nuc};
my $w = $params->{len};
my $db = "hnf1a";
my $user = "root";
my $password = "";
my $host = "localhost";
my $dbh = DBI->connect("DBI:mysql:database=$db:$host",$user,$password);
my $sth = $dbh->prepare('SELECT COUNT(*) FROM mody where nm = ?');
for (1..100) {
$sth->execute(int(rand(10)));
}
chomp (my $input = $k);
my @num = split /':'/, $input;
for my $num(@num) {
say "rows matching input nuc <$num>:";
$sth->execute($num);
my $count = $sth->fetchrow_array;
say "$count";
$u += $count;
}
}
You get the hash ref which is passed in and store that in $params. You then extract the nuc and len values from that hash and store them in variables called $k and $w (you really need to improve your variable and subroutine names!). But each call to perc only has one of those values set, so only one of your two variables gets a value; the other will be undef.
So then you connect to the database. And you run a select query a hundred times passing in random integers between 0 and 9. And ignore the value returned from the select statement. Which is bizarre and pointless.
Eventually, you start doing something with one of your input parameters, $k (the other, $w, is completely ignored). You copy it into another scalar variable before splitting it into an array. You then run the same SQL select statement once for each element in that array and add the number you get back to the running total in $u. And $u is another variable that you never declare, so (once again) this code doesn't compile.
Outside of your subroutines, you then do some simple maths with $u (an undeclared variable) and $w (a variable that was declared in a different scope) and store the result in $h (another undeclared variable).
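For comparison, here is a small sketch of the same shape of calculation with every variable declared and the total returned explicitly. The %count_for table is a made-up stand-in for the database query, and total_count is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical lookup table standing in for the SELECT COUNT(*) query.
my %count_for = (A => 3, C => 1, G => 0, T => 2);

sub total_count {
    my ($seq) = @_;
    my $total = 0;                        # declared inside the sub
    for my $base (split //, $seq) {
        $total += $count_for{$base} // 0; # unknown bases count as 0
    }
    return $total;                        # explicit return value
}

my $seq   = 'ACGT';
my $total = total_count($seq);
my $frac  = $total / length($seq);        # both values are in scope here
print "$frac\n";
```

Because the total comes back as a return value, the division happens where both operands are visible, and use strict has nothing to complain about.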
I really don't understand what this code is supposed to do. And, to be honest, I don't think you do either. If you're at school, then you need to go back to your teacher and say that you have no idea what you are doing. If you're in a job, you need to tell your boss that you're not the right person for this task.
Either way, if you want to be a programmer, you need to go right back to the start and cover the very basics again.
Hope some Perl gurus out there can help me out here. Basically my issue is when a JSON string starts with a "[" instead of a "{", Perl doesn't treat the variable as a hash after I use decode_json.
Here's a sample code.
#!/usr/bin/perl
use JSON;
use Data::Dumper;
$string1 = '{"Peti Bar":{"Literature":88,"Mathematics":82,"Art":99},"Foo Bar":{"Literature":67,"Mathematics":97}}';
$string = '[{"ActionID":5,"ActionName":"TEST- 051017"},{"ActionID":10,"ActionName":"Something here"},{"ActionID":13,"ActionName":"Some action"},{"ActionID":141,"ActionName":"Email Reminder"}]';
print "First string that starts with \"{\" below:\n$string1\n\n";
my $w = decode_json $string1;
my $count = keys %$w;
print "printing \$count's value -> $count\n\n";
print "Second string starts with \"[\" below:\n$string\n\n";
my $x = decode_json $string;
my $count2 = keys %$x;
print "printing \$count2's value -> $count2\n\n";
Below is the script output.
Both $w and $x work, though. It's just that I have to use keys $x instead of keys %$x on the other JSON string.
Now the issue with using that is I get a "keys on reference is experimental at tests/jsontest.pl" error. It won't stop the script, but I'm worried about future compatibility issues.
What's the best way to approach this?
Use the ref function to determine what type the reference is. See perldoc -f ref.
my $w = decode_json $string1;
my $count = 1;
if( my $ref = ref( $w ) ){
if( $ref eq 'HASH' ){
$count = keys %$w;
}elsif( $ref eq 'ARRAY' ){
$count = scalar @$w;
}else{
die "invalid reference '$ref'\n";
}
}
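The same check can be wrapped in a small sub and run against both kinds of input. This is only a sketch: top_level_count is a hypothetical name, and it uses JSON::PP (which ships with recent Perls) rather than the JSON module from the question, on the assumption that decode_json behaves the same for these inputs.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: number of top-level entries in decoded JSON,
# whether the document starts with "{" or with "[".
sub top_level_count {
    my ($data) = @_;
    my $type = ref $data;
    return scalar(keys %$data) if $type eq 'HASH';
    return scalar(@$data)      if $type eq 'ARRAY';
    die "unsupported top-level type '$type'\n";
}

print top_level_count(decode_json('{"a":1,"b":2}')), "\n";
print top_level_count(decode_json('[{"id":5},{"id":10},{"id":13}]')), "\n";
```

Dereferencing explicitly with %$data or @$data avoids the experimental "keys on reference" feature altogether.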
I know that HTML::Parser is a thing and, from reading around, I've realized that trying to parse HTML with regex is usually a suboptimal way of doing things, but for a Perl class I'm currently trying to use regular expressions (hopefully just a single match) to identify and store the sentences from a saved HTML doc. Eventually I want to be able to calculate the number of sentences, words per sentence and hopefully the average length of words on the page.
For now, I've just tried to isolate things which follow ">" and precede a ". " just to see what if anything it isolates, but I can't get the code to run, even when manipulating the regular expression. So I'm not sure if the issue is in the regex, somewhere else or both. Any help would be appreciated!
#!/usr/bin/perl
#new
use CGI qw(:standard);
print header;
open FILE, "< sample.html ";
$html = join('', <FILE>);
close FILE;
print "<pre>";
###Main Program###
&sentences;
###sentence identifier sub###
sub sentences {
@sentences;
while ($html =~ />[^<]\. /gis) {
push @sentences, $1;
}
#for debugging, comment out when running
print join("\n",@sentences);
}
print "</pre>";
Your regex should be />[^<]*?\./gis
The *? means match zero or more, non-greedily. As it stood, your regex would match only a single non-< character followed by a period and a space. This way it will match all non-< characters up to the first period.
There may be other problems.
Now read this
A first improvement would be to write $html =~ />([^<.]+)\. /gs: you need to capture the match with parens, and to allow more than one letter per sentence ;--)
This does not get all the sentences though, just the first one in each element.
A better way would be to capture all the text, then extract sentences from each fragment:
while( $html=~ m{>([^<]*)<}g) { push @text_content, $1};
foreach (@text_content) { while( m{([^.]*)\.}gs) { push @sentences, $1; } }
(untested because it's early in the morning and coffee is calling)
All the usual caveats about parsing HTML with regexps apply, most notably the presence of '>' in the text.
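Here is a self-contained sketch of that two-step idea, so it can be tried without a file. The sub name extract_sentences and the sample HTML are invented for the example, and the patterns are slightly adapted (empty fragments are skipped and leading whitespace is trimmed).

```perl
use strict;
use warnings;

# Hypothetical helper: crude sentence extraction from an HTML string.
sub extract_sentences {
    my ($html) = @_;
    my @sentences;
    # Step 1: collect the text between a '>' and the next '<'.
    my @fragments = $html =~ />([^<]+)</g;
    # Step 2: within each fragment, capture each run that ends at a period.
    for my $fragment (@fragments) {
        while ($fragment =~ /\s*([^.]+)\./g) {
            push @sentences, $1;
        }
    }
    return @sentences;
}

my @found = extract_sentences('<p>First one. Second one.</p><p>Third.</p>');
print "$_\n" for @found;
```

A sentence that spans an inline tag such as <b> still ends up split across fragments, so this only holds up for simple markup.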
I think this does more or less what you need. Keep in mind that this script only looks at text inside p tags. The file name is passed in as a command line argument (shift).
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Grabber;
my $file_location = shift;
print "\n\nfile: $file_location";
my $totalWordCount = 0;
my $sentenceCount = 0;
my $wordsInSentenceCount = 0;
my $averageWordsPerSentence = 0;
my $char_count = 0;
my $contents;
my $rounded;
my $rounded2;
open ( my $file, '<', $file_location ) or die "cannot open < file: $!";
while( my $line = <$file>){
$contents .= $line;
}
close( $file );
my $dom = HTML::Grabber->new( html => $contents );
$dom->find('p')->each( sub{
my $p_tag = $_->text;
++$totalWordCount while $p_tag =~ /\S+/g;
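++$sentenceCount while $p_tag =~ /[.!?]+/g;
# Count non-whitespace characters once per paragraph, on a copy, so the
# string being matched above is never modified in the middle of a //g loop.
(my $no_space = $p_tag) =~ s/\s//g;
$char_count += length $no_space;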
});
print "\n Total Words: $totalWordCount\n";
print " Total Sentences: $sentenceCount\n";
$rounded = sprintf("%.2f", $totalWordCount / $sentenceCount);
print " Average words per sentence: $rounded.\n\n";
print " Total Characters: $char_count.\n";
my $averageCharsPerWord = $char_count / $totalWordCount ;
$rounded2 = sprintf("%.2f", $averageCharsPerWord );
print " Average characters per word: $rounded2.\n\n";
I am a newcomer to Perl (Strawberry Perl v5.12.3 on Windows 7), trying to write a script to aid me with a repetitive HTML formatting task. The files need to be hand-edited in future and I want them to be human-friendly, so after processing using the HTML package (HTML::TreeBuilder etc.), I am writing the result to a file using HTML::PrettyPrinter. All of this works well and the output from PrettyPrinter is very nice and human-readable. However, PrettyPrinter is not handling self-closing tags well; basically, it seems to treat the slash as an HTML attribute. With input like:
<img />
PrettyPrinter returns:
<img /="/" >
Is there anything I can do to avoid this other than preprocessing with a regex to remove the slash?
Not sure it will be helpful, but here is my setup for the pretty printing:
my $hpp = HTML::PrettyPrinter->new('linelength' => 120, 'quote_attr' => 1);
$hpp->allow_forced_nl(1);
my $output = new FileHandle ">output.html";
if (defined $output) {
$hpp->select($output);
my $linearray_ref = $hpp->format($internal);
undef $output;
$hpp->select(undef);
}
You can print formatted, human-readable HTML with a TreeBuilder method:
$h = HTML::TreeBuilder->new_from_content($html);
print $h->as_HTML('',"\t");
But if you still prefer this buggy PrettyPrinter, try removing the problem tags; no idea why someone would need ...
$h = HTML::TreeBuilder->new_from_content($html);
while(my $n = $h->look_down(_tag=>'img','src'=>undef)) { $n->delete }
UPD:
Well... then we can fix the PrettyPrinter. It's a pure-Perl module, so let's see...
I have no idea where Perl modules live on Windows; for me it's /usr/local/share/perl/5.10.1/HTML/PrettyPrinter.pm
Maybe not an elegant solution, but it will work, I hope.
This sub formats the attribute/value pairs; with a small fix it will add a single '/' at the end.
It is at around line 756 in PrettyPrinter.pm.
I've marked the lines that I added with ###<<<<<< at the end.
#
# format the attributes
#
sub _attributes {
my ($self, $e) = @_;
my @result = (); # list of ATTR="value" strings to return
my $self_closing = 0; ###<<<<<<
my @attrs = $e->all_external_attr(); # list (name0, val0, name1, val1, ...)
while (@attrs) {
my ($a,$v) = (shift @attrs,shift @attrs); # get current name, value pair
if($a eq '/') { ###<<<<<<
$self_closing=1; ###<<<<<<
next; ###<<<<<<
} ###<<<<<<
# string for output: 1. attribute name
my $s = $self->uppercase? "\U$a" : $a;
# value part, skip for boolean attributes if desired
unless ($a eq lc($v) &&
$self->min_bool_attr &&
exists($HTML::Tagset::boolean_attr{$e->tag}) &&
(ref($HTML::Tagset::boolean_attr{$e->tag})
? $HTML::Tagset::boolean_attr{$e->tag}{$a}
: $HTML::Tagset::boolean_attr{$e->tag} eq $a)) {
my $q = '';
# quote value?
if ($self->quote_attr || $v =~ tr/a-zA-Z0-9.-//c) {
# use single quote if value contains double quotes but no single quotes
$q = ($v =~ tr/"// && $v !~ tr/'//) ? "'" : '"'; # catch emacs ");
}
# add value part
$s .= '='.$q.(encode_entities($v,$q.$self->entities)).$q;
}
# add string to resulting list
push @result, $s;
}
push @result,'/' if $self_closing; ###<<<<<<
return @result; # return list ('attr="val"','attr="val"',...);
}
my $url = "\'http://".$server.":4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=\'";
my $html = HTML::TagParser->new( $url );
my @list = $html->getElementsByTagName( "pre" );
print $list[0];
foreach my $elem ( @list ) {
if($elem->innerText =~ /APIs/){
my $text = $elem->innerText;
if ( $text eq "" ) {
} else {
@API_list = split(/\s+/, $text);
print $API_list[1];
}
}
}
return \@API_list;
}
Here the line my @list = $html->getElementsByTagName( "pre" ); is not working. If I run this as a separate script it works well, but if I include it in another script there is no value in @list. Can anyone help me?
Are you getting an error message? If so, what is it?
Have you thought to check the return value of HTML::TagParser->new()? If it's failing, it may be doing so silently, and you only find out later when you try to use your $html object.
I do think the URL you're handing to it looks odd.
"\'http://".$server.":4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=\'"
Why the two layers of quotes? (double quotes, and then escaped single quotes). Wouldn't this work:
my $url = 'http://'
. $server
. ':4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=';
(Extra whitespace added to make it easier to read the concatenation operator.)
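To see why the extra quotes matter, here is a small sketch comparing the two strings. The host name example.com is a placeholder and graph_url is a hypothetical helper, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the URL without the extra quote characters.
sub graph_url {
    my ($server) = @_;
    return "http://${server}:4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=";
}

my $server = 'example.com';   # placeholder host name
my $quoted = "\'http://" . $server . ":4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=\'";

# The escaped single quotes are part of the string itself, so the first
# character is a quote and the value is not a plain http:// URL.
print substr($quoted, 0, 1), "\n";
print graph_url($server), "\n";
```

Whatever consumes the URL sees those quote characters literally, which may be enough to make the fetch fail quietly.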