Perl LibXML get last line number of Node - html

I use LibXML in Perl, which stores the start line number of each node, but how can I get the last one?
I tried to get the last line number by...
...counting the newlines in the inner HTML of the node, but LibXML returns the inner HTML in a different formatting than the original, so the line numbers differ.
...calling $node->getLastChild->line_number, but that had no success either.
Any ideas?

If line_number returned the first line of a node as you say, all you'd need is
my $s_line_num = $node->line_number();
my $e_line_num = $node->nextSibling()->line_number();
But it doesn't. What line_number returns is actually closer to the number of the last line of the node. For that, we could simply look at the previous sibling's line number.
my $s_line_num = $node->previousSibling()->line_number();
my $e_line_num = $node->line_number();
But while that's what it returns for non-element nodes, it returns the last line number of the start tag (rather than of the element as a whole) for elements. That's completely useless.
Sorry, no can do!

Note that line_number only reports a useful value if line numbers were enabled on the parser before parsing:
my $parser = XML::LibXML->new( XML_LIBXML_LINENUMBERS => 1 );
my $e_line_num = $node->line_number();

Regular expression to ignore each first and second word of a selected sentence

I want to create a Regular expression to ignore each first and second word of a selected sentence
For example I have this phrase "October 27 New Store Products / October 2022". I want to create a regex that will choose only this part of the phrase ~ "New Store Products / October 2022" and ignore the first date part of the phrase ~ "October 27".
Without knowledge of your true requirements, all we can do is provide a best guess, so here is mine.
What you could do is use something like the following:
/^\S+\s+\S+\s+(.*)$/
What this does is the following:
From the beginning of the string (^), find one or more non-whitespace chars (\S+), then one or more whitespace chars (\s+); repeat this once more, and then use a capture group ((.*)) to get everything else up to the end of the string ($).
If you are using JavaScript, you could use it like this:
let sentence = "October 27 New Store Products / October 2022";
let regex = /^\S+\s+\S+\s+(.*)$/;
let match = regex.exec(sentence);
if (match) {
    // Ignores the first and second words of the sentence
    console.log(match[1]); // Output: "New Store Products / October 2022" ignoring "October 27"
}
Further explanation of this regex, taken from regex101 [1] when the pattern is put into its regex bar:
/^\S+\s+\S+\s+(.*)$/
^ asserts position at start of the string
\S matches any non-whitespace character (equivalent to [^\r\n\t\f\v ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\S matches any non-whitespace character (equivalent to [^\r\n\t\f\v ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group (.*)
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
1 Emphasis mine
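For comparison, the same pattern can be sketched in Python with the standard re module. The function name strip_first_two_words is made up for this illustration and is not part of the original answer.

```python
import re

# Same pattern as above: two whitespace-separated words, then capture the rest.
PATTERN = re.compile(r"^\S+\s+\S+\s+(.*)$")


def strip_first_two_words(sentence):
    """Return the sentence without its first two words, or None if the pattern does not match."""
    match = PATTERN.match(sentence)
    return match.group(1) if match else None


print(strip_first_two_words("October 27 New Store Products / October 2022"))
# New Store Products / October 2022
```

A sentence with only two words returns None here, so the caller has to decide what to do in that case.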
You've not provided any information on the context, but does it need to be a regular expression?
String manipulation by searching on spaces might be easier.
For example in PHP:
$string = "October 27 New Store Products / October 2022";
$string_array = explode(' ', $string, 3);
if (array_key_exists(2, $string_array)) echo $string_array[2];
or Excel:
=RIGHT(A1,LEN(A1)-FIND(" ",A1,FIND(" ",A1)+1))
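The same split-on-spaces idea, sketched in Python for comparison. The helper name drop_first_two_words is hypothetical; str.split with maxsplit=2 plays the role of explode(' ', $string, 3).

```python
def drop_first_two_words(text):
    """Split on single spaces at most twice and keep the remainder."""
    parts = text.split(" ", 2)
    # parts[2] only exists when the text contains at least two spaces
    return parts[2] if len(parts) > 2 else None


print(drop_first_two_words("October 27 New Store Products / October 2022"))
# New Store Products / October 2022
```

Like the PHP version, this assumes the words are separated by exactly one space each.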

unable to get return value from MariaDB via perl DBI [duplicate]

I'm getting a bunch of text from an outside source, saving it in a variable, and then displaying that variable as part of a larger block of HTML. I need to display it as is, and dollar signs are giving me trouble.
Here's the setup:
# get the incoming text
my $inputText = "This is a $-, as in $100. It is not a 0.";
print <<"OUTPUT";
before-regex: $inputText
OUTPUT
# this regex seems to have no effect
$inputText =~ s/\$/\$/g;
print <<"OUTPUT";
after-regex: $inputText
OUTPUT
In real life, those print blocks are much larger chunks of HTML with variables inserted directly.
I tried escaping the dollar signs using s/\$/\$/g because my understanding is that the first \$ escapes the regex so it searches for $, and the second \$ is what gets inserted and later escapes the Perl so that it just displays $. But I can't get it to work.
Here's what I'm getting:
before-regex: This is a 0, as in . It is not a 0.
after-regex: This is a 0, as in . It is not a 0.
And here's what I want to see:
before-regex: This is a 0, as in . It is not a 0.
after-regex: This is a $-, as in $100. It is not a 0.
Googling brings me to this question. When I try using the array and for loop in the answer, it has no effect.
How can I get the block output to display the variable exactly as it is?
When you construct a string with double-quotes, the variable substitution happens immediately. Your string will never contain the $ character in that case. If you want the $ to appear in the string, either use single-quotes or escape it, and be aware that you will not get any variable substitution if you do that.
As for your regex, that is odd. It looks for each $ and replaces it with $, so it changes nothing. If you want backslashes in the replacement, you have to escape those too.
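The regex half of this can be illustrated with a small Python sketch (not Perl; Python strings do not interpolate $, so it only mirrors the substitution). Replacing $ with $ is a no-op, and a backslash in the replacement needs its own escaping.

```python
import re

text = "This is a $-, as in $100."

# Replacing "$" with "$" changes nothing, like s/\$/\$/g in the question.
unchanged = re.sub(r"\$", "$", text)

# To put a backslash in front of each "$", the backslash itself must be escaped
# in the replacement template (r"\\" stands for one literal backslash).
escaped = re.sub(r"\$", r"\\$", text)

print(unchanged)  # This is a $-, as in $100.
print(escaped)    # This is a \$-, as in \$100.
```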
And here's what I want to see:
before-regex: This is a 0, as in . It is not a 0.
after-regex: This is a $-, as in $100. It is not a 0.
hum, well, I'm not sure what the general case is, but maybe the following will do:
s/0/\$-/;
s/in \K/\$100/;
Or did you mean to start with
my $inputText = "This is a \$-, as in \$100. It is not a 0.";
# Produces the string: This is a $-, as in $100. It is not a 0.
or
my $inputText = 'This is a $-, as in $100. It is not a 0.';
# Produces the string: This is a $-, as in $100. It is not a 0.
Your mistake is using double quotes instead of single quotes in the declaration of your variable.
This should be:
# get the incoming text
my $inputText = 'This is a $-, as in $100. It is not a 0.';
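Learn the difference between ' and " and `. See http://mywiki.wooledge.org/Quotes and http://wiki.bash-hackers.org/syntax/words
Those pages are about the shell, but single and double quotes behave the same way in Perl: double quotes interpolate variables, single quotes do not.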

Opening a file of varying row and column structure in Scilab

I habitually use csvRead in Scilab to read my data files. However, I am now faced with one which contains blocks of 200 rows, each preceded by 3 lines of headers, all of which I would like to take into account.
I've tried specifying a range of data following the example on the scilab help website for csvRead (example is right at the bottom of the page) (https://help.scilab.org/doc/6.0.0/en_US/csvRead.html) but I always come out with the same error messages :
The line and/or colmun indices are outside of the limits
or
Error in the column structure.
My first three lines are headers which I know can cause a problem but even if I omit them from my block-range, I still have the same problem.
Otherwise, my data is ordered as follows: three header lines (two lines with a header over just one or two columns, and one line with a header over all columns), 200 lines of data, and a blank line. This represents the data from one image, and I have about 500 images in the file. I would like to be able to read and process all of them and keep track of the headers, because they state the image number, which I need to reference later. Example:
DTN-dist_Devissage-1_0006_0,,,,,,
L0,,,,,,
X [mm],Y [mm],W [mm],exx [1] - Lagrange,eyy [1] - Lagrange,exy [1] - Lagrange,Von Mises Strain [1] - Lagrange
-1.13307,-15.0362,-0.00137507,7.74679e-05,8.30045e-05,5.68249e-05,0.00012711
-1.10417,-14.9504,-0.00193334,7.66086e-05,8.02914e-05,5.43132e-05,0.000122655
-1.07528,-14.8647,-0.00249155,7.57493e-05,7.75786e-05,5.18017e-05,0.0001182
Does anyone have a solution to this?
My current code, an adapted version of the Scilab help example, looks like this (I have tried varying the blocksize and iblock values to include/omit the headers):
blocksize=200;
C1=1;
C2=14;
iblock=1
while (%t)
R1=(iblock-1)*blocksize+4;
R2=blocksize+R1-1;
irange=[R1 C1 R2 C2];
V=csvRead(filepath+filename,",",".","",[],"",irange);
iblock=iblock+1
end
Errors
The CSV
A lot of your problems come from the inconsistent number of commas in your CSV file. Opening it in LibreOffice Calc and saving it again puts the right number of commas on every line, even the empty ones.
R1
Your current code doesn't position R1 at the beginning of the values. With headersize=3 and blanksize=1 (the blank line after each block), the right formula is
R1=(iblock-1)*(blocksize+blanksize+headersize)+1+headersize;
End of file
Currently your code raises an error at the end of the file because R1 becomes greater than the number of lines. To solve this, you can specify the maximum number of blocks or test the value of R1 against the number of lines.
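The row arithmetic and the end-of-file test described above can be checked with a short Python sketch (Python rather than Scilab). The function name data_row_ranges and the constants are assumptions for illustration: 3 header lines and 1 blank line per block, as in the question.

```python
HEADER_SIZE = 3   # header lines before each block (assumed from the question)
BLANK_SIZE = 1    # blank line after each block (assumed from the question)


def data_row_ranges(n_lines, block_size=200):
    """Yield (R1, R2), the 1-based first and last data row of each complete block."""
    stride = HEADER_SIZE + block_size + BLANK_SIZE
    iblock = 1
    while True:
        r1 = (iblock - 1) * stride + 1 + HEADER_SIZE
        r2 = r1 + block_size - 1
        if r2 > n_lines:   # stop instead of reading past the end of the file
            return
        yield r1, r2
        iblock += 1


print(list(data_row_ranges(n_lines=410)))
# [(4, 203), (208, 407)]
```

Each pair can then be used as the first and last row of the range passed to the reader.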
Improved solution for much bigger file.
When solving your problem with a big file, two issues came up:
We need to know the number of blocks or the number of lines.
Each call to csvRead is really slow because it processes the whole file on every call (about 1 s per block!).
My idea was to read the whole file once and store it in a string matrix (mgetl has been improved since 6.0.0), then use csvTextScan on a submatrix. Doing so also removes the need to enter the number of blocks/lines by hand.
The code follows:
clear all
clc
s = filesep()
filepath='.'+s;
filename='DTN_full.csv';
// the header is important as it has the image name
headersize=3;
blocksize=200;
C1=1;
C2=14;
iblock=1
// Let's save everything. Good for the example.
bigstruct = struct();
// Read all the values in one pass;
// using csvTextScan afterwards is much more efficient
text = mgetl(filepath+filename);
nlines = size(text,'r');
while ( %t )
    mprintf("Block #%d",iblock);
    // Let's read the header
    R1=(iblock-1)*(headersize+blocksize+1)+1;
    R2=R1 + headersize-1;
    // if R1 or R2 is bigger than the number of lines, stop
    if sum([R1,R2] > nlines )
        mprintf('; End of file\n')
        break
    end
    // We use csvTextScan only on the lines that matter.
    // This speeds up the program, since csvRead reads the whole file
    // every time it is used.
    H=csvTextScan(text(R1:R2),",",".","string");
    mprintf("; %s",H(1,1))
    R1 = R1 + headersize;
    R2 = R1 + blocksize-1;
    if sum([R1,R2]> nlines )
        mprintf('; End of file\n')
        break
    end
    mprintf("; rows %d to %d\n",R1,R2)
    // Let's read the values
    V=csvTextScan(text(R1:R2),",",".","double");
    iblock=iblock+1
    // Let's save these data
    bigstruct(H(1,1)) = V;
end
and returns
Block #1; DTN-dist_0005_0; rows 4 to 203
....
Block #178; DTN-dist_0710_0; rows 36112 to 36311
Block #179; End of file
Time elapsed 1.827092s
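The read-once-then-slice idea can also be sketched in Python, purely as an illustration of the approach and not as a translation of the Scilab code. The function name parse_blocks is hypothetical, and the layout (3 header lines, block_size data lines, 1 blank line) is assumed from the question.

```python
import csv

HEADER_SIZE = 3   # assumed: three header lines per block
BLANK_SIZE = 1    # assumed: one blank line after each block


def parse_blocks(lines, block_size):
    """Split already-loaded lines into {image_name: rows of floats}."""
    blocks = {}
    stride = HEADER_SIZE + block_size + BLANK_SIZE
    for start in range(0, len(lines), stride):
        header = lines[start:start + HEADER_SIZE]
        data = lines[start + HEADER_SIZE:start + HEADER_SIZE + block_size]
        if len(header) < HEADER_SIZE or len(data) < block_size:
            break  # incomplete trailing block: treat as end of file
        name = header[0].split(",")[0]  # image name is the first header cell
        blocks[name] = [[float(x) for x in row] for row in csv.reader(data)]
    return blocks


sample = [
    "img_1,,", "L0,,", "X,Y,W",
    "1.0,2.0,3.0", "4.0,5.0,6.0", "",
    "img_2,,", "L0,,", "X,Y,W",
    "7,8,9", "10,11,12",
]
print(parse_blocks(sample, block_size=2))
```

In real use the lines would come from reading the file once, for example with open(path).read().splitlines().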

how to set text indices via a variable

I'm trying to implement a simple line-highlighting mechanism in my Tcl/Tk text widget.
For this I would like to assign all characters marked with one tag to another tag.
As in:
.window.text insert end "one line\n" line1
.window.text insert end "a chunk spanning\nmultiple lines" line2
.window.text insert end "thats all\n" line3
# get all text that is tagged as 'line2'
set selected [ .window.text tag ranges line2 ]
# and apply the 'highlighted' tag to it:
.window.text tag add highlighted $selected
Unfortunately this does not work, as it gives me
bad text index "2.0 4.0"
Using the indices literally works fine:
.window.text tag add highlighted 2.0 4.0
But that is not what I want. (I don't know anything about the tagged chunks apart from their tag.)
So it seems that I cannot store the list of indices in a variable and use that with tag add (or tag remove for that matter).
Any hints how I can add a tag to an already tagged text?
Solution (in Tcl 8.5 and later):
.window.text tag add highlighted {*}$selected
If command A has given you a list of items to feed to command B, but command B expects each item to appear as an argument in its invocation, the list of items needs to be spliced, or expanded into separate arguments. In Tcl 8.5, this was facilitated by introducing a new syntactic rule that allowed the number of arguments provided to a command to be increased by expanding one of the existing arguments.
To borrow an example, the destroy ?window window ...? command cannot work with the list of windows returned by winfo children ., since each window path needs to be a separate argument. Writing
destroy [winfo children .]
would be evaluated as (say) destroy {.foo .bar .baz}, which won't work. However, using the new expansion prefix {*}
destroy {*}[winfo children .]
the line will be evaluated as destroy .foo .bar .baz, which will work.
One way to understand it is by thinking of the invocation as a list consisting of the command name and the arguments, and that the {*} is an instruction to splice the value of the following argument into that list at that point in the list.
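As a loose analogy in another language, Python's * argument unpacking does the same kind of splicing. The tag_add function below is a stand-in invented for this sketch, not a Tk binding.

```python
def tag_add(tag_name, *indices):
    """Stand-in for a command that expects each index as its own argument."""
    return [tag_name, *indices]


selected = ["2.0", "4.0"]

# Without unpacking, the whole list arrives as a single argument.
print(tag_add("highlighted", selected))    # ['highlighted', ['2.0', '4.0']]

# With *, each list element becomes a separate argument, like {*}$selected.
print(tag_add("highlighted", *selected))   # ['highlighted', '2.0', '4.0']
```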
Documentation: {*}

How to calculate the number of characters on a line in Vimscript?

I need an attribute or function in Vimscript that does a certain task if the current line contains a certain number of characters. For instance:
if ‹chars_on_current_line› == 50 " for example
... perform task ...
endif
What can I use for the ‹chars_on_current_line› subexpression in Vimscript to get the length of the cursor line in characters?
(If it happens to help in any way, I'm using Macvim.)
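To get the width of the current line in screen columns, use
virtcol('$') - 1
For the number of bytes, use
col('$') - 1
Both functions return the position just past the end of the line, hence the - 1. Note that virtcol() counts display cells, so a tab or a double-width character counts as more than one. For the number of characters proper, use
strchars(getline('.'))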
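The distinction between characters and bytes only shows up with non-ASCII text. A small Python sketch (not Vimscript), assuming the line is stored as UTF-8, shows the two counts diverging:

```python
line = "na\u00efve caf\u00e9"   # "naïve café" with precomposed accented letters

n_chars = len(line)                   # number of Unicode code points
n_bytes = len(line.encode("utf-8"))   # number of bytes when stored as UTF-8

print(n_chars, n_bytes)  # 10 12
```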