Regular Expression: adding bookmark HTML tags depending on various ranges of numbers

I have some html pages with numbers of verses like:
verses 2-5
verses 11-15
verses 21-23
I need to add, for each number in the range, an anchor tag before the word "verses",
so that it becomes:
<a name="2"></a><a name="3"></a><a name="4"></a><a name="5"></a>verses 2-5
etc.
So it takes the range of numbers given and, before the word "verses", adds:
<a name=""></a>
for each number in the range.
I use Notepad++ to search and replace.

You're going to need a script to do this. I whipped up a simple Ruby script to do it. Used it on your sample text, got your output. Just download Ruby, paste this into a file in the directory of that text, and replace the verses.txt line with whatever your file name is. Then run it from the command line like: ruby ./script.rb
d = File.read('./verses.txt')
# Prefix each "verses M-N" with one anchor per number in the range M..N.
# The block form of gsub rewrites every match exactly once, so a range
# that appears several times in the file is not prefixed repeatedly.
d = d.gsub(/verses\s+(\d+)-(\d+)/) do |match|
  anchors = ($1.to_i..$2.to_i).map { |i| "<a name=\"#{i}\"></a>" }.join
  anchors + match
end
puts d
# File.write opens, writes, and closes the output file in one call.
File.write('verses2.txt', d)
Per your request, here is a modification that will overwrite the opened file and run on all files in a directory. For ease, I won't do directory entry, so place the script in the directory of all the files to run it. Here goes:
# Process every .html file in the current directory, overwriting each in place.
Dir.glob('*.html').each do |fn|
  d = File.read(fn)
  # Prefix each "verses M-N" with one anchor per number in the range M..N.
  d = d.gsub(/verses\s+(\d+)-(\d+)/) do |match|
    anchors = ($1.to_i..$2.to_i).map { |i| "<a name=\"#{i}\"></a>" }.join
    anchors + match
  end
  puts d
  File.write(fn, d)
end
I'll think about how to handle the Arabic encodings. This will run on all .html files in the directory; if your files have a different extension or share a naming pattern, let me know and I'll update the script.
This should fully work, just tested it. Let me know if there are issues.
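For comparison, the same range expansion can be sketched in Python. This is only an illustration of the technique, not part of the Ruby answer; the function name add_anchors is made up for the example, and it assumes the text always uses the form "verses M-N" with M not greater than N.

```python
import re

def add_anchors(html):
    """Prefix each 'verses M-N' with one <a name="k"></a> per k in M..N."""
    def expand(match):
        start, end = int(match.group(1)), int(match.group(2))
        anchors = "".join(f'<a name="{n}"></a>' for n in range(start, end + 1))
        # Keep the original "verses M-N" text after the generated anchors.
        return anchors + match.group(0)
    return re.sub(r"verses\s+(\d+)-(\d+)", expand, html)

print(add_anchors("verses 2-5"))
```

Because the replacement is computed by a function per match, each occurrence is rewritten exactly once, which a plain search-and-replace pattern cannot do for a variable-length range.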

You can do it for 2-digit verses 10 to 99 like this:
Search: verses (\d)(\d)-
Replace: <a name="$1">verses $1$2-</a>
For 3+ digit numbers, add another group for the extra digit(s) and treat similarly.
This extra complication is required because notepad++ doesn't support look-aheads AFAIK.

Related

For loop that stores variable in progressive order

I am working on a MATLAB script that calculates a 1x1854 matrix called N2. This routine has to be performed 1000 times because the input data files are different in each iteration. I am trying to store the matrix N2 in progressive order for each iteration, like N2_1, N2_2, etc. How should I implement that?
for ii=1:1000
file1 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35X35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
file2 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35K35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
%%calculations...
[N,bind] = elecdensity(omega_new,closestapproach);
%
N2_num2str(ii) = N./1e6;
end
To generate those variables, change the code line
N2_num2str(ii) = N./1e6;
to
eval(['N2_' num2str(ii) '= ' 'N./1e6']);
This might be too computationally expensive. Another approach, which I would use to avoid the "eval" command, is to save the matrices in a structure, where each field holds one matrix (named N2_NUMBER). The code then becomes
% Generate the struct object
myValues = struct;
% Start the for loops
for ii=1:1000
file1 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35X35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
file2 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35K35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
%%calculations...
[N,bind] = elecdensity(omega_new,closestapproach);
%
fieldName = ['N2_' num2str(ii)];
myValues.(fieldName) = N./1e6;
end
% Print the table 54
myValues.N2_54
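The same idea, one container with a generated key per iteration instead of a thousand separate variables, can be sketched in Python with a dictionary. The names collect_results and compute are hypothetical; compute stands in for the file loading and the elecdensity call, which are not reproduced here.

```python
def collect_results(compute, n_iterations):
    """Store each iteration's result under a generated key such as 'N2_54'."""
    results = {}
    for ii in range(1, n_iterations + 1):
        # compute(ii) is a placeholder for the per-iteration calculation.
        results[f"N2_{ii}"] = compute(ii)
    return results

# Stand-in calculation: a one-element list scaled like N./1e6 in the question.
results = collect_results(lambda ii: [ii / 1e6], 1000)
print(results["N2_54"])
```

A plain list indexed by iteration would work just as well if the N2_ names themselves are not needed.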

How can I set the numbering of the x-axis of an Octave plot to engineering notation?

I made a very simple Octave script
a = [10e6, 11e6, 12e6];
b = [10, 11, 12];
plot(a, b, 'rd-')
which outputs the following graph.
[Graph: plot of b against a, with the x-axis tick labels in scientific notation]
Is it possible to set the numbering on the x-axis to engineering notation, rather than scientific, and have it display "10.5e+6, 11e+6, 11.5e+6" instead of "1.05e+7, 1.1e+7, 1.15e+7"?
While octave provides a 'short eng' formatting option, which does what you're asking for in terms of printing to the terminal, it does not appear to provide this functionality in plots or when formatting strings via sprintf.
Therefore you'll have to find a way to do this by yourself, with some creative string processing of the initial xticks, and substituting the plot's ticklabels accordingly. Thankfully it's not that hard :)
Using your example:
a = [10e6, 11e6, 12e6];
b = [10, 11, 12];
plot(a, b, 'rd-')
format short eng % display stdout in engineering format
TickLabels = disp( xticks ) % collect string as it would be displayed on the stdout
TickLabels = strsplit( TickLabels ) % tokenize at spaces
TickLabels = TickLabels( 2 : end - 1 ) % discard start and end empty tokens
TickLabels = regexprep( TickLabels, '\.0+e', 'e' ) % remove purely zero decimals using a regular expression
TickLabels = regexprep( TickLabels, '(\.[1-9]*)0+e', '$1e' ) % remove non-significant zeros in non-zero decimals using a regular expression
xticklabels( TickLabels ) % set the new ticklabels to the plot
format % reset short eng format back to default, if necessary
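The conversion itself, independent of Octave's display format, comes down to choosing an exponent that is a multiple of three and trimming non-significant zeros. A minimal Python sketch of that arithmetic follows; eng_label is a hypothetical helper written for this illustration, not an Octave or library function.

```python
import math

def eng_label(value, digits=4):
    """Format a number with an exponent that is a multiple of 3."""
    if value == 0:
        return "0"
    # Round the base-10 exponent down to the nearest multiple of 3.
    exponent = int(math.floor(math.log10(abs(value)) / 3) * 3)
    mantissa = value / 10 ** exponent
    # Drop trailing zeros, then a dangling decimal point.
    text = f"{mantissa:.{digits}f}".rstrip("0").rstrip(".")
    return f"{text}e{exponent:+d}"

print([eng_label(v) for v in (10.5e6, 11e6, 11.5e6)])
```

For the tick values in the question this yields the labels "10.5e+6", "11e+6" and "11.5e+6".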

How does the 'k' modifier in FINDC() work in SAS?

I'm reading through the book, "SAS Functions by Example - Second Edition" and having trouble trying to understand a certain function due to the example and output they get.
Function: FINDC
Purpose: To locate a character that appears or does not appear within a string. With optional arguments, you can define the starting point for the search, set the direction of the search, ignore case or trailing blanks, or look for characters except the ones listed.
Syntax: FINDC(character-value, find-characters <,'modifiers'> <,start>)
Two of the modifiers are i and k:
i ignore case
k count only characters that are not in the list of find-characters
So now one of the examples has this:
Note: STRING1 = "Apples and Books"
FINDC(STRING1,"aple",'ki')
For the output, they say it returns 1 because of the position of "A" in "Apples". This is what confuses me, because I thought the k modifier finds characters that are not in the find-characters list. So why does it match "A" when, with case ignored, that letter is in the find-characters list? I would expect this example to output 6, for the "s" in "Apples".
Is anyone able to help explain the k modifier to me any better, and why the output for this answer is 1 instead of 6?
Edit 1
Reading the SAS documentation online, I found this example which seems to contradict the book I'm reading:
Example 3: Searching for Characters and Using the K Modifier
This example searches a character string and returns the characters that do
not appear in the character list.
data _null_;
string = 'Hi, ho!';
charlist = 'hi';
j = 0;
do until (j = 0);
j = findc(string, charlist, "k", j+1);
if j = 0 then put +3 "That's all";
else do;
c = substr(string, j, 1);
put +3 j= c=;
end;
end;
run;
SAS writes the following output to the log:
j=1 c=H
j=3 c=,
j=4 c=
j=6 c=o
j=7 c=!
That's all
So, is the book wrong?
The book is wrong.
data _null_;
STRING1 = "Apples and Books" ;
x=FINDC(STRING1,"aple",'ki');
put x=;
if x then do;
ch=char(string1,x);
put ch=;
end;
run;
x=6
ch=s
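To make the semantics of the two modifiers concrete, here is a small Python emulation. It is only a sketch of the behaviour described above (1-based positions, 0 when nothing is found, 'i' ignores case, 'k' searches for characters not in the list); it is not SAS code, and it ignores every other FINDC modifier as well as negative start values.

```python
def findc(string, chars, modifiers="", start=1):
    """Emulate FINDC for the 'i' and 'k' modifiers only (assumed semantics)."""
    ignore_case = "i" in modifiers.lower()
    complement = "k" in modifiers.lower()
    if ignore_case:
        chars = chars.lower()
    for pos in range(start, len(string) + 1):
        ch = string[pos - 1]
        if ignore_case:
            ch = ch.lower()
        # Without 'k': stop at a listed character. With 'k': stop at an unlisted one.
        if (ch in chars) != complement:
            return pos
    return 0

print(findc("Apples and Books", "aple", "ki"))  # 6, the "s" in "Apples"
```

With 'ki', the letters A, p, p, l, e are all treated as listed, so the first position that qualifies is the "s" at 6, matching the log output above.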

octave/matlab read text file line by line and save only numbers into matrix

I have a question regarding octave or matlab data post processing.
I have files exported from fluent like below:
"Surface Integral Report"
Mass-Weighted Average
Static Temperature (k)
crossplane-x-0.001 1242.9402
crossplane-x-0.025 1243.0017
crossplane-x-0.050 1243.2036
crossplane-x-0.075 1243.5321
crossplane-x-0.100 1243.9176
And I want to use octave/matlab for post processing.
My idea is to read the file line by line and either save only the lines containing "crossplane-x-" into a new file or store the data from those lines directly in a matrix. Since I have many similar files, I could then make plots just by calling their titles.
But I have trouble identifying the lines that contain the string "crossplane-x-". I am trying to do something like this:
clear, clean, clc;
% open a file and read line by line
fid = fopen ("h20H22_alongHGpath_temp.dat");
% save full lines into a new file if only chars inside
txtread = fgetl (fid)
num_of_lines = fskipl(fid, Inf);
char = 'crossplane-x-'
for i=1:num_of_lines,
if char in fgetl(fid)
[x, nx] = fscanf(fid);
print x
endif
endfor
fclose (fid);
Would anybody shed some light on this issue ? Am I using the right function ? Thank you.
Here's a quick way for your specific file:
>> S = fileread("myfile.dat"); % collect file contents into string
>> C = strsplit(S, "crossplane-x-"); % first cell is the header, rest is data
>> M = str2num (strcat (C{2:end})) % concatenate datastrings, convert to numbers
M =
1.0000e-03 1.2429e+03
2.5000e-02 1.2430e+03
5.0000e-02 1.2432e+03
7.5000e-02 1.2435e+03
1.0000e-01 1.2439e+03
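The line-by-line filtering that the question describes can also be sketched in Python. This is an illustration under the assumption that every data line looks like the sample (the prefix, a position, whitespace, a value); parse_crossplanes is a name invented for the example.

```python
def parse_crossplanes(text, prefix="crossplane-x-"):
    """Return (position, value) pairs from lines that start with the prefix."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Everything after the prefix is "<position> <value>".
            position, value = line[len(prefix):].split()
            rows.append((float(position), float(value)))
    return rows

sample = '''"Surface Integral Report"
Mass-Weighted Average
Static Temperature (k)
crossplane-x-0.001 1242.9402
crossplane-x-0.025 1243.0017
'''
print(parse_crossplanes(sample))
```

Header lines are skipped simply because they do not start with the prefix, so no line counting is needed.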

LZW Compression In Lua

Here is the Pseudocode for Lempel-Ziv-Welch Compression.
pattern = get input character
while ( not end-of-file ) {
K = get input character
if ( <<pattern, K>> is NOT in
the string table ){
output the code for pattern
add <<pattern, K>> to the string table
pattern = K
}
else { pattern = <<pattern, K>> }
}
output the code for pattern
output EOF_CODE
I am trying to code this in Lua, but it is not really working. Here is the code I modeled after an LZW function in Python, but I am getting an "attempt to call a string value" error on line 8.
function compress(uncompressed)
local dict_size = 256
local dictionary = {}
w = ""
result = {}
for c in uncompressed do
-- while c is in the function compress
local wc = w + c
if dictionary[wc] == true then
w = wc
else
dictionary[w] = ""
-- Add wc to the dictionary.
dictionary[wc] = dict_size
dict_size = dict_size + 1
w = c
end
-- Output the code for w.
if w then
dictionary[w] = ""
end
end
return dictionary
end
compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print (compressed)
I would really like some help either getting my code to run, or helping me code the LZW compression in Lua. Thank you so much!
Assuming uncompressed is a string, you'll need to use something like this to iterate over it:
for i = 1, #uncompressed do
local c = string.sub(uncompressed, i, i)
-- etc
end
There's another issue on line 10; .. is used for string concatenation in Lua, so this line should be local wc = w .. c.
You may also want to read this with regard to the performance of string concatenation. Long story short, it's often more efficient to keep each element in a table and return it with table.concat().
You should also take a look here to download the source for a high-performance LZW compression algorithm in Lua...
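Since the question's code was modeled on a Python function, a direct Python rendering of the pseudocode may help as a reference when porting to Lua. This is a minimal sketch, assuming the string table is seeded with the 256 single-character codes and that no EOF_CODE is emitted; the function name lzw_compress is chosen for the example.

```python
def lzw_compress(data):
    """Return the list of LZW codes for a string of single-byte characters."""
    # Seed the string table with every single-character pattern.
    table = {chr(i): i for i in range(256)}
    next_code = 256
    pattern = ""
    output = []
    for ch in data:
        candidate = pattern + ch
        if candidate in table:
            pattern = candidate
        else:
            # Emit the code for the longest known pattern, then learn the new one.
            output.append(table[pattern])
            table[candidate] = next_code
            next_code += 1
            pattern = ch
    if pattern:
        output.append(table[pattern])
    return output

print(lzw_compress("TOBEORNOTTOBEORTOBEORNOT"))
```

Note that the result is the list of emitted codes, not the dictionary, which is one of the things the Lua attempt above gets wrong by returning dictionary.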