Replacing output text of a command with a string in a Shell Script - html

Hello, and thank you for any help you can provide.
I have my Apache2 web server set up so that when I go to a specific link, it runs a shell script stored on my server and displays its output. I need to output the results of an SVN command (svn log). If I simply run 'svn log -q' (-q for quiet), I get the usual list of revisions, with a separator row of exactly 72 dashes between each entry. I need to be able to take these dashes and turn each separator row into an HTML line break (<br/>).
Basically I need the shell script to take the output of the 'svn log -q' command, search and replace every chunk of 72 dashes with an html line break, and then echo the output.
Is this at all possible?
I'm somewhat a noob at shell scripting, so please excuse any mess-ups.
Thank you so much for your help.

svn log -q | sed -e 's,^-\{72\}$,<br/>,'

If you want to write it in the script this might help:
${string//substring/replacement}
Replace all matches of $substring with $replacement.
stringZ=abcABC123ABCabc
echo ${stringZ/abc/xyz} # xyzABC123ABCabc
# Replaces first match of 'abc' with 'xyz'.
echo ${stringZ//abc/xyz} # xyzABC123ABCxyz
# Replaces all matches of 'abc' with 'xyz'.
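Putting it together, a minimal sketch of what the whole script might look like (the Content-type header assumes the script is served as a CGI script, which is my assumption, not something from the question):

#!/bin/sh
# Tell the browser this is HTML so the <br/> tags are rendered (CGI assumption).
echo "Content-type: text/html"
echo ""
# Replace every full separator line of 72 dashes with an HTML line break.
svn log -q | sed -e 's,^-\{72\}$,<br/>,'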


Getting IPs from a .html file

There is a site with SOCKS4 proxies that I use in a proxychains program. Instead of manually entering new IPs, I was trying to automate the process. I used wget to save it as a .html file in my home directory; this is some of the output if I cat the file:
</font></a></td><td colspan=1><font class=spy1>111.230.138.177</font> <font class=spy14>(Shenzhen Tencent Computer Systems Company Limited)</font></td><td colspan=1><font class=spy1>6.531</font></td><td colspan=1><TABLE width='13' height='8' CELLPADDING=0 CELLSPACING=0><TR BGCOLOR=blue><TD width=1></TD></TR></TABLE></td><td colspan=1><font class=spy1><acronym title='311 of 436 - last check status=OK'>71% <font class=spy1>(311)</font> <font class=spy5>-</font></acronym></font></td><td colspan=1><font class=spy1><font class=spy14>05-jun-2020</font> 23:06 <font class=spy5>(4 mins ago)</font></font></td></tr><tr class=spy1x onmouseover="this.style.background='#002424'" onmouseout="this.style.background='#19373A'"><td colspan=1><font class=spy14>139.99.104.233<script type="text/javascript">document.write("<font class=spy2>:<\/font>"+(a1j0e5^q7p6)+(m3f6f6^r8c3)+(a1j0e5^q7p6)+(t0b2s9^y5m3)+(w3c3m3^z6j0))</script></font></td><td colspan=1>SOCKS5</td><td colspan=1><a href='/en/anonymous-proxy-list/'><font class=spy1>HIA</font></a></td><td colspan=1><a href='/free-proxy-list/CA/'><font class=spy14>Canada</
As you can see, the IP usually comes right after a spy[0-19]> tag. I tried to parse out the actual IPs with awk using the following code:
awk '/^spy/{FS=">"; print $2 }' file-name.html
This is problematic because there would be a bunch of other stuff trailing after the IP, and I guess the ^ anchor only works at the beginning of a line? Anyway, I was wondering if anyone could give me ideas on how to parse out the IP addresses with awk. I just started learning awk, so sorry for the noob question. Thanks
Using a proper XML/HTML parser and an XPath expression:
xidel -se '(//td[@colspan=1]/font[@class="spy1"])[1]/text()' file.html
 Output:
111.230.138.177
Or if it's not always the first XPath match:
xidel -se '//td[@colspan=1]/font[@class="spy1"]/text()' file.html |
perl -MRegexp::Common -lne 'print $1 if /($RE{net}{IPv4})/'
AWK is great for hacking out IP addresses like this:
gawk -v RS="spy[0-9]*" '{match($0,/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/); ip = substr($0,RSTART,RLENGTH); if (ip) {print ip}}' file.html
Result:
111.230.138.177
139.99.104.233
Explanation:
You must use GAWK if you want the record separator (RS) to contain a regular expression.
We divide the file into records, each containing one IP address, by putting that regex in the RS variable.
The match function then looks for the second regex in each record: four groups of one to three digits, separated by dots (the IP address).
The substr function extracts from the record ($0) a fragment of length RLENGTH starting at RSTART (the beginning of the matched regex).
The if checks that the result has a value and, if so, prints it. This protects against empty lines in the result.
This way of extracting IP addresses does not depend on the file being well-formed; it does not even have to be HTML.
There are already solutions provided here; I'm rather adding a different one for future readers, using the egrep utility.
egrep -o '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.html
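If the same address can appear more than once in the page, a small variation of the same idea deduplicates the output (just a sketch; grep -E is the modern spelling of egrep, and sort -u drops duplicates):

grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.html | sort -u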

How to only copy lines of code with specific word from perl file into new file

I have a perl file with html code inside of a subroutine. I want to copy the html code into a new file but ONLY the html code, not the rest of the perl syntax. The HTML code is all inside of one subroutine and all the HTML code starts with the 'push':
sub getTable {
push @htmlBase, qq(<html>\n);
push @htmlBase, qq(\n);
push @htmlBase, qq(<head>\n);
push @htmlBase, qq(<meta http-equiv="Content-Language" content="en-us">\n);
In essence, how do I ONLY copy lines that start with 'push' into a new file from my current perl file? Thanks in advance.
If you're using a unix-like OS, try using grep. Something like:
$ grep 'push' myfile.pl | grep -Po '(?<=qq\().*(?=\);)' >Newfile.html
The first grep just grabs lines with 'push' on them. The second grep turns on Perl RE mode (the -P) and prints only the matching part of each line (the -o). The pattern has two parts: (?<=qq\() matches "qq(" right before the text (but doesn't include it in the result) and (?=\);) looks for the last ");" on the line.
This won't match multi-line quotes and the output will also include escapes, like the \n for newlines.
Using perl to grep the file:
perl -lne'm/push.+qq\((.+)?(\\n)\);/ && print $1' source.pl > target.html
For the output you've shown, this one-liner will work.
If your source script is more complex, e.g. multi-line statements and embedded variables, then you will need to write some temporary code to call getTable, print the contents of @htmlBase, and save that output to the new file, roughly as sketched below.
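A rough sketch of that idea as a one-liner (assumptions on my part: the script is named source.pl as above, getTable fills a global @htmlBase rather than a lexical my variable, and loading the file at the top level has no unwanted side effects):

# source.pl, getTable and @htmlBase are the names from the question; adjust to your script
perl -e 'do "./source.pl"; getTable(); print @htmlBase;' > target.html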

How to download and then use the file in the same tcl script?

I'm new using Tcl and I have the following script:
proc prepare_xml {pdb_id} {
set filename [exec wget ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/$pdb_id.xml.gz]
set filename_unzip [exec gunzip "$pdb_id.xml.gz"]
set ready_xml [exec sed -i "/entry /c\<entry>" "$pdb_id.xml"]
return $ready_xml
}
The expected output is the file "filename" uncompressed and modified. However, when I execute it the first time, it only downloads the file and does not uncompress it. If I execute it a second time, I obtain the expected output plus a second copy of the original downloaded file.
Can anyone help me with this? I've tried with after and vwait commands but it doesn't work.
Thank you :)
It's hard to say for sure as you're not describing whether any errors are thrown (that'd be the only reason for the code to not run to completion), but I'd expect something like this to be the right approach:
proc prepare_xml {pdb_id} {
# Double quotes on next line just because of Stack Overflow highlighter
set url "ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/$pdb_id.xml.gz"
set file $pdb_id.xml
append sedcode {/entry /} "c\\\n" {<entry>}
exec wget -q -O - $url | gunzip -c | sed $sedcode > $file
return $file
}
Firstly, I'm keeping complicated bits in (local) variables to stop the exec line from getting too long. Secondly, I've put all the subprocesses together in the one pipeline. Thirdly, I'm using -q and -O - with wget, and -c with gunzip; look up what they do if you don't understand them. Fourthly, I've put the scriptlet for sed in braces where possible to stop there from being trouble with backslashes, but I've used append and a non-backslashed section to make the pattern because the syntax of c in sed is downright weird (it needs a backslash-newline sequence immediately after on at least some platforms…)
I'd actually use native Tcl code to extract and transform the data if I was doing it for me, but that's a rather larger change.

Extracting CREATE TABLE definitions from MySQL dump?

I have a MySQL dump file over 1 terabyte big. I need to extract the CREATE TABLE statements from it so I can provide the table definitions.
I purchased Hex Editor Neo but I'm kind of disappointed I did. I created a regex, CREATE\s+TABLE(.|\s)*?(?=ENGINE=InnoDB), to extract the CREATE TABLE clause, and that seems to work well when testing in Notepad++.
However, the ETA of extracting all instances is over 3 hours, and I cannot even be sure that it is doing it correctly. I don't even know if those lines can be exported when done.
Is there a quick way I can do this on my Ubuntu box using grep or something?
UPDATE
Ran this overnight and the output file came out blank. I created a smaller subset of data and the procedure is still not working. It works in regex testers; however, grep is not liking it and yields empty output. Here is the command I'm running. I'd provide the sample but I don't want to breach confidentiality for my client. It's just a standard MySQL dump.
grep -oP "CREATE\s+TABLE(.|\s)+?(?=ENGINE=InnoDB)" test.txt > plates_schema.txt
UPDATE
It seems not to match across the newlines right after the CREATE\s+TABLE part.
You can use Perl for this task... this should be really fast.
Perl's .. (range) operator is stateful - it remembers state between evaluations.
What it means is: if your table definition starts with CREATE TABLE and ends with something like ENGINE=InnoDB DEFAULT CHARSET=utf8; then the one-liner below will do what you want.
perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' INPUT_FILE.sql > OUTPUT_FILE.sql
EDIT:
Since you are working with a really large file and would probably like to know the progress, pv can give you this also:
pv INPUT_FILE.sql | perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' > OUTPUT_FILE.sql
This will show you a progress bar, speed, and ETA.
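The same start/end range idea exists in awk too, if that's more familiar (a sketch under the same assumption about how each definition begins and ends):

# Print from a line matching CREATE TABLE through the next line matching ENGINE=InnoDB
awk '/CREATE TABLE/,/ENGINE=InnoDB/' INPUT_FILE.sql > OUTPUT_FILE.sql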
You can use the following:
grep -ioP "^CREATE\s+TABLE[\s\S]*?(?=ENGINE=InnoDB)" file.txt > output.txt
If you can run mysqldump again, simply add --no-data.
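For example (a sketch; the user and database names are placeholders):

mysqldump --no-data -u user -p database_name > schema.sql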
Got it! grep does not support matching across multiple lines. I found this question helpful and I ended up using pcregrep instead.
pcregrep -M "CREATE\s+TABLE(.|\n|\s)+?(?=ENGINE=InnoDB)" test.txt > plates.schema.txt

Tab Delimited to CSV

For a data mining project I need to convert 80 tab-delimited files (100 MB each) to CSV files. Is anybody aware of some tools that can be handy in this case?
Download python: https://www.python.org/downloads/
Install it.
And run a script similar to the following.
Save the following as convert_tsv_to_csv.py, or anything ending in .py:
import csv

# Use raw strings for the Windows paths and newline='' as the csv module expects.
with open(r'C:\path\to\file', 'r', newline='') as f:
    tab_file = csv.reader(f, dialect=csv.excel_tab)
    with open(r'C:\path\to\outfile.csv', 'w', newline='') as g:
        comma_file = csv.writer(g, dialect=csv.excel)
        for row in tab_file:
            comma_file.writerow(row)
Change the paths and run it like: python convert_tsv_to_csv.py
The basic idea:
If the files are big, read them line by line.
Learn your basic tools.
On any UNIX/Linux/OSX system, the following commands each should do the trick:
sed -i -e 's/\t/,/g' *.csv
perl -i -p -e 's/\t/,/g' *.csv
These perform the basic tab-to-comma substitution. They won't take care of things like quoting and escaping if your data contains columns with a tab or comma in them, nor will they change the file name for you! Note that the syntax of sed and perl is very similar... -i is in-place editing, -e executes a command, s/// is the syntax for regular expression substitutions. Etc.
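If you would rather keep the originals and write new .csv files with the name changed, a minimal loop sketch (assuming GNU sed as above, and that the inputs end in .tsv; adjust the glob to your files):

for f in *.tsv; do
    # Write a comma-separated copy next to each tab-separated original.
    sed 's/\t/,/g' "$f" > "${f%.tsv}.csv"
done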
Either way, your basic unix tools for this job are
extremely fast (the "stream editor" sed is well optimized, low-level C code)
handy (just some 10 keypresses!)
easy to use, once you've learned the basics (i.e. read the manual)