How to add text to the end of various digits using sed command? - html

I am trying to replace page=#" in various html files with page=#/index.html". I have tried using the command:
sed -i -re 's|"(page=[0-9]+)"|"\1/index.html"|' *.html
along with numerous variations, but have not been successful. The first part of the command, sed -i -re 's|"(page=[0-9]+)"|, seems to work properly, but I cannot work out how to write the replacement part to achieve my goal. Any suggestions for modifying this command would be greatly appreciated!

If you're trying to replace page=#" where the actual strings look like page=99", then the first double quote in the RE isn't going to match anything correctly. It would only match if it looks like:
"page=99"
But I'm guessing this is at the end of a link in the HTML, so it probably does not have the initial double quote. This should work instead:
sed -i -re 's|(page=[0-9]+)"|\1/index.html"|' *.html
Also note that if you're on OS X, you can't use the GNU option -r or use -i without an argument, so it would look like this:
sed -i '' -Ee 's|(page=[0-9]+)"|\1/index.html"|' *.html
-E means to use Extended Regular Expressions, so you can write ( instead of \( for grouping. In GNU sed this is -r.
-i means to edit the files in place. GNU sed accepts it without an argument, but on other systems you must pass an extension for the backup file, or '' for no backup.
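To check the pattern outside sed, here is a minimal Python sketch of the same substitution. Only the regular expression comes from the command above; the function name add_index and the sample link are made up for illustration.

```python
import re

# Same pattern as the sed command: "page=" plus digits, followed by a closing double quote.
PAGE_RE = re.compile(r'(page=[0-9]+)"')


def add_index(html: str) -> str:
    """Append /index.html to every page=<digits> that is directly followed by a double quote."""
    return PAGE_RE.sub(r'\1/index.html"', html)


if __name__ == "__main__":
    # Hypothetical input line; real files may look different.
    print(add_index('<a href="list.php?page=99">next</a>'))
```

One difference to keep in mind: the sed command has no g flag, so it changes only the first match on each line, while re.sub replaces every match.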

Related

How can I replace everything after a string using Bash?

I have a Perl script that uses some local variables as per below:
my $cool_variable="Initial value";
COOLVAR="Initial value for COOLVAR"
I would like to replace the content between the quotes using a bash script.
I got it to work for a non-variable like below:
#!/bin/sh
dummy_var="Replaced value"
sed -i -r "s#^(COOLVAR=).*#\1$dummy_var#" perlscript.pl
But if I replace it with cool_variable or $cool_variable:
sed -i -r "s#^($cool_variable=).*#\1$dummy_var#" perlscript.pl
It does not work.
There are multiple code injection bugs in that snippet. You shouldn't be generating code from the shell or sed.
Say you have
var=COOLVAR
val=coolval
As per How can I process options using Perl in -n or -p mode?, you can use any of
perl -spe's{^$var=\K.*}{"\Q$val\E";};' -- -var="$var" -val="$val" perlscript.pl
var="$var" val="$val" perl -pe's{^$ENV{var}=\K.*}{"\Q$ENV{val}\E";};' perlscript.pl
export var
export val
perl -pe's{^$ENV{var}=\K.*}{"\Q$ENV{val}\E";};' perlscript.pl
to transform
COOLVAR="dummy";
HOTVAR="dummy";
into
COOLVAR="coolval";
HOTVAR="dummy";
The values are passed to the program as arguments or environment variables, rather than being injected into its source, and the program uses Perl's quotemeta (aka \Q..\E) to quote special characters.
Note that $var is assumed to be a valid identifier. No validation checks are performed, so this program is absolutely unsafe with untrusted input.
Use -i to modify the file in place.
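The same idea (treat the name and value as data, and quote the value before it lands inside the double-quoted string) can be sketched in Python. This is an illustration under assumptions, not a port of the Perl: set_assignment and perl_quote are hypothetical names, perl_quote only approximates quotemeta for ASCII input, and unlike the Perl one-liners the variable name is escaped with re.escape.

```python
import re


def perl_quote(value: str) -> str:
    """Backslash-escape every ASCII non-word character (an approximation of Perl's quotemeta)."""
    return re.sub(r'([^A-Za-z0-9_])', r'\\\1', value)


def set_assignment(text: str, var: str, val: str) -> str:
    """Replace everything after `var=` at the start of a line with "val"; (val quoted)."""
    pattern = re.compile(r'^(' + re.escape(var) + r'=).*$', re.MULTILINE)
    replacement = '"' + perl_quote(val) + '";'
    # A function is used as the replacement so backslashes in it are not reinterpreted by re.
    return pattern.sub(lambda m: m.group(1) + replacement, text)


if __name__ == "__main__":
    source = 'COOLVAR="dummy";\nHOTVAR="dummy";\n'
    print(set_assignment(source, "COOLVAR", "coolval"), end="")
```

Only the line that starts with the given name is touched; a value containing " or $ is written with those characters backslash-escaped.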

Find and replace text in JSON with sed [duplicate]

I am trying to change the values in a text file using sed in a Bash script with the line,
sed 's/draw($prev_number;n_)/draw($number;n_)/g' file.txt > tmp
This will be in a for loop. Why is it not working?
Variables inside ' don't get substituted in Bash. To get string substitution (or interpolation, if you're familiar with Perl) you would need to change it to use double quotes " instead of the single quotes:
# Enclose the entire expression in double quotes
$ sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
# Or, concatenate strings with only variables inside double quotes
# This would restrict expansion to the relevant portion
# and prevent accidental expansion for !, backticks, etc.
$ sed 's/draw('"$prev_number"';n_)/draw('"$number"';n_)/g' file.txt > tmp
# A variable cannot contain arbitrary characters
# See link in the further reading section for details
$ a='foo
bar'
$ echo 'baz' | sed 's/baz/'"$a"'/g'
sed: -e expression #1, char 9: unterminated `s' command
Further Reading:
Difference between single and double quotes in Bash
Is it possible to escape regex metacharacters reliably with sed
Using different delimiters for sed substitute command
Unless you need the result in a different file, you can use the -i flag to change the file in place.
Variables within single quotes are not expanded, but within double quotes they are. Use double quotes in this case.
sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
You could also make it work with eval, but don’t do that!!
This may help:
sed "s/draw($prev_number;n_)/draw($number;n_)/g"
You can use variables as shown below. Here, I wanted to write the hostname (a system value) into the file: I look for the string look.me and replace that whole line with look.me=<system_name>.
sed -i "s/.*look.me.*/look.me=`hostname`/"
You can also store your system value in another variable and can use that variable for substitution.
host_var=`hostname`
sed -i "s/.*look.me.*/look.me=$host_var/"
Input file:
look.me=demonic
Output of file (assuming my system name is prod-cfm-frontend-1-usa-central-1):
look.me=prod-cfm-frontend-1-usa-central-1
I needed to pass GitHub tags from my release into GitHub Actions, so that each release automatically packages the code and pushes it to Artifactory.
Here is how I did it. :)
- name: Invoke build
  run: |
    # Gets the tag number from the release
    TAGNUMBER=$(echo "$GITHUB_REF" | cut -d / -f 3)
    # Sets up a string to be used by sed
    FINDANDREPLACE='s/${GITHUBACTIONSTAG}/'"$TAGNUMBER"/
    # Updates the version number in the setup.cfg file
    sed -i "$FINDANDREPLACE" setup.cfg
    # Installs prerequisites and pushes
    pip install -r requirements-dev.txt
    invoke build
In retrospect I wish I had done this in Python with tests. However, it was fun to do some bash.
Another variant, using printf:
SED_EXPR="$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")"
sed "${SED_EXPR}" file.txt
or in one line:
sed "$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")" file.txt
Building the expression with printf keeps the sed script itself in single quotes, so the shell expands only the two values, which is why I like this variant. It does not escape characters that are special to sed (such as / or &) inside those values, though.
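Values that contain characters special to sed (for example / or &) can still break an expression assembled from strings. If that is a concern, one alternative is to do the replacement as a plain, literal string replacement instead of a regular expression. The sketch below swaps sed for Python's str.replace; the function name replace_draw is hypothetical, and the draw(...;n_) shape is taken from the question.

```python
def replace_draw(text: str, prev_number: str, number: str) -> str:
    """Replace draw(<prev_number>;n_) with draw(<number>;n_) using literal matching only."""
    old = f"draw({prev_number};n_)"
    new = f"draw({number};n_)"
    # str.replace treats both strings literally, so no escaping is needed.
    return text.replace(old, new)


if __name__ == "__main__":
    print(replace_draw("plot: draw(3;n_) and draw(3;n_)", "3", "4"))
```

Because nothing is interpreted as a pattern, values such as a/b or c&d pass through unchanged.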

How to download and then use the file in the same tcl script?

I'm new using Tcl and I have the following script:
proc prepare_xml {pdb_id} {
    set filename [exec wget ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/$pdb_id.xml.gz]
    set filename_unzip [exec gunzip "$pdb_id.xml.gz"]
    set ready_xml [exec sed -i "/entry /c\<entry>" "$pdb_id.xml"]
    return $ready_xml
}
The expected output is the file "filename", uncompressed and modified. However, when I execute it the first time, it only downloads the file and does not uncompress it. If I execute it a second time, I obtain the expected output and a second copy of the original downloaded file.
Can anyone help me with this? I've tried with after and vwait commands but it doesn't work.
Thank you :)
It's hard to say for sure as you're not describing whether any errors are thrown (that'd be the only reason for the code to not run to completion), but I'd expect something like this to be the right approach:
proc prepare_xml {pdb_id} {
    # Double quotes on next line just because of Stack Overflow highlighter
    set url "ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/$pdb_id.xml.gz"
    set file $pdb_id.xml
    append sedcode {/entry /} "c\\\n" {<entry>}
    exec wget -q -O - $url | gunzip -c | sed $sedcode > $file
    return $file
}
Firstly, I'm keeping complicated bits in (local) variables to stop the exec line from getting too long. Secondly, I've put all the subprocesses together in the one pipeline. Thirdly, I'm using -q and -O - with wget, and -c with gunzip; look up what they do if you don't understand them. Fourthly, I've put the scriptlet for sed in braces where possible to stop there from being trouble with backslashes, but I've used append and a non-backslashed section to make the pattern because the syntax of c in sed is downright weird (it needs a backslash-newline sequence immediately after on at least some platforms…)
I'd actually use native Tcl code to extract and transform the data if I was doing it for me, but that's a rather larger change.

Tab Delimited to CSV

For a data mining project I need to convert 80 tab-delimited files (100 MB each) to CSV files. Is anybody aware of tools that would be handy in this case?
Download python: https://www.python.org/downloads/
Install it.
And run a script similar to the following.
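Save the following as convert_tsv_to_csv.py, or anything ending in .py:
import csv

# Raw strings keep the backslashes in Windows paths from being read as escapes
# (in a normal string, \t would become a tab character).
with open(r'C:\path\to\file', 'r', newline='') as f:
    tab_file = csv.reader(f, dialect=csv.excel_tab)
    # newline='' lets the csv module control line endings itself
    with open(r'C:\path\to\outfile.csv', 'w', newline='') as g:
        comma_file = csv.writer(g, dialect=csv.excel)
        for row in tab_file:
            comma_file.writerow(row)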
Change the paths and run it like: python convert_tsv_to_csv.py
The basic idea:
If the files are big, read them line by line.
Learn your basic tools.
On most UNIX/Linux/OS X systems, either of the following commands should do the trick (the sed form as written is for GNU sed; on OS X, sed -i needs an explicit '' argument and may not understand \t):
sed -i -e 's/\t/,/g' *.csv
perl -i -p -e 's/\t/,/g' *.csv
These perform the basic tab-to-comma substitution. They won't take care of quoting and escaping if your data contains fields with a tab or a comma, nor will they change the file names for you! Note that the sed and perl syntax is very similar: -i is in-place editing, -e executes a command, and s/// is the syntax for regular-expression substitutions.
Either way, your basic unix tools for this job are
extremely fast (the "stream editor" sed is well optimized, low-level C code)
handy (just some 10 keypresses!)
easy to use, once you've learned the basics (i.e. read the manual)
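The quoting caveat mentioned above is easy to see with a small comparison. The sketch below (function names and sample data are made up for illustration) contrasts a plain tab-to-comma replacement with a conversion through Python's csv module, which quotes fields that contain a comma.

```python
import csv
import io


def naive_tsv_to_csv(text: str) -> str:
    """Plain substitution, equivalent in spirit to s/\\t/,/g."""
    return text.replace("\t", ",")


def tsv_to_csv(text: str) -> str:
    """Convert via the csv module so fields containing commas or quotes are quoted."""
    out = io.StringIO()
    reader = csv.reader(io.StringIO(text, newline=""), dialect=csv.excel_tab)
    writer = csv.writer(out, dialect=csv.excel, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    sample = "name\tnote\nSmith, J.\tok\n"
    print(naive_tsv_to_csv(sample), end="")  # second row now looks like three fields
    print(tsv_to_csv(sample), end="")        # second row keeps "Smith, J." as one field
```

If the data is known to contain no commas, quotes, or embedded tabs, the one-line substitution gives the same result and is simpler.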

How to Convert Regex Pattern Match to Lowercase for URL Standardization/Tidying

I am currently trying to convert all links and files and tags on my site from UPPERCASE.ext and CamelCase.ext to lowercase.ext.
I can match the links in pages using a regular expression match for href="[^"]*" and src="[^"]*"
This seems to work fine for identifying the link and images in the HTML.
However what I need to do with this is to take the match and run a ToLowercase() function on the matches. Since I have a lot of pages that I'd like to parse through, I'm looking to make a short shell script that will run on a specified directory and pattern match the specified regexes and perform a lowercase operation on them.
Perl one-liner to rename all regular files to lowercase:
perl -le 'use File::Find; find({wanted=>sub{-f && rename($_, lc)}}, "/path/to/files");'
If you want to be more specific about what files are renamed you could change -f to a regex or something:
perl -le 'use File::Find; find({wanted=>sub{/\.(txt|htm|blah)$/i && rename($_, lc)}}, "/path/to/files");'
EDIT: Sorry, after rereading the question I see you also want to replace occurrences within files as well:
find /path/to/files -name "*.html" -exec perl -pi -e 's/\b(src|href)="([^"]+)"/$1="\L$2"/gi;' {} \;
EDIT 2: Try this one. The find command uses + instead of \;, which is more efficient since multiple files are passed to perl at once (thanks to @ikegami from another post). It also handles both ' and " around the URL. Finally, it uses {} instead of // as the substitution delimiters since you are substituting URLs (maybe the /s in the URL are confusing perl or your shell?). It shouldn't matter, and I tried both on my system with the same effect (both worked fine), but it's worth a shot:
find . -name "*.html" -exec perl -pi -e \
'$q=qr/"|\x27/; s{\b(src|href)=($q?.+$q?)\b}{$1=\L$2}gi;' {} +
PS: I also have a Macbook and tested these using bash shell with Perl versions 5.8.9 and 5.10.0.
With bash, you can declare a variable to only hold lower case values:
declare -l varname
read varname <<< "This Is LOWERCASE"
echo $varname # ==> this is lowercase
Or, you can convert a value to lowercase (this needs bash version 4 or later):
x="This Is LOWERCASE"
echo ${x,,} # ==> this is lowercase
Do you want this?
kent$ echo "aBcDEF"|sed 's/.*/\L&/g'
abcdef
or this
kent$ echo "aBcDEF"|awk '$0=tolower($0)'
abcdef
with your own regex:
kent$ echo 'FOO src="htTP://wWw.GOOGLE.CoM" BAR BlahBlah'|sed -r 's/src="[^"]*"/\L&/g'
FOO src="http://www.google.com" BAR BlahBlah
You could use sed with -i (in-place edit):
sed -i'' -re's/(href|src)="[^"]*"/\L&/g' /path/to/files/*
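For comparison, the same lowercase step can be sketched in Python, using the href="[^"]*" and src="[^"]*" patterns from the question. The function name lowercase_links is hypothetical, and only double-quoted attribute values are handled here.

```python
import re

# Matches href="..." or src="..." (attribute name in any letter case, double quotes only).
ATTR_RE = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)


def lowercase_links(html: str) -> str:
    """Lowercase the value of every href/src attribute, leaving everything else untouched."""
    return ATTR_RE.sub(lambda m: f'{m.group(1)}="{m.group(2).lower()}"', html)


if __name__ == "__main__":
    print(lowercase_links('FOO src="htTP://wWw.GOOGLE.CoM" BAR BlahBlah'))
```

Like the sed and perl versions, this lowercases the whole attribute value, including external URLs and query strings, so it is worth reviewing the changes before renaming files to match.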