How to reformat a file with sed/vim? - csv

I have a .csv file that looks like this.
atomnum,atominfo,metric
238,A-30-CYS-SG,53.7723
889,A-115-CYS-SG,46.2914
724,A-94-CYS-SG,44.6405
48,A-6-CYS-SG,37.2108
630,A-80-CYS-SG,29.574
513,A-64-CYS-SG,23.1925
981,A-127-CYS-SG,19.8903
325,A-41-GLN-OE1,17.6205
601,A-76-CYS-SG,17.5079
I want to change it to this:
atomnum,atominfo,metric
238,C30-SG,53.7723
889,C115-SG,46.2914
724,C94-SG,44.6405
48,C6-SG,37.2108
630,C80-SG,29.574
513,C64-SG,23.1925
981,C127-SG,19.8903
325,Q41-OE1,17.6205
601,C76-SG,17.5079
The middle field is an atom identifier: A-30-CYS-SG is the gamma sulfur (SG) of residue 30, which is a cysteine, in chain A. Residues can be written with a three-letter code or a one-letter code (see the table at https://www.iupac.org/publications/pac-2007/1972/pdf/3104x0639.pdf). Basically, I want to a) change the three-letter code to the one-letter code, b) remove the chain ID (A in this case), and c) put the residue number right after the one-letter code.
I've tried matching the pattern between the commas within Vim. Something like :%s:\(-\d\+\-\)\(\u\+\):\2\1:g gives me c), i.e. ACYS-30--SG. I don't know how to do a) in Vim. I know how to do it with sed and an input file containing all the substitute commands, but then maybe it's better to do all the work with sed... Is it possible to do a) in Vim?
Thanks

This might work for you (GNU sed):
sed -r '1b;s/$/\n:ALAA:ARGR:ASNN:ASPD:CYSC:GLUE:GLNQ:GLYG:HISH:ILEI:LEUL:LYSK:METM:PHEF:PROP:SERS:THRT:TRPW:TYRY:VALV/;s/,A-([0-9]+)-(...)(.*)\n.*:\2(.).*/,\4\1\3/' file
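This appends a lookup table to each line and uses pattern matching to replace the 3-letter code with the 1-letter code, placing the residue number after it. Each lookup entry is a colon, followed by the 3-letter key, followed by the 1-letter code; the back-reference \2 in .*:\2(.) finds the entry for the captured residue name.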
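The command above only matches chain A (the literal ,A-). A minimal sketch of the same lookup-table trick that accepts any single uppercase chain ID, assuming GNU sed (for -r and \n in the s command); the chain-B TRP row is hypothetical and only there to exercise that case:

```shell
#!/bin/sh
# Sample input: rows from the question plus one hypothetical chain-B row.
cat > atoms.csv <<'EOF'
atomnum,atominfo,metric
238,A-30-CYS-SG,53.7723
325,A-41-GLN-OE1,17.6205
12,B-7-TRP-NE1,1.0
EOF

# 1b          : print the header line unchanged
# s/$/\n.../  : append the :XXXy lookup table after a newline
# ,[A-Z]-     : any one-letter chain ID (instead of the literal ,A-)
# .*:\2(.)    : find the table entry for the captured 3-letter code
out=$(sed -r '1b;s/$/\n:ALAA:ARGR:ASNN:ASPD:CYSC:GLUE:GLNQ:GLYG:HISH:ILEI:LEUL:LYSK:METM:PHEF:PROP:SERS:THRT:TRPW:TYRY:VALV/;s/,[A-Z]-([0-9]+)-(...)(.*)\n.*:\2(.).*/,\4\1\3/' atoms.csv)
printf '%s\n' "$out"
```

After the header this should print 238,C30-SG,53.7723, 325,Q41-OE1,17.6205 and 12,W7-NE1,1.0. One caveat: a row whose residue is not in the table does not match the second s command, so the appended table is printed with it as an extra line.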

Using sed, paste, cut, & bash, given input atoms.csv:
paste -d, <(cut -d, -f1 atoms.csv) \
<(cut -d, -f2 atoms.csv | sed 's/.-//
s/\(.*\)-\([A-Z]\{3\}\)-/\2\1-/
s/^ALA/A/
s/^ARG/R/
s/^ASN/N/
s/^ASP/D/
s/^CYS/C/
s/^GLU/E/
s/^GLN/Q/
s/^GLY/G/
s/^HIS/H/
s/^ILE/I/
s/^LEU/L/
s/^LYS/K/
s/^MET/M/
s/^PHE/F/
s/^PRO/P/
s/^SER/S/
s/^THR/T/
s/^TRP/W/
s/^TYR/Y/
s/^VAL/V/') \
<(cut -d, -f3 atoms.csv)
Output:
atomnum,atominfo,metric
238,C30-SG,53.7723
889,C115-SG,46.2914
724,C94-SG,44.6405
48,C6-SG,37.2108
630,C80-SG,29.574
513,C64-SG,23.1925
981,C127-SG,19.8903
325,Q41-OE1,17.6205
601,C76-SG,17.5079
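Since the question mentions keeping all the substitute commands in a file, here is a minimal sketch of that route with a single sed call and no paste/cut. It assumes the chain ID is one uppercase letter and the residue is one of the 20 standard three-letter codes; the script name aa.sed is hypothetical. It only uses POSIX basic regular expressions, so it should not need GNU sed:

```shell
#!/bin/sh
cat > atoms.csv <<'EOF'
atomnum,atominfo,metric
238,A-30-CYS-SG,53.7723
325,A-41-GLN-OE1,17.6205
601,A-76-CYS-SG,17.5079
EOF

# Each line drops the chain ID, maps the residue code and puts the number after it:
# ,A-30-CYS- becomes ,C30-
cat > aa.sed <<'EOF'
s/,[A-Z]-\([0-9]*\)-ALA-/,A\1-/
s/,[A-Z]-\([0-9]*\)-ARG-/,R\1-/
s/,[A-Z]-\([0-9]*\)-ASN-/,N\1-/
s/,[A-Z]-\([0-9]*\)-ASP-/,D\1-/
s/,[A-Z]-\([0-9]*\)-CYS-/,C\1-/
s/,[A-Z]-\([0-9]*\)-GLN-/,Q\1-/
s/,[A-Z]-\([0-9]*\)-GLU-/,E\1-/
s/,[A-Z]-\([0-9]*\)-GLY-/,G\1-/
s/,[A-Z]-\([0-9]*\)-HIS-/,H\1-/
s/,[A-Z]-\([0-9]*\)-ILE-/,I\1-/
s/,[A-Z]-\([0-9]*\)-LEU-/,L\1-/
s/,[A-Z]-\([0-9]*\)-LYS-/,K\1-/
s/,[A-Z]-\([0-9]*\)-MET-/,M\1-/
s/,[A-Z]-\([0-9]*\)-PHE-/,F\1-/
s/,[A-Z]-\([0-9]*\)-PRO-/,P\1-/
s/,[A-Z]-\([0-9]*\)-SER-/,S\1-/
s/,[A-Z]-\([0-9]*\)-THR-/,T\1-/
s/,[A-Z]-\([0-9]*\)-TRP-/,W\1-/
s/,[A-Z]-\([0-9]*\)-TYR-/,Y\1-/
s/,[A-Z]-\([0-9]*\)-VAL-/,V\1-/
EOF

out=$(sed -f aa.sed atoms.csv)
printf '%s\n' "$out"
```

From inside Vim, the same file can be applied to the whole buffer with :%!sed -f aa.sed (the path is resolved relative to Vim's current directory).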

If you know how to do it in sed why not leverage that knowledge and simply call out from Vim?
:%!sed -e '<your sed script>'
Once you've done that and it works, you can put it in a Vim function:
function! Transform()
  %!sed -e '<your sed script>'
endfunction
and then just use
:call Transform()
which you can map to a key.
Simples!

Related

Remove specific tag with its contents using sed

I would like to remove following tag from HTML including its constantly varying contents:
<span class="the_class_name">li4tuq734g23r74r7Whatever</span>
The following Bash script
.... | sed -e :a -re 's/<span class="the_class_name"/>.*</span>//g' > "$NewFile"
ends with error
sed: -e expression #2, char XX: unknown option to `s'
I tried to escape quotes, slashes and "less than" symbols in various combinations and still get this error.
I suggest using a different sed delimiter than / when / appears in the text you want to match. Also, prefer -E over -r for extended regular expressions: -E is the more portable spelling (both GNU and BSD sed accept it). Also note that the opening <span ...> in your regex has a stray / before the > that doesn't belong there.
Also, .* is greedy, so it will run up to the last </span> on the line and delete everything in between, not just the first span. It's better to match [^<]*, i.e. any run of characters that are not <.
sed -E 's,<span class="the_class_name">[^<]*</span>,,g'
A better option is, of course, to use an HTML parser for this.
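A minimal sketch of the difference between .* and [^<]*, on a hypothetical line that has the unwanted span followed by an unrelated one (POSIX sh, with sed -E as suggested above):

```shell
#!/bin/sh
# Hypothetical line: the span to drop, then text and another span to keep.
line='<span class="the_class_name">li4tuq734g23r74r7Whatever</span> keep <span>other</span>'

# Greedy .* runs to the LAST </span>, so " keep <span>other</span>" is deleted too.
greedy=$(printf '%s\n' "$line" | sed -E 's,<span class="the_class_name">.*</span>,,g')

# [^<]* stops at the first <, so only the targeted span is removed.
precise=$(printf '%s\n' "$line" | sed -E 's,<span class="the_class_name">[^<]*</span>,,g')

printf 'greedy:  [%s]\n' "$greedy"
printf 'precise: [%s]\n' "$precise"
```

The [^<]* version still assumes the span contains no nested tags; anything more complex is where an HTML parser pays off.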

Find and replace text from a large txt file on ubuntu

I want to replace "}{" with "},{" to turn a large txt file into valid JSON. How can I do this?
This can be achieved with sed:
sed -i 's/}{/},{/g' filename
sed is the command; -i makes sed edit the file in place, and the file name is given at the end (replace filename with the name of your file, of course).
The substitution starts with s: between the first and second / is what you want to replace, and between the second and third / is what you want instead. The g at the end replaces every match on a line, not just the first one.
If there are newlines between } and {, you can simply remove them all first; raw newlines are only insignificant whitespace in JSON, so deleting them does not change the data:
tr -d '\n' < filename | sed 's/}{/},{/g' > newfilename
This deletes all newlines (\n) before sed runs, and it writes the result to a new file instead of editing in place.
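Joining the objects with commas is not quite enough on its own: {...},{...} is still not a single JSON document until it is wrapped in [ and ]. A minimal sketch that does all three steps on a hypothetical objects.txt; like the commands above, it would also rewrite a }{ that happens to occur inside a string value:

```shell
#!/bin/sh
# Hypothetical input: concatenated objects, one pair split across lines.
printf '{"a":1}{"b":2}\n{"c":3}\n' > objects.txt

# 1. tr removes the newlines between objects.
# 2. The first sed expression inserts the missing commas.
# 3. The other two wrap everything in [ ] so the result is one JSON array.
tr -d '\n' < objects.txt | sed -e 's/}{/},{/g' -e 's/^/[/' -e 's/$/]/' > objects.json
cat objects.json
```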

Find and replace text in JSON with sed [duplicate]

I am trying to change the values in a text file using sed in a Bash script with the line,
sed 's/draw($prev_number;n_)/draw($number;n_)/g' file.txt > tmp
This will be in a for loop. Why is it not working?
Variables inside single quotes (') are not expanded in Bash. To get string substitution (or interpolation, if you're familiar with Perl), you need to use double quotes (") instead of single quotes:
# Enclose the entire expression in double quotes
$ sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
# Or, concatenate strings with only variables inside double quotes
# This would restrict expansion to the relevant portion
# and prevent accidental expansion for !, backticks, etc.
$ sed 's/draw('"$prev_number"';n_)/draw('"$number"';n_)/g' file.txt > tmp
# The variable's value cannot contain arbitrary characters: here a newline
# breaks the s command. See the further reading links for details
$ a='foo
bar'
$ echo 'baz' | sed 's/baz/'"$a"'/g'
sed: -e expression #1, char 9: unterminated `s' command
Further Reading:
Difference between single and double quotes in Bash
Is it possible to escape regex metacharacters reliably with sed
Using different delimiters for sed substitute command
Unless you need the output in a different file, you can use the -i flag to change the file in place.
Variables within single quotes are not expanded, but within double quotes they are. Use double quotes in this case.
sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
You could also make it work with eval, but don’t do that!!
This may help:
sed "s/draw($prev_number;n_)/draw($number;n_)/g"
You can also use command substitution or variables, as below. Here I wanted to put the hostname (a system value) into a file: I look for the string look.me and replace that whole line with look.me=<system_name>.
sed -i "s/.*look.me.*/look.me=$(hostname)/" filename
You can also store the value in a variable first and use that variable in the substitution:
host_var=$(hostname)
sed -i "s/.*look.me.*/look.me=$host_var/" filename
Input file:
look.me=demonic
File contents afterwards (assuming my system name is prod-cfm-frontend-1-usa-central-1):
look.me=prod-cfm-frontend-1-usa-central-1
I needed to insert the GitHub tag of my release within GitHub Actions, so that on release the workflow automatically packages the code and pushes it to Artifactory.
Here is how I did it. :)
- name: Invoke build
  run: |
    # Get the tag name from the release ref (refs/tags/<tag>)
    TAGNUMBER=$(echo "$GITHUB_REF" | cut -d / -f 3)
    # Set up the sed expression that replaces the ${GITHUBACTIONSTAG} placeholder
    FINDANDREPLACE='s/${GITHUBACTIONSTAG}/'"$TAGNUMBER"/
    # Update the version number in setup.cfg
    sed -i "$FINDANDREPLACE" setup.cfg
    # Install prerequisites, then build and push
    pip install -r requirements-dev.txt
    invoke build
In retrospect, I wish I had done this in Python with tests. Still, it was fun to do some Bash.
Another variant, using printf:
SED_EXPR="$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")"
sed "${SED_EXPR}" file.txt
or in one line:
sed "$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")" file.txt
Using printf to build the expression keeps the shell quoting simple, which is why I like this variant. Note that printf inserts the values verbatim, so characters that are special to sed (such as /, & or .) still need escaping.
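If $prev_number or $number may contain characters that are special to sed, one common option is to escape them before building the expression. A minimal sketch, assuming single-line values and a / delimiter; escape_pattern and escape_replacement are hypothetical helper names, and the sample values are made up:

```shell
#!/bin/sh
# Hypothetical helpers; both assume the value contains no newline.

# Escape BRE metacharacters and the / delimiter so the value matches literally.
escape_pattern() {
  printf '%s\n' "$1" | sed 's|[][\/.*^$]|\\&|g'
}

# Escape the characters that are special in the replacement: \, / and &.
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\/&]|\\&|g'
}

# Hypothetical sample values and input file.
prev_number='1.5'
number='a/b&c'
printf 'draw(1.5;n_) draw(105;n_)\n' > file.txt

pat=$(escape_pattern "$prev_number")
rep=$(escape_replacement "$number")
out=$(sed "s/draw($pat;n_)/draw($rep;n_)/g" file.txt)
printf '%s\n' "$out"
```

Without escaping, the . in 1.5 would also match draw(105;n_), and the / and & in the replacement would break or alter the s command.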

CSV first letter of line moved to end of field

I have a CSV file in which I would like to move the first letter of the first field to the end of that field and insert an underscore in front of its original last two characters. I can't find anything on how to move a letter with sed. Here is my example CSV:
name,number,number1,status,mode
B9AT0582B41,430,30,0,Loop
B8AU0302D11,448,0,0,Loop
B8AU0302D21,448,0,0,Loop
B8AU0302D31,448,0,0,Loop
B8AU0302D41,448,0,0,Loop
For example, the B9AT0582B41, I want it to be 9AT0582B_41B.
It needs to do this for each line and leave the other CSV values unchanged.
I am open to forms other than sed.
In awk:
$ awk -F, -v OFS=, \
'NR > 1 { $1 = substr($1, 2, 8) "_" substr($1, 10) substr($1, 1, 1) } 1' infile
name,number,number1,status,mode
9AT0582B_41B,430,30,0,Loop
8AU0302D_11B,448,0,0,Loop
8AU0302D_21B,448,0,0,Loop
8AU0302D_31B,448,0,0,Loop
8AU0302D_41B,448,0,0,Loop
This sets the input and output field separators to a comma; then, for each line except the first (the header), it rearranges the first field with three calls to substr and prints the line (the 1 at the end).
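The substr offsets above (2, 8 and 10) are fixed, so they only fit 11-character IDs. A minimal sketch that derives them from length($1) instead, assuming every ID has at least three characters; the last sample row is hypothetical and only there to show a shorter ID:

```shell
#!/bin/sh
cat > infile <<'EOF'
name,number,number1,status,mode
B9AT0582B41,430,30,0,Loop
B8AU0302D11,448,0,0,Loop
X12345,1,0,0,Loop
EOF

# For each data row: chars 2..n-2, then "_", the last two chars, then the first char.
out=$(awk -F, -v OFS=, \
  'NR > 1 { n = length($1); $1 = substr($1, 2, n - 3) "_" substr($1, n - 1) substr($1, 1, 1) } 1' infile)
printf '%s\n' "$out"
```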
Or sed, a bit shorter:
sed -E '2,$s/^(.)([^,]*)([^,]{2})/\2_\3\1/' infile
This captures the first letter of each line (for lines 2 and up) in capture group 1, then everything up to two characters before the first comma in capture group 2 and the last two characters before the comma in capture group 3. The substitution then swaps and adds the underscore.
Here's my take on this.
$ sed -E 's/(.)(.{8})([^,]*)(.*)/\2_\3\1\4/' <<<"B9AT0582B41,430,30,0,Loop"
9AT0582B_41B,430,30,0,Loop
This uses an extended regular expression to make things easier to read. Sed's -E option causes the RE to be interpreted in extended notation. If your version of sed doesn't support this, check your man page to see if there's another option that does the same thing, or you can try to use BRE notation:
$ sed 's/\(.\)\(.\{8\}\)\([^,]*\)\(.*\)/\2_\3\1\4/' <<<"B9AT0582B41,430,30,0,Loop"
9AT0582B_41B,430,30,0,Loop
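The two commands above are shown on a single record. Run over the whole file they would also rewrite the header, because . matches commas and name,number,... is long enough for .{8} to match. A minimal sketch that restricts the ERE version to lines 2 onward with the 2,$ address, as in the earlier sed answer:

```shell
#!/bin/sh
cat > infile <<'EOF'
name,number,number1,status,mode
B9AT0582B41,430,30,0,Loop
B8AU0302D11,448,0,0,Loop
EOF

# 2,$ skips the header; the groups are: first char, next 8 chars,
# the rest of field 1, and everything after field 1.
out=$(sed -E '2,$s/(.)(.{8})([^,]*)(.*)/\2_\3\1\4/' infile)
printf '%s\n' "$out"
```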

Find a string between 2 other strings in document

I have found a ton of solutions to do what I want, with only one exception.
I need to search a .html document and pull a string.
The line containing the string will look like this (1 line, no newlines)
<script type="text/javascript">g_initHeader(0);LiveSearch.attach(ge('oh2345v5ks'));var _ = g_items;_[60]={icon:'INV_Chest_Leather_09',name_enus:'Layered Tunic'};_[6076]={icon:'INV_Pants_11',name_enus:'Tapered Pants'};_[3070]={icon:'INV_Misc_Cape_01',name_enus:'Ensign Cloak'};</script>
The text I need to get is
INV_Chest_Leather_09
When I use awk, grep, and sed, I extract the data between icon:' and ',name_
The problem is that all three of these scan the entire line and run to the last occurrence of ',name_, so I end up with
INV_Chest_Leather_09',name_enus:'Layered
Tunic'};_[6076]={icon:'INV_Pants_11',name_enus:'Tapered
Pants'};_[3070]={icon:'INV_Misc_Cape_01
Here's the last one I tried
grep -Po -m 1 "(?<=]={icon:').*(?=',name_)"
I've tried awk and sed too, and I don't really have a preference of which one to use.
So basically, I need to search the entire HTML file, find the first occurrence of icon:', and extract the text right after it up to the next occurrence of ',name_.
With GNU awk for the 3rd arg to match():
$ awk 'match($0,/icon:\047([^\047]+)/,a){print a[1]}' file
INV_Chest_Leather_09
Simple perl approach:
perl -ne 'print "$1\n" if /\bicon:\047([^\047]+)/' file
The output:
INV_Chest_Leather_09
The .* in your regular expression is a greedy matcher, so the pattern will match till the end of the string and then backtrack to match the ,name_ portion. You could try replacing the .* with something like [^,]* (i.e. match anything except comma):
grep -Po -m 1 "(?<=]={icon:')[^,]*(?=',name_)"
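If neither GNU awk nor grep -P is available, a minimal sketch using only grep -o (supported by GNU and BSD grep), head and cut, run on the line from the question saved as a hypothetical page.html:

```shell
#!/bin/sh
# Sample page: the single line from the question.
cat > page.html <<'EOF'
<script type="text/javascript">g_initHeader(0);LiveSearch.attach(ge('oh2345v5ks'));var _ = g_items;_[60]={icon:'INV_Chest_Leather_09',name_enus:'Layered Tunic'};_[6076]={icon:'INV_Pants_11',name_enus:'Tapered Pants'};_[3070]={icon:'INV_Misc_Cape_01',name_enus:'Ensign Cloak'};</script>
EOF

# [^']* cannot run past the closing quote, so each match is one icon:'...' value;
# head keeps the first match and cut takes the text between the quotes.
icon=$(grep -o "icon:'[^']*'" page.html | head -n 1 | cut -d "'" -f 2)
printf '%s\n' "$icon"
```

Like the [^,]* fix, this relies on the icon name itself never containing a quote.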