Find and replace text from a large txt file on ubuntu

I want to replace "}{" with "},{" to turn a large txt file into valid JSON. Need help!

This can be achieved with sed:
sed -i 's/}{/},{/g' filename
sed is the command, and -i tells it to make the changes in the file itself, whose name you give at the end (change filename to your own file name, of course).
The substitution part starts with s. Between the first two slashes you put what you want to replace, and between the last two slashes what you want instead. The g at the end makes the substitution global: every match on a line is replaced, not just the first one.
sed works line by line, so if there is a newline between } and {, the pattern won't match. In that case you can simply remove all newlines first; you'll still get valid JSON afterwards, because JSON doesn't need them:
tr -d '\n' < filename | sed 's/}{/},{/g' > newfilename
This deletes all newlines (\n) before the text reaches sed and writes the result to a new file instead of editing in place. Keep in mind that the whole file then becomes a single line, which sed holds in memory, and that a comma-separated sequence of objects only becomes valid JSON once it is wrapped in [ and ].
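A minimal sketch of the whole conversion, assuming the input is a stream of JSON objects written back to back (possibly with newlines between them) and that the sequence "}{" never occurs inside a string value. The file names input.txt and output.json are placeholders, and the sample input is created only for illustration. The surrounding brackets turn the comma-separated objects into a JSON array.

```shell
# Hypothetical sample input: concatenated objects, some separated by newlines
printf '{"a":1}\n{"b":2}{"c":3}\n' > input.txt

# Remove newlines, insert the missing commas, and wrap everything in [ ... ]
{
  printf '['
  tr -d '\n' < input.txt | sed 's/}{/},{/g'
  printf ']\n'
} > output.json

cat output.json
```

If jq is available, jq -s . input.txt > output.json should do the same job with a real JSON parser (-s collects all input values into one array), which avoids the "}{"-inside-a-string caveat.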

Related

How to delete all characters up to and including a specified character in a text file?

Quick sed question
In a text file how do I remove all characters up to and including the first '[' found in the entire file and nothing else?
I tried
sed "s/^[^\[]*\[//" example.json
but it strips text on every line instead of only once in the whole file.
Some background:
I have a set of files that are sets of JSON documents. I am trying to import them into elasticsearch, but the first document in the file is an informational document with a non-standard layout that messes up the importing of the rest of the documents. I'm trying to get rid of the first document so the subsequent documents can load properly.
Here is the document:
https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02

For anything other than a simple s/old/new on individual strings, just use awk. This will work using any awk in any shell on every UNIX box:
awk '!f && sub(/[^[]*\[/,""){f=1} f' file
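A quick check of the guarded form awk '!f && sub(/[^[]*\[/,""){f=1} f' against a hypothetical example.json whose later lines also contain '['. The !f test stops sub() from running once the first '[' has been removed, so later brackets are left alone; lines before the first '[' are never printed because f is still 0.

```shell
# Hypothetical sample input: a header line, then JSON whose later lines also contain '['
printf '%s\n' 'header' '{"meta": 1, "features": [' '  {"id": "a", "tags": [1]}' ']}' > example.json

# !f: only try the substitution until the first '[' has been removed
# f:  print every line from that point on (earlier lines are dropped)
awk '!f && sub(/[^[]*\[/, ""){f=1} f' example.json > stripped.json
cat stripped.json
```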

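I used two sed commands to remove all characters up to and including the first '[' found. The first prints everything from the first line containing '[' onwards; the second strips everything up to the first '[' on what is now line 1:
sed -n '/\[/,$p' example.json | sed '1s/[^[]*\[//'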
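With GNU sed, the two commands above can also be combined into one. This sketch relies on the GNU-only 0,/regex/ address, which ends the range at the first line containing '[' even when that is line 1; the sample example.json is hypothetical.

```shell
# Hypothetical sample input: later lines also contain '['
printf '%s\n' 'header' '{"meta": 1, "features": [' '  {"id": "a", "tags": [1]}' ']}' > example.json

# Inside the range: delete lines without '[' and strip up to the first '[' on the line that has one.
# Once the range has ended, the remaining lines are printed unchanged.
sed '0,/\[/{/\[/!d;s/[^[]*\[//;}' example.json > stripped.json
cat stripped.json
```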

How to reformat file with sed/vim?

I have a .csv file that looks like this.
atomnum,atominfo,metric
238,A-30-CYS-SG,53.7723
889,A-115-CYS-SG,46.2914
724,A-94-CYS-SG,44.6405
48,A-6-CYS-SG,37.2108
630,A-80-CYS-SG,29.574
513,A-64-CYS-SG,23.1925
981,A-127-CYS-SG,19.8903
325,A-41-GLN-OE1,17.6205
601,A-76-CYS-SG,17.5079
I want to change it like this:
atomnum,atominfo,metric
238,C30-SG,53.7723
889,C115-SG,46.2914
724,C94-SG,44.6405
48,C6-SG,37.2108
630,C80-SG,29.574
513,C64-SG,23.1925
981,C127-SG,19.8903
325,Q41-OE1,17.6205
601,C76-SG,17.5079
The part between the commas is an atom identifier: A-30-CYS-SG is the gamma sulfur of residue 30, which is a cysteine, in chain A. Residues can be represented with three letters or just one (table here: https://www.iupac.org/publications/pac-2007/1972/pdf/3104x0639.pdf). Basically, I just want to a) change the three-letter code to the one-letter code, b) remove the chain ID (A in this case) and c) put the residue number next to the one-letter code.
I've tried matching the patterns between the commas within vim. Something like :%s:\(-\d\+\-\)\(\u\+\):\2\1:g gives me c), i.e. ACYS-30--SG. I do not know how to do a) with vim. I know how to do it with sed and an input file containing all the substitute commands, but then maybe it's better to do all the work with sed... Is it possible to do a) in vim?
Thanks

This might work for you (GNU sed):
sed -r '1b;s/$/\n:ALAA:ARGR:ASNN:ASPD:CYSC:GLUE:GLNQ:GLYG:HISH:ILEI:LEUL:LYSK:METM:PHEF:PROP:SERS:THRT:TRPW:TYRY:VALV/;s/,A-([0-9]+)-(...)(.*)\n.*:\2(.).*/,\4\1\3/' file
Append a lookup table to each line and use pattern matching to replace the 3-letter code with its 1-letter code, moving the residue number after it and dropping the chain ID. Each lookup entry is a colon, followed by the 3-letter key, followed by the 1-letter code; 1b leaves the header line untouched.
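The same lookup-table idea can also be written with an awk associative array, so the table is built once instead of being appended to every line. A minimal sketch, assuming identifiers always have the form CHAIN-NUMBER-RESIDUE-ATOM and using a few rows from the question as a hypothetical atoms.csv; rows with an unknown residue code are left unchanged.

```shell
# Sample input: a few rows from the question
printf '%s\n' 'atomnum,atominfo,metric' '238,A-30-CYS-SG,53.7723' \
  '325,A-41-GLN-OE1,17.6205' > atoms.csv

awk -F, -v OFS=, '
BEGIN {
  # Three-letter to one-letter residue codes, stored as key/value pairs
  s = "ALA A ARG R ASN N ASP D CYS C GLU E GLN Q GLY G HIS H ILE I"
  s = s " LEU L LYS K MET M PHE F PRO P SER S THR T TRP W TYR Y VAL V"
  n = split(s, t, " ")
  for (i = 1; i < n; i += 2) code[t[i]] = t[i + 1]
}
NR > 1 {
  # p[1] = chain, p[2] = residue number, p[3] = residue, p[4] = atom name
  split($2, p, "-")
  if (p[3] in code) $2 = code[p[3]] p[2] "-" p[4]
}
1' atoms.csv > atoms_short.csv
cat atoms_short.csv
```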

Using sed, paste, cut, & bash, given input atoms.csv:
paste -d, <(cut -d, -f1 atoms.csv) \
<(cut -d, -f2 atoms.csv | sed 's/.-//
s/\(.*\)-\([A-Z]\{3\}\)-/\2\1-/
s/^ALA/A/
s/^ARG/R/
s/^ASN/N/
s/^ASP/D/
s/^CYS/C/
s/^GLU/E/
s/^GLN/Q/
s/^GLY/G/
s/^HIS/H/
s/^ILE/I/
s/^LEU/L/
s/^LYS/K/
s/^MET/M/
s/^PHE/F/
s/^PRO/P/
s/^SER/S/
s/^THR/T/
s/^TRP/W/
s/^TYR/Y/
s/^VAL/V/') \
<(cut -d, -f3 atoms.csv)
Output:
atomnum,atominfo,metric
238,C30-SG,53.7723
889,C115-SG,46.2914
724,C94-SG,44.6405
48,C6-SG,37.2108
630,C80-SG,29.574
513,C64-SG,23.1925
981,C127-SG,19.8903
325,Q41-OE1,17.6205
601,C76-SG,17.5079

If you know how to do it in sed, why not leverage that knowledge and simply call out from Vim?
:%!sed -e '<your sed script>'
Once you've done that and it works, you can pop it in a Vim function:
function Transform()
  %!sed -e '<your sed script>'
endfunction
and then just use
:call Transform()
which you can map to a key.
Simples!

Find and replace text in JSON with sed [duplicate]

I am trying to change the values in a text file using sed in a Bash script with the line:
sed 's/draw($prev_number;n_)/draw($number;n_)/g' file.txt > tmp
This will be in a for loop. Why is it not working?

Variables inside ' don't get substituted in Bash. To get string substitution (or interpolation, if you're familiar with Perl) you would need to change it to use double quotes " instead of the single quotes:
# Enclose the entire expression in double quotes
$ sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
# Or, concatenate strings with only variables inside double quotes
# This would restrict expansion to the relevant portion
# and prevent accidental expansion for !, backticks, etc.
$ sed 's/draw('"$prev_number"';n_)/draw('"$number"';n_)/g' file.txt > tmp
# The variable's value cannot contain arbitrary characters: a newline,
# for example, breaks the s command (see the further reading links for details)
$ a='foo
bar'
$ echo 'baz' | sed 's/baz/'"$a"'/g'
sed: -e expression #1, char 9: unterminated `s' command
Further Reading:
Difference between single and double quotes in Bash
Is it possible to escape regex metacharacters reliably with sed
Using different delimiters for sed substitute command
Unless you need the result in a different file, you can use the -i flag to change the file in place.
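A minimal sketch of the double-quoted version inside a for loop, assuming (hypothetically) that file.txt holds draw(N;n_) and each pass should move N to the next value. Writing to tmp and renaming it afterwards means each pass reads the result of the previous one.

```shell
# Hypothetical starting file
printf 'draw(1;n_)\n' > file.txt

prev_number=1
for number in 2 3 4; do
  # Double quotes let the shell expand $prev_number and $number before sed runs
  sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp && mv tmp file.txt
  prev_number=$number
done
cat file.txt
```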

Variables within single quotes are not expanded, but within double quotes they are. Use double quotes in this case.
sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
You could also make it work with eval, but don’t do that!!

This may help:
sed "s/draw($prev_number;n_)/draw($number;n_)/g"

You can use variables as shown below. For example, I wanted to put the hostname (a system value) into a file: I look for the string look.me and replace that whole line with look.me=<system_name>:
sed -i "s/.*look.me.*/look.me=$(hostname)/" file
You can also store the value in a variable first and use that variable in the substitution:
host_var=$(hostname)
sed -i "s/.*look.me.*/look.me=$host_var/" file
Input file:
look.me=demonic
Output of file (assuming my system name is prod-cfm-frontend-1-usa-central-1):
look.me=prod-cfm-frontend-1-usa-central-1
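In the commands above, the . in look.me matches any character, and .*look.me.* replaces any line that merely contains the string. A slightly stricter sketch, assuming the setting always starts its line as look.me=; the file name app.conf is hypothetical and the file is created here only for illustration. Writing to a temporary file and renaming it avoids -i, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical config file: the second line must not be touched
printf '%s\n' 'look.me=demonic' 'dont.look.me=keep' > app.conf

host_var=$(hostname)
# ^ anchors the key to the start of the line; \. matches a literal dot
sed "s/^look\.me=.*/look.me=$host_var/" app.conf > app.conf.tmp && mv app.conf.tmp app.conf
cat app.conf
```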

I needed to insert the GitHub tag from my release within GitHub Actions, so that on release it automatically packages up the code and pushes it to Artifactory.
Here is how I did it. :)
- name: Invoke build
  run: |
    # Gets the tag number from the release
    TAGNUMBER=$(echo $GITHUB_REF | cut -d / -f 3)
    # Sets up a string to be used by sed; the single quotes keep
    # ${GITHUBACTIONSTAG} literal (it is the placeholder text in setup.cfg)
    FINDANDREPLACE='s/${GITHUBACTIONSTAG}/'$(echo $TAGNUMBER)/
    # Updates the version number in setup.cfg
    sed -i "$FINDANDREPLACE" setup.cfg
    # Installs prerequisites and pushes
    pip install -r requirements-dev.txt
    invoke build
In retrospect, I wish I had done this in Python with tests. However, it was fun to do some Bash.

Another variant, using printf:
SED_EXPR="$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")"
sed "${SED_EXPR}" file.txt
or in one line:
sed "$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")" file.txt
Using printf to build the expression keeps the quoting simple, which is why I like this variant. Note, though, that printf inserts the values verbatim, so values containing /, & or a newline still break the sed expression unless they are escaped.
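For values that may contain sed's special replacement characters, one option is to escape them first. A minimal sketch with a hypothetical helper named escape_repl (not part of sed or Bash): it backslash-escapes \, / and & so the value is inserted literally into the replacement of an s/.../.../ command. It does not handle newlines, and it only covers the replacement side, not the pattern.

```shell
# escape_repl is a hypothetical helper, not a standard command.
# Backslashes are escaped first so the ones added afterwards are not doubled.
escape_repl() {
  printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/\//\\\//g' -e 's/&/\\&/g'
}

number='a/b&c'
printf 'draw(1;n_)\n' > file.txt
sed "s/draw(1;n_)/draw($(escape_repl "$number");n_)/g" file.txt > tmp && mv tmp file.txt
cat file.txt
```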

How to remove first character from first line using sed

I have three .csv files that are output from saving a query in MS SQL Server. I need to load these files into an Informix database, which requires a trailing delimiter on each line. That's easy to do using sed:
s/$/,/g
However, each of these files also contains a stray character (displayed by vim, but not by ed or sed) at the first character position of the first line.
I need to get rid of this character. It deletes as one character using vim's x command. How can I describe this character using sed so I can delete it without removing the line?
I've tried 1s/^.//g, but that is not working.

Try this instead:
sed -e '1s/^.//' input_file > output_file
Or if you'd like to edit the files in-place:
sed -i -e '1s/^.//' input_file
(Edited) Apparently s/^.// doesn't quite do it; updated.

Remove the first character on the first line inplace:
sed -i '1s/^.//' file

Try:
sed -i '1s/^.\(.*\)/\1/' file
this should remove the first character from the first line. (try it without the -i argument first to make sure)
Edit: I originally posted the following, which deletes the first character from every line. Upon re-reading the question, I realized that isn't quite what was wanted.
sed -i 's/^.\(.*\)/\1/' file
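If the stray character is a UTF-8 byte-order mark (a common cause of an invisible first character in exported files; this is an assumption, the question does not confirm it), 1s/^.// can fall short: in the C locale . matches a single byte, while a UTF-8 BOM is three bytes (EF BB BF). A sketch that removes exactly those bytes using GNU sed's \xHH escapes, run under LC_ALL=C so the pattern is matched byte by byte, and adds the trailing delimiter in the same pass; the file names are placeholders.

```shell
# Sample file starting with a UTF-8 BOM (bytes EF BB BF), written with octal escapes
printf '\357\273\277id,name\n1,foo\n' > input.csv

# GNU sed: strip the BOM from line 1 only, then append the trailing delimiter to every line
LC_ALL=C sed -e '1s/^\xEF\xBB\xBF//' -e 's/$/,/' input.csv > output.csv
cat output.csv
```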

Match any character (including newlines) in sed

I have a sed command that I want to run on a huge, terrible, ugly HTML file that was created from a Microsoft Word document. All it should do is remove any instance of the string
style='text-align:center; color:blue;
exampleStyle:exampleValue'
The sed command that I am trying to modify is
sed "s/ style='[^']*'//" fileA > fileB
It works great, except that whenever there is a new line inside of the matching text, it doesn't match. Is there a modifier for sed, or something I can do to force matching of any character, including newlines?
I understand that regexps are terrible at XML and HTML, blah blah blah, but in this case, the string patterns are well-formed in that the style attributes always start with a single quote and end with a single quote. So if I could just solve the newline problem, I could cut down the size of the HTML by over 50% with just that one command.
In the end, it turned out that Sinan Ünür's perl script worked best. It was almost instantaneous, and it reduced the file size from 2.3 MB to 850k. Good ol' Perl...

sed goes over the input file line by line which means, as I understand, what you want is not possible in sed.
You could use the following Perl script (untested), though:
#!/usr/bin/perl
use strict;
use warnings;
{
local $/; # slurp mode
my $html = <>;
$html =~ s/ style='[^']*'//g;
print $html;
}
__END__
A one liner would be:
$ perl -e 'local $/; $_ = <>; s/ style=\047[^\047]*\047//g; print' fileA > fileB

Sed reads the input line by line, so it is not simple to process text that spans more than one line... but it is not impossible either: you need to make use of sed branching. The following will work; I have commented it to explain what is going on (not the most readable syntax!):
sed "# if the line matches 'style='', then branch to label,
# otherwise process next line
/style='/b style
b
# the line contains 'style', try to do a replace
: style
s/ style='[^']*'//
# if the replace worked, then process next line
t
# otherwise append the next line to the pattern space and try again.
N
b style
" fileA > fileB

You could remove all CR/LF using tr, run sed, and then import into an editor that auto-formats.
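With GNU sed there is also a shorter route: the GNU-specific -z option makes sed split its input on NUL bytes instead of newlines, so a text file without NULs is read as a single pattern space and [^']* can match across newlines. A sketch using a small hypothetical fileA; like the Perl version, it holds the whole file in memory.

```shell
# Hypothetical input: the style attribute spans two lines
printf "<p style='text-align:center; color:blue;\nexampleStyle:exampleValue'>hi</p>\n" > fileA

# -z: treat the whole file as one record, so the bracket expression also matches newlines
sed -z "s/ style='[^']*'//g" fileA > fileB
cat fileB
```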

Another way is like:
$ cat toreplace.txt
I want to make \
this into one line
I also want to \
merge this line
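$ sed -e ':a' -e '/\\$/N; s/\\\n//; ta' toreplace.txt
Output:
I want to make this into one line
I also want to merge this line
The :a sets a label, /\\$/N appends the next line whenever the pattern space ends in a backslash, s/\\\n// removes that backslash-newline pair, and ta jumps back to :a if the substitution succeeded, so a line continued several times is joined completely.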

You can try this:
awk '/style/&&/exampleValue/{
gsub(/style.*exampleValue\047/,"")
}
/style/&&!/exampleValue/{
gsub(/style.* /,"")
f=1
}
f &&/exampleValue/{
gsub(/.*exampleValue\047 /,"")
f=0
}
1
' file
Sample run:
# more file
this is a line
style='text-align:center; color:blue; exampleStyle:exampleValue'
this is a line
blah
blah
style='text-align:center; color:blue;
exampleStyle:exampleValue' blah blah....
# ./test.sh
this is a line
this is a line
blah
blah
blah blah....

Remove XML elements across several lines
My use case was pretty much the same, but I needed to match opening and closing tags of XML elements and remove them completely, including whatever was inside.
<xmlTag whatever="parameter that holds in the tag header">
<whatever_is_inside/>
<InWhicheverFormat>
<AcrossSeveralLines/>
</InWhicheverFormat>
</xmlTag>
Still, sed works on one line at a time. What we do here is trick it into appending subsequent lines to the current one, so we can edit all the lines we like and then write the output (\n is a legal character you can output with sed to split lines again).
Inspired by the answer from @beano and another answer on Unix Stack Exchange, I built up my working sed "program":
sed -s --in-place=.back -e '/\(^[ ]*\)<xmlTag/{ # whenever you encounter the xmlTag
$! { # do
:begin # label to return to
N; # append next line
s/\(^[ ]*\)<\(xmlTag\)[^·]\+<\/\2>//; # Attempt substitution (elimination) of pattern
t end # if substitution succeeds, jump to :end
b begin # unconditional jump to :begin to append yet another line
:end # label to mark the end
}
}' myxmlfile.xml
Some explanations:
I match <xmlTag without closing the > because my XML element contains parameters.
What precedes <xmlTag is a very helpful piece of RegExp to match any existing indentation: \(^[ ]*\) so you can later output it with just \1 (even if it was not needed this time).
The addition of ; in several places is so that sed will understand that the command (N, s or whichever) ends there and following character(s) are another command.
Most of my trouble was finding a RegExp that would match "anything in between". I finally settled on anything but · (i.e. [^·]\+), counting on that character not appearing in any of the data files. Note that the + has to be written as \+: in basic regular expressions a plain + is literal, and \+ is the GNU sed extension meaning "one or more".
My original files remain as .back, just in case something goes wrong (tests still do fail after modification), and version control flags them easily for removal in bulk.
I use this kind of sed automation to evolve the .XML files holding the serialized data we use to run our unit and integration tests. Whenever our classes change (lose or gain fields), the data have to be updated. I do that with a single find that runs the sed automation on the files that contain the modified class. We keep hundreds of XML data files.
