sed replace tab ahead of specific character with line break - tabs

I know how to do this in a text editor but it shuts my computer down because I have a large file. I have tab-separated data like this all on a single line:
XP_23947974 XM_23947974 HG12390 product=blahblah NP_23947975 XM_23947975 HG12391 product=blahblah2
I want to insert a line break before every XP or NP. Since the data is tab-separated, in the text editor I was just going to do
Find:(\D)P_
Replace:\n\1P_
Giving
XP_23947974 XM_23947974 HG12390 product=blahblah
NP_23947975 XM_23947975 HG12391 product=blahblah2
But I want to use sed (or a similar tool) to do that. Help appreciated.

This should do the trick:
sed -e 's/\(XP\|NP\)/\n\1/g'
You can test this with:
echo 'XP_23947974 XM_23947974 HG12390 product=blahblah NP_23947975 XM_23947975 HG12391 product=blahblah2' | sed -e 's/\(XP\|NP\)/\n\1/g'
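Note that the pattern above matches XP or NP anywhere, leaves the preceding tab in place, and also puts an empty line in front of the first record. A minimal sketch of a tab-aware variant, assuming GNU sed (which accepts \t in the pattern and \n in the replacement) and real tab characters in the input; the variable names are only for illustration:

```shell
# One-line, tab-separated sample; printf turns \t into real tabs.
sample=$(printf 'XP_23947974\tXM_23947974\tHG12390\tproduct=blahblah\tNP_23947975\tXM_23947975\tHG12391\tproduct=blahblah2')

# Replace the tab in front of XP_ or NP_ with a newline (GNU sed syntax).
result=$(printf '%s\n' "$sample" | sed 's/\t\([XN]P_\)/\n\1/g')
printf '%s\n' "$result"
```

Because the first XP_ has no tab in front of it, no empty leading line is produced.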

Related

Find and replace text from a large txt file on ubuntu

I want to replace "}{" with "},{" to turn a large txt file into valid JSON.
This can be achieved with sed:
sed -i 's/}{/},{/g' filename
sed is the command, and -i tells it to edit the file in place; the file's name goes at the end (replace filename with your own, of course).
The substitution part starts with the s, between the first //, you set what you want to replace, between the last //, you set what you want instead. The g at the end makes sure that this search/replace is not only executed once, but as long as sed finds matches.
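To illustrate the effect of the g flag, here is a small sketch on a made-up one-line sample (the substitution is applied per line; the variable names are hypothetical):

```shell
# Without g: only the first "}{" on the line is replaced.
first_only=$(printf '%s\n' '{"a":1}{"b":2}{"c":3}' | sed 's/}{/},{/')

# With g: every "}{" on the line is replaced.
all=$(printf '%s\n' '{"a":1}{"b":2}{"c":3}' | sed 's/}{/},{/g')

printf '%s\n' "$first_only" "$all"
```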
If you have any newlines present after the }, you can simply remove them all, you'll still get a valid JSON afterwards:
tr -d '\n' <filename | sed 's/}{/},{/g' >newfilename
This deletes all newlines (\n) and passes the result to sed. It writes a new file rather than editing in place, though.
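A quick way to check the newline handling is to feed a few made-up objects, one per line, through the same pipeline (a sketch; the sample data is invented):

```shell
# Three objects separated by newlines; tr joins them, sed adds the commas.
result=$(printf '{"a":1}\n{"b":2}\n{"c":3}\n' | tr -d '\n' | sed 's/}{/},{/g')
printf '%s\n' "$result"
```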

How to delete all characters up to and including a specified character in a text file?

Quick sed question
In a text file how do I remove all characters up to and including the first '[' found in the entire file and nothing else?
I tried
sed "s/^[^\[]*\[//" example.json
but it's stripping out all text on every line.
Alternately,
I have a set of files that are sets of JSON documents. I am trying to import them into elasticsearch, but the first document in the file is an informational document with a non-standard layout that messes up the importing of the rest of the documents. I'm trying to get rid of the first document so the subsequent documents can load properly.
Here is the document:
https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02
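For anything other than a simple s/old/new on individual strings just use awk. This will work using any awk in any shell on every UNIX box (the !f guard makes sure only the first '[' in the file is affected, rather than the first one on every later line):
awk '!f && sub(/[^[]*\[/,""){f=1} f' file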
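A small demonstration of the awk approach on a made-up multi-line sample, with a !f guard so the substitution fires only once. Lines before the first '[' are dropped, the line containing it is trimmed, and everything after is printed untouched:

```shell
result=$(printf 'header line\nmeta [first] rest\nnext [keep] line\n' |
  awk '!f && sub(/[^[]*\[/, "") { f = 1 } f')
printf '%s\n' "$result"
```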
I used two sed commands to achieve the task to "remove all characters up to and including the first '[' found"
sed -n '/\[/,$p' example.json | sed -r '1 s/[^\[]*\[(.*)/\1/'

How to remove first character from first line using sed

I have three .csv files that are output from saving a query in MS SQLServer. I need to load these files into an Informix database, which requires tacking on a trailing delimiter. That's easy to do using sed:
s/$/,/g
However, each of these files also contains (as displayed by vim, but not ed or sed) an extra character at the first character position of the first line.
I need to get rid of this character. It deletes as one character using vim's x command. How can I describe this character using sed so I can delete it without removing the line?
I've tried 1s/^.//g, but that is not working.
Try this instead:
sed -e '1s/^.//' input_file > output_file
Or if you'd like to edit the files in-place:
sed -i -e '1s/^.//' input_file
(Edited) Apparently s/^.// doesn't quite do it, updated.
Remove the first character on the first line in place:
sed -i '1s/^.//' file
Try:
sed -i '1s/^.\(.*\)/\1/' file
This should remove the first character from the first line. (Try it without the -i option first to make sure.)
Edit: I originally posted the following, which would delete the first character from every line. Upon re-reading the question I realized that isn't quite what was wanted.
sed -i 's/^.\(.*\)/\1/' file
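Both steps from the question (drop the first character of line 1, append a trailing delimiter to every line) can be combined in one sed call. A minimal sketch on made-up data, using a plain '#' as a stand-in for the unwanted character; note that `.` matches one character in the current locale, so a multi-byte character in a non-UTF-8 locale may need different handling:

```shell
# '1s/^.//' touches only line 1; 's/$/,/' appends a comma to every line.
result=$(printf '#id,name\n#1,foo\n' | sed -e '1s/^.//' -e 's/$/,/')
printf '%s\n' "$result"
```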

using sed in script to add html tag to text

I'm trying to use sed in a shell script to add html hyperlink tags to a url in a plain text file.
This is the content of my newtext.txt:
www.example.com
And here is the desired content of newtext.txt that I would like after running my script:
www.example.com
Here is the content of my current script, addhtml.sh:
#!/bin/bash
newtextv='cat newtext.txt'
sed -i.bak 's|\(www.*\)|\1|' newtext.txt
But unfortunately, after running the script, the content of newtext.txt becomes:
www.example.com
I believe my error somehow relates to how my variable is being quoted?
I eventually want this script to also be able to convert full urls (containing http:// )... I obviously need to improve my sed knowledge a good deal (it's taken me a few days to get this far), but I can't wrap my head around this one.
Thank you!
If you want to put the file's content into a variable:
newtextv=$(cat newtext.txt)
But really, you probably want something like this (but with a better regex, obviously):
sed 's|www\.[^ ]*|&|g' <newtext.txt >newtext.html
Sed replaces every & with the matched string.
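A minimal sketch of the & technique, assuming the desired markup is an anchor of the form `<a href="http://URL">URL</a>` (that exact format is an assumption, as is the sample text):

```shell
# Each & in the replacement is substituted with the text the pattern matched.
result=$(printf '%s\n' 'visit www.example.com today' |
  sed 's|www\.[^ ]*|<a href="http://&">&</a>|g')
printf '%s\n' "$result"
```

Using | as the delimiter avoids having to escape the slashes in the replacement.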
Why mess around with a variable?
sed -i 's|\(www.*\)|\1|' newtext.txt
or
sed -i 's|www.*|&|' newtext.txt
If you happen to have the URL in a variable you can also do it without sed:
newtextv=www.example.com
echo "$newtextv"
returns
www.example.com
In Bash you can manipulate a variable's value as part of parameter expansion.
Here ${newtextv#www.} basically means take $newtextv and cut "www." from its beginning
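For instance, the prefix-stripping expansion can be tried on its own (a small sketch; `stripped` is just an illustrative name):

```shell
newtextv=www.example.com
# ${var#pattern} removes the shortest match of pattern from the start of $var.
stripped=${newtextv#www.}
printf '%s\n' "$stripped"
```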
Your trouble is two little syntax errors:
'cat newtext.txt' in single quotes is just a literal string and never executes; for command substitution you need backquotes ` or $()
Single quotes ' prevent variables from expanding. To allow variable expansion, use double quotes "
Here is what you want to do:
#!/bin/bash
newtextv=$(cat newtext.txt)
sed -i.bak "s|\(www.*\)|\1|" newtext.txt
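The two quoting points can be checked in isolation with a short sketch (all variable names and strings here are made up for illustration):

```shell
# Single quotes: the text is stored literally, nothing is executed.
literal='echo hello'
# $(...) runs the command and stores its output.
output=$(echo hello)

word=world
single='hello $word'   # stays exactly as typed
double="hello $word"   # $word is expanded

printf '%s\n' "$literal" "$output" "$single" "$double"
```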

Help with sed regex: extract text from specific tag

First time sed'er, so be gentle.
I have the following text file, 'test_file':
<Tag1>not </Tag1><Tag2>working</Tag2>
I want to extract the text in between <Tag2> using sed regex, there may be other occurrences of <Tag2> and I would like to extract those also.
So far I have this sed based regex:
cat test_file | grep -i "Tag2"| sed 's/<[^>]*[>]//g'
which gives the output:
not working
Anyone any idea how to get this working?
As another poster said, sed may not be the best tool for this job. You may want to use something built for XML parsing, or even a simple scripting language, such as perl.
The problem with your attempt is that you aren't analyzing the string properly.
cat test_file is good - it prints out the contents of the file to stdout.
grep -i "Tag2" is ok - it prints out only lines with "Tag2" in them. This may not be exactly what you want. Bear in mind that it will print the whole line, not just the <Tag2> part, so you will still have to search out that part later.
sed 's/<[^>]*[>]//g' isn't what you want - it simply removes the tags, including <Tag1> and <Tag2>.
You can try something like:
cat test_file | grep -i tag2 | sed 's/.*<Tag2>\(.*\)<\/Tag2>.*/\1/'
This will produce
working
but it will only work for one tag pair.
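The one-pair limitation comes from the greedy leading .*, which swallows everything up to the last <Tag2> on the line. A sketch that shows this, plus one possible workaround assuming a grep that supports -o (GNU and BSD grep do) and tag contents that contain no '<'; the two-pair sample line is invented:

```shell
# One pair on the line: the capture group returns its content.
single=$(printf '%s\n' '<Tag1>not </Tag1><Tag2>working</Tag2>' |
  sed 's/.*<Tag2>\(.*\)<\/Tag2>.*/\1/')

# Two pairs on the line: only the last one survives.
last=$(printf '%s\n' '<Tag2>a</Tag2><Tag2>b</Tag2>' |
  sed 's/.*<Tag2>\(.*\)<\/Tag2>.*/\1/')

# Workaround: let grep -o emit each pair on its own line, then strip the tags.
all=$(printf '%s\n' '<Tag2>a</Tag2><Tag2>b</Tag2>' |
  grep -o '<Tag2>[^<]*</Tag2>' | sed 's/<[^>]*>//g')

printf '%s\n' "$single" "$last" "$all"
```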
For your nice, friendly example, you could use
sed -e 's/^.*<Tag2>//' -e 's!</Tag2>.*!!' test_file
but the XML out there is cruel and uncaring. You're asking for serious trouble using regular expressions to scrape XML.
You can use gawk, e.g.:
$ cat file
<Tag1>not </Tag1><Tag2>working here</Tag2>
<Tag1>not </Tag1><Tag2>
working
</Tag2>
$ awk -vRS="</Tag2>" '/<Tag2>/{gsub(/.*<Tag2>/,"");print}' file
working here
working
awk -F"Tag2" '{print $2}' test_1 | sed 's/[^a-zA-Z]//g'