Finding and replacing last segment of ip addresses with sed - json

I have a bunch of JSON files that contain web events. Each event contains many fields, and I'm trying to anonymize IP addresses (replacing the last segment of each address with 0) with sed.
In short:
How can I find substrings like "ip":"34.542.3.34" in JSON files and transform them to "ip":"34.542.3.0"?
Attempts:
Resetting starting point with \K
sed -e 's/"ip":"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.\K[0-9]{1,3}/0/g' file.json
This would work, but unfortunately sed doesn't seem to support resetting the starting point.
Positive lookbehind
sed -e 's/(?<="ip":"[0-9]{3}\.[0-9]{3}\.[0-9]{3}\.)([0-9]{3})/0/g' file.json
This would also work, but lookbehind doesn't seem to support non-fixed-width assertions. So [0-9]{1,3} is not supported, and hence this won't work.
Matching groups
My third idea was to use matching groups and do something like this:
sed -e 's/("ip":"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|([0-9]{1,3})/\1\20/g' file.json
But I couldn't figure out how this would actually work with sed.
Writing regex for every possible length option separately
This would probably work, but it would make the regex too long and hard to read. I would like to find a more convenient and cleaner solution.
Any suggestions?

With GNU sed:
sed -r 's/("ip":"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)[0-9]{1,3}(")/\10\2/' file

When the -r option is not available:
sed -e 's/\("ip":"[[:digit:]]\{1,3\}\(\.[[:digit:]]\{1,3\}\)\{2\}\.\)\([[:digit:]]\{1,3\}\)"/\10"/g' tst.json
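To see the capture-group approach on concrete input, here is a minimal sketch. The sample line and the helper name anonymize_ip are made up for illustration, and it assumes a sed that accepts -E (both GNU and BSD sed do). The prefix up to and including the third dot is captured as group 1, so the replacement \10" is read as group 1 followed by a literal 0 and the closing quote.

```shell
#!/bin/sh
# Hypothetical sample: two events on one line (field names are invented).
sample='{"id":1,"ip":"34.54.3.34","ua":"x"},{"id":2,"ip":"10.0.0.255"}'

# anonymize_ip (hypothetical helper) keeps the first three octets (group 1)
# and replaces the fourth with 0. The g flag handles several events per line.
anonymize_ip() {
  sed -E 's/("ip":"([0-9]{1,3}\.){3})[0-9]{1,3}"/\10"/g'
}

printf '%s\n' "$sample" | anonymize_ip
# prints: {"id":1,"ip":"34.54.3.0","ua":"x"},{"id":2,"ip":"10.0.0.0"}
```

sed only knows back-references \1 through \9, which is why \10 cannot be mistaken for a tenth group.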

Related

Pipe SED output to mysql

I am trying to pipe output data directly from sed into mysql by doing something like:
sed "s/url\/uct\/video/url\/uct\/video/g" 2019-11-25T0330-main.sql.gz | mysql opencast
Is this possible? I keep running out of space and therefore piping it straight into mysql would be ideal (instead of running sed independently).
You need to decompress the dump to stdout first, then filter it:
gunzip -c 2019-11-25T0330-main.sql.gz | sed 's#url/uct/video#url/uct/video#g' | mysql opencast
Note that if you use a different character besides / as the delimiter in sed's s/str/rep/, you avoid having to escape / characters that are part of the search and replacement patterns. The s command accepts an alternate delimiter as-is; only when an alternate delimiter is used in an address do you need to escape its first occurrence, as in sed -n '\#pattern#p'.
And if you are using the -i option for sed, it doesn't make sense in the context of reading input from a pipe ;-)
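The streaming idea can be tried without a database. The following is a rough sketch under stated assumptions: the temporary dump, its SQL content, and the replacement string new/path/video are all invented, and cat stands in for mysql opencast so that nothing is written to disk uncompressed.

```shell
#!/bin/sh
# Hypothetical stand-in for the real dump; path and SQL are made up.
tmp=$(mktemp -d)
printf '%s\n' "UPDATE t SET u='url/uct/video/1';" | gzip -c > "$tmp/dump.sql.gz"

# Decompress to stdout, rewrite on the fly, hand the stream to the consumer.
# cat replaces `mysql opencast` here so the sketch runs anywhere.
rewritten=$(gunzip -c "$tmp/dump.sql.gz" | sed 's#url/uct/video#new/path/video#g' | cat)

printf '%s\n' "$rewritten"
rm -rf "$tmp"
```

Because gunzip -c leaves the .gz file untouched and writes only to the pipe, the only extra disk space needed is what mysql itself uses.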
IHTH

Filtering out indented lines in JSON GitHub API with sed

I'm able to get all the names filtered out using,
sed -n '/"name":/p' htop.json
but I want to filter out all the indented outputs. I'm looking for the repo titles from each GitHub. It's important I use something light like sed to make this small and portable.
Here is htop.json
https://pastebin.com/5xuH29yW
Well, just filter from the beginning of the line with spaces/indentation characters then:
sed -n '/^ "name":/p' htop.json
and we can also specify the number of spaces as a number:
sed -n '/^[ ]\{6\}"name":/p' htop.json
Let's get repo names!
sed -n '/^ "name":/{s/[[:space:]]*"name":[[:space:]]*"\(.*\)",$/\1/;p}' htop.json
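As an illustration of anchoring on the indentation depth, here is a small sketch. The JSON excerpt is invented and assumes two spaces per nesting level, so top-level repository names sit at exactly four spaces; the real htop.json may use a different depth, in which case the count in \{4\} has to change. repo_names is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical excerpt shaped like a repository list, two spaces per level.
sample='[
  {
    "name": "htop",
    "owner": {
      "name": "someone"
    }
  }
]'

# Print only "name" values indented by exactly four spaces, stripping the
# key, the quotes, and an optional trailing comma.
repo_names() {
  sed -n 's/^ \{4\}"name":[[:space:]]*"\(.*\)",\{0,1\}$/\1/p'
}

printf '%s\n' "$sample" | repo_names
# prints: htop
```

Combining the substitution with the p flag means only lines where the substitution succeeded are printed, so no separate address is needed.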

How to add text to the end of various digits using sed command?

I am trying to replace page=#" in various html files with page=#/index.html". I have tried using the command:
sed -i -re 's|"(page=[0-9]+)"|"\1/index.html"|' *.html
along with numerous variations, but have not been successful. The first part of the code, sed -i -re 's|"(page=[0-9]+)"|, seems to be working properly, but I cannot seem to format the end to achieve my goal. Any suggestions to modify this command would be greatly appreciated!
If you're trying to replace page=#" where the actual strings look like page=99", then the first double quote in the RE isn't going to match anything correctly. It would only match if it looks like:
"page=99"
But I'm guessing this is at the end of a link in html so it probably does not have the initial double quote. This should work instead:
sed -i -re 's|(page=[0-9]+)"|\1/index.html"|' *.html
Also to confirm, if you're on OS X, you can't use the GNU option -r or use -i without an argument, so it would look like this:
sed -i '' -Ee 's|(page=[0-9]+)"|\1/index.html"|' *.html
-E means to use Extended Regular Expressions so you can write ( instead of \( for grouping. In GNU sed this is -r.
-i means to edit the files in-place, on GNU it can take no argument, but on other systems you need to pass the extension to make for a backup, or '' for no backup.
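Before editing files in place, the expression can be checked on standard input, which also sidesteps the -i differences between systems. A minimal sketch, with a made-up link and a hypothetical helper name add_index; it uses -E, which both GNU and BSD sed accept:

```shell
#!/bin/sh
# add_index (hypothetical helper) appends /index.html to page=<digits>
# when it is immediately followed by a closing double quote.
add_index() {
  sed -E 's|(page=[0-9]+)"|\1/index.html"|g'
}

printf '%s\n' '<a href="list.php?page=12">next</a>' | add_index
# prints: <a href="list.php?page=12/index.html">next</a>
```

Once the output looks right, the same expression can be passed to sed -i (GNU) or sed -i '' (BSD/macOS).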

Replacing HTML tag content using sed

I'm trying to replace the content of some HTML tags in an HTML page using sed in a bash script. For some reason I'm not getting the proper result, as it's not replacing anything. It has to be something very simple/stupid I'm overlooking; would anyone care to help me out?
HTML to search/replace in:
Unlocked <span id="unlockedCount"></span>/<span id="totalCount"></span> achievements for <span id="totalPoints"></span> points.
sed command used:
cat index.html | sed -i -e "s/\<span id\=\"unlockedCount\"\>([0-9]\{0,\})\<\/span\>/${unlockedCount}/g" index.html
The point of this is to parse the HTML page and update the figures according to some external data. For a first run, the contents of the tags will be empty, after that they will be filled.
EDIT:
I ended up using a combination of the answers which resulted in the following code:
sed -i -e 's|<span id="unlockedCount">\([0-9]\{0,\}\)</span>|<span id="unlockedCount">'"${unlockedCount}"'</span>|g' index.html
Many thanks to @Sorpigal, @tripleee, @classic for the help!
Try this:
sed -i -e "s/\(<span id=\"unlockedCount\">\)\(<\/span>\)/\1${unlockedCount}\2/g" index.html
What you say you want to do is not what you're telling sed to do.
You want to insert a number into a tag or replace it if present. What you're trying to tell sed to do is to replace a span tag and its contents, if any or a number, with the value of a shell variable.
You're also employing a lot of complex, annoying and error-prone escape sequences which are just not necessary.
Here's what you want:
sed -r -i -e 's|<span id="unlockedCount">([0-9]{0,})</span>|<span id="unlockedCount">'"${unlockedCount}"'</span>|g' index.html
Note the differences:
Added -r to turn on extended expressions without which your capture pattern would not work.
Used | instead of / as the delimiter for the substitution so that escaping / would not be necessary.
Single-quoted the sed expression so that escaping things inside it from the shell would not be necessary.
Included the matched span tag in the replacement section so that it would not get deleted.
In order to expand the unlockedCount variable, closed the single-quoted expression, then later re-opened it.
Omitted cat | which was useless here.
I also used double quotes around the shell variable expansion, because this is good practice but if it contains no spaces this is not really necessary.
It was not, strictly speaking, necessary for me to add -r. Plain old sed will work if you say \([0-9]\{0,\}\), but the idea here was to simplify.
sed -i -e 's%<span id="unlockedCount">\([0-9]*\)</span>%'"${unlockedCount}%g" index.html
I removed the Useless Use of Cat, took out a bunch of unnecessary backslashes, added single quotes around the regex to protect it from shell expansion, and fixed the repetition operator. The grouping parentheses still need backslashes: without -r, sed wants \(...\).
Note the use of single and double quotes next to each other. Single quotes protect against shell expansion, so you can't use them around "${unlockedCount}" where you do want the shell to interpolate the variable.
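The quoting pattern can be exercised on a single line of input. This is a sketch only: set_count is a hypothetical helper, and it assumes the new value consists of digits, since characters such as |, & or \ in the value would be interpreted by sed. It reads stdin instead of using -i so that it behaves the same on GNU and BSD sed.

```shell
#!/bin/sh
# set_count (hypothetical helper): $1 is the new number for the span.
# The sed script is single-quoted, closed just before "$1" so the shell
# can expand it, then re-opened for the rest of the replacement.
set_count() {
  sed -E 's|(<span id="unlockedCount">)[0-9]*(</span>)|\1'"$1"'\2|g'
}

line='Unlocked <span id="unlockedCount"></span>/<span id="totalCount"></span>'
printf '%s\n' "$line" | set_count 7
# prints: Unlocked <span id="unlockedCount">7</span>/<span id="totalCount"></span>
```

Because [0-9]* also matches an empty span, the same command covers both the first run and later updates.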

Help with sed regex: extract text from specific tag

First time sed'er, so be gentle.
I have the following text file, 'test_file':
<Tag1>not </Tag1><Tag2>working</Tag2>
I want to extract the text in between <Tag2> using sed regex, there may be other occurrences of <Tag2> and I would like to extract those also.
So far I have this sed based regex:
cat test_file | grep -i "Tag2"| sed 's/<[^>]*[>]//g'
which gives the output:
not working
Anyone any idea how to get this working?
As another poster said, sed may not be the best tool for this job. You may want to use something built for XML parsing, or even a simple scripting language, such as perl.
The problem with your try, is that you aren't analyzing the string properly.
cat test_file is good - it prints out the contents of the file to stdout.
grep -i "Tag2" is ok - it prints out only lines with "Tag2" in them. This may not be exactly what you want. Bear in mind that it will print the whole line, not just the <Tag2> part, so you will still have to search out that part later.
sed 's/<[^>]*[>]//g' isn't what you want - it simply removes the tags, including <Tag1> and <Tag2>.
You can try something like:
cat test_file | grep -i tag2 | sed 's/.*<Tag2>\(.*\)<\/Tag2>.*/\1/'
This will produce
working
but it will only work for one tag pair.
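For several <Tag2> pairs on one line, one possible sketch is to let grep -o emit each pair on its own line and then strip the tags with sed. This assumes a grep that supports -o (GNU and BSD grep do, but it is not in POSIX), that the tagged text contains no < character, and that each pair opens and closes on the same line. extract_tag2 is a hypothetical helper name.

```shell
#!/bin/sh
# extract_tag2 (hypothetical helper): one output line per <Tag2>...</Tag2>.
extract_tag2() {
  grep -o '<Tag2>[^<]*</Tag2>' | sed 's/<[^>]*>//g'
}

printf '%s\n' '<Tag1>not </Tag1><Tag2>working</Tag2><Tag2>twice</Tag2>' | extract_tag2
# prints:
# working
# twice
```

This is still pattern matching on text, not XML parsing, so nested or multi-line tags need a real parser.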
For your nice, friendly example, you could use
sed -e 's/^.*<Tag2>//' -e 's!</Tag2>.*!!' test_file
but the XML out there is cruel and uncaring. You're asking for serious trouble using regular expressions to scrape XML.
You can use gawk, e.g.:
$ cat file
<Tag1>not </Tag1><Tag2>working here</Tag2>
<Tag1>not </Tag1><Tag2>
working
</Tag2>
$ awk -vRS="</Tag2>" '/<Tag2>/{gsub(/.*<Tag2>/,"");print}' file
working here
working
awk -F"Tag2" '{print $2}' test_1 | sed 's/[^a-zA-Z]//g'