Replacing a string in HTML file in Python

I'm trying to replace a string stored in a list with an HTML tag in a file by doing:
links=[http://hexagon-dashboard-gbc-01/vboard/latest?regs=3281546<!--V68NUR-->]
str1="""%s<!--V68NUR-->"""%(vboard['V68N']['perf.tl'],vboard['V68N']['perf.tl'])
with open(html_file,'r+') as f:
    content = f.read()
    f.seek(0)
    f.truncate()
    f.write(content.replace(links[0],str1))
But I get the following error:
TypeError: replace() argument 1 must be str, not Tag.
What am I missing? Please help me with the modification I have to do.

Updated:
From what you posted, I assume you are treating the HTML file as plain text and performing a string replacement on it.
str.replace() only works when both of its arguments are strings.
You got the error because links[0] is not a string but a Tag object.
If links holds a plain string instead (note the single quotes)
links=['http://hexagon-dashboard-gbc-01/vboard/latest?regs=3281546<!--V68NUR-->']
then
content.replace(links[0],str1)
would not raise an error.
Instead of editing the file as plain text, you can also modify it with an HTML parser.
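The plain-text approach described above can be sketched as follows. This is a minimal illustration, not the asker's actual setup: the helper name replace_in_file and the demo markup are made up, and it assumes that calling str() on the non-string object yields exactly the text that appears in the file.

```python
import os
import tempfile


def replace_in_file(path, old, new):
    """Replace every occurrence of `old` with `new` in the text file at `path`."""
    # str() leaves real strings unchanged and converts other objects
    # (for example a parser's tag object) to their text form.
    old, new = str(old), str(new)
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    updated = content.replace(old, new)
    with open(path, "w", encoding="utf-8") as f:
        f.write(updated)
    return updated


# Demo on a throwaway file; the markup is hypothetical.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("<p>old-link<!--V68NUR--></p>")
    demo_path = tmp.name

result = replace_in_file(demo_path, "old-link<!--V68NUR-->", "new-link<!--V68NUR-->")
os.remove(demo_path)
print(result)
```

If the converted text differs from what is in the file even slightly (attribute order, quoting, whitespace), replace() finds nothing and the file is written back unchanged.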

Related

Getting my re.findall to accept urls with a # symbol

Right now I have the line of code in python:
urls = re.findall("(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+",str(field))
This checks whether a keyword is in a URL; however, it doesn't correctly parse URLs that include a #. An example link I am trying to parse is
https://partalert.net/product.html?v=51421546#asin=B08KH7RL89&price=&smid=A3P5ROKL5A1OLE&tag=partalert-21&timestamp=00%3A17+UTC+%281.3.2021%29&title=Gigabyte+GeForce+RTX+3080+VISION+OC+10GB+Graphics+Card&tld=.co.uk
However the parsing excludes the hashtag and everything after it:
https://partalert.net/product.html?v=51421546

I managed to solve this. I needed to add a few symbols (#, & and +) to the character classes; here is the working regex: r"(?:(?:https?|ftp)://)?[\w/\-?=%.#&+]+\.[\w/\-?=%.#&+]+"
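A quick way to check the fix is to run both patterns against the example link from the question. This is only a sketch; the variable names are illustrative, and the patterns are written as raw strings with the hyphen and the dot escaped.

```python
import re

# The example link from the question.
url = (
    "https://partalert.net/product.html?v=51421546#asin=B08KH7RL89&price="
    "&smid=A3P5ROKL5A1OLE&tag=partalert-21&timestamp=00%3A17+UTC+%281.3.2021%29"
    "&title=Gigabyte+GeForce+RTX+3080+VISION+OC+10GB+Graphics+Card&tld=.co.uk"
)

# Original pattern: '#', '&' and '+' are not in the character classes.
old_pattern = r"(?:(?:https?|ftp)://)?[\w/\-?=%.]+\.[\w/\-?=%.]+"
# Fixed pattern: the three missing symbols are added to both classes.
new_pattern = r"(?:(?:https?|ftp)://)?[\w/\-?=%.#&+]+\.[\w/\-?=%.#&+]+"

old_matches = re.findall(old_pattern, url)
new_matches = re.findall(new_pattern, url)

print(old_matches[0])  # stops right before the '#'
print(new_matches)     # a single match covering the whole link
```

The backslashes matter: an unescaped /-? inside a character class is read as a range from / to ?, and an unescaped . outside a class matches any character.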

How to insert multi line string into coffeescript/jquery value with Rails

I have some CoffeeScript in my Rails project, with which I'm trying to update a textarea. My CoffeeScript is:
$('#video_description').val("<%= @description %>")
The returned text is a string from the YT gem (a description from one of my YouTube videos). An example can be:
Testing that this works
Does this work?
When I load the page and inspect it with the Developer Tools in Chrome, the CoffeeScript looks like:
$('#video_description').val("Testing that this works
Does this work?")
The line break makes the generated script invalid, and the browser raises the following error:
Uncaught SyntaxError: Invalid or unexpected token
I've tried replacing the CoffeeScript with:
$('#video_description').val("<%= h @description %>")
This has no effect (other than escaping all the single quotes in the actual string). The problem seems to be that the string is dropped between the quotes as is.
I need the string to contain \n escape sequences instead of literal line breaks.
Any help to solve this would be great.

Use the escape_javascript() helper (alias: j()), which escapes line breaks and quotes so the value can sit inside a JavaScript string:
$('#video_description').val("<%= j(@description) %>");
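To see why escaping fixes the syntax error, here is the same idea sketched in Python rather than Rails. It is not how escape_javascript is implemented; it uses json.dumps as a stand-in that turns a multi-line string into a single-line quoted literal, and the variable names are illustrative.

```python
import json

# The example description from the question, with a real line break in it.
description = "Testing that this works\nDoes this work?"

# json.dumps returns a double-quoted literal in which the line break is
# written as the two characters backslash + n, so the literal fits on one line.
js_literal = json.dumps(description)

# Drop the literal into the generated script (it already carries its own quotes).
script = "$('#video_description').val(%s);" % js_literal
print(script)
```

The generated statement no longer contains a raw line break, so the interpreter reads it as one string and restores the newline at run time.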

Issues while trying to form regex of Json string

I am having trouble writing a regex that transforms the following string.
[{"Column1":"Value1"},{"Column2":"Value2"},{"Column3":"Value3"},{"Column1":"Value4"},{"Column2":"Value5"},{"Column3":"Value6"},{"Column1":"Value7"},{"Column2":"Value8"},{"Column3":"Value9"}]
and I want output as
[{"Column1":"Value1","Column2":"Value2","Column3":"Value3"},{"Column1":"Value4","Column2":"Value5","Column3":"Value6"},{"Column1":"Value7","Column2":"Value8","Column3":"Value9"}]
I tried with (?!^)^(?=(?:[^"]*"[^"]*")*[^"]*$)|\}(?!$)(?=(?:[^"]*"[^"]*")*[^"]*$) but it is either removing all opening braces or all closing braces.
Is it possible to produce this output with a regex at all? Or are there other options besides splitting, removing, and replacing parts of the string?

You can use a regex to replace each group of three, and use capture groups so you can recompose the JSON with just the data you need.
// Verbatim string: each doubled quote ("") stands for one literal quote character.
var input = @"[{""Column1"":""Value1""},{""Column2"":""Value2""},{""Column3"":""Value3""},{""Column1"":""Value4""},{""Column2"":""Value5""},{""Column3"":""Value6""},{""Column1"":""Value7""},{""Column2"":""Value8""},{""Column3"":""Value9""}]";
// Capture the contents of three consecutive objects so they can be merged into one.
var re = new System.Text.RegularExpressions.Regex(@"(\{[^}]+)\},\{([^}]+)\},\{([^}]+)\}");
var output = re.Replace(input, "$1, $2, $3 }");
Console.WriteLine(output);
It is probably best to use Newtonsoft.JSON to parse and rebuild your JSON, otherwise if the format of the string isn't followed exactly as described then this solution will break. You can accommodate some variance by adding \s* before and after each brace so that whitespace is accounted for, but otherwise this is a brittle solution.
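The parse-and-rebuild alternative can be sketched as follows. This example is in Python with the standard json module rather than C# with Newtonsoft.JSON, and it assumes that every output row is built from exactly three consecutive input objects; GROUP_SIZE is a name introduced here for illustration.

```python
import json

raw = (
    '[{"Column1":"Value1"},{"Column2":"Value2"},{"Column3":"Value3"},'
    '{"Column1":"Value4"},{"Column2":"Value5"},{"Column3":"Value6"},'
    '{"Column1":"Value7"},{"Column2":"Value8"},{"Column3":"Value9"}]'
)

GROUP_SIZE = 3  # assumption: three single-key objects make up one row

records = json.loads(raw)

merged = []
for start in range(0, len(records), GROUP_SIZE):
    row = {}
    # Merge the keys of each object in the group into one dictionary.
    for obj in records[start:start + GROUP_SIZE]:
        row.update(obj)
    merged.append(row)

# Compact separators reproduce the spacing of the desired output.
output = json.dumps(merged, separators=(",", ":"))
print(output)
```

Because the data is parsed first, extra whitespace or values containing braces do not break the merge the way they would break the regex.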

What is the proper method for reading and writing HTML/XML (byte string) with Python and lxml and etree?

EDIT: Now that the problem is solved, I realize that it had more to do with properly reading/writing byte-strings, rather than HTML. Hopefully, that will make it easier for someone else to find this answer.
I have an HTML file that's poorly formatted. I want to use a Python lib to just make it tidy.
It seems like it should be as simple as the following:
import sys
from lxml import etree, html
#read the unformatted HTML
with open('C:/Users/mhurley/Portable_Python/notebooks/View_Custom_Report.html', 'r', encoding='utf-8') as file:
    #read the whole file into one string
    file_text = ''.join(file.readlines())
#format the HTML
document_root = html.fromstring(file_text)
document = etree.tostring(document_root, pretty_print=True)
#write the nice, pretty, formatted HTML
with open('C:/Users/mhurley/Portable_Python/notebooks/Pretty.html', 'w') as file:
    #write the pretty XML to a file
    file.write(document)
But this chunk of code complains that file_lines is not a string or bytes-like object. Okay, it makes sense that the function can't take a list, I suppose.
But then, document is 'bytes', not a string. No problem: str(document).
But then I get HTML that's full of '\n' sequences that are not newlines; they're a backslash followed by the letter n. And there are no actual line breaks in the result, it's just one long line.
I've tried a number of other weird things like specifying the encoding, trying to decode, etc. None of which produce the desired result.
What's the right way to read and write this kind of (is non-ASCII the right term?) text?

What you are missing is that etree's tostring method returns bytes, and you need to take that into account when writing the result (a bytestring) to a file. Open the file in binary mode with the b flag, like this, and drop the str() conversion:
with open('Pretty.html', 'wb') as file:
    #write the pretty XML to a file
    file.write(document)
Addendum
Even though this answer solves the immediate problem at hand and teaches about bytestrings, the solution by Padraic Cunningham is the cleaner and faster way to write lxml etrees to a file.
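The difference between str(), decode() and binary mode can be seen without lxml. In this sketch the bytes literal is a made-up stand-in for what etree.tostring() returns, and the temporary file only exists for the demonstration.

```python
import os
import tempfile

# Stand-in for the bytes returned by etree.tostring() (hypothetical content).
document = b"<html>\n  <body>Hi</body>\n</html>\n"

# str() on bytes gives the repr: a b'...' wrapper with each line break
# spelled out as backslash + n, which is the "one long line" symptom.
as_repr = str(document)

# decode() converts the bytes to real text with real line breaks.
as_text = document.decode("utf-8")

# Writing the bytes unchanged requires opening the file in binary mode.
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(document)
with open(path, "rb") as f:
    round_trip = f.read()
os.remove(path)

print(as_repr)
print(as_text)
```

Either route works: write the bytes with 'wb', or decode them first and write the resulting text with 'w' and an explicit encoding.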

This can all be done with lxml in a couple of lines, without ever calling open; the .write method exists for exactly what you are trying to do:
from lxml import html
# parse using the file name, which is also the recommended way.
tree = html.parse("C:/Users/mhurley/Portable_Python/notebooks/View_Custom_Report.html")
# call write on the tree
tree.write("C:/Users/mhurley/Portable_Python/notebooks/Pretty.html", pretty_print=True, encoding="utf-8")
Also, file_text = ''.join(file.readlines()) is exactly the same as file_text = file.read()

How to escape quotes that passed from a JSON file to Jade template?

I have some variables stored in a JSON file which will be injected into my generated HTML later. Those variables would be put in places like:
var str = '#{content.str}';
In the JSON file, content.str might contain a ' character, which causes a JavaScript error after the HTML file is rendered.
What should I do to prevent this from happening?
Thanks,

It's simple -
"I've done it".replace("'", "\\'")
//output "I\'ve done it"
Using RegEx - replace all
"I've done it haven't you".replace(/'/g, "\\'")
//output "I\'ve done it haven\'t you"
