Partial unescape html string - html

I have a web app. Go's html/template escapes all unsafe tags. Is there a way to selectively escape a string? For instance, Google highlights your search query by wrapping it in <em> tags, while all other HTML and JavaScript gets escaped. I have this:
package main
import (
"fmt"
"html/template"
)
func main() {
s := "<i>This should be escaped</i><strong>This should be in bold</strong>."
h := template.HTMLEscaper(s)
fmt.Println(h)
}
Right now everything gets escaped. I am aware you can write a template function that returns template.HTML(mystring), but how do I do that for only part of a string?
thx!
EDIT:
I would like the final string to be:
&lt;i&gt;This should be escaped&lt;/i&gt;<strong>This should be in bold</strong>.
To be clear, I have tried creating a template function to highlight matches... but the HTML I add gets escaped in the actual template ;(
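A common way to get Google-style highlighting is to escape the whole string yourself, insert only the markup you generate, and then mark the result as template.HTML so that html/template leaves it alone. The sketch below assumes a hypothetical highlight template function that does a plain, case-sensitive substring match and wraps each match in <strong>. It is not a general HTML sanitizer: any markup already present in the input (like the <i> above) is escaped.

```go
package main

import (
	"html/template"
	"os"
	"strings"
)

// highlight is a hypothetical helper: it HTML-escapes all of text and wraps
// every case-sensitive occurrence of query in <strong> tags. The only
// unescaped markup in the result is the <strong> pair added here, which is
// why it is reasonable to return it as template.HTML.
func highlight(text, query string) template.HTML {
	if query == "" {
		return template.HTML(template.HTMLEscapeString(text))
	}
	var b strings.Builder
	for {
		i := strings.Index(text, query)
		if i < 0 {
			break
		}
		b.WriteString(template.HTMLEscapeString(text[:i]))
		b.WriteString("<strong>")
		b.WriteString(template.HTMLEscapeString(query))
		b.WriteString("</strong>")
		text = text[i+len(query):]
	}
	b.WriteString(template.HTMLEscapeString(text))
	return template.HTML(b.String())
}

func main() {
	tmpl := template.Must(template.New("page").Funcs(template.FuncMap{
		"highlight": highlight,
	}).Parse("<p>{{highlight .Text .Query}}</p>\n"))

	data := struct{ Text, Query string }{
		Text:  "<i>This should be escaped</i> but bold should be bold.",
		Query: "bold",
	}
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
```

Because the function returns template.HTML, html/template inserts its result verbatim, so all the escaping responsibility sits inside highlight.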

Related

Replace characters in go serialization by using custom MarshalJSON method

I wrote a custom MarshalJSON method in order to replace these characters: \u003c and \u003e: https://go.dev/play/p/xJ-qcMN9QXl
In the example above, I marshal the struct through an aux struct that contains the same fields, and as the last step I replace the characters I need and then return the result.
As you can see from the print placed just before returning from the MarshalJSON method, the special characters were replaced, but after calling json.Marshal, the special characters remain the same.
I'm missing something here but cannot figure it out. I'd appreciate your help.
Thankies :)
In the Marshal documentation of the json package https://pkg.go.dev/encoding/json#Marshal you will find the following paragraph:
String values encode as JSON strings coerced to valid UTF-8, replacing invalid bytes with the Unicode replacement rune. So that the JSON will be safe to embed inside HTML <script> tags, the string is encoded using HTMLEscape, which replaces "<", ">", "&", U+2028, and U+2029 are escaped to "\u003c","\u003e", "\u0026", "\u2028", and "\u2029". This replacement can be disabled when using an Encoder, by calling SetEscapeHTML(false).
So try it using an Encoder, for example:
package main
import (
"bytes"
"encoding/json"
"fmt"
)
type Foo struct {
Name string
Surname string
Likes map[string]interface{}
Hates map[string]interface{}
newGuy bool //rpcclonable
}
func main() {
foo := &Foo{
Name: "George",
Surname: "Denkin",
Likes: map[string]interface{}{
"Sports": "volleyball",
"Message": "<George> play volleyball <usually>",
},
}
buf := &bytes.Buffer{} // or &strings.Builder{} as in the example by @mkopriva
enc := json.NewEncoder(buf)
enc.SetEscapeHTML(false) // keep <, > and & as-is instead of \u003c, \u003e, \u0026
if err := enc.Encode(foo); err != nil {
fmt.Println(err)
return
}
fmt.Println(buf.String())
}
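This also explains why the replacement inside MarshalJSON seemed to be undone: json.Marshal runs the bytes returned by a MarshalJSON method through its own compaction step with HTML escaping enabled, so any raw < or > come back as \u003c and \u003e. The sketch below uses a hypothetical Msg type whose MarshalJSON itself uses an Encoder with SetEscapeHTML(false), and compares json.Marshal with a top-level Encoder that also has HTML escaping turned off.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// Msg is a hypothetical type whose MarshalJSON tries to keep '<' and '>'.
type Msg struct {
	Text string
}

func (m Msg) MarshalJSON() ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	// Encode an anonymous struct with the same field so that we do not
	// recurse back into this MarshalJSON method.
	if err := enc.Encode(struct{ Text string }{m.Text}); err != nil {
		return nil, err
	}
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

// viaMarshal uses json.Marshal, which re-escapes what MarshalJSON returns.
func viaMarshal(m Msg) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

// viaEncoder uses an Encoder with HTML escaping disabled at the top level.
func viaEncoder(m Msg) (string, error) {
	var sb strings.Builder
	enc := json.NewEncoder(&sb)
	enc.SetEscapeHTML(false)
	err := enc.Encode(m)
	// Encode always appends a newline; trim it for comparison.
	return strings.TrimSuffix(sb.String(), "\n"), err
}

func main() {
	m := Msg{Text: "<b>"}
	a, _ := viaMarshal(m)
	b, _ := viaEncoder(m)
	fmt.Println(a)
	fmt.Println(b)
}
```

In other words, only the outermost Encoder's setting decides whether HTML characters end up escaped.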

Golang struct unmarshal xss

I have a struct which has XSS injected in it. In order to remove it, I json.Marshal it, then run json.HTMLEscape. Then I json.Unmarshal it into a new struct.
The problem is the new struct has XSS injected still.
I simply can't figure out how to remove the XSS from the struct. I could write a function to do it on the field, but considering there is json.HTMLEscape and we can Unmarshal it back, it should work fine, but it's not.
package main
import (
"bytes"
"encoding/json"
"fmt"
)
type Person struct {
Name string `json:"name"`
}
func main() {
var p, p2 Person
// p.Name has XSS
p.Name = "<script>alert(1)</script>"
var tBytes bytes.Buffer
// I marshal it so I can use json.HTMLEscape
marshalledJson, _ := json.Marshal(p)
json.HTMLEscape(&tBytes, marshalledJson)
// here I insert it into a new struct, sadly the p2 struct has the XSS still
err := json.Unmarshal(tBytes.Bytes(), &p2)
if err != nil {
fmt.Println(err)
}
fmt.Print(p2)
}
The expected outcome is p2.Name to be sanitized like &lt;script&gt;alert(1)&lt;/script&gt;
First, json.HTMLEscape doesn't do what you want:
HTMLEscape appends to dst the JSON-encoded src with <, >, &, U+2028 and U+2029 characters inside string literals changed to \u003c, \u003e, \u0026, \u2028, \u2029 so that the JSON will be safe to embed inside HTML <script> tags.
but what you want is:
p2.Name to be sanitized like &lt;script&gt;alert(1)&lt;/script&gt;
which you can get by calling html.EscapeString, but not from any of the json encoding routines.[1]
Second, if you inspect the result of json.Marshal, you'll see that it has already replaced < with \u003c and so forth: it has already done the json.HTMLEscape, so json.HTMLEscape has no characters left to replace! See https://play.golang.org/p/Zergs3bwElY for an example. And json.Unmarshal decodes \u003c back into <, so after the round trip p2.Name is exactly the original string.
As Ahmed Hashem noted, if you really want to do this sort of thing, you can use reflection to find string fields (as in Implement XSS protection in Golang)—but in general it's probably wiser to do this at the point of input. Note that the answer there does not recurse into inner objects that might contain strings.
[1] JSON is not HTML, nor XML, etc. Keep them separate in your head and in your code.
See also https://medium.com/@oazzat19/what-is-the-difference-between-html-vs-xml-vs-json-254864972bbb, a short summary of how we got here that as far as I can tell has no errors, which is pretty good for a random web article. :-) When using JSON, we get very simple typed data: objects, strings, numbers, lists/arrays, booleans, and null; see https://www.w3schools.com/js/js_json_syntax.asp, https://www.w3schools.com/js/js_json_objects.asp, and https://cswr.github.io/JsonSchema/spec/basic_types/ for instance.

How do I output, in C++, all links from a saved .html file that are in the <a href> tags?

Currently writing a program that, given a URL, will save a copy of the page's HTML in a .txt file, and then attempt to parse that .txt file for the hyperlinks in its <a href> tags. Example:
Visit example.com!
Right now, everything except the parser works. I output the contents of the HTML file to a .txt file. I then convert it into a string, attempt to parse that string using a regex, and store all the hyperlinks in a vector. I then print out the contents of that vector. The code for the parsing section is as follows:
vector<string> extract_hyperlinks(string html_file_name )
{
static const regex hl_regex( "<a href=\"(.*?)\">", regex_constants::icase ) ;
const string text = file_to_string(html_file_name) ;
sregex_token_iterator begin( text.begin(), text.end(), hl_regex, 1 );
sregex_token_iterator end ;
return vector<string>( begin, end ) ;
}
The parser is not putting anything into the vector, even though the string is populated with the .txt file converted into a string, which clearly contains values such as Visit example.com!.
What am I doing wrong and how can I fix it?
Try this.
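#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>
// file_to_string is the asker's own helper that reads a whole file into a string
std::string file_to_string(const std::string& file_name);
std::vector<std::string> extract_hyperlinks(const std::string& html_file_name)
{
static const std::regex hl_regex("<a href=\"(.*?)\">", std::regex_constants::icase);
const std::string text = file_to_string(html_file_name);
std::vector<std::string> ret_vec;
// copy capture group 1 (the href value) of every match into ret_vec
std::copy(std::sregex_token_iterator(text.begin(), text.end(), hl_regex, 1),
std::sregex_token_iterator(),
std::back_inserter(ret_vec));
return ret_vec;
}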
In this case, regular expressions are either so simple that they are inadequate or so complex that they are incomprehensible.
As per the advice of Martin York, you really need an HTML parsing library.
I would advise using Google's gumbo-parser at https://github.com/google/gumbo-parser. It is a well-tested pure C99 library and comes with some C++ example files. The find_links.cc example does what I think you want.

Load json file bundled within the executable

I'm working on a small web application in Go that's meant to be used as a tool on a developer's machine to help debug their applications/web services. The interface to the program is a web page that includes not only the HTML but some JavaScript (for functionality), images, and CSS (for styling). I'm planning on open-sourcing this application, so users should be able to run a Makefile, and all the resources will go where they need to go. However, I'd also like to be able to simply distribute an executable with as few files/dependencies as possible. Is there a good way to bundle the HTML/CSS/JS with the executable, so users only have to download and worry about one file?
Right now, in my app, serving a static file looks a little like this:
// called via http.ListenAndServe
func switchboard(w http.ResponseWriter, r *http.Request) {
// snipped dynamic routing...
// look for static resource
uri := r.URL.RequestURI()
if fp, err := os.Open("static" + uri); err == nil {
defer fp.Close()
staticHandler(w, r, fp)
return
}
// snipped blackhole route
}
So it's pretty simple: if the requested file exists in my static directory, invoke the handler, which simply opens the file and tries to set a good Content-Type before serving. My thought was that there's no reason this needs to be based on the real filesystem: if there were compiled resources, I could simply index them by request URI and serve them as such.
Let me know if there's not a good way to do this or I'm barking up the wrong tree by trying to do this. I just figured the end-user would appreciate as few files as possible to manage.
If there are more appropriate tags than go, please feel free to add them or let me know.
Starting with Go 1.16 the go tool has support for embedding static files directly in the executable binary.
You have to import the embed package, and use the //go:embed directive to mark what files you want to embed and into which variable you want to store them.
3 ways to embed a hello.txt file into the executable:
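package main
import (
"embed"
"fmt"
)
// 1. As a string:
//go:embed hello.txt
var s string
// 2. As a byte slice:
//go:embed hello.txt
var b []byte
// 3. As an embed.FS (a read-only file system):
//go:embed hello.txt
var f embed.FS
func main() {
fmt.Print(s)
fmt.Print(string(b))
data, _ := f.ReadFile("hello.txt")
fmt.Print(string(data))
}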
Using the embed.FS type for the variable you can even include multiple files into a variable that will provide a simple file-system interface:
// content holds our static web server content.
//go:embed image/* template/*
//go:embed html/index.html
var content embed.FS
The net/http package can serve files from an embed.FS value by wrapping it with http.FS(), like this:
http.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.FS(content))))
The template packages can also parse templates using text/template.ParseFS(), html/template.ParseFS() functions and text/template.Template.ParseFS(), html/template.Template.ParseFS() methods:
template.ParseFS(content, "template/*.tmpl")
The rest of this answer lists your old options (prior to Go 1.16).
Embedding Text Files
If we're talking about text files, they can easily be embedded in the source code itself. Just use back quotes to declare a raw string literal like this:
const html = `
<html>
<body>Example embedded HTML content.</body>
</html>
`
// Sending it:
w.Write([]byte(html)) // w is an io.Writer
Optimization tip:
Since most of the time you will only need to write the resource to an io.Writer, you can also store the result of the []byte conversion:
var html = []byte(`
<html><body>Example...</body></html>
`)
// Sending it:
w.Write(html) // w is an io.Writer
The only thing you have to be careful about is that raw string literals cannot contain the back quote character (`). Raw string literals cannot contain escape sequences (unlike interpreted string literals), so if the text you want to embed does contain back quotes, you have to break the raw string literal and concatenate the back quotes as interpreted string literals, like in this example:
var html = `<p>This is a back quote followed by a dot: ` + "`" + `.</p>`
Performance is not affected, as these concatenations will be executed by the compiler.
Embedding Binary Files
Storing as a byte slice
For binary files (e.g. images), the most compact (regarding the resulting native binary) and most efficient option is to have the content of the file as a []byte in your source code. This can be generated by 3rd party tools/libraries like go-bindata.
If you don't want to use a 3rd party library for this, here's a simple code snippet that reads a binary file, and outputs Go source code that declares a variable of type []byte that will be initialized with the exact content of the file:
imgdata, err := ioutil.ReadFile("someimage.png")
if err != nil {
panic(err)
}
fmt.Print("var imgdata = []byte{")
for i, v := range imgdata {
if i > 0 {
fmt.Print(", ")
}
fmt.Print(v)
}
fmt.Println("}")
Example output if the file contained the bytes 0 to 15 (try it on the Go Playground):
var imgdata = []byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
Storing as base64 string
If the file is not "too large" (most images/icons qualify), there are other viable options too. You can convert the content of the file to a Base64 string and store that in your source code. On application startup (func init()) or when needed, you can decode it to the original []byte content. Go has nice support for Base64 encoding in the encoding/base64 package.
Converting a (binary) file to base64 string is as simple as:
data, err := ioutil.ReadFile("someimage.png")
if err != nil {
panic(err)
}
fmt.Println(base64.StdEncoding.EncodeToString(data))
Store the result base64 string in your source code, e.g. as a const.
Decoding it is just one function call:
const imgBase64 = "<insert base64 string here>"
data, err := base64.StdEncoding.DecodeString(imgBase64) // data is of type []byte
Storing as quoted string
Storing the quoted string literal of the binary data is more efficient than storing it as base64, although it may be longer in the source code. We can obtain the quoted form of any string using the strconv.Quote() function:
data, err := ioutil.ReadFile("someimage.png")
if err != nil {
panic(err)
}
fmt.Println(strconv.Quote(string(data)))
For binary data containing the values 0 to 63, this is what the output would look like (try it on the Go Playground):
"\x00\x01\x02\x03\x04\x05\x06\a\b\t\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\"#$%&'()*+,-./0123456789:;<=>?"
(Note that strconv.Quote() appends and prepends a quotation mark to it.)
You can directly use this quoted string in your source code, for example:
const imgdata = "\x00\x01\x02\x03\x04\x05\x06\a\b\t\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\"#$%&'()*+,-./0123456789:;<=>?"
It is ready to use, no need to decode it; the unquoting is done by the Go compiler, at compile time.
You may also store it as a byte slice should you need it like that:
var imgdata = []byte("\x00\x01\x02\x03\x04\x05\x06\a\b\t\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\"#$%&'()*+,-./0123456789:;<=>?")
The go-bindata package looks like it might be what you're interested in.
https://github.com/go-bindata/go-bindata
It will allow you to convert any static file into a function call that can be embedded in your code and will return a byte slice of the file content when called.
Bundle React application
For example, you have a build output from react like the following:
build/favicon.ico
build/index.html
build/asset-manifest.json
build/static/css/**
build/static/js/**
build/manifest.json
When you use go:embed like this, it will serve the contents as http://localhost:port/build/index.html which is not what we want (unexpected /build).
//go:embed build/*
var static embed.FS
// ...
http.Handle("/", http.FileServer(http.FS(static)))
In fact, we need one more step to make it work as expected: using fs.Sub:
package main
import (
"embed"
"io/fs"
"log"
"net/http"
)
//go:embed build/*
var static embed.FS
func main() {
contentStatic, err := fs.Sub(static, "build")
if err != nil {
log.Fatal(err)
}
http.Handle("/", http.FileServer(http.FS(contentStatic)))
log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
Now, http://localhost:8080 should serve your web application as expected.
Credit to Amit Mittal.
Note: go:embed requires Go 1.16 or higher.
There is also a more exotic way: I use a Maven plugin to build Go projects, and it lets me use the JCP preprocessor to embed binary blocks and text files into sources. In that case the code just looks like the line below (and an example can be found here):
var imageArray = []uint8{/*$binfile("./image.png","uint8[]")$*/}
As a popular alternative to go-bindata mentioned in another answer, mjibson/esc also embeds arbitrary files, but handles directory trees particularly conveniently.

In Go templates, I can get Parse to work but cannot get ParseFiles to work in like manner. Why?

I have the following code:
t, err := template.New("template").Funcs(funcMap).Parse("Howdy {{ myfunc . }}")
In this form everything works fine. But if I do exactly the same thing with ParseFiles, placing the text above in template.html it's a no go:
t, err := template.New("template").Funcs(funcMap).ParseFiles("template.html")
I was able to get ParseFiles to work in the following form, but cannot get Funcs to take effect:
t, err := template.ParseFiles("template.html")
t.Funcs(funcMap)
Of course, this last form is a direct call to a function instead of a call to the receiver method, so not the same thing.
Does anyone have any ideas what's going on here? It's difficult to find much detail on templates out in the ether.
Did some digging and found this comment for template.ParseFiles in the source:
First template becomes return value if not already defined, and we use that one for subsequent New calls to associate all the templates together. Also, if this file has the same name as t, this file becomes the contents of t, so t, err := New(name).Funcs(xxx).ParseFiles(name) works. Otherwise we create a new template associated with t.
So the format should be as follows, given my example above:
t, err := template.New("template.html").Funcs(funcMap).ParseFiles("path/template.html")
.New("template.html") creates an empty template with the given name, .Funcs(funcMap) associates any custom functions we want to apply to our templates, and then .ParseFiles("path/template.html") parses one or more templates with an awareness of those functions and associates the contents with a template of that name.
Note that the base name of the first file MUST be the same as the name used in New. Any content parsed will be associated with either an empty preexisting template having the same base name of the first file in the series or with a new template having that base name.
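So in my example above, one empty template named "template" was created and had a function map associated with it. Then ParseFiles created a second, associated template named "template.html" and parsed the file into that one. THESE ARE NOT THE SAME! The associated template shares the function map, but ParseFiles returns its receiver, so t ends up being the empty "template" template, and executing it fails because it has no content; the parsed file is only reachable via t.Lookup("template.html").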
What about the last example? Why didn't this work? template.ParseFiles calls the receiver method Parse, which in turn applies any previously registered functions:
trees, err := parse.Parse(t.name, text, t.leftDelim, t.rightDelim, t.parseFuncs, builtins)
This means that custom functions have to be registered before parsing. Adding the functions after parsing the templates is too late: template.ParseFiles already fails with a "function ... not defined" error, and if that error is ignored, t is nil and the following t.Funcs call panics with a nil pointer dereference.
So I think that covers my original question. The design here seems a little clunky. It doesn't make sense that I can chain ParseFiles onto one template and have the file's contents end up in a different template from the one I am chaining from, just because the names don't happen to match. That is counterintuitive and probably ought to be addressed in future releases to avoid confusion.
ParseFiles names each template it creates after the base name of its file. But you call template.New, which creates another template, named "error" here. So you have to select the template you want with Lookup:
foo.go
package main
import (
"text/template"
"log"
"os"
"strings"
)
func main() {
tmpl, err := template.New("error").Funcs(template.FuncMap{
"trim": strings.TrimSpace,
}).ParseFiles("foo.tmpl")
if err != nil {
log.Fatal(err)
}
tmpl = tmpl.Lookup("foo.tmpl")
err = tmpl.Execute(os.Stdout, " string contains spaces both ")
if err != nil {
log.Fatal(err)
}
}
foo.tmpl
{{. | trim}}
try this:
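var templates = template.Must(template.New("").Funcs(fmap).ParseFiles("1.tmpl", "2.tmpl"))
Then execute a specific file's template by its base name, e.g. templates.ExecuteTemplate(w, "1.tmpl", data).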