How can I remove or escape new line-carriage returns within an XML string in XSL?

I've got an ASP multiline textbox that saves user defined text to a database. When I retrieve the data, I serialize it into xml and then run it through an XSL transform to output my HTML.
Within my transform, I am passing the textbox defined data into a javascript function via an onclick event of an element.
The problem I'm running into...when a user enters a carriage return into the textbox and saves it to the database, a javascript error is generated on page load.
I'm using .NET's XslCompiledTransform to do the transform. XmlDocument has a property called PreserveWhitespace (default is false); left at false, white space in the XML is stripped out. That keeps a user from entering text that breaks the page, but the client wants to preserve the formatting of the text they enter if at all possible.
From what I know, .NET XslCompiledTransform transforms a carriage return/new line pair into &#xD;&#xA;. I believe these are the characters that are breaking the JavaScript.
My first thought was to strip out the carriage returns within the xsl prior to passing the string into the javascript function, but I've not been able to figure out what characters to "search" the string for.
I guess another question is what characters get stored in SQL for carriage returns from within an ASP.NET textbox control?
Looking directly at the data in the database, SQL seems to display the character as "non-displayable" characters or (2 empty boxes).
Has anyone had any experience with this type of thing?

I was able to do this in the code behind to get my desired results:
using (StringWriter sWriter = new StringWriter())
{
    xTrans.Transform(xDoc, xslArgs, sWriter);
    return sWriter.ToString().Replace("&#xD;&#xA;", "\\r\\n");
}
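
For illustration only, the same escaping idea can be sketched in JavaScript. This is not the code used above: `escapeForJsString` is a hypothetical helper name, and the sketch assumes the value ends up inside a single- or double-quoted JavaScript string literal.

```javascript
// Hypothetical helper: escape the characters that would break a quoted
// JavaScript string literal (backslash, both quote styles, CR and LF).
function escapeForJsString(text) {
  return text
    .replace(/\\/g, "\\\\") // backslash first, so later escapes are not doubled
    .replace(/'/g, "\\'")
    .replace(/"/g, '\\"')
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
}

console.log(escapeForJsString("line one\r\nline two"));
```

When the literal is itself written into an HTML attribute such as onclick, the result still needs HTML attribute escaping on top of this.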

One other thing that I've stumbled across...
Initially, I wanted a solution that did not require a "compiled" code change, i.e. a way to do this within the XSL itself as a short-term fix.
I tried this first and was not successful...
<xsl:variable name="comment" select="normalize-space(./Comment)" />
This essentially did nothing and I was still receiving the javascript error.
Eventually, I tried this...
<div onclick="Show('{normalize-space($comment)}')"></div>
The second attempt actually worked in stripping out the white space, so the JavaScript error was avoided. It wasn't a complete solution for my requirements, because it only fixed the JavaScript error and did not preserve the formatting, but it does effectively prevent the user from "breaking" the page.
For that reason, it could suffice as a short term solution.
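
To make clear what normalize-space does to the comment text, here is a rough JavaScript approximation. It is a sketch under the assumption that only the XML whitespace characters (space, tab, CR, LF) matter; `normalizeSpace` is a hypothetical name, not part of any library.

```javascript
// Approximates XPath normalize-space(): collapse each run of XML whitespace
// (space, tab, CR, LF) into one space, then drop a leading/trailing space.
function normalizeSpace(text) {
  return text.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");
}

console.log(normalizeSpace("  first line\r\nsecond\tline  "));
```

Because the line breaks are replaced by spaces, the string can no longer end a JavaScript literal early, but the user's formatting is lost.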

Related

ReactJS - How to render carriage returns correctly when returned in Ajax call

In ReactJS, how is it possible to render carriage returns that may be submitted by the user in a textarea control. The content containing the carriage returns is retrieved by an Ajax call which calls an API that needs to convert the \r\n characters to <br> or something else. And then, I have a div element in which the content should be rendered. I tried the following Ajax responses:
{
"Comment" : "Some stuff followed by line breaks<br/><br/><br/><br/>And more stuff.",
}
and
{
"Comment" : "Some stuff followed by line breaks\n\n\nAnd more stuff.",
}
But instead of rendering the carriage returns in the browser, it renders the br tags as plain text in the first case and \n character as space in the second case.
What's the recommended approach here? I'm guessing I should steer clear of the scary dangerouslySetInnerHTML property? For example, the following would actually work, but there must be a safer way of handling carriage returns:
<div className="comment-text" dangerouslySetInnerHTML={{__html: comment.Comment}}></div>

dangerouslySetInnerHTML is what you want. The name is meant to be scary, because using it presents a risk for XSS attacks, but essentially it's just a reminder that you need to sanitize user inputs (which you should do anyway!)
To see an XSS attack in action while using dangerouslySetInnerHTML, try having a user save a comment whose text is:
Just an innocent comment.... <img src="x" onerror="alert('XSS!!!')">
You might be surprised to see that this comment will actually create the alert popup. (A plain <script> tag inserted through innerHTML is not executed, but event-handler attributes such as onerror are.) An even more malicious user might insert JS to download a virus when anyone views their comment. We obviously can't allow that.
But protecting against XSS is pretty simple. Sanitization needs to be done server side, but there are plenty of packages available that do this exact task for any conceivable serverside setup.
Here's an example of a good package for Rails, for example: https://github.com/rgrove/sanitize
Just be sure whichever sanitizer you pick uses a "whitelist" sanitization method, not a "blacklist" one.

If you're working with the DOM directly, use innerHTML to add the markup. In the React world, however, it is preferable to use https://www.npmjs.com/package/html-to-react
Also, the browser only understands HTML and won't interpret \n as a line break. You should replace it with <br/> before rendering.
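
One way to avoid building an HTML string at all is to split the text into lines and let React render each line as ordinary text. The sketch below only shows the splitting step in plain JavaScript; `splitLines` is a hypothetical helper, and the JSX in the comment is an assumption about how it might be used rather than tested component code.

```javascript
// Split user text into lines on any newline style (CRLF, CR, or LF).
function splitLines(text) {
  return text.split(/\r\n|\r|\n/);
}

// In a React component the lines could then be rendered as plain text, e.g.
//   splitLines(comment.Comment).map((line, i) =>
//     <React.Fragment key={i}>{line}<br /></React.Fragment>)
// (JSX shown as a comment only; it is not executed here.)
console.log(splitLines("Some stuff followed by line breaks\n\n\nAnd more stuff."));
```

Since every line is rendered as text, React escapes it as usual and no sanitizer is needed for this particular case. Setting the CSS property white-space: pre-wrap on the div is another option that keeps the \n characters visible without any conversion.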

Keep linebreaks when getting text from <textarea>

I'm building a site with Visual Web Developer with C# and HTML.
I have a page where users can write feedback about my site in a textarea tag and then submit (in the textarea they can do a line-break everywhere).
The problem is that when I get back the text they wrote it appears without the linebreaks, for example:
if the user wrote:
"Hello, my name is
Omer N."
When I get it back it will look like this: "Hello, my name is Omer N.".
How can I fix this problem?

It depends on how you are storing the values. Remember that HTML follows the whitespace-collapsing rule: runs of white space are condensed into a single space when the page is rendered.
So "Wide      String" = "Wide String" and:
"Multi-line
string
here" will be collapsed to "Multi-line string here", as you have experienced.
This is the default behavior.
So to keep your line breaks, spacing, etc., you need to escape the text, or encode and decode it, before storing and displaying it.
It is explained here:
Many newcomers to web development cannot get their head around why the
carriage returns they made in their data on input from a textarea, or from a
text file, Excel spreadsheet etc. do not appear when the web page renders.
The solution is fairly obvious once the newcomer realizes that a web
page is only the browser's interpretation of html markup, and that a
new line in html is represented by the <br /> tag. So what is needed
is a way to swap carriage returns or line feeds with the <br /> tag.
Well, a way to Replace() them, actually.
<%# Eval("MyMultiLineValue").ToString().Replace(<linebreak>,"<br />") %>
The string.Replace() method allows this, but we also need to identify
what we want to replace with the html tag. How is a new line
represented in C# or VB.Net?
In C#, it's "\r\n", while in VB.Net, it's vbcrlf. However, there is
also a language independent option that does just the same thing:
Environment.NewLine.
<%# Eval("MyMultiLineValue").ToString().Replace(Environment.NewLine,"<br />") %>
Hope this helps! :)
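
One caveat with Environment.NewLine is that it is "\r\n" on Windows but "\n" on Unix-like systems, while stored text may contain either. The following JavaScript sketch (not the C# from the answer) shows the same replacement handling every newline style, and HTML-escaping the text first so that user input cannot inject markup. `escapeHtml` and `textToHtml` are hypothetical names.

```javascript
// Escape HTML-special characters so user text cannot inject markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // ampersand first, so entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape first, then turn every newline style (CRLF, CR, LF) into <br />.
function textToHtml(text) {
  return escapeHtml(text).replace(/\r\n|\r|\n/g, "<br />");
}

console.log(textToHtml("Hello, my name is\r\nOmer N."));
```

The order matters: escaping after the replacement would turn the inserted <br /> tags into visible text.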

How to detect HTML in clipboard data using Qt

I have a rich text editor I'm working on where I need to parse and clean data from the clipboard when appropriate. Whenever the text being pasted contains HTML, I will clean it up and update the text field with the correct html.
However, when there is no html in the clipboard, there is no need for me to run the html cleaning tool.
My first thought was to use Regex and check for any html tag in there, but I'm not sure this is the best solution for this problem as it can cause more headaches in the long run with false positives, etc.
My question is, how can I detect some HTML in the clipboard?
Is there a an elegant way to solve this problem without having to resort to Regex?

Maybe one of these functions will help:
bool QDomDocument::setContent(...)
This function reads the XML document from the string text, returning true if the content was successfully parsed; otherwise returns false. Since text is already a Unicode string, no encoding detection is done
Addition for a clipboard's mixed data:
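// Extract the HTML portion from mixed clipboard data
QString htmlText = clipboardString.section("</html>", -2, 0, QString::SectionIncludeTrailingSep)
                       .section("<html", 1, -1, QString::SectionIncludeLeadingSep);
// Check that something was extracted and that it parses
if (!htmlText.isEmpty()) {
    QDomDocument doc; // setContent() is a member function, so an instance is needed
    if (doc.setContent(htmlText)) {
        // parsed successfully: treat the clipboard data as HTML
    }
}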
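
The extraction step above can be hard to follow, so here is the intent sketched in JavaScript rather than Qt: keep everything from the first "<html" through the last "</html>". This is an illustration under the assumption that the markers appear in lower case; `extractHtmlFragment` is a hypothetical name.

```javascript
// Return the substring from the first "<html" to the last "</html>"
// (inclusive), or "" when either marker is missing or they are out of order.
// Assumes lower-case markers; the comparison is case-sensitive.
function extractHtmlFragment(clipboardText) {
  const closeTag = "</html>";
  const start = clipboardText.indexOf("<html");
  const end = clipboardText.lastIndexOf(closeTag);
  if (start === -1 || end === -1 || end < start) return "";
  return clipboardText.slice(start, end + closeTag.length);
}

console.log(extractHtmlFragment("junk<html><body>x</body></html>tail"));
```

In Qt itself, the clipboard's QMimeData also offers hasHtml() and html(), which report whether the source application supplied an HTML flavor without inspecting the text at all.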

Creating custom SSRS handler for field with HTML

I have an SSRS 2008 report with a field that contains and is configured to render as HTML. Some of the text in this field may contain IMG tags, and the IMG tag is not among the tags SSRS natively supports within its HTML rendering extension.
I am trying to find a way to write a custom handler to hook into the processing of this field that will let me look at the raw HTML before the SSRS handler processes it, in the hopes of grabbing IMG tags, extracting the SRC URL and getting the raw bytes of an image to insert on the fly in a way SSRS will accept, yet retaining the HTML SSRS will render.
From what I've read and seen so far, if a field is marked to render as HTML, the SSRS processor grabs it and parses it entirely before any handler could modify it, meaning the IMG tag is (would be) discarded before I could do anything with it (or even know it was present). The only option I see is to turn off the HTML rendering entirely, thus losing the benefit of the tags SSRS can recognize.
EDIT: Per Jamie's response below, I'm beginning to think the "2nd half" of this issue may prove harder than I realized: Is it even possible to programmatically add an Image to an SSRS Report at runtime (obviously through code/custom assembly)? That is, I'd like to write some code that might look something like this (pseudocode)
'Conceptual Pseudocode I'd like to be able to write
'for dynamic addition of Image element in SSRS report
'Is this even possible?? Is there a documented Report
'object model??
Public Function AddImage(imageBytes() as Byte) as Image
Dim newImage as New Image()
newImage.SetBytes(imageBytes)
Report.Add(newImage)
return newImage
End Function
I'm hoping I'm just overlooking something simple that prevents me from grabbing the raw, unprocessed HTML, and someone else might be able to point me in the right direction on how to grab it.

EDIT: I have created and implemented this solution within the SSRS development environment and it works. WOOHOO :) It did require some hoop-jumping with creating a Single-Threaded Apartment thread to host the WebBrowser control, and to create a message pump, but it does work!
As I was literally typing up the message to a co-worker that this issue was a non-starter, I did have a bit of an inspiration on a way to solve this problem. I know this post hasn't generated a great deal of response, but just in case someone else finds themselves in a similar problem, I'm going to share what I've implemented in a "petri dish" scenario that, provided I get all the code permission issues resolved, should allow me a decent solution to this problem.
With SSRS's inability to handle an IMG tag proving insurmountable, I thought of an idea that takes the HTML rendering away from SSRS entirely. To do this, I created custom code that hands off the HTML rendering to a WebBrowser control, then copies the rendered result as an image. It does the following:
1. Instantiates a WebBrowser control of a given width and height.
2. Sets the DocumentText property of that control to the HTML from TinyMCE.
3. Waits for the DocumentText to completely render.
4. Creates a bitmap equal to the size of the control.
5. Uses the undocumented and presumably unsupported DrawToBitmap method of the WebBrowser to draw the rendered HTML to a bitmap.
6. Copies the Bitmap to an Image.
7. Saves the Image as a .png file.
8. Returns the path to the .png as the result of the function.
In SSRS, I plan to replace the erstwhile HTML text field with an external Image control that will then call the above method and render the image file. I may alter that to simply draw the image to the SSRS Image control directly, but that's a final detail I'll resolve later. I think this basic design is going to work. It's a little kludgy, but I think it will work.
I have some permissions issues to work out with the code that SSRS will allow me to call at runtime, but I'm confident I'll get those sorted out (even if I end up moving the code to a separate assembly). Once this is tested and working, I plan to mark this as the answer.
Thanks to those who offered suggestions.

I've done something similar with success: We had an HTML "Comment" field that was collected on a web form. For a particular report we wanted to truncate this field to the first 1000 characters or so, but preserve valid HTML.
So I created a C# .dll & class with a public function:
public static string TruncateHtml(string html, int characters)
{
...
}
(I used the HtmlAgilityPack for most of the HTML parsing, and to create and close off my new HTML string, while I kept track of the content length.)
Then I could call that code with the fully qualified path to the function in an SSRS expression:
=ReportHtmlHandler.HtmlTruncate.TruncateHtml(Fields!Comment.Value, 1000)
I could have added a calculated field to my dataset with this, but I was only using this value for one field, so I kept it at the field expression level.
All of this code gets called well before the HTML is processed or rendered by SSRS. I'm sure that any original IMG tag will be in the string.
This approach might work for you: you could create an ExtractImg function whose result is set as the source of an image on the report. I think the tricky bits for your requirement will be handling multiple images and embedding the extracted image. But you might be able to do this simply with an external reference to an image. I haven't done much with external images in SSRS.
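
As a rough illustration of what such an ExtractImg step would have to do, the following JavaScript sketch pulls the src URLs out of an HTML string. It is not SSRS or C# code; `extractImgSources` is a hypothetical name, and the regular expression only copes with quoted src attributes in simple markup. A real implementation would more likely use an HTML parser such as the HtmlAgilityPack mentioned above.

```javascript
// Collect the src attribute of every <img> tag in an HTML string.
// Regex-based, so it only handles quoted src values in simple markup.
function extractImgSources(html) {
  const pattern = /<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/gi;
  return Array.from(html.matchAll(pattern), (match) => match[2]);
}

console.log(extractImgSources('<p>Logo: <img src="logo.png"> and <IMG alt="x" src=\'photo.jpg\'></p>'));
```

Returning an array keeps the multiple-image case explicit: the caller decides whether to use only the first URL or to handle each one.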
A Microsoft support article on calling a custom dll from SSRS: http://support.microsoft.com/kb/920769

Perl AJAX stripping html characters out of string?

I have a Perl program that reads HTML tags from a text file. (I'm pretty sure this is working, because when I run the Perl program on the command line it prints out the HTML as it should.)
I then pass that "html" to the web page as the response to an Ajax request, and use innerHTML to stick that string into a div.
Here's the problem:
All the text information gets to where it needs to be, but the "<", ">" and "/" characters are getting stripped.
Does anyone know the answer to this?

The question is a bit unclear to me without some code and data examples, but if it is what it vaguely sounds like, you may need to HTML-encode your text (e.g. using HTML::Entities).
I'm kind of surprised that's an issue with inserting into innerHTML, but without a specific example, that's the first thing that comes to mind.

There could be a mod on the server that is removing special characters. Are you running Apache? (I doubt this is what's happening).
If something is being stripped on the client-side, it is most likely in the response handler portion of the AJAX call. Show your code where you stick the string in the div.
