Safe way to save html to database - html

I have a textarea on my page that serves as an HTML input field. The intention is to let the user register a confirmation HTML that will be shown in their users' browsers after a certain action takes place. You can imagine it as PayPal's confirmation after you pay for something, when it redirects you to a website that says "Thanks for your purchase". This is already implemented, but now I'm thinking about the users' security (XSS/SQL injection).
What I want to know is how to safely filter out certain HTML tags such as <script>, <embed> and <object> inside my controller's POST action, so that if I detect malicious markup inside the HTML, I stop execution before saving. Right now I am doing it like this:
[CustomHandleError]
[HttpPost]
[ValidateAntiForgeryToken]
[AccessDeniedAuthorize(Roles = "Admin,CreateMerchant")]
public ActionResult Create(MerchantDTO merchantModel)
{
if (ModelState.IsValid)
{
if (!IsSafeConfirmationHtml(merchantModel.ConfirmationHtml))
{
ModelState.AddModelError("ConfirmationHtml", "Unallowed HTML tags inputted");
return View("Create", merchantModel);
}
.
.
.
}
}
and my IsSafeConfirmationHtml is defined as
private bool IsSafeConfirmationHtml(string html)
{
if (html.ToLower().Contains("<script") || html.ToLower().Contains("<embed") || html.ToLower().Contains("<object"))
{
return false;
}
return true;
}
Is there a smarter, cleaner way to do this? I mean, I don't want false positives that block the words "object", "script", etc., but I also don't want to be fooled by encodings that translate "<" to "%3C" or such...
On topic: does spacing inside tags work? Example: < script > alert("1"); < / script >?

So one thing you could do to defeat the encoding attack would be to run UrlDecode and HtmlDecode (html decode is probably superfluous, but it depends on what you do with the script) on it.
Another thing to speed up your checking would be to turn to a precompiled regex.
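// requires: using System.Text.RegularExpressions;
private static readonly Regex disallowedHtml = new Regex(@"script|embed|object",
    RegexOptions.IgnoreCase);
private bool IsSafeConfirmationHtml(string html)
{
    // Note: this matches the bare words anywhere in the text, not only inside tags.
    Match match = disallowedHtml.Match(html);
    return !match.Success;
}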
The static Regex instance removes most of the regex overhead on every run but the first, which should make the match faster than running three separate Contains calls. You could make the regex complex enough to search for opening angle brackets, HTML entities and URL-encoded characters, match any whitespace between those characters and the actual tag name, and so on. The Microsoft regex documentation has gotten quite good over the years.
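To make the decode-then-match idea concrete, here is a minimal sketch in JavaScript (not the asker's C#; in ASP.NET the decoding step would use HttpUtility.UrlDecode and HttpUtility.HtmlDecode). The names decodeOnce and DISALLOWED_TAG and the five-round limit are assumptions made for illustration. It anchors the match on an opening angle bracket so the plain words "object" or "script" are not rejected, and it decodes repeatedly to catch double encoding. Treat it as one safeguard, not a complete XSS filter.

```javascript
// Hypothetical sketch: a blocklist check that decodes before matching.
// "<", optional whitespace, optional "/", optional whitespace, then a blocked tag name.
const DISALLOWED_TAG = /<\s*\/?\s*(script|embed|object)\b/i;

// Decode one layer of percent-escapes and the entity forms of "<".
// Only single-byte escapes are handled; multi-byte UTF-8 is not reassembled.
function decodeOnce(text) {
  return text
    .replace(/%([0-9a-f]{2})/gi, (match, hex) => String.fromCharCode(parseInt(hex, 16)))
    .replace(/&lt;/gi, "<")
    .replace(/&#0*60;/g, "<")
    .replace(/&#x0*3c;/gi, "<");
}

function isSafeConfirmationHtml(html) {
  let current = html;
  // Decode until the text stops changing (at most 5 rounds) to catch double encoding.
  for (let round = 0; round < 5; round += 1) {
    const next = decodeOnce(current);
    if (next === current) {
      break;
    }
    current = next;
  }
  return !DISALLOWED_TAG.test(current);
}
```

A call such as isSafeConfirmationHtml("%3Cscript%3Ealert(1)") is rejected, while text that merely mentions the word "object" passes.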
I still wouldn't say this makes you 100% safe from a user (uploader? customer? the right word depends on what your business model is) running an XSS or injection attack against visitors to your site. They could point to an image or a css file that returns as mime-type x-application, or some such. And HTML is changing pretty rapidly these days. The best way to guarantee against that is to have a human involved in an approval process as well, but humans make mistakes and computers can be fooled, and there's no law that says those two events can't happen at the same time. But you are right to put some safeguards in place.

Related

SQL Query - extracting data between html tags in a text column into new table

A web form was developed and instead of saving individual fields the whole form was saved in a text column as html. I need to extract data between various html tags so want to create a query that writes each set of tags to a table for me to then use - if this is possible please can someone advise how this can be achieved.
Thank you.
OK, I've noticed you weren't happy with my last answer, but I am still certain you need server-side code to handle SQL queries. Without any server-side code, what you want is something like "here's a TV I bought, and here's the DVD release of the "John Wick" movie, I want to watch it on this TV". Without a DVD player you can't really do it, though; that is the role of PHP or ASP.NET in this case. Since I am not familiar with PHP, I can only show a solution in ASP.NET C# which I put together some time ago.
Here's how I solved this in a site I built some time ago. It is not the cleanest, but it most certainly worked.
In ASP.NET you have the page file, which is similar to an HTML or XML file, using a lot of pointy brackets. Create one like this:
page file, body:
<asp:TextBox ID="HiddenTextBox" style="display: none;" runat="server"
onclick="OnStuff" OnTextChanged="TheUserChangedMe"
AutoPostBack="true"></asp:TextBox>
Scroll up a bit, and in the <head> section add some JavaScript that handles the text change instantly. As soon as something happens, like the user clicking on an image (or anything else you want to respond to), you need the ID of the element that was clicked.
page file, head:
<script type="text/javascript">
function MarkItems_onclick() {
var Sender = window.event.srcElement;
document.getElementById('<%= HiddenTextBox.ClientID%>').value = Sender.id;
__doPostBack('<%= HiddenTextBox.ClientID%>', 'TextChanged');
}
</script>
page's .cs file, the C# code behind
//
//add these on top:
//
using System.Configuration;
using System.Data.SqlClient;
using System.Data;
//
// later somewhere write this:
//
protected void TheUserChangedMe(object sender, EventArgs e)
{
SqlConnection conn1 = new SqlConnection(ConfigurationManager.ConnectionStrings["UserRegConnectionString"].ConnectionString);
//
Note: the connection string has to be set up earlier. Make sure you create one; Visual Studio will let you do that in no time. Do not forget that this connection string defines where your DB is located and gives the site the information it needs to reach it in the first place.
//
// somewhere you need to read out what you have in your HiddenTextbox:
//
String stringToProcess = HiddenTextBox.Text;
Process your stuff; here I assume you cut it up accordingly and end up with an insertQ variable with proper syntax. In order to add these values it should look something like this:
String insertQ = "insert into OrdersTable(OrderId, Type, Quantity) " +
    "values (@OrderId, @StuffType, @StuffQuantity)";
How to access the database in ASP.NET C#:
conn1.Open();
SqlCommand insertComm = new SqlCommand(insertQ, conn1);
// Parameter names must match the @placeholders in the query text.
insertComm.Parameters.AddWithValue("@OrderId", nonStringVariable.ToString());
insertComm.Parameters.AddWithValue("@StuffType", aStringVariable);
insertComm.Parameters.AddWithValue("@StuffQuantity", "some random text");
insertComm.ExecuteNonQuery();
conn1.Close();
that's pretty much it. Your SQL database will have 3 fields filled up every time this function runs. It's a bit messy, but for my site it was crucial to handle any onclick event, and with this you can flush out 23 checkboxes, 10 pictures and whatnot as page elements, yet you'll know what happened every time the user clicked something.
i'm not a professional either, however i think you're gonna need something on the server side to process this query, like asp.net or php. Basically the server side code would have no problem generating your page's content according to what comes back from the DB.

Meta tag not functioning on one page but works on all the others?

This is the meta tag:
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
It works on all pages, except 1.
The pages are rendered with ColdFusion includes (<cfinclude>) and are set up in a directory structure similar to MVC. The page that the tag is broken on has its own controller (but only shows one page), but that controller is identical to the others. I'm not sure what could be causing this.
The pages are also set up in a modular design, where each function of the page is imported in chunks. Each page has its own unique modules, so the problem is somewhere in one of those chunks... I just don't know what that problem could possibly be. I've been combing over the modules for the past few days and just can't find it.
What could possibly be causing this meta tag to not work? There is too much code to paste here, so I'm hoping for an answer that can point me in a direction to look for the solution. I don't believe it has anything to do with ColdFusion.
First, as drezabek stated, inspect the resulting HTML. IE is especially picky about that specific meta header. In my experience the IE=edge header must be the very first header on the page, the first item at the top of your <head>. Check your HTML: is that the case?
Second, Coldfusion is notoriously troublesome about extra whitespace. If there is whitespace above your doctype, or possibly the header in question it could cause it to malfunction. When in doubt always use output='false' on all of your functions, even your cfscript functions. In addition, even some native CF methods add whitespace. For example serialize an ORM object, SerializeJSON(EntityLoad('blah')), and you'll see it throws in some whitespace. Joy.
The answer must lie within the HTML. View the source for a page that works, and the one that doesn't, and try looking for a difference that could cause the problem. If you can't find it, maybe post the source here?
The HTML really is the key. After you have that, finding the function that is causing the bad HTML should be easy.
It seems that I was wrong. The issue does in fact stem from Coldfusion.
Since I did not create the page that was having the issue, I overlooked an important part. The person who created it was using the cfform input tags with verification. This caused ColdFusion to insert the following into the rendered page at the end of the header:
<script type="text/javascript">
<!--
_CF_checkeditUserInfo = function(_CF_this)
{
//reset on submit
_CF_error_exists = false;
_CF_error_messages = new Array();
_CF_error_fields = new Object();
_CF_FirstErrorField = null;
//display error messages and return success
if( _CF_error_exists )
{
if( _CF_error_messages.length > 0 )
{
// show alert() message
_CF_onErrorAlert(_CF_error_messages);
// set focus to first form error, if the field supports js focus().
if( _CF_this[_CF_FirstErrorField].type == "text" )
{ _CF_this[_CF_FirstErrorField].focus(); }
}
return false;
}else {
return true;
}
}
//-->
</script>
Now I'm not entirely sure what this is doing to cause the meta tag to break, but when I remove the module that causes this script to be generated, the issue is fixed. When that is removed text compare shows that the rendered headers in both pages have 0 differences.
I completely fixed my specific issue by setting the WebConfig in IIS itself.

Generating page information depending on variable selected

I am building a site with a select tag that has all 50 states. I want to be able to have a page generated for each state selected rather than having to write 50 separate pages. Any ideas on how I would accomplish this? Thank you.
You can either load all data on page load, or via ajax, and only show the data for the current state,
or if you have a server side database/data source to pull the data from you could also just have the page deliver data for a specific state, defined by a query/GET variable, the URL would look something like:
http://mysite.com/myPage.aspx?state=ca
or
http://mysite.com/myPage.php?state=az
When the page is requested you can then have your server page populate the correct data for the current state, which would then be sent over to the client.
Generally speaking I would lean toward the server-side solution, especially if your visitors will likely only visit a handful of states, since then there's no reason to load ALL states' data. On the flip side, if the data for each state is very minimal it might not make much difference either way.
EDIT
I'm not aware of any specific tutorials on the web, but since this encompasses putting a few things together I'll tell you the topics that might be useful in accomplishing this.
Depending on your level of knowledge of php, or c#, or other chosen language for your server side part, research the following topics:
Read a Get variable
Switch case statements
Add variable value into parts of html, or concatenation of html strings
Optionally, various database topics (if you don't want to hard-code data into your code)
For example, your server side php page could look something like this (untested code):
<?php
switch($_GET["state"]) {
case "ca":
$pageTitle = "California";
$pageContent = "stuff about CA";
break;
case "az":
$pageTitle = "Arizona";
$pageContent = "stuff about a really hot state";
break;
...
}
....
echo '<h1>' . $pageTitle . '</h1>' . '<p>' . $pageContent . '</p>';
?>
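The switch above grows by one case per state. As a rough alternative, the same idea can be sketched with a lookup table, shown here in JavaScript rather than PHP. The names STATES, escapeHtml and renderStatePage are made up for this example, and the two entries stand in for data that would more likely come from a database. Unlike the switch, it also covers the case where the state parameter is missing or unknown.

```javascript
// Hypothetical sketch: one template plus a lookup table instead of 50 pages.
const STATES = {
  ca: { title: "California", content: "stuff about CA" },
  az: { title: "Arizona", content: "stuff about a really hot state" },
};

// Escape text before placing it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// queryString is the part of the URL after the path, e.g. "?state=ca".
function renderStatePage(queryString) {
  const code = (new URLSearchParams(queryString).get("state") || "").toLowerCase();
  // Only accept keys that are really in the table (not inherited ones).
  const known = Object.prototype.hasOwnProperty.call(STATES, code);
  if (!known) {
    return "<h1>Unknown state</h1>";
  }
  const state = STATES[code];
  return "<h1>" + escapeHtml(state.title) + "</h1>" +
    "<p>" + escapeHtml(state.content) + "</p>";
}
```

Adding a state then means adding one entry to the table, while the rendering code stays the same.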
Along with that you'd want to handle when the visitor selects a different state, and upon selection of a state load the correct page. With jQuery you could do something with the change() method.
In your HTML body, declare your drop-down list (select input) with id="stateSelectDropDown", then in your JavaScript (untested code):
<script type="text/javascript">
$(function() {
    // Reload the page with the newly selected state in the query string.
    $("#stateSelectDropDown").change(function() {
        window.location = "nameOfThisPage.php?state=" + $(this).val();
    });
});
</script>
I apologize for any typos or bad syntax, I hope this points you in the right direction.

Model Validation without the ValidationSummary?

I have a LoginModel for my Login Action, but I'm wanting to use just HTML.
Example...
public class LoginModel
{
[Required]
public string Email { get;set; }
}
in my HTML, I have
<input type="text" value="" name="Email">
This is because I'm going to be storing my HTML in my database, problem I'm having is, how do I get model validation without using Html.ValidationSummary()?
I was hoping I could just do <div class="validation-summary-errors"></div>
As this is what is in the HTML, but does not work..
Ideas?
Regardless of where you store your HTML the validation is done on the client side. There are various posts on how to use the virtual path provider to store your views somewhere else (DB) and then validation should still work fine. I think I'm missing why it's not working for you though so I have to imagine you aren't using the path provider to find your views.
Edit
Seems you want to inject messages into a div. This won't happen automatically unless you work some magic in the path provider. Use your own helper method in the view to avoid hacks, or just use what's provided by default. If you really want to do it, render your view in your controller and search for your div pattern to replace.
custom ValidationForMessage helper removing css element
Note Darin's method
var expression = ExpressionHelper.GetExpressionText(ex);
var modelName = htmlHelper.ViewContext.ViewData.TemplateInfo.GetFullHtmlFieldName(expression);
var modelState = htmlHelper.ViewData.ModelState[modelName];
without access to ViewContext in your controller you can only render your html for your View. However, somewhere in your view you need (as far as I can tell) a helper method to stick your error collection into ViewData.
Your Virtual Path Provider may have to inject this helper method into your view text so it is there for Razor to parse. Actually - duh. This may be much easier. Your provider may be able to simply read your HTML from the database, find the div, and inject @Html.ValidationSummary() into that div. I believe this would work. Why not just put the validation summary in there, though, if it's going to end up there in the end anyway (essentially)?

Chrome Extension: DOM traversal

I want to write a Chrome extension that looks at the HTML of the page it's on, and if it finds e.g. <div id="hello"> then it will output, as an HTML list in the popup, 'This page has a friendly div', and if it finds e.g. <a href="http://bananas.com">I am married to a banana</a> then it will output 'This guy is weird.'
So in other words, searching for specific stuff in the DOM and outputting messages in the popup depending on what it finds.
I had a look at Google Chrome Extension - Accessing The DOM for accessing the dom but I'm afraid I don't really understand it. Then of course there will be traversing the dom and presumably using regex and then conditional statements.
Well that stackoverflow question asked how to let your extension talk to the DOM. There are numerous ways, one way is through chrome.tabs.executeScript, and another way is through Message Passing as I explained in that question.
Back to your question, you could use XPath to search within the DOM. It is pretty powerful. For example you said you want to search for <div id="hello">, you can do it like this:
var nodes = document.evaluate("//div[@id='hello']", document, null,
    XPathResult.ANY_TYPE, null)
var resultNode = nodes.iterateNext()
if (resultNode) {
// Found the first node. Output its contents.
alert(resultNode.innerHTML);
}
Now for your second example, same thing ..
<a href="http://bananas.com">I am married to a banana</a>
var nodes = document.evaluate("//a[@href='http://bananas.com']/text()[contains(.,'married')]",
    document, null,
    XPathResult.ANY_TYPE, null)
var resultNode = nodes.iterateNext()
if (resultNode) {
// Found the first node. Output its contents.
alert('This guy is weird');
}
Well you could use XPath which does work perfectly in Chrome, and you can make your query simple such as finding nodes that you want or even complex with detail. You can query any node, and then do post processing if you wish as well.
Hope that helped. Remember that all of this should live in a content script in the Chrome extension. If you want your extension to communicate with it, you can use Message Passing as I explained in the other post. So basically, within your popup.html, you send a request to the content script to find your text, and the content script sends back a response through its callback. To send the request, use chrome.tabs.sendRequest (newer Chrome versions replace it with chrome.tabs.sendMessage); within the content script, you listen for that request and handle it, as I explained in the other Stack Overflow question.
Do NOT use regular expressions to parse HTML. The <center> cannot hold.
With that out of the way... although you can use XPath, I think querySelector is similar in power while being somewhat simpler as well.
You simply pass a CSS selector as a string, and it returns the elements that match the selector. Kinda like using jQuery without needing to load the jQuery library.
Here's how you would use it:
var query = document.querySelector("div#hello");
if (query) {
alert("This page has a friendly div");
}
var links = document.querySelectorAll("a[href='http://bananas.com']");
for (var i = 0; i < links.length; i += 1) {
    if (links[i].textContent === "I am married to a banana") {
        alert("This guy is weird.");
        break; // stop at the first match; a bare return is not valid outside a function
    }
}
document.querySelector finds only a single element, and returns null if that element is not found.
document.querySelectorAll returns a fake-array of elements, or an empty fake-array if none are found.
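As a rough sketch of how both checks could be combined into something a content script hands to the popup, the function below collects the messages into an array instead of calling alert(). The name describePage is made up for this example. It accepts any object that exposes querySelector and querySelectorAll, which in a content script would simply be document.

```javascript
// Hypothetical helper: returns the list of messages for a page.
// root is expected to behave like document (querySelector / querySelectorAll).
function describePage(root) {
  const messages = [];

  // querySelector returns null when nothing matches.
  if (root.querySelector("div#hello")) {
    messages.push("This page has a friendly div");
  }

  // querySelectorAll returns an array-like list, empty when nothing matches.
  const links = root.querySelectorAll("a[href='http://bananas.com']");
  for (const link of links) {
    // trim() tolerates whitespace around the link text.
    if (link.textContent.trim() === "I am married to a banana") {
      messages.push("This guy is weird.");
      break; // one message is enough, however many links match
    }
  }

  return messages;
}
```

In a content script this would be called as describePage(document), and the returned array could then be sent to the popup and rendered as list items.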
...however, it sounds like you're wanting to update the browser action popup when something is detected in a webpage, correct? If so, that is possible but immensely more difficult.
Mohamed Mansour's post will get you to the point where you can communicate between content scripts and the background page/popup, but there are other bits that need to be done as well.
Unless the problem is more complex than I think, why not just use jQuery or other convenient js api for this? This is what they were made for - to traverse the dom easily. You can inject jquery and your script that will be using it into required pages in manifest:
"content_scripts": [ {
"js": [ "jquery.js", "script.js" ],
"matches": [ "http://*/*", "https://*/*" ]
}]