ASP.NET - Using HAP (HTML Agility Pack) to Add Class References During Render - html

Is it possible to use the HAP (HTML Agility Pack) to:
1. Grab a collection of nodes, e.g. all <a> elements which are children of <li> elements
2. Iterate over the collection
3. Add CSS class references into the class attribute for each element, e.g. class &= "foo"
4. Update the nodes in their original position within the HTML
For point 4, I need to know:
- When I grab a collection of nodes, am I working with copies?
- If so, can I easily update the nodes in their original position within the HTML?
Finally, would it be practical to do this when rendering a page in an ASP.NET website, considering:
- I will need to modify the class references for no more than 100 elements
- I am not working with large HTML documents
- I plan to select my nodes starting at a div, e.g. div[2] where body contains 4 divs
I realise that this may seem like a bunch of separate questions, but really it is just a breakdown of the following two questions:
1. Can I easily modify the HTML output of an ASP.NET page, e.g. to insert class references?
2. Would it be practical to do this on 50 - 100 elements with regard to speed, e.g. at a cost of no more than 2 seconds?
Many thanks.

Check out my CsQuery project: https://github.com/jamietre/csquery or on nuget as "CsQuery".
This is a C# (.NET 4) port of jQuery. Its selectors are orders of magnitude faster than HTML Agility Pack's; in fact, my initial purpose in writing it was to do exactly what you want to do: manipulate HTML in real time, in my case HTML generated by CKEditor in a CMS.
To intercept HTML in WebForms with CsQuery, do this in the page code-behind:
using CsQuery;
using CsQuery.Web;
protected override void Render(HtmlTextWriter writer)
{
    // The CsQueryHttpContext object is part of the CsQuery library; it's a helper
    // that abstracts the process of intercepting base.Render() for you.
    CsQueryHttpContext csqContext =
        WebForms.CreateFromRender(Page, base.Render, writer);
    // A CQ object is like a jQuery object. The "Dom" property of the context
    // returned above represents the output of this page.
    CQ doc = csqContext.Dom;
    doc["li > a"].AddClass("foo");
    // Write it out.
    csqContext.Render();
}
There is basic documentation on GitHub, but apart from getting HTML in and out, it works pretty much like jQuery. The WebForms object above just helps you handle the interaction with the HtmlTextWriter object and the Render method. The general-purpose usage is very simple:
var doc = CQ.Create(htmlString);
// or:
// var doc = CQ.CreateFromUrl(url);
// ... do stuff with doc, a CQ object that acts like a jQuery object
string html = doc.Render();
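On the "am I working with copies?" part of the question: DOM-style libraries, HTML Agility Pack included, generally hand back references to the nodes inside the parsed tree, so changing an attribute on a selected node changes the document that is serialised afterwards. The sketch below only illustrates that idea in plain JavaScript on a hand-built tree. It uses neither HAP nor CsQuery, and el, selectLiAnchors, addClass and serialize are hypothetical helpers made up for the example.

```javascript
// Hypothetical mini tree: a node is { tag, attrs, children }; plain strings are text nodes.
function el(tag, attrs, ...children) {
  return { tag, attrs: { ...attrs }, children };
}

// Collect every <a> whose direct parent is an <li> (the "li > a" selector).
function selectLiAnchors(node, found = []) {
  for (const child of node.children) {
    if (typeof child === "string") continue;
    if (node.tag === "li" && child.tag === "a") found.push(child);
    selectLiAnchors(child, found);
  }
  return found;
}

// Append a class name unless it is already present.
function addClass(node, name) {
  const classes = (node.attrs.class || "").split(/\s+/).filter(Boolean);
  if (!classes.includes(name)) classes.push(name);
  node.attrs.class = classes.join(" ");
}

// Turn the tree back into markup (no escaping; illustration only).
function serialize(node) {
  if (typeof node === "string") return node;
  const attrs = Object.entries(node.attrs)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join("");
  return `<${node.tag}${attrs}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

const doc = el("ul", {},
  el("li", {}, el("a", { href: "/one", class: "nav" }, "One")),
  el("li", {}, el("a", { href: "/two" }, "Two")));

// The returned array holds references into `doc`, not copies...
for (const anchor of selectLiAnchors(doc)) addClass(anchor, "foo");

// ...so the change is visible when the whole document is serialised.
const html = serialize(doc);
console.log(html);
```

No separate "write the nodes back" step is needed here: the nodes never left their position in the tree.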

Don't do that! ASP.NET is not meant to be used that way. There are better ways to do this, depending on how you create the markup in which you want to change or add CSS classes. ASP.NET uses .aspx templates, which are basically HTML markup in which you can intervene with code executing on the server: you can set the CSS class statically, or use server-side script to set it on the markup.
You can also create controls in the code-behind and set the CSS class on a control if, say, an anchor control has a list item control as its parent (you will have to use server-side controls).
To do it your way you will have to write a Response Filter (example here) and, once the request has been processed, do your parsing and write the results back to the response stream. It's much easier to use common ASP.NET techniques.

Related

JSoup Select Tag Recursive Search

I recently tried to work with JSoup to parse HTML documents. I went through the tutorial on JSoup and found that the select method might be what I am looking for.
What I try to accomplish is to find all elements in a html document which possess a certain class. To test that, I tried this with the amazon web page (idea: find all deals with certain offers).
So I inspected the web page to see which classes and ids are being used, and then I tried to integrate this into a small code snippet. In this example I found the following element:
<span id="dealTitle" class="a-size-base a-color-link dealTitleTwoLine restVisible singleCellTitle autoHeight">PROCAVE Matratzen-Brücke aus Schaumstoff 25 x 200 cm für ...</span>
This element is embedded in other elements and exists multiple times (for each deal of course). So here is my code to read the deal elements:
Document doc = Jsoup.connect("https://www.amazon.de/gp/angebote/ref=gbph_ftr_s-8_cd61_page_1?gb_f_LD=dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL%252CUPCOMING,dealTypes:LIGHTNING_DEAL,page:1,sortOrder:BY_SCORE,dealsPerPage:8&pf_rd_p=425ddcb8-bed4-4e85-ac0f-c1a79d14cd61&pf_rd_s=slot-8&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=A3JWKAKR8XB7XF&pf_rd_r=BTHRY008J9N3N5CCMNEN&gb_f_second=dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL,dealTypes:COUPON_DEAL,page:8,sortOrder:BY_SCORE,dealsPerPage:8").timeout(0).get();
Elements deals = doc.select("span.a-size-base.a-color-link.dealTitleTwoLine.restVisible.singleCellTitle.autoHeight");
for (Element deal : deals) {
    if (deal.text().contains("ItemMatch")) {
        System.out.println("Found deal: " + deal.text());
    }
}
Unfortunately I can't get the element I am looking for; deals always has a size of 0. I tried modifying my select to use only some of the classes, adding the id attribute, and so on. Nevertheless, I do not get the elements (in this case they are nested inside some others). If I try an element which is above this element in the DOM hierarchy (e.g. the div with class "a-section a-spacing-none slotContainer"), it is found.
Do I actually need to specify the whole DOM hierarchy (by using ">" in my select expressions)? I expected to be able to define a selector and have JSoup traverse and search the whole DOM tree.
No, you do not have to specify the full DOM hierarchy. Your test should work if the elements are really part of the DOM. I suspect that they are not part of the DOM as it is loaded by JSoup. The reason might be that the inner DOM nodes are filled in by JavaScript through AJAX. JSoup does not run JavaScript, so dynamically loaded parts of the DOM are not accessible. To achieve what you want, you can either look into the AJAX calls directly and analyse them, or move on to another solution such as Selenium WebDriver, which drives a real browser including a working JavaScript engine.
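The first point of the answer, that a class selector is matched against every descendant and not only against direct children, can be sketched in plain JavaScript. This is not Jsoup: selectByClasses is a hypothetical helper and the tree is a hand-built stand-in that borrows a few class names from the question.

```javascript
// Hypothetical node shape: { tag, classes: [...], children: [...] }.
// Returns every descendant (at any depth) that carries ALL required classes,
// which is how a selector like "span.a.b" behaves without any ">" in it.
function selectByClasses(node, required, matches = []) {
  for (const child of node.children) {
    if (required.every((name) => child.classes.includes(name))) matches.push(child);
    selectByClasses(child, required, matches);
  }
  return matches;
}

const page = {
  tag: "body", classes: [], children: [
    { tag: "div", classes: ["a-section", "slotContainer"], children: [
      { tag: "div", classes: [], children: [
        { tag: "span", classes: ["a-size-base", "dealTitleTwoLine"], children: [] },
      ] },
    ] },
    { tag: "span", classes: ["a-size-base"], children: [] },
  ],
};

// The nested span is found without spelling out the hierarchy above it.
const deals = selectByClasses(page, ["a-size-base", "dealTitleTwoLine"]);
console.log(deals.length);
```

If such a search returns nothing for an element that is visible in the browser, the likely explanation is the one given in the answer: the element is not in the markup that was actually downloaded.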

what is the downside of using inline server tags in my html?

Let's say I want to display my "Person" class.
I can do it in two ways:
1. Put a few asp:Label controls in my HTML, and fill them in my server code:
lblName.Text = person1.Name
lblAge.Text = person1.Age
2. Use the asp:FormView control, so my server code will look like this:
Dim myDataSource = New Object() {person1}
FormView1.DataSource = myDataSource
FormView1.DataBind()
and my HTML will look like this:
<asp:FormView ID="FormView1" runat="server">
<ItemTemplate>
Name:<%# Eval("Name")%>
<br />
Age:<%# Eval("Age")%>
</ItemTemplate>
</asp:FormView>
Which way is better? What is the cost of using the server tags?
It is the same, because in both cases the controls have to be runat="server" and the values are assigned to the controls on the server side.
The main difference is whether you assign the values in the code-behind file or in the markup file, whichever you prefer. In both cases the values have to be gathered on the server side.
I don't think the cost of using server tags differs from that of assigning the values in the code file, because they are processed in the page's render phase.
Extracted from MSDN:
The default model for adding code to an ASP.NET Web page is to either create a code-behind class file (a code-behind page) or to write the page's code in a script block with the attribute runat="server" (a single-file page). The code you write typically interacts with controls on the page. For example, you can display information on the page from code by setting the Text (or other) properties of controls. Another possibility is to embed code directly into the page using an embedded code block.
Embedded Code Blocks
An embedded code block is server code that executes during the page's render phase. The code in the block can execute programming statements and call functions in the current page class.
Uses for Embedded Code Blocks
Embedded code blocks are supported in ASP.NET Web pages primarily to preserve backward compatibility with older ASP technology. In general, using embedded code blocks for complex programming logic is not a best practice, because when the code is mixed on the page with markup, it can be difficult to debug and maintain. In addition, because the code is executed only during the page's render phase, you have substantially less flexibility than with code-behind or script-block code in scoping your code to the appropriate stage of page processing.
Some uses for embedded code blocks include:
- Setting the value of a control or markup element to a value returned by a function, as illustrated in the preceding example.
- Embedding a calculation directly into the markup or control property.

Passing Razor Markup to view and process it

I've searched the internet for days now with no luck finding this.
My model has a property which holds a chunk of HTML containing Razor markup.
Example:
public class ViewModel
{
    public string Content = "<div>@Html.TextBox(\"UserName\")</div>";
}
In the view, I display that with
@Html.Raw(Server.HtmlDecode(Model.Content).ToString())
I need to be able to convert the Razor markup into HTML, but because the Content is dropped in through the model, the view engine doesn't process it.
I have tried simply dropping in the Content, using just .Raw(Model.Content), .Encode(Model.Content); nothing works.
Any thoughts?
You could use the RazorEngine package, which allows you to parse and execute Razor code. That said, I would not recommend giving your users the power to edit Razor templates directly: you would be opening a huge security hole in your website.
There are other templating engines, such as DotLiquid, which are better suited to scenarios where you don't trust user input.

Multiple views for 1 controller - Play Framework [duplicate]

If I want to have a common piece of UI across multiple pages, such as a menu, what is the recommended way to do this?
It would contain both template code and a back-end controller (similar to "snippets" in the LiftWeb framework).
I am aware that there is a menu module for Play, but I'm more interested in how this would be achieved in general.
There are two ways to include common view code in the Play Framework.
You can use the #{include} tag or the #{extends} tag.
The extends tag, as the name suggests, extends from a parent view. It is used by default in the skeleton code set up by Play when you create a new application, where it extends main.html. You add your code here.
The include tag allows you to inject a common piece of view code into your templates at a specified point. This works in much the same way as a PHP include/require or a JSP include.
The problem comes when your template code also requires data or logic from the model (via the controller). If this is the case, you will need to use the @Before or @With annotation in your controller to ensure that the common piece of controller code is executed each time. You can add any data to the renderArgs list, so that it is available for use within the view.
A simple example of using renderArgs would be:
@Before
private static void commonData() {
    // do your logic here
    renderArgs.put("menu", menu);
    renderArgs.put("selected", selectedMenuItem);
}
The values you have put into renderArgs (menu and selected in the example) will be available in just the same way as if you had passed them into the render method.

What do people mean by "DOM Manipulation" and how would I do that?

I always hear people talk about DOM this, manipulate the DOM, change the DOM, traverse the DOM; but what exactly does this mean?
What is the DOM and why would I want to do something with it?
The DOM is basically an API you use to interface with the document, and it is available as a library in many languages (JS is one of them). The browser converts all the HTML in your web page into a tree based on the nesting. Pop open Firebug and look at the HTML structure. That is the tree I'm talking about.
If you want to change any HTML you can interact with the DOM API in order to do so.
<html>
  <head><script src="file.js"></script></head>
  <body>blah</body>
</html>
In file.js I can reference the body using:
onload = function() {
    document.getElementsByTagName('body')[0].style.display = 'none';
};
getElementsByTagName is a method of the document object. I am manipulating the body element, which is a DOM element. If I wanted to traverse and find, say, a span, I could do this:
onload = function() {
    var els = document.getElementsByTagName('*');
    for ( var i = els.length; i--; ) {
        if ( els[i].nodeType == 1 && els[i].nodeName.toLowerCase() == 'span' ) {
            alert( els[i] );
        }
    }
};
I am traversing the nodeList given back by getElementsByTagName in the snippet above, and looking for a span based on the nodeName property.
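The same traversal can also be written as an explicit walk down the tree, which makes the nesting visible. The sketch below relies only on the standard DOM properties nodeType, nodeName and childNodes. So that it also runs outside a browser, it is exercised on a hand-built stand-in tree; collectByName, element and text are hypothetical helper names, not part of the DOM API.

```javascript
// Depth-first walk over anything that exposes nodeType, nodeName and childNodes.
// Returns all element nodes whose name matches, in document order.
function collectByName(root, name, found = []) {
  const wanted = name.toLowerCase();
  for (const child of Array.from(root.childNodes)) {
    if (child.nodeType === 1) {            // 1 means "element node"
      if (child.nodeName.toLowerCase() === wanted) found.push(child);
      collectByName(child, name, found);   // descend into nested elements
    }
  }
  return found;
}

// Stand-in for <body><p>hi <span>a</span></p><span>b</span></body>.
const text = (value) => ({ nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [] });
const element = (nodeName, ...childNodes) => ({ nodeType: 1, nodeName, childNodes });

const body = element("BODY",
  element("P", text("hi "), element("SPAN", text("a"))),
  element("SPAN", text("b")));

const spans = collectByName(body, "span");
console.log(spans.length);
```

In a page, the equivalent call would be collectByName(document.body, 'span'), since real DOM nodes expose the same three properties.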
It means working with the Document Object Model, which is an API for working with XML-like documents.
From w3 on the DOM:
The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page. This is an overview of DOM-related materials here at W3C and around the web.
One of the most used functions in DOM work is:
getElementById
Manipulating/changing the DOM means using this API to change the document (adding elements, removing elements, moving elements around, etc.).
Traversing the DOM means navigating it: selecting specific elements, iterating over groups of elements, etc.
In short:
When a web page is loaded, the browser creates a Document Object Model of the page, which is an object oriented representation of an HTML document, that acts as an interface between JavaScript and the document itself and allows the creation of dynamic web pages.
Source: w3schools - HTML DOM
Document Object Model
This is the DOM: an XML, HTML, or similar document. All of those terms mean parsing the document and/or making changes to it (usually with an available tool such as JavaScript or C#).
The best example of a DOM when people use those terms is the HTML document in a browser. You might want to manipulate the DOM in this case to add something to the web page.