Trying to override Nokogiri's serializing behaviour - html

I'm using Nokogiri to alter an HTML tree and output the code. I need to alter the way a particular node outputs to html (details below), so I've subclassed Nokogiri::XML::Node.
How do I override that subclass' output behaviour?
Right now, if I override to_html(), then I get the display I want when calling to_html() for instances of Nokogiri::HTML::DocumentFragment, but when I call it on instances of Nokogiri::HTML::Document, the normal output behaviour takes over. That won't do because I actually need to make changes to the document head (which is excluded from DocumentFragment instances).
Why I need to alter the HTML output:
I need to be able to include an unpartnered </noscript> tag for the sake of using GWO with my code. However, I can't add an unpartnered end tag in an HTML tree.
With Nokogiri, I can't add it as text either because the < and > get escaped as html char codes.
I can't use Hpricot for this project because I'm running it over some bad code (written by others at work), and Hpricot won't preserve the errors in question (like putting a block element inside of an <a> element). (No, I'm not about to track down all the bad HTML and fix it.)
Specs: WinXP, Ruby 1.8.6, Nokogiri 1.4.4
Update:
For a reason I can't guess, when I create a constructor for my subclass, regardless of how many parameters I require for the subclass constructor, I get errors if I supply any number but two (the number of params required for the superclass).
class NoScript < Nokogiri::XML::Node
  def initialize(doc)
    super("string", doc)
  end
end
I haven't had this problem with other classes. Am I missing something?

Most likely, your code calls write_to at some point (to_html calls serialize, and serialize calls write_to). write_to then calls native_write_to on the current node. Let's take a look at it:
static VALUE native_write_to(
    VALUE self,
    VALUE io,
    VALUE encoding,
    VALUE indent_string,
    VALUE options
) {
  xmlNodePtr node;
  const char * before_indent;
  xmlSaveCtxtPtr savectx;

  Data_Get_Struct(self, xmlNode, node);

  xmlIndentTreeOutput = 1;
  before_indent = xmlTreeIndentString;
  xmlTreeIndentString = StringValuePtr(indent_string);

  savectx = xmlSaveToIO(
      (xmlOutputWriteCallback)io_write_callback,
      (xmlOutputCloseCallback)io_close_callback,
      (void *)io,
      RTEST(encoding) ? StringValuePtr(encoding) : NULL,
      (int)NUM2INT(options)
  );

  xmlSaveTree(savectx, node);
  xmlSaveClose(savectx);

  xmlTreeIndentString = before_indent;
  return io;
}
The code is on GitHub. If you read it, you will see that it does not call your to_html anywhere: xmlSaveTree serializes the whole subtree in C, so your custom method is never run. On the other hand, if you use a Nokogiri::HTML::DocumentFragment, your method is called, because DocumentFragment#to_html relies on Nokogiri::XML::NodeSet#to_html, which is a plain map:
def to_html *args
  if Nokogiri.jruby?
    options = args.first.is_a?(Hash) ? args.shift : {}
    if !options[:save_with]
      options[:save_with] = Node::SaveOptions::NO_DECLARATION | Node::SaveOptions::NO_EMPTY_TAGS | Node::SaveOptions::AS_HTML
    end
    args.insert(0, options)
  end
  map { |x| x.to_html(*args) }.join
end


Why does tWriteJSONField create an array for null values?

I have joined Countries to Locations in the HR Sample database in OracleXE.
I am then using a tMap to generate a nested JSON document.
It works, but for some reason null values in the Location are coming through as arrays in the final output in the console (I have also tried MongoDB).
This happens because tWriteJSONField generates XML and then converts it to JSON using json-lib. Your null value is converted to an empty XML node <STATE_PROVINCE/>, and json-lib, having no context for this node, assumes it is a parent node with no children rather than an empty text node (the notion of null is long gone by this point).
Here is what happens in short:
package test.json;

public class JSONTest {
    public static void main(String[] args) {
        net.sf.json.xml.XMLSerializer s = new net.sf.json.xml.XMLSerializer();
        s.clearNamespaces();
        s.setSkipNamespaces(true);
        s.setForceTopLevelObject(true);
        net.sf.json.JSON json = s.read("<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?>" +
            "<org>" +
            "<STATE_PROVINCE/>" +
            "</org>"
        );
        System.out.println(json.toString());
    }
}
Result:
{"org":{"STATE_PROVINCE":[]}}
A dirty solution is to use attributes instead of nodes in your tWriteJSONField, but this prefixes your property names with @. So after this component, add a tReplace: search for "\"@", replace with "\"", uncheck whole word, check global expression. Your final JSON will then simply omit any property whose value is null.
Thanks to https://www.talendforge.org/forum/viewtopic.php?id=27791
Insert a tJavaRow with the following code right after your tWriteJsonField:
output_row.output = input_row.output.replaceAll(",?\"[a-zA-Z_0-9]*\":\\[\\]", "");
I believe the elegant (and short) solution would be the following.
Talend docs state that:
When configuring a JSON tree, the default type of an element is
string. If an element is not of type string, you need to add an
attribute for the element to set its type.
So, to the object receiving a null value, you should add an attribute named class and set its static value to object.
Pic: JSON Tree Configuration
And voilà!
PIC: "Complemento":null

Exclude properties from rendering for all Grails domain classes

The Grails 2.5.4 docs say that it's possible to exclude properties from rendering for an entire group of domain classes.
There are some default configured renderers and the ability to register or override renderers for a given domain class or even for a collection of domain classes.
However there's no example given in the docs for how to do this. Does anyone know how to exclude properties for all of my domain classes? Specifically I'm trying to get rid of the class and enumType fields that Grails automatically adds to the response body.
There doesn't seem to be any good way to do this. What I discovered is that if you register an exclusion for a super class, all subclasses also "inherit" that exclusion. So to get rid of four properties for all Groovy objects (which cover all domain classes), I added the following bean to resources.groovy.
groovyObjectJsonRenderer(JsonRenderer, GroovyObject) {
    excludes = ['class', 'declaringClass', 'errors', 'version']
}
I don't know if this is what you are asking about, but you can exclude properties when rendering as JSON by overriding the marshaller. Here is the code:
static {
    grails.converters.JSON.registerObjectMarshaller(NAMEOFYOURCLASS) {
        return it.properties.findAll {k,v -> k != 'class' && k!='declaringClass'}
    }
}
Or, if you want to create your own custom rendering, you can do something like this:
static {
    grails.converters.JSON.registerObjectMarshaller(NAMEOFYOURCLASS) {
        def lista = [:]
        lista['id'] = it.id
        lista['name'] = it.name
        lista['dateCreated'] = it.date?.format("dd/MM/yyyy HH:mm")
        return lista
    }
}
You can put it wherever you think is best. I prefer to put it in the class I'm overriding, because later I can find it easily, and so can anyone else who is reading the code.

How to build html using HTML helpers in MVC3

I have a helper like this, which I built using raw HTML strings:
private static readonly Core Db = new Core();

// Main menu
public static MvcHtmlString MainMenu()
{
    IQueryable<Page> primaryPages = Db.Pages.Where(p => p.IsItShowInMenu);
    var sb = new StringBuilder();
    sb.Clear();
    string pagecode = Convert.ToString(HttpContext.Current.Request.RequestContext.RouteData.Values["url"]);
    sb.Append("<div id=\"Logo\">");
    sb.Append("<span id=\"Logo_Text\">Dr. Shreekumar</span> <span id=\"Logo_Sub_Text\">Obstetrician & Gynecologist</span>");
    sb.Append("</div>");
    sb.Append("<div id=\"Primary_Menu\">");
    sb.Append("<ul>");
    foreach (Page page in primaryPages)
    {
        if (page.PageCode != "Home")
        {
            Page currentPage = Db.Pages.SingleOrDefault(p => p.PageCode == pagecode);
            if (currentPage != null)
            {
                Page parentPage = Db.Pages.Find(currentPage.ParentId);
                if (parentPage != null)
                {
                    sb.AppendFormat((page.PageCode == parentPage.PageCode ||
                                     page.PageCode == currentPage.PageCode)
                                        ? "<li class=\"active\">{1}</li>"
                                        : "<li>{1}</li>", page.PageCode,
                                    page.Name.Trim());
                }
                else
                {
                    sb.AppendFormat("<li>{1}</li>", page.PageCode,page.Name);
                }
            }
            else
            {
                sb.AppendFormat("<li>{1}</li>", page.PageCode, page.Name);
            }
        }
    }
    sb.Append("</ul>");
    sb.Append("</div>");
    return new MvcHtmlString(sb.ToString());
}
Can anybody suggest how I can convert this to use MVC HTML helpers (helpers for anchors, list items (li), divs, etc.)?
It is an important part of your role as the architect of your application to define what will be generated by helpers and what not, as it depends on what is repeated where and how often in your code. I am not going to tell you what to build helpers for because that depends on the architecture of your whole application. To help you make the decision, however, consider the two general types of helpers you can build: global and local.
Global helpers are for chunks of code which are often repeated across your site, possibly with a few minor changes that can be handled by passing in different parameters. Local helpers do the same job, but are local to a given page. A page which has a repeating segment of code that isn't really found anywhere else should implement a local helper. Now then...
Global helpers: Create a new static class to contain your helpers. Then, create static methods inside the container class that look like this:
public static MvcHtmlString MyHelper(this HtmlHelper helper, (the rest of your arguments here))
{
    // Create your HTML string.
    return MvcHtmlString.Create(your string);
}
What this does is create an extension method on the Html helper class which will allow you to access your helpers with the standard Html. syntax. Note that you will have to include the namespace of this class in any files where you want to use your custom helpers.
Local helpers: The other way to do helpers works when you want them to be local to a single view. Perhaps you have a block of code in a view that is being repeated over and over again. You can use the following syntax:
@helper MyHelper()
{
    // Create a string
    @MvcHtmlString.Create(your string here);
}
You can then output this onto your page using:
@MyHelper()
The reason we always create MvcHtmlString objects is that, as a security feature built into MVC, output strings are HTML-encoded so that they appear on the page exactly as written. That means a < is encoded as &lt;, so you actually see a "<" on the page. By default it won't start an HTML tag.
To get around this, we use the MvcHtmlString class, which bypasses this security feature and allows us to output HTML directly to the page.
I suggest you move all this logic into a separate Section as it is a Menu that is being rendered.
Instead of building the HTML from the code, it is cleaner and a lot more convenient to build it using Razor's helpers. Refer to this as well as this article from Scott Gu on how to render sections to get a quick starting guide.
Consider using helper methods such as
@Html.DropDownListFor() or
@Html.DropDownList()

how to extract from dispatch.json.JsObject

What do I need to do to extract the value for friends_count? I noticed that screen_name is already defined in the Status object and case class. Do I still need to extend Js or JsObject with something different, like this:
object TweetDetails extends Js { val friends_count = 'friends_count ? num }
and then pattern match it against each json object in the list of JsObjects as represented below. The symbols are confusing:
scala> val friends_count = 'friends_count ! num // I wish SO understood Scala's symbols
val twtJsonList = http(Status("username").timeline)
twtJsonList foreach {
  js =>
    val Status.user.screen_name(screen_name) = js
    val Status.text(text) = js
    val friends_counts(friends_count) = js //i cannot figure out how to extract this
    println(friends_count)
    println(screen_name)
    println(text)
}
Normally, Scala symbols can be thought of as unique identifiers which will always be the same. Every symbol that is lexicographically identical refers to the exact same memory space. There's nothing else that's special about them from Scala's point of view.
However, Dispatch-Json pimps out symbols making them JSON property extractors. To see the code which is responsible for the pimping, check out the SymOp class and the rest of the JsonExtractor.scala code.
Let's write some code which solves the problem you are looking at and then analyze what's going on:
trait ExtUserProps extends UserProps with Js {
  val friends_count = 'friends_count ! num
}
object ExtUser extends ExtUserProps with Js

val good_stuff = for {
  item <- http(Status("username").timeline)
  msg = Status.text(item)
  user = Status.user(item)
  screen_name = ExtUser.screen_name(user)
  friend_count = ExtUser.friends_count(user)
} yield (screen_name, msg, friend_count)
The first thing that we're doing is extending the UserProps trait in the Dispatch-Twitter module to give it a friends_count extractor, and then defining an ExtUser object which we can use to get access to that extractor. Because ExtUserProps extends UserProps, which also extends Js, we get the method sym_add_operators in scope, which turns our symbol 'friends_count into a SymOp case class. We then call the ! method on that SymOp and pass it the Extractor num, which creates an Extractor that looks for a property "friends_count" on a JSON object and parses it as a number before returning. Quite a bit going on there for such a small bit of code.
The next part of the program is just a for-comprehension that calls out to the Twitter timeline for a user and parses it into JsObjects which represent each status item. Then we apply the Status.text extractor to pull out the status message, and do the same to pull out the user. We then pull the screen_name and friend_count out of the user JsObject, and finally we yield a Tuple3 back with all of the properties we were looking for. We're then left with a List[Tuple3[String,String,BigDecimal]] which you could then iterate over to print out or do whatever with.
I hope that clears some things up. The Dispatch library is very expressive but can be a little tough to wrap your head around, as it uses a lot of Scala tricks which someone just learning Scala won't get right away. But keep plugging away and playing with it, as well as looking at the tests and source code, and you'll see how to create powerful DSLs using Scala.

How do you return non-copyable types?

I am trying to understand how you return non-primitives (i.e. types that do not implement Copy). If you return something like an i32, then the function creates a new value in memory with a copy of the return value, so it can be used outside the scope of the function. But if you return a type that doesn't implement Copy, it does not do this, and you get ownership errors.
I have tried using Box to create values on the heap so that the caller can take ownership of the return value, but this doesn't seem to work either.
Perhaps I am approaching this in the wrong manner by using the same coding style that I use in C# or other languages, where functions return values, rather than passing in an object reference as a parameter and mutating it, so that you can easily indicate ownership in Rust.
The following code example fails to compile. I believe the issue is only within the iterator closure, but I have included the entire function just in case I am not seeing something.
pub fn get_files(path: &Path) -> Vec<&Path> {
    let contents = fs::walk_dir(path);
    match contents {
        Ok(c) => c.filter_map(|i| { match i {
                Ok(d) => {
                    let val = d.path();
                    let p = val.as_path();
                    Some(p)
                },
                Err(_) => None } })
            .collect(),
        Err(e) => panic!("An error occurred getting files from {:?}: {}", path, e)
    }
}
The compiler gives the following error (I have removed all the line numbers and extraneous text):
error: `val` does not live long enough
let p = val.as_path();
^~~
in expansion of closure expansion
expansion site
reference must be valid for the anonymous lifetime #1 defined on the block...
...but borrowed value is only valid for the block suffix following statement
let val = d.path();
let p = val.as_path();
Some(p)
},
You return a value by... well returning it. However, your signature shows that you are trying to return a reference to a value. You can't do that when the object will be dropped at the end of the block because the reference would become invalid.
In your case, I'd probably write something like
#![feature(fs_walk)]

use std::fs;
use std::path::{Path, PathBuf};

fn get_files(path: &Path) -> Vec<PathBuf> {
    let contents = fs::walk_dir(path).unwrap();
    contents.filter_map(|i| {
        i.ok().map(|p| p.path())
    }).collect()
}

fn main() {
    for f in get_files(Path::new("/etc")) {
        println!("{:?}", f);
    }
}
The main thing is that the function returns a Vec<PathBuf>: a collection of a type that owns the path data, rather than just references into someone else's memory.
In your code, you do let p = val.as_path(). Here, val is a PathBuf. Then you call as_path, which is defined as: fn as_path(&self) -> &Path. This means that given a reference to a PathBuf, you can get a reference to a Path that will live as long as the PathBuf will. However, you are trying to keep that reference around longer than val will exist, as val will be dropped at the end of the iteration.
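To make the borrow relationship concrete, here is a minimal sketch. The function name owned_config_path and the file name config.toml are made up for illustration and are not part of the question's code. It shows that the &Path from as_path is tied to the local PathBuf, and that handing an owned PathBuf back to the caller avoids the problem.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper (not from the question): builds a path locally and
// returns it as an owned PathBuf, so no borrow escapes the function.
fn owned_config_path(dir: &str) -> PathBuf {
    // `val` is an owned PathBuf that lives only until the end of this function.
    let val = PathBuf::from(dir).join("config.toml");

    // `p` borrows from `val`; it cannot outlive `val`.
    let p: &Path = val.as_path();

    // Returning `p` itself would be rejected by the compiler, because `val`
    // is dropped when the function returns. Instead, copy the path data into
    // a new owned PathBuf and return that.
    p.to_path_buf()
}

fn main() {
    let path = owned_config_path("/etc");
    println!("{}", path.display());
}
```

In this tiny example you could simply return val and skip the borrow altogether; the detour through as_path is only there to mirror the shape of the code in the question.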
How do you return non-copyable types?
By value.
fn make() -> String { "Hello, World!".into() }
There is a disconnect between the language semantics and the implementation details.
Semantically, returning by value is moving the object, not copying it. In Rust, any object is movable and, optionally, may also be Clonable (implement Clone) and Copyable (implement Clone and Copy).
That the implementation of copying or moving uses a memcpy under the hood is a detail that does not affect the semantics, only performance. Furthermore, this being an implementation detail means that it can be optimized away without affecting the semantics, which the optimizer will try very hard to do.
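As a rough illustration of the semantic difference (make_number and make_names are hypothetical names chosen for this sketch), a Copy type remains usable after being assigned elsewhere, while a non-Copy type is moved and the old binding can no longer be used:

```rust
// Hypothetical functions for illustration only.

// i32 implements Copy, so returning it hands the caller a copy of the value.
fn make_number() -> i32 {
    42
}

// Vec<String> does not implement Copy, so returning it moves ownership
// of the vector (and its heap allocation) to the caller.
fn make_names() -> Vec<String> {
    let names = vec![String::from("a"), String::from("b")];
    names // moved out of the function, not copied
}

fn main() {
    let n = make_number();
    let m = n; // copy: `n` is still usable afterwards

    let names = make_names();
    let moved = names; // move: `names` is no longer usable after this line
    // println!("{:?}", names); // would not compile: use of moved value

    // An explicit deep copy is available through Clone when you need one.
    let cloned = moved.clone();

    println!("{} {} {:?} {:?}", n, m, moved, cloned);
}
```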
As for your particular code, you have a lifetime issue. You cannot return a reference to a value if said reference may outlive the value (for then, what would it reference?).
The simple fix is to return the value itself: Vec<PathBuf>. As mentioned, it will move the paths, not copy them.
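The other answer's version relies on fs::walk_dir, which sits behind the fs_walk feature gate. If that API is not available on your toolchain, one possible alternative is to recurse over std::fs::read_dir yourself. The following is only a sketch under that assumption: it reuses the get_files name from the question but changes the return type to io::Result so that errors are propagated instead of causing a panic.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Sketch: recursively collect every non-directory entry under `dir`
// as an owned PathBuf. Errors are returned to the caller via io::Result.
fn get_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // DirEntry::path returns an owned PathBuf, so it can be stored
        // in the result without borrowing from `entry`.
        let path = entry.path();
        // file_type does not follow symlinks, which avoids looping
        // forever on a symlink that points back up the tree.
        if entry.file_type()?.is_dir() {
            files.extend(get_files(&path)?);
        } else {
            files.push(path);
        }
    }
    Ok(files)
}

fn main() {
    match get_files(Path::new(".")) {
        Ok(files) => {
            for f in &files {
                println!("{:?}", f);
            }
        }
        Err(e) => eprintln!("An error occurred getting files: {}", e),
    }
}
```

The Vec<PathBuf> is moved out to the caller, so nothing in the result refers to values that were local to the function.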