I am building a new version of an "old" application based on Zend Framework 1.x.
The project uses Zend_Form to create versatile forms. I like this aspect of the code and I'd like to keep it.
However, I need to work cross-domain and render the HTML with JavaScript, so I'd like Zend_Form to render as JSONP.
In the past I have struggled with Zend_Form's hard-to-understand decorators, and I don't think I can use them to render JSON. So I think the only way to go is to extend Zend_Form and implement my own render method, but even so, I have the feeling that it won't be easy...
I'm so wary of Zend_Form that I'm wondering whether it would be easier to render the form as HTML and parse it on the JavaScript side into a structure sufficient for use with Handlebars (the JavaScript template engine).
Parsing HTML seems suboptimal to me, though.
Your initial idea - creating your own form class extending Zend_Form and adding a renderJsonp() method - seems a pretty reasonable approach. In the simplest case, your rendering could extract the form attributes and then iterate over the elements, extracting their attributes.
However, remember that a Zend_Form instance is more than just a list of form attributes and elements with their attributes. There are potentially display groups (typically representing html fieldsets) and validators attached to form elements. Also, even if you do not change the decorators used to render the form, they are part of the rendering. A json representation of the form that does not reflect those decorators would probably render into html differently than you experience when Zend_View is responsible for the rendering in a view-script.
While it may be possible to have that display group and validator information reflected in the structure of your JSON output, it seems unlikely that you would be able to easily automate the creation of client-side validation that perfectly parallels the validation on the server side.
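To make the shape of such output concrete, here is a minimal, language-neutral sketch written in Python rather than PHP. It does not use any Zend_Form API: the `form` dictionary, its keys (`attributes`, `elements`, `displayGroups`, `validators`) and the `render_jsonp` function are all hypothetical names chosen for illustration. A real `renderJsonp()` method would have to build this structure from the form object itself.

```python
import json
import re

# Only allow plain JavaScript identifiers as callback names, so the
# caller-supplied callback cannot inject arbitrary script into the response.
CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def render_jsonp(form, callback):
    """Wrap a JSON description of a form in a JSONP callback call."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name")
    return f"{callback}({json.dumps(form)});"


# Hypothetical structure: form attributes, elements with their attributes
# and validator names, plus display groups that reference elements by name.
form = {
    "attributes": {"action": "/signup", "method": "post"},
    "elements": [
        {
            "name": "email",
            "type": "text",
            "label": "Email",
            "required": True,
            "validators": ["EmailAddress"],
        }
    ],
    "displayGroups": [
        {"name": "account", "legend": "Account", "elements": ["email"]}
    ],
}

print(render_jsonp(form, "handleForm"))
```

Note that the validator names only tell the client which rules exist; the client still needs its own implementation of each rule.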
One approach I have seen is to have your client-side front-end make an AJAX request to a specially defined route that renders the full html of the form. While I am not a fan of this approach, it does have the advantage that the form remains essentially DRY, all the information required for rendering the form resides back on the server side but is still available to the client. The only thing missing is extracting the server-side validation for use on the client-side.
Just thinking out loud, so not a great answer, but it seemed too long for a comment.
The new HTML5 input types and attributes (e.g. tel, email, pattern) are useful for indicating to an unobtrusive JavaScript validator which rules should be applied to a given field.
In some cases though, a specific field type which isn't in the spec needs a special algorithm. For example, input type="credit-card" would cause the validator to use a modulo10 algorithm, which pattern matching can't validate.
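For reference, the modulo 10 check mentioned here is commonly known as the Luhn algorithm. The following is a minimal sketch of it in Python (the function name `luhn_valid` is just an illustrative choice); it shows why a regular expression cannot do the job, since the result depends on a running sum over all digits.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn (modulo 10) check."""
    # Tolerate the spaces and hyphens people commonly type in card numbers.
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk the digits from right to left, doubling every second one.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # a widely used test card number
```

Passing this check only means the number is well formed, not that the card exists or is usable.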
The new input types work because old browsers fall back to type="text" when they don't understand the type specified. From what I can tell, new browsers seem to follow this convention as well.
Since there is already a type attribute which is semantically correct, it seems messy and inconsistent to use a separate data-type="" attribute some of the time and the official type attribute when the needed type already exists.
I understand that some day there could be a type with the same name, but if the W3C adds an input type="credit-card" at some point in the future, what are the odds it would do something other than validate a credit card? (Or in other words, so long as the types are named well, the only conflict would be what was intended anyway).
Are there any pitfalls to simply making up new input types as I need them?
I'd recommend using a javascript framework, especially if you're starting a new project. In these frameworks you can create reusable components. For example, you can define a credit card component, implement your specific validation and styles and logic, and then reuse your new component throughout your application.
In AngularJS, a popular JavaScript framework backed by Google, it's called a directive:
https://docs.angularjs.org/guide/directive
In ember.js, it's called a component.
Pitfalls for the credit card example would be trying to invent your own regular expressions for validation or finding a bad regex online that doesn't actually work. But use of custom web components in javascript frameworks is pretty standard. You should also make sure that whatever fancy javascript libraries you use are compatible with whatever browsers and browser configurations you want to support.
I have a situation where the underlying application provides a UI layer, which in turn has to be rendered as a portlet. However, I do not want all parts of the original UI to be rendered in the portlet.
Proposed solution: since using DataPower to parse XML is the norm, I am wondering whether it is possible to parse HTML with it. I understand that HTML may not always be well formed, but if the underlying application has very few HTML pages, a contract can be enforced.
Also, if one manages to parse and extract data out of the HTML using DataPower, the result (perhaps an XML document) can be used to produce HTML5 with all its goodies.
So the question is: is it advisable to use DataPower to parse an HTML page and extract XML from it? Prerequisite: the HTML pages of each application may vary in data, but there are not many pages.
I suspect you will be unable to parse HTML using DataPower. DataPower can parse well-formed XML, but HTML - unless it is explicitly designed as xHTML - is likely to be full of tags that break well-formedness.
Many web pages are full of tags like <br> or <ul><li>Item1<li>Item2<li>Item3</ul>, all of which will cause the parsing to fail.
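The difference is easy to demonstrate outside DataPower. The sketch below uses Python's standard library purely as an illustration (it says nothing about DataPower's own parser): a strict XML parser rejects the list example above, while a lenient HTML parser still reports its tags and text. The names `parses_as_xml` and `TagCollector` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SNIPPET = "<ul><li>Item1<li>Item2<li>Item3</ul>"


def parses_as_xml(markup: str) -> bool:
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that records start tags and text chunks."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


collector = TagCollector()
collector.feed(SNIPPET)
collector.close()

print(parses_as_xml(SNIPPET))  # the unclosed <li> tags break well-formedness
print(collector.tags, collector.text)
```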
If you really want to follow your suggested approach, you'll probably need to do something on a more flexible platform such as WAS where you can build (or reuse) a parser that takes care of all of that for you.
If you think about it, this is what your web browser does - it has all the complex rules that turn badly-formed XML tags (i.e. HTML) into a valid DOM structure. It sounds like you may be better off doing manipulation at the level of the DOM rather than the HTML, as that way you can leverage existing, well-tested parsing solutions and focus on the structure of the data. You could do this client-side using JavaScript or you could look at a server-side JavaScript option such as Rhino or PhantomJS.
All of this might be doing things the hard way, though. Have you confirmed whether or not the underlying application has any APIs or web services that it uses to render the pages, allowing you to get to the data without the existing presentation layer getting in the way?
Cheers,
Chris
The question of parsing an HTML page arises when you want to do some processing on it. In that case you can run into problems, because DataPower by default will not allow hyperlinks inside a well-formed XML or HTML document (it is considered a security risk). However, this can be overcome with appropriate settings in the XML Manager.
As for HTML page parsing itself: DataPower, being an ESB layer, is expected to provide message format translation, and it does. So design-wise it is a good place to do message format translation. In practice, however, you will face the problem mentioned above when you try to parse HTML as an XML document.
The parsing can, in theory, produce any message format model you wish, so you can use XSLT to achieve what you want.
Ajitabh
In Seaside, in all those renderContentOn: methods, I can use the HTML canvas object to assemble my DOM tree.
I am currently writing a bunch of helpers for my components, because I'm using Twitter Bootstrap for the styling and don't want to write all that boilerplate code (<div>s en masse) all the time.
Given the way this is set up, the easiest way for me (I want to avoid using with: aBlock in those helpers) is to simply write out the HTML for the wrapping DIVs like this:
html html: '<div class="control-group">'.
Is there any reason for me not to do this? Any downsides?
There are various advantages in using the HTML canvas:
The HTML canvas ensures valid tags, a valid tag structure, that all tags are properly closed (at compile time), and that content is properly escaped.
The HTML canvas ensures valid attributes, that all attributes are properly closed, and that content is properly escaped.
As a consequence of the above two the HTML canvas automatically avoids the possibility of cross-site scripting (XSS) vulnerabilities.
The HTML canvas enables better reusability by enabling composition of tags (simple function calls), presenters (renderOn: in Objects), and components (renderContentOn: of components).
The HTML canvas avoids generating unnecessary whitespace.
The use of HTML canvas enables one to use the standard tools the Smalltalk IDE provides on HTML code: senders, implementors, refactoring engine (extract to method, extract to component, inline method, automatic rewrite, ...), etc.
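The closing and escaping guarantees in the list above can be illustrated with a tiny builder. This is a rough Python analogy, not Seaside's API: the `Canvas` class and its `tag`, `text` and `render` methods are invented for this sketch, and a trailing underscore in an attribute name (as in `class_`) is just a convention used here to avoid the Python keyword.

```python
from contextlib import contextmanager
from html import escape


class Canvas:
    """Toy builder: tags are always closed and text is always escaped."""

    def __init__(self):
        self.parts = []

    @contextmanager
    def tag(self, name, **attrs):
        rendered = "".join(
            f' {key.rstrip("_")}="{escape(str(value), quote=True)}"'
            for key, value in attrs.items()
        )
        self.parts.append(f"<{name}{rendered}>")
        try:
            yield self
        finally:
            # The closing tag is emitted even if the body raises.
            self.parts.append(f"</{name}>")

    def text(self, value):
        self.parts.append(escape(str(value)))

    def render(self):
        return "".join(self.parts)


canvas = Canvas()
with canvas.tag("div", class_="control-group"):
    canvas.text("<script>alert(1)</script>")

print(canvas.render())
```

With raw string output, both the closing tag and the escaping would be the caller's responsibility.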
I agree that in some rare cases it is not worth using the HTML canvas: for example, when large static chunks coming from an external source need to be embedded into a page.
I don't think there is a real downside to render static html pieces like that.
However, you might want to check out the Seaside integration of Twitter bootstrap: http://twitterbootstrap.seasidehosting.st/
To rephrase one of Lukas' arguments: it basically is not DRY. If you're using it only once, there is no problem. If you have to use this multiple times, the canvas allows you to use all the clean reuse capabilities smalltalk offers you.
I have some HTML (in this case created via TinyMCE) that I would like to add to a page. However, for security reasons, I don't want to just print everything the user has entered.
Does anyone know of a templatetag (a filter, preferably) that will allow only a safe subset of html to be rendered?
I realize that markdown and others do this. However, they also add additional markup syntax which could be confusing for my users, since they are using a rich text editor that doesn't know about markdown.
There's removetags, but it's a blacklisting approach which fails to remove tags when they don't look exactly like the well-formed tags Django expects, and of course since it doesn't attempt to remove attributes it is totally vulnerable to the 1,000 other ways of script-injection that don't involve the <script> tag. It's a trap, offering the illusion of safety whilst actually providing no real security at all.
HTML-sanitisation approaches based on regex hacking are almost inevitably a total fail. Using a real HTML parser to get an object model for the submitted content, then filtering and re-serialising in a known-good format, is generally the most reliable approach.
If your rich text editor outputs XHTML it's easy, just use minidom or etree to parse the document then walk over it removing all but known-good elements and attributes and finally convert back to safe XML. If, on the other hand, it spits out HTML, or allows the user to input raw HTML, you may need to use something like BeautifulSoup on it. See this question for some discussion.
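As a rough illustration of the parse, filter and re-serialise idea, here is a whitelist filter built on Python's standard `html.parser`. It is a sketch under stated assumptions, not a vetted sanitizer: the allowed tag, attribute and URL scheme sets are arbitrary examples, the names `WhitelistSanitizer` and `sanitize` are made up, and relative links are dropped because only explicitly allowed schemes are kept. For anything user-facing, a maintained sanitization library is the safer choice.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Example whitelists (assumptions for this sketch, adjust to your editor).
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
VOID_TAGS = {"br"}
DROP_CONTENT = {"script", "style"}  # drop these tags and their contents


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            scheme = urlparse(value.strip()).scheme.lower()
            if name == "href" and scheme not in ALLOWED_SCHEMES:
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped on output, so stray "<" cannot become markup.
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(markup: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


dirty = ('<p onclick="x()">Hi <script>alert(1)</script>'
         '<a href="javascript:alert(1)">x</a> '
         '<a href="https://example.com">ok</a></p>')
print(sanitize(dirty))
```

Unknown tags are removed but their text is kept, while event-handler attributes such as `onclick` never reach the output because they are not on the whitelist.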
Filtering HTML is a large and complicated topic, which is why many people prefer the text-with-restrictive-markup languages.
Use HTML Purifier, html5lib, or another library that is built to do HTML sanitization.
You can use removetags to specify a list of tags to be removed:
{{ data|removetags:"script" }}
In a web application, is it acceptable to use HTML in your code (non-scripted languages, Java, .NET)?
There are two major sub questions:
Should you use code to print HTML, or otherwise directly create HTML that is displayed?
Should you mix code within your HTML pages?
Generally, it's better to keep presentation (HTML) separate from logic ("back-end" code). Your code is decoupled and easier to maintain this way.
As long as your HTML-writing code is separate from your application logic, and the HTML is guaranteed to be well-formed somehow, you should be okay.
The only code that should be mixed in markup-based pages (i.e, those that contain literal HTML) is the code used for formatting the HTML (e.g., a loop for writing out a list).
There are trade-offs whether you put the code in with the HTML or you use pure code to write the HTML out using quoted string literals.
No, if you want to build good and maintainable software, and to achieve loose coupling.
If I understand the question right, you're asking whether it's a good practice to mix markup with back-end code. No. While this is commonly done, it's still a bad idea.
You should read up on the MVC paradigm, as well as on existing questions on the matter, such as What is the best way to migrate an existing messy webapp to elegant MVC? and Best practices for refactoring classic ASP?
The point is to keep the display logic separate from the rest of the code. In any complex site you'll have code mixed in with your HTML, but the code should be for display purposes only. It shouldn't be doing any complex calculations.
For example, templates will contain loops and conditionals. Plus you'll probably have a library of HTML-specific routines, like printing out an <option> list based on a list object.
Imagine you were writing an application that has two output modes: HTML and something else. How would you write it, to avoid duplicating code? That will probably point you in the right direction.
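One way to picture that thought experiment is the following Python sketch. All names (`build_report`, `render_html`, `render_text`, the `orders` structure) are invented for illustration. The calculation happens once and produces plain data; each output mode is a thin function that only formats that data.

```python
from html import escape


def build_report(orders):
    """Application logic: compute the figures, no markup involved."""
    total_cents = sum(o["qty"] * o["price_cents"] for o in orders)
    return {"customer": orders[0]["customer"], "count": len(orders),
            "total_cents": total_cents}


def render_html(report):
    """Presentation for browsers: formatting and escaping only."""
    return (f"<p>{escape(report['customer'])}: {report['count']} orders, "
            f"total {report['total_cents'] / 100:.2f}</p>")


def render_text(report):
    """Presentation for plain text output, reusing the same data."""
    return (f"{report['customer']}: {report['count']} orders, "
            f"total {report['total_cents'] / 100:.2f}")


orders = [
    {"customer": "A & B Ltd", "qty": 2, "price_cents": 500},
    {"customer": "A & B Ltd", "qty": 1, "price_cents": 250},
]
report = build_report(orders)
print(render_html(report))
print(render_text(report))
```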
The HTML that makes up the view has to get sent to the browser in some way. In .net, each server control emits its own HTML markup as part of the page lifecycle. So yes it is OK to use HTML in server side code.
Perhaps you should try following the ASP.net pattern. Create a bunch of controls that represent UI elements and make them responsible for emitting their own HTML based on their state.
It's fugly, and not type safe. But people do it without consequence. I'd prefer using a DOM or, at a minimum, classes designed to write HTML using type-safe semantics. Also, it's not all that good to mix UI with logic...
If I need methods that generate HTML I usually isolate them in an HtmlHelpers class. That way you keep some level of separation. The ASP.NET MVC Framework does this quite successfully.
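A helper class of that kind might look like the sketch below, shown in Python for brevity. It is not the ASP.NET MVC helper API; `HtmlHelpers.options` is a hypothetical method that builds an <option> list from a list of (value, label) pairs and keeps the escaping in one place.

```python
from html import escape


class HtmlHelpers:
    """Small, isolated collection of HTML-generating routines."""

    @staticmethod
    def options(items, selected=None):
        """Render (value, label) pairs as <option> elements."""
        lines = []
        for value, label in items:
            marker = " selected" if value == selected else ""
            lines.append(
                f'<option value="{escape(str(value), quote=True)}"{marker}>'
                f"{escape(str(label))}</option>"
            )
        return "\n".join(lines)


countries = [("us", "United States"), ("fr", "France & Monaco")]
print(HtmlHelpers.options(countries, selected="fr"))
```

The calling code passes data in and gets markup back, so the rest of the application never concatenates tags itself.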
If you mean printing out HTML in your code, then no. Unless you have a good reason not to, you should use templates.
Even if you think you don't need this now, there's always a good chance you'll need it later. Maybe you want to output in a different format than HTML, or you want different presentation for the same data. You usually have the need for these things further down the road, so it's best to use one from the start.
I hate when developers print() a bunch of html. It's completely unnecessary and looks ugly in any text editor that shows print/echo strings in red.
I agree with everyone else that you should try as hard as you can to separate the HTML/XHTML markup from the application logic. However, sometimes you do need to generate HTML/XHTML in the application logic for various reasons.
In these cases, what I have been trying to do is ensure that the bare minimum of presentation code is mixed in with the application logic, and migrate everything else over to the presentation code. It is worth noting that in some cases you could move everything over to the presentation layer, but it might be a bit easier to generate the markup as part of the application logic. In those cases, your best bet is likely to go the route that makes the most sense in terms of time.
I don't think there's any excuse for generating HTML inside your business logic. Don't even do it when it's just a "quick fix" or when you'll "go back and fix it later", because that never happens.
To reiterate my position from other questions, using some control logic (conditionals, loops) within HTML to construct it is OK. Do NOT do any data massaging or business logic in the HTML. You have to be disciplined, but it's worth it. Maintenance is much easier if your concerns (like logic and display) are separated.
Ideally you are aiming for a separation of concerns between your presentation (UI) code and your domain (business logic) code.
The reason why you should avoid coupling these two concerns (in either direction) is simple...
You will only have one reason to change a piece of code. Whether the change comes from structural/styling changes in your HTML design or from your business rules changing, you should only have to make it in one place.
To a lesser extent, although many purists would disagree, by sprinkling HTML code through your domain code or vice versa you are creating noise for the next developer who comes along to read/maintain it.
I try to avoid using code to print HTML "directly". It is difficult to maintain, edit, style, and so on. In some cases, such as generating an HTML email in code, I create a text file or HTML file with markers like [name], [verification code], etc. I load this file from the code and replace those markers. This way, you can edit the style of the email without recompiling your code. Separating "presentation" and "logic" is a good practice in my opinion.
Mixing code within HTML is generally not a good practice, for reasons similar to those given in #1. However, I do use code in HTML for things like simple dynamic strings that are displayed multiple times on a page or pages. I think this is better than creating multiple server controls just to set the exact same values. Since this is not code "logic" mixed into the HTML, I think this is OK.
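The marker approach for HTML email described above could be sketched as follows, in Python. The template is held in a string here so the example runs on its own; in practice it would be loaded from a separate file. The `fill_template` function and the exact marker syntax handling are assumptions made for this illustration.

```python
import re
from html import escape

# In a real application this text would live in its own file, so the
# markup can be edited without touching or redeploying the code.
TEMPLATE = (
    "<p>Hello [name],</p>\n"
    "<p>Your verification code is <strong>[verification code]</strong>.</p>"
)

MARKER_RE = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template: str, values: dict) -> str:
    """Replace every [marker] with its escaped value from `values`."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for marker [{key}]")
        return escape(str(values[key]))

    return MARKER_RE.sub(substitute, template)


print(fill_template(TEMPLATE, {"name": "Ann <admin>",
                               "verification code": "1234"}))
```

Escaping the substituted values keeps user-supplied data from being interpreted as markup in the finished email.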