Is there such a thing as a JSP minifier? (or Open Source HTML minifier) - html

This would be an HTML minifier that skips everything between <% and %>.
Actually, an Open Source HTML minifier would be a good starting place, especially if it already had code to preserve the contents of certain blocks like <textarea>. Its code might be adaptable to preserve <% %> blocks as well.
I am aware that HTML minifiers are less common because HTML changes more often than JS/CSS and is often dynamically generated, but if the JSP compiler could be made to minify before making its compiled cache copy, the result would be minified HTML.
Also, an ASP minifier would probably be very close to the same thing. And I don't care about custom tags that have meaning to the server. The only stuff that matters to the server (for my company) is in the <%%> blocks.
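The idea described above (minify the markup but skip everything between <% and %>) could be sketched roughly as follows. This is not an existing tool: the function names and the list of protected elements are assumptions made for illustration, and a scriptlet that contains the characters %> inside a Java string would confuse this simple pattern.

```javascript
// Sketch: collapse whitespace in JSP/ASP source while copying protected
// blocks verbatim. minifyPreserving and collapse are hypothetical names.
function minifyPreserving(source) {
  // Scriptlet blocks (<% ... %>) and whitespace-sensitive elements.
  const protectedBlock =
    /<%[\s\S]*?%>|<(textarea|pre|script|style)\b[\s\S]*?<\/\1\s*>/gi;

  let result = '';
  let last = 0;
  let match;
  while ((match = protectedBlock.exec(source)) !== null) {
    result += collapse(source.slice(last, match.index)); // plain markup
    result += match[0];                                   // protected block
    last = match.index + match[0].length;
  }
  return result + collapse(source.slice(last));
}

// Collapse every run of whitespace to a single space.
function collapse(markup) {
  return markup.replace(/\s+/g, ' ');
}
```

Collapsing runs of whitespace to one space, rather than deleting them, is the cautious choice here because whitespace between inline elements can be significant for rendering.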

This question is a bit outdated, but an answer with a resource still hasn't made its way to the posting.
HtmlCompressor makes this possible, and quite simply.
You can use it via Java API:
String html = getHtml(); //your external method to get html from memory, file, url etc.
HtmlCompressor compressor = new HtmlCompressor();
String compressedHtml = compressor.compress(html);
Or you can use it via Taglib:
Download .jar file of the current release and put it into your lib/ directory
Add the following taglib directive to your JSP pages:
<%@ taglib uri="http://htmlcompressor.googlecode.com/taglib/compressor" prefix="compress" %>
Please note that JSP 2.0 or above is required.
In JSP:
<compress:html removeIntertagSpaces="true">
<!DOCTYPE html>
...
</html>
</compress:html>
Cheers

JSP is transformed to Java code and subsequently compiled to bytecode. Minifying the JSP source therefore serves no purpose.
You can process the output generated by a JSP page by writing a custom filter. I have written a filter that trims empty lines and unnecessary whitespace from JSP output; unfortunately it's not public. But if you google around, I'm sure you can find servlet filters that remove unneeded content from generated HTML.

Have a look at the Trim Filter (http://www.servletsuite.com/servlets/trimflt.htm), which you can simply map in your web.xml.
It will help you to remove whitespace, and can also strip off comments.
In my experience, whitespace occurs a lot in JSPs if you use tags that don't produce any output themselves, such as the JSTL core control tags (c:if, c:choose, ...), and then this comes in very handy.
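The kind of clean-up such a filter performs on generated output can be illustrated with a small sketch. This is not the Trim Filter's actual code; trimOutput and its stripComments option are hypothetical names, and the logic is shown in JavaScript only for brevity (a servlet filter would apply the same steps to the buffered response). The naive comment pattern also removes conditional comments and does not protect <pre> or <script> content, so treat it as a starting point only.

```javascript
// Sketch of output clean-up: drop blank lines and trailing whitespace,
// optionally strip HTML comments. Names are hypothetical.
function trimOutput(html, options = {}) {
  let text = html;
  if (options.stripComments) {
    // Naive: removes every <!-- ... --> block, including conditional comments.
    text = text.replace(/<!--[\s\S]*?-->/g, '');
  }
  return text
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+$/, '')) // drop trailing whitespace
    .filter((line) => line !== '')            // drop lines left empty
    .join('\n');
}
```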

As you are already aware, HTML minification is less common, and it sometimes causes errors rather than delivering any benefit. HTML is also often dynamically generated content.
On the other hand, there are many better ways to speed up the application front end.
Minimizing HTTP requests
Minifying JS, CSS contents
gzip/deflate contents
Leveraging browser cache
Server Side caching, until resource changes
And many other - http://developer.yahoo.com/performance/rules.html
WebUtilities is a small Java library that helps speed up the front end of J2EE web apps. The link is below.
http://code.google.com/p/webutilities/
The new 0.0.4 version performs many optimizations and results in a significant performance boost. Please have a look in case you find it useful.

Related

Move 2sxc <script> to external file when it has razor content

I'm trying to make my CSP without unsafe inline.
Since I have to manually check every file from every app, I may as well move the scripts to external files instead of creating a million-word CSP entry in the web.config by adding hashes or nonces.
This seems easy enough for client-side content, but many templates have Razor code in them, such as:
<script>
alert(@myVar);
</script>
How can I move this to external?
So in general, if your JS needs some input parameters, you must of course put them somewhere, and only the Razor code will know what they are.
The simplest way is still to just have the initial call use the variables, as in your example above. If you have security concerns, doing type-checking in Razor should eliminate them.
For example, if you write @((int)thing.property), then it simply cannot inject any unexpected payload.
If for some reason you really, really don't want this, you can use an attribute-JSON convention, like
<div class="myGallery" init='{"files": 17}'> gallery contents </div>
and pick it up from the JS you created. But this is quite a bit of work, so I would recommend the simpler way.
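For the attribute-JSON convention, the external script might look roughly like the sketch below. The helper name readInit and the file name gallery.js are assumptions made for illustration; only the init attribute and the myGallery class come from the example above. The script contains no Razor code, so it can be served as a static file, and it should be loaded after the markup (for instance with the defer attribute).

```javascript
// gallery.js: external script without any Razor code.
// readInit is a hypothetical helper name.
function readInit(element) {
  const raw = element.getAttribute('init');
  if (!raw) return {};        // attribute missing or empty
  try {
    return JSON.parse(raw);   // e.g. '{"files": 17}' becomes { files: 17 }
  } catch (err) {
    return {};                // malformed JSON: fall back to defaults
  }
}

// In the browser, configure every gallery found on the page.
if (typeof document !== 'undefined') {
  document.querySelectorAll('.myGallery').forEach((el) => {
    const settings = readInit(el);
    console.log('gallery files:', settings.files);
  });
}
```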

Split html source into multiple files

Does HTML support splitting source over multiple files? I'm looking for some equivalent of C++'s #include; or maybe something like C#'s partial; an element that could take source path and inject the file contents at that place.
Apologies if this has been asked before. Google and SO searches didn't return much. I'm not a web guy, but the only solution I found was using an iframe, which many people do not like for various reasons.
It is just that my html source is becoming huge and I want to manage it by splitting into multiple files.
You can't, at least not in flat HTML. What you can do is use JavaScript to load and place the snippets. iframes are also non-ideal because, contrary to what happens with directives like #include and partial, those snippets will never be compiled into one single page.
However, I think it's important here to understand how your pages will be served. Is this a static website? Because in this case I would write a simple script in your language of choice to compile the page. Let's say that you have a base like this:
<html>
<head>
<!-- ... -->
</head>
<body>
{{ parts/navigation.html }}
<!-- ... -->
</body>
</html>
You could write a script that runs through this file line by line and loads the content into a variable named, for example, compiled_html. When it finds {{ file }}, it opens file, reads its content and appends it to compiled_html. When it gets to the end, it writes the content of the variable into an HTML file. How you would implement it depends on the languages you know. I'm sure that it's pretty straightforward to do it in C#.
This way you'll be able to split the source of your HTML pages into multiple files (and reuse some parts if you need them), but you'll still end up with fully functional single files.
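The script described above could be sketched as follows, here in JavaScript. compileTemplate and readPart are hypothetical names; readPart stands for whatever function returns the text of a snippet, so the sketch stays independent of how files are read.

```javascript
// Sketch of the build step: replace each {{ path }} line with the
// content of the named snippet. Names are hypothetical.
function compileTemplate(baseHtml, readPart) {
  // A line that holds nothing but a {{ path }} placeholder.
  const placeholder = /^\s*\{\{\s*(.+?)\s*\}\}\s*$/;
  return baseHtml
    .split('\n')
    .map((line) => {
      const match = placeholder.exec(line);
      // Placeholder lines become the snippet; other lines are copied as-is.
      return match ? readPart(match[1]).trimEnd() : line;
    })
    .join('\n');
}
```

Under Node.js, readPart could simply wrap fs.readFileSync(path, 'utf8'), and the returned string would then be written to the final HTML file with fs.writeFileSync.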
It is easily possible, if you are running PHP:
The PHP language has the "include" statement built in.
Therefore you can have your "index.php" (note that you have to change the suffix for the PHP parser to kick in) and simply use the following syntax:
<html>
<head>
[...] (header content you want to set or use)
</head>
<body>
<?php
include "relative/path/to/your/firstfile.html";
include "relative/path/to/your/secondfile.html";
include "relative/path/to/your/evenwithothersuffix/thirdfile.php";
include "relative/path/to/your/fourth/file/in/another/folder.html";
?>
[...] (other source code you wish to use in the HTML body part)
</body>
</html>
This basically makes your main index.php file a container file and the included HTML files the components, which you can maintain separately.
For further reading I recommend the PHP Manual and the W3Schools Include Page.
Not possible with static HTML.
In general, this problem (lazy-fetching of content) is solved with a template processor.
Two options:
The template processor runs on the server side (any language): static website generators, server-side rendering.
The template processor runs on the client side (JavaScript): web frameworks.

How to store strings for html page in separate file?

First time making a webpage in html. I have an assignment to format a bunch of text using appropriate html tags. No problem. But I would like to clean up my code by storing the paragraphs in a separate file. I have been searching for hours and cannot find anything.
Bottom line what I want to do:
have a file: strings.{html/xml/php/js}
and access variables from that file in my page index.html doing something like this:
<p>$someVarName</p>
This seems like a bit of a strange 'optimization', one that is not usually made, at least as far as I understand the question.
What you can do is have a JavaScript file e.g. script.js, and reference it in your index.html file:
<script src="script.js"></script>
In script.js you can insert custom HTML as such:
document.getElementById('tag-id').innerHTML = '<p>some text</p>';
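Closer to what the question asks for (named strings kept in a separate file), one possible sketch keeps a lookup table in the script and fills every element that names a key. The data-string attribute, the keys and the function name applyStrings are assumptions made for this example; in index.html a paragraph would then be written as <p data-string="intro"></p>.

```javascript
// strings.js: all paragraph text lives in one table.
// Keys, attribute name and function name are hypothetical.
const strings = {
  intro: 'Welcome to my first page.',
  about: 'This paragraph is stored outside index.html.',
};

// Fill each element with the text named by its data-string attribute.
function applyStrings(elements, table) {
  for (const el of elements) {
    const key = el.dataset.string;
    if (Object.prototype.hasOwnProperty.call(table, key)) {
      el.textContent = table[key]; // textContent inserts plain text only
    }
  }
}

// In the browser, run once the document has been parsed.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', () => {
    applyStrings(document.querySelectorAll('[data-string]'), strings);
  });
}
```

Because textContent is used, any tags inside a stored string would be shown literally rather than rendered, so the formatting tags themselves stay in index.html.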
To reduce the page load time of a website in the browser, one usually tries to deliver one HTML file per page and one compact CSS/JS/image/SVG file for the whole website. All files are usually aggregated on the server side from multiple resources, as you would like to do.
Here are some common ways to enrich HTML pages and their creation process:
Using an iframe, you can let the browser import and display another page using a single HTML tag, but this is not recommended because it complicates layout and the content's URL is not visible to the user in the browser's address bar.
Using PHP, you could have an index.php with the contents of your index.html plus some PHP snippets printing variables from an included variables.php. PHP requires server-side execution, which is typically provided by the Apache 2 web server. A PHP script such as index.php is executed on each request, that is, each time a user accesses the page.
index.php
<html>
<?php require_once 'variables.php'; ?>
...
<?php print $property1; ?>
<?php print $property2; ?>
</html>
variables.php
<?php
$property1 = 'value 1';
$property2 = 'value 2';
?>
Using XSLT, you can transform the HTML as XML. This requires the HTML to be formatted as well-formed XML. XSLT can be executed on both the client and the server side. XSLT 1 is limited but supported by major modern browsers. XSLT 2 is not supported by most browsers but is often executed on the server side, or rather offline, to generate aggregated static HTML pages from XML/HTML with e.g. Saxon. On the downside, XSLT may be more difficult to start with than PHP.
Using JavaScript (JS) you can also let the browser load additional documents into a currently displayed document. This is also known as AJAX and can be done with e.g. jQuery or AngularJS. With JS you can create interactive web pages and most modern websites make use of it.
BUT: Loading contents with JS on the client side limits the ability of search engines to index your content (bots usually do not execute JS). You should only use this method if your contents should not be crawled by bots or if you provide an alternative.
Of course, there is also a plethora of other template/programming languages that offer server side solutions for your problem like Java, Python and Ruby and their specialized frameworks.
Additionally you should check out one of the many existing PHP CMS (server side HTML page generator with UI to edit content).

Something like Include for HTML in VS2010 at build time

Is there a way to split a single HTML page (purely static, HTML + JS) in VS2010 (I use VS2010 + ReSharper for my HTML/JS coding) into parts, but get / build a single page at build time?
There was such a feature in Dreamweaver (I used it years back; I think it was called libraries). If I were using PHP, I'd use something like include at runtime.
My page contains several div sections serving as tabs, only one visible at a time. I want to place the code of each tab in its own file to make it easier to maintain. But in the end I do need one single, static HTML file. Again, I want to do this at build time, not on the server side.
<DIV>
many lines of HTML
</DIV>
should be replaced by something like
<DIV>
#include tab1.html
</DIV>
I could write a script building the static page and hook it into VS2010, but is there some extension or function already existing?
-- Follow ups on using T4 ---
VS2010 - Assign html code formatting to T4 (.tt) file
VS2010 - disable validation for particular html file (not all files)
I ran across this question while searching for something else. You've probably moved on, but what the heck, maybe the answer will help someone else:
You could get so-called T4 templates to do this:
http://msdn.microsoft.com/en-us/library/bb126445.aspx
Alternately, Microsoft has similar capabilities in Razor, and it's more specific to HTML.
Here is a comparison of the two:
http://blogs.msdn.com/b/garethj/archive/2011/03/11/t4-vs-razor-what-s-the-skinny.aspx
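If neither tool fits, the hand-written build script mentioned in the question could look roughly like the sketch below, which expands #include lines and follows nested includes. It is only an illustration: expandIncludes, readFile and the depth limit of 10 are hypothetical choices, and hooking the script into a VS2010 build step is not covered here.

```javascript
// Sketch: expand "#include file" lines, following nested includes.
// readFile is any function that returns the text of the named file.
function expandIncludes(source, readFile, depth = 0) {
  if (depth > 10) {
    throw new Error('includes nested too deeply (possible cycle)');
  }
  const directive = /^([ \t]*)#include\s+(\S+)\s*$/;
  return source
    .split(/\r?\n/)
    .map((line) => {
      const match = directive.exec(line);
      if (!match) return line;
      const [, indent, file] = match;
      // Expand the included file first, then apply the directive's indent.
      return expandIncludes(readFile(file), readFile, depth + 1)
        .replace(/\s+$/, '')
        .split('\n')
        .map((inner) => indent + inner)
        .join('\n');
    })
    .join('\n');
}
```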

How to sanitize user generated html code in ruby on rails

I am storing user-generated HTML code in the database, but some of it is broken (missing end tags), so this code messes up the rendering of the whole page.
How can I prevent this sort of behaviour with Ruby on Rails?
Thanks
It's not too hard to do this with a proper HTML parser like Nokogiri which can perform clean-up as part of the processing method:
require 'nokogiri'
bad_html = '<div><p><strong>bad</p>'
puts Nokogiri::HTML.fragment(bad_html).to_s
# <div><p><strong>bad</strong></p></div>
Once parsed properly, you should have fully balanced tags.
My google-fu reveals surprisingly few hits, but here is the top one :)
Valid Well-formed HTML
Try using the h() escape function in your erb templates to sanitize. Note that h() escapes the markup, so it is displayed as text rather than rendered. That should do the trick.
Check out Loofah, an HTML sanitization library based on Nokogiri. This will also remove potentially unsafe HTML that could inject malicious script or embed objects on the page. You should also scrub out style blocks, which might mess up the markup on the page.