Differences in language parsing speed when indexing a forum (HTML code)? - html

I was wondering whether some programming languages are faster than others when it comes to processing and parsing an HTML page.
My intention is to scan thousands of HTML forum pages and process the code, looking for specific <div> tags and content.
If there are no real differences, what language would you recommend for such a task?

Well, it depends.
You should definitely take a look at options like Node.js, Python or PHP and figure out what works best for you.
I personally would recommend something like Node.js because its I/O model is non-blocking, so many page requests can be in flight at once.
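To make the task concrete, here is a minimal sketch in Python (one of the languages named above) that uses only the standard library's html.parser to collect the text of every <div> carrying a given class. The class name "post" and the helper names are assumptions for illustration; a real forum's markup will differ.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collects the text inside every <div> carrying a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # nesting depth inside a matching <div>; 0 = outside
        self.chunks = []    # text pieces of the match currently being read
        self.results = []   # one string per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1             # nested <div> inside a match
        elif self.target_class in classes:
            self.depth = 1              # entering a matching <div>

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:         # the matching <div> just closed
                self.results.append(" ".join(self.chunks))
                self.chunks = []

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_posts(html, target_class="post"):
    """Return the text of each <div class="...target_class..."> in `html`."""
    parser = DivTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Calling extract_posts(page_source) on each downloaded page returns a list of strings, one per matching <div>. Whether this is fast enough for thousands of pages is something to measure rather than assume.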

Related

Meaning of Program into Your Language and Program in Your Language

I've been reading Code Complete 2. As I am not a native English speaker, some statements take me some time to understand. I would like you to describe the difference between these two statements the author made in his book:
You should program into Your Language (programming language).
You shouldn't program in Your Language.
Why is "in" bad and "into" recommended?
As I understand it, it means to think outside of the bounds of your programming language.
So in means you are thinking in terms of the language, so your thinking is limited by the language itself, and the program you write may not be easily translated into some other language if needed.
But into means you think in algorithms, i.e. freely, then translate into your desired language. So you can easily code in any language you know the syntax of.
But as I have not read the book actually, this may be totally wrong per the context.
Program into your language means that you use the language to construct the "missing" pieces - leverage it to do more than it currently does. Things like creating missing data structures, algorithms and ways of accomplishing tasks that are not native to the language.
Program in your language means just that - not trying to leverage it.
I thought the examples given in the book were quite good.
The author provides an example of his own in that part of the book (which unfortunately I don't remember). You can try reading a bit further.
It means that even if the language doesn't support a particularly convenient feature, you should find a way to emulate that feature, since you should always aim to write readable, easy-to-maintain, modular code. Because the language doesn't enforce it, you then document the convention so that other developers who modify the code stick to the same rule. I can't provide an example right now, but I think it is easy to see the rationale.
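One way to picture the kind of emulation these answers describe, as a hedged sketch rather than the book's own example: Python does not enforce read-only fields on ordinary objects, so the rule can be built from what the language does offer and documented for the next maintainer. The class name Frozen is hypothetical.

```python
class Frozen:
    """Convention for this codebase: instances are read-only after __init__.

    Python does not enforce immutability on ordinary objects, so the rule is
    emulated here and documented for whoever maintains the code next.
    """

    _frozen = False  # class-level default; flipped per instance in __init__

    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)
        self._frozen = True  # from now on, assignments are rejected

    def __setattr__(self, name, value):
        if self._frozen:
            raise AttributeError(f"{type(self).__name__} is read-only: {name}")
        super().__setattr__(name, value)
```

With this in place, Frozen(x=1).x = 5 raises AttributeError instead of silently changing the value.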

What are the pros and cons of using a template engine like Jade?

I'm looking into developing a web app with Node.js. I'm coming from a PHP background where I didn't use a template engine (besides PHP itself) and I have always just written straight HTML. So, why should I or should I not use Jade or some other template engine?
Pros:
Encourages good code organization (data generation is separate from presentation code)
Output generation is more expressive (template syntax doesn't require a sea of string concatenation)
Better productivity (common problems such as output encoding, iterating, conditionals, etc. have been handled)
Generally requires less code overall (Jade in particular has a very terse syntax)
Cons:
Some performance overhead
Yet another thing to learn
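The first pros can be illustrated with a small sketch. It uses Python's standard-library string.Template as a stand-in for a template engine (it is not Jade and is far less capable), and the field names are assumptions. The data is prepared in one place, the markup lives in the templates, and output encoding is applied by hand with html.escape, a step that fuller engines usually take care of for you.

```python
from html import escape
from string import Template

# Presentation: the markup lives in templates, not in concatenated strings.
ROW = Template("<li>$name ($posts posts)</li>")
PAGE = Template("<h1>$title</h1>\n<ul>\n$rows\n</ul>")


def render(title, users):
    """Render a user list; `users` is a list of dicts with 'name' and 'posts'."""
    rows = "\n".join(
        ROW.substitute(name=escape(u["name"]), posts=u["posts"]) for u in users
    )
    return PAGE.substitute(title=escape(title), rows=rows)
```

For example, render("Top users", [{"name": "Ann & Bob", "posts": 3}]) yields a page in which the ampersand is encoded as &amp;amp; rather than passed through raw.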
About Jade or any other template language that differs a lot from HTML:
First of all, it is more time-consuming to debug the produced HTML. You see HTML in the browser and you need to parse it back to Jade (in your brain) to compare with your editor content. This is very inconvenient and makes debugging harder than it should be.
Of course it may not be a problem if you are the only programmer who works on the code. It may seem easy to match the HTML lines with Jade lines if you are the one who wrote them.
It is a problem when working in teams.

Why use HTML markup in languages like ruby, php, asp.net mvc instead of XSLT to convert XML to HTML?

I just learned about XSLT on Stack Overflow today (I love how in computers you can program for years and constantly have 'darn, how did I not know about that technology' moments). I'm wondering how popular XSLT is for web development? I've worked on a few websites (using php, ruby, and asp.net mvc) but I'm not a web developer by any means.
Is the reason each web language I listed above has its own way of marking up HTML (and thus taking advantage of 'templates') just to make it simpler (simpler as in more to the point and more geared to one specific purpose), in that you don't have to first convert what you want to display to XML and then to HTML? Or are there other reasons why XSLT doesn't seem too popular for web development? Or am I just crazy (again, most of my work is with desktop apps) and it is actually widely used in webpages? If not in development, what do you mainly use it for?
It seems that being able to easily serialize objects to XML with C# would make XSLT a very popular way of displaying objects in HTML on websites?
Thanks for feeding my curiosity!!
IMHO there are two main reasons why XSLT is not very popular:
it's generally hard.
you can just skip it and directly write HTML, and HTML is not hard and has first-class support from all web frameworks.
In summary, there is usually not enough reason to introduce yet another abstraction. Abstractions are not free: they solve some problems but introduce others (cf. the "solve it by adding another layer of indirection" adage), so the benefits must clearly outweigh the costs.
That said, there are XSLT-based solutions for many web frameworks, e.g.:
ASP.NET MVC XSLT view engine
libxslt in RoR
Here's an excellent article that discusses XSLT for view engines.
As I've started so many answers on Stackoverflow, it depends :)
Doing what you're describing is adding another layer of abstraction between application logic and the display output; and introducing another language. There can be very compelling reasons to do this, but the important part to keep in mind is that you need to recognize and quantify the need to be able to understand whether it's worth it.
It seems that being able to easily...
As with most things in software development, something that seems easy after a few hours of pondering turns out to be quite complex and involved when you actually try to do it. This is especially true here, because I have built exactly what you're describing in ASP.NET. It provides a very interesting mechanism for skinning sites, as you simply have to define your model XML schemas and anyone can write an XSLT to transform it. But XSLT is like a one-way tunnel. It can't (easily) reach back out or to the sides to pull in extra info that wasn't included in the original model - "peripheral data", so to speak. In fact, it has a hard time really being aware of what's going on in the application at all.
Also, XSLT is very verbose, and (in many ways) a crude language. This makes it... unpleasant to do things like loops, and rather time consuming to even do something like an if-else statement*. An XSLT that generated something like, say, the page you're looking at right now would probably be several thousand lines long - which you're adding on top of the application code you have to write either way.
It is simply an additional cost which may or may not be worth it, depending on what you're trying to accomplish.
*For example, I once saw a developer try to write a pager control (e.g. "first | prev | 1 | 2 | 3 | next | last") in XSLT. We still visit her in the sanitarium from time to time.
XSLT is not popular in web frameworks because XML is not popular in web frameworks. But, if you have XML data, or you are willing to convert your objects to XML then XSLT is the best tool for transforming that XML into HTML or XHTML.
If you are using ASP.NET, check out myxsl.
When I first discovered XSLT I was excited because I could write semantically correct presentation markup, and I would no longer have to write so many hacks and use so many nested layers of <divs> just to get the effect I wanted -- XSLT could do all that for me!
Then I realized XSLT was a weird, backwards, and painful language to write in. I believe other web developers have discovered the same thing and steered away from it.
What's the point when you can just write straight HTML and avoid an extra layer of abstraction? And if your application is complex enough to warrant that, then there are so many better, simpler, more powerful alternatives (other templating languages).
The design of a web page is often handed over to designers.
One thing about web designers is they know HTML (if you're lucky) but they're not going to know XSLT.
It's unpleasant to do loops and if-else-statements in XSLT like it's unpleasant to hammer with a screwdriver. Don't do that. The behaviour of an XSLT script is driven by the data, you only need to find the right matches for your templates. XSLT-templates are not just a piece of code. The actual data from the XML-file fitting in the "match"-attribute of the "template"-elements decide if and when to execute a template.
Once you discover that XSLT works differently from other languages, you see the possibilities this gives you. It's easy to do things that are hard to do in the usual languages. Just use it when appropriate. If your data comes as XML and you know XSLT and XPath, this is the right tool to build web pages.
Hints are available at e.g. jenitennison.com/xslt.
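The data-driven style described in this answer can be approximated outside XSLT. The sketch below is a rough Python analogue, not XSLT itself: rules are registered per element name, and the input document decides which rule fires and when. The element names thread, post and ad, and all function names, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

RULES = {}  # element name -> function that renders that element


def match(tag):
    """Register a rendering rule for elements named `tag`."""
    def register(func):
        RULES[tag] = func
        return func
    return register


def apply_templates(node):
    """Render each child with whichever rule matches its name; skip the rest."""
    return "".join(RULES[child.tag](child) for child in node if child.tag in RULES)


@match("thread")
def render_thread(node):
    return f"<h1>{escape(node.get('title', ''))}</h1><ul>{apply_templates(node)}</ul>"


@match("post")
def render_post(node):
    return f"<li>{escape(node.text or '')}</li>"


def transform(xml_text):
    """Parse the document and start with the rule matching the root element."""
    root = ET.fromstring(xml_text)
    return RULES[root.tag](root)
```

The rules for thread and post contain no loops or conditionals of their own. An element without a rule, such as a hypothetical ad element, is simply skipped by apply_templates.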

worth to learn groovy? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
The question I'm asking is: is it worth learning a new language like Groovy? If I learn Groovy, it feels like I'm coding in Groovy and not Java. How smart is that when I also have to be good at Java to code desktop applications in the future? If I use Groovy a lot for web applications, I will just get worse at Java and have to start over when I code desktop applications, right?
So why don't I just stick with Java and be good at ONE language instead of having to switch between 2 languages and their syntax? It would be so confusing...
Groovy is a nice, scriptable and easier-to-use Java "knockoff" – and I don't mean that derogatively. While Java is a language to be compiled, deployed and (often) run on Enterprise servers where performance matters, Groovy is a language where you can quickly create a program to get something done. Often that something is fairly simple, so it's an hour's or a day's coding effort. Often the code is only run once and then thrown away. Because Java has more boilerplate and formalism in it, you can do this kind of program more quickly and hence more efficiently in Groovy.
However, just to give you some perspective, Groovy is a relative newcomer stomping on the turf of various other, better established scripting languages:
Perl is one of the grandfathers of scripting languages; rarely does a Unix server get installed without Perl on it, and Perl scripts are the lifeblood of many servers. However, Perl is a write-only language that looks like line noise to the uninitiated. There's more than one way to do everything, so styles diverge drastically. Perl coding tends to be a bit messy.
Python is a fresher, cleaner script language than Perl, and is these days preferred by many as a scripting language. It's fun to program in, it gets things done and because it's been around for a few years, lots of people know it. Python is found behind/inside a number of Linux system utilities.
Groovy leaves Perl and Python in the dust when (a) the environment already makes use of a JVM and/or (b) there's a requirement to use existing Java code, including libraries. So far so good. Groovy is not blazingly fast, but faster than Python. Being dynamically typed, it's "fun" and "easy" to program in a way that Java's not.
But then came Scala. Scala is like Java on steroids. It is statically typed so it's not quite as "fun" to program as Groovy, but it has type inference so often you can leave off the types and the compiler can figure them out. Scala works really hard to make the most of types; it does generic types a lot more seamlessly than Java. It dispenses with a lot of Java's boilerplate, so Scala programs are typically about 30% shorter than similar Java programs. Scala runs on the JVM and interfaces pretty well with Java code. It also runs about as quickly as Java, which most of the other languages don't.
Finally in historic order, there's Clojure. Clojure is a Lisp derivative, so it has a programming style very different from languages you'd otherwise know, and it burns through a lot of parentheses! But Clojure runs on the JVM, is very compatible with all the rest of Java, and it's dynamically typed. You can use it as a scripting language or treat it like a compiled language... it's up to you. I find it fun to program in, and the fact that it's an almost pure functional language forces you to think in new ways about programming. It hurts your head at the beginning, but if you survive it's a very worthwhile exercise because you learn some techniques that will become more relevant (I think) in future programming.
In summary, it would probably do you good (put hair on your chest, if I may be so sexist) to learn one or more of these "alternative" / "scripting" languages. You may find them useful. Usually when there's something to be hacked up quickly in my project, I get the job because all my colleagues only know Java, and by the time they finish setting up their class framework I'm already done.
Quote:
so why don´t I just stick with java
and be good at ONE language instead of
having to switch between 2 languages
and their syntax.
This seems like a more general question about learning programming languages than learning a new language (Groovy) which runs on top of the Java Virtual Machine.
Here's a question:
Suppose you are learning a foreign language because you want to be fluent in multiple languages so you can converse with many people. You're learning German right now and getting good at it, but you also want to learn Spanish. Would you just suddenly forget German if you start to learn Spanish? If you are indeed worried that you will, what would you do?
If you were going to learn Groovy, but don't want to forget how to write Java, then why not continue to use both languages at the same time?
One of the things about being a programmer is going to be learning to adapt to new technologies as they come along. It's a good thing to be able to learn new languages, as it's going to be a skill that's going to be very useful in a field which is constantly changing.
Why don't you code your desktop apps in groovy too? Just because groovy is the choice of a web framework (grails) doesn't mean that you can't use it for desktop apps.
Indeed, it is great for desktop apps too. It's more a matter of dynamic or static languages...
In my opinion, it is quite good to have for each task the right language at hand. So go ahead and learn groovy - the result will be that you'll miss groovy features when you try to use java again ;-)
I would say in general in this field it's always good to be learning. I try constantly to learn new concepts to add to my toolbox, while getting better at the core things I'm interested in like Java. I recently purchased a book on learning Clojure - another functional language for the JVM.
The downside to learning something without using it every day is that some details don't stick in your head. That said, I'm glad I spent some time with Clojure; the important stuff stuck and I know I can quickly look up the details if and when I need to. You may want to take a similar approach to Groovy.
The Java platform is slowly starting to change direction to one where the JVM is targeted by multiple source languages (a trick .net has been showing off since day 1, but it's taking Java a while to catch up there). The Java7 classfile format is even adding a new instruction to make these dynamic languages work faster.
If you want to keep yourself current, then learning Groovy is a good way to do it, without abandoning all your investment in the Java platform.
Furthermore, Groovy (and Grails) is now maintained by SpringSource, so its popularity is only going to increase.
Going from Java to Groovy isn't a lot of work. Nowhere near what would be needed to move to a less Java-like language like Clojure.
I really like groovy for one-off apps and for scripting existing java code. I've used it to parse data from REST calls and feed the data to a JMS queue. I've used it to create scrambled test data for a partner from our production data. For stuff like that it is amazing.
If the goal is to learn a dynamic language to add to the toolbox, Python and Ruby are both good choices. They run on the JVM and have native versions. Both are well supported on a large number of platforms.
If the goal is to learn an alternative JVM language, groovy is an excellent choice. Both Scala and Clojure would also be good choices.
I used to stick to the "learn a new language every year rule" from The Pragmatic Programmer, but that was before I had kids. Now I learn a new building toy every six months.
First of all, this is a highly subjective question.
In my humble opinion it is worth learning a new language, especially if it differs in paradigm (as is the case with Groovy). I'm fairly young myself, so for me learning a new language is not much of a hassle, but the way I see it, if you like the language and you estimate that coding in language X will be profitable, you should learn it.
It won't hurt your resume.
It won't make your head hurt (much).
The only problem is, will you use it. You need to use a language to become good at it. If you are going to learn it now and never use it tomorrow it probably ain't worth learning it.
Learning something new does not take away something you already know. You may be a bit rusty when you get back into Java, but it'll come back real quick.
Also--
I'm not a Java guy, but I believe Groovy targets the JVM. If this is the case, then programming in Groovy will make you a better Java programmer, because you'll still be targeting the same framework as Java (the language) so you'll still continue to gain experience with the Java libraries. Knowing the available libraries is what really matters, not how well you know every minute detail of a particular language.
I find that by learning new languages, I always end up learning new ways to think about problems. Each language guides you into solving problems in the way most easily expressed by the language. Learning new languages only makes you stronger all around because you learn new ways to solve problems.
You might have to re-orient yourself with the libraries after a long time away from a language, but even then it's not a huge ordeal - just more frequent google searches, etc.
The benefits, however, are worth it. I recently did some functional programming for the first time and it really taught me a lot of different ways to think about certain situations. I find myself now using some of C#'s functional aspects and it makes my code a lot cleaner in some cases. The bottom line is: if you're going to do this for a living, you are going to want to learn more than one language. Have you ever met a mechanic who only knew one make and model of car?
It's always good to learn a new language to become a better programmer. Groovy is a natural choice for a Java programmer: it is easy to learn and you can still use all your Java knowledge.
Groovy is a dynamic language; after it, try to learn a functional language (like Scala). With this experience you will see Java from a different perspective. Some tasks that were painful in Java will be trivial in Groovy/Scala.
You can program desktop applications with Griffon, whose language of choice is Groovy. Give it a try.
If you are looking for online help, check these websites:
for Groovy
for Grails

Is XSLT worth it? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
A while ago, I started on a project where I designed a html-esque XML schema so that authors could write their content (educational course material) in a simplified format which would then be transformed into HTML via XSLT. I played around (struggled) with it for a while and got it to a very basic level but then was too annoyed by the limitations I was encountering (which may well have been limitations of my knowledge) and when I read a blog suggesting to ditch XSLT and just write your own XML-to-whatever parser in your language of choice, I eagerly jumped onto that and it's worked out brilliantly.
I'm still working on it to this day (I'm actually supposed to be working on it right now, instead of playing on SO), and I am seeing more and more things which make me think that the decision to ditch XSLT was a good one.
I know that XSLT has its place, in that it is an accepted standard, and that if everyone is writing their own interpreters, 90% of them will end up on TheDailyWTF. But given that it is a functional style language instead of the procedural style which most programmers are familiar with, for someone embarking on a project such as my own, would you recommend they go down the path that I did, or stick it out with XSLT?
So much negativity!
I've been using XSLT for a good few years now, and genuinely love it. The key thing you have to realise is that it's not a programming language, it's a templating language (and in this respect I find it indescribably superior to asp.net /spit).
XML is the de facto data format of web development today, be it config files, raw data or in-memory representation. XSLT and XPath give you an enormously powerful and very efficient way to transform that data into any output format you might like, instantly giving you that MVC aspect of separating the presentation from the data.
Then there's the utility abilities: washing out namespaces, recognising disparate schema definitions, merging documents.
It must be better to deal with XSLT than to develop your own in-house methods. At least XSLT is a standard and something you could hire for, and if it's ever really a problem for your team, its very nature would let you keep most of your team working with just XML.
A real world use case: I just wrote an app which handles in-memory XML docs throughout the system, and transforms to JSON, HTML, or XML as requested by the end user. I had a fairly random request to provide the data as Excel. A former colleague had done something similar programmatically, but it required a module of a few class files and that the server had MS Office installed! Turns out Excel has an XSD: new functionality with minimum basecode impact in 3 hours.
Personally I think it's one of the cleanest things I've encountered in my career, and I believe all of its apparent issues (debugging, string manipulation, programming structures) are down to a flawed understanding of the tool.
Obviously, I strongly believe it is "worth it".
Advantages of XSLT:
Domain-specific to XML, so for example no need to quote literal XML in the output.
Supports XPath/XQuery, which can be a nice way to query DOMs, in the same way that regular expressions can be a nice way to query strings.
Functional language.
Disadvantages of XSLT:
Can be obscenely verbose - you don't have to quote literal XML, which effectively means you do have to quote code. And not in a pretty way. But then again, it's not much worse than your typical SSI.
Doesn't do certain things which most programmers take for granted. For instance string manipulation can be a chore. This can lead to "unfortunate moments" when novices design code, then frantically search the web for hints how to implement functions they assumed would just be there and didn't give themselves time to write.
Functional language.
One way to get procedural behaviour, by the way, is to chain multiple transforms together. After each step you have a brand new DOM to work on which reflects the changes in that step. Some XSL processors have extensions to effectively do this in one transform, but I forget the details.
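The chaining idea is not specific to XSLT. As a hedged illustration in Python (element, attribute and function names are all assumptions), each step below takes a tree and returns a new one, so the second step sees the result of the first:

```python
import copy
import xml.etree.ElementTree as ET


def drop_drafts(root):
    """Step 1: return a copy of the tree without items marked as drafts."""
    out = copy.deepcopy(root)
    for item in out.findall("item[@status='draft']"):
        out.remove(item)
    return out


def number_items(root):
    """Step 2: return a copy in which the remaining items are numbered."""
    out = copy.deepcopy(root)
    for position, item in enumerate(out.findall("item"), start=1):
        item.set("n", str(position))
    return out


def run_pipeline(xml_text, steps):
    """Apply each step to the tree produced by the previous one."""
    tree = ET.fromstring(xml_text)
    for step in steps:
        tree = step(tree)
    return tree
```

Because number_items runs on the tree that drop_drafts produced, the numbering has no gaps where drafts used to be. That is the "brand new DOM after each step" effect described above.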
So, if your code is mostly output and not much logic, XSLT can be a very neat way to express it. If there is a lot of logic, but mostly of forms which are built in to XSLT (select all elements which look like blah, and for each one output blah), it's likely to be quite a friendly environment. If you fancy thinking XML-ishly at all times, then give XSLT 2 a go.
Otherwise, I'd say that if your favourite programming language has a good DOM implementation supporting XPath and allowing you to build documents in a useful way, then there are few benefits to using XSLT. Bindings to libxml2 and gdome2 should do nicely, and there's no shame in sticking to general-purpose languages you know well.
Home-grown XML parsers are usually either incomplete (in which case you'll come unstuck some day) or else not much smaller than something you could have got off the shelf (in which case you're probably wasting your time), and give you any number of opportunities to introduce severe security issues around malicious input. Don't write one unless you know exactly what you gain by doing it. Which is not to say you can't write a parser for something simpler than XML as your input format, if you don't need everything that XML offers.
I have to admit a bias here because I teach XSLT for a living. But, it might be worth covering off the areas that I see my students working in. They split into three groups generally: publishing, banking and web.
Many of the answers so far could be summarised as "it's no good for creating websites" or "it's nothing like language X". Many tech folks go through their careers with no exposure to functional/declarative languages. When I'm teaching, the experienced Java/VB/C/etc folk are the ones who have issues with the language (variables are variables in the sense of algebra not procedural programming for example). That's many of the people answering here - I've never gotten on with Java but I'm not going to bother to critique the language because of that.
In many circumstances it is an inappropriate tool for creating websites - a general purpose programming language may be better. I often need to take very large XML documents and present them on the web; XSLT makes that trivial. The students I see in this space tend to be processing data sets and presenting them on the web. XSLT is certainly not the only applicable tool in this space. However, many of them are using the DOM to do this and XSLT is certainly less painful.
The banking students I see use a DataPower box in general. This is an XML appliance and it's used to sit between services 'speaking' different XML dialects. Transformation from one XML language to another is almost trivial in XSLT and the number of students attending my courses on this is increasing.
The final set of students I see come from a publishing background (like me). These people tend to have immense documents in XML (believe me, publishing as an industry is getting very into XML - technical publishing has been there for years and trade publishing is getting there now). These documents need to be processed (DocBook to ePub comes to mind here).
Someone above commented that scripts tend to be below 60 lines or they become unwieldy. If it does become unwieldy, the odds are the coder hasn't really got the idea - XSLT is a very different mindset from many other languages. If you don't get the mindset it won't work.
It's certainly not a dying language (the amount of work I get tells me that). Right now, it's a bit 'stuck' until Microsoft finish their (very late) implementation of XSLT 2. But it's still there and seems to be going strong from my viewpoint.
We use XSLT extensively for things like documentation, and making some complex configuration settings user-serviceable.
For documentation, we use a lot of DocBook, which is an XML-based format. This lets us store and manage our documentation with all of our source code, since the files are plain text. With XSLT, we can easily build our own documentation formats, allowing us to both autogenerate the content in a generic way, and make the content more readable. For example, when we publish release notes, we can create XML that looks something like:
<ReleaseNotes>
    <FixedBugs>
        <Bug id="123" component="Admin">Error when clicking the Foo button</Bug>
        <Bug id="125" component="Core">Crash at startup when configuration is missing</Bug>
        <Bug id="127" component="Admin">Error when clicking the Bar button</Bug>
    </FixedBugs>
</ReleaseNotes>
And then using XSLT (which transforms the above to DocBook) we end up with nice release notes (PDF or HTML usually) where bug IDs are automatically linked to our bug tracker, bugs are grouped by component, and the format of everything is perfectly consistent. And the above XML can be generated automatically by querying our bug tracker for what has changed between versions.
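For readers who want to see the grouping and linking spelled out, here is a rough Python stand-in for that transform. It is not the XSLT this answer describes, and it emits a plain HTML fragment rather than DocBook. The tracker URL pattern and function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from html import escape

TRACKER_URL = "https://bugs.example.com/show?id={id}"  # hypothetical tracker URL


def release_notes_html(xml_text):
    """Group <Bug> entries by component and link each bug ID to the tracker."""
    root = ET.fromstring(xml_text)
    by_component = defaultdict(list)
    for bug in root.findall("./FixedBugs/Bug"):
        by_component[bug.get("component", "Unknown")].append(bug)

    lines = []
    for component in sorted(by_component):
        lines.append(f"<h2>{escape(component)}</h2>")
        lines.append("<ul>")
        for bug in by_component[component]:
            bug_id = bug.get("id", "")
            url = TRACKER_URL.format(id=bug_id)
            lines.append(
                f'<li><a href="{escape(url)}">#{escape(bug_id)}</a> '
                f"{escape(bug.text or '')}</li>"
            )
        lines.append("</ul>")
    return "\n".join(lines)
```

Fed the release-notes XML above, this puts bugs 123 and 127 under an Admin heading and bug 125 under Core, each ID linked to the assumed tracker address.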
The other place where we have found XSLT to be useful is actually in our core product. Sometimes when interfacing with third-party systems we need to somehow process data in a complex HTML page. Parsing HTML is ugly, so we feed the data through something like TagSoup (which generates proper SAX XML events, essentially letting us deal with the HTML as if it were properly written XML) and then we can run some XSLT against it, to turn the data into a "known stable" format that we can actually work with. By separating out that transformation into an XSLT file, that means that if and when the HTML format changes, the application itself does not need to be upgraded, instead the end-user can just edit the XSLT file themselves, or we can e-mail them an updated XSLT file without the entire system needing to be upgraded.
I would say that for web projects, there are better ways to handle the view side than XSLT today, but as a technology there are definitely uses for XSLT. It's not the easiest language in the world to use, but it is definitely not dead, and from my perspective still has lots of good uses.
XSLT is an example of a declarative programming language.
Other examples of declarative programming languages include regular expressions, Prolog, and SQL. All of these are highly expressive and compact, and usually very well designed and powerful for the task for which they are designed.
However, software developers generally hate such languages, because they are so different from more mainstream OO or procedural languages that they're hard to learn and debug. Their compact nature generally makes it very easy to do a lot of damage inadvertently.
So while XSLT is an efficient mechanism to merge data into presentation, it fails in the ease-of-use department. I believe that's why it hasn't really caught on.
I remember all the hype around XSLT when the standard was newly released. All the excitement around being able to build an entire HTML UI with a 'simple' transform.
Let’s face it, it is hard to use, near impossible to debug, often unbearably slow. The end result is nearly always quirky and less than ideal.
I will sooner gnaw off my own leg than use an XSLT while there are better ways to do things. Still, it has its places; it's good for simple transform tasks.
I've used XSLT (and also XQuery) extensively for various things - to generate C++ code as part of a build process, to produce documentation from doc comments, and within an application that had to work a lot with XML in general and XHTML in particular. The code generator in particular was in excess of 10,000 lines of XSLT 2.0 code spread across about a dozen separate files (it did a lot of things - headers for clients, remoting proxies/stubs, COM wrappers, .NET wrappers, ORM - to name a few). I inherited it from another guy who didn't really understand the language well, and the older bits were consequently quite a mess. Newer stuff that we wrote was mostly kept sane and readable, however, and I do not recall any particular problems with achieving that. It was certainly not any harder than doing it for C++.
Speaking of versions, dealing with XSLT 2.0 definitely helps keep you sane, but 1.0 is still alright for simpler transforms. In its niche, it is an extremely handy tool, and the productivity you get from certain domain-specific features (most importantly, dynamic dispatch via template matching) is hard to match. Despite the perceived wordiness of XSLT's XML-based syntax, the same thing in LINQ to XML (even in VB with XML literals) was usually several times longer. Quite often, however, it gets undeserved flak because XML was used unnecessarily in the first place.
To sum it up: it is an incredibly useful tool to have in one's toolbox, but it is a very specialized one, so it is good so long as you use it properly and for its intended purpose. I really wish there was a proper, native .NET implementation of XSLT 2.0.
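To make the "dynamic dispatch via template matching" point above a bit more concrete, here is a rough Python imitation of the idea, not XSLT itself. The names `template`, `apply_templates` and `default_template`, and the element names `para` and `emph`, are all made up for illustration. Real XSLT matches on full XPath patterns with priorities, while this sketch only dispatches on the tag name and does not escape text.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: element tag -> function that renders that element.
TEMPLATES = {}

def template(tag):
    """Register the decorated function as the handler for elements named `tag`."""
    def register(func):
        TEMPLATES[tag] = func
        return func
    return register

def apply_templates(element):
    """Render the content of `element`, dispatching each child on its tag."""
    parts = []
    if element.text:
        parts.append(element.text)
    for child in element:
        handler = TEMPLATES.get(child.tag, default_template)
        parts.append(handler(child))
        if child.tail:
            parts.append(child.tail)
    return "".join(parts)

def default_template(element):
    # Fallback, loosely like XSLT's built-in rule: keep the text, drop the tag.
    return apply_templates(element)

@template("para")
def render_para(element):
    return "<p>" + apply_templates(element) + "</p>"

@template("emph")
def render_emph(element):
    return "<em>" + apply_templates(element) + "</em>"

doc = ET.fromstring(
    "<doc><para>Hello <emph>big</emph> <unknown>world</unknown></para></doc>"
)
html = apply_templates(doc)
print(html)  # <p>Hello <em>big</em> world</p>
```

Adding support for a new element is just one more decorated function; nothing in the traversal changes. In XSLT you get that for free, which is a large part of the productivity mentioned above.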
I use XSLT (for lack of better alternative), but not for presentation, just for transformation:
I write short XSLT transformations to do mass edits on our maven pom.xml files.
I've written a pipeline of transformations to generate XML Schemas from XMI (UML Diagram). It worked for a while, but it finally got too complex and we had to take it out behind the barn.
I've used transformations to refactor XML Schemas.
I've worked around some limitations in XSLT by using it to generate an XSLT to do the real work. (Ever tried to write an XSLT that produces an output using namespaces that aren't known until runtime?)
I keep coming back to it because it does a better job round-tripping the XML it's processing than other approaches I've tried, which have seemed needlessly lossy or simply misunderstand XML. XSLT is unpleasant, but I find using Oxygen makes it bearable.
That said, I'm investigating using Clojure (a lisp) to perform transformations of XML, but I haven't gotten far enough yet to know if that approach will bring me benefits.
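For comparison, this is roughly what a mass edit on a pom.xml looks like when done in code rather than XSLT. It is only a sketch using Python's standard `xml.etree.ElementTree`; the function name `bump_dependency` and the sample dependency are assumptions for illustration. Note that this parser drops comments and the original formatting, which is exactly the kind of lossy round-tripping complained about above.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
# Serialize the POM namespace as the default namespace (no prefix) on output.
ET.register_namespace("", POM_NS)

def bump_dependency(pom_text, artifact_id, new_version):
    """Return the POM with the version of one dependency replaced."""
    ns = {"m": POM_NS}
    root = ET.fromstring(pom_text)
    for dep in root.findall(".//m:dependency", ns):
        if dep.findtext("m:artifactId", namespaces=ns) == artifact_id:
            version = dep.find("m:version", ns)
            if version is not None:
                version.text = new_version
    return ET.tostring(root, encoding="unicode")

pom = (
    '<project xmlns="http://maven.apache.org/POM/4.0.0"><dependencies>'
    "<dependency><artifactId>junit</artifactId><version>4.12</version></dependency>"
    "<dependency><artifactId>other</artifactId><version>1.0</version></dependency>"
    "</dependencies></project>"
)
print(bump_dependency(pom, "junit", "4.13.2"))
```

To apply it to many files, you would loop over the paths, read each file, and write the result back.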
Personally I used XSLT in a totally different context. The computer game that I was working on at the time used tons of UI pages defined using XML. During a major refactor shortly after a release we wanted to change the structure of these XML documents. We made the game's input format follow a much better and schema aware structure.
XSLT seemed the perfect choice for this translation from the old format to the new one. Within two weeks I had a working conversion from old to new for our hundreds of pages. I was also able to use it to extract lots of information on the layout of our UI pages. I created lists of which components were embedded in which relatively easily, and then used XSLT to write those into our schema definitions.
Also, coming from a C++ background, it was a very fun and interesting language to master.
I think that as a tool to translate XML from one format to another it is fantastic. However, it is not the only way to define an algorithm that takes XML as an input and outputs something. If your algorithm is sufficiently complex, the fact that the input is XML becomes irrelevant to your choice of tool - i.e. roll your own in C++ / Python / whatever.
Specific to your example, I would imagine the best idea would be to create your own XML->XML converter that follows your business logic. Next, write an XSLT translator that just knows about formatting and does nothing clever. That might be a nice middle ground, but it totally depends on what you are doing. Having an XSLT translator on the output makes it easier to create alternative output formats: printable, for mobiles, etc.
Yes, I use it a lot. By using different XSLT files, I can use the same XML source to create multiple polyglot (X)HTML files (presenting the same data in different ways), an RSS feed, an Atom feed, an RDF descriptor file and a fragment of a site map.
It's not a panacea. There are things it does well, and things it doesn't do well, and like all other aspects of programming, it's all about using the right tool for the right job. It's a tool that's well worth having in your toolbox, but it should be used only when it's appropriate to do so.
I would definitely recommend sticking it out, particularly if you are using Visual Studio, which has built-in editing, viewing and debugging tools for XSLT.
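The "one XML source, several outputs" setup described above can be sketched in plain Python as well, with one function standing in for each stylesheet. Everything here (the `articles` document, `to_html_list`, `to_rss`) is invented for illustration, and the RSS output is deliberately incomplete: a real feed also needs the channel's own title, link and description.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SOURCE = """<articles>
  <article><title>First post</title><link>http://example.com/1</link></article>
  <article><title>Q &amp; A</title><link>http://example.com/2</link></article>
</articles>"""

def to_html_list(root):
    """Render the articles as an HTML list of links."""
    items = [
        "<li><a href=%s>%s</a></li>"
        % (quoteattr(a.findtext("link")), escape(a.findtext("title")))
        for a in root.findall("article")
    ]
    return "<ul>" + "".join(items) + "</ul>"

def to_rss(root):
    """Render the same articles as the item list of an RSS 2.0 channel."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for a in root.findall("article"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = a.findtext("title")
        ET.SubElement(item, "link").text = a.findtext("link")
    return ET.tostring(rss, encoding="unicode")

root = ET.fromstring(SOURCE)
print(to_html_list(root))
print(to_rss(root))
```

The source document is parsed once and never modified; each output format lives in its own function, much as each format lives in its own .xslt file in the setup above.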
Yes, it is a pain while you are learning, but most of the pain is to do with familiarity. The pain does diminish as you learn the language.
W3schools has two articles that are of particular worth:
http://www.w3schools.com/xpath/xpath_functions.asp
http://www.w3schools.com/xsl/xsl_functions.asp
I have found XSLT to be quite difficult to work with.
I have had experience working on a system somewhat similar to the one you describe. My company noted that the data we were returning from "the middle tier" was in XML, and that the pages were to be rendered in HTML which might as well be XHTML, plus they'd heard that XSL was a standard for transforming between XML formats. So the "architects" (by which I mean people who think deep design thoughts but apparently never code) decided that our front tier would be implemented by writing XSLT scripts that transformed the data into the XHTML for display.
The choice turned out to be disastrous. XSLT, it turns out, is a pain to write. And so all of our pages were difficult to write and to maintain. We would have done much better to have used JSP (this was in Java) or some similar approach that used one kind of markup (angle brackets) for the output format (the HTML) and another kind of markup (like <%...%>) for the meta-data. The most confusing thing about XSLT is that it is written in XML, and it translates from XML to XML... it is quite difficult to keep all 3 different XML documents straight in one's mind.
Your situation is slightly different: instead of authoring each page in XSLT as I did, you only need to write ONE bit of code in XSLT (the code to convert from templates to display). But it sounds like you may have run into the same kind of difficulty that I did. I would say that trying to interpret a simple XML-based DSL (domain specific language) like you are doing is NOT one of the strong points of XSLT. (Although it CAN do the job... after all, it IS Turing complete!)
However, if what you had was simpler (data in one XML format that you wanted to make simple alterations to; not a full page-description DSL, just some simple, straightforward modifications), then XSLT is an excellent tool for that purpose. Its declarative (not procedural) nature is actually an advantage for that purpose.
-- Michael Chermside
XSLT is difficult to work with, but once you conquer it you will have a very thorough understanding of the DOM and schema. If you also learn XPath, then you are on your way to learning functional programming, and this will expose you to new techniques and ways of solving problems. In some cases, successive transformation is more powerful than procedural solutions.
I use XSLT extensively, for a custom MVC-style front end. The model is "serialized" to XML (not via XML serialization), and then converted to HTML via XSLT. The advantages over ASP.NET lie in the natural integration with XPath and the more rigorous well-formedness requirements (it's much easier to reason about document structure in XSLT than in most other languages).
Unfortunately, the language has several limitations (for example, the inability to transform the output of another transform), which mean that it's occasionally frustrating to work with.
Nevertheless, the easily achievable, strongly enforced separation of concerns which it grants isn't something I see another technology providing right now, so for document transforms it's still something I'd recommend.
I used XML, XSD and XSLT on an integration project between very dissimilar DB systems sometime in 2004. I had to learn XSD and XSLT from scratch, but it wasn't hard. The great thing about these tools was that they enabled me to write data-independent C++ code, relying on XSD and XSLT to validate/verify and then transform the XML documents. Change the data format, and you change the XSD and XSLT documents, not the C++ code, which employed the Xerces libraries.
For interest: the main XSD was 150KB and the average size of the XSLT was < 5KB IIRC.
The other great benefit is that the XSD is a specification document that the XSLT is based on. The two work in harmony. And specs are rare in software development these days.
Although I did not have too much trouble learning the declarative nature of XSD and XSLT, I did find that other C/C++ programmers had great trouble adjusting to the declarative way. When they saw that was it, "ah, procedural," they muttered, "now that I understand!" And they proceeded (pun?) to write procedural XSLT! The thing is, you have to learn XPath and understand the axes of XML. It reminds me of old-time C programmers adjusting to OO when writing C++.
I used these tools as they enabled me to write a small C++ code base that was isolated from all but the most fundamental of data structure modifications and these latter were DB structure changes. Even though I prefer C++ to any other language I'll use what I consider to be useful to benefit the long term viability of a software project.
I used to think XSLT was a great idea. I mean it is a great idea.
Where it fails is the execution.
The problem I discovered over time was that programming languages in XML are just a bad idea. It makes the whole thing impenetrable. Specifically, I think XSLT is very hard to learn, code and understand. The XML on top of the functional aspects just makes the whole thing too confusing. I have tried to learn it about 5 times in my career, and it just doesn't stick.
OK, you could 'tool' it -- I think that was partly the point of its design -- but that's the second failing: all the XSLT tools on the market are, quite simply ... crap!
The XSLT specification defines XSLT as "a language for transforming XML documents into other XML documents". If you are trying to do anything but the most basic data processing within XSLT, there are probably better solutions.
Also worth noting that the data processing capabilities of XSLT can be extended in .NET using custom extension functions:
MSDN Documentation
CSharpFriends: Tutorial
I maintain an online documentation system for my company. The writers create the documentation in SGML (an XML-like language). The SGML is then combined with XSLT and transformed into HTML.
This allows us to easily make changes to the documentation layout without doing any coding. It's just a matter of changing the XSLT.
This works well for us. In our case, it's a read-only document. The user isn't interacting with the documentation.
Also, by using XSLT, you are working closer to your problem domain (HTML). I always consider that to be a good idea.
Lastly, if your current system WORKS, leave it alone. I would never suggest trashing your existing code. If I was starting from scratch, I would use XSLT, but in your case, I would use what you have.
It comes down to what you need it for. Its main strength is the easy maintainability of the transform, and writing your own parser generally obliterates that. With that said, sometimes a system is small and simple and really doesn't need a "fancy" solution. As long as your code-based builder is replaceable without having to change other code, no big deal.
As for the ugliness of XSL, yes it's ugly. Yes, it takes some getting used to. But once you get the hang of it (shouldn't take long IMO), it's actually smooth sailing. Compiled transforms run quite quickly in my experience, and you can certainly debug into them.
I still believe that XSLT can be useful, but it is an ugly language and can lead to an awful, unreadable, unmaintainable mess. Partly because XML is not human-readable enough to make up a "language" and partly because XSLT is stuck somewhere between being declarative and procedural. Having said that, and I think a comparison can be drawn with regular expressions, it has its uses when it comes to simple, well-defined problems.
Using the alternative approach and parsing XML in code can be equally nasty and you really want to employ some kind of XML marshalling/binding technology (such as JiBX in Java) that will convert your XML straight to an object.
If you can use XSLT in a declarative style (although I don't entirely agree that it is a declarative language) then I think it is useful and expressive.
I've written web apps that use an OO language (C# in my case) to handle the data/processing layer, but output XML rather than HTML. This can then be consumed directly by clients as a data API, or rendered as HTML by XSLTs. Because the C# was outputting XML that was structurally compatible with this use, it was all very smooth, and the presentation logic was kept declarative. It was easier to follow and change than sending the tags from C#.
However, as you require more processing logic at the XSLT level it gets convoluted and verbose - even if you "get" the functional style.
Of course, these days I'd probably have written those web apps using a RESTful interface - and I think data "languages" such as JSON are gaining traction in areas where XML has traditionally been transformed by XSLT. But for now XSLT is still an important, and useful, technology.
I have spent a lot of time in XSLT and found that while it is a useful tool in some situations, it is definitely not a fix-all. It works very well for B2B purposes when it is used for data translation for machine-readable XML input/output. I don't think you are on the wrong track in your statement of its limitations. One of the things that frustrated me the most was the nuances in the implementations of XSLT.
Perhaps you should look at some of the other markup languages available. I believe Jeff did an article about this very topic concerning Stack Overflow.
Is HTML a Humane Markup Language?
I would take a look at what he wrote. You can probably find a software package that does what you want "out of the box", or at least very close instead of writing your own stuff from the ground up.
I'm currently tasked with scraping data from a public site (yeah, I know). Thankfully it conforms to XHTML, so I'm able to use XSLT to gather the data I need. The resulting solution is readable, clean and easy to change if the need arises. Perfect!
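The reason well-formed XHTML makes this so pleasant is that any XML tool can query it directly. As a rough illustration of the same idea outside XSLT, here is a sketch with Python's standard library; the function name `extract_posts` and the `post` class are assumptions, not anything from the site in question. It only works when the page really is well-formed XML, and pages that use HTML named entities such as `&nbsp;` would need extra handling.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

def extract_posts(xhtml_text, css_class="post"):
    """Return the text of every <div> whose class list contains `css_class`."""
    root = ET.fromstring(xhtml_text)
    posts = []
    for div in root.iterfind(".//x:div", XHTML_NS):
        classes = div.get("class", "").split()
        if css_class in classes:
            # itertext() walks nested elements, so inline markup is flattened.
            posts.append("".join(div.itertext()).strip())
    return posts

page = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div class="post first">Hello <b>world</b></div>'
    '<div class="nav">skip</div>'
    '<div class="post">Bye</div>'
    "</body></html>"
)
print(extract_posts(page))  # ['Hello world', 'Bye']
```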
I've used XSLT before. The group of 6 .xslt files (refactored out of one large one) was about 2750 lines long before I rewrote it in C#. The C# code is currently 4000 lines containing lots of logic; I don't even want to think about what that would have taken to write in XSLT.
The point where I gave up was when I realized that not having XPath 2.0 was significantly hurting my progress.
To answer your three questions:
I've used XSLT once some years ago.
I do believe XSLT could be the right solution in certain circumstances. (Never say never)
I tend to agree with your assessment that it is mostly useful for 'simple' transformations. But I think as long as you understand XSLT well, there is a case to be made for using it for bigger tasks like publishing a website as XML transformed into HTML.
I believe the reason many developers dislike XSLT is because they do not understand the fundamentally different paradigm it is based on. But with the recent interest in functional programming we might see XSLT making a comeback...
One place where XSLT really shines is in generating reports. I've found that a two-step process works well: the first step exports the report data as an XML file, and the second step generates the visual report from the XML using XSLT. This allows for nice visual reports while still keeping the raw data around as a validation mechanism if need be.
At a previous company we did a lot with XML and XSLT. Both the XML and the XSLT were big.
Yes there is a learning curve, but then you have a powerful tool to handle XML. And you can even use XSLT on XSLT (which can sometimes be useful).
Performance is also an issue (with very large XML), but you can tackle that by writing smart XSLT and doing some preprocessing on the (generated) XML.
Anybody with knowledge of XSLT can change the appearance of the finished product because it is not compiled.
I personally like XSLT, and you may want to give the simplified syntax a look (no explicit templates, just a regular old HTML file with a few XSLT tags to spit values into it), but it just isn't for everyone.
Maybe you just want to offer your authors a simple Wiki or Markdown interface. There are libraries for that, too, and if XSLT isn't working for you, maybe XML isn't working for them either.
XSLT is not the end-all be-all of xml transformation. However, it's very difficult to judge based on the information given if it would have been the best solution to your problem or if there are other more efficient and maintainable approaches. You say the authors could enter their content in a simplified format - what format? Text boxes? What kind of html were you converting it to?
To judge whether XSLT is the right tool for the job, it would help to know the features of this transformation in more detail.
I enjoy using XSLT only for changing the tree structure of XML documents. I find it cumbersome to do anything related to text processing and relegate that to a custom script that I may run before or after applying an XSLT to an XML document.
XSLT 2.0 included a lot more string functions, but I think it's not a good fit for the language, and there aren't many implementations of XSLT 2.0.
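As an illustration of what "changing the tree structure" means here, the sketch below regroups a flat list of elements under one parent per attribute value, a classic restructuring job. It is written in Python rather than XSLT purely so it can be run as-is; the function `group_by_attribute` and the `books` sample are made up, and it copies only each element's text, not nested children or other attributes.

```python
import xml.etree.ElementTree as ET

def group_by_attribute(root, child_tag, attr, group_tag):
    """Return a new tree with `child_tag` elements nested under one
    `group_tag` element per distinct value of `attr` (first-seen order)."""
    new_root = ET.Element(root.tag)
    groups = {}
    for child in root.findall(child_tag):
        key = child.get(attr, "")
        if key not in groups:
            groups[key] = ET.SubElement(new_root, group_tag, {"name": key})
        item = ET.SubElement(groups[key], child_tag)
        item.text = child.text
    return new_root

flat = ET.fromstring(
    "<books>"
    '<book genre="sf">Dune</book>'
    '<book genre="crime">Gorky Park</book>'
    '<book genre="sf">Solaris</book>'
    "</books>"
)
grouped = group_by_attribute(flat, "book", "genre", "genre")
print(ET.tostring(grouped, encoding="unicode"))
```

Any text clean-up (trimming, splitting, reformatting strings) would then happen in a separate pass before or after this step, in line with the division of labour described above.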