Simple HTML interface to XSD?

I'm writing an app that, at its heart, uses a hierarchical tree of nodes
in XML. It looks like this:
<node>
    <name>Node1</name>
    <Attribute1>Something</Attribute1>
    <Attribute2>SomethingElse</Attribute2>
    <child>Node2</child>
    <child>Node4</child>
    <child>Node7</child>
</node>
And so on (all child elements must refer to an existing node, though the node in question doesn't have to precede the first reference to it).
For a simple structure like this, is there a simple tool to generate an HTML page that will allow a user to enter nodes and dynamically update a server-side XML file?
I'm basically writing a tool that will use such a file, but the people whose job it is to create the file aren't especially techno-literate, so creating the XML by hand is a no-no.
I could hand-crank one fairly quickly, but if I can get a tool to do it, even better (especially as the format may change in the future).

Xopus is a browser-based XML editor that you could use for this. It is designed for the non-techno-literate people out there.
Disclaimer: I work at Xopus.

I am pretty sure there is nothing that will do that for you automagically and you'll need to write that bit yourself.
Your options are to create a web-based interface, using an HTML form POST and writing the output to a file or database (then reloading it on submission), or something more advanced with JavaScript (e.g. updating the page dynamically with AJAX); there's a rough sketch of the form-POST route at the end of this answer.
You can't do it in HTML alone - either way you'd need something to handle outputting the existing data and accepting HTTP POST requests, but you don't mention what language or platform you are using to write this. Being clear on that will help people suggest appropriate solutions.
You might want to rethink the XML structure. Elements called "Attribute{anything}" are ill-advised (as are elements named in the convention foo1, foo2, etc.). The whole <child>Node2</child> thing doesn't seem like a good way to go either. I suggest posting an actual example of the XML in question.
From what you've said, it sounds like there is no specific need for it to be in XML at all. Not that XML is bad (it isn't), but if putting it in an SQL database is a valid option and you have one of those anyway (e.g. you're using a LAMP stack), then that's something to consider.
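As a concrete (if minimal) sketch of that form-POST route: the handler below accepts a form submission and appends a node to the server-side file. Flask, the file name, and a <nodes> wrapper element are assumptions for illustration, since you haven't named a platform; any stack with form handling works the same way.
# Rough sketch of the web-based route. Flask and the <nodes> root
# element wrapping the <node> entries are assumptions for illustration.
import xml.etree.ElementTree as ET
from flask import Flask, request

app = Flask(__name__)
XML_PATH = "nodes.xml"  # hypothetical location of the server-side file

@app.route("/add-node", methods=["POST"])
def add_node():
    tree = ET.parse(XML_PATH)
    root = tree.getroot()  # assumed to be a <nodes> wrapper element
    node = ET.SubElement(root, "node")
    ET.SubElement(node, "name").text = request.form["name"]
    for child_name in request.form.getlist("child"):
        ET.SubElement(node, "child").text = child_name
    tree.write(XML_PATH, encoding="utf-8")
    return "Saved"
You would still want to validate that each <child> names an existing node before writing, per the constraint in the question.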

Would an XML editor like http://www.oxygenxml.com/ suffice? I don't know of any HTML/web-based ones unless you write one yourself and use AJAX to send the data. At least an XML editor can generate a form that you can use to create and edit XML documents. Microsoft also does InfoPath, which is actually designed more for questionnaires but might do what you need, if the non-technical people would prefer something more Office-like.

Related

Is there a way to define a variable in HTML?

I have some text that is pretty repetitive on a web page. It can change from time to time. If I write it in 10 different places and then need to change it, I have to go through the HTML and change it in every single place.
As an example of what I'm looking for, instead of writing "Stack Overflow is the best website ever" over and over again, is there a way that I could write something like variable:stackoverflow="Stack Overflow is the best website ever" in one place then write "stackoverflow" in the 10 different places I need the text. That way if/when I need to change it I can just change it in the one place, not the 10 different places on the page.
No.
Typically this problem is solved by generating HTML using a programming language, often in combination with a template language.
This can be done either at:
build time (generating static files and uploading them to a server), e.g. using a tool such as assemble
server side (on demand when the page is requested), e.g. using FastCGI and the programming language of your choice
client side (using JavaScript running in the browser … although this isn't considered best practise).
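For a flavour of what any of the options above looks like, here is a minimal sketch using Python with Jinja2 (a template engine chosen purely for illustration):
# Sketch: define the repeated text once and substitute it everywhere.
# Jinja2 is only one example; any template engine offers the same idea.
from jinja2 import Template

page = Template("""
<p>{{ stackoverflow }}</p>
<p>Elsewhere on the page: {{ stackoverflow }}</p>
""")

print(page.render(stackoverflow="Stack Overflow is the best website ever"))
Changing the text then means changing one render argument, not ten places in the HTML.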
You can't declare such variables in HTML. I guess the best way is to use PHP. PHP is executed by the server, and an HTML document is sent to the client. JavaScript is executed by the client, and some users disable JavaScript execution. So in my opinion, using PHP is the best way.
For displaying text in PHP, you should check out the website PHP.net.

Rails, HTML to JSON?

Given a static HTML page, is there an automated way to generate JSON?
For a large website that contains a lot of static HTML, I want to generate JSON for RSS feeds and search functionality, and am looking for a way to convert HTML to JSON.
I could obviously write JSON templates for every page and every language, but that would be unmaintainable. It would double an 800-page website to 1,600 pages, and that is not an option.
One approach I thought of would be to write a bot that loops through the routes to index the pages and saves the data to a database, which would give me all the choices I could wish for in search backends, such as Solr, Elasticsearch, Thinking Sphinx, etc.
I could use Capybara to aid me in this by visiting each path and extracting the text to save to a database in a rake task run as a background job, but I'm not sure how that would work in a production environment. It also seems that such a common requirement might already have been solved, but for the life of me I can't find anything.
I would be far happier (I think) if I could find a way to convert HTML text content to JSON.
Any ideas? Has this already been done? Are there any gems that might help? Or is there built-in functionality that I have not thought of, maybe a way to get HTML into a hash that could then be converted into JSON? Whatever the approach, it needs to be automated. I'm just stuck on the best approach.
Basically, HTML looks a lot like XML, but with fixed tag meanings, so you could use XML-to-JSON conversion, since it all ends up as a tree of HTML tags embedded in each other.
And so your question becomes this question. Except you might run into problems with single (void) tags that have no closing tag, so you might find all of these and put a closing tag after each one before trying to read the result as a hash from XML. By the way, in general, for parsing text data you should look at regular expressions.
I chose to go with a Nokogiri solution in the end and wrote a parser to meet my needs.
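The parser itself isn't shown here, but the general shape of such a solution is small: parse each page, pull out the fields you care about, and serialize the result. A hedged sketch follows, in Python with BeautifulSoup for illustration (the actual solution used Nokogiri in Ruby; the extracted fields are hypothetical examples):
# Sketch of the parse-and-serialize idea. Python/BeautifulSoup stands in
# for the Ruby/Nokogiri solution; the shape is the same in both.
import json
from bs4 import BeautifulSoup

def page_to_json(html):
    soup = BeautifulSoup(html, "html.parser")
    record = {
        "title": soup.title.string if soup.title else None,
        "headings": [h.get_text(strip=True)
                     for h in soup.find_all(["h1", "h2"])],
        "text": soup.get_text(separator=" ", strip=True),
    }
    return json.dumps(record)

print(page_to_json("<html><head><title>Home</title></head>"
                   "<body><h1>Welcome</h1><p>Hello.</p></body></html>"))
Run over every route, the resulting records can feed an RSS template or a search index directly.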

What techniques are available for programmatically transforming HTML/DOM in an iOS Application?

I'm processing a variety of RSS feeds, which contain summaries, as well as the target page URL content, and trying to use a uniform transformation method.
XSLT was the first thing that occurred to me to try, as it would accomplish what I want, in a standard way, without a lot of fuss aside from adding new XSLT stylesheets to accommodate uniquely formatted sites and feed content.
Problem: XSLT libraries are considered "private" in iOS, and even linking statically against your own copy will get you rejected by the Apple Store analysis tools.
I've looked into the possibility of injecting the stylesheet and data into a UIWebView that isn't displayed, but this seems like a really roundabout and hackish way to get at the system's underlying XSLT processor in an "approved" fashion.
What alternative techniques/libraries exist that would let me do this in a standard fashion, i.e. without rolling my own?
I'm not sure I fully understand your requirements, but one possibility would be to use libxml (which is allowed in iOS) to parse the XML and, if necessary, manipulate the DOM. If you really need to do XML transformations this is going to be more effort than XSLT, but if you just need to extract data from the XML, that can be done fairly easily with XPath queries.
That said, I have read several people claiming they got XSLT working on iOS and had their apps approved in the app store. In particular, I've seen this stackoverflow answer claimed as a working solution by multiple people. And if that fails, another answer suggested building the libxslt library yourself with renamed symbols to bypass the app store checks. I would only suggest that as a last resort though.
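To illustrate the XPath route from the first paragraph, here is a sketch; it uses Python's lxml (a libxml2 binding) purely because the query logic reads the same as it would through libxml's C API on iOS. The feed snippet is made up:
# Sketch of XPath-based extraction. lxml wraps libxml2, the same library
# allowed on iOS, so these queries carry over to its C API unchanged.
from lxml import etree

doc = etree.fromstring(
    b"<rss><channel>"
    b"<item><title>First post</title><link>http://example.com/1</link></item>"
    b"<item><title>Second post</title><link>http://example.com/2</link></item>"
    b"</channel></rss>")

titles = doc.xpath("//item/title/text()")
links = doc.xpath("//item/link/text()")
print(list(zip(titles, links)))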
You'll probably want to look into Hpple for something powerful but lightweight/native. See the tutorial on getting started here: http://www.raywenderlich.com/14172/how-to-parse-html-on-ios. Good luck!
I'm also going to recommend TFHpple, but I'm going to elaborate on the solution. I've explored an app that navigates a 3rd-party (well, I'm the 3rd party, they're the source, but that's semantics) website/data source, but there are some pitfalls. The biggest pitfall is obvious: if the data source DOM changes, you need to change your app and re-release. A creative way around this would be to publish/expose a global copy of the DOM on a public server; that way the end user doesn't have to update their app any time the data source changes (as long as the change isn't radical).
For instance, if your expected DOM search in TFHpple is @"//figure[@class='figure']/a" and then a week from now your data source's resource is altered to @"//figure1[@class='figure1']/a", you just opened yourself up to an App Store release... UNLESS... you publish the expected DOM searches on a web server you control, in a data dictionary that your app can consume and serve out to the various DOM search elements within your app. The only problem I foresee here is that if the data source adds or removes a data element you want to consume, you either have to release a build or handle the removal ahead of time (respectively).
Lastly if the data source DOM isn't well formed or consistent you may be beating your head against a wall more times than not.

How can I create a well-formatted PDF?

I'm working on automating our company's invoicing system. Currently all data is stored in our local MySQL database, and someone manually updates an Excel spreadsheet and then merges this data into an MS Word template. The goal is to automate this process so that the invoice can be generated from our intranet website as a PDF.
My original plan was to create a template in HTML/CSS and use wkhtmltopdf to generate the PDF, but I ran into problems getting a repeatable header and footer on each page: thead and tfoot aren't supported by WebKit, and the fix suggested in this other question does not seem to work either.
So I then stumbled on using XML and XSL-FO, the latter I know nothing about. Is this the best path to take? Are there any libraries or utilities out there that will make converting my HTML+CSS into XML+XSL-FO easier? Are there any other alternatives I'm overlooking?
EDIT
Currently the server is CentOS Linux with a MySQL database. All other code is in PHP at the moment, but that may change as the whole system is being revamped. Linux and MySQL will almost certainly remain, though.
For your requirement, XSL-FO might just do the trick. It is much cleaner to produce the PDFs directly from the data than to go down the cumbersome HTML path, unless you need to display the HTML as well; then you might consider converting from HTML to PDF, but it will always be messy.
You can get XML results from MySQL quite easily (mysql --xml), and then you write one (or several) XSL-FO stylesheets for the data. Then you can produce not only PDFs, but also PostScript files or RTFs with some processors.
XSL-FO has its limitations, though, but for your situation it should suffice.
I admit the learning curve can be steep, and maintaining XSLT stylesheets can get very tiring, but as you start knowing more about it, you end up writing less code.
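Wired together, that pipeline is only a couple of commands. Here is a sketch, driven from Python for convenience; the database name, query, and invoice.xsl stylesheet are hypothetical, and Apache FOP stands in as the XSL-FO processor:
# Sketch of the mysql --xml -> XSL-FO -> PDF pipeline. The "billing"
# database, the query, and invoice.xsl are hypothetical placeholders.
import subprocess

with open("invoices.xml", "wb") as out:
    subprocess.run(
        ["mysql", "--xml", "-e", "SELECT * FROM invoices", "billing"],
        stdout=out, check=True)

subprocess.run(
    ["fop", "-xml", "invoices.xml", "-xsl", "invoice.xsl",
     "-pdf", "invoice.pdf"],
    check=True)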
Another possibility is to do the whole thing in e.g. Java or C#: send SELECT statements, loop over the results, and iteratively build the PDF using a library like iText.
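A rough sketch of that query-loop-draw pattern, shown in Python with ReportLab rather than iText (the rows are made-up stand-ins for SELECT results):
# Sketch of building a PDF row-by-row from query results.
# ReportLab stands in for iText; the invoice rows are hypothetical.
from reportlab.pdfgen import canvas

rows = [("2023-001", "Acme Corp", "1,200.00"),
        ("2023-002", "Globex", "850.00")]

pdf = canvas.Canvas("invoice.pdf")
y = 750
pdf.drawString(72, y, "Invoice summary")
for number, client, total in rows:
    y -= 20
    pdf.drawString(72, y, f"{number}  {client}  {total}")
pdf.save()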
You could try JODReports or Docmosis as less code-intensive options. You supply Word or OpenOffice Writer documents to act as templates and use these engines to manipulate/populate the templates, then spit out the documents in the format(s) you require. This may mean your existing Word templates can be used directly, which should save you some effort/time.
iText is another library that will let you build and pump out PDFs from code. It's pretty good.
If you could use ASP.NET for the web part, you can use the free ReportViewer library and designer for automated publishing of PDFs.
Here are some references:
http://gotreportviewer.com
http://weblogs.asp.net/srkirkland/archive/2007/10/29/exporting-a-sql-server-reporting-services-2005-report-directly-to-pdf-or-excel.aspx
If you're OK using .NET and C#, you could use DotPdf from Atalasoft (obligatory disclaimer: I work for Atalasoft and wrote most of DotPdf). The Generating namespace is geared for exactly what you're trying to do: automate report generation. From the very basics, you could just create docs directly with the toolkit or you can create template documents that have unpopulated text fields that you can reload and fill later (see here and here for examples).

How can I extract HTML content efficiently with Perl?

I am writing a crawler in Perl, which has to extract contents of web pages that reside on the same server. I am currently using the HTML::Extract module to do the job, but I found the module a bit slow, so I looked into its source code and found out it does not use any connection cache for LWP::UserAgent.
My last resort is to grab HTML::Extract's source code and modify it to use a cache, but I really want to avoid that if I can. Does anyone know any other module that can perform the same job better? I basically just need to grab all the text in the <body> element with the HTML tags removed.
I use pQuery for my web scraping. But I've also heard good things about Web::Scraper.
Both of these along with other modules have appeared in answers on SO for similar questions to yours:
how can i screen scrape with perl
how can i extract xml of a website and save in a file using perls lwp
how do i extract an html title with perl
can you provide an example of parsing html with your favorite parser
how do I extract content from html file using perl
HTML::Extract's features look very basic and uninteresting. If the modules that draegfun mentioned don't interest you, you could do everything that HTML::Extract does using LWP::UserAgent and HTML::TreeBuilder yourself, without requiring very much code at all, and then you would be free to work in caching on your own terms.
I've been using Web::Scraper for my scraping needs. It's very nice indeed for extracting data, and because you can call ->scrape($html, $originating_uri) then it's very easy to cache the result you need as well.
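Since connection reuse and result caching were the original pain points, here is the shape of that loop as a sketch. It's in Python purely for illustration; the Perl equivalent would pair LWP::UserAgent's connection cache with Web::Scraper as described above:
# Sketch of the fetch-extract-cache loop (Python for illustration; the
# same shape works with LWP::UserAgent + LWP::ConnCache + Web::Scraper).
import requests
from bs4 import BeautifulSoup

session = requests.Session()   # reuses connections across requests
extracted = {}                 # extracted body text, keyed by URI

def body_text(uri):
    if uri not in extracted:
        html = session.get(uri).text
        body = BeautifulSoup(html, "html.parser").body
        extracted[uri] = body.get_text(separator=" ", strip=True) if body else ""
    return extracted[uri]
Because pages on the same server share one session, the TCP connection is kept alive, and repeat visits come out of the cache instead of re-fetching.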
Do you need to do this in real-time? How does the inefficiency affect you? Are you doing the task serially so that you have to extract one page before you move onto the next one? Why do you want to avoid a cache?
Can your crawler download the pages and pass them off to something else? Perhaps your crawler can even run in parallel, or in some distributed manner.