What techniques are available for programatically transforming HTML/DOM in an iOS Application? - html

I'm processing a variety of RSS feeds, which contain summaries, as well as the target page URL content, and trying to use a uniform transformation method.
XSLT was the first thing that occurred to me to try, as it would accomplish what I want, in a standard way, without a lot of fuss aside from adding new XSLT stylesheets to accommodate uniquely formatted sites and feed content.
Problem: XSLT libraries are considered "private" in iOS, and even linking statically against your own copy will get you rejected by the Apple Store analysis tools.
I've looked into the possibility if injecting the stylesheet and data into a UIWebView that wasn't displayed, but this seems like a really roundabout and hackish way to get at the system's underlying XSLT processor in an "approved" fashion.
What alternative techniques/libraries exist which would let me do this in a standard fashion, ie: without rolling my own.

I'm not sure I fully understand your requirements, but one possbility would be to use libxml (which is allowed in iOS) to parse the XML and if necessary manipulate the DOM. If you really need to do XML transformations this is going to be more effort than XSLT, but if you just need to extract data from the XML, that can be done fairly easily with xpath queries.
That said, I have read several people claiming they got XSLT working on iOS and had their apps approved in the app store. In particular, I've seen this stackoverflow answer claimed as a working solution by multiple people. And if that fails, another answer suggested building the libxslt library yourself with renamed symbols to bypass the app store checks. I would only suggest that as a last resort though.

You'll probably want to look into Hpple for something powerful but light weight / native. See the tutorial on getting started here: http://www.raywenderlich.com/14172/how-to-parse-html-on-ios. Good luck!

I'm going to also recommend TFHpple but I'm also going to elaborate on the solution. I've explored an app that navigates a 3rd party (well, I'm the 3rd party, they're the source but that's semantics) website/data source but there are some pitfalls. The biggest pitfall is obvious: if the data source DOM changes you need to change your app and re-release. A creative way around this would be to publish/expose a global copy of the DOM on a public server that way the end user doesn't have to update their app any time the data source changes (as long as the change isn't radical).
For instance, if your expected DOM search in TFHpple is #"//figure[#class='figure']/a" and then a week from now your data source's resource you're looking for is altered to #"//figure1[#class='figure1']/a" you just opened yourself to an App Store release... UNLESS... you publish the expected DOM searches on a web server you control in a data dictionary that your app can consume and serve out to the various DOM search elements within your app. The only problem I foresee here is that if the data source adds or removes a data element you want to consume you either have to release a build or handle the removal ahead of time (respectively).
Lastly if the data source DOM isn't well formed or consistent you may be beating your head against a wall more times than not.

Related

Azure ARM Template (JSON) Self-Reference

I'm creating some default "drag and drop" templates for our developers, and one section is the required tags. Most of the tags reference a variable: nice and easy. But one wants to reference the resource itself and I cannot figure out a way to it. Does anyone have any suggestions?
The tag itself is called "Context" and it's value should be the "type" of the resource it is in, e.g. "Microsoft.Web/serverfarms". This is desired to aid with billing. Obviously I could either create a different template per resource type (not ideal considering the number of different resources) or rely on the devs to update the field manually (not ideal either as relying on them to add the tags manually hasn't worked so far in a lot of cases), but I am trying to automate it.
Extrapolating from the [variables('< variablename >')] function I did try [resources('type')] but Azure complained that "resources is not a valid selection". I thought it might have complained that it couldn't tell which resource to look at, but it didn't get that far. Internet searches have not turned up anything useful so far.
I can't find a way to do this cleanly either (I hope someone corrects me though! This is a topic for us too). The reference and resourceId functions look promising, but both are unavailable inside of the resources block, would require some parsing, and also require the api version, which you probably also need to vary by resource and so you're just back to where you started. ARM won't even let you use a variable for the resource type property(probably a good thing), so that option is out too.
As such, you'll either have to live with your team having to replace that chunk of text manually or pursue some alternative.
The simplest thing that comes to mind would be to write a script in a language that understands JSON. That script reads the template, adds the tag to the resource, then saves the template again.
A similar approach would be to do it after the resources are deployed by writing a script that loops through all resources and making sure they have the tag. You can use automation to schedule this on a regular basis if you're concerned about it being missed. If you're deploying the templates using a script, you could add it in that script too.
There's some things you probably do with nested templates, but you probably wouldn't be making anyone's life easier or making the process more reliable.
This could be achievable potentially through some powershell specifically around Resource and Resource Group. Would need to run a Get-AzResource either at the subscription or potentially just the resource group level. Then pull the ResourceType field from the object return and use a Set-AzResource command passing in the ResourceID from above and the new tag mapped to the returnedResourceType field.

How to turn off phone carrier HTML optimizations?

I made an app which provides a schedule for the pupils at my school. It gets its data from the school's online schedule service. Due to the lack for a real API, I reverse-engineerd the website: Now, the app parses it with string operations basically.
And here's the problem: The string searches do not match on certain mobile carriers' networks because they're stripping away the spaces and other foo. Is there an universal way to turn that off?
No, this is up to the carrier and even if there was a way to disable it, it would be non-standard and not worth addressing.
Additionally, you should not use string operations but a real HTML parser, like JSoup is for Java (there is a .NET port too, NSoup). If you look at the examples, it is relatively easy to use and will protect your application from space normalizations and any other change in the markup irrelevant to your application.
For data stored in inline JavaScript, you could first extract the right node from the document and then use a regex to trim the relevant parts. Or you could also use a regex on the HTML document as a whole, but remember that you can't really parse HTML using regexes.
Adopting another strategy, request pages over HTTPs rather than HTTP (if the server supports TLS/SSL) so that they can't be manipulated by the carrier.

HTML5: accessing large structured local data

Summary:
Are there good HTML5/javascript options for selectively reading chunks of data (let's say to be eventually converted to JSON) from a large local file?
Problem I am trying to solve:
Some existing program locally and outputs a ton of data. I want to provide a browser-based interactive viewer that will allow folks to browse through these results. I have control over how the data is written out. I can write it all out in one big file, but since it's quite large, I can't just read the whole thing in memory. Hence, I am looking for some kind of indexed or db-like access to this from my webapp.
Thoughts on solutions:
1. Brute-force: HTML5 FileReader API has a nice slice() method for random access. So I could write out some kind of an index in the beginning of the file, use it to look up positions of other stored objects, and read them whenever they're needed. I figured I'd ask if there are already javascript libraries that do something like this (or better) before trying to implement this ugly thing.
2. HTML5 local database. Essentially, I am looking for an analog of HTML5 openDatabase() call that would open (a read-only) connection to a database based on a user-specified local file. From what I understand, there's no way to specify a file with a pre-loaded database. Furthermore, even if there was such a hack, it's not clear whether the local file format would be the same across browsers. I've seen the phonegap solution that populates the browser local database from SQL statements. I can do that too, but the data I am talking about is quite large (5-10GB): it will take a while to load, and such duplication seems rather pointless.
HTML5 does not sound like the appropriate answer for your needs. HTML5's focus is on the client side, and based on your description you're asking a lot out of the browsers, most likely more than they can handle.
I would instead recommend you look at a server-based solution to deliver the desired goal/results to the client view, something like Splunk would be a good product to consider.

Programmatically generate high quality PDFs

Note: I realize this question has already been asked (with a ruby slant) here: Creating on-demand, print-quality PDFs (preferably in Ruby if feasible). BUT there was no decent answer IMHO.
So as you may have guessed, I am looking to find the best approach to producing HIGH QUALITY, print ready PDF documents programmatically. Our requirements need us to be able to have design documents that define place holders for dynamic content like images and text i.e. some kind of template mechanism.
The suggestion has been to use Adobe's InDesign server, but this seems like an expensive solution not to mention a little overkill for our need.
Are there any alternative, cheaper and more fitting solutions out there? The language of the solution doesn't really matter, just as long as it can be executes on a Windows box.
My suggestion would be to look at XSL-FO or thereabouts...
You create an XML doc that describes what you want and there are various libraries and toolkits (I've used XEP from RenderX) that will convert said XML into PDF.
In real terms what we did was take a large lump of data in XML format, use XSLT - templates in effect - to convert the data to formating objects which XEP renders up into something (a 500 page hotel directory with auto-generated TOC and Index) that has been consumed quite happily by at least three different commercial printers. We did some other smaller documents too from time to time.
Downside with this is that its not even remotely a WYSIWYG solution - you're effectively compiling "source code" to get PDF out the back. Upside is that the base technologies are reasonably generic even if the specific toolkits may be a bit less so.
You can convert XML templates to PDFs with Prince.
Prince is a computer program that
converts XML and HTML into PDF
documents. Prince can read many XML
formats, including XHTML and SVG.
Prince formats documents according to
style sheets written in CSS.
I have and also know many people that have had much success with ReportLab an open source Python PDF library (http://www.reportlab.org/rl_toolkit.html).
Its extremely easy to use and very quick to get started. So worth trying out.
I don't know why no one has suggested using LaTeX for this. It's an extremely popular open format for document design and not hard to set up a template that you can fill in text or image content. While the reference implementation of LaTeX runs as a standalone program, if that sounds like too many moving parts for you there are wrapper libraries for Python and other languages you can call via an API.
Java language and JasperReports
Java: iText
C#: iTextSharp
depends on what you want to publish, but take a look at Pentaho reporting
http://reporting.pentaho.org/
rinohtype is an open-source document processor that is capable of producing high-quality print-ready PDF documents. You can use one of the built-in document templates (book, article) or define your own template. The look of document elements can be configured by means of CSS-like style sheets. The contents of your document can be parsed from reStructuredText or CommonMark files, or you can build the document tree programmatically.
Full disclosure: I am the author of rinohtype.

Essential Dojo

I'm starting to use Dojo; this is (essentially) my introduction to AJAX. We have a Java backend (torque / turbine / velocity) and are using the jabsorb JSON-RPC library to bridge Java and Javascript.
What do I need to know? What is the big picture of Dojo and JSON, and what are the nasty little details that will catch me up? What did you spend a couple of days tracking down, when you started with Dojo, that you now take for granted? Thanks for any and all tips.
The first thing to do is get familiar with the Dojo Object Model. JavaScript does not have a class system so the Dojo toolkit has created a sort of "by convention" object model that works rather well but is very different to how it works in Java for example.
The reason I suggest getting familiar with it is so you can dig into the code base whenever you start experiencing issues. The documentation available has improved significantly over the past year, but every now and then I find myself having to work out a bug in my code by learning exactly how the Dojo code involved works.
Another tip is to make use of the custom build feature which will significantly improve performance once your application is ready.
As a general tip on DHTML programming, use firebug (a plug-in for Firefox). It allows JavaScript debugging, DOM inspection, HTML editing in real-time and a whole lot more. I've become totally reliant on it now when I'm working in DHTML!
Good luck!
I too just dove head first into Dojo, they have a good API documentation at http://api.dojotoolkit.org/. Even Dojo Campus has some good examples of the plug ins.
If you ask me O'Reilly's Dojo: The Definitive Guide is the best Dojo book on the market.
I also would like any tips and pointers from the Dojo masters.
Cheers
Make sure documentation you read pertains to as recent a release as possible, since a lot has changed very quickly in the Dojo architecture.
Also a great way to see how some Dojo or Dijit widget is used is to look at the source code for the tests - for example, the DataGrid has poor documentation but the tests show a lot of use cases and configurations.
Sitepen is a good resource for Dojo articles.
Also, read up on Deferred (andDeferredList), as well as hitch() - two extremely flexible and powerful features of Dojo. SitePen has a great article on demystifying Deferreds.
Check out plugd, a collection of Dojo extensions that make some things more convenient or adds some clever functionalities to the language. It's made by one of the core Dojo authors so it's rather reliable. It even brings some jQuery niceties into the framework.
Some more things: look into data stores, they're very useful and a much cleaner way to handle Ajax. DojoX has a lot of nice ones too, just remember that DojoX ranges in how well documented or how experimental the components are. Learn the differences between dojo.byId and dijit.byId, as well as the HTML attributes id versus jsId (again, Sitepen has an article).
A couple of things that caught me when I started writing widgets where:
[Understand what dojoAttachPoint, dojoAttachEvent, containerNode and widgitsInTemplate do][1]
have a firm grasp of closures,
Get your head around deferreds
understand ItemFileReadStore, ItemFileWriteStore and stores in general
You can look at stores like a ResultSet (sort of) as well you can data bind them to widgets.
With these major concepts you can start to put together some compelling applications.
Generally what I do is I build a JavaScript facade around my service calls and then I will scrub the response into a store by attaching the first callback in the facade, that call back converts the results into a store and then returns it. This allows me to not hard bind my services to Dojo constructs (so I can support mobile, etc.) while also retuning the data from the facade in a format that data aware widgets expect.
As well if you are doing Java service development you my want to look into JAX-RS. I started out using JSON-RPC which became JABS-ORB but after working with JAX-RS I prefer it, as it integrates well with JPA-EJB and JAXB.
First read how to configure Dojo in your application. Try to understand basic structure of Dojo like if we are writing dijit.form.Button or dijit/form/Button it means Button.js resides in dijit/form folder. Try to understand require, define, declare modules of Dojo. This is enough to start Dojo Toolkit.
Very important fact, indulge with your own sample project using Dojo.