How to turn off phone carrier HTML optimizations?

I made an app which provides a schedule for the pupils at my school. It gets its data from the school's online schedule service. Due to the lack of a real API, I reverse-engineered the website: the app basically parses it with string operations.
And here's the problem: the string searches don't match on certain mobile carriers' networks, because the carriers strip away spaces and other things. Is there a universal way to turn that off?

No, this is up to the carrier, and even if there were a way to disable it, it would be non-standard and not worth addressing.
Additionally, you should not use string operations but a real HTML parser, such as JSoup for Java (there is also a .NET port, NSoup). If you look at the examples, it is relatively easy to use, and it will protect your application from whitespace normalization and any other markup changes irrelevant to your application.
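To illustrate why a parser is more robust than string searches, here is a minimal sketch in Python rather than Java: the standard library's html.parser stands in for JSoup, and the two-cell table markup is hypothetical. Both the original and a carrier-minified version yield the same cell text, while a literal string search fails on the minified one.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, ignoring whitespace in the markup."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            # Collapse whitespace runs so " Math " and "Math" compare equal.
            self.cells.append(" ".join("".join(self._buf).split()))
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._buf.append(data)


def schedule_cells(html):
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical page as served, and as it might look after carrier minification.
original = '<table>\n  <tr>\n    <td class="subject"> Math </td>\n    <td class="room">B12</td>\n  </tr>\n</table>'
minified = '<table><tr><td class="subject">Math</td><td class="room">B12</td></tr></table>'

print('<td class="subject"> Math </td>' in minified)  # False: the string search breaks
print(schedule_cells(original) == schedule_cells(minified))  # True: the parser doesn't care
```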
For data stored in inline JavaScript, you could first extract the right node from the document and then use a regex to extract the relevant parts. Or you could use a regex on the HTML document as a whole, but remember that you can't really parse HTML using regexes.
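A sketch of the two-step approach for inline JavaScript, again in Python with the stdlib parser standing in for JSoup. It assumes, hypothetically, that the page contains a statement like var lessons = [...]; whose literal is JSON-compatible; the variable name and data are invented. The regex runs only on the script text, never on the whole document.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Gathers the raw text of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


# Hypothetical variable name. Requiring "];" after the match lets nested arrays through.
LESSONS_RE = re.compile(r"var\s+lessons\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def extract_lessons(html):
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        match = LESSONS_RE.search(script)
        if match:
            # Only works if the JavaScript literal happens to be valid JSON.
            return json.loads(match.group(1))
    return None
```

A literal that contains "];" inside a string would still fool the pattern, which is the usual caveat with regexes.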
Alternatively, request pages over HTTPS rather than HTTP (if the server supports TLS/SSL) so that the carrier can't manipulate them.


Can/should I use YAML as payload in RESTful webservice?

As the header says.
In general, I like YAML more than JSON these days. Back in the day, I implemented a RESTful WS PoC using JSON. I was wondering whether I can use YAML instead.
E.g., are there enough tools/libraries/support for doing that? Or would I end up doing quite a bit of mundane, tedious coding that I would have avoided by using JSON?
Also, as I understand from what I've read online, REST doesn't restrict one from using YAML as the payload; is that correct?
Yes, if it's a goal that the data be especially readable by humans. REST itself isn't focused on protocols/formats so much as patterns.
There's not a lot to gain here for web services, however, since they typically represent app-to-app communication. Computers don't care, and JSON can be pretty-printed to improve legibility somewhat.
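As a small illustration of the pretty-printing point, Python's json module can emit the same data compactly for the wire or indented for a human reader; the user record is made up.

```python
import json

user = {"id": 1, "name": "Ada", "roles": ["admin", "dev"]}

compact = json.dumps(user, separators=(",", ":"))  # smallest form, for the wire
pretty = json.dumps(user, indent=2)                # indented form, for debugging

print(compact)
print(pretty)
```

Both strings decode to the same object, so the readable form can be switched on only when a human is looking.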
YAML is well supported by mainstream languages, though not always included in standard libraries as JSON typically is. So you'll probably be looking at an additional library dependency.
Also, if the client is a browser, parsing will be slower, as you'll have to use a non-native external library such as the one described in JavaScript YAML Parser. Make sure the data gets compressed in transit, or the extra indentation will inflate its size.
YAML also has a lot of esoteric and potentially dangerous features. Whenever I use it, I use the "safe" parser and deactivate many, if not most, of its features besides data structures.
I could imagine some utility as a debug parameter, however, perhaps url.yaml or …?fmt=yaml to assist during development. But otherwise, there's not much gain for all the trouble.
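One way such a debug switch might look, sketched in Python: JSON stays the default, and YAML is returned only when the URL ends in .yaml or carries fmt=yaml. The function names are hypothetical, and the YAML serializer is passed in (for example PyYAML's yaml.safe_dump) so the sketch itself needs only the standard library.

```python
import json
from urllib.parse import parse_qs, urlsplit


def wants_yaml(url):
    """True if the URL asks for YAML via a .yaml suffix or a fmt=yaml query parameter."""
    parts = urlsplit(url)
    if parts.path.endswith(".yaml"):
        return True
    return parse_qs(parts.query).get("fmt", [""])[0].lower() == "yaml"


def render(data, url, yaml_dump=None):
    """Return (content_type, body). JSON is the default; YAML is served only
    when requested and only if a YAML serializer was supplied."""
    if yaml_dump is not None and wants_yaml(url):
        return "application/yaml", yaml_dump(data)
    return "application/json", json.dumps(data)
```

Injecting the serializer keeps the YAML dependency optional: production builds can omit it and always fall back to JSON.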

What techniques are available for programmatically transforming HTML/DOM in an iOS application?

I'm processing a variety of RSS feeds, which contain summaries, as well as the target page URL content, and trying to use a uniform transformation method.
XSLT was the first thing that occurred to me to try, as it would accomplish what I want, in a standard way, without a lot of fuss aside from adding new XSLT stylesheets to accommodate uniquely formatted sites and feed content.
Problem: XSLT libraries are considered "private" in iOS, and even linking statically against your own copy will get you rejected by the App Store analysis tools.
I've looked into the possibility of injecting the stylesheet and data into a UIWebView that isn't displayed, but this seems like a really roundabout and hackish way to get at the system's underlying XSLT processor in an "approved" fashion.
What alternative techniques/libraries exist that would let me do this in a standard fashion, i.e., without rolling my own?
I'm not sure I fully understand your requirements, but one possibility would be to use libxml (which is allowed in iOS) to parse the XML and, if necessary, manipulate the DOM. If you really need to do XML transformations, this is going to be more effort than XSLT, but if you just need to extract data from the XML, that can be done fairly easily with XPath queries.
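To show how little code XPath-style extraction takes, here is a sketch in Python using xml.etree.ElementTree, which supports a limited XPath subset. It only illustrates the idea; it is not libxml's API, and the feed content is invented.

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First post</title><link>https://example.com/1</link></item>
  <item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""


def item_links(rss_text):
    """Return (title, link) pairs for every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    # ".//item" selects every <item> element anywhere below the root.
    return [(item.findtext("title"), item.findtext("link")) for item in root.findall(".//item")]


for title, link in item_links(SAMPLE_RSS):
    print(title, link)
```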
That said, I have read several people claiming they got XSLT working on iOS and had their apps approved in the App Store. In particular, I've seen this Stack Overflow answer claimed as a working solution by multiple people. And if that fails, another answer suggested building the libxslt library yourself with renamed symbols to bypass the App Store checks. I would only suggest that as a last resort, though.
You'll probably want to look into Hpple for something powerful but lightweight/native. See the getting-started tutorial here: http://www.raywenderlich.com/14172/how-to-parse-html-on-ios. Good luck!
I'm also going to recommend TFHpple, but I'll elaborate on the solution. I've explored an app that navigates a third-party (well, I'm the third party; they're the source, but that's semantics) website/data source, and there are some pitfalls. The biggest pitfall is obvious: if the data source's DOM changes, you need to change your app and re-release it. A creative way around this would be to publish/expose a global copy of the DOM on a public server, so that the end user doesn't have to update the app every time the data source changes (as long as the change isn't radical).
For instance, if your expected DOM search in TFHpple is @"//figure[@class='figure']/a" and a week from now the resource you're looking for is changed to @"//figure1[@class='figure1']/a", you've just committed yourself to a new App Store release... UNLESS you publish the expected DOM searches on a web server you control, in a data dictionary that your app can consume and hand out to the various DOM search elements within your app. The only problem I foresee here is that if the data source adds or removes a data element you want to consume, you either have to release a build or handle the removal ahead of time (respectively).
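The remotely published query dictionary could look something like the following Python sketch, using the figure query from the answer with the standard @ attribute syntax. xml.etree.ElementTree stands in for TFHpple, the JSON key figure_links and the config shape are hypothetical, and fetching the config from your server is left out. Built-in defaults sit underneath the remote values, so the app still works if the fetch fails or returns garbage.

```python
import json
import xml.etree.ElementTree as ET

# Shipped with the app; used whenever the remote config is missing or unusable.
DEFAULT_QUERIES = {"figure_links": ".//figure[@class='figure']/a"}


def load_queries(remote_json):
    """Overlay queries fetched from your own server on top of the built-in defaults."""
    queries = dict(DEFAULT_QUERIES)
    if remote_json:
        try:
            remote = json.loads(remote_json)
        except json.JSONDecodeError:
            remote = None  # corrupt config: keep the defaults
        if isinstance(remote, dict):
            queries.update(remote)
    return queries


def figure_links(page, queries):
    """Run the configured query against a page and return the link targets."""
    root = ET.fromstring(page)
    return [a.get("href") for a in root.findall(queries["figure_links"])]
```

ElementTree only accepts well-formed markup, so for real-world HTML you would swap in a tolerant parser; the point here is that the query string lives in data, not in the shipped code.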
Lastly, if the data source's DOM isn't well-formed or consistent, you may be beating your head against a wall more often than not.

Why use XML (SOAP) when JSON is so simple and easy to handle?

Receiving and sending data with JSON is done with simple HTTP requests, whereas with SOAP we need to take care of a lot of things. Parsing XML is also sometimes hard. Even Facebook uses JSON in its Graph API. I still wonder why one should still use SOAP. Is there any reason or area where SOAP is still the better option (setting the data format aside)?
Also, in simple client-server apps (like mobile apps connected to a server), can SOAP give any advantage over JSON?
I would be very thankful if someone could list the major differences between JSON and SOAP, considering the information I have provided (if there are any).
I found the following on advantages of SOAP:
There is one big reason everyone sticks with SOAP instead of using JSON. With every JSON setup, you're always coming up with your own data structure for each project. I don't mean how the data is encoded and passed, but how the format of the data is defined: the data model.
SOAP has an industry-mature way of specifying that data will be in a certain format, e.g. "Cart is a collection of Products, and each Product can have these attributes, etc." A well-put-together WSDL document really nails this. See the W3C specification: Web Services Description Language.
JSON has similar ways of specifying this data structure (a JavaScript class comes to mind as the most common), but a JavaScript class isn't really used for this purpose in any kind of agnostic, well-established, widely used way.
In short, SOAP has a way of specifying the data structure in a maturely formatted document (WSDL). JSON doesn't have a standard way of doing this.
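To make the point concrete: without a WSDL-style contract, each JSON project tends to hand-write checks like the Python sketch below. The cart/product field names (products, sku, qty) are hypothetical; in SOAP the equivalent rules would be declared in the XML Schema types of the WSDL rather than coded by hand.

```python
import json


def validate_cart(doc):
    """Hand-written contract: a cart has a "products" list, and each product
    has a string "sku" and a non-negative integer "qty"."""
    if not isinstance(doc, dict):
        return ["cart must be an object"]
    products = doc.get("products")
    if not isinstance(products, list):
        return ["products must be a list"]
    errors = []
    for i, product in enumerate(products):
        if not isinstance(product, dict):
            errors.append(f"products[{i}] must be an object")
            continue
        if not isinstance(product.get("sku"), str):
            errors.append(f"products[{i}].sku must be a string")
        qty = product.get("qty")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(qty, bool) or not isinstance(qty, int) or qty < 0:
            errors.append(f"products[{i}].qty must be a non-negative integer")
    return errors


print(validate_cart(json.loads('{"products": [{"sku": "A1", "qty": 2}]}')))  # []
```

Every team writes its own version of this, which is exactly the lack of a shared, standard description the answer is pointing at.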
If you are creating a client application and your server implementation uses SOAP, then you have to use SOAP on the client side.
Also, see: Why use SOAP over JSON and custom data format in an “ENTERPRISE” application?
Nowadays SOAP is a complete overkill, IMHO. It was nice to use it, nice to learn it, and it is beautiful we can use JSON now.
The only difference between SOAP and REST services (whether or not they use JSON) is that a SOAP WS always has its own WSDL document, which can easily be transformed into self-descriptive documentation, while with REST you have to write the documentation yourself (at least to document the data structures). Here are my pros and cons for both:
REST
Pros
lightweight (in every sense: no server- or client-side extensions needed, no big chunks of XML to be transferred back and forth)
free choice of the data format: it's up to you to decide whether to use plain TXT, JSON, XML, or even create your own data format
most current data formats (even XML, if used) ensure that only the really required amount of data is transferred over HTTP, while with SOAP, for 5 bytes of data you need 1 kB of XML junk (exaggerated, of course, but you get the point)
Cons
even though there are tools that can generate documentation from docblock comments, those comments have to be written very descriptively if one wants good documentation
SOAP
Pros
has a WSDL that can be generated from even basic docblock comments (in many languages even without them) and works well as documentation
there are even tools that can work with WSDL to provide an enhanced "try this request" interface (while I do not know of any such tool for REST)
strict data structure
Cons
strict data structure
uses XML (only!) for data transfers, and each request contains a lot of junk while the response contains five times more junk
the need for external libraries (for client and/or server; nowadays such libraries are a native part of many languages, yet people still tend to use third-party ones)
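To get a feel for the envelope overhead mentioned above, the Python sketch below compares a minimal SOAP 1.1 request with the equivalent JSON body. The getUser operation and the urn:example:users namespace are invented, and real envelopes often carry extra headers, so treat the comparison as a rough illustration rather than a benchmark.

```python
import json
import xml.etree.ElementTree as ET

payload = {"id": 42}

# Hypothetical SOAP 1.1 request for an imagined getUser operation.
soap = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>"
    '<getUser xmlns="urn:example:users"><id>42</id></getUser>'
    "</soap:Body>"
    "</soap:Envelope>"
)
body = json.dumps(payload)

# Sanity check: the envelope is well-formed XML and really carries the id.
envelope = ET.fromstring(soap.encode("utf-8"))
print(envelope.find(".//{urn:example:users}id").text)  # 42

print("SOAP bytes:", len(soap.encode("utf-8")), "JSON bytes:", len(body.encode("utf-8")))
```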
To conclude, I do not see a big reason to prefer SOAP over REST (and JSON). Both can do the same things, there is native support for JSON encoding and decoding in almost every popular web programming language, and with JSON you have more freedom and the HTTP transfers are cleansed of a lot of useless junk. If I were to build an API now, I would use REST with JSON.
I disagree a bit with the pro-JSON trend I see here. Although JSON is an order of magnitude easier, I'd venture to say it's quite limited. For example, a SOAP WS is not the end of the story: between a SOAP client and server you now have enterprise service buses, cryptography-based authentication schemes, user management, timestamping of requests/replies, etc. For all of this, there are some huge software platforms that provide services around SOAP (well, "web services") and will inject content into your XML. So although JSON is probably enough for small projects, and an order of magnitude easier there, I think it becomes quite limited once transmission control is decoupled from content (i.e., you develop the content, the actual server, but all the transmission is managed by another team, authentication by one more team, deployment by yet another team). I don't know if my experience at a big corporation is relevant, but I'd say that JSON won't survive there. There are too many constraints on top of the basic need for data representation. So the problem is not JSON-RPC itself; the problem is that it lacks the additional tools to manage the complexity that arises in complex applications (not to say that what you do is not complex; it's just that the software reflects the complexity of the company that produces it).
I think there is a lot of basic misinformation on this thread. SOAP, REST, XML, and JSON concepts seem to be mixed up in the responses.
Here is some clarification -
XML and JSON (and others) are encodings of information.
SOAP is a communications protocol
REST is an (architectural) style
Each is used for something different, although you might use more than one of these together.
Let's start with encoding data structures as XML vs. JSON:
Everything JSON currently supports can be done in XML, but not the other way around. JSON will eventually adopt all the features that XML has, but its proponents haven't encountered all of the problems yet; once they gain more experience, features will be added to close the gap. For example, JSON didn't start out with schemas or binary formats.
SOAP is a communication protocol for calling an operation. It runs on top of protocols like HTTP, SMTP, etc. Among many other features, a SOAP message can span multiple "application"-layer protocols, e.g., I can send a SOAP message over HTTP to a service endpoint, which then puts it on a message queue for another system. SOAP solves the problem of maintaining authentication, message authenticity, etc. as the request moves between different parts of a distributed system.
JSON and other data formats can be sent via SOAP. I work with some systems that send binary fixed-width encoded objects via SOAP; it's not a problem.
The analogy is this: if only the postman is allowed to send you a letter, then plain HTTP is enough, but if anyone can send you a letter, then you want SOAP (i.e., message transport security vs. message content security).
The six REST constraints describe an architectural style. Interestingly, for the first several years of REST, the examples were in SOAP. (There is no such thing as REST vs. SOAP; they are not opposites.)
A "heavyweight, bloated, etc." SOA SOAP system might have monoliths with operations like GET, PUT, and POST on instances of a single entity. SOAP doesn't have those operations predefined, but that is typically how it is used.
Consider that if you built a "REST" service on HTTP alone with an SSL/TLS terminating proxy, then you may have violated the 4th constraint of REST.
So in your software development today, you wouldn't normally interact with any of these directly, just as, if you were writing a graphics program, you typically wouldn't work directly with HDMI vs. DisplayPort.
The question is whether you understand architecturally what your system needs to do, and whether you configure it to use the mechanism that does that job. (For example, all the challenges of applying today's microservices to general systems are old problems previously solved by SOAP, CORBA, and the older protocols.)
I have spent several years writing SOAP web services (with JAX-WS). They are not hard to write, and I love the idea of a single endpoint and a single HTTP method (POST). For me, REST is too verbose.
But as a data container, JSON is simpler, smaller, more readable, more flexible, looks closer to programming languages.
So, I reinvented the wheel and created my own approach to writing backends for AJAX requests. In comparison:
REST:
get user: method GET https://example.com/users/{id}
update user: method POST https://example.com/users/ (JSON with User object in request body)
RPC:
get user: method GET https://example.com/getUser?id=1
update user: method POST https://example.com/updateUser (JSON with User object in the request body)
My way (the proposed name is JOH - JSON over HTTP):
get user: method POST https://example.com/ (JSON specifies both the user ID and the class/method responsible for handling the request)
update user: method POST https://example.com/ (JSON specifies both the user object and the class/method responsible for handling the request)
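The answer doesn't specify JOH's exact message format, so the following Python sketch assumes one: each POST body names a handler ("handler") and carries its arguments ("params"). The handler names, the registry, and the in-memory USERS store are hypothetical; HTTP plumbing is omitted, and handle_post simply maps a request body to a response body.

```python
import json

# Hypothetical in-memory store standing in for a real backend.
USERS = {1: {"id": 1, "name": "Ada"}}


def get_user(params):
    return USERS.get(params["id"])


def update_user(params):
    user = params["user"]
    USERS[user["id"]] = user
    return user


# Registry mapping the "class.method" name in the request to a handler function.
HANDLERS = {"UserService.get": get_user, "UserService.update": update_user}


def handle_post(body):
    """Single endpoint: the JSON body names the handler and carries its arguments."""
    request = json.loads(body)
    handler = HANDLERS.get(request.get("handler"))
    if handler is None:
        return json.dumps({"error": "unknown handler"})
    return json.dumps({"result": handler(request.get("params", {}))})
```

An explicit registry, rather than looking up arbitrary class/method names from the request, keeps clients from invoking code that was never meant to be exposed.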

What is the exact use of JSON?

Hi, I recently found that JSON is being used in many areas, such as COMET techniques and Google Instant. Wikipedia says:
JSON (an acronym for JavaScript Object Notation, pronounced /ˈdʒeɪsən/) is a lightweight text-based open standard designed for human-readable data interchange....
I was shocked to see the words "human-readable data interchange", and I was thinking: since the whole internet is using techniques to increase security, why should JSON be used to exchange data when any human can see and read it?
Or, if JSON is in fact very secure, how?
And if my thinking is incorrect, please correct me.
If a binary format were used, it wouldn't provide any advantage in security (since it would still be machine-readable and open-spec'd - otherwise it wouldn't have any use for information exchange), and it would make debugging more complicated.
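A quick way to see that a binary encoding adds no secrecy: the Python sketch below packs a record into a made-up fixed-width layout with struct, and anyone who knows (or guesses) the layout unpacks it with a single call. The string field even appears verbatim in the raw bytes.

```python
import struct

# Hypothetical layout: little-endian unsigned 32-bit account id + 8-byte ASCII code.
LAYOUT = "<I8s"

packed = struct.pack(LAYOUT, 12345, b"SECRET01")
print(packed)  # the code is plainly visible inside the bytes

# Reading it back needs nothing but the (public) layout string.
account, code = struct.unpack(LAYOUT, packed)
print(account, code.decode())  # 12345 SECRET01
```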
Security is not achieved by the obscurity of interchange formats, but with cryptography. Once you are on an SSL tunnel, you can send the data in whatever format you like most - JSON included - and it will be secure.
Notice that the same applies to any other communication on the web: even HTML is "almost" human readable, and still it's used even for very private communications (e.g. home banking, ...) by encrypting it while it's on the untrusted path with HTTPS.
JSON data is used to put information on a web page. It is human-readable because Web pages are meant to be readable by humans. Web pages are also, by their nature, not secure on the client side, so developers who need to hide certain information either process that on the server or use a secure session.

How can I extract addresses and phone number from HTML?

Is there a library that specializes in parsing such data?
You could use something like Google Maps. Geocode the address and, if successful, Google's API will return an XML representation of the address with all of the elements separated (and corrected or completed).
EDIT:
I'm being voted down and I'm not sure why. Parsing addresses can be a little difficult. Here's an example of using Google to do this:
http://blog.nerdburn.com/entries/code/how-to-parse-google-maps-returned-address-data-a-simple-jquery-plugin
I'm not saying this is the only way or necessarily the best way. Just a way to parse addresses on a web site.
There are 2 parts to this: extract the complete address from the page, and parse that address into something you can use (store the various parts in a DB for example).
For the first part you will need a heuristic, most likely country-dependent: for US addresses, [A-Z][A-Z],?\s*\d\d\d\d\d should give you the end of an address, provided the two letters turn out to be a state. Finding the beginning of the string is left as an exercise.
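The end-of-address heuristic can be tried out with a short Python sketch. It uses the answer's pattern with \b word boundaries added and checks the two letters against a deliberately incomplete set of state codes (a real list would need all of them), so treat it as a starting point rather than a validator.

```python
import re

# Deliberately partial for brevity; a real check needs every state/territory code.
STATES = {"CA", "NY", "TX", "WA"}

# The answer's heuristic: two capitals, optional comma, optional whitespace,
# five digits. Word boundaries keep it from matching inside longer tokens.
END_OF_ADDRESS = re.compile(r"\b([A-Z][A-Z]),?\s*(\d\d\d\d\d)\b")


def address_ends(text):
    """Return (state, zip) pairs whose two letters are a known state code."""
    return [m.groups() for m in END_OF_ADDRESS.finditer(text) if m.group(1) in STATES]
```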
The second part can be done either through a call to Google Maps or, as usual in Perl, using a CPAN module: Lingua::EN::AddressParse (test it on your data to see if it works well enough for you).
In any case this is a difficult task, and you will most likely never get it 100% right, so plan for manually checking the addresses before using them.
You don't need regular expressions (yet) or a general parser like pyparsing (at all). Look at something like Beautiful Soup, which will parse even bad HTML into something like a tree of tags. From there, you can look at the source of the page and find out which tags to drill down through to get to the data. Then, from Beautiful Soup's tree, you can search for these nodes (with find_all(), or with CSS selectors via select() in recent versions) and directly loop over the tags you're interested in, getting to the actual data easily. From there, you can parse the data using a quick regex or something similar. This will be more flexible, more future-proof, and possibly less head-exploding than trying to do it in pure regular expressions.
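A sketch of the parse-then-regex approach in Python. Since Beautiful Soup is a third-party package, the standard library's html.parser stands in for it here, and only text outside script and style elements is searched. The phone pattern assumes US-style numbers such as (555) 123-4567, and the sample markup is invented.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


# Assumed US-style formats: (555) 123-4567, 555-123-4567, 555.123.4567.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[\s.-]?(\d{3})[\s.-](\d{4})\b")


def phone_numbers(html):
    """Parse first, then run the regex on each text node, normalizing to NNN-NNN-NNNN."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    found = []
    for chunk in collector.chunks:
        found.extend("-".join(m.groups()) for m in PHONE_RE.finditer(chunk))
    return found
```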