We've been using jquery/globalize in our web application with the CLDR 29 data in JSON format without any problems. Just recently, Unicode released CLDR 30 (and shortly after, version 30.0.1 with some fixes).
When we upgrade to the CLDR 30(.0.1) data, our client-side currency-formatting tests fail because, for many cultures, the "currencySpacing" information in numbers.json is no longer there. For example, for the culture ar-AE, the Globalize library tries to load CLDR data at the path...
/main/ar-AE/numbers/currencyFormats-numberSystem-arab/currencySpacing/beforeCurrency
...which doesn't exist in the latest CLDR 30 numbers.json data for this (and many other) cultures.
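A quick way to confirm what's missing (a Python sketch; the file path follows the cldr-json package layout, and the key names come from the path above):

import json

# Load the CLDR 30 numbers.json for ar-AE (path per the cldr-json layout).
with open("cldr-numbers-full/main/ar-AE/numbers.json") as f:
    data = json.load(f)

formats = data["main"]["ar-AE"]["numbers"]["currencyFormats-numberSystem-arab"]
print("currencySpacing" in formats)  # False with the CLDR 30 data, True with CLDR 29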
We've been trying to traverse the stack to see what's causing this problem. We started with the DTD. The DTD for CLDR 30 (along with that for CLDR 29) includes the line...
<!ELEMENT currencyFormats ( alias | ( default*, currencySpacing*, currencyFormatLength*, unitPattern*, special* ) ) >
...which implies that currencySpacing is an optional element. That said, we can't find anything in the CLDR 30 release notes or Delta that suggests this information was changed for a large number of cultures.
In the XML data (the "ground truth") we see that the currencySpacing element is only used in main/root.xml in both CLDR 29 and CLDR 30, i.e. apparently no significant changes in this respect in the XML.
This makes us wonder if it's a problem in the tool that's used to generate the JSON data from the XML data. The tool is called ldml2json and is also used by the cldr-json project. To rule out a bug in the cldr-json project, we built the tool and regenerated the JSON data ourselves. The regenerated data was also missing the "currencySpacing" information in the numbers.json files, so it doesn't seem to be an issue with the cldr-json project.
If we understand correctly, this means that the problem is either:
The ldml2json tool has a bug
jquery/globalize is incorrect to assume that this information always exists
If the latter is true, then I guess this should be raised as a jquery/globalize bug. Investigating the former would require us to debug from source presumably. Before we go investing time in either, we wanted to ask: Is anybody else seeing this problem, and is there any known solution? Our hope is that there's someone out there who's a bit more experienced in the CLDR+JSON+Globalize stack that can help us out!
This was caused by this change: http://unicode.org/cldr/trac/changeset/12636/trunk/common/main/root.xml
Before this change, the root locale's currencySpacing information for the arab number system was inherited by all the other locales. Now it's no longer there.
I'm not sure how the missing currencySpacing should be handled, but the Java and C documentation both state that the data can be null.
Both appear to use a hard-coded default in that case: http://bugs.icu-project.org/trac/browser/icu4j/trunk/main/classes/core/src/com/ibm/icu/impl/CurrencyData.java#L86
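The guard amounts to something like this (shown as a Python sketch; the default values are illustrative placeholders, not ICU's exact constants):

# Illustrative default values only; ICU hard-codes its own (see CurrencyData.java).
DEFAULT_SPACING = {
    "currencyMatch": "[:^S:]",
    "surroundingMatch": "[:digit:]",
    "insertBetween": " ",
}

def currency_spacing(currency_formats):
    # Fall back to a default when the CLDR data omits currencySpacing.
    spacing = currency_formats.get("currencySpacing")
    if spacing is not None:
        return spacing
    return {"beforeCurrency": dict(DEFAULT_SPACING),
            "afterCurrency": dict(DEFAULT_SPACING)}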
So this is probably a bug in globalize.
Update: Bug report and pull request.
The recent CLDR 30.0.2 patch release fixed this bug, so no update to Globalize is needed.
By the way, consult the UTS #35 specification for deeper investigation :)
http://www.unicode.org/reports/tr35/tr35-numbers.html#Currencies
Related
I'm working with a large json file.
This JSON was generated by my own Python parser, and (as a result) there are some JSON validation errors at different points in the file. I want to identify these errors in order to improve the parser.
Sublime Text (2) helpfully highlights formatting errors in the JSON in pink; however, working my way through 70,000,000 lines of JSON to find these errors is somewhat challenging.
Is there any way to skip to the pink-highlighted errors in the JSON?
(Note: the JSON file is large enough that using an online validator, for example, is not possible.)
Thanks!
This can be done in a fancy way using a plugin, but for your purposes probably the best way is to just enter a command into the console. Open your JSON file with errors in it, then open the console with Ctrl+`. Paste in the following code and hit Enter:
view.show_at_center(view.find_by_selector("invalid.illegal")[0])
and the view will scroll to show the first error in the file. Fix that error, click back on the console entry line, hit the up arrow to bring back the command you just ran, and hit Enter again, and it should scroll to the next error, and so on. When there are no more errors, you'll get IndexError: list index out of range printed to the console, and the view won't scroll any more.
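If you'd rather not keep re-running the console command, the plugin route mentioned above is only a few lines. A sketch (save it as a .py file under Packages/User and bind a key to the goto_next_illegal command):

import sublime
import sublime_plugin

class GotoNextIllegalCommand(sublime_plugin.TextCommand):
    # Jump to the next region scoped as invalid.illegal (the pink highlights).
    def run(self, edit):
        pos = self.view.sel()[0].begin()
        for region in self.view.find_by_selector("invalid.illegal"):
            if region.begin() > pos:
                self.view.sel().clear()
                self.view.sel().add(sublime.Region(region.begin()))
                self.view.show_at_center(region)
                return
        sublime.status_message("No more errors below the cursor")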
While this will work in both Sublime Text 2 and 3, I strongly urge you to upgrade to ST3 if at all possible. ST2 has been shelved and deprecated, and there will be no more bug fixes released. Development is now focused solely on ST3 (as well as being in the planning stages for ST4!). "I don't know of any good reason to not use Sublime Text 3" - Will Bond, ST core developer.
There are a ton of new features and bug fixes in the new version, even if you're just using the public beta. (BTW, don't let the word "beta" fool you - the program is rock solid, and has been for years.) If you want more cutting-edge features, and are a registered user (which you should be if you are using the program long-term or for commercial purposes), you can download the dev builds which are updated more frequently, but run the slight chance of having an undetected bug or two.
One of the major advantages of ST3 is that it now supports a new, YAML-based sublime-syntax highlighting engine, which allows for much greater flexibility than the old .tmLanguage highlighting files (which are still supported). Related to that, the syntax files have all been open-sourced and development is proceeding very rapidly on them, even though it's been a few months since the last build was released.
Probably the biggest reason to upgrade is the plugin community. The internal Python API has been updated to Python 3 (3.3.6, to be precise), which had the side effect of making many old plugins incompatible. Except in a few rare cases, most plugins now support ST3, and many are dropping ST2 support as it becomes too difficult to maintain two codebases and to develop against the much more limited API ST2 provides. So, unless you absolutely depend on an old ST2-only plugin that can't be ported, upgrading is definitely the best path to take.
I'm processing a variety of RSS feeds, which contain summaries, as well as the target page URL content, and trying to use a uniform transformation method.
XSLT was the first thing that occurred to me to try, as it would accomplish what I want, in a standard way, without a lot of fuss aside from adding new XSLT stylesheets to accommodate uniquely formatted sites and feed content.
Problem: XSLT libraries are considered "private" in iOS, and even linking statically against your own copy will get you rejected by the App Store analysis tools.
I've looked into the possibility of injecting the stylesheet and data into a UIWebView that isn't displayed, but this seems like a really roundabout and hackish way to get at the system's underlying XSLT processor in an "approved" fashion.
What alternative techniques/libraries exist which would let me do this in a standard fashion, i.e., without rolling my own?
I'm not sure I fully understand your requirements, but one possibility would be to use libxml (which is allowed on iOS) to parse the XML and, if necessary, manipulate the DOM. If you really need to do XML transformations this is going to be more effort than XSLT, but if you just need to extract data from the XML, that can be done fairly easily with XPath queries.
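On iOS itself this would go through libxml2's C API, but the XPath idea is easy to see in a few lines of Python, since lxml wraps the same libxml2 engine (the feed snippet here is made up):

from lxml import etree

# Hypothetical RSS snippet; the same query would run via libxml2's C API on iOS.
doc = etree.fromstring(b"<rss><channel><item><title>Hello</title></item></channel></rss>")
print(doc.xpath("//item/title/text()"))  # ['Hello']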
That said, I have read several people claiming they got XSLT working on iOS and had their apps approved in the app store. In particular, I've seen this stackoverflow answer claimed as a working solution by multiple people. And if that fails, another answer suggested building the libxslt library yourself with renamed symbols to bypass the app store checks. I would only suggest that as a last resort though.
You'll probably want to look into Hpple for something powerful but lightweight/native. See the tutorial on getting started here: http://www.raywenderlich.com/14172/how-to-parse-html-on-ios. Good luck!
I'm going to also recommend TFHpple, but I'm also going to elaborate on the solution. I've built an app that navigates a 3rd-party (well, I'm the 3rd party, they're the source, but that's semantics) website/data source, and there are some pitfalls. The biggest pitfall is obvious: if the data source's DOM changes, you need to change your app and re-release. A creative way around this is to publish/expose a copy of the expected DOM searches on a public server, so the end user doesn't have to update their app every time the data source changes (as long as the change isn't radical).
For instance, if your expected DOM search in TFHpple is @"//figure[@class='figure']/a" and then a week from now your data source's resource you're looking for is altered to @"//figure1[@class='figure1']/a", you just opened yourself up to an App Store release... UNLESS... you publish the expected DOM searches on a web server you control, in a data dictionary that your app can consume and serve out to the various DOM search elements within your app. The only problem I foresee here is that if the data source adds or removes a data element you want to consume, you either have to release a build or handle the removal ahead of time (respectively).
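The remote "data dictionary" can be as simple as a JSON file of named XPath queries that the app fetches at launch. A sketch of the client side (Python for brevity; the endpoint and key names are hypothetical):

import json
from urllib.request import urlopen

# Hypothetical endpoint you control; the app fetches the current XPath queries
# instead of hard-coding them, so a DOM change only needs a server-side edit.
queries = json.load(urlopen("https://example.com/app/dom-queries.json"))
figure_xpath = queries["figure"]  # e.g. "//figure[@class='figure']/a"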
Lastly, if the data source's DOM isn't well formed or consistent, you may be beating your head against a wall more often than not.
I'm quite new to MVVMCross, but I've been actively using it for two weeks, at work and in a school project, and I am really enjoying it! Unfortunately, I've been stuck on the school project for 2 days now: we're asked to build a mobile Jabber client. This is not a big deal, since I started it using the Matrix XMPP library, which does most of the job and is easy to use. I decided to restart the project using MVVMCross, in order to have cleaner, separated code and to add a Windows Phone project, but Matrix absolutely needs System.Xml.Linq, and I can't get the core PCL to compile:
The type 'System.Xml.Linq.XElement' is defined in an assembly that is not referenced.
You must add a reference to assembly 'System.Xml.Linq, Version=2.0.5.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'
As shown in Stuart Lodge's tutorial videos, I'm using profile 104. The faulting dll is really present in the folder; I can't add it manually to the project's references, since VS prevents me from doing it (gently explaining that it's automatically loaded because .Net Portable Subset is included in the references); I've updated and repaired my VS install "just in case"... and I have no ideas left.
So, here are the questions :
is it really possible to use System.Xml.Linq with MVVMCross? or did I miss the big title explaining that what I'm trying to do is stupid?
if yes (that'd be great!) did/does someone experience the same problem? Even more interesting : did someone find a solution?
Thanks in advance!
Additional info : Windows8(x64), VS2012 Ultimate, trial license (school project...) for Xamarin.Android
UPDATE: following Stuart's answer, I compiled and ran the BestSellers sample, which uses System.Xml.Linq... without any problem. As it comes with an explicit reference to System.Xml.Linq (see the first link in the answer), I tried:
to delete it (and a few others): VS keeps its promises and really includes the needed references as long as .Net Portable Subset is referenced, so everything rolls smoothly.
to manually add this reference to my .csproj via Notepad: it doesn't change anything.
One thing tickles me in Stuart's answer: "perhaps it is something to do with the way the matrix uses XML.linq". Since the Matrix type I'm trying to use is just a descendant of System.Xml.Linq.XElement, which is widely used in BookViewModel.cs from the sample, what could possibly be wrong with that?
"Solution" : The problem seems to be due to Matrix requiring a special version of System.Xml.Linq, which is not the one included when profile 104 for building PCL. I used file linking method as a workaround to share the core, and that works, though this is less elegant, readable, and harder to maintain...
Yes, it is possible to use at least some of System.Xml.Linq.
For example, see the BestSellers sample
csproj file - https://github.com/slodge/MvvmCross-Tutorials/blob/master/Sample%20-%20BestSellers/BestSellers/BestSellers/BestSellers.csproj#L49
example XML linq use - https://github.com/slodge/MvvmCross-Tutorials/blob/master/Sample%20-%20BestSellers/BestSellers/BestSellers/ViewModels/BookViewModel.cs#L44
For the problem you are seeing, I'm really not sure what the error is - perhaps it is something to do with the way the matrix uses XML.linq? You might have more luck if you open up this question to other tags like portable-class-library, XML-linq and windows-phone.
I saw this tutorial for writing a JSON-RPC server for SWI-Prolog. Unfortunately, all it does is add two numbers. I'm wondering if there exists an RPC server for SWI-Prolog that can define new rules and answer general Prolog queries, returning JSON lists, etc.?
When you take a tour of the SWI-Prolog website, which proudly runs on SWI-Prolog itself, you can see at work some of the features offered by the http package.
It's a fairly large range of tools, and to grasp the basics of the system, the easiest way is to follow the specific How-to section, step by step. There is a small bug you should be aware of in the LOD Crawler: add an option on line 42 of lod.pl:
...
; rdf_load(URI2, [format(xml)]),
...
or you will probably get
Internal server error
Domain error: `content_type' expected, found `text/xml;charset=UTF-8'
when running the sample.
An important feature of the IDE is the ability to debug the HTTP requests.
When done with the How-to, you can take a look at ClioPatria, dedicated to interfacing RDF to HTML. It comes with a pirates demo; I must say I find it a bit too 'crude' for my taste, and I don't know about YUI, used in the award-winning MultimediaN project. I've since used Bootstrap to give the front end a modern look, with appreciable results (I'm sorry I can't - yet - publish it; I need more time to engineer the system).
HTH
The problem:
You have some data and your program needs specified input, for example strings which are numbers. You are searching for a way to transform the original data into the format you need.
And the problem is: the source can be anything. It can be XML, property lists, or binary which contains the needed data deeply embedded in binary junk. And your output format may vary as well: it can be number strings, floats, doubles...
You don't want to program. You want routines which give you commands capable of transforming the data into the form you wish. Surely such a tool would contain regular expressions, but it would be very well designed and offer capabilities which are sometimes much easier and more powerful.
ADDITION:
Many users have this problem and hope that their programs can convert, read and write data which is given by other sources. If they can't, the users are doomed or have to resort to programs like business-intelligence suites. That is NOT the problem.
I am talking about a tool for a developer who knows what he is doing, but who is also dissatisfied with writing routines in a regular language every time: a professional data-manipulation tool, something like a hex editor, regex, vi, grep and a parser melted together, accessible through routines or a REPL.
If you have the spec of the data format, you can access and transform the data at once. There is no need to debug or plan meticulously how to program the transformation. I am searching for a solution because I don't believe the problem is new.
It allows:
joining/grouping/merging of results
inserting/deleting/finding/replacing
writing macros which allow executing a command chain repeatedly
meta-grouping (lists->tables->n-dimensional tables)
Example (No, I am not looking for a solution to this, it is just an example):
You want to read XML strings embedded in a binary file with variable-length records. Your tool reads the record length and deletes the junk surrounding your text. Now it splits open the XML and extracts the strings. The strings use Indian number glyphs and contain decimal commas instead of decimal points, so your tool transliterates them into ASCII digits and replaces the commas with points. Now the results must be stored into matrices of variable length... etc., etc.
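Even without such a tool, the example already shows the shape of the command chain. A rough Python sketch of the same steps (the record layout, tag names and digit set are invented for illustration):

import re
import struct
import xml.etree.ElementTree as ET

ARABIC_INDIC = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")

def records(blob):
    # Hypothetical layout: each record starts with a 4-byte big-endian length.
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        yield blob[offset + 4 : offset + 4 + length]
        offset += 4 + length

def numbers(record):
    # Strip the binary junk around the embedded XML, then pull out the values.
    xml = re.search(rb"<row>.*?</row>", record, re.DOTALL).group(0)
    for value in ET.fromstring(xml.decode("utf-8")).iter("value"):
        yield float(value.text.translate(ARABIC_INDIC).replace(",", "."))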
I am searching for a good language / language design and, if possible, an implementation.
Which design do you like, or even, if it doesn't fulfill the conditions, which one wouldn't you want to miss?
EDIT: The question is whether a solution to the problem exists and, if yes, which implementations are available. You DO NOT implement your own sorting algorithm if Quicksort, Mergesort and Heapsort are available. You DO NOT invent your own text-parsing method if you have regular expressions. You DO NOT invent your own 3D language for graphics if OpenGL/Direct3D is available. There are existing solutions, or at least papers describing the problem and giving suggestions. And there are people who may have worked on and experienced such problems and who can give ideas and suggestions. The idea that this problem is totally new, and that I should work it out and implement it myself without background knowledge, seems to me, I must admit, totally off the mark.
UPDATE:
Unfortunately, I had less time than anticipated to delve into the subject because our development team is currently in a hot phase. But I have contacted the author of TextTransformer, and he kindly answered my questions.
I have investigated TextTransformer (http://www.texttransformer.de) in the meantime and so far I can see it offers a complete and efficient solution if you are going to parse character data.
For anyone who wants to give implementing a good parsing language a try, the smallest set of operators able to directly transform any input data into any output data, if (!) they were powerful enough, seems to be the following (a toy sketch follows the list):
Insert/Remove: Self-explanatory
Group/Ungroup: Split the input data into a set of tokens and organize them into groups and supergroups (data structures, lists, tables, etc.)
Transform:
Substitution: Change the content of the tokens (special operation: replace)
Transposition: Change the order of tokens (swap, merge, etc.)
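As a toy illustration of how small that operator set is, here is a Python sketch in which each operator is just a function over a token list (the names and exact semantics are my own reading of the list above):

# Toy versions of the proposed operators, each a function over a token list.
insert = lambda tokens, i, x: tokens[:i] + [x] + tokens[i:]
remove = lambda tokens, pred: [t for t in tokens if not pred(t)]
group = lambda tokens, n: [tokens[i:i + n] for i in range(0, len(tokens), n)]
ungroup = lambda groups: [t for g in groups for t in g]
substitute = lambda tokens, f: [f(t) for t in tokens]
transpose = lambda groups: [list(col) for col in zip(*groups)]

# A tiny command chain: group digits into pairs, then swap rows and columns.
print(transpose(group(list("123456"), 2)))  # [['1', '3', '5'], ['2', '4', '6']]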
Have you investigated TextTransformer?
I have no experience with this, but it sounds pretty good and the author makes quite competent posts in the comp.compilers newsgroup.
You still have to do some programming work.
For a programmer, I would suggest:
Perl against a SQL backend.
For a non-programmer, it sounds like you're looking for some sort of business-intelligence suite.
This suggestion may broaden the scope of your search too much... but here it is:
You could either reuse, as-is, or otherwise get "inspiration" from the [open source] code of the SnapLogic framework.
Edit (answering the comment on SnapLogic documentation etc.)
I agree, the SnapLogic documentation leaves a bit to be desired, in particular for people in your situation, i.e. those just trying to quickly get an overview of what SnapLogic can do and whether it would generally meet their needs, without investing much time or learning the system in earnest.
Also, I realize that the scope and typical uses of SnapLogic differ somewhat from the requirements expressed in the question, and I should have taken the time to better articulate the possible connection.
So here goes...
A salient and powerful feature of SnapLogic is its ability to [virtually] codelessly create "pipelines", i.e. processes made from pre-built components.
Components addressing the most common needs of data-integration tasks at large are supplied with the SnapLogic framework. For example, there are components to
read in and/or write to files in CSV or XML or fixed length format
connect to various SQL backends (for either input, output or both)
transform/format [readily parsed] data fields
sort records
join records for lookups and general "denormalized" record building (akin to SQL joins but applicable to any input [of reasonable size])
merge sources
filter records within a source (to select and, at a later step, work on, say, only records with attribute "State" equal to "NY")
see this list of available components for more details
A relatively weak area of SnapLogic's functionality (for the purpose described by the OP) is parsing. Standard components will only read generic file formats (XML, RSS, CSV, fixed length, DBMSes...), so structured (or semi-structured?) files such as the one described in the question, with mixed binary and text and such, are unlikely to ever be covered by a standard component.
You'd therefore need to write your own parsing logic, in Python or Java, respecting the SnapLogic API of course so the module can later "play nice" with the other ones.
BTW, the task of parsing the files described could be done in one of two ways: with a "monolithic" reader component (i.e. one which takes in the whole file and produces an array of readily parsed records), or with a multi-component approach, whereby an input component reads in and parses the file at "record" level (or line level or block level, whatever this may be), and other standard or custom SnapLogic components are used to create a pipeline which effectively expresses the logic of parsing a record (or block or...) into its individual fields/attributes.
The second approach is of course more modular and may be applicable if the goal is to process many different file formats, whereby each new format requires piecing together components with little or no coding. Whatever the approach used for the input / parsing of the file(s), the SnapLogic framework remains available to create pipelines to then process the parsed input in various fashions.
My understanding of the question therefore prompted me to suggest SnapLogic as a possible framework for the problem at hand, because I understood the feature gap concerning the "codeless" parsing of odd-formatted files, but also saw some commonality of features with regard to creating various processing pipelines.
I also hedged my suggestion with an expression like "get inspiration from" because of the possible feature gap, but also because of the relative lack of maturity of the SnapLogic offering and its apparent commercial/open-source ambivalence.
(Note: this statement is neither a critique of the technical maturity/value of the framework per se, nor a critique of the business-oriented use of open source, but rather a warning that business/commercial pressures may shape the offering in various directions.)
To summarize:
Depending on the specific details of the vision expressed in the question, SnapLogic may be worthy of consideration, provided one understands that "some assembly required" will apply, in particular in the area of file parsing, and that the specific shape and nature of the product may evolve (but then again, it is open source, so one can freeze it or bend it as needed).
A more generic remark is that SnapLogic is based on Python, which is a very swell language for coding various connectors, conversion logic, etc.
In reply to Paul Nathan: you mentioned writing throwaway code as something rather unpleasant. I don't see why it should be. After all, all of our code will be thrown away and replaced eventually, no matter how perfectly we wrote it. So my opinion is that writing throwaway code is pretty much OK, as long as you don't spend too much time writing it.
So, it seems that there are two approaches to solving your problem: either a) find some specific tool intended for the purpose (parse data, perform some basic operations on it and store it in some specific structure) or b) use some general-purpose language with lots of libraries and code it yourself.
I don't think that approach a) is viable, because sooner or later you'll bump into an obstacle not covered by the tool, and you'll spend your time and nerves hacking the tool or mailing the authors and waiting for them to implement what you need. I might well be wrong, so if you do find a perfect tool, please drop a link here (I myself do lots of data processing in my day job and can't swear that I couldn't do it more efficiently).
Approach b) may at first seem "unpleasant", but given a nice high-level expressive language with a bunch of useful libraries (regexps, XML manipulation, parser construction...) it shouldn't be too hard, and it may gradually be turned into a DSL for the very purpose. Besides Perl, which was already mentioned, Python and Ruby sound like good candidates for such a language (I bet some Lisp derivatives do too, but I have no experience there).
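For a taste of how little ceremony approach b) needs, a couple of lines of Python's standard library are often enough (the input line here is made up):

import re

# Made-up input: extract named number fields and normalize decimal commas.
line = "temp=23,5;pressure=1013,2"
fields = {k: float(v.replace(",", ".")) for k, v in re.findall(r"(\w+)=([\d,]+)", line)}
print(fields)  # {'temp': 23.5, 'pressure': 1013.2}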
You might find AntlrWorks useful if you go so far as defining formal grammars for what you're parsing.