How to Convert iXBRL to XBRL

We already use the Dragon View XBRL parser to read tables, paragraphs and other content from XBRL documents. Now that more companies are switching to filing their financial documents in iXBRL instead of XBRL, we would need to write or obtain a new parser for iXBRL to read its contents. If we instead had a mechanism to convert iXBRL documents to XBRL, we could keep using the existing parser, with few changes, to process iXBRL documents.
In XBRL: instance document is separate and independent of rendering document
In iXBRL: instance document is integrated inline in rendering document
My question is: is there any known, easy way to convert an iXBRL document to XBRL?
Many know what an XBRL document is.
To know more details about iXBRL document read here: http://www.xbrl.org/Specification/inlineXBRL/CR-2009-11-16/inlineXBRL-background-CR-2009-11-16.html
Differences between XBRL and iXBRL: http://www.datatracks.co.uk/ixbrl-blog/what-is-ixbrl/

There is an open source xslt based converter: https://sourceforge.net/projects/inlinexbrl/
But it hasn't been maintained for a long time and would need some updates to support the latest iXBRL version. Still, it's a starting point.
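To give a feel for what such a conversion involves, here is a minimal Python sketch that pulls the tagged facts out of an inline document. It is not the converter linked above. It assumes the Inline XBRL 1.1 namespace, uses a made-up `ex:` taxonomy and sample document, and ignores `format` transformations, `ix:continuation`, and the contexts and units in `ix:header`, all of which a real converter would have to handle before it could write a valid XBRL instance.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the Inline XBRL 1.1 namespace.
IX = "http://www.xbrl.org/2013/inlineXBRL"

# Hypothetical sample document; the ex: taxonomy is invented for illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:ex="http://example.com/taxonomy">
  <body>
    <p>Revenue: <ix:nonFraction name="ex:Revenue" contextRef="c1"
        unitRef="GBP" decimals="0" scale="3">1,234</ix:nonFraction></p>
    <p>Name: <ix:nonNumeric name="ex:EntityName" contextRef="c1">Acme Ltd</ix:nonNumeric></p>
  </body>
</html>"""


def extract_facts(ixbrl_text):
    """Return one dict per tagged fact, in document order."""
    root = ET.fromstring(ixbrl_text)
    facts = []
    for elem in root.iter():
        if elem.tag == f"{{{IX}}}nonFraction":
            # Drop thousands separators, then apply the power-of-ten scale.
            raw = "".join(elem.itertext()).replace(",", "").strip()
            value = float(raw) * 10 ** int(elem.get("scale", "0"))
            if elem.get("sign") == "-":
                value = -value
            facts.append({
                "name": elem.get("name"),
                "contextRef": elem.get("contextRef"),
                "unitRef": elem.get("unitRef"),
                "value": value,
            })
        elif elem.tag == f"{{{IX}}}nonNumeric":
            facts.append({
                "name": elem.get("name"),
                "contextRef": elem.get("contextRef"),
                "unitRef": None,
                "value": "".join(elem.itertext()).strip(),
            })
    return facts


facts = extract_facts(SAMPLE)
for fact in facts:
    print(fact)
```

Each extracted fact still has to be written out as an element in an XBRL instance together with its context and unit definitions, which is the part the XSLT converter takes care of.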

Related

Generating ixbrl file from json and xlsx

This is my first question here.
I am working on a project for an accounting site to help generate an iXBRL file from account details that are in JSON and XLSX format.
Has anyone worked on something similar who can walk me through how to go about it?
Welcome, @Abiola Aribisala.
An ixbrl file, also known as Inline XBRL or the XHTML syntax of XBRL, requires two things:
The "print friendly" part, in "raw" XHTML, that a human user can look at;
Extra tags within this XHTML (they are in a namespace specific to XBRL), which are the machine-readable part.
Thus, in order to produce Inline XBRL syntax, you first need to have a print friendly version in a format that can be converted to XHTML (like Word, etc), as this cannot be automated just reading from JSON. I imagine that if the Excel file is nicely formatted, it might be possible to convert it to some "raw" XHTML in some way, too.
Second, for the tags, you need a data source with all the contexts, characteristics, etc for each fact value. If your JSON data is in xBRL-JSON format, it should contain this information. Otherwise, it requires extra work.
Finally, a challenge is knowing which tag to put where in the XHTML, i.e. "merging" the print version with the data. In a regular setup, this comes from a common source that generates both the print version and the machine-readable data. That way, this common source can directly generate the Inline XBRL file, which is best for quality and correctness.
If the binding between the print version and the data is not available, one could in theory put all the tags in an ix-hidden section in XHTML, however it defeats the purpose of tagging the data exactly where it is on the XHTML page, i.e., it makes it less interactive.
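As a rough illustration of the "merging" step, the Python sketch below renders table rows in which each value is wrapped in an `ix:nonFraction` tag. The JSON field names (`label`, `concept`, `context`, `unit`, `value`) and the `ex:` concepts are hypothetical, not part of xBRL-JSON or any real taxonomy. The surrounding XHTML document, its namespace declarations, and the `ix:header` holding contexts and units are left out.

```python
import json
from xml.sax.saxutils import escape, quoteattr

# Hypothetical input; the field names are invented for this sketch.
ACCOUNTS_JSON = """[
  {"label": "Turnover", "concept": "ex:Turnover", "context": "FY2023", "unit": "GBP", "value": 125000},
  {"label": "Cost of sales", "concept": "ex:CostOfSales", "context": "FY2023", "unit": "GBP", "value": 80000}
]"""


def render_row(item):
    """Render one table row with the value tagged where it is displayed."""
    tagged = (
        f"<ix:nonFraction name={quoteattr(item['concept'])} "
        f"contextRef={quoteattr(item['context'])} "
        f"unitRef={quoteattr(item['unit'])} "
        f'decimals="0">{item["value"]}</ix:nonFraction>'
    )
    return f"<tr><td>{escape(item['label'])}</td><td>{tagged}</td></tr>"


rows = "\n".join(render_row(item) for item in json.loads(ACCOUNTS_JSON))
print(rows)
```

The values are printed as plain digits here. Displaying them with thousands separators would additionally require a `format` attribute naming a transformation rule accepted by the relevant filing regime.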

Can CouchDB natively convert plain text to json format?

I'm aware that there are Python and PowerShell methods to convert plain text files, CSVs, etc. into JSON format for upload into NoSQL DBs such as CouchDB.
But according to the CouchDB definitive guide, it seems like there is a native, built-in way of doing this kind of conversion, without the need for a 3rd party tool.
This older thread appears to hint at this:
Filter and update functions in CouchDB?
This part in particular:
There are other design document functions that are being introduced at the time of this writing, including _update and _filter that we aren’t covering in depth here. Filter functions are covered in Chapter 20, Change Notifications. Imagine a web service that POSTs an XML blob at a URL of your choosing when particular events occur. PayPal’s instant payment notification is one of these. With an _update handler, you can POST these directly in CouchDB and it can parse the XML into a JSON document and save it. The same goes for CSV, multi-part form, or any other format.
But when I dig deeper I don't find anything concrete.
The supporting wiki link is not clear to me (a beginner with JSON/NoSQL/curl stuff): http://wiki.apache.org/couchdb/Document_Update_Handlers
Hopefully this is a simple yes/no. Any links to help on this that are better than the above link are also appreciated, thank you!
CouchDB supports transforming the internal documents/views into many other formats through the use of show and list functions. It's not a "native" transformation, as you define the transformation yourself, it's not magical.
That being said, there is not a similar mechanism for the reverse (ie: converting some arbitrary format into JSON documents) but you're much better off scripting that with a full-featured language/script and using the bulk docs API to do your imports in batch.
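A minimal sketch of that scripted route in Python, assuming a CSV source: the rows are turned into JSON documents and wrapped in the `{"docs": [...]}` body that the `_bulk_docs` endpoint expects. The sample data and the helper names are made up, and the database URL passed to `post_bulk` is a placeholder for your own server.

```python
import csv
import io
import json
import urllib.request

# Hypothetical sample data standing in for a plain text or CSV file.
CSV_TEXT = """name,city,visits
alice,Lagos,3
bob,Leeds,5
"""


def csv_to_bulk_payload(csv_text):
    """Turn each CSV row into a document; all values stay strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {"docs": [dict(row) for row in reader]}


def post_bulk(db_url, payload):
    """POST the payload to <db_url>/_bulk_docs and return the decoded reply."""
    request = urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = csv_to_bulk_payload(CSV_TEXT)
print(json.dumps(payload, indent=2))
# post_bulk("http://localhost:5984/mydb", payload)  # placeholder URL, needs a running CouchDB
```

Since no `_id` is supplied, CouchDB would assign document IDs itself. Converting `visits` to a number would have to be done in the script before posting.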

JSON and HTML trying to understand

According to a Stack Overflow post titled “What is JSON and why would I use it?”, “web services used XML as their primary data format for transmitting back data, but since JSON appeared, it is preferred method.” Why do most web services use JSON over XML? Is it because it is a better method for data interchange?
XML was designed primarily for document formats, e.g. papers in scientific journals. It contains many features that aren't needed for simple data interchange, and these features can get in the way when you are processing XML because they can't be easily represented in JavaScript. So the code for processing the XML ends up a lot more complicated than it could be. By contrast, JSON has an exact match to the data structures JavaScript can handle natively. Of course, that problem could in principle be solved by using a language with better XML support than JavaScript - XSLT, for example - but unfortunately XSLT in the browser has never had the same level of investment put into it.
Additionally, for reasons I have never understood, the browser security folks decided that reading JSON from alien web sites (i.e. from a different domain from your HTML page) is safe, but reading XML from alien sites isn't. So if you switch from XML to JSON, you get rid of a lot of cross-site-scripting hassle.
JSON is less verbose and it is sufficient for simple data transmission, i.e. if you do not need any transformations (XSLT).
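The difference in processing effort can be seen in a small sketch, written here in Python rather than JavaScript and using an invented record. The JSON text maps straight onto native data structures, while the XML version has to be walked and converted field by field.

```python
import json
import xml.etree.ElementTree as ET

# The same invented record in both formats.
JSON_TEXT = '{"id": 7, "tags": ["a", "b"], "active": true}'
XML_TEXT = '<item id="7" active="true"><tag>a</tag><tag>b</tag></item>'

# JSON: one call yields a dict with an int, a list and a bool.
from_json = json.loads(JSON_TEXT)

# XML: types and structure have to be recovered by hand.
elem = ET.fromstring(XML_TEXT)
from_xml = {
    "id": int(elem.get("id")),
    "tags": [tag.text for tag in elem.findall("tag")],
    "active": elem.get("active") == "true",
}

print(from_json)
print(from_xml)
```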

Parse HTML to XML

I am trying to figure out how to parse HTML to XML, but I cannot figure it out. I want to use the MSXML2.ServerXMLHTTP object (in an .asp file).
<%
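' URL of the page to fetch
url = "http://www.website.com/file.asp"
Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
' The third argument (False) makes the request synchronous
xmlhttp.open "POST", url, False
xmlhttp.send
' responseText holds the raw response body as a string
Response.Write xmlhttp.responseText
Set xmlhttp = Nothing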
%>
This gives me the text, but I really don't know where to go from here.
I think the problem is in the HEAD of the HTML file.
From MSDN: the response should be XML ("text/xml"), but your http://www.website.com/file.asp returns HTML content with the "text/html" MIME type.
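Since the response is HTML rather than well-formed XML, one way forward is to run the text through a lenient HTML parser and serialize the result as XML. The sketch below shows the idea in Python rather than ASP, using only the standard library. The class name `HtmlToTree` and the wrapper element `document` are choices made for this sketch, and it does not implement the implied end tags or error recovery of a real HTML parser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HtmlToTree(HTMLParser):
    """Build an ElementTree from loosely written HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(elem)

    def handle_startendtag(self, tag, attrs):
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    parser = HtmlToTree()
    parser.feed(html_text)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


xml_text = html_to_xml("<ul><li>One<br>two</li><li>Three</ul>")
print(xml_text)
```

The unclosed `<br>` and the missing `</li>` in the input are both closed in the output, so the result can be loaded by any XML parser.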
Native XML Extensions
I prefer using one of the native XML extensions since they come bundled with PHP, are usually faster than all the 3rd party libs and give me all the control I need over the markup.
DOM
The DOM extension allows you to operate on XML documents through the DOM API with PHP 5. It is an implementation of the W3C's Document Object Model Core Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents.
DOM is capable of parsing and modifying real world (broken) HTML and it can do XPath queries. It is based on libxml.
It takes some time to get productive with DOM, but that time is well worth it IMO. Since DOM is a language-agnostic interface, you'll find implementations in many languages, so if you need to change your programming language, chances are you will already know how to use that language's DOM API then.
A basic usage example can be found in grabbing the href attribute of an A element and a general conceptual overview can be found at DOMDocument in PHP.
How to use the DOM extension has been covered extensively on StackOverflow, so if you choose to use it, you can be sure most of the issues you run into can be solved by searching/browsing StackOverflow.
XMLReader
The XMLReader extension is an XML pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way.
XMLReader, like DOM, is based on libxml. I am not aware of how to trigger the HTML Parser Module, so chances are using XMLReader for parsing broken HTML might be less robust than using DOM where you can explicitly tell it to use libxml's HTML parser module.
A basic usage example can be found at getting all values from h1 tags using PHP.
XML Parser
This extension lets you create XML parsers and then define handlers for different XML events. Each XML parser also has a few parameters you can adjust.
The XML Parser library is also based on libxml, and implements a SAX style XML push parser. It may be a better choice for memory management than DOM or SimpleXML, but will be more difficult to work with than the pull parser implemented by XMLReader.
SimpleXml
The SimpleXML extension provides a very simple and easily usable toolset to convert XML to an object that can be processed with normal property selectors and array iterators.
SimpleXML is an option when you know the HTML is valid XHTML. If you need to parse broken HTML, don't even consider SimpleXml because it will choke.
A basic usage example can be found at A simple program to CRUD node and node values of xml file and there are lots of additional examples in the PHP manual.
3rd Party Libraries (libxml based)
If you prefer to use a 3rd-party lib, I'd suggest using a lib that actually uses DOM/libxml underneath instead of string parsing.
FluentDom - Repo
FluentDOM provides a jQuery-like fluent XML interface for the DOMDocument in PHP. Selectors are written in XPath or CSS (using a CSS to XPath converter). Current versions extend the DOM implementing standard interfaces and add features from the DOM Living Standard. FluentDOM can load formats like JSON, CSV, JsonML, RabbitFish and others. Can be installed via Composer.
HtmlPageDom
Wa72\HtmlPageDom is a PHP library for easy manipulation of HTML documents using DOM. It requires DomCrawler from Symfony2 components for traversing the DOM tree and extends it by adding methods for manipulating the DOM tree of HTML documents.
phpQuery (not updated for years)
phpQuery is a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on jQuery JavaScript Library written in PHP5 and provides additional Command Line Interface (CLI).
Also see: https://github.com/electrolinux/phpquery
Zend_Dom
Zend_Dom provides tools for working with DOM documents and structures. Currently, we offer Zend_Dom_Query, which provides a unified interface for querying DOM documents utilizing both XPath and CSS selectors.
QueryPath
QueryPath is a PHP library for manipulating XML and HTML. It is designed to work not only with local files, but also with web services and database resources. It implements much of the jQuery interface (including CSS-style selectors), but it is heavily tuned for server-side use. Can be installed via Composer.
fDOMDocument
fDOMDocument extends the standard DOM to use exceptions at all occasions of errors instead of PHP warnings or notices. They also add various custom methods and shortcuts for convenience and to simplify the usage of DOM.
sabre/xml
sabre/xml is a library that wraps and extends the XMLReader and XMLWriter classes to create a simple "XML to object/array" mapping system and design pattern. Writing and reading XML is single-pass and can therefore be fast and require low memory on large XML files.
FluidXML
FluidXML is a PHP library for manipulating XML with a concise and fluent API. It leverages XPath and the fluent programming pattern to be fun and effective.
3rd-Party (not libxml-based)
The benefit of building upon DOM/libxml is that you get good performance out of the box because you are based on a native extension. However, not all 3rd-party libs go down this route. Some of them are listed below.
PHP Simple HTML DOM Parser
An HTML DOM parser written in PHP5+ lets you manipulate HTML in a very easy way!
Require PHP 5+.
Supports invalid HTML.
Find tags on an HTML page with selectors just like jQuery.
Extract contents from HTML in a single line.
I generally do not recommend this parser. The codebase is horrible and the parser itself is rather slow and memory hungry. Not all jQuery Selectors (such as child selectors) are possible. Any of the libxml based libraries should outperform this easily.
PHP Html Parser
PHPHtmlParser is a simple, flexible, HTML parser which allows you to select tags using any CSS selector, like jQuery. The goal is to assist in the development of tools which require a quick, easy way to scrape HTML, whether it's valid or not! This project was originally supported by sunra/php-simple-html-dom-parser but the support seems to have stopped so this project is my adaptation of his previous work.
Again, I would not recommend this parser. It is rather slow with high CPU usage. There is also no function to clear memory of created DOM objects. These problems scale particularly with nested loops. The documentation itself is inaccurate and misspelled, with no responses to fixes since 14 Apr 16.
Ganon
A universal tokenizer and HTML/XML/RSS DOM parser
Ability to manipulate elements and their attributes
Supports invalid HTML and UTF8
Can perform advanced CSS3-like queries on elements (like jQuery -- namespaces supported)
A HTML beautifier (like HTML Tidy)
Minify CSS and Javascript
Sort attributes, change character case, correct indentation, etc.
Extensible
Parsing documents using callbacks based on current character/token
Operations separated in smaller functions for easy overriding
Fast and easy
Never used it. Can't tell if it's any good.
HTML 5
You can use the above for parsing HTML5, but there can be quirks due to the markup HTML5 allows. So for HTML5 you want to consider using a dedicated parser, like:
html5lib
Python and PHP implementations of an HTML parser based on the WHATWG HTML5 specification for maximum compatibility with major desktop web browsers.
We might see more dedicated parsers once HTML5 is finalized. There is also a blog post by the W3C titled How-To for html 5 parsing that is worth checking out.
WebServices
If you don't feel like programming PHP, you can also use Web services. In general, I found very little utility for these, but that's just me and my use cases.
ScraperWiki
ScraperWiki's external interface allows you to extract data in the form you want for use on the web or in your own applications. You can also extract information about the state of any scraper.
Regular Expressions
Last and least recommended, you can extract data from HTML with regular expressions. In general using Regular Expressions on HTML is discouraged.
Most of the snippets you will find on the web to match markup are brittle. In most cases they only work for a very particular piece of HTML. Tiny markup changes, like adding whitespace somewhere, or adding or changing attributes in a tag, can make the RegEx fail when it's not properly written. You should know what you are doing before using RegEx on HTML.
HTML parsers already know the syntactical rules of HTML. Regular expressions have to be taught for each new RegEx you write. RegEx are fine in some cases, but it really depends on your use-case.
You can write more reliable parsers, but writing a complete and reliable custom parser with regular expressions is a waste of time when the aforementioned libraries already exist and do a much better job on this.
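The brittleness is easy to reproduce. The sketch below is in Python rather than PHP and uses a deliberately naive pattern of the kind found in web snippets. It compares that pattern with a small parser-based link collector on two equivalent pieces of markup. The pattern and the class name are illustrative only.

```python
import re
from html.parser import HTMLParser

# A typical naive pattern: it assumes href is the only attribute and is double-quoted.
HREF_RE = re.compile(r'<a href="([^"]*)">')


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags, however they are written."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def links_with_parser(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


SIMPLE = '<a href="/one">One</a>'
VARIANT = "<a class='nav' href='/one'>One</a>"  # same link, different attribute style

regex_simple = HREF_RE.findall(SIMPLE)
regex_variant = HREF_RE.findall(VARIANT)
parser_simple = links_with_parser(SIMPLE)
parser_variant = links_with_parser(VARIANT)

print(regex_simple, regex_variant)
print(parser_simple, parser_variant)
```

The regex finds the link only in the first form, while the parser returns it for both, because it already knows the attribute syntax rules.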
Also see Parsing Html The Cthulhu Way
Books
If you want to spend some money, have a look at
PHP Architect's Guide to Webscraping with PHP
I am not affiliated with PHP Architect or the authors.

Programmatically generate high quality PDFs

Note: I realize this question has already been asked (with a ruby slant) here: Creating on-demand, print-quality PDFs (preferably in Ruby if feasible). BUT there was no decent answer IMHO.
So as you may have guessed, I am looking to find the best approach to producing HIGH QUALITY, print ready PDF documents programmatically. Our requirements need us to be able to have design documents that define place holders for dynamic content like images and text i.e. some kind of template mechanism.
The suggestion has been to use Adobe's InDesign server, but this seems like an expensive solution not to mention a little overkill for our need.
Are there any alternative, cheaper and more fitting solutions out there? The language of the solution doesn't really matter, as long as it can be executed on a Windows box.
My suggestion would be to look at XSL-FO or thereabouts...
You create an XML doc that describes what you want and there are various libraries and toolkits (I've used XEP from RenderX) that will convert said XML into PDF.
In real terms what we did was take a large lump of data in XML format, use XSLT - templates in effect - to convert the data to formatting objects which XEP renders up into something (a 500 page hotel directory with auto-generated TOC and Index) that has been consumed quite happily by at least three different commercial printers. We did some other smaller documents too from time to time.
Downside with this is that it's not even remotely a WYSIWYG solution - you're effectively compiling "source code" to get PDF out the back. Upside is that the base technologies are reasonably generic even if the specific toolkits may be a bit less so.
You can convert XML templates to PDFs with Prince.
Prince is a computer program that converts XML and HTML into PDF documents. Prince can read many XML formats, including XHTML and SVG. Prince formats documents according to style sheets written in CSS.
I have had, and also know many people who have had, much success with ReportLab, an open source Python PDF library (http://www.reportlab.org/rl_toolkit.html).
It's extremely easy to use and very quick to get started with, so it is worth trying out.
I don't know why no one has suggested using LaTeX for this. It's an extremely popular open format for document design and not hard to set up a template that you can fill in text or image content. While the reference implementation of LaTeX runs as a standalone program, if that sounds like too many moving parts for you there are wrapper libraries for Python and other languages you can call via an API.
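A minimal sketch of the template idea in Python, using only the standard library: placeholders in a LaTeX source are filled with escaped text and an image path. The template, the placeholder names and the `render` helper are invented for illustration, and compiling the resulting source to PDF (for example with `pdflatex`) is a separate step not shown here.

```python
from string import Template

# Hypothetical template; $title, $body and $image_path are the placeholders.
LATEX_TEMPLATE = Template(r"""\documentclass{article}
\usepackage{graphicx}
\begin{document}
\section*{$title}
$body

\includegraphics[width=\linewidth]{$image_path}
\end{document}
""")

# Characters with special meaning in LaTeX and their escaped forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape user text character by character so nothing is escaped twice."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def render(title, body, image_path):
    """Fill the template; the image path is inserted as given, not escaped."""
    return LATEX_TEMPLATE.substitute(
        title=latex_escape(title),
        body=latex_escape(body),
        image_path=image_path,
    )


source = render("Hotels & Inns", "Save 50% today", "cover.png")
print(source)
```

Escaping the dynamic text matters because characters such as `&` and `%` would otherwise break the LaTeX run or silently comment out the rest of a line.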
Java language and JasperReports
Java: iText
C#: iTextSharp
It depends on what you want to publish, but take a look at Pentaho Reporting:
http://reporting.pentaho.org/
rinohtype is an open-source document processor that is capable of producing high-quality print-ready PDF documents. You can use one of the built-in document templates (book, article) or define your own template. The look of document elements can be configured by means of CSS-like style sheets. The contents of your document can be parsed from reStructuredText or CommonMark files, or you can build the document tree programmatically.
Full disclosure: I am the author of rinohtype.