extraction of data from PDF converted XBRL files [closed] - xbrl

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have some XBRL files that have been converted to PDF. I want to build a project that automatically extracts all the data from these files. The project will be written in Java. I have not been able to find a lead. Any suggestions on how to start would be much appreciated, as there is very little information about this on the internet.

I would recommend trying to get the original XBRL (or iXBRL) files rather than using the generated PDFs.
XBRL was designed to be easily machine-readable, precisely so that nobody has to reverse-engineer printed documents or PDFs. Reading the PDFs instead throws away that advantage and is likely to introduce inaccuracies and errors.
If you can get the source files, I recommend using an XBRL processor, which will take care of all the complexity for you. This will save a lot of time compared to using a raw XML processor. It is likely that there are XBRL libraries written for Java.
I am sorry not to be able to give you a better answer, but I hope this helps you get started.
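To illustrate why the source XBRL is easier to work with than a PDF, here is a minimal sketch in Python (standard library only) that lists the facts in a tiny, made-up XBRL instance. The `ex:` namespace, the element names and the values are all invented for illustration. A real XBRL processor does far more (taxonomies, contexts, units, validation), so this only shows the idea and is not a substitute for one. The same approach carries over to Java.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written instance document; not real filing data.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
            xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="FY2020">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2020-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Revenue contextRef="FY2020" unitRef="USD" decimals="0">1000</ex:Revenue>
  <ex:NetIncome contextRef="FY2020" unitRef="USD" decimals="0">150</ex:NetIncome>
</xbrli:xbrl>"""


def list_facts(xml_text):
    """Return (name, context id, value) for each top-level fact."""
    root = ET.fromstring(xml_text)
    facts = []
    for child in root:
        # In this simple sample, facts are the top-level elements
        # that carry a contextRef attribute.
        context = child.get("contextRef")
        if context is None:
            continue
        # ElementTree tags look like "{namespace}LocalName".
        local_name = child.tag.split("}", 1)[-1]
        facts.append((local_name, context, (child.text or "").strip()))
    return facts


facts = list_facts(SAMPLE)
for name, context, value in facts:
    print(name, context, value)
```

Each fact comes back already labelled with its concept name and context, which is exactly the information that has to be guessed when scraping a PDF.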

Related

How can I use the website open library to store information into a database [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am trying to make a web page (in HTML) that fetches information about books and stores it in a database for later use. Any idea how to take the information from the Open Library website and store it in a database?
here is the link to the API if needed:
https://openlibrary.org/developers/api
thanks in advance.
If PostgreSQL and Python are a viable option, LibrariesHacked has a ready-made solution on GitHub for importing and searching Open Library data.
GitHub: LibrariesHacked / openlibrary-search
With a PostgreSQL database it should be possible to import the data directly into tables and then run complex searches with SQL.
Unfortunately, the downloads provided are a bit messy. Importing the Open Library file always fails with errors because the number of columns varies from row to row. Cleaning it up is difficult, as the text file for editions alone is 25 GB.
That means another Python script is needed to clean up the data. The file openlibrary-data-process.py simply reads in the CSV (Python is a little more forgiving about dodgy data) and writes each row out again, but only if it has 5 columns.
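The clean-up step described above can be sketched roughly as follows. This is not the actual openlibrary-data-process.py; it is a minimal illustration that assumes the dump is tab-separated and splits each line on tabs itself rather than using the `csv` module. The function name and the sample rows are made up.

```python
import io


def keep_well_formed_rows(source, target, expected_columns=5):
    """Copy only lines that split into exactly `expected_columns` tab-separated fields."""
    kept = skipped = 0
    for line in source:  # streams line by line, so a 25 GB file never sits in memory
        fields = line.rstrip("\n").split("\t")
        if len(fields) == expected_columns:
            target.write(line if line.endswith("\n") else line + "\n")
            kept += 1
        else:
            skipped += 1
    return kept, skipped


# Tiny in-memory stand-in for the real dump (invented rows).
sample = io.StringIO(
    '/type/edition\t/books/OL1M\t1\t2008-04-01\t{"title": "A"}\n'
    "broken line with\ttoo few columns\n"
    '/type/edition\t/books/OL2M\t1\t2008-04-01\t{"title": "B"}\n'
)
cleaned = io.StringIO()
kept, skipped = keep_well_formed_rows(sample, cleaned)
print(kept, skipped)
```

With real files, `source` and `target` would be file objects opened with `open(..., encoding="utf-8")`.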

building html resume... should I use JSON? PHP includes? sql? xml? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I am building an HTML resume (with Bootstrap tabs) but want to break the sections into separate parts for easy editing and repurposing.
For example, I would keep the work experience data in one file and education in another, link them into a tabbed HTML page, and also have the option to export to DOCX or PDF. Skills should appear in the HTML version but not in the PDF export.
What would be the best architecture to use? Would JSON be good, or should I use PHP includes?
What about XML? Or should I just make it a MySQL database and use PHP to pull the data (this seems like overkill for less than 1,000 words)?
I would argue that any of these is overkill for a small project, so I'd put it all in one HTML file.
If you want to generate a PDF or DOCX automatically, it's no longer an HTML resume, so I won't answer the generation part of the question.
For managing the HTML you can use a templating language, e.g. Nunjucks or Pug.
It will allow you to include HTML files in one another; the downside is that you'll have to set up a build tool like Gulp for this (which requires some basic JavaScript knowledge and time).
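As a rough illustration of keeping the sections as separate data and assembling them into a page, here is a minimal sketch that uses only Python's standard library as a stand-in for a templating tool such as Nunjucks or Pug. The JSON layout, the function names and the sample entries are all invented for this example.

```python
import json
from html import escape

# Hypothetical resume data; in practice each section could live in its own file.
RESUME_JSON = """
{
  "experience": [{"title": "Developer", "place": "Example Corp", "years": "2015-2019"}],
  "education": [{"title": "BSc Computer Science", "place": "Example University", "years": "2011-2015"}],
  "skills": ["HTML", "CSS", "PHP"]
}
"""


def render_section(heading, items):
    """Render one resume section as an HTML fragment."""
    rows = "".join(f"<li>{escape(item)}</li>" for item in items)
    return f"<section><h2>{escape(heading)}</h2><ul>{rows}</ul></section>"


def render_resume(data, include_skills=True):
    """Assemble the sections; skills can be left out, e.g. for a print/PDF version."""
    parts = []
    for key in ("experience", "education"):
        lines = [f"{e['title']}, {e['place']} ({e['years']})" for e in data[key]]
        parts.append(render_section(key.capitalize(), lines))
    if include_skills:
        parts.append(render_section("Skills", data["skills"]))
    return "\n".join(parts)


data = json.loads(RESUME_JSON)
web_html = render_resume(data)                         # full version for the website
print_html = render_resume(data, include_skills=False)  # version intended for export
print(web_html)
```

The point of the design is that the content is edited in one place and the choice of what to show is made at render time.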
Something which you need to consider is the format which you would be handing into potential employers.
If you are hoping to hand in a web page, you would probably want to "render" it rather than hand in a piece of functioning code. The reason is that if the employer or recruiter is unable to open or correctly read the file, your chances of getting the job drop dramatically. Not to mention that many large companies use bots that read CVs automatically; see this article, which explains the matter.
You would also want to consider what some companies or recruiters may think when they see CV.html in their email inbox. Some will think it's a really smart and creative idea; others may think the file is incompatible with their computer and never open it. Leaving instructions on how to open the document takes time that the employer may not have.
I'm not saying it's a ludicrous idea, I'm saying you need to plan it out properly. Personally, I would keep an online copy on my website, but I would also have an additional copy (Word document or PDF) which could be downloaded and read by the bots I mentioned earlier.
In programming there are many ways to do the same thing, and it is entirely up to you and your abilities to find what is best.

whats the best way to store the post in blogs [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I am trying to write a webapp in which one of the features is exchanging messages. I am trying to understand how to store these messages. I do not want to store them in a DB. If I have to store them in a file, how do I separate one message from the next?
Any links to documentation would be greatly appreciated. I tried googling a lot but could not find any reference.
You should think about storing the messages in XML format and using your webapp to load and parse those XML files into message objects. Why do you not want to store the messages in the database? There are serious drawbacks to storing them in the file system rather than in the database (or even in system memory).
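A minimal sketch of the XML approach, using Python's standard library: each message becomes its own `<message>` element, which also answers the question of how to separate messages in a file. The element and attribute names and the `Message` class are made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    recipient: str
    body: str


def messages_to_xml(messages):
    """Serialize messages as one <message> element each under a <messages> root."""
    root = ET.Element("messages")
    for m in messages:
        node = ET.SubElement(root, "message", sender=m.sender, recipient=m.recipient)
        node.text = m.body  # ElementTree escapes <, > and & on output
    return ET.tostring(root, encoding="unicode")


def messages_from_xml(xml_text):
    """Parse the XML back into Message objects."""
    root = ET.fromstring(xml_text)
    return [
        Message(node.get("sender"), node.get("recipient"), node.text or "")
        for node in root.findall("message")
    ]


original = [
    Message("alice", "bob", "Hi Bob"),
    Message("bob", "alice", "Hi <Alice> & co"),
]
xml_text = messages_to_xml(original)
restored = messages_from_xml(xml_text)
print(xml_text)
```

Writing `xml_text` to a file and reading it back is all that is needed for persistence; concurrent writers would still need some form of locking, which this sketch does not cover.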
A file system is a database, just not a relational database.
It's often faster than a relational database, but it has significantly less flexibility for indexing on multiple fields.
Parsing XML is gonna suck whether the XML comes from a database or a file.
Instead, you should do page caching to the file system of HTML, or HTML fragments.
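Caching rendered HTML fragments on the file system could look roughly like the sketch below. The function name, the key format and the cache layout (one file per key, named by a hash of the key) are assumptions made for this example, and cache invalidation is deliberately left out.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_fragment(cache_dir, key, render):
    """Return the HTML fragment for `key`, calling `render` only on a cache miss."""
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".html"
    path = Path(cache_dir) / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = render()
    path.write_text(html, encoding="utf-8")
    return html


calls = []


def render_inbox():
    # Stand-in for the expensive step (loading and formatting messages).
    calls.append(1)
    return "<ul><li>Hi Bob</li></ul>"


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_fragment(cache_dir, "inbox:bob", render_inbox)
    second = cached_fragment(cache_dir, "inbox:bob", render_inbox)

print(len(calls))
```

The second call is served from the cached file, so the render function runs only once.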

What's the best way to turn a bunch of XML docs into a set of HTML help pages, run locally? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
My intent is to create documentation for our software project that is checked into our SCM system along with the source code.
These files can be spread among various sub-projects but I want to bring the documentation together so that, e.g. a page can include documentation on more than one sub-project at once. The viewer should not see a 'page per sub-project' - rather they see the documentation for the project and not the boundaries between sub-projects.
This documentation needs to load direct from a user's local PC in their browser, so I can't composite or transform XML files into a single HTML on a server.
The first thing I would suggest for transforming XML into another form is XSLT: http://www.w3schools.com/xsl/
XSL stands for EXtensible Stylesheet Language, and is a style sheet language for XML documents.
XSLT stands for XSL Transformations. In this tutorial you will learn how to use XSLT to transform XML documents into other formats, like XHTML.

Converting excel spreadsheets to HTML [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a customer who asked me to make a website.
I have the basic website running (on Joomla), but now he wants his price list pages displayed on it (which seems reasonable).
How can I import an Excel file as an array and display it on an HTML page with tags?
Grtz,
Thomas
Edit:
perhaps something from pdf to html since I can create pdf files from it...
Excel saves spreadsheets in XML format, so you can use XSLT to transform your customer's spreadsheet into HTML. The Excel XML format is somewhat obtuse, but if you only need to grab certain pieces of critical data, it's a reasonable solution. Here's some information about the Excel XML format, though Googling will probably reveal more:
http://msdn.microsoft.com/en-us/library/aa140066%28office.10%29.aspx
And here's the W3C standard for XSLT 1.0 (I doubt you would need 2.0 features, which are more complex, for this job):
http://www.w3.org/TR/xslt
XSLT is a declarative XML transformation language, which you would have to learn the fundamentals of for this job, but it's a very useful tool if you deal with XML generally, and the additional virtue of this solution is that it is repeatable (when the customer's data changes).
EDIT: Here's an XSLT tutorial, which is obviously a more friendly introduction to the language than the W3C standard:
http://www.w3schools.com/xsl/
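To give a feel for the structure of Excel's XML, here is a minimal sketch that turns a small hand-written workbook into an HTML table. It uses Python's standard library instead of XSLT, purely to show which elements matter; an XSLT stylesheet would match the same `Worksheet/Table/Row/Cell/Data` path. It assumes the file was saved in the "XML Spreadsheet 2003" format (not .xlsx), and it ignores details such as skipped cells marked with `ss:Index`. The sample data is invented.

```python
import xml.etree.ElementTree as ET
from html import escape

NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}

# Hand-written stand-in for a price list saved as "XML Spreadsheet 2003".
SAMPLE = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Prices">
  <Table>
   <Row>
    <Cell><Data ss:Type="String">Item</Data></Cell>
    <Cell><Data ss:Type="String">Price</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">Widget</Data></Cell>
    <Cell><Data ss:Type="Number">9.5</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""


def spreadsheet_to_html(xml_text):
    """Render every row of every worksheet as a row of a single HTML table."""
    root = ET.fromstring(xml_text)
    rows_html = []
    for row in root.findall("ss:Worksheet/ss:Table/ss:Row", NS):
        cells = []
        for cell in row.findall("ss:Cell", NS):
            data = cell.find("ss:Data", NS)
            text = data.text if data is not None and data.text else ""
            cells.append(f"<td>{escape(text)}</td>")
        rows_html.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows_html) + "\n</table>"


table_html = spreadsheet_to_html(SAMPLE)
print(table_html)
```

Because the conversion is scripted, it can simply be rerun whenever the customer's spreadsheet changes.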
If the price list only gets updated every now and again, can you not simply save the spreadsheet file as an HTML page from within Excel? This will give you some pretty nasty HTML (thanks MS), but it's a good starting point.
(As JollyMorphic points out, you can also transform Excel's XML, but that's quite heavy duty for what you appear to need).