I am trying to convert a number of .json files to .csv files using Python 2.7.
Is there a general way to convert any JSON file to a CSV?
PS: I saw various similar solutions on stackoverflow.com, but they were very specific
to a particular JSON tree and don't work if the tree structure changes. I am new to this site; sorry for my bad English and for reposting. Thanks.
The basic thing to understand is that JSON and CSV files are extremely different on a very fundamental level.
A CSV file is just a series of values separated by commas. This is useful for representing data like that in relational databases, where exactly the same fields are repeated for a large number of records.
A JSON file has structure to it, and there is no straightforward way to represent an arbitrary tree structure in a CSV. You can model various kinds of foreign-key relationships, but when it comes right down to it, trees don't make any sense in a CSV file.
My advice would be to reconsider using CSV, or to post your specific example, because for the vast majority of cases there is no sensible way to convert a JSON document into a CSV.
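That said, for the one case that does map cleanly, a JSON file containing a list of flat objects with identical fields, the conversion is only a few lines. A minimal sketch (written for Python 3 rather than 2.7; the file names are placeholders):

import csv
import json

# Load a JSON file that is a list of flat, uniform objects, e.g.
# [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
with open("records.json") as f:
    records = json.load(f)

with open("records.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)

As soon as the objects are nested or their fields vary, you are back to inventing a flattening scheme by hand, which is exactly the problem described above.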
I am trying to make a web page that gets information about books using HTML, and to place the information about the books into a database so I can use it. Any idea how to take the information from the Open Library website and store it in a database?
Here is the link to the API if needed:
https://openlibrary.org/developers/api
Thanks in advance.
If PostgreSQL and Python are a viable option, LibrariesHacked has a ready-made solution on GitHub for importing and searching Open Library data.
GitHub: LibrariesHacked / openlibrary-search
Using a PostgreSQL database, it should be possible to import the data directly into tables and then do complex searches with SQL.
Unfortunately, the downloads provided are a bit messy. The Open Library file always errors, as the number of columns provided seems to vary. Cleaning it up is difficult, since the text file for editions alone is 25 GB.
That means another Python script to clean up the data. The file openlibrary-data-process.py simply reads in the CSV (Python is a little more forgiving about dodgy data) and writes it out again, but only if there are 5 columns.
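A minimal sketch of that cleanup step (the file names and the tab delimiter are assumptions based on the Open Library dump format; the real openlibrary-data-process.py may differ):

import csv
import sys

# The editions dump has very large fields (the last column is a JSON blob),
# so raise the csv module's default field size limit first.
csv.field_size_limit(sys.maxsize)

with open("ol_dump_editions.txt", newline="") as src, \
     open("ol_dump_editions_clean.txt", "w", newline="") as dst:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t")
    for row in reader:
        if len(row) == 5:  # keep only well-formed rows
            writer.writerow(row)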
I have some XBRL files that were converted into PDFs. Now I want to develop a project that automatically extracts all the data from these files. The project would be developed in Java. I am unable to get any lead. Any suggestions on how to start the project would be much appreciated, as there is very limited information on the internet about this.
I would recommend trying to get the original XBRL (or iXBRL) files rather than using the generated PDFs.
XBRL was designed in the first place to be easily machine readable and to avoid having to reverse engineer printed documents or PDFs. Attempting to read PDFs means not leveraging the potential of XBRL, and it may lead to imprecision and errors.
If you can get these source files, I recommend using an XBRL processor that will take care of all the complexity for you. This will save a lot of time compared to using a raw XML processor. It is likely that there are XBRL libraries written for Java.
I am sorry not to be able to give you a better answer, but I hope this helps you get started.
Please help me decide between the following formats for storing articles on a server:
XML
JSON
YAML
CSV
There are too many and I don't have the knowledge to choose. I am looking for objective criteria, not subjective opinions.
The articles may contain a short title, some paragraphs and images.
XML vs JSON vs YAML vs CSV
Here are some considerations you might use to guide your decision:
Choose XML if
You have to represent mixed content (tags mixed within text). [This would appear to be a major concern in your case; you might even consider HTML for this reason. See the sketch after this list.]
There's already an industry standard XSD to follow.
You need to transform the data to another XML/HTML format. (XSLT is great for transformations.)
Choose JSON if
You have to represent data records, and a closer fit to JavaScript is valuable to your team or your community.
Choose YAML if
You have to represent data records, and you value some additional features missing from JSON: comments, strings without quotes, order-preserving maps, and extensible data types.
Choose CSV if
You have to represent data records, and you value ease of import/export with databases and spreadsheets.
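To make the mixed-content point concrete, here is a hypothetical article fragment in the two most likely candidates. Inline markup mixed with text is a native feature of XML; JSON has no such notion, so you have to invent a convention for it (the alternating-list encoding below is just one made-up possibility):

import json
import xml.etree.ElementTree as ET

# XML: inline tags mixed with text are a native feature.
xml_article = ET.fromstring(
    "<article><title>A short title</title>"
    "<p>Some <em>emphasized</em> text with an <img src='photo.png'/> inline.</p>"
    "</article>"
)

# JSON: no native mixed content; one invented convention is a list
# alternating between plain strings and node objects.
json_article = {
    "title": "A short title",
    "paragraphs": [
        ["Some ", {"em": "emphasized"}, " text with an ",
         {"img": {"src": "photo.png"}}, " inline."],
    ],
}
print(json.dumps(json_article, indent=2))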
Why is JSON-formatted data stored in MongoDB? Is that the only format MongoDB supports? What are the advantages of using JSON for storing records in MongoDB, and what is the benefit of using JSON in MongoDB over other formats?
Actually, Mongo uses BSON, which can represent the same things as JSON but with less space. JSON (which is more like the representation for human beings) has some properties that are useful in a NoSQL database:
No need for a fixed schema. You can just add whatever you want and it will still be correct JSON.
There are parsers available for almost any programming language out there.
The format is programmer friendly, not like some alternatives... I'm looking at you, XML ¬¬.
Mongo needs to understand the data without forcing a "collection schema". If a document uses JSON, you don't need prior information about the object to reason about it. For example, you can get the "title" or "age" of any JSON document just by finding that field. With other formats (e.g. protocol buffers) that's not possible, at least not without a lot of code...
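A minimal sketch of that schemaless behavior (assumes a local MongoDB server and the pymongo driver; the database, collection, and field names are made up):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
people = client["demo"]["people"]

# No fixed schema: documents in the same collection can have different shapes.
people.insert_one({"title": "Engineer", "age": 34})
people.insert_one({"title": "Writer", "languages": ["en", "fr"]})

# Mongo can still query any field by name, with no schema information.
print(people.find_one({"title": "Writer"}))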
(Added) Because Mongo is a database, it wants queries to be fast. BSON/JSON is a format that can meet that requirement and the others at the same time (easy to implement, allows reflecting on the data, fast to parse, no fixed schema, etc.).
(Added) Mongo reuses a JavaScript engine for its queries, so it makes all the sense in the world to reuse JSON for object representation. BSON is just a more compact representation of that format.
I am trying to write a web app where one piece of functionality is exchanging messages. I am trying to understand how to store these messages. I do not want to store them in a database. If I have to store them in a file, how do I separate one message from another?
Any links to documentation would be greatly appreciated. I tried googling a lot but could not get hold of any reference.
You should think about storing the messages in XML format and using your webapp to load and parse those XML files into message objects. Why do you not want to store the messages in the database? There are serious drawbacks to storing in the file system rather than the database (or even system memory).
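A minimal sketch of that approach (the file name and message fields are made up; each message element is what keeps the messages separated in the file):

import xml.etree.ElementTree as ET

def append_message(path, sender, body):
    # Load the existing file, or start a fresh <messages> root.
    try:
        tree = ET.parse(path)
        root = tree.getroot()
    except (FileNotFoundError, ET.ParseError):
        root = ET.Element("messages")
        tree = ET.ElementTree(root)
    msg = ET.SubElement(root, "message", sender=sender)
    msg.text = body
    tree.write(path, encoding="utf-8", xml_declaration=True)

append_message("messages.xml", "alice", "Hello!")
append_message("messages.xml", "bob", "Hi back.")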
A file system is a database, just not a relational database.
It's often faster than a relational database, but it has significantly less flexibility for indexing on multiple fields.
Parsing XML is going to suck whether the XML comes from a database or a file.
Instead, you should do page caching of HTML, or HTML fragments, to the file system.