Does anyone know of a simple utility for editing a simple BSON database/file?
Did you try this: http://docs.mongodb.org/manual/reference/bsondump/ ?
The installation package that includes mongodump ('mongo-tools' on Ubuntu) should also include bsondump, for which the manpage says:
bsondump - examine BSON files in a human-readable form
You can convert BSON to JSON with the following:
bsondump --pretty <your_file.bson
As a data interchange format, BSON may not be suitable for editing directly. For manipulating a BSON dataset, you could, of course, upload it to MongoDB and work with that. Or you could open the bsondump-decoded JSON in an editor. But the BSON Wikipedia article indicates that compatible libraries exist in several languages, which suggests that you should also be able to decode it programmatically to an internal map representation and edit that internal representation in code.
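The decode, edit, re-encode idea can be sketched in Python. This is only an illustration under stated assumptions: the `encode` and `decode` helpers below are hypothetical, written by hand with the standard `struct` module, and cover just a handful of BSON types (double, string, embedded document, boolean, null, int32, int64). For real data a maintained library, for example the `bson` module that ships with PyMongo, would be the safer choice.

```python
import struct


def encode(doc: dict) -> bytes:
    """Encode a dict using a small subset of BSON types (illustrative only)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"
        if isinstance(value, bool):  # bool is a subclass of int, so test it first
            body += b"\x08" + name + (b"\x01" if value else b"\x00")
        elif isinstance(value, int):
            if -2**31 <= value < 2**31:
                body += b"\x10" + name + struct.pack("<i", value)
            else:
                body += b"\x12" + name + struct.pack("<q", value)
        elif isinstance(value, float):
            body += b"\x01" + name + struct.pack("<d", value)
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, dict):
            body += b"\x03" + name + encode(value)
        elif value is None:
            body += b"\x0a" + name
        else:
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    # The total size counts the 4-byte length prefix and the trailing NUL byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def decode(data: bytes) -> dict:
    """Decode one BSON document (same subset of types) into a dict."""
    size = struct.unpack_from("<i", data, 0)[0]
    doc, pos = {}, 4
    while pos < size - 1:
        kind = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        key = data[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if kind == 0x01:
            doc[key] = struct.unpack_from("<d", data, pos)[0]
            pos += 8
        elif kind == 0x02:
            length = struct.unpack_from("<i", data, pos)[0]
            doc[key] = data[pos + 4:pos + 4 + length - 1].decode("utf-8")
            pos += 4 + length
        elif kind == 0x03:
            length = struct.unpack_from("<i", data, pos)[0]
            doc[key] = decode(data[pos:pos + length])
            pos += length
        elif kind == 0x08:
            doc[key] = data[pos] == 1
            pos += 1
        elif kind == 0x0A:
            doc[key] = None
        elif kind == 0x10:
            doc[key] = struct.unpack_from("<i", data, pos)[0]
            pos += 4
        elif kind == 0x12:
            doc[key] = struct.unpack_from("<q", data, pos)[0]
            pos += 8
        else:
            raise ValueError(f"unsupported BSON type 0x{kind:02x}")
    return doc


# Decode to a dict, edit the dict, and encode it again.
raw = encode({"name": "widget", "qty": 3})
record = decode(raw)
record["qty"] += 1
edited = encode(record)
```

Note that a file written by mongodump holds many documents back to back, so a real tool would loop over the file, reading each length prefix in turn.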
I need some guidance on how to proceed with a problem.
Our integration team receives xml files which are converted to json and sent to pub/sub. We then ingest the json files (or are supposed to) into bigquery.
The problem is that the xml files do not always include all possible objects or values, so I can't create a correct schema in bq to receive the json files. I got the xsd file with an extension file, which gives me all possible objects, but I don't know how to convert this to a correct bq schema.
Do you have any suggestions on how to create a bq schema from xsd files? I was thinking of using the xsd to create an xml file with dummy data (including every object, and more than one instance of each repeated object), converting that xml file to json, and then using bq's schema auto-detection.
Any suggestions?
Thanks,
Cris
If you have the XSD schema files, you can convert these to a valid JSON schema. There are a few tools that can help you to accomplish this.
Keep in mind that the tools are general-purpose and not built for BigQuery in particular, so you'll have to tune the result to get a valid BigQuery JSON schema. For this, check the components of a BigQuery schema and, for quick reference, the sample provided in the documentation.
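As a rough idea of what that tuning can look like, here is a minimal Python sketch that walks an XSD with the standard library and emits fields in the BigQuery JSON schema layout (`name`, `type`, `mode`, `fields`). Everything specific in it is an assumption: the `TYPE_MAP` choices, the `SAMPLE_XSD` document, and the function names are made up for illustration, and only inline `xs:complexType`/`xs:sequence` definitions are handled (no named types, `ref`, attributes, or choices). It also assumes the XML-to-JSON conversion keeps element names unchanged.

```python
import json
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed XSD-to-BigQuery type mapping; extend it for the types your XSD uses.
TYPE_MAP = {
    "string": "STRING", "int": "INTEGER", "integer": "INTEGER", "long": "INTEGER",
    "decimal": "NUMERIC", "double": "FLOAT", "float": "FLOAT",
    "boolean": "BOOLEAN", "date": "DATE", "dateTime": "TIMESTAMP",
}

# Hypothetical sample schema, used only to exercise the functions below.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:int"/>
        <xs:element name="note" type="xs:string" minOccurs="0"/>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="sku" type="xs:string"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def field_from_element(el):
    """Turn one xs:element into a BigQuery field description."""
    min_occurs = el.get("minOccurs", "1")
    max_occurs = el.get("maxOccurs", "1")
    if max_occurs == "unbounded" or int(max_occurs) > 1:
        mode = "REPEATED"
    elif min_occurs == "0":
        mode = "NULLABLE"
    else:
        mode = "REQUIRED"
    field = {"name": el.get("name"), "mode": mode}
    children = el.findall(f"{XS}complexType/{XS}sequence/{XS}element")
    if children:
        # Nested elements become a RECORD with its own field list.
        field["type"] = "RECORD"
        field["fields"] = [field_from_element(child) for child in children]
    else:
        xsd_type = el.get("type", "xs:string").split(":")[-1]
        field["type"] = TYPE_MAP.get(xsd_type, "STRING")
    return field


def bigquery_schema(xsd_text):
    """Return the field list for the first top-level element of the XSD."""
    root = ET.fromstring(xsd_text)
    top = root.find(f"{XS}element")
    return field_from_element(top).get("fields", [])


schema = bigquery_schema(SAMPLE_XSD)
schema_json = json.dumps(schema, indent=2)
```

The resulting `schema_json` text is the kind of file that can then be reviewed by hand and adjusted before it is used as a table schema.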
I went through previous posts on SO and some of the answers say that a JSON file is used to send data from server to client.
Well, that seems to be okay, but then we can also create package.json, Apidoc.json, and manifest.json, which are not exchanged between the client and the server.
So can someone tell me what actually is a JSON file?
JSON stands for JavaScript Object Notation. It is used to describe a data structure in a simple format. It can be a plain text file, which may be used to pass data from the server to a client, but it could equally be used to hold and consume that data at the same layer, e.g. you could have a configuration file at the client side which is read and interpreted by your application.
Note also that JSON does not need to be held in a file; you could create a string variable with JSON data in it and pass this from one method to another without ever storing it in a file.
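A small Python sketch of that last point, with made-up function names and settings: one function produces a JSON string, another consumes it, and no file is involved.

```python
import json


def build_settings() -> str:
    """Serialize a dict to a JSON string (nothing is written to disk)."""
    settings = {"theme": "dark", "retries": 3, "plugins": ["lint", "format"]}
    return json.dumps(settings)


def read_retries(payload: str) -> int:
    """Parse the JSON string back into Python objects and pick one value."""
    return json.loads(payload)["retries"]


payload = build_settings()       # a plain str holding JSON text
retries = read_retries(payload)  # passed from one function to another
```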
The tag definition in Stack Overflow can be found here https://stackoverflow.com/tags/json/info and further information can be found here https://www.json.org/.
JSON is a file format, just like CSV. Just because CSV is used with Microsoft Excel, does not mean that is all it is used for (just like with JSON). Just because it is common to get info from a server in JSON format, does not mean that is all JSON is used for. Do some googling before asking a question like this on Stack Overflow.
Here is an intro to JSON. JSON Intro W3Schools
The avro format is used in hadoop as a header to describe the contents of the binary file that follows. My question is whether the json part of the avro file can be extended to include information that is not necessary for hadoop? The typical use case would be to attach meta-data like the originator of the file and a date to the file without it needing to be data and part of the file.
Yes. Avro files can be annotated with additional information in the json schema or with specific additional name:value pairs. Additionally, we have been able to read these avro files with Pentaho and Google BigQuery. One caveat is that the schema and name:value pairs are discarded during the import process. So if you feel you will need them later, you should extract and store local copies of them.
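To illustrate the first option, here is a sketch of an Avro record schema carrying extra attributes, built as a plain Python dict and serialized with the standard `json` module. The attribute names `originator` and `created`, the record name, and the values are all hypothetical; the Avro specification permits attributes it does not define as metadata, provided they do not affect how the data is serialized. No Avro library is used here, so nothing is actually written in Avro format.

```python
import json

# An Avro record schema with two extra, non-standard attributes.
# "originator" and "created" are made-up names used only for illustration.
schema = {
    "type": "record",
    "name": "Measurement",
    "fields": [
        {"name": "sensor", "type": "string"},
        {"name": "value", "type": "double"},
    ],
    "originator": "lab-a",
    "created": "2020-01-01",
}

schema_json = json.dumps(schema, indent=2)

# Keep a local copy of the annotations, since importers may discard them.
standard_keys = {"type", "name", "namespace", "doc", "aliases", "fields"}
annotations = {k: v for k, v in json.loads(schema_json).items() if k not in standard_keys}
```

Extracting `annotations` this way is one possible answer to the caveat above about metadata being dropped on import.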
I'm on an Ubuntu system, and I'm trying to write a testing framework that has to (among other things) compare the output of a mongodump command. This command generates a bunch of BSON files, which I can compare. However, for human readability, I'd like to convert these to nicely formatted JSON instead, which I can do using the provided bsondump command. The issue is that this appears to be a one-way conversion.
While I can work around this if I absolutely need to, it would be a lot easier if there were a way to convert back from JSON to BSON on the command line. Does anyone know of a command line tool to do this? Google seems to have come up dry.
I haven't used them, but bsontools can convert from json, xml, or csv
As @WiredPrairie points out, the conversion from BSON to JSON is lossy, and it makes no sense to want to go back the other way. Workarounds include using mongoimport instead of mongorestore, or just using the original BSON. See the comments for more details (adding this answer mainly so I can close the question).
You can try beesn, it converts data both ways. For your variant - JSON -> BSON - use the -x switch.
Example:
$ beesn -x -i test-data/01.json -o my.bson
Disclaimer: I am an author of this tool.
Is there any naming convention for a json schema file extension? XML has .xsd (XML Schema Definition), what should json schema files have, .jsd (JSON Schema Definition)?
From Gary Court:
I personally use .schema.json, but there is no official file
extension. The official mime type however is
"application/schema+json".
Update, November 2022
application/schema+json and application/schema-instance+json will be published by an IETF RFC.
According to the current proposal, both the json and schema.json extensions are supported. I still find it quite inconvenient for convention-based processing to have a dot within an extension.
Previous comment
According to the last draft (v4), there is not a new extension proposed for files storing json-schemas. .json extension is used profusely within that document. .json is also the preferred extension in validators (PHP, Ruby, Python).
So I think that .json should be the preferred option in absence of an official/standard new extension.
From https://json-schema.org/understanding-json-schema/basics.html#id3
Since JSON Schema is itself JSON, it’s not always easy to tell when
something is JSON Schema or just an arbitrary chunk of JSON. The
$schema keyword is used to declare that something is JSON Schema. It’s
generally good practice to include it, though it is not required.
So you can use .json as the file extension for JSON schema but maybe with a $schema keyword (although optional) for better distinction.
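A tool that receives arbitrary .json files could use that keyword as a hint. The sketch below is an assumption about how such a check might look, not part of any JSON Schema library: the function name is hypothetical, and because `$schema` is optional, a `False` result does not prove the file is not a schema.

```python
import json


def looks_like_json_schema(text: str) -> bool:
    """Return True when the JSON text declares a top-level "$schema" string."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(doc, dict) and isinstance(doc.get("$schema"), str)


schema_text = '{"$schema": "https://json-schema.org/draft/2019-09/schema", "type": "object"}'
plain_text = '{"name": "example", "version": "1.0.0"}'

schema_detected = looks_like_json_schema(schema_text)
plain_detected = looks_like_json_schema(plain_text)
```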
I've started using .jschema after I had a run-in with an extension-based JSON Schema parser that automatically added id's to external RAML examples which are also .json files.
They are a specific format, after all. XHTML is XML, which is SGML, and we use a different file extension for each of those.
My suggestion is .jsd or .jsonsd, standing for JSON Schema Definition.
I followed the way XML Schemas are named XSD (XML Schema Definition).
A JSON Schema is a valid JSON file so the extension .json is OK.
Then, the first attribute of your file should be '$schema', to declare the version of the specification you are using. E.g.:
{
"$schema": "https://json-schema.org/draft/2019-09/schema",