Why can't docker-compose use tabs in JSON-compatible mode?

While using docker-compose, I would rather use JSON instead of YAML, and according to the official documentation provided by Docker, it is possible to use it:
That said, when I try to run a simple compose-compatible JSON file, it fails with the following output:
ERROR: yaml.scanner.ScannerError: while scanning for the next token
found character '\t' that cannot start any token
in "./sample-file.json", line 2, column 1
But if I replace the tabs with spaces, no matter how many (or even with no whitespace at all), it starts working:
Starting sandbox_apache_1 ... done
Attaching to sandbox_apache_1
apache_1 | AH00558: httpd: Could not reliably...
In the picture it clearly says "so any JSON file", but that seems to be untrue.
What is with this, then?

TL;DR: the docker-compose documentation is misleading in quoting a feature of YAML 1.2 when docker-compose uses a YAML 1.1-based loader to load its .yml files.
Things work when you delete the TABs because JSON can be written very compactly, without any whitespace between nodes at all: {"a":[1,2,3]}.
Yes, YAML is a superset of JSON for all practical purposes, but there are a few things that you need to keep in mind.
First of all, you should take documentation that doesn't write the acronym correctly (Yaml instead of YAML), and that references a non-authoritative source instead of the spec itself, with a grain of salt. Additionally, the documentation uses the extension .yml for the docker-compose.yml file, although the recommended file extension for YAML files, according to the FAQ on yaml.org, has been .yaml since Sep 2006.
The specification of YAML 1.2 states that it is intended as a superset of JSON, but docker-compose uses PyYAML to parse/load the YAML file, and PyYAML only loads a subset of YAML 1.1. There were specific changes going from YAML 1.1 to 1.2 to make YAML 1.2 closer to, but still not 100%, a superset of JSON.
TAB characters are allowed in YAML 1.2 as white-space, as long as it is not white-space that determines indentation. Since JSON is flow-style YAML, within which indentation is not significant, you can read that as: there should be no TAB before the initial { or [.
In YAML 1.1 the restriction on using TAB is more severe:
An ignored space character outside scalar content. Such spaces are used for indentation and separation between tokens. To maintain portability, tab characters must not be used in these cases, since different systems treat tabs differently.
(i.e. you can have TAB characters in non-plain scalars in YAML 1.1).
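One workaround that follows from the above is to re-serialize the JSON file so that it contains no TAB characters outside of strings. The following is a minimal sketch using only Python's standard json module; the function name reindent_with_spaces and the sample compose content are hypothetical and only serve as illustration.

```python
import json


def reindent_with_spaces(json_text: str, indent: int = 2) -> str:
    """Parse JSON text and re-serialize it using space indentation only."""
    # JSON itself permits tabs as insignificant whitespace, so this parses fine.
    data = json.loads(json_text)
    # An integer indent makes json.dumps emit spaces; tabs inside string
    # values are written as the escape sequence \t, not as a literal tab.
    return json.dumps(data, indent=indent) + "\n"


# Hypothetical tab-indented compose file content.
tabbed = '{\n\t"version": "2",\n\t"services": {\n\t\t"apache": {"image": "httpd"}\n\t}\n}\n'
cleaned = reindent_with_spaces(tabbed)
print(cleaned)
```

The output holds the same data as the input, so it should be a drop-in replacement for the tab-indented file.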

Related

Trying to parse a JSON file but it seems the format is different or something is wrong with the JSON file

Hi, I'm trying to parse any of the files from the link underneath. I've tried reaching out to the owner of the data dumps, but nothing works when we try to parse the files as proper JSON files. No program we use (Power BI, Jupyter, Excel, anything really) wants to recognise the files as JSON, and we can't figure out why. I was wondering if anyone could help figure out what the issue is, as this dataset is very interesting to me and my co-students. I hope I'm using the word 'parsing' correctly.
The link to the data dumps is:
https://files.pushshift.io/reddit/comments/
The file I downloaded (I just tried one at random) was handled just fine by jq, my preferred command-line tool for processing JSON files.
jq accepts an input consisting of a sequence of JSON objects, which is what I found when I decompressed the test file. This format is commonly known as JSON lines, and many tools can handle it. The Wikipedia article on JSON streaming contains more information and a (possibly outdated) list of tools.
If your tools aren't capable of handling more than one JSON object in an input, you could turn the files into something which you can handle by adding a comma to the end of every line except the last one (since each JSON object is a single line) and then surrounding the whole input inside a pair of brackets to turn the sequence into a JSON list. Since JSON does not actually care about newlines, it would be sufficient to add a line containing [ at the beginning and a line containing ] at the end. I don't know what command-line tools you have available and are comfortable with, but the task shouldn't be too difficult.
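As an illustration of the conversion described above, here is a small Python sketch that parses JSON lines one object at a time and then emits a single JSON list. The sample records and their field names are made up, and the real dumps would need to be decompressed first.

```python
import json


def json_lines_to_list(lines):
    """Parse an iterable of JSON-lines text into a list of Python objects."""
    # Each non-blank line is a complete JSON document on its own.
    return [json.loads(line) for line in lines if line.strip()]


# Hypothetical sample: two objects, one per line, as in a JSON lines file.
sample = ['{"id": "a1", "body": "first"}\n', '{"id": "b2", "body": "second"}\n']
records = json_lines_to_list(sample)

# Serializing the list yields one JSON array that single-document tools accept.
as_array = json.dumps(records)
print(as_array)
```

For a real file, passing the open file object (for example open(path, encoding="utf-8")) in place of the sample list should work the same way, since iterating a text file yields its lines.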

Make sure some fields are there in a configuration file (for the Atom IDE) via an Ansible playbook

I am trying to install and configure the Atom Editor IDE using Ansible.
I know how to retrieve and parse a JSON file with Ansible, but I don't see how to insert/update some fields of that JSON file when it is seen as a dictionary, while also dealing with the fact that the file may not be there at the beginning of the Ansible playbook.
I know the settings are stored in ~/.atom/config.cson.
My initial configuration looked like this:
$ cat ~/.atom/config.cson
"*":
  core:
    telemetryConsent: "limited"
  editor:
    invisibles: {}
  "exception-reporting":
    userId: "<SOME_UUID>"
But then I wanted to make sure tabs were treated as 2 blank spaces, so I went on the Settings window, changed some parameters and then the configuration file looked like:
$ cat ~/.atom/config.cson
"*":
  core:
    telemetryConsent: "limited"
  editor:
    invisibles: {}
    showInvisibles: true
    softTabs: false
    tabType: "soft"
  "exception-reporting":
    userId: "<SOME_UUID>"
In Ansible I know I can load a JSON object and parse it with:
- name: Configure Atom IDE
  shell: cat /home/"{{ cli_input_username }}"/.atom/config.cson
  register: result
  become_user: "{{ cli_input_username }}"
- set_fact:
    atom_config_dict: "{{ result.stdout | from_json }}"
And then inspect some fields of that "JSON dictionary" with "{{ jsonVar['atom_config_dict."*".editor'] }}". I think this is going to work, but it may be I need to use some special tricks because of that asterisk used as a key of the dictionary "*".
But then how do I UPSERT (INSERT/UPDATE JSON key/values) some fields and save to file the whole JSON dictionary (after the changes) at ~/.atom/config.cson?
Do I have to treat special JSON keys as "*" in a specific way? Or is it just a string treated as a key of the dictionary?
How do I make sure the Ansible playbook can handle the fact that the configuration JSON file may not be there at the beginning? (e.g. when I am installing the Atom Editor IDE for the first time, i.e. at the first execution of the Ansible playbook).
EDIT:
I just realised this configuration file may not be an entirely valid JSON. In fact that file extension is "cson" which I am not familiar with.
So probably those tricks regarding from_json won't work.
Is there a way to deal with this configuration file in a structured way, so that it can be searched and parsed, and then insert/update some keys of that dictionary? Perhaps this could be treated as a YAML file using from_yaml?
Atom works perfectly with a JSON file that stores your configuration. Simply convert the existing config.cson to JSON, delete (or rename) the file and place the converted config.json in its place.
To convert the file, you could use js2coffee (requires little editing) or the atomizr package for Atom. With the latter installed, simply open your config.cson and run the Atomizr: Toggle Atom Format command. Note that with the default settings, this will not keep the original file.

How to specify / reference version of JSON Schema?

I'm trying to define a coding standard for a project and I want to specify use of JSON Schema version 4.
However, if you follow the links for the Specifications from the official JSON Schema website, they take you to the GitHub page, and then to the Version 4 Draft at the IETF. This document explicitly states that it is an Internet-Draft document and says:
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
Since there don't seem to be any versions of the JSON Schema that are anything other than Internet-Draft status, how should I reference this?
Edit: This is in a written project document, not within a JSON file itself. I currently have text like this:
Python Standards and Style
All Python Source Files MUST be written in Python 3.x, specifically targeting Python 3.5.1 as the default Python 3 installation with Ubuntu 16.04 LTS.
All Python Source Files MUST conform to the PEP 8 standard [footnote: https://www.python.org/dev/peps/pep-0008/ ].
All Python Source Files MUST pass a flake8 [footnote: https://pypi.python.org/pypi/flake8/3.2.1 ] check before each delivery. The checker MUST be set up to be ultra-pedantic and it MUST be considered a unit test failure if the checker needs to change anything on the checked in Source Files.
All Python Source Files SHOULD use Docstrings conforming to the PEP 257 standard [footnote: https://www.python.org/dev/peps/pep-0257/ ].
JSON Standards and Style
All JSON Source Files MUST be written in JSON Schema version 4 [footnote: https://datatracker.ietf.org/doc/html/draft-zyp-json-schema-04 ].
All JSON Source Files MUST conform to the Google JSON Style Guide 0.9 [footnote: https://google.github.io/styleguide/jsoncstyleguide.xml ]
All JSON Source Files MUST pass a jsonschema [footnote: https://python-jsonschema.readthedocs.io/en ] check before each delivery. The checker MUST be set up to be ultra-pedantic and it MUST be considered a unit test failure if the checker needs to change anything on the checked in Source Files.
TOML Standards and Style
All TOML Source Files MUST adhere to v0.4.0 of the TOML standard [footnote: https://github.com/toml-lang/toml ].
All TOML Source Files MUST be loadable with the pytoml parser v0.1.11 [footnote: https://github.com/bryant/pytoml ], without error.
All TOML Source Files SHOULD be aligned at the left margin – i.e. do not indent sub-sections.
To me, the italicised footnote to the JSON Schema reference would count as citing an Internet-Draft document, which is explicitly stated as not appropriate in the excerpt I gave above.
Since there don't seem to be any versions of the JSON Schema that are
anything other than Internet-Draft status, how should I reference
this?
You do this:
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    ... // the rest of your schema
}
Just because a standard is in draft format doesn't make it any less a standard.
Now, you also have the option of authoring the schema without a $schema declaration and it will still be perfectly valid. If you do this and use the proper JSON schema draft v4 definition then this will be usable by all parsers supporting draft v4. However, the convention is to use the $schema declaration.
All JSON Source Files MUST be written in JSON Schema version 4
You don't want all JSON files to be schema-based - that's ludicrous. However, for any schema files you do need, you have no choice, from a documentation perspective, other than to reference a version of the standard. And that version should be draft 4, even though it's a draft.
The alternative is to completely remove any reference to JSON Schema altogether, which is probably the route I would take.
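If the coding standard does keep the draft 4 reference, the $schema convention can be checked mechanically. Below is a minimal sketch using only Python's standard library; the function name declares_draft4 is hypothetical, and the check only looks at the declaration, it does not validate the schema itself.

```python
import json

# The draft 4 meta-schema URI, as used in the $schema declaration.
DRAFT4_URI = "http://json-schema.org/draft-04/schema#"


def declares_draft4(schema_text: str) -> bool:
    """Return True if the schema document declares draft 4 via $schema."""
    schema = json.loads(schema_text)
    return isinstance(schema, dict) and schema.get("$schema") == DRAFT4_URI


# Hypothetical schema documents: one with the declaration, one without.
with_decl = '{"$schema": "http://json-schema.org/draft-04/schema#", "type": "object"}'
without_decl = '{"type": "object"}'
print(declares_draft4(with_decl), declares_draft4(without_decl))
```

A check like this could run as part of the unit tests mentioned in the quoted standard, alongside whatever validator the project settles on.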

Puppet - CSV file header

I'm writing a Puppet (3.6.2) module that reads data fields from a CSV file via the extlookup function, and I cannot figure out how to tell extlookup that the first line is the header line. Does extlookup support this? If not, can anyone recommend an external function I could import and use?
thanks,
PS - Yes I know about hiera, and having the data in YAML or JSON files but my requirement is CSV files only.
Brandon
The behavior of extlookup() is pretty well documented. It makes no special provision for column headers, which are by no means an inherent feature of CSV format. Indeed, if your header line is not readable as a data line, then your file is not CSV at all.
Supposing that your file is indeed valid CSV, the absolute simplest solution would be to ignore the issue. It presents a problem only if the first column heading duplicates an actual or potential data name. If it does not, then you will never look up or use the pseudo-value represented by the first row.
If your file in fact is not CSV on account of its first line, or if the first column name conflicts with a real data name, then it seems the next best alternative would be to just remove that line, or to avoid creating it in the first place. I don't see any reason why one of these should not be possible.
I know about hiera, and having the data in YAML or JSON files but my requirement is CSV files only.
How sad. Do be aware that extlookup() has long been deprecated, and it was removed from Puppet 4.
I'm inclined to suggest you implement a translator from CSV to Hiera-friendly YAML, and use Hiera in your module. Alternatively, Hiera supports custom backends, and it's not too hard to write one. I am unaware of an existing CSV backend for Hiera, but you could write one. Ignoring a header line would then be under your control, and you would simultaneously achieve a measure of future-proofing.
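The translator idea can be sketched in a few lines of Python. This sketch assumes the usual extlookup layout of one key per row followed by one or more values, and it emits JSON rather than YAML because the Python standard library has no YAML writer; whether JSON output suits your Hiera setup is an assumption to verify. The function name and the sample data are hypothetical.

```python
import csv
import io
import json


def csv_to_hiera_data(csv_text: str, skip_header: bool = True) -> dict:
    """Turn key,value[,value...] CSV rows into a dict, optionally skipping a header."""
    rows = csv.reader(io.StringIO(csv_text))
    if skip_header:
        next(rows, None)  # discard the header line, if any
    data = {}
    for row in rows:
        if not row:
            continue  # ignore blank lines
        key, values = row[0], row[1:]
        # A single value stays a scalar; several values become a list.
        data[key] = values[0] if len(values) == 1 else values
    return data


# Hypothetical CSV file content with a header line.
sample = "name,value\nntp_server,ntp.example.com\ndns_servers,10.0.0.1,10.0.0.2\n"
data = csv_to_hiera_data(sample)
print(json.dumps(data, indent=2))
```

Because the header is dropped before any data row is read, it can never collide with a real key.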

What is '<?dctm xml_app="ignore"?> '?

I have encountered this tag in an XML file:
<?dctm xml_app="ignore"?>
What does this tag mean / signify?
Googling this value did not return results as to what it means.
The <?...?> syntax is an XML processing instruction:
Processing instructions (PIs) allow documents to contain instructions for applications.
In this case it appears to be an instruction for Documentum, a content management system. When the document in question is stored in this application, it can specify some details about how it is to be handled by the application.
If you are not using the application in question you can probably just ignore this instruction, although obviously this will depend on exactly what you are doing with the document.
Here it is, from "Leave my XML file alone in Documentum": To disable XML validation on import, add the processing instruction <?dctm xml_app="ignore"?> to the XML file. This will halt the parsing algorithm.
The value Ignore tells the system not to process this document as an XML
document. You must include this instruction; if you leave it out, the system
will use the Default XML Application to process the document.
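To see how a generic XML parser exposes such an instruction, here is a short Python sketch using the standard xml.dom.minidom module. It only lists processing instructions at the top level of the document; the function name and the sample document are made up for illustration.

```python
from xml.dom import minidom


def find_processing_instructions(xml_text: str):
    """Return (target, data) pairs for top-level processing instructions."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


# Hypothetical document carrying the Documentum instruction.
sample = '<?xml version="1.0"?>\n<?dctm xml_app="ignore"?>\n<root/>'
instructions = find_processing_instructions(sample)
print(instructions)
```

The target (dctm) names the application the instruction is meant for, and the rest is opaque data that only that application interprets, which is why other tools can ignore it.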