Specify separator for JSON in Go

I want custom separators for JSON in Go, equivalent to Python's:
json.dumps([1,2,3,{'4': 5, '6': 7}], separators=(',',':'))
How do I specify separators for JSON in Go?
Clarification:
json.dumps({"key1":"value1","key2":"value2"}, separators=('__','..'))
'{"key2".."value2"__"key1".."value1"}'
There are some valid use cases for this requirement:
http://mens.com?&cyclone_data={"VisitorId":"53905341bd05ae26a9000001","CampaignId":"538f278cbd05ae36c6000001","LandingPageId":"538eac3ebd05ae15fe000001","OfferId":"538f097ebd05ae2a4d000001","SourceId":"538f0e39bd05ae2b9c000002","NetworkId":""}
I need to set this same url in cookie:
tracking_data=http://mens.com?&cyclone_data={"VisitorId":"53905341bd05ae26a9000001","CampaignId":"538f278cbd05ae36c6000001","LandingPageId":"538eac3ebd05ae15fe000001","OfferId":"538f097ebd05ae2a4d000001","SourceId":"538f0e39bd05ae2b9c000002","NetworkId":""}
After setting cookie with http.SetCookie(), Chrome Web Developer tool shows:
Date:Thu, 05 Jun 2014 11:23:46 GMT
Location:http://mens.com?&cyclone_data={"VisitorId":"53905341bd05ae26a9000001","CampaignId":"538f278cbd05ae36c6000001","LandingPageId":"538eac3ebd05ae15fe000001","OfferId":"538f097ebd05ae2a4d000001","SourceId":"538f0e39bd05ae2b9c000002","NetworkId":""}
Set-Cookie:538f278cbd05ae36c6000001=538f278cbd05ae36c6000001; Path=/
Set-Cookie:cyclone-track-url=http://mens.com?&cyclone_data={VisitorId:53905341bd05ae26a9000001CampaignId:538f278cbd05ae36c6000001LandingPageId:538eac3ebd05ae15fe000001OfferId:538f097ebd05ae2a4d000001SourceId:538f0e39bd05ae2b9c000002NetworkId:}
Content-Length:378
Content-Type:text/html; charset=utf-8
The commas (,) and double quotes (") are missing from the cookie value.
I am aware of base64, but I want users to be able to change VisitorId without any manual encode/decode routine.
The server-side app reads this URL from the cookie and compares it with the browser's Referer.
Found this issue:
https://code.google.com/p/go/issues/detail?id=7243

You can't do this directly. If what you want, as I'm guessing, is to remove unneeded whitespace, you can use json.Compact:
http://golang.org/pkg/encoding/json/#Compact
It takes your encoded JSON and removes insignificant whitespace.
You can also try json.MarshalIndent, which lets you control the prefix and indentation strings. But AFAIK, you can't specify non-standard separators.
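A minimal sketch of the json.Compact approach mentioned above. The helper name compactJSON and the sample data are only illustrative; the point is that whitespace can be stripped, while the "," and ":" separators themselves stay fixed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// compactJSON (hypothetical helper) strips insignificant whitespace from
// already-encoded JSON. It does not and cannot change the "," and ":"
// separators.
func compactJSON(src []byte) (string, error) {
	var buf bytes.Buffer
	if err := json.Compact(&buf, src); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// MarshalIndent produces the spaced-out form; Compact squeezes it back.
	pretty, err := json.MarshalIndent([]int{1, 2, 3}, "", "  ")
	if err != nil {
		panic(err)
	}
	out, err := compactJSON(pretty)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Note that json.Marshal already emits output without extra whitespace, so Compact is mainly useful for JSON that arrives from elsewhere.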

Related

CICS TS(DFHJS2LS): Chinese characters are getting corrupted when received into MAINFRAME from POSTMAN tool

We have developed a web service with CICS as the HTTP server (service provider). The web service takes input JSON (containing both English and Chinese characters) from any client, such as Postman, and processes it on the mainframe (CICS).
DFHJS2LS: JSON schema to high-level language conversion for request-response services
We use the DFHJS2LS procedure to enable web services on the mainframe. This IBM-provided procedure converts JSON to a mainframe copybook and vice versa. It also converts UTF-8 code units to UTF-16 when the data reaches the mainframe copybook.
Issue:
The Chinese characters we pass in the JSON are not converted properly and arrive corrupted on the mainframe. My suspicion is that the conversion from UTF-8 to UTF-16 is not happening.
市 - this is the Chinese character passed in the JSON (Postman).
The expected value in the mainframe copybook is 5E02 (UTF-16 hex value),
but we got 00E5 00B8 0082 (the UTF-8 bytes E5 B8 82).
We have tried all of these header values with no luck:
content type = application/json
charset=UTF-8 / UTF-16
Any input on this DBCS/Unicode/Chinese character issue is much appreciated.

In the COBOL, are you declaring the field that will receive the Chinese characters as Pic G?
01 China-Test-Message.
    03 Msg-using-pic-x Pic X(10).
    03 Msg-using-pic-g Pic G(4) Usage Display-1.

Try "USAGE NATIONAL", which should map to UTF-16, which is probably the encoding you need for the Chinese character.
See the manual:
https://www.ibm.com/support/knowledgecenter/SS6SG3_6.3.0/pg/concepts/cpuni01.html

The Chinese character conversion was resolved once we changed our HTTP header to this:
Content-Type = application/json;charset=UTF-8
Thanks, everyone, for the support.
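To make the hex values in this thread concrete, here is a small sketch that prints the UTF-8 bytes and the UTF-16 code unit for 市. The third helper reproduces the corrupted value by widening each UTF-8 byte to 16 bits; that is only an assumption about what happened on the receiving side, chosen because it matches the reported 00E5 00B8 0082. All function names are illustrative.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// utf8Hex returns the UTF-8 bytes of s as upper-case hex.
func utf8Hex(s string) string { return fmt.Sprintf("%X", []byte(s)) }

// utf16Hex returns the UTF-16 code units of s as upper-case hex.
func utf16Hex(s string) string {
	var sb strings.Builder
	for _, unit := range utf16.Encode([]rune(s)) {
		fmt.Fprintf(&sb, "%04X", unit)
	}
	return sb.String()
}

// widenedHex treats every UTF-8 byte as its own 16-bit unit, which is what
// a receiver would produce if it did not know the payload was UTF-8
// (an assumption, not confirmed CICS behavior).
func widenedHex(s string) string {
	parts := make([]string, 0, len(s))
	for _, b := range []byte(s) {
		parts = append(parts, fmt.Sprintf("%04X", b))
	}
	return strings.Join(parts, " ")
}

func main() {
	const ch = "\u5E02" // the character 市
	fmt.Println("UTF-8: ", utf8Hex(ch))
	fmt.Println("UTF-16:", utf16Hex(ch))
	fmt.Println("widened UTF-8 bytes:", widenedHex(ch))
}
```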

How to generate a JSON log from nginx?

I'm trying to generate a JSON log from nginx.
I'm aware of solutions like this one but some of the fields I want to log include user generated input (like HTTP headers) which need to be escaped properly.
I'm aware of the nginx changelog entries from Oct 2011 and May 2008 that say:
*) Change: now the 0x7F-0xFF characters are escaped as \xXX in an access_log.
*) Change: now the 0x00-0x1F, '"' and '\' characters are escaped as \xXX in an access_log.
but this still doesn't help since \xXX is invalid in a JSON string.
I've also looked at the HttpSetMiscModule module which has a set_quote_json_str directive, but this just seems to add \x22 around the strings which doesn't help.
Any idea for other solutions to log in JSON format from nginx?

It finally looks like we have a good way to do this with vanilla nginx, without any modules. Just define:
log_format json_combined escape=json
    '{'
        '"time_local":"$time_local",'
        '"remote_addr":"$remote_addr",'
        '"remote_user":"$remote_user",'
        '"request":"$request",'
        '"status":"$status",'
        '"body_bytes_sent":"$body_bytes_sent",'
        '"request_time":"$request_time",'
        '"http_referrer":"$http_referer",'
        '"http_user_agent":"$http_user_agent"'
    '}';
Note that escape=json was added in nginx 1.11.8.
http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format

You can try https://github.com/jiaz/nginx-http-json-log, an additional module for nginx.

You can try one of these:
The additional nginx module nginx-http-json-log
Any language, as done in nginx-json-logformat, with the example /etc/nginx/conf.d/json_log.conf
A version of the nginx HTTP stub status module that outputs JSON
PS:
The if parameter (1.7.0) enables conditional logging. A request will not be logged if the condition evaluates to “0” or an empty string:
map $status $http_referer{
~\xXX 0;
default 1;
}
access_log /path/to/access.log combined if=$http_referer;
It’s a good idea to use a tool such as https://github.com/zaach/jsonlint to check your JSON data. You can test the output of your new logging format and make sure it’s real-and-proper JSON.
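As a quick illustration of why the \xXX escapes are a problem and how a log line can be checked programmatically, here is a sketch using Go's json.Valid. The sample log lines are invented for the example, and isValidJSONLine is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isValidJSONLine (hypothetical helper) reports whether one access-log
// line is a syntactically valid JSON document.
func isValidJSONLine(line string) bool {
	return json.Valid([]byte(line))
}

func main() {
	// Raw string literals keep the backslashes exactly as nginx would log them.
	hexEscaped := `{"http_user_agent":"curl \x22quoted\x22"}`  // default access_log escaping
	jsonEscaped := `{"http_user_agent":"curl \"quoted\""}`     // what escape=json produces for a quote

	fmt.Println(isValidJSONLine(hexEscaped))  // \x is not a JSON escape sequence
	fmt.Println(isValidJSONLine(jsonEscaped)) // \" is a JSON escape sequence
}
```

Running the same check over a sample of real log lines is a cheap way to confirm a new log_format before pointing a log shipper at it.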

Malformed JSON string, encoding issue?

I am requesting a web service and getting JSON data as shown below. However, I keep getting the following error:
malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "\x{feff}\x{feff}{"ur...") at /usr/share/perl5/JSON/Any.pm
HTTP response headers:
Date: Tue, 16 Apr 2013 10:41:03 GMT
Server: nginx/0.7.67
Content-Type: application/json; charset=utf-8
Client-Date: Tue, 16 Apr 2013 10:41:03 GMT
Client-Peer: 127.0.1.1:80
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
json data:
{"url":"http:\/\/example.com\/service\/rest.htm?req_data=<auth_req><request_token>20130416f186a9c0480e2501e73d19dbcd79d354<\/request_token> <\/auth_req>&user=208860&service=auth.execute&sid=0001&format=xml&v=2.0& sign=pn9xjQjzTgQuAMarLDtiZCMaGZm4bSo8aUTGtkSt1GrxPGtK29oIL1DgHveVMwf2n7rxLHzyWrNd%2BYU6%2BxZCzs56JkMtxQMPxEJ%2Bu9Eqk5SRL6EAjWMeKheix5frPyHi0hQ4nnbiVm%2Bx3bF0KFq3cORvVCeq8wBoZU1HQXD%2BuuY%3D"}
I suspect some kind of encoding issue because the JSON string validates fine in jslint JSON validator. But I don't know what else to look for. Please help, thanks.

\x{feff} is a BOM (Byte Order Mark). I am not sure whether it is allowed at the beginning of a JSON document, but it definitely should not be repeated.

Thanks to choroba for prompting me to look for the BOM. I grepped the third-party library files that generate the URL and, sure enough, found the BOM in them.
grep -rl $'\xEF\xBB\xBF' . # Got BOM?
perl -pi -nle 's/^\xEF\xBB\xBF//' *.lib # remove them!
cheers.
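If the offending files cannot be fixed at the source, the BOMs can also be stripped on the receiving side before parsing. The following is a sketch of that idea only; stripBOMs and decodeURL are hypothetical helpers, and the payload is shortened from the one in the question.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// utf8BOM is the UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOMs removes every leading UTF-8 byte order mark, including
// repeated ones as seen in the error message above.
func stripBOMs(data []byte) []byte {
	for bytes.HasPrefix(data, utf8BOM) {
		data = data[len(utf8BOM):]
	}
	return data
}

// decodeURL (hypothetical helper) parses a {"url": "..."} response body
// after removing any leading BOMs.
func decodeURL(body []byte) (string, error) {
	var payload struct {
		URL string `json:"url"`
	}
	if err := json.Unmarshal(stripBOMs(body), &payload); err != nil {
		return "", err
	}
	return payload.URL, nil
}

func main() {
	body := []byte("\xEF\xBB\xBF\xEF\xBB\xBF" + `{"url":"http:\/\/example.com\/service\/rest.htm"}`)
	u, err := decodeURL(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```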

What is the boundary in multipart/form-data?

I have a question about multipart/form-data. In the HTTP headers, I see Content-Type: multipart/form-data; boundary=???.
Is the ??? free to be defined by the user? Or is it generated from the HTML? Is it possible for me to define the ??? = abcdefg?

Is the ??? free to be defined by the user?
Yes.
or is it supplied by the HTML?
No. HTML has nothing to do with that. Read below.
Is it possible for me to define the ??? as abcdefg?
Yes.
If you want to send the following data to the web server:
name = John
age = 12
using application/x-www-form-urlencoded would be like this:
name=John&age=12
As you can see, the server knows that parameters are separated by an ampersand &. If & is required for a parameter value then it must be encoded.
So how does the server know where a parameter value starts and ends when it receives an HTTP request using multipart/form-data?
Using the boundary, similar to &.
For example:
--XXX
Content-Disposition: form-data; name="name"

John
--XXX
Content-Disposition: form-data; name="age"

12
--XXX--
In that case, the boundary value is XXX. You specify it in the Content-Type header so that the server knows how to split the data it receives.
So you need to:
Use a value that won't appear in the HTTP data sent to the server.
Be consistent and use the same value everywhere in the request message.
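The two rules above can be sketched with Go's mime/multipart package, which lets the caller choose the boundary and then uses it consistently in both the header and the body. buildForm and parseForm are hypothetical helper names; the field values are the ones from the example.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
)

// buildForm encodes fields as multipart/form-data using the caller's
// boundary. It returns the Content-Type header value and the body.
func buildForm(boundary string, fields [][2]string) (string, []byte, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	// SetBoundary must be called before any part is written.
	if err := w.SetBoundary(boundary); err != nil {
		return "", nil, err
	}
	for _, f := range fields {
		if err := w.WriteField(f[0], f[1]); err != nil {
			return "", nil, err
		}
	}
	// Close writes the final "--boundary--" delimiter.
	if err := w.Close(); err != nil {
		return "", nil, err
	}
	return w.FormDataContentType(), buf.Bytes(), nil
}

// parseForm plays the server's role: it splits body on the same boundary
// and returns the first value of each field.
func parseForm(boundary string, body []byte) (map[string]string, error) {
	r := multipart.NewReader(bytes.NewReader(body), boundary)
	form, err := r.ReadForm(1 << 20)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for name, values := range form.Value {
		if len(values) > 0 {
			out[name] = values[0]
		}
	}
	return out, nil
}

func main() {
	ct, body, err := buildForm("XXX", [][2]string{{"name", "John"}, {"age", "12"}})
	if err != nil {
		panic(err)
	}
	fmt.Println("Content-Type:", ct)
	fmt.Printf("%s", body)
	fields, err := parseForm("XXX", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["name"], fields["age"])
}
```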

The answer to the substance of the question is yes. You can use an arbitrary value for the boundary parameter as long as it is no more than 70 characters long and contains only 7-bit US-ASCII (printable) characters.
If you use one of multipart/* content types, you are actually required to specify the boundary parameter in the Content-Type header. Otherwise, in the case of an HTTP request, the server will be unable to parse the payload.
Unless you are absolutely certain that only the US-ASCII character set will be used in its payload, you may want to add a Content-Type header to each part, with the charset parameter set to UTF-8.
A few relevant excerpts from RFC 2046:
4.1. Text Media Type
A "charset" parameter may be used to indicate the character set of the body text for "text" subtypes, notably including the subtype "text/plain", which is a generic subtype for plain text.
4.1.2. Charset Parameter
A critical parameter that may be specified in the Content-Type field for "text/plain" data is the character set.
Unlike some other parameter values, the values of the charset parameter are NOT case sensitive. The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.
5.1. Multipart Media Type
As stated in the definition of the Content-Transfer-Encoding field [RFC 2045], no encoding other than "7bit", "8bit", or "binary" is permitted for entities of type "multipart". The "multipart" boundary delimiters and header fields are always represented as 7bit US-ASCII in any case (though the header fields may encode non-US-ASCII header text as per RFC 2047) and data within the body parts can be encoded on a part-by-part basis, with Content-Transfer-Encoding fields for each appropriate body part.
The Content-Type field for multipart entities requires one parameter, "boundary". The boundary delimiter line is then defined as a line consisting entirely of two hyphen characters ("-", decimal value 45) followed by the boundary parameter value from the Content-Type header field, optional linear whitespace, and a terminating CRLF.
Boundary delimiters must not appear within the encapsulated material, and must be no longer than 70 characters, not counting the two leading hyphens.
The boundary delimiter line following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter line is identical to the previous delimiter lines, with the addition of two more hyphens after the boundary parameter value.
Here is an example using an arbitrary boundary:
Content-Type: multipart/form-data; boundary="yet another boundary"

--yet another boundary
Content-Disposition: form-data; name="foo"

bar
--yet another boundary
Content-Disposition: form-data; name="baz"

quux
--yet another boundary
Content-Disposition: form-data; name="feels"
Content-Type: text/plain; charset=utf-8

🤷
--yet another boundary--
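The length and character limits quoted from the RFC can be checked with Go's mime/multipart package, whose SetBoundary rejects boundaries it considers invalid. This is a sketch of one library's checks, not a full RFC validator; boundaryOK, contentTypeFor and longBoundary are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io"
	"mime/multipart"
	"strings"
)

// boundaryOK reports whether Go's multipart writer accepts b as a boundary.
func boundaryOK(b string) bool {
	return multipart.NewWriter(io.Discard).SetBoundary(b) == nil
}

// contentTypeFor returns the Content-Type header value Go would send for
// boundary b, quoting the boundary when it contains special characters.
func contentTypeFor(b string) (string, error) {
	w := multipart.NewWriter(io.Discard)
	if err := w.SetBoundary(b); err != nil {
		return "", err
	}
	return w.FormDataContentType(), nil
}

// longBoundary builds a boundary of exactly n characters for testing limits.
func longBoundary(n int) string { return strings.Repeat("a", n) }

func main() {
	fmt.Println(boundaryOK("yet another boundary")) // spaces are fine, except at the end
	fmt.Println(boundaryOK(longBoundary(70)))       // at the 70-character limit
	fmt.Println(boundaryOK(longBoundary(71)))       // one character too long
	ct, err := contentTypeFor("yet another boundary")
	if err != nil {
		panic(err)
	}
	fmt.Println(ct)
}
```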

multipart/form-data uses a boundary to separate name/value pairs. The boundary marks each chunk of name/value pairs passed when a form is submitted. The browser adds the boundary to the Content-Type request header automatically.
A form with the enctype="multipart/form-data" attribute produces a request header such as Content-Type: multipart/form-data; boundary=---WebKit193844043-h (a browser-generated value).
The payload passed looks something like this:
Content-Type: multipart/form-data; boundary=---WebKitFormBoundary7MA4YWxkTrZu0gW
-----WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="file"; filename="captcha"
Content-Type:
-----WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="action"

submit
-----WebKitFormBoundary7MA4YWxkTrZu0gW--
On the web service side, it is consumed with @Consumes("multipart/form-data").
Beware: when testing your web service with Postman in Chrome, select the form-data option (radio button) and choose File from the dropdown to send an attachment. Explicitly setting Content-Type to multipart/form-data throws an error because the boundary is missing. If you leave the header unset, Postman sets the content type itself and appends the boundary, which works fine.
See RFC1341 sec7.2 The Multipart Content-Type

We have to split our data into parts so that the server understands what we send.
Example 1: the data is split into fields
$email = $_POST['email'];
$p_id = $_POST['pid'];
Example 2: if we send JSON data with the content type multipart/form-data, we get a warning about the boundary
$json = file_get_contents("php://input");

For the boundary error, use this:
headers: {
    'content-type': 'application/x-www-form-urlencoded'
}

What are the Legal / Allowed characters for web server file names?

What characters are allowed in filenames for HTML files on ALL servers (*nix, Windows, etc.) ?
I'm looking for the "lowest common denominator" that will work on all servers.
USE: I'm naming a file to be served up publicly (Mysite.com/My-Page.htm)
E.g., space? _ - , etc.
E.g., can I have File-Name.htm, File_Name.htm or File Name.htm?
Obviously, this needs to work with all servers and browsers. (IIRC, the name is limited by the server not the browser, but I could be wrong).

What characters are allowed in filenames for HTML files on servers?
That totally depends on the server. HTTP itself allows any character at all, including control characters and non-ASCII characters, as long as they are suitably %-encoded when requested in a URL.
On a Unix server you cannot use ‘/’ or the zero byte. (If you could use them, they'd appear in the URL as ‘%2F’ and ‘%00’ respectively.) You also can't have the specific filenames ‘.’ or ‘..’, or the empty string.
On a Windows server you have all the limitations of a Unix server, plus you also can't use any of \/:*?"<>| or control characters 1-31 and you can't have leading or trailing dot or spaces, and you'll have difficulty using any of the legacy device filenames (CON, PRN, COM1 and many more).
This is nothing to do with HTTP; just how filenames work on Windows, which is complicated.
can I have File-Name.htm, File_Name.htm File Name.htm?
Certainly. But in the last case you should link to it by URL-encoding the space:
<a href="File%20Name.htm">thingy</a>
Browsers will usually let you get away with leaving the space in, but it's not really valid. If you want to avoid having to think about URL-escaping, HTML-escaping and case-sensitive issues, stick to a–z, 0–9 and underscore.
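For generating such links programmatically, a minimal sketch using Go's url.PathEscape shows which of the three file names from the question need encoding. linkFor is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// linkFor (hypothetical helper) returns the percent-encoded form of a file
// name for use as a single URL path segment.
func linkFor(filename string) string {
	return url.PathEscape(filename)
}

func main() {
	for _, name := range []string{"File-Name.htm", "File_Name.htm", "File Name.htm"} {
		fmt.Printf("%q -> %q\n", name, linkFor(name))
	}
}
```

Only the name containing a space changes; hyphens and underscores pass through untouched, which is one reason they are the usual recommendation.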

If you don't want your filenames to be encoded by the server, you should avoid reserved characters: $&+,/:;=?# and unsafe characters: space, quotation marks, <>#%{}|\^~[]`
But as the previous answers stated, the web servers should cope with whatever you want to use by encoding the chars.

Be sure to eliminate
* . " / \ [ ] : ; | = ,
which are never allowed. Because file naming conventions are inconsistent, standard practice is to use a-z, 0-9 and the underscore character. Most users want spaces, but avoiding them sidesteps parsing issues and improves reliability. You can read the RFCs on MIME (Multipurpose Internet Mail Extensions) to get a taste of what is involved.
No matter what you do, something somewhere is likely to make life difficult - so much so that I now use cryptographic methods to generate random a-z lowercase strings and use those as filenames, embedding the useful info in the file source code.
Avoid the ampersand at any cost, ...

I would say a good rule of thumb for HTML filenames on ALL servers is any combination of letters (lowercase preferred) and digits (0 through 9), plus the underscore (_), minus (-) or plus (+) characters, but no spaces. Also, end the filename with .html (e.g. filename.html). I personally avoid the underscore and plus characters.

There isn't such a thing as an html filename.
Certain characters have to be encoded in html (eg if used in links) but the allowed characters in the document names will depend on the web server (and possibly the file system on the server).

Any file name will be URL-encoded so you should be fine. And for the record all three of your file names would work just fine.
