Why doesn't JSON data support special characters?
If JSON data includes special characters such as \r, /, \b, or \t, you must transfer them, but why?
JSON supports all Unicode characters in strings. What do you mean by "transferring"?
Those characters need to be escaped because the JSON specification says so. For some characters the reason is simple -- for example, double quotes need to be escaped because a regular double quote ends a String value, so there would be no way to tell the end marker from a character in the content. For linefeeds the reason was probably to enforce the limitation that no String value spans multiple text lines; and for other control characters, to avoid "invisible characters". This is similar to the escaping required by XML or CSV; all textual data formats either require escaping or prohibit the use of certain characters.
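As a quick illustration (a minimal Python sketch; the value is made up), a serializer applies these escapes automatically and a parser restores the original characters:

    import json

    # A string containing a newline, a tab and embedded double quotes
    value = {"note": 'line one\nline two\twith "quotes"'}

    encoded = json.dumps(value)
    print(encoded)
    # {"note": "line one\nline two\twith \"quotes\""}  <- escaped in the JSON text

    decoded = json.loads(encoded)
    print(decoded == value)  # True: the escapes round-trip losslessly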
Related
I have the following row in a CSV file that I am ingesting into a Splunk index:
"field1","field2","field3\","field4"
Excel and the default Python CSV reader both correctly parse that as 4 separate fields. Splunk does not. It seems to be treating the backslash as an escape character and interpreting field3","field4 as a single mangled field. It is my understanding that the standard escape character for double quotes inside a quoted CSV field is another double quote, according to RFC-4180:
"If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote."
Why is Splunk treating the backslash as an escape character, and is there any way to change that configuration via props.conf or any other way? I have set:
INDEXED_EXTRACTIONS = csv
KV_MODE = none
for this sourcetype in props.conf, and it is working fine for rows without backslashes in them.
UPDATE: Yeah so Splunk's CSV parsing is indeed not RFC-4180 compliant, and there's not really any workaround that I could find. In the end I changed the upstream data pipeline to output JSON instead of CSVs for ingestion by Splunk. Now it works fine. Let this be a cautionary tale if anyone stumbles across this question while trying to parse CSVs in Splunk!
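For reference, the RFC-4180 behaviour is easy to reproduce with Python's csv module (a quick sketch using the row from the question):

    import csv
    import io

    row = '"field1","field2","field3\\","field4"\n'

    # Default dialect: double quote is the quote character, a doubled quote
    # escapes it, and the backslash has no special meaning.
    reader = csv.reader(io.StringIO(row))
    print(next(reader))
    # ['field1', 'field2', 'field3\\', 'field4']  <- four fields, backslash kept literally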
I'm trying to import a large CSV dataset into neo4j using the neo4j-import tool. Quotation marks are not used anywhere, and therefore I get errors when parsing with --quote " --quote ' --quote ´ and the like. Even choosing very rare Unicode characters doesn't help with this multi-gigabyte CSV, because it also contains Arabic letters, math symbols and everything you can imagine.
So: Is there a way to disable the quotation checking completely?
Perhaps it would be useful to have the import tool accept character configuration values specified as ASCII codes. If so, you could specify --quote \0 and no character would match. That would also be useful for specifying other special characters in general, I'd guess.
You need to make sure the CSV file uses quotation marks, since they allow the tool to reliably determine when strings end.
Any string in your data file might contain the delimiter character (a comma, by default). Even if there were a way to turn off quotation checking, the tool would treat every delimiter character as the end of a field. Therefore, any string field that happened to contain the delimiter character would be terminated prematurely, causing errors.
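A small illustration of that point (plain Python, nothing neo4j-specific; the sample rows are made up):

    import csv
    import io

    # A comment field that happens to contain the delimiter (a comma)
    unquoted = '1,Alice,loves cats, dogs and birds\n'
    quoted   = '1,Alice,"loves cats, dogs and birds"\n'

    print(next(csv.reader(io.StringIO(unquoted))))
    # ['1', 'Alice', 'loves cats', ' dogs and birds']  <- split into four columns

    print(next(csv.reader(io.StringIO(quoted))))
    # ['1', 'Alice', 'loves cats, dogs and birds']     <- quoting keeps one column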
There are HTML equivalents for ">" and "<" ("&gt;" and "&lt;") in the OBX-5 field, which is causing the Terser.get(..) method to only fetch the characters up to the ampersand. The encoding characters in MSH-2 are "^~\&". Is terser.get(..) failing because there's an encoding character in the OBX-5 field? Is there a way to change these entities to ">" and "<" easily?
Thanks a lot for your help.
Yes, it fails because the ampersand has been declared as the subcomponent separator, and the message you are trying to process is not valid -- it should not contain (unescaped) HTML character entities (&lt; and &gt;).
If you cannot influence how the incoming messages are encoded, you should preprocess the message before giving it to the Terser, replacing the illegal characters. I'm pretty sure HAPI cannot help you there.
In a valid HL7v2 message, the data type used in OBX-5 is determined by OBX-2. OBX-5 should only contain the characters and escape sequences allowed by the declared data type. < and > are among them (as long as they are not declared as separators in MSH-2).
The HL7 standard defines escape sequences for the separator and delimiter characters (e.g. \T\ is the escape sequence for the subcomponent separator).
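If preprocessing is the route you take, something along these lines could work (a rough Python sketch; the entity list and the sample OBX segment are made up for illustration):

    # Sketch: swap HTML character entities for plain characters before the HL7
    # parser (e.g. HAPI's Terser) sees the raw message. The OBX line is made up.
    def preprocess(raw_message: str) -> str:
        return (raw_message
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "\\T\\"))  # a bare & would need the HL7 escape \T\

    segment = "OBX|1|ST|GLU^Glucose||value &lt; 100 and &gt; 50|"
    print(preprocess(segment))
    # OBX|1|ST|GLU^Glucose||value < 100 and > 50|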
My application breaks because some strings that are given as an argument in a URL for an HTTPService request contain special characters such as é. Is there a way to convert them to their normal variant (in this case e)?
There is no function that would do it automatically for you. You'll have to replace each special character one at a time.
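For example, something along these lines (an illustrative Python sketch with a hand-picked mapping, not a built-in function):

    # Illustrative, hand-picked mapping from accented characters to plain ones;
    # extend it with whatever characters actually appear in your data.
    ACCENT_MAP = {"é": "e", "è": "e", "ê": "e", "à": "a", "â": "a", "ç": "c"}

    def to_plain(text: str) -> str:
        for accented, plain in ACCENT_MAP.items():
            text = text.replace(accented, plain)
        return text

    print(to_plain("café crème"))  # cafe creme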
You could use escape and unescape to safely post with your service.
I never understood this.
Wikipedia has the info you want:
Fields with embedded commas must be enclosed within double-quote characters.
For more than you ever want to know about CSV: RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files.
This article is quite complete.
Fields that contain a special character (comma, newline, or double quote) must be enclosed in double quotes.
There is no real standard for what people call CSV files.
Microsoft refers to CSV as character-separated values. This is because the separator character changes depending on the decimal character:
German: 1,2
English: 1.2
But I agree that most of the time " or ' is used to enclose text elements, and either all strings are enclosed in quotes or none are.
CSV is very far from standardised. The nearest approach to it is this RFC, which explains how commas and other special characters should be handled.
You have two options, sketched below:
1. Quote the field (use the " character, e.g. "..., ...")
2. Change your delimiter from a comma "," to something else (e.g. ";")
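A short sketch of both options with Python's csv module (the values are illustrative):

    import csv
    import io

    rows = [["id", "comment"], ["1", "contains, a comma"]]

    # Option 1: keep the comma delimiter and quote fields that need it
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    print(buf.getvalue())
    # id,comment
    # 1,"contains, a comma"

    # Option 2: switch the delimiter to a semicolon instead
    buf = io.StringIO()
    csv.writer(buf, delimiter=";", lineterminator="\n").writerows(rows)
    print(buf.getvalue())
    # id;comment
    # 1;contains, a comma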
See this webpage with an introduction and overview of the CSV (comma-separated values) format for more.