Is it possible to make a configuration file holding element inputs? - html

Can I take a configuration file, like YAML, and use it to fill in element text?
Example YAML:
YOUR_NAME: "your_name"
Output HTML:
<h1>{YOUR_NAME}</h1>

Yes, that is certainly possible and easy to do. I do this within the Pyramid framework by parsing the YAML file and using the resulting mapping/dict to update the dictionary handed to the template engine.
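import ruamel.yaml
from pyramid.view import view_config

@view_config(route_name='search', renderer='templates/search.pt')
def my_search(self):
    # Default shown when the YAML file does not define YOUR_NAME
    res = dict(YOUR_NAME="----- not set -----")
    with open('your_file.yaml') as fp:
        # Values from the YAML file override the defaults
        res.update(ruamel.yaml.safe_load(fp))
    return res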
In the template, use <h1>${YOUR_NAME}</h1> (the dollar sign is required by the Chameleon template engine), or <h1>${structure:YOUR_NAME}</h1> if the values should be inserted unescaped.
Note that it is not necessary to quote "your_name" in YAML, not even if it had spaces:
YOUR_NAME: first_name last_name
(names normally would not contain characters or character sequences for which you would need quoting of the scalar value)
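As a framework-independent illustration of the same idea, here is a minimal sketch that uses only the Python standard library. It assumes the YAML file has already been parsed into a plain dict (the `loaded` variable is a hypothetical stand-in for what `safe_load` returns), and it uses `string.Template`, whose `${NAME}` placeholders merely resemble Chameleon's syntax; it is not Chameleon itself.

```python
import html
from string import Template

# Defaults, used for any key the configuration file does not define.
DEFAULTS = {"YOUR_NAME": "----- not set -----"}

def render(template_text, loaded):
    """Fill ${KEY} placeholders from DEFAULTS overridden by `loaded`."""
    values = dict(DEFAULTS)
    values.update(loaded)
    # Escape values so that characters such as < and & are safe in HTML.
    escaped = {key: html.escape(str(value)) for key, value in values.items()}
    return Template(template_text).substitute(escaped)

# Hypothetical stand-in for the mapping a YAML parser would return.
loaded = {"YOUR_NAME": "first_name last_name"}
page = render("<h1>${YOUR_NAME}</h1>", loaded)
fallback = render("<h1>${YOUR_NAME}</h1>", {})
print(page)
print(fallback)
```

Keeping the defaults in one dict and updating it with the parsed values mirrors the `res.update(...)` step in the view above.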

Related

Spark reading multiple files : double quotes replaced by %22

I need to read arbitrary JSON files from different folders, namely those whose data has changed, so I can't use a pattern to select them. I know which files they are and I can list them. But when I build a single string containing all the file paths and try to read the JSON in Spark, the double quotes are replaced by %22 and the read fails. Could anyone please help?
val FilePath = "\"/path/2019/02/01/*\"" + "," + "\"/path/2019/02/05/*\"" + "," + "\"/path/2019/02/24/*\""
FilePath: String = "/path/2019/02/01/*","/path/2019/02/05/*","/path/2019/02/24/*"
Now when I use this variable to read the JSON files, it fails with an error and the quotes are replaced by %22.
spark.read.json(FilePath)
java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 0: "/path/2019/02/01/*%22,%22/path/2019/02/05/*%22,%22/path/2019/02/24/*%22
I've just tried this with an older version of Spark (1.6.0) and it works fine if you supply separate paths or wildcard patterns as varargs to the json method, i.e.:
sqlContext.read.json("foo/*", "bar/*")
When you pass multiple patterns in a single string, Spark tries to construct a single URI from them, which is incorrect, and it URL-encodes the quote characters as %22.
As an aside, trying to create a URI is failing because your string starts with a double-quote, which is an illegal character in that position (RFC 3986):
Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").
Add the paths to a list (e.g. pathList) and pass it as varargs, as below:
spark.read.option("basePath", basePath).json(pathList: _*)
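To make the origin of the %22 concrete, here is a small Python sketch. It only illustrates generic percent-encoding with the standard library and is not Spark's own code path; the paths are taken from the question. It also shows the shape the reader actually needs: one entry per path, with no embedded quotes.

```python
from urllib.parse import quote

# One string with embedded double quotes, as built in the question.
joined = '"/path/2019/02/01/*","/path/2019/02/05/*"'

# Percent-encoding turns every double quote into %22.
# "/", "*" and "," are left as-is here to mirror the error message.
encoded = quote(joined, safe="/*,")

# One entry per path, with the surrounding quotes removed.
paths = [part.strip('"') for part in joined.split(",")]

print(encoded)
print(paths)
# With PySpark, a list like this can be passed directly:
#   spark.read.json(paths)
```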

How to get values from JSON file using AppleScript?

In reference to this question,
How to download and get values from JSON file using VBScript or batch file?
how can I get the values from a JSON file that looks like this,
["AA-BB-CC-MAKE-SAME.json","SS-ED-SIXSIX-TENSE.json","FF-EE-EE-EE-WW.json","ZS-WE-AS-FOUR-MINE.json","DD-RF-LATERS-LATER.json","FG-ER-DC-ED-FG.json"]
using AppleScript on macOS?
Here is part of VBScript code in Windows provided by Hackoo,
strJson = http.responseText
Result = Extract(strJson,"(\x22(.*)\x22)")
Arr = Split(Result,",")
For each Item in Arr
    wscript.echo Item
Next
'******************************************
Function Extract(Data,Pattern)
    Dim oRE,oMatches,Match,Line
    set oRE = New RegExp
    oRE.IgnoreCase = True
    oRE.Global = True
    oRE.Pattern = Pattern
    set oMatches = oRE.Execute(Data)
    If not isEmpty(oMatches) then
        For Each Match in oMatches
            Line = Line & Trim(Match.Value) & vbCrlf
        Next
        Extract = Line
    End if
End Function
'******************************************
In AppleScript on macOS, I only need the code to get the values of the JSON file into a single array of string values. The example shown above the VBScript is what the JSON file contents look like.
Short answer: Unfortunately, AppleScript doesn't provide a built-in feature to parse JSON which is analogous to JavaScript's JSON.parse() method.
Below are a couple of solutions:
Solution 1: Requires a third party plug-in to be installed, which may not always be feasible.
Solution 2: Does not require any third party plug-in to be installed, and instead utilizes tools/features built-in to macOS as standard.
Solution 1:
If you have the luxury of being able to install a third-party plug-in on your users' systems, then you can install JSON Helper for AppleScript (as suggested by @user3439894 in the comments).
Then use it in your AppleScript as follows:
set srcJson to read POSIX file (POSIX path of (path to home folder) & "Desktop/foobar.json")
tell application "JSON Helper" to set myList to read JSON from srcJson
Explanation:
On line 1 we read the contents of the .json file and assign it to the variable named srcJson.
Note: You'll need to change the path part (i.e. Desktop/foobar.json) as necessary.
On line 2 we parse the contents using the JSON Helper plug-in. This assigns each item of the source JSON Array to a new AppleScript list. The resultant AppleScript list is assigned to a variable named myList.
Solution 2:
By utilizing tools built-in to macOS as standard, you can also do the following via AppleScript. This assumes that your JSON file is valid and contains a single Array only:
set TID to AppleScript's text item delimiters
set AppleScript's text item delimiters to ","
set myList to text items of (do shell script "tr ''\\\\n\\\\r'' ' ' <~/Desktop/foobar.json | sed 's/^ *\\[ *\"//; s/ *\" *\\] *$//; s/\" *, *\"/,/g;'")
set AppleScript's text item delimiters to TID
Note: you'll need to change the path part (i.e. ~/Desktop/foobar.json) as necessary.
Also, if your .json filename includes spaces, you'll need to escape them with \\. For instance: ~/Desktop/foo\\ bar.json
Explanation:
On line 1 AppleScript's current text item delimiters are assigned to a variable named TID.
On line 2 AppleScript's text item delimiters are set to a comma - this will help when extracting each individual value from the source JSON Array and assigning each value to a new AppleScript list.
On line 3 a shell script is executed via the do shell script command, which performs the following:
Reads the content of the source .json file via the part which reads ~/Desktop/foobar.json. This path currently assumes the file is named foobar.json and resides in your Desktop folder (You'll need to change this path to wherever your actual file exists).
The content of foobar.json is redirected (note the < before the filepath) to tr (i.e. the part which reads: tr ''\\\\n\\\\r'' ' '). This translation replaces any newline or carriage-return characters which may exist in the contents of the source .json Array with space characters. This ensures the contents of foobar.json are transformed to one line.
Note: A JSON Array can contain newlines between each item and still be valid, so although the example JSON given in your question appears on one line - it is not a requirement of this solution as it will handle multi-line too.
The one line of text is then piped to sed's s command for further processing (i.e. the part which reads: | sed 's/^ *\\[ *\"//; s/ *\" *\\] *$//; s/\" *, *\"/,/g;').
The syntax of the s command is 's/regexp/replacement/flags'.
Let's break down each s command to further understand what is happening:
s/^ *\\[ *\"// removes the opening square bracket [, which may be preceded or followed by zero or more space characters, and the following double quote (i.e. the first occurrence) from the beginning of the string.
s/ *\" *\\] *$// removes the closing square bracket ], which may be preceded or followed by zero or more space characters, and the preceding double quote (i.e. the last occurrence) from the end of the string.
s/\" *, *\"/,/g replaces each separator between items (a closing double quote, a comma, and an opening double quote, with zero or more spaces allowed around the comma) with a single comma.
The initial part of line 3, which reads set myList to text items of ..., uses text items to read the string into an AppleScript list, using commas as delimiters to determine each item of the list. The resultant list is assigned to a variable named myList.
On line 4 AppleScript's text item delimiters are restored to their original value.
Utilizing a variable for the source JSON filepath.
If you want to utilize a variable for the filepath to the source .json file then you can do something like this instead:
set srcFilePath to quoted form of (POSIX path of (path to home folder) & "Desktop/foobar.json")
set TID to AppleScript's text item delimiters
set AppleScript's text item delimiters to ","
set myList to text items of (do shell script "tr ''\\\\n\\\\r'' ' ' <" & srcFilePath & " | sed 's/^ *\\[ *\"//; s/ *\" *\\] *$//; s/\" *, *\"/,/g;'")
set AppleScript's text item delimiters to TID
Note: This is very much the same as the first example. The notable differences are:
On the first line we assign the filepath to a variable named srcFilePath.
In the do shell script we reference the srcFilePath variable.
Additional note regarding JSON escaped special characters: Solution 2 preserves any JSON escaped special characters which may be present in the values of source JSON array. However, Solution 1 will interpret them.
Caveats: Solution 2 produces unexpected results when an item in the source JSON array includes a comma, because a comma is used as the text item delimiter.
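The two notes above (escaped characters and the comma caveat) can be illustrated outside AppleScript. The following Python sketch is only an analogy: the text-based branch is similar in spirit to Solution 2, the `json.loads` branch is similar in spirit to Solution 1, and the sample file names are made up for the demonstration.

```python
import json

# Hypothetical sample: one value contains a comma, one a JSON escape (\t).
source = '["plain.json", "with,comma.json", "tab\\tinside.json"]'

# Text-based approach: strip the brackets, split on commas,
# then strip spaces and the outer quotes from each piece.
inner = source.strip().lstrip("[").rstrip("]").strip()
naive = [item.strip().strip('"') for item in inner.split(",")]

# A real JSON parser keeps the comma inside the value and
# interprets the escape sequence as a tab character.
parsed = json.loads(source)

print(naive)
print(parsed)
```

The text-based result has four items instead of three and keeps the backslash escape as literal text, while the parsed result has three items with a real tab character.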
How to get the values from JSON file that looks like this,
["AA-BB-CC-MAKE-SAME.json","SS-ED-SIXSIX-TENSE.json","FF-EE-EE-EE-WW.json","ZS-WE-AS-FOUR-MINE.json","DD-RF-LATERS-LATER.json","FG-ER-DC-ED-FG.json"]
If you actually mean what you wrote, and that the contents of the JSON file is that list of six strings in a single array, formatted on a single line, the simplest way is to treat it as text, trim the opening and closing square brackets, then delimit its fields at every occurrence of a ,. Finally, each individual text item can have the surrounding quotes trimmed as well.
Examining the VBScript, it looks like it uses a very similar process, albeit with regular expressions, which AppleScript doesn't feature but which aren't especially necessary in this simple situation.
Let's assume that the JSON array above is stored in a file on your desktop called "myfile.json". Then:
set home to the path to home folder
set f to the POSIX path of home & "Desktop/myfile.json"
set JSONstr to read POSIX file f
# Trim square brackets
set JSONstr to text 2 thru -2 of JSONstr
# Delimit text fields using comma
set the text item delimiters to ","
set Arr to the text items of JSONstr
# Trim quotes of each item in Arr
repeat with a in Arr
    set contents of a to text 2 thru -2 of a
end repeat
# The final array
Arr
I only need the code to get the values of the JSON file into a single array of string values. The example shown above the VBScript is what the JSON file contents look like.
The variable Arr now contains the array (referred to as lists in AppleScript) of string values. You can access a particular item in it like this:
item 2 of Arr --> "SS-ED-SIXSIX-TENSE.json"
A More General Solution
I've decided to include a more advanced way to handle JSON in an AppleScript, partly because I've been doing a lot of JSON processing quite recently and this is all fresh on my event horizon; but also to demonstrate that, using AppleScriptObjC, parsing even very complex JSON data is not only possible, but quite simple.
I don't think you'll need it in this specific case, but it could come in useful for some future situation.
The script has three sections: it starts off importing the relevant Objective-C framework that gives AppleScript additional powers; then, I define the actual handler itself, called JSONtoRecord, which I describe below. Lastly, comes the bottom of the script where you can enter your code and do whatever you like with it:
use framework "Foundation"
use scripting additions
--------------------------------------------------------------------------------
property ca : a reference to current application
property NSData : a reference to ca's NSData
property NSDictionary : a reference to ca's NSDictionary
property NSJSONSerialization : a reference to ca's NSJSONSerialization
property NSString : a reference to ca's NSString
property NSUTF8StringEncoding : a reference to 4
--------------------------------------------------------------------------------
on JSONtoRecord from fp
    local fp
    set JSONdata to NSData's dataWithContentsOfFile:fp
    set [x, E] to (NSJSONSerialization's ¬
        JSONObjectWithData:JSONdata ¬
        options:0 ¬
        |error|:(reference))
    if E ≠ missing value then error E
    tell x to if its isKindOfClass:NSDictionary then ¬
        return it as record
    x as list
end JSONtoRecord
--------------------------------------------------------------------------------
###YOUR CODE BELOW HERE
#
#
set home to the path to home folder
set f to the POSIX path of home & "Desktop/myfile.json"
JSONtoRecord from f
--> {"AA-BB-CC-MAKE-SAME.json", "SS-ED-SIXSIX-TENSE.json", ¬
--> "FF-EE-EE-EE-WW.json", "ZS-WE-AS-FOUR-MINE.json", ¬
--> "DD-RF-LATERS-LATER.json", "FG-ER-DC-ED-FG.json"}
At the bottom of the script, I've called the JSONtoRecord handler, passing it the location of myfile.json. One of the benefits of this handler is that it doesn't matter whether the file is formatted all on one line, or over many lines. It can also handle complex, nested JSON arrays.
In those instances, what it returns is a native AppleScript record object, with all the JSON variables stored as property values in the record. Accessing the variables then becomes very simple.
This is actually exactly what the JSON Helper application that a couple of people have already mentioned does under the hood.
The one criterion (other than the JSON file containing valid JSON data) is that the path to the file is a POSIX path written in full, e.g. /Users/CK/Desktop/myfile.json, and not ~/Desktop/myfile.json or, even worse, Macintosh HD:Users:CK:Desktop:myfile.json.

How to split this csv file into multiple contents?

I have a CSV file with the contents below:
Input.csv
Sample NiFi Data demonstration for below
Due dates 20-02-2017,23-03-2017
My Input No1 inside csv,,,,,,
Animals,Today-20.02.2017,Yesterday-19-02.2017
Fox,21,32
Lion,20,12
My Input No2 inside csv,,,,
Name,ID,City
Mahi,12,UK
And,21,US
Prabh,32,LI
I need to split the whole CSV above (Input.csv) into two parts, InputNo1.csv and InputNo2.csv.
InputNo1.csv should have only the contents below:
Animals,Today-20.02.2017,Yesterday-19-02.2017
Fox,21,32
Lion,20,12
InputNo2.csv should have the contents below:
Name,ID,City
Mahi,12,UK
And,21,US
Prabh,32,LI
Is it possible to split a CSV into multiple parts in NiFi with the existing processors?
Yes.
Use the ReplaceText processor to remove the global header, then use SplitContent to split the resulting flowfile into multiple flowfiles. Use another ReplaceText to remove the leftover comment string (this is needed because SplitContent splits on a literal byte string, not a regex), and then perform the normal SplitText operations.
Here is a template specific to the input you provided in your question.
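To make the intended result concrete, here is the same logic as a plain Python sketch. It is not NiFi configuration, only an illustration of the steps (drop the global header, split at each marker line, discard the marker itself). The marker pattern is an assumption derived from the sample input in the question.

```python
import re

SAMPLE = """\
Sample NiFi Data demonstration for below
Due dates 20-02-2017,23-03-2017
My Input No1 inside csv,,,,,,
Animals,Today-20.02.2017,Yesterday-19-02.2017
Fox,21,32
Lion,20,12
My Input No2 inside csv,,,,
Name,ID,City
Mahi,12,UK
And,21,US
Prabh,32,LI
"""

# Marker lines such as "My Input No1 inside csv,,,," start a new section.
MARKER = re.compile(r"^My Input No(\d+) inside csv,*$")

def split_sections(text):
    """Map output file names to their CSV text, skipping the global header."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            # Start a new section; the marker line itself is not kept.
            current = "InputNo" + match.group(1) + ".csv"
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
        # Lines before the first marker (the global header) are dropped.
    return {name: "\n".join(lines) + "\n" for name, lines in sections.items()}

parts = split_sections(SAMPLE)
for name, content in parts.items():
    print(name)
    print(content)
```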

Entry delimiter of JSON files for Hive table

We are collecting JSON data (public social media posts in particular) via REST API invocations. We plan to dump it into HDFS and then put a Hive table on top of it using a SerDe. What would be the appropriate delimiter between JSON entries in a file? Is it a newline ("\n")? Then it would look like this:
{ id: entry1 ... post: }
{ id: entry2 ... post: }
...
{ id: entryn ... post: }
What if we encounter a newline character within the JSON data itself, for example in post?
The best way would be one record per line, separated by "\n" exactly as you guessed.
This also means that you should be careful to escape "\n" that may be inside the JSON elements.
Indented JSON won't work well with Hadoop/Hive: to distribute processing, Hadoop must be able to tell where a record ends, so that it can split the processing of a file of N bytes among W workers into W chunks of roughly N/W bytes each.
The splitting is done by the particular InputFormat in use; for text, that is TextInputFormat.
TextInputFormat will basically split the file at the first instance of "\n" found after byte i*N/W (for i from 1 to W-1).
For this reason, any other "\n" in the file would confuse Hadoop, and it would give you incomplete records.
As an alternative (which I wouldn't recommend), you could use a character other than "\n" by configuring the property "textinputformat.record.delimiter" when reading the file through Hadoop/Hive, choosing a character that won't appear in the JSON (for instance \001, or CTRL-A, which Hive commonly uses as a field delimiter). That can be tricky, though, since the delimiter also has to be supported by the SerDe.
Also, if you change the record delimiter, anybody who copies or uses the file on HDFS must be aware of the delimiter, or they won't be able to parse it correctly and will need special code to do so. If you keep "\n" as the delimiter, the files are still normal text files and can be used by other tools.
As for the SerDe, I'd recommend this one, with the disclaimer that I wrote it :)
https://github.com/rcongiu/Hive-JSON-Serde
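As a small illustration of the one-record-per-line layout, here is a Python sketch using the standard `json` module. The sample records are made up. It shows that serializing without indentation escapes a newline inside a value, so each record stays on one physical line.

```python
import json

# Hypothetical records; the first post contains a real newline character.
records = [
    {"id": "entry1", "post": "first line\nsecond line"},
    {"id": "entry2", "post": "no line breaks here"},
]

# json.dumps writes a newline inside a string as the two characters
# backslash and n, so each serialized record is a single line.
lines = [json.dumps(record) for record in records]
payload = "\n".join(lines) + "\n"

# Reading back: one json.loads call per line restores the original text.
restored = [json.loads(line) for line in payload.splitlines()]
print(payload)
```

Because the only raw "\n" characters in the payload are the record separators, a line-based splitter sees exactly one record per line.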

What is the best value for "Unit Separator" in XML?

I use the Unit Separator (US, 0x1f) in a database. When I export to an XML 1.0 file, it is not accepted and the attribute is left with an empty value.
I have data in database like this:
"option1=10;option2=20;option3=aaa[US]bbb[US]ccc;"
I expect to export it to an XML 1.0 file like this:
<elementname attr1="option1=10;option2=20;option3=aaa[US]bbb[US]ccc;"/>
However, the [US] is not accepted by XML 1.0. Any suggestions?
I can replace '\37' (oct 37, hex 1f) with something like "XXX", "$", "(0x1f)"... before writing to XML;
I can replace it back when importing from XML and writing to the database. However, if I replace it with "&#x1F;", which is the HTML Entity for Unit Separator, I end up with "&amp;#x1F;", which is definitely not what I wanted.
If I manually modify the XML file to contain "&#x1F;", I cannot use MSXML to load it; it gives the error "Invalid Unicode Character".
Any suggestions?
Thank you
Summary:
Let's make an analogy: Let's think about how the compiler works, there are two phases: "Pre-compile" and "Compile".
XML file generation acts like the "Compile" phase, e.g. it converts "<" to "&lt;".
However, the Unit Separator is not supported by XML 1.0, so the "Compile" phase will not convert it to the HTML Entity "&#x1F;".
So we have to seek a solution in the "Pre-Compile" phase, which is our own application's responsibility.
When writing:
Option1: <unit>aaa</unit><unit>bbb</unit>
Option2: simply use "_x241F_" to replace "\37" in the string if "_x241F_" is not conflicting with any existing token in the string.
When reading:
According to Option1: Load the elements and concatenate them into a single string with "\37" as the separator.
According to Option2: simply use "\37" to replace "_x241F_".
I've also found out that MSXML (even the highest version, MSXML6.dll) will not load XML 1.1.
So if we are unfortunately using MSXML, we have to write our own "Pre-Compile" code to handle the Unicode characters before feeding the "Compile" phase.
Note: I borrowed the idea of "_x241F_" from here.
Thanks for everyone's help
There is no HTML entity for U+001F UNIT SEPARATOR. Besides, HTML entities would be irrelevant when dealing with generic XML.
The character references would be &#31; and &#x1F;, in HTML and in XML, but the character is not allowed in HTML or in XML. For XML 1.0, which this seems to be about, please refer to section 2.2 Characters, where the normative definition is the following production (the associated comment is misleading, and comments are non-normative):
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]
The conclusions to be drawn depend on the meaning and purpose of UNIT SEPARATOR in the text. It has no generally defined meaning; it is up to applications to assign a meaning to it and process it accordingly.
Usually UNIT SEPARATOR is used to separate units of some kind, so the natural approach would be to process the incoming data so that instead of such separators, the data, when converted to XML format, has units denoted by markup. So for data like aaa[US]bbb[US]ccc where [US] is UNIT SEPARATOR, you would generate something like <unit>aaa</unit><unit>bbb</unit><unit>ccc</unit>.
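A minimal Python sketch of this approach, using the standard `xml.etree.ElementTree` module, might look as follows. The element name `option3` and the helper names are hypothetical, chosen only for illustration; the `is_xml10_char` check transcribes the Char production quoted above.

```python
import xml.etree.ElementTree as ET

US = "\x1f"  # U+001F UNIT SEPARATOR

def is_xml10_char(ch):
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def to_units(parent, text):
    """Write each US-separated part of text as a <unit> child of parent."""
    for part in text.split(US):
        ET.SubElement(parent, "unit").text = part

def from_units(parent):
    """Rebuild the US-separated string from the <unit> children."""
    return US.join(unit.text or "" for unit in parent.findall("unit"))

option = ET.Element("option3")  # hypothetical element name
to_units(option, "aaa" + US + "bbb" + US + "ccc")
xml_text = ET.tostring(option, encoding="unicode")
round_trip = from_units(ET.fromstring(xml_text))
print(xml_text)
```

The separator never appears in the XML itself; it is reintroduced only when the application reads the units back.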
This website
http://www.fileformat.info/info/unicode/char/1f/index.htm
suggests one of the following:
HTML Entity (decimal) &#31;
HTML Entity (hex) &#x1f;