Create fixed-width file with different line definitions - multiline

The batch we're developing requires a fixed-width input file with different line definitions for each entity.
E.g. the first line defines a family, and the second and third lines define members of that family. The family record has different column definitions than the family-member records.
I would like to create test cases in something human-readable (even if it has to be Excel) and then convert it to the input file format.
Any suggestions on how this can be done without too much hassle?

XML and XSLT are probably your best option.
Try that approach and update the question if you run into problems.
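If a full XSLT pipeline feels heavyweight for test data, the same convert step can also be sketched in a few lines of a scripting language. Below is a minimal Python sketch (an alternative illustration, not the XSLT route): it assumes, purely hypothetically, that the test cases are exported from Excel to one CSV whose columns are a superset of all record fields, with a record_type column selecting the layout. Every field name and width is a placeholder.
import csv

# hypothetical layouts: record type -> ordered (column, width) pairs
LAYOUTS = {
    "FAM": [("record_type", 3), ("family_name", 20), ("region", 5)],
    "MEM": [("record_type", 3), ("first_name", 15), ("birth_date", 8)],
}

def to_fixed_width(row):
    layout = LAYOUTS[row["record_type"]]
    # left-justify each value and clip it to its column width
    return "".join(row[name].ljust(width)[:width] for name, width in layout)

# "testcases.csv" and "input.txt" are placeholder file names
with open("testcases.csv", newline="") as src, open("input.txt", "w") as dst:
    for row in csv.DictReader(src):
        dst.write(to_fixed_width(row) + "\n")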

Related

Unreadable Text in JSON

I've been working a bit with some files from Minecraft Dungeons, which were extracted using QuickBMS and made available here: https://minecraft.fandom.com/wiki/Minecraft_Wiki:Minecraft_Dungeons_game_files
In the "data" folder, there are a bunch of json-files, which I believe contain a list of textures associated with any given stage of the game. There is, however, a problem. When opened, it reads like any json-file, it has a bunch of names and values, but some of the values are not human-readable, they instead show up as a string of seemingly unrelated characters. Here an example:
"walkable-plane" : "eNpjYSEOMIMAOp+ZmQmND1fEjF2AiQldAJsWDEPRXUKkowkDAM/qA6o=",
Now, given that these are exclusively ordinary characters, and not error signs or anything of the sort, I'm assuming this is an encoding issue. Of course, I don't know for sure, or I wouldn't be asking this in the first place, but the file as it appears is UTF-8 text, and it obviously doesn't produce a usable result. So, if anyone knows what exactly this is, and how I could extract information from it, I'd be really thankful.
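One thing worth checking (a guess based on the shape of the string, not a confirmed fact about the Dungeons format): the leading eNp is exactly what base64 produces for bytes starting 0x78 0xDA, the zlib best-compression header. That would make the value base64-encoded, zlib-compressed binary data rather than mis-encoded text. A quick Python probe:
import base64, zlib

blob = "eNpjYSEO..."  # paste the full value from the JSON here
raw = base64.b64decode(blob)
print(raw[:2].hex())           # "78da" would confirm a zlib stream
print(zlib.decompress(raw))    # the decompressed payload (likely binary, not text)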

Importing massive dataset in Neo4j where each entity has differing properties

I'm trying to bulk load a massive dataset into a single Neo4j instance. Each node will represent a general Entity which will have specific properties, e.g.:
label
description
date
In addition to these, there are zero or more properties specific to the Entity type. For example, if the Entity is a Book, the properties will look something like this:
label
description
date
author
first published
...
And if the Entity is a Car, the properties will look something like this:
label
description
date
make
model
...
I first attempted to import the dataset by streaming each Entity from the filesystem and using Cypher to insert each node (some 200M entities and 400M relationships). This was far too slow (as I had expected, but it was worth a try).
I've therefore made use of the bulk import tool neo4j-admin import, which works over a CSV file with specified headers for each property. The problem I'm having is that I don't see a way to add the additional properties specific to each Entity. The only solution I can think of is to include a CSV column for every possible property expressed across the set of entities; however, I believe I would end up with a bunch of redundant properties on all my entities.
EDIT 1
Each Entity is unique, so there will be some 1M+ types (labels in Neo4j).
Any suggestions on how to accomplish this would be appreciated.
The import command of neo4j-admin supports importing from multiple node and relationship files.
Therefore, to support multiple "types" of nodes (called labels in Neo4j), you can split your original CSV file into separate files, one for each Entity "type". Each file can then have data columns specific to that type.
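For example, with hypothetical file names (this is the neo4j-admin 4.x flag syntax; older releases spell the flags slightly differently):
neo4j-admin import --nodes=Book=books_header.csv,books.csv --nodes=Car=cars_header.csv,cars.csv --relationships=RELATED_TO=rels.csv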
[UPDATED]
Here is one way to support the import of nodes having arbitrary schemata from a CSV file.
The CSV file should not have a header.
Every property on a CSV line should be represented by an adjacent pair of values: 1 for the property name, and 1 for the property value.
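For example, a line for a hypothetical book entity would alternate property names and values like this:
label,The Hobbit,description,A fantasy novel,date,1937,author,J. R. R. Tolkien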
With such a CSV file, this code (which takes advantage of the APOC function apoc.map.fromValues) should work:
LOAD CSV FROM "file:///mydata.csv" AS line
CREATE (e:Entity)
SET e = apoc.map.fromValues(line);
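For the hypothetical book line above, apoc.map.fromValues turns the list into a map, so the node ends up with properties like {label: "The Hobbit", date: "1937", ...}. Note that every value is a string, which the NOTE below addresses.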
NOTE: the above code stores all values as strings. If you want some property values to be integers, booleans, etc., then you can do something like this instead (this is probably only sensible if the same property occurs on most lines; when the property does not exist on a line, no such property is created on the node, but the conversion call still wastes some time):
LOAD CSV FROM "file:///mydata.csv" AS line
WITH apoc.map.fromValues(line) AS data
WITH apoc.map.setKey(data, 'foo', toInteger(data.foo)) AS data
CREATE (e:Entity)
SET e = data;

Editing JSON - Add Attribute

I'm getting dumps of a slew of JSON files, each containing data from the day/period it was pulled. Most of the JSON files I'm dealing with are a lot larger than this one, but I figured a smaller one would be easier to work with.
{"playlists":[{"uri":"spotify:user:11130196075:playlist:1Ov4b3NkyzIMwfY9E8ixpE","listeners":366,"streams":386,"dateAdded":"2016-02-24","newListeners":327,"title":"#Covers","owner":"Saga Prommeedet"},{"uri":"spotify:user:mickeyrose30:playlist:2Ov4b3NkyzIMwfY9E8ixpE","listeners":229,"streams":263,"dateAdded":"removed","newListeners":154,"title":"bestcovers2016","owner":"Mickey Rose"}],"top":2,"total":53820}
What I'm essentially trying to do is add a date attribute to each line of data, so that when I combine multiple JSON files to run through an analytical tool, each row of data is associated with the correct date. My first thought was to write it as such:
{"playlists":[{"uri":"spotify:user:11130196075:playlist:1Ov4b3NkyzIMwfY9E8ixpE","listeners":366,"streams":386,"dateAdded":"2016-02-24","newListeners":327,"title":"#Covers","owner":"Saga Prommeedet"},{"uri":"spotify:user:mickeyrose30:playlist:2Ov4b3NkyzIMwfY9E8ixpE","listeners":229,"streams":263,"dateAdded":"removed","newListeners":154,"title":"bestcovers2016","owner":"Mickey Rose"}],"top":2,"total":53820,"date":072617}
since the "top" and "total" attributes are showing up on each row of data (with the associated values also showing up on each row) when I put it through an analytical tool like Tableau.
Also, have been editing and saving files through Brackets, and testing things through this converter (https://konklone.io/json/)
In JavaScript:
var m = JSON.parse(json_string);   // parse the file contents into an object
m["date"] = "20170804";            // add the new attribute
var result = JSON.stringify(m);    // serialize back to a JSON string
This is very simple and will work for you.
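As a short file-level sketch (Node.js; the file names and date below are placeholders), the same three steps applied to one dump on disk:
const fs = require('fs');

// 'dump.json' and 'dump_dated.json' are placeholder names
const m = JSON.parse(fs.readFileSync('dump.json', 'utf8'));  // load one dump
m.date = '2017-07-26';                                       // tag it with its pull date
fs.writeFileSync('dump_dated.json', JSON.stringify(m));      // write the dated copy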

Using Boomi how can I create a flat file profile which has "line" data on the same physical line in the flat file as the "header" data?

I've got parent and child data which I am trying to convert into a flat file using Dell Boomi. The flat file structure is column-based and needs the line data to be on the same physical line of the file as the header data.
For instance, a header record which has 4 line items needs to generate a file with a structure of:
[header][line][line][line][line]
Currently what I have been able to generate is either
[header][line]
[header][line]
[header][line]
[header][line]
or
[header]
[line]
[line]
[line]
[line]
I think taking the results of the second profile and then using a data processing shape to strip [\r][\n] might be my best option, but I wanted to check before implementing it.
I created a user-defined function for each field of data. For this example, let's just look at "FirstName".
I used a branch immediately: Branch 1 splits the documents, flow-controls them one at a time, enters the map, and then stops. Branch 2 contains a message shape where I built the new file: I typed a static value with the header name, then used a dynamic process property as the parameter next to the header.
The user-defined function for "FirstName" accepts the first name as input, appends the dynamic process property (you need to define a dynamic property for each field), prepends your delimiter of choice, and then sets the same dynamic process property.
This was all done with no scripting at all. I hope this helps. I can provide screenshots if you need more clarification.

How to read a file and write to another file in Tcl, replacing values

I have three files: Conf.txt, Temp1.txt and Temp2.txt. I have used a regex to fetch some values from the Conf.txt file. I want to place those values (which appear under the same names in Temp1.txt and Temp2.txt) into the templates, creating two new files, say Temp1_new.txt and Temp2_new.txt.
For example: Conf.txt has a value named IP1, and the same name appears in Temp1.txt and Temp2.txt. I want to create Temp1_new.txt and Temp2_new.txt with IP1 replaced by, say, 192.X.X.X.
I'd appreciate it if someone could help me with Tcl code to do this.
Judging from the information provided, there are basically two ways to do what you want:
File-semantics-aware;
Brute-force.
The first way is to read the source file, parse it to produce a structured in-memory representation of its content, then serialize this representation to the new file after replacing the relevant value(s) in it.
The brute-force method means treating the contents of the source file as plain text (or a series of text strings) and running something like regsub or string replace on this text to produce the new text, which you then save to the new file.
The first way should generally be favoured, especially for complex cases, as it removes any chance of replacing irrelevant bits of text. The brute-force way may be simpler to code (if there's no handy library to do the parsing; see below) and is therefore good for throw-away scripts.
Note that for certain file formats there are ready-made libraries which can be used to automate what you need. For instance, the XSLT facilities of the tdom package can be used to manipulate XML files, INI-style files can be modified using the appropriate library, and so on.
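To make the brute-force route concrete, here is a minimal Tcl sketch. The replacement value is hard-coded for illustration; in a real script it would come from the values your regex extracted from Conf.txt, and you would repeat the same steps for Temp2.txt.
# read the whole template into memory
set fin [open "Temp1.txt" r]
set text [read $fin]
close $fin

# replace every literal occurrence of the placeholder with its value
# (192.0.2.1 is a stand-in for whatever Conf.txt yielded)
set text [string map {IP1 192.0.2.1} $text]

# write the substituted text to the new file
set fout [open "Temp1_new.txt" w]
puts -nonewline $fout $text
close $fout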