Matlab jsonencode not encoding all of the structure

I need to generate a JSON file from a Matlab structure, which is quite big.
Using this code produces the desired JSON, with the correct structure:
fid=fopen(strcat(date, '_', 'EV.json'),'w');
fprintf(fid, jsonencode(S));
But when I open the generated JSON, I notice that a big part of it is missing: only the first part of my data was written. Does anybody know what this could be caused by? I was thinking it might be a memory allocation problem.
If I do:
encodedJSON = jsonencode(S);
encodedJSON(end)
it returns '}', meaning the last character of the variable encodedJSON is a closing bracket, but when I open the file I generated, it ends abruptly in the middle of my data (the attached picture shows the very last part of the generated JSON file).
The file also weighs only 68 KB, while other, smaller JSONs generated with the same code weigh almost twice as much.
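One thing worth checking, a common cause of exactly this symptom rather than anything confirmed by the post: fprintf treats its second argument as a format string, so '%' and '\' sequences inside the JSON are interpreted as conversion specifiers, and if the file is never closed with fclose the output buffer may not be flushed, leaving the file on disk truncated even though encodedJSON is complete in memory. A minimal sketch of the safer write, under those assumptions:

encodedJSON = jsonencode(S);
fid = fopen(strcat(date, '_', 'EV.json'), 'w');
fprintf(fid, '%s', encodedJSON);  % explicit '%s' stops fprintf interpreting '%' or '\' inside the JSON
fclose(fid);                      % flushes and closes the file; without this the buffered tail can be missing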

Related

Unreadable Text in JSON

I've been working a bit with some files from Minecraft Dungeons, which were extracted using QuickBMS and made available here: https://minecraft.fandom.com/wiki/Minecraft_Wiki:Minecraft_Dungeons_game_files
In the "data" folder, there are a bunch of json-files, which I believe contain a list of textures associated with any given stage of the game. There is, however, a problem. When opened, it reads like any json-file, it has a bunch of names and values, but some of the values are not human-readable, they instead show up as a string of seemingly unrelated characters. Here an example:
"walkable-plane" : "eNpjYSEOMIMAOp+ZmQmND1fEjF2AiQldAJsWDEPRXUKkowkDAM/qA6o=",
Now, given that these are ordinary characters, and not error signs or something of the sort, I'm assuming this is an encoding issue. Of course, I don't know for sure, or I wouldn't be asking this in the first place, but the file itself appears to be UTF-8, and reading it as text obviously doesn't produce a usable result. So, if anyone knows what exactly this is, and how I could extract information from it, I'd be really thankful.
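One possibility, which is an assumption rather than anything stated in the files themselves: the value is not mis-encoded text but Base64-encoded, zlib-compressed binary data; the leading "eN" is what the common zlib header bytes 0x78 0xDA look like after Base64 encoding. A minimal Python sketch for testing that guess:

import base64
import zlib

def decode_blob(b64_value):
    """Base64-decode and zlib-inflate one of the opaque JSON string values."""
    return zlib.decompress(base64.b64decode(b64_value))

# Round-trip demonstration with a made-up payload; the real payloads are game-specific binary.
demo = base64.b64encode(zlib.compress(b"\x01\x02\x03\x04", 9)).decode()
print(demo)               # starts with "eN...", like the values in the data files
print(decode_blob(demo))  # b'\x01\x02\x03\x04'

# For the real thing, pass the complete string from the JSON file, e.g.
# decode_blob("eNpjYSEO...")   # truncated here; use the full "walkable-plane" value

If zlib.decompress raises an error on the real data, the guess is wrong and the values are some other binary format.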

Writing "latin1" encoded byte array with escape characters

I'm working on a database import/export process in VB.NET which writes data from a MySQL (5.5) database to a plain text file. The application reads the data to a DataTable, then goes through the rows/columns to actually write the data to the OutputFile (System.IO.StreamWriter object). The encoding on the tables in this database is Latin1. There is a MediumBlob field in one of the tables I've been using for testing which contains image files stored as a byte array.
In my attempts to validate the output from my application, I've exported the data directly from the database using MySQL Workbench and compared that with the results I get when I write the same data from my application. In the direct export from MySQL Workbench, I see that some of these bytes are exported with a backslash escape. When I read the data through my application, however, this escape character does not appear. Viewed in Notepad++, the two outputs clearly show some distinct differences (see screenshot).
Obviously, while apparently very similar, the two are not completely identical. My application is not including the backslashes for escaped characters, and some characters such as NULL are coming out differently altogether. My code for writing this field to the file is:
OutputFile.Write("'" & System.Text.Encoding.GetEncoding(28591).GetString(CType(COPYRow(ColumnIndex), Byte())) & "'")
There doesn't appear to be an overload of the GetString method that allows me to specify an escape character, so I'm wondering if there's another way, using this method, to ensure the characters are correctly encoded, including escape characters.
I'm "assuming" that this method should also work in general when I start working with my PostgreSQL database, but with possibly a different encoding. I'm trying to build things as "generic" as possible, but I'll have to worry about specifying encodings at run-time instead of hard-coding them later.
EDIT
I just ran across another SO question, which might point me in the right direction: Convert a Unicode string to an escaped ASCII string. Obviously, it might take a bit more work to get it right, but this looks like the closest thing to what I'm trying to accomplish.
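For what it's worth, GetString only turns bytes into characters; it never adds escape sequences, so the escaping has to be a separate step applied to the decoded string. A minimal sketch under the assumption that the target format wants MySQL-style backslash escapes (the exact escape set is a guess and should be matched to whatever the export format actually requires):

Dim latin1 As System.Text.Encoding = System.Text.Encoding.GetEncoding(28591)
Dim raw As String = latin1.GetString(CType(COPYRow(ColumnIndex), Byte()))

' Escape the backslash first so the escapes added below are not doubled up.
Dim escaped As String = raw.Replace("\", "\\").Replace("'", "\'").Replace(vbNullChar, "\0")
escaped = escaped.Replace(vbCr, "\r").Replace(vbLf, "\n")

OutputFile.Write("'" & escaped & "'")

PostgreSQL uses different escaping rules, so the replacement table would need to be selected per target at run-time, which fits the "generic" goal mentioned above.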

Ruby puts item.to_json sometimes prints the string in quotes with escapes

I have a program that can deliver its reports to a variety of destinations, including flat files, an Elasticsearch DB and our call management system.
I have a set of Dest classes which attend to the details of getting the data into the right format for the intended destination. Not surprisingly, the ES one uses puts item.to_json to deliver its output (where item is a hash).
The problem is that some hashes are printed as "{\"key\":\"value\"}" and others print 'correctly', i.e. without the escaping. I know the output from to_json contains the escapes so that it produces the expected result with puts, but in my case this works only some of the time.
Coping with the two types of behaviour is painful!
Any ideas what is going on?
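A plausible explanation, though it is a guess without seeing the Dest classes: the escaped form appears when to_json is called on something that is already a JSON string rather than on the hash itself, because String#to_json encodes it a second time. A short sketch showing the difference, plus a hypothetical guard:

require 'json'

item = { "key" => "value" }

puts item.to_json          # {"key":"value"}        <- hash encoded once
puts item.to_json.to_json  # "{\"key\":\"value\"}"  <- an already-encoded string encoded again

# Hypothetical guard for the ES Dest class: only encode things that are not already strings.
payload = item.is_a?(String) ? item : item.to_json
puts payload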

Spark - load numbers from a CSV file with non-US number format

I have a CSV file which I want to convert to Parquet for further processing. Using
sqlContext.read()
.format("com.databricks.spark.csv")
.schema(schema)
.option("delimiter",";")
.(other options...)
.load(...)
.write()
.parquet(...)
works fine when my schema contains only Strings. However, some of the fields are numbers that I'd like to be able to store as numbers.
The problem is that the file arrives not as an actual "csv" but as a semicolon-delimited file, and the numbers are formatted with German notation, i.e. a comma is used as the decimal separator.
For example, what in the US would be 123.01 is stored in this file as 123,01.
Is there a way to force reading the numbers in a different Locale, or some other workaround that would allow me to convert this file without first converting the CSV to a different format? I looked in the Spark code, and one nasty thing that seems to be causing the issue is in CSVInferSchema.scala line 268 (Spark 2.1.0): the parser enforces US formatting rather than, for example, relying on the Locale set for the JVM or allowing this to be configured somehow.
I thought of using a UDT but got nowhere with that: I can't work out how to get it to let me handle the parsing myself (I couldn't really find a good example of using a UDT...).
Any suggestions on a way of achieving this directly, i.e. at the parsing step, or will I be forced to do an intermediate conversion and only then convert to Parquet?
For anybody else who might be looking for an answer, the workaround I went with (in Java) for now is:
JavaRDD<Row> convertedRDD = sqlContext.read()
.format("com.databricks.spark.csv")
.schema(stringOnlySchema)
.option("delimiter",";")
.(other options...)
.load(...)
.javaRDD()
.map ( this::conversionFunction );
sqlContext.createDataFrame(convertedRDD, schemaWithNumbers).write().parquet(...);
The conversion function takes a Row and needs to return a new Row with fields converted to numerical values as appropriate (or, in fact, this could perform any conversion). Rows in Java can be created by RowFactory.create(newFields).
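A hypothetical sketch of what such a conversion function might look like, assuming a two-column schema where column 0 stays a String and column 1 holds the German-formatted number (the column layout and the class name are made up for illustration):

import java.io.Serializable;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class GermanNumberRowConverter implements Serializable {

    public Row conversionFunction(Row row) {
        // NumberFormat is not thread-safe, so create an instance per call (or per partition).
        NumberFormat german = NumberFormat.getInstance(Locale.GERMANY);
        try {
            String id = row.getString(0);                                 // pass-through column
            double value = german.parse(row.getString(1)).doubleValue();  // "123,01" -> 123.01
            return RowFactory.create(id, value);                          // must line up with schemaWithNumbers
        } catch (ParseException e) {
            throw new RuntimeException("Unparseable number: " + row.getString(1), e);
        }
    }
}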
I'd be happy to hear any other suggestions how to approach this but for now this works. :)

How to read a file and write to another file in Tcl, replacing values

I have three files: Conf.txt, Temp1.txt and Temp2.txt. I have used a regex to fetch some values from the Conf.txt file. I want to substitute those values (the same names appear in Temp1.txt and Temp2.txt) and create another two files, say Temp1_new.txt and Temp2_new.txt.
For example: in Conf.txt I have a value, say IP1, and the same name appears in Temp1.txt and Temp2.txt. I want to create Temp1_new.txt and Temp2_new.txt, replacing IP1 with, say, 192.X.X.X in Temp1.txt and Temp2.txt.
I'd appreciate it if someone could help me with Tcl code to do this.
Judging from the information provided, there are basically two ways to do what you want:
File-semantics-aware;
Brute-force.
The first way is to read the source file, parse it to produce a structured in-memory representation of its content, and then serialize this content to the new file after replacing the relevant value(s) in that representation.
The brute-force method means treating the contents of the source file as plain text (or a series of text strings) and running something like regsub or string replace on this text to produce the new text, which you then save to the new file.
The first way should generally be favoured, especially for complex cases, as it removes any chance of replacing irrelevant bits of text. The brute-force way may be simpler to code (if there's no handy library to do this; see below) and is therefore good for throw-away scripts; a minimal sketch of this approach is shown below.
Note that for certain file formats there are ready-made libraries which can be used to automate what you need. For instance, the XSLT facilities of the tdom package can be used to manipulate XML files, INI-style files can be modified using an appropriate library, and so on.
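As an illustration of the brute-force route, here is a minimal sketch using string map; the file names and the IP1 placeholder come from the question, while the replacement value and the assumption that the placeholders appear literally in the template files are mine:

# Pairs of placeholder -> value; in practice these would come from parsing Conf.txt.
set replacements {IP1 192.X.X.X}

foreach {in out} {Temp1.txt Temp1_new.txt Temp2.txt Temp2_new.txt} {
    set fin  [open $in r]
    set text [read $fin]
    close $fin

    set fout [open $out w]
    puts -nonewline $fout [string map $replacements $text]
    close $fout
}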