Is there a Python module to read avro files with pyarrow? - pyarrow

I know there is pyarrow.parquet for reading Parquet files as an Arrow table, but I'm looking for the equivalent for Avro.

Not yet, but there are plans for one. The first step is the C++ implementation, which hasn't been started yet.

Related

How to format JSON code quickly on Linux OS?

I have a very large JSON file, but it's poorly formatted: all the characters are on a single line, with no \n anywhere, which makes it difficult to read and understand. And since it's more than one kilobyte, editing it manually is out of the question.
I am looking for a command or some other way to format a JSON file quickly, for human readability, and save it into the same file or another one. Ideally I shouldn't have to set up too many things. I know that some IDEs include automatic formatting features, but installing another IDE onto my work computer is not an option for me.
The JSON is in a file? You could do this:
python -m json.tool my_json.json
Built into Python since version 2.6, and it will pretty-print the JSON for you.
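The same pretty-printing can also be done from a script with the standard json module, which is handy if you want to write the result back to a file (the JSON here is a placeholder for your file's contents):

```python
import json

# Placeholder one-line JSON, standing in for the unreadable file contents.
raw = '{"users":[{"id":1,"name":"alice"},{"id":2,"name":"bob"}]}'

data = json.loads(raw)               # parse the single-line JSON
pretty = json.dumps(data, indent=4)  # re-serialize with indentation
print(pretty)
```

To write it back out, just `json.dump(data, open("out.json", "w"), indent=4)`.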

Flink read binary ION records from Kinesis

I had a Kinesis stream containing binary ION records, and I needed to read that stream in Flink.
My solution, two years ago, was to write a Base64 SerDe (about 20 lines of Java code) and use it with the KinesisConsumer.
Now I have the same requirements, but need to use PyFlink.
I guess I could create the same Java file, compile and package it in a jar, add that as a dependency for pyflink, and write a Python wrapper...
That sounds like a lot of effort for a task that seems so simple.
Is there any simpler way? For example, some config in the Kinesis Java SDK that base64-encodes the record before yielding it to the SDK user (Flink), which is the same as what the AWS CLI does. Or, even simpler, having the Java SDK convert an ION record from binary to text mode.
Thanks!
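I'm not aware of a Kinesis SDK option that does the encoding for you, but for reference, the Base64 round-trip the Java SerDe performed is equally small in Python with the standard base64 module. A sketch, using arbitrary bytes in place of a real binary ION payload:

```python
import base64

# Hypothetical binary ION payload (just arbitrary bytes for this sketch).
ion_record = b"\xe0\x01\x00\xea\x0f"

# Producer side: encode the binary record as Base64 text before sending.
encoded = base64.b64encode(ion_record).decode("ascii")
print(encoded)  # 4AEA6g8=

# Consumer side (what the ~20-line Java SerDe did): decode back to bytes.
decoded = base64.b64decode(encoded)
assert decoded == ion_record
```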

CSV to JSON benchmarks

I'm working on a project that uses parallel methods to convert text from one form to another. We're going to implement a CSV to JSON converter to demonstrate the speedups that are possible using our parallel framework.
We want to benchmark our converter once it's finished. What are the fastest libraries/stand-alone programs/etc. out there that can do CSV-to-JSON conversion? I found a list of potential candidates here: Large CSV to JSON/Object in Node.js, but I'm not sure how fast the listed options are. In the worst case I'll benchmark them myself, but if someone already knows what the "best in class" converters are, it'd save me some time.
Looks like the maintainer of csvtojson has developed a benchmark application. I think I can add my csv to json converter to his benchmark project to test my converter.
If your project can consider in-browser apps, I suggest csvtojson, as it is by far the fastest converter available as of 2017.
I created it myself, so I may be a bit biased, but I specifically developed it for a bigger project that required heavy CSV-to-JSON crunching.
Let me know if it helps.
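For a single-threaded baseline to benchmark against, the conversion itself is a few lines with Python's standard library (the CSV text here is made up for the example):

```python
import csv
import io
import json

# Made-up CSV input, standing in for a real file.
csv_text = "id,name\n1,alice\n2,bob\n"

# csv.DictReader yields one dict per data row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))
result = json.dumps(rows)
print(result)  # [{"id": "1", "name": "alice"}, {"id": "2", "name": "bob"}]
```

Note all values come out as strings; type inference is one of the things the faster converters differ on.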

convert multiple files from .LWO to .OBJ or similar

I need to convert many files from .lwo format to .obj or .stl. I have too many to convert "by hand", meaning I don't want to use online tools or import/export the files one by one in Blender or similar.
So I'm trying to find a program that would load each file, convert it, then save a new .stl. The files are numbered "file000001", "file000002", etc. to make importing easier.
Is there any program out there that will do this? If not, how would I go about accomplishing my goal?
As far as languages go, I am most effective with Processing/Java. I found this which might be similar but doesn't relate to LWOs.
Thanks for any help.
I just found assimp which has a command line tool to convert different file types. Thanks everyone who answered!
I'm sure you can find a few editors that import .lwo and export .obj.
For example, Wings3D does that and is free, open-source, and lightweight.
Wings is scriptable using Erlang.
Blender has an LWO importer too, but it's not enabled by default; you need to go to Preferences > Addons and enable it there.
Blender has a Python API which should be easy to pick up.
This would allow you to write a script that does a batch conversion: read a directory, traverse the files, import each .lwo, transform it (scale/rotate if needed), and export .obj.
If you search around, there may already be a 3D file format batch converter out there, and .lwo/.obj are old enough formats that they are likely to be supported.
If you want to implement something from scratch, you'll need to study each file format (e.g. LightWave Object, OBJ) to be able to parse and export them.
Hopefully there's a Java library that does that for you. I'd start with a 3D Java game engine; for example, here's a Java .LWO importer found via JMonkey.
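Since the files follow a fixed numbering scheme, the batch part is easy to script around the assimp CLI mentioned above. A sketch that builds one `assimp export` command per numbered file (assuming assimp is installed and on PATH; run the commands with subprocess once you've verified one by hand):

```python
import pathlib

def build_commands(directory, count):
    """Build one 'assimp export' command per numbered .lwo file."""
    commands = []
    for i in range(1, count + 1):
        src = pathlib.Path(directory) / f"file{i:06d}.lwo"  # file000001.lwo, ...
        dst = src.with_suffix(".obj")
        commands.append(["assimp", "export", str(src), str(dst)])
    return commands

for cmd in build_commands("models", 2):
    print(" ".join(cmd))
```

Swap `.obj` for `.stl` in `with_suffix` if you prefer STL output.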

Reading an HDF5 file written in Java from Octave

I'm writing a framework to produce HDF5 files that are compatible with Octave.
That is, I want my framework to be able to read HDF5 files that were written by Octave, and Octave to be able to read HDF5 files written by my framework.
I'm using HDF-JAVA, to read and write HDF5 files.
The problem is that Octave cannot read the HDF5 files that I write in Java.
When I try to read such a file, I get an error:
d=load('check.h5')
error: value on right hand side of assignment is undefined
From the documentation for load in Octave-Forge:
HDF5 load and save are not available, as this Octave executable was not linked with the HDF5 library.
Is this the problem you are trying to solve with your framework? Or is it the problem that is preventing you from implementing your framework?
That is not the problem. If I create an HDF5 file that contains only datasets, the load works.
(The -hdf5 parameter is not mandatory; Octave can recognize the file type. I tried it.)
The problem is that I cannot use only datasets, because my framework demands the use of groups (for example, a cell array of matrices requires groups, just as Octave uses them).
If I use groups, then the problems start: loading a file that contains groups fails.
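One way to debug this is to save the same variable from Octave with `save -hdf5` and compare the two files with h5dump. As I recall (this is my recollection of Octave's on-disk layout, so verify it against a real Octave-written file), Octave stores each variable as a group holding a `type` string dataset and a `value` dataset; a sketch of that layout, written with h5py for brevity rather than HDF-JAVA:

```python
import tempfile

import h5py
import numpy as np

# Sketch of the layout I believe Octave's 'save -hdf5' produces for a 2x2
# matrix named 'x'. Double-check with h5dump against a real Octave file.
path = tempfile.mktemp(suffix=".h5")
with h5py.File(path, "w") as f:
    g = f.create_group("x")  # one group per variable
    g.create_dataset("type", data=np.bytes_("matrix"))
    g.create_dataset("value", data=np.array([[1.0, 2.0], [3.0, 4.0]]))

with h5py.File(path, "r") as f:
    print(list(f["x"].keys()))  # ['type', 'value']
```

If your Java-written groups lack the metadata Octave expects (such as the `type` entry), that would explain why `load` fails only when groups are involved.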