Julia: Visualize images saved in CSV form

What would be the best way of visualizing images saved in .csv format?
The following doesn't work:
using CSV, ImageView
data = CSV.read("myfile.csv");
imshow(data)
This is the error:
MethodError: no method matching pixelspacing(::DataFrames.DataFrame)
Closest candidates are:
pixelspacing(!Matched::MappedArrays.AbstractMultiMappedArray) at /Users/xxx/.julia/packages/ImageCore/yKxN6/src/traits.jl:63
pixelspacing(!Matched::MappedArrays.AbstractMappedArray) at /Users/xxx/.julia/packages/ImageCore/yKxN6/src/traits.jl:62
pixelspacing(!Matched::OffsetArrays.OffsetArray) at /Users/xxx/.julia/packages/ImageCore/yKxN6/src/traits.jl:67
...
Stacktrace:
[1] imshow(::Any, ::Reactive.Signal{GtkReactive.ZoomRegion{RoundingIntegers.RInt64}}, ::ImageView.SliceData, ::Any; name::Any, aspect::Any) at /Users/xxx/.julia/packages/ImageView/sCn9Q/src/ImageView.jl:269
[2] imshow(::Any; axes::Any, name::Any, aspect::Any) at /Users/xxx/.julia/packages/ImageView/sCn9Q/src/ImageView.jl:260
[3] imshow(::Any) at /Users/xxx/.julia/packages/ImageView/sCn9Q/src/ImageView.jl:259
[4] top-level scope at In[5]:2
[5] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091
Reference on GitHub.

This question was answered at https://github.com/JuliaImages/ImageView.jl/issues/241. Copying the answer here:
imshow(Matrix(data))
where data is your DataFrame. But CSV is a poor choice for images: use Netpbm if you simply must have a text-formatted image; otherwise a binary format is recommended. Binary Netpbm files are especially easy to write if you have to write your own (e.g., if the images are coming from some language that doesn't support other file formats); otherwise PNG is typically a good choice.
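For reference, a minimal end-to-end sketch (assuming a headerless CSV of numeric pixel values; note that recent versions of CSV.jl require an explicit sink argument such as DataFrame):
using CSV, DataFrames, ImageView
data = CSV.read("myfile.csv", DataFrame; header=false)
imshow(Matrix(data))   # a plain numeric Matrix is something imshow can display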

Does the CSV file have a header line naming its columns, or is it just a delimited file full of numeric values stored as text?
If the CSV file is actually in the form of a matrix of values, such that the values are the bytes of a 2D image, you may use the standard library's DelimitedFiles -- see the readdlm() docs. Read the file with readdlm() into a matrix and see if ImageView can display the result.
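A quick sketch of that suggestion (assuming comma-delimited values):
using DelimitedFiles, ImageView
A = readdlm("myfile.csv", ',')   # readdlm returns a plain numeric Matrix
imshow(A)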


JSON variable indent for different entries

Background: I want to store a dict object in json format that has say, 2 entries:
(1) Some object that describes the data in (2). This is small data: mostly definitions, controlling parameters, and the like (perhaps called metadata) that one would like to read before using the actual data in (2). In short, I want good human readability for this portion of the file.
(2) The data itself is a large chunk; it should be machine-readable rather than pretty-printed (no human needs to gaze over it when opening the file).
Problem: How do I specify a custom indent, say 4, for (1) and none for (2)? If I use something like json.dump(data, trig_file, indent=4) where data = {'meta_data': small_description, 'actual_data': big_chunk}, the large chunk gets indented too, and all that whitespace makes the file large.
Assuming you can append json to a file:
Write {"meta_data":\n to the file.
Append the json for small_description formatted appropriately to the file.
Append ,\n"actual_data":\n to the file.
Append the json for big_chunk formatted appropriately to the file.
Append \n} to the file.
The idea is to do the JSON formatting of the "container" object by hand, using your JSON formatter as appropriate on each of the contained objects.
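A minimal sketch of those steps (the names small_description and big_chunk come from the question; write_mixed_indent and the rest are illustrative):
import json

def write_mixed_indent(path, small_description, big_chunk):
    # Hand-write the container's braces; let json.dumps format each value.
    with open(path, 'w') as f:
        f.write('{"meta_data":\n')
        f.write(json.dumps(small_description, indent=4))        # human-readable part
        f.write(',\n"actual_data":\n')
        f.write(json.dumps(big_chunk, separators=(',', ':')))   # compact part
        f.write('\n}')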
Consider a different file format, interleaving keys and values as distinct documents concatenated together within a single file:
{"next_item": "meta_data"}
{
    "description": "human-readable content goes here",
    "split over": "several lines"
}
{"next_item": "actual_data"}
["big","machine-readable","unformatted","content","here","....."]
That way you can pass any indent parameters you want to each write, and you aren't doing any serialization by hand.
See How do I use the 'json' module to read in one JSON object at a time? for how one would read a file in this format. One of its answers wisely suggests the ijson library, which accepts a multiple_values=True argument.
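With ijson, reading the concatenated documents back might look like this (a sketch; the filename is illustrative):
import ijson

with open('data.json', 'rb') as f:
    # multiple_values=True makes ijson yield each top-level document in turn
    for obj in ijson.items(f, '', multiple_values=True):
        print(obj)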

Can't load sparse matrix correctly into Octave

I have the task of loading symmetric positive definite sparse matrices from The University of Florida Sparse Matrix Collection into GNU Octave. I need to study different ordering algorithms, like symamd, but I can't use them, since the matrices are not stored in square form.
I have chosen for example bcsstk17.
I've tried different load methods with the .mat files:
load -mat bcsstk17
load -mat-binary bcsstk17
load -6 bcsstk17
load -v6 bcsstk17
load -7 bcsstk17
load -mat4-binary bcsstk17
error: load: can't read binary file
load -4 bcsstk17
But none of them worked, since my workspace's variables are empty.
When I load the Matrix Market format file with load bcsstk17.mtx, I get a 219813x3 matrix.
I've tried the full command, but I get the same 219813x3 matrix.
What am I doing wrong?
Not sure why you're trying to load the .mtx file when there's a matlab/octave specific .mat format offered there.
Just download the bcsstk17.mat file, and load it:
load bcsstk17.mat
You will then see in your workspace a variable called Problem, which is of type struct. This contains several fields, including an A field which seems to hold your data in the form of a sparse matrix. In other words, your data can be accessed as Problem.A.
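A short sketch of the whole workflow (assuming the downloaded file is named bcsstk17.mat):
load bcsstk17.mat        % creates a struct variable named Problem
A = Problem.A;           % the square, symmetric sparse matrix
p = symamd(A);           % ordering algorithms now work on it
spy(A(p, p));            % visualize the reordered sparsity pattern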
You shouldn't be bothering with the .mtx file at all. However, for completeness, I will explain what you're seeing when you load it. The .mat file is a binary format, whereas the .mtx file is a human-readable format (i.e. it contains normal ASCII text). In particular, it consists of a 'header' of comment lines, which start with a % character;
a row which encodes the size of the sparse matrix in each dimension;
and then "space-delimited" data, where each row represents an element in the matrix, and the three columns represent the row, the column, and the value of that element.
When matlab comes across an ASCII file containing data (plus comments), regardless of the extension, as long as the data looks like a valid 2D array of numbers, it loads the contents of the file into a variable with the same name as the file.
Clearly this is not what you want, not least because the first row (the size row) will be interpreted as a normal row of data in an Nx3 matrix. In other words, matlab/octave is just loading a file it perceives as text-based, and it dumps the values it sees into a variable. The extension .mtx here is irrelevant as far as matlab/octave is concerned, and it is most definitely not interpreting or decoding the .mtx file in any way related to the .mtx specification.
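That said, if you ever did need to assemble the Nx3 data into a sparse matrix by hand, a sketch (note that Matrix Market stores only one triangle of a symmetric matrix, so it must be symmetrized afterwards):
raw = load('bcsstk17.mtx');    % the %-comment lines are skipped; the first data row is the size line
dims = raw(1, :);              % [rows, cols, nonzeros]
S = sparse(raw(2:end, 1), raw(2:end, 2), raw(2:end, 3), dims(1), dims(2));
S = S + S' - diag(diag(S));    % restore the missing triangle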

Spark - load numbers from a CSV file with non-US number format

I have a CSV file which I want to convert to Parquet for further processing. Using
sqlContext.read()
.format("com.databricks.spark.csv")
.schema(schema)
.option("delimiter",";")
.(other options...)
.load(...)
.write()
.parquet(...)
works fine when my schema contains only Strings. However, some of the fields are numbers that I'd like to be able to store as numbers.
The problem is that the file is not an actual "csv" but a semicolon-delimited file, and the numbers are formatted with German notation, i.e. a comma is used as the decimal separator.
For example, what in the US would be 123.01 is stored in this file as 123,01.
Is there a way to force reading the numbers with a different Locale, or some other workaround that would let me convert this file without first converting the CSV to a different format? I looked in the Spark code, and one nasty thing that seems to be causing the issue is in CSVInferSchema.scala line 268 (Spark 2.1.0) - the parser enforces US formatting rather than, e.g., relying on the Locale set for the JVM or allowing this to be configured somehow.
I thought of using a UDT but got nowhere with that - I can't work out how to get it to let me handle the parsing myself (I couldn't really find a good example of using UDTs...).
Any suggestions on a way of achieving this directly, i.e. on parsing step, or will I be forced to do intermediate conversion and only then convert it into parquet?
For anybody else who might be looking for answer - the workaround I went with (in Java) for now is:
JavaRDD<Row> convertedRDD = sqlContext.read()
.format("com.databricks.spark.csv")
.schema(stringOnlySchema)
.option("delimiter",";")
.(other options...)
.load(...)
.javaRDD()
.map(this::conversionFunction);
sqlContext.createDataFrame(convertedRDD, schemaWithNumbers).write().parquet(...);
The conversion function takes a Row and returns a new Row with fields converted to numerical values as appropriate (in fact, it could perform any conversion). Rows in Java can be created with RowFactory.create(newFields).
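A sketch of what such a conversion function might look like (the two-column layout here is purely hypothetical; in real Spark code the mapped function must also be serializable):
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

private Row conversionFunction(Row row) {
    try {
        // Parse "123,01" with German conventions; pass other fields through unchanged.
        NumberFormat german = NumberFormat.getInstance(Locale.GERMANY);
        double amount = german.parse(row.getString(1)).doubleValue();
        return RowFactory.create(row.getString(0), amount);
    } catch (ParseException e) {
        throw new RuntimeException(e);
    }
}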
I'd be happy to hear any other suggestions how to approach this but for now this works. :)

How to convert multiple images to csv?

I want to run some images through a neural network, and I want to create a .csv file for the data. How can I create a csv that will represent the images and keep each image separate?
One way to approach this is to use numpy to convert each image to an array, which can then be written out as a CSV file or simply a comma-separated list.
The CSV data can be manipulated, and the original image can be reconstructed when needed.
Here is some basic code that demonstrates the concept.
from PIL import Image   # Pillow; the old standalone PIL used a bare "import Image"
import numpy as np

# Function to convert an image to an array or a list
def loadImage(inFileName, outType):
    img = Image.open(inFileName)
    img.load()
    data = np.asarray(img, dtype="int32")
    if outType == "anArray":
        return data
    if outType == "aList":
        return list(data)

# Load image to an array
myArray1 = loadImage("bug.png", "anArray")
# Load image to a list
myList1 = loadImage("bug.png", "aList")
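To then write several images into a single CSV while keeping them separate, one row per flattened image is a common layout (a sketch; it assumes all images have the same dimensions, and the filenames are illustrative):
rows = [loadImage(name, "anArray").flatten() for name in ["bug.png", "bug2.png"]]
np.savetxt("images.csv", np.vstack(rows), fmt="%d", delimiter=",")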
You can encode your images in Base64 and still use CSV, since commas are not among the characters used by Base64.
See: Best way to separate two base64 strings
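A sketch of that idea (filenames illustrative), writing one "filename,data" row per image:
import base64, csv

with open("images.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for name in ["bug.png", "bug2.png"]:
        with open(name, "rb") as f:
            writer.writerow([name, base64.b64encode(f.read()).decode("ascii")])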
If possible, create a storage location just for images. If your images have unique filenames, then all you need to track is the filename. If they do not have a unique filename, you can assign one using a timestamp+randomizer function to name the photo. Once named, it must be stored in the proper location so that all you need is the filename in order to reference the appropriate image.
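A tiny sketch of such a naming function (purely illustrative):
import time, random
unique_name = "img_%d_%06d.png" % (int(time.time()), random.randrange(10**6))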
Due to size constraints, I would not recommend storing the actual images in the csv.
Cheers!
I guess that depends a lot on which algorithm and which implementation you select. It is not even clear that CSV is the correct choice.
For your stated requirements, the Netpbm format comes to mind; if you want to have one line per image, just squish all the numbers into one line. Note that a naive neural network will ignore the topology of the image; you'd need a somewhat more advanced setup to take it into account.

Read a Text File into R

I apologize if this has been asked previously, but I haven't been able to find an example online or elsewhere.
I have a very dirty data file in a text file (it may be JSON). I want to analyze the data in R, and since I am still new to the language, I want to read in the raw data and manipulate it as needed from there.
How would I go about reading in JSON from a text file on my machine? Additionally, if it isn't JSON, how can I read in the raw data as is (not parsed into columns, etc.) so I can go ahead and figure out how to parse it as needed?
Thanks in advance!
Use the rjson package. In particular, look at the fromJSON function in the documentation.
If you want further pointers, then search for rjson at the R Bloggers website.
If you want to use the packages related to JSON in R, there are a number of other posts on SO answering this. I presume you already searched this site for JSON [r]; there's plenty of info there.
If you just want to read in the text file line by line and process later on, then you can use either scan() or readLines(). They appear to do the same thing, but there's an important difference between them.
scan() lets you define what kind of objects you want to find, how many, and so on. Read the help file for more info. You can use scan() to read in every word/number/sign as an element of a vector using e.g. scan(filename, ""). You can also use specific delimiters to separate the data. See also the examples in the help files.
To read line by line, you use readLines(filename) or scan(filename, "", sep="\n"). Either gives you a vector with the lines of the file as its elements, which again allows you to do custom processing of the text. Then again, if you really have to do this often, you might want to consider doing it in Perl.
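For example (the filename is illustrative):
lines1 <- readLines("dirty_data.txt")                    # one element per line
lines2 <- scan("dirty_data.txt", what = "", sep = "\n")  # the same, via scan
tokens <- scan("dirty_data.txt", what = "")              # every whitespace-separated token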
If your file is in JSON format, you may try the packages jsonlite, RJSONIO, or rjson. All three provide a fromJSON function.
To install a package you use the install.packages function. For example:
install.packages("jsonlite")
And, once the package is installed, you can load it with the library function.
library(jsonlite)
Generally, line-delimited JSON has one object per line, so you need to read the file line by line, collecting the objects. For example:
con <- file('myBigJsonFile.json')
open(con)
objects <- list()
index <- 1
# readLines(con, n = 1) returns a zero-length vector at end of file
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    objects[[index]] <- fromJSON(line)
    index <- index + 1
}
close(con)
After that, you have all the data in the objects variable, from which you can extract the information you want.