Importing a CSV file as a matrix

I would like to import a CSV file (file.csv) as a matrix in Julia to plot it as a heatmap using GR. My CSV file contains 255 rows with 255 entries on each row. Here are some entries from the CSV file to illustrate the format of the rows:
file.csv
-1.838713563526794E-8;-1.863045549663876E-8;-2.334704481052452E-8 ...
-1.7375447279939282E-8;-1.9194929690414267E-8;-2.0258124812468942E-8; ...
⋮
-1.1706980663321613E-8;-1.6244768693064608E-8;-5.443335580296977E-9; ...
Note: The ellipses (...) are not part of the CSV file; rather, they indicate that entries have been omitted.
I have tried importing the file as a matrix using the line m = CSV.read("./file.csv"), but this results in a 255 by 1 vector rather than the 255 by 255 matrix I expect. Does anyone know of an effective way to import CSV files as matrices in Julia?

You can use
using DelimitedFiles
m = readdlm("./file.csv", ';', Float64)
(The last argument specifying the element type can be omitted; for purely numeric data like yours, readdlm returns a Float64 matrix by default.)
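Since the stated goal is a GR heatmap, the matrix from readdlm can be passed straight to it. A minimal sketch (assumes the GR package is installed):
using DelimitedFiles
using GR

# read the semicolon-delimited file into a 255x255 Float64 matrix
m = readdlm("./file.csv", ';', Float64)

# plot the matrix as a heatmap
heatmap(m)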

m = CSV.read("./file.csv") returns a DataFrame.
If CSV.jl reads the file correctly, so that all the columns of m are of type Float64 with no missing values, then you can convert m to a Float64 matrix with Matrix{Float64}(m), or obtain the matrix in one line:
m = Matrix{Float64}(CSV.read("./file.csv", header=0, delim=';'))
# or with piping syntax
m = CSV.read("./file.csv", header=0, delim=';') |> Matrix{Float64}
For simple CSV files like yours, though, readdlm should normally be enough and is the first solution to reach for.

2022 Answer
Not sure if there has been a change to CSV.jl; however, if I do CSV.read("file.csv") now, it errors with
provide a valid sink argument, like 'using DataFrames; CSV.read(source, DataFrame)'
You can, however, use the fact that it accepts any Tables.jl-compatible sink:
using CSV, Tables
M = CSV.read("file.csv", Tables.matrix, header=0)
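CSV.jl usually auto-detects the delimiter, but for the semicolon-separated file from the original question it may be safer to pass it explicitly; a sketch under that assumption:
using CSV, Tables
M = CSV.read("file.csv", Tables.matrix; header=0, delim=';')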

Related

Method Error in Julia: thinks CSV is boolean and won't convert to string

I am trying to read a CSV file into Julia. When I open the file in Excel, it's a 199x7 matrix of numbers. I am using the following code to create a variable, Xrel:
Xrel = CSV.read(joinpath(data_path,"Xrel.csv"), header=false)
However, when I try to do this, Julia produces:
"MethodError: Cannot 'convert' an object of type Bool to an object of type String."
data_path is defined in previous code to save space.
I've checked my paths and opened the CSV without a problem in R - it's only in Julia that I am having an issue.
I am confused as to why Julia is saying that my data is Boolean when it's a matrix of numbers. How can I resolve this so I can read in my CSV file?
Thanks!!
I think you should either use CSV.File() or add a sink argument to CSV.read():
CSV.File(joinpath(data_path,"Xrel.csv"), header=false)
# or with a DataFrame as sink
using DataFrames
CSV.read(joinpath(data_path,"Xrel.csv"), DataFrame, header=false)
from the docs:
?CSV.read
CSV.read(source, sink::T; kwargs...) => T
Read and parses a delimited file, materializing directly using the sink function.
CSV.read supports all the same keyword arguments as CSV.File.
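Since the file is purely numeric and ultimately wanted as a matrix, a Tables.matrix sink (as in the 2022 answer further up) should also work here; a sketch reusing the question's data_path:
using CSV, Tables
Xrel = CSV.read(joinpath(data_path, "Xrel.csv"), Tables.matrix; header=false)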

Stata read numeric data as string using variable names

I am reading a csv file into Stata using
import delimited "../data_clean/winter20.csv", encoding(UTF-8)
The raw data looks like:
y id1
-.7709586 000000000020
-.4195721 000000003969
-.8932499 300000000021
-1.256116 200000007153
-.7858037 000000000000
The imported data become:
y id1
-.7709586 20
-.4195721 000000003969
-.8932499 300000000021
-1.256116 200000007153
-.7858037 0
However, some of the ID columns are read as numeric, and I would like to import them as strings. I want to read the data exactly as it appears in the raw file.
The way I found online is:
import delimited "/Users/tianwang/Dropbox/Construction/data_clean/winter20.csv", encoding(UTF-8) stringcols(74 97 116) clear
However, the raw data may be updated and column numbers may change. The following
import delimited "/Users/tianwang/Dropbox/Construction/data_clean/winter20.csv", encoding(UTF-8) stringcols(id1 id2 id3) clear
gives error id1: invalid numlist in stringcols() option. Is there a way to specify variable names rather than column numbers?
The reason is that leading zeros are lost if I read the IDs as numeric. The tostring method does not recover the leading zeros, and format id1 %09.0f only works if the variables all have the same number of digits.
I think this should do it:
import delimited "../data_clean/winter20.csv", stringcols(_all) encoding(UTF-8) clear
PS: Tested in Stata16/Win10
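If only the ID variables should stay strings, a hedged alternative is to import everything as strings and then convert the genuinely numeric variables back by name with destring (y stands in for the question's numeric column):
import delimited "../data_clean/winter20.csv", stringcols(_all) encoding(UTF-8) clear
// convert the truly numeric variables back by name; the ID columns stay strings
destring y, replace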

How does Pyarrow read_csv handle different file encodings?

I have a .dat file that I had been reading with pd.read_csv, and I always needed to use encoding="latin" for it to read properly / without error. When I use pyarrow.csv.read_csv, I don't see a parameter to select the encoding of the file, but it still works without issue (which is great! but I don't understand why, or if it only auto-handles certain encodings). The only parameters I'm using are delimiter="|" (with ParseOptions) and auto_dict_encode=True (with ConvertOptions).
How is pyarrow handling different encoding types?
pyarrow currently has no functionality to deal with different encodings, and assumes UTF8 for string/text data.
But the reason it doesn't raise an error is that pyarrow will read any non-UTF8 strings as a "binary" type column, instead of "string" type.
A small example:
# writing a small file with latin encoding
with open("test.csv", "w", encoding="latin") as f:
    f.writelines(["col1,col2\n", "u,ù"])
Reading with pyarrow gives string for the first column (which only contains ASCII characters, thus also valid UTF8), but reads the second column as binary:
>>> from pyarrow import csv
>>> csv.read_csv("test.csv")
pyarrow.Table
col1: string
col2: binary
With pandas you indeed get an error by default (because pandas has no binary data type and will try to read all text columns as Python strings, thus assuming UTF8):
>>> pd.read_csv("test.csv")
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 0: invalid start byte
>>> pd.read_csv("test.csv", encoding="latin")
col1 col2
0 u ù
It's now possible to specify encodings with pyarrow.csv.read_csv.
According to the pyarrow docs for read_csv:
The encoding can be changed using the ReadOptions class.
A minimal example follows:
from pyarrow import csv
options = csv.ReadOptions(encoding='latin1')
table = csv.read_csv('path/to/file', read_options=options)
From what I can tell, the functionality was added in this PR, so it should work starting with pyarrow 1.0.
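Applied to the test.csv written in the earlier answer, this makes both columns come back as string; a sketch assuming pyarrow >= 1.0:
from pyarrow import csv

# read the latin-1 encoded file from the example above
options = csv.ReadOptions(encoding='latin1')
table = csv.read_csv('test.csv', read_options=options)
print(table.schema)  # now: col1: string, col2: string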

How to open and read a JSON file?

I have a JSON file, but it is 186 MB. I am trying to read it with Python:
import json
f = open('file.json','r')
r = json.loads(f.read())
ValueError: Extra data: line 88 column 2 -...
How can I open it? Help me.
Your JSON file isn't a JSON file, it's several JSON files mashed together.
The first instance of this occurs at the 1630070th character:
'шова"}]}]}{"response":[{"count'
^ here
That said, jq appears to be able to handle it, so the individual parts are fine.
You'll need to split the file at the boundaries of the individual JSON objects. Try catching the JSONDecodeError and using its .colno (or the absolute offset .pos) to slice the text into correct chunks, as in the sketch below.
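A minimal sketch of that approach; rather than slicing by hand, json.JSONDecoder.raw_decode reports where each document ends, which amounts to the same thing (assumes the parts are separated by nothing but optional whitespace):
import json

def load_concatenated_json(path):
    # parse a file that contains several JSON documents mashed together
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while pos < len(text):
        # raw_decode returns the parsed object and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
        # skip any whitespace between the concatenated documents
        while pos < len(text) and text[pos].isspace():
            pos += 1
    return objects

parts = load_concatenated_json('file.json')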
It should be:
r = json.load(f)
(json.load reads directly from a file object, while json.loads expects a string.)

Import csv file data to populate a Prolog knowledge base

I have a csv file example.csv which contains two columns with header var1 and var2.
I want to populate an initially empty Prolog knowledge base file import.pl with repeated facts, where each row of example.csv is treated the same way:
fact(A1, A2).
fact(B1, B2).
fact(C1, C2).
How can I code this in SWI-Prolog?
EDIT, based on the answer from @Shevliaskovic:
:- use_module(library(csv)).
import :-
    csv_read_file('example.csv', Data, [functor(fact), separator(0';)]),
    maplist(assert, Data).
When import. is run in the console, the knowledge base is updated exactly as requested, except that it is updated directly in memory rather than via a file and a subsequent consult (for writing to a file, see the sketch below).
Checking with setof([X, Y], fact(X,Y), Z). gives:
Z = [['A1', 'A2'], ['B1', 'B2'], ['C1', 'C2'], [var1, var2]].
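Since the question asked for the facts to end up in a file import.pl rather than only in memory, here is a hedged sketch that writes the rows out with portray_clause instead of asserting them:
:- use_module(library(csv)).

import_to_file :-
    csv_read_file('example.csv', Data, [functor(fact), separator(0';)]),
    setup_call_cleanup(
        open('import.pl', write, Out),
        forall(member(Fact, Data), portray_clause(Out, Fact)),
        close(Out)).
Consulting import.pl afterwards loads the same fact/2 clauses.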
SWI-Prolog has a built-in predicate for this.
It is
csv_read_file(+File, -Rows)
Or you can add some options:
csv_read_file(+File, -Rows, +Options)
You can see it in the documentation for more information.
Here is the example that the documentation has:
Suppose we want to create a predicate table/6 from a CSV file that we know contains 6 fields per record. This can be done using the code below. Without the option arity(6), this would generate a predicate table/N, where N is the number of fields per record in the data.
?- csv_read_file(File, Rows, [functor(table), arity(6)]),
   maplist(assert, Rows).
For example:
If you have a File.csv that looks like:
A1 A2
B1 B2
C1 C2
You can import it to SWI like:
?- csv_read_file('File.csv', Data).
The result would be:
Data = [row('A1', 'A2'), row('B1', 'B2'), row('C1', 'C2')].