Reading text/number mixed CSV files as tables in Octave

Is there an easy way in Octave to load data from a CSV into a data structure similar to data frames in R? I tried csvread and dlmread, but Octave keeps reading the text as imaginary numbers, and I'd also like to have the column headers available as references. I saw some examples online that seem way too convoluted; how is it possible that there is no function similar to R's data frames? I saw a package called dataframe, but I can't seem to figure out how it works. Any tip or suggestion?
csvread('x')  % returns 1 column of imaginary numbers
dlmread('x')  % returns N columns of imaginary numbers
Any working alternative?

Why are you unable to make the dataframe package work? You need to be more specific. Here's a simple example:
$ cat cars.csv
Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar
$ octave
octave-cli-3.8.2:1> pkg load dataframe
octave-cli-3.8.2:2> cars = dataframe ("cars.csv")
cars = dataframe with 2 rows and 3 columns
Src: cars.csv
_1  Year    Make     Model
Nr  double  char     char
 1  1997    Ford     E350
 2  2000    Mercury  Cougar

Related

drop_duplicates() got an unexpected keyword argument 'ignore_index'

On my machine, the code runs normally, but on my friend's machine there is an error about drop_duplicates(). The error is the same as in the title.
Open your command prompt and type pip show pandas to check the current version of your pandas installation.
If it's lower than 1.0.0, as #paulperry says, then type pip install --upgrade pandas --user
(the --user flag installs the upgraded package just for your user account).
In Python, run import pandas as pd; pd.__version__ to see what version of pandas you are using, and make sure it's >= 1.0.
I was having the same problem as Wzh -- but am running pandas version 1.1.3. So, it was not a version problem.
Ilya Chernov's comment pointed me in the right direction. I needed to extract a list of unique names from a single column in a more complicated DataFrame so that I could use that list in a lookup table. This seems like something others might need to do, so I will expand a bit on Chernov's comment with this example, using the sample csv file "iris.csv" that is available on GitHub. The file lists sepal and petal lengths for a number of iris varieties. Here we extract the variety names.
import pandas as pd
df = pd.read_csv('iris.csv')
# drop duplicates BEFORE extracting the column
names = df.drop_duplicates('variety', inplace=False, ignore_index=True)
# THEN extract the column you want
names = names['variety']
print(names)
Here is the output:
0        Setosa
1    Versicolor
2     Virginica
Name: variety, dtype: object
The key idea here is to get rid of the duplicate variety names while the object is still a DataFrame (without changing the original file), and then extract the one column that is of interest.

line feed within a column in csv

I have a CSV like the one below. Some of the columns contain a line break, like column B below. When I run wc -l file.csv, Unix returns 4, but these are actually 3 records. I don't want to replace the line breaks with spaces; I am going to load the data into a database using SQL*Loader and want to load it as it is. What should I do so that Unix treats a record containing a line break as a single record?
A,B,C,D
1,"hello
world",sds,sds
2,sdsd,sdds,sdds
Unless you're dealing with trivial cases (no quoted fields, no embedded commas, no embedded newlines, etc.), CSV data is best processed with tools that understand the format. Languages like Perl and Python have CSV parsing libraries available, packages like csvkit provide useful command-line utilities, and more.
Using csvstat from csvkit on your example:
$ csvstat -H --count foo.csv
Row count: 3
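If installing csvkit is not an option, a minimal sketch with Python's standard-library csv module (the filename foo.csv is carried over from the example above) counts logical CSV records instead of raw lines:
import csv
# csv.reader understands quoted fields, so the newline inside "hello\nworld"
# does not start a new record; like csvstat -H, this counts the header line too
with open("foo.csv", newline="") as f:
    count = sum(1 for _ in csv.reader(f))
print("Row count:", count)   # Row count: 3 for the sample above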

Importing a CSV file as a matrix

I would like to import a CSV file (file.csv) as a matrix in Julia to plot it as a heatmap using GR. My CSV file contains 255 rows and 255 entries on each row. Here are some entries from the CSV file to illustrate the format of the rows:
file.csv
-1.838713563526794E-8;-1.863045549663876E-8;-2.334704481052452E-8 ...
-1.7375447279939282E-8;-1.9194929690414267E-8;-2.0258124812468942E-8; ...
⋮
-1.1706980663321613E-8;-1.6244768693064608E-8;-5.443335580296977E-9; ...
Note: The ellipses (...) are not part of the CSV file; rather, they indicate that entries have been omitted.
I have tried importing the file as a matrix using the following line m = CSV.read("./file.csv"), but this results in a 255 by 1 vector rather than the 255 by 255 matrix. Does anyone know of an effective way to import CSV files as matrices in Julia?
You can use
using DelimitedFiles
m = readdlm("./file.csv", ';', Float64)
(the last argument specifying the element type can be omitted; for purely numeric data the result is Float64 by default)
m = CSV.read("./file.csv") returns a DataFrame.
If CSV.jl reads the file correctly so that all the columns of m are of type Float64 containing no missings, then you can convert m to a Float64 matrix with Matrix{Float64}(m), or obtain the matrix with one line:
m = Matrix{Float64}(CSV.read("./file.csv", header=0, delim=';'))
# or with piping syntax
m = CSV.read("./file.csv", header=0, delim=';') |> Matrix{Float64}
readdlm, though, should normally be enough and is the first solution to go for with simple CSV files like yours.
2022 Answer
Not sure if there has been a change to CSV.jl; however, if I do CSV.read("file.csv") it errors with:
provide a valid sink argument, like 'using DataFrames; CSV.read(source, DataFrame)'
You can, however, use the fact that it accepts any Tables.jl-compatible type:
using CSV, Tables
M = CSV.read("file.csv", Tables.matrix, header=0)

convert json text entries to a dataframe in r

I have a text file with a JSON-like structure that contains values for certain variables, as shown below.
[{"variable1":"111","variable2":"666","variable3":"11","variable4":"aaa","variable5":"0"}]
[{"variable1":"34","variable2":"12","variable3":"78","variable4":"qqq","variable5":"-9"}]
Every line is a new set of values for the same variables 1 through 5. There can be thousands of lines in a text file, but the variables always remain the same. I want to extract variables 1 through 5 along with their values and convert them into a data frame. Currently I perform these operations in Excel using string manipulation and transpose.
How to do this in R? Much appreciated. Thanks.
J
There is a package named jsonlite that you can use.
library("jsonlite")
df <- fromJSON("YourPathToTheFile")
You can find more info in the jsonlite documentation.
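Note that in the question each line of the file is a separate JSON array, so a single fromJSON call on the whole file may not parse it. A minimal sketch, assuming the file is called yourfile.txt, that parses the file line by line and stacks the results into one data frame:
library(jsonlite)
# each line is a separate JSON array holding one record,
# so parse the lines individually and bind the one-row data frames
lines <- readLines("yourfile.txt")
df <- do.call(rbind, lapply(lines, fromJSON))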

Load csv file with integers in Octave 3.2.4 under Windows

I am trying to import into Octave a file (e.g. data.txt) containing 2 columns of integers, such as:
101448,1077
96906,924
105704,1017
I use the following command:
data = load('data.txt')
However, the "data" matrix that results has a 1 x 1 dimension, with all the content of the data.txt file saved in just one cell. If I adjust the numbers to look like floats:
101448.0,1077.0
96906.0,924.0
105704.0,1017.0
the loading works as expected, and I obtain a matrix with 3 rows and 2 columns.
I looked at the various options that can be set for the load command but none of them seem to help. The data file has no headers, just plain integers, comma separated.
Any suggestions on how to load this type of data? How can I force Octave to cast the data as numeric?
The load function is not meant to read CSV files. It is meant to load files saved from Octave itself, which define variables.
To read a CSV file, use csvread ("data.txt"). Also, 3.2.4 is a very old version that is no longer supported; you should upgrade.
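A minimal sketch of that suggestion, using the comma-separated integer file data.txt from the question:
data = csvread ("data.txt")        % returns the 3x2 numeric matrix
data = dlmread ("data.txt", ",")   % same result, with the delimiter given explicitly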