read_csv file in pandas reads whole csv file in one column - csv

I want to read a CSV file with pandas. I used this call:
ace = pd.read_csv('C:\\Users\\C313586\\Desktop\\Daniil\\Daniil\\ACE.csv',sep = '\t')
As output I got this:
a) First row (should be the header)
_AdjustedNetWorthToTotalCapitalEmployed _Ebit _StTradeRec _StTradePay _OrdinaryCf _CfWorkingC _InvestingAc _OwnerAc _FinancingAc _ProdValueGrowth _NetFinancialDebtTotalAdjustedCapitalEmployed_BanksAndOtherInterestBearingLiabilitiesTotalEquityAndLiabilities _NFDEbitda _DepreciationAndAmortizationProductionValue _NumberOfDays _NumberOfDays360
#other rows separated by tab
0 5390\t0000000000000125\t0\t2013-12-31\t2013\tF...
1 5390\t0000000000000306\t0\t2015-12-31\t2015\tF...
2 5390\t00000000000003VG\t0\t2015-12-31\t2015\tF...
3 5390\t0000000000000405\t0\t2016-12-31\t2016\tF...
4 5390\t00000000000007VG\t0\t2013-12-31\t2013\tF...
5 5390\t0000000000000917\t0\t2015-12-31\t2015\tF...
6 5390\t00000000000009VG\t0\t2016-12-31\t2016\tF...
7 5390\t0000000000001052\t0\t2015-12-31\t2015\tF...
8 5390\t00000000000010SG\t0\t2015-12-31\t2015\tF...
Do you have any idea why this happens? How can I fix it?

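You should use the argument sep=r'\t' (note the extra r, which stands for raw). In a raw string the backslash is not processed as an escape, so r'\t' is the two characters \ and t rather than a single tab character. pandas interprets a separator longer than one character as a regular expression and parses the file with the Python engine, so the file is split on the pattern \t instead of on the literal one-character separator.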
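The difference between '\t' and r'\t' can be checked without pandas. The following is only a minimal sketch using the standard library; the sample row is made up for illustration and is not taken from ACE.csv. It shows that the raw string has two characters and that, used as a regular expression, it matches a tab character.

```python
import re

# '\t' is one character (a tab); r'\t' is two characters (backslash, 't').
tab = '\t'
raw = r'\t'
print(len(tab), len(raw))

# Hypothetical row, loosely modelled on the output in the question.
sample_row = "5390\t0000000000000125\t0\t2013-12-31"

# As a regular expression, the two-character pattern \t matches a tab.
fields = re.split(raw, sample_row)
print(fields)
```

Whether this explains the behaviour of a particular file depends on how that file is quoted, so inspecting a few raw lines of the CSV is a sensible first step.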

Related

How can I parse multiple entire lines of text into octave 'matrix'

I want to import a lot of data from multiple files from multiple sub files. Luckily the data is consistent in its output:
Subpro1/data apples 1
Subpro1/data oranges 1
Subpro1/data banana 1
then
Subpro2/data apples 1
Subpro2/data oranges 1
Subpro2/data banana 1
I want to have a datafilename array that holds the file names for each set of data I need to read. Then I can extract and store the data in a more local file, process it and eventually compare 'sub1_apples' to 'sub2_apples'.
I have tried
fid = fopen ("DataFileNames.txt");
DataFileNames = fgets (fid)
fclose (fid);
But this only gives me the first line of 7.
DataFileNames = dlmread('DataFileNames.txt') gives me a 7x3 array, but with only 0 0 1 in each row, because it treats the breaks in the names as delimiters, and I can't change the file names.
DataFileNames = textread("DataFileNames.txt", '%s')
has all the correct information but still the delimiters split it across multiple lines
data
apples
1
data
oranges
1
...
Is there a %? that I am missing, if so what is it?
I want the output to be:
data apples 1
data oranges 1
data banana 1
With spaces, underscores and everything included so that I can then use this to access the data file.
You can read all lines of the file to a cell array like this:
str = fileread("DataFileNames.txt");
DataFileNames = regexp(str, '\r\n|\r|\n', 'split');
Output:
DataFileNames =
{
[1,1] = data apples 1
[1,2] = data oranges 1
[1,3] = data banana 1
}
In the first option you tried, fgets reads just one line per call. It is also better to use fgetl, which strips the line ending. To read line by line (which takes more code) you need a loop:
DataFileNames = {};
fid = fopen ("DataFileNames.txt");
line = fgetl(fid);
while ischar(line)
  if ~isempty(line)
    DataFileNames = [DataFileNames line];
  endif
  line = fgetl(fid);
endwhile
fclose (fid);
The second option you tried, dlmread, is not suitable because it is intended for reading numeric data into a matrix.
The third option you tried, textread with '%s', does not work well here because it treats all whitespace (spaces, line ends, ...) equally, so every word becomes a separate entry.

Python, how to import datasets with vertically stacked column headers, @relation, @attribute, @data?

I'm trying to load a dataset from timeseriesclassification.com, but the datasets are formatted in a way that I've never seen before.
The .csv file looks as follows,
@relation Wine
@attribute att0 numeric
@attribute att1 numeric
@attribute target {1 2}
@data
0,1,1
0,0,0
1,0,0
This is how the data should be formatted.
att0,att1,target
0,1,1
0,0,0
1,0,0
This is my current strategy:
read the file with open('filename.csv')
count the number of rows until #data appears
remove all the headers, and import the data with pandas
add new column names
Does anyone know what type of formatting this dataset is in? Also could anyone point me to a resource where I can reference different dataset formats.
Use SciPy's scipy.io.arff.loadarff to read ARFF files.
In [94]: from scipy.io.arff import loadarff
In [95]: dataset = loadarff(open('filename.csv','r'))
In [96]: df = pd.DataFrame(dataset[0], columns=dataset[1].names())
In [97]: df
Out[97]:
att0 att1 target
0 0.0 1.0 1
1 0.0 0.0 0
2 1.0 0.0 0
That format is ARFF (Attribute-Relation File Format), usually stored with a .arff extension. You can read it with the scipy.io.arff Python module.
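If SciPy is not available, the manual strategy from the question (skip the header, collect the column names, then parse the data rows) can be sketched with the standard library. This is only an illustration under stated assumptions: read_arff_like is a hypothetical helper, not part of any library. ARFF headers normally use @relation, @attribute and @data; the sketch also accepts # as the marker, and it handles only simple, unquoted attribute names and comma-separated rows.

```python
import csv
import io


def read_arff_like(lines, markers="@#"):
    """Hypothetical helper: return (column_names, rows) from ARFF-style lines."""
    names = []
    data_lines = []
    in_data = False
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        if in_data:
            data_lines.append(line)
            continue
        if line[0] in markers:
            parts = line[1:].split()
            keyword = parts[0].lower() if parts else ""
            if keyword == "attribute" and len(parts) > 1:
                names.append(parts[1])  # first token after the keyword
            elif keyword == "data":
                in_data = True  # everything after this line is data
    # Parse the collected data lines as ordinary comma-separated values.
    return names, list(csv.reader(data_lines))


sample = io.StringIO(
    "@relation Wine\n"
    "@attribute att0 numeric\n"
    "@attribute att1 numeric\n"
    "@attribute target {1 2}\n"
    "@data\n"
    "0,1,1\n"
    "0,0,0\n"
    "1,0,0\n"
)
names, rows = read_arff_like(sample)
print(names)
print(rows)
```

The values come back as strings; they can be passed on as pd.DataFrame(rows, columns=names) and converted to numeric types afterwards.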

Writing a list of lists to file, removing unwanted characters and a new line for each

I have a list "newdetails" that is a list of lists and it needs to be written to a csv file. Each field needs to take up a cell (without the trailing characters and commas) and each sublist needs to go on to a new line.
The code I have so far is:
file = open(s + ".csv","w")
file.write(str(newdetails))
file.write("\n")
file.close()
This however, writes to the csv in the following, unacceptable format:
[['12345670' 'Iphone 9.0' '500' 2 '3' '5'] ['12121212' 'Samsung Laptop' '900' 4 '3' '5']]
The format I wish for it to be in is as shown below:
12345670 Iphone 9.0 500 5 3 5
12121212 Samsung Laptop 900 5 3 5
You can use the csv module to write the data to a CSV file.
See the documentation:
csv module in Python 2
csv module in Python 3
Code:
import csv

new_details = [['12345670', 'Iphone 9.0', '500', 2, '3', '5'],
               ['12121212', 'Samsung Laptop', '900', 4, '3', '5']]

# newline='' lets the csv module control the line endings (Python 3)
with open("result.csv", "w", newline='') as fh:
    writer = csv.writer(fh, delimiter=' ')
    for data in new_details:
        writer.writerow(data)
Content of result.csv:
12345670 "Iphone 9.0" 500 2 3 5
12121212 "Samsung Laptop" 900 4 3 5
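The quotes in the output above appear because the delimiter is a space and two of the fields contain spaces. A hedged variant, assuming a tab delimiter is acceptable to whatever program opens the file, avoids the quoting. It writes to an in-memory buffer here so the result can be inspected; for a real file, the buffer would be replaced by open("result.csv", "w", newline='').

```python
import csv
import io

new_details = [['12345670', 'Iphone 9.0', '500', 2, '3', '5'],
               ['12121212', 'Samsung Laptop', '900', 4, '3', '5']]

buffer = io.StringIO()  # stand-in for a file opened with newline=''
writer = csv.writer(buffer, delimiter='\t')
writer.writerows(new_details)  # one output row per sublist

text = buffer.getvalue()
print(text)
```

With the default comma delimiter (csv.writer(buffer) with no delimiter argument) the fields are likewise written unquoted, since none of them contains a comma.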

Parse txt file with shell

I have a txt file containing the output from several commands executed on a piece of networking equipment. I want to parse this txt file so I can sort the data and print it on an HTML page.
What is the best/easiest way to do this? Export every command to an array and then print the sorted array in the HTML code?
Each command is framed by ruler lines and followed by tabular data. Example:
*********************************************************************
# command 1
*********************************************************************
Object column1 column2 Total
-------------------------------------------------------------------
object 1 526 9484 10010
object 2 2 10008 10010
Object 3 0 20000 20000
*********************************************************************
# command 2
*********************************************************************
(... tabular data ...)
Can someone suggest any code, or a file where I can see how to make this work?
Thanks!
This can be easily done in Python with this example code:
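rulers = 0
table = []
with open('input.txt') as f:
    for line in f:
        if '****' in line:
            rulers += 1
            if rulers == 2:
                # second ruler closes the "# command" banner: start a new table
                table = []
            elif rulers > 2:
                # third ruler opens the next command's banner: emit the finished
                # table and count this ruler as the first of the new banner
                print(table)
                rulers = 1
            continue
        # skip blank lines, the dashed separator and the "# command" line
        if not line.strip() or '----' in line or line.startswith('#'):
            continue
        table.append(line.split())
print(table)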
It just prints a list of lists of the tabular values, but these can then be formatted as HTML or any other format you need.
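As a rough illustration of that last step, one parsed table could be turned into HTML as follows. table_to_html is a hypothetical helper written for this sketch, and it assumes the first row of the table is the header row.

```python
from html import escape


def table_to_html(table):
    """Hypothetical helper: render a list of rows as an HTML table.

    The first row is treated as the header; cell values are HTML-escaped.
    """
    if not table:
        return "<table></table>"
    header, *body = table
    lines = ["<table>"]
    lines.append("  <tr>" + "".join(f"<th>{escape(str(c))}</th>" for c in header) + "</tr>")
    for row in body:
        lines.append("  <tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Made-up rows in the shape produced by line.split()
example = [["Object", "column1", "column2", "Total"],
           ["object1", "526", "9484", "10010"]]
html_text = table_to_html(example)
print(html_text)
```

Note that line.split() splits on every space, so a name such as "object 1" ends up in two cells; the example uses single-token names for that reason. Sorting can be done on the body rows before rendering, for example with sorted(body, key=...).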
Import into your spreadsheet software. Export to HTML from there, and modify as needed.

Scientific data

I want to import data from a corrupted CSV file. It contains numbers in scientific notation, and it is a big data set with about 300000 rows and 27 columns. When I import it using
Import["data.csv","HeaderLines"->1]
each row comes back as a string. So I split it into a table with
StringSplit[ToString[data[[#]]], ";"] & /@
Range[Dimensions[
Import["data.csv"]][[1]]]
and I need to use the first column to analyse the data. But the problem is that this column holds scientific-notation numbers stored as strings! I want to convert them to numbers. I used this command:
ToExpression[Internal`StringToDouble[fdata[[All, 1]][[#]]]] & /@
Range[291407];
But it takes hours to run! Do you have any idea how I can do this faster?
You could try the following:
(* read the first 5 rows *)
d = ReadList["data.csv", Table[Number, {27}], 5]
(* read the rows 100 to 150 *)
s = OpenRead["data.csv"];
Skip[s, Record, 99]
d = ReadList[s, Table[Number, {27}], 51]
Close[s]
And d[[All,1]] will get you the first column.