My data looks like this:
212253820000025000.00000002500.00000000375.00111120211105202117
212456960000000750.00000000075.00000000011.25111120211102202117
212387470000010000.00000001000.00000000150.00111120211105202117
I need to add separators, like this:
21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17
21245696,0000000750.00,000000075.00,000000011.25,11112021,11022021,17
21238747,0000010000.00,000001000.00,000000150.00,11112021,11052021,17
The CSV file is long (nearly 20,000 rows). Is there any way to do this?
This question is generally about reading "fixed width data".
If you're stuck with this data, you'll need to parse it line by line then column by column. I'll show you how to do this with Python.
First off, the columns you counted off in the comment do not match your sample output. You seem to have omitted the last column, which is 2 characters wide.
You'll need accurate column widths to perform the task. I took your sample data and counted the columns for you and got these numbers:
8, 13, 12, 12, 8, 8, 2
So, we'll read the input data line by line, and for every line we'll:
Read 8 chars and save them as a column, then 13 chars and save them as a column, then 12 chars, and so on, until we've read all the specified column widths
As we move through the line, we'll keep track of our position with the variables beg and end, which denote where a column begins (inclusive) and where it ends (exclusive)
The end of the first column becomes the beginning of the next, and so on down the line
We'll store those columns in a list (array) that is the new row
At the end of the line we'll save the new row to a list of all the rows
Then, we'll repeat the process for the next line
Here's how this looks in Python:
import pprint

Col_widths = [8, 13, 12, 12, 8, 8, 2]

all_rows = []
with open("data.txt") as in_file:
    for line in in_file:
        row = []
        beg = 0
        for width in Col_widths:
            end = beg + width    # end is exclusive
            col = line[beg:end]
            row.append(col)
            beg = end            # the next column starts where this one ended
        all_rows.append(row)

pprint.pprint(all_rows, width=100)
all_rows is just a list of lists of text:
[['21225382', '0000025000.00', '000002500.00', '000000375.00', '11112021', '11052021', '17'],
['21245696', '0000000750.00', '000000075.00', '000000011.25', '11112021', '11022021', '17'],
['21238747', '0000010000.00', '000001000.00', '000000150.00', '11112021', '11052021', '17']]
With this approach, if you miscounted a column width or the number of columns, you can easily modify Col_widths to match your data.
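If you would rather catch a miscount early than discover it in the output, a small helper can check each line's length against the total width before slicing. This is only a sketch: the function name split_fixed_width and the choice to raise ValueError are my own assumptions, not part of the code above.

```python
from itertools import accumulate

COL_WIDTHS = [8, 13, 12, 12, 8, 8, 2]


def split_fixed_width(line, widths):
    """Split one fixed-width line into a list of column strings.

    Hypothetical helper: raises ValueError when the line length does not
    match the sum of the widths, which usually means a miscounted column.
    """
    line = line.rstrip("\r\n")  # ignore the trailing line ending
    if len(line) != sum(widths):
        raise ValueError(f"expected {sum(widths)} chars, got {len(line)}")
    ends = list(accumulate(widths))  # exclusive end of each column
    begs = [0] + ends[:-1]           # inclusive start of each column
    return [line[b:e] for b, e in zip(begs, ends)]


sample = "212253820000025000.00000002500.00000000375.00111120211105202117\n"
row = split_fixed_width(sample, COL_WIDTHS)
print(row)
```

A line that is too short or too long then fails loudly instead of silently producing shifted columns.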
From here we'll use Python's CSV module to make sure the CSV file is written correctly:
import csv

with open("data.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    writer.writerows(all_rows)
and my data.csv file looks like:
21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17
21245696,0000000750.00,000000075.00,000000011.25,11112021,11022021,17
21238747,0000010000.00,000001000.00,000000150.00,11112021,11052021,17
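Since the real file has around 20,000 rows, you may prefer not to hold every row in memory. The two steps can be merged so that each row is written as soon as it is split. A minimal sketch, assuming the same widths; the function name fixed_width_to_csv is hypothetical, and io.StringIO stands in for the real data.txt and data.csv so that the example runs on its own.

```python
import csv
import io

COL_WIDTHS = [8, 13, 12, 12, 8, 8, 2]


def fixed_width_to_csv(in_file, out_file, widths):
    """Read fixed-width lines from in_file and write CSV rows to out_file."""
    writer = csv.writer(out_file)
    for line in in_file:
        line = line.rstrip("\r\n")
        if not line:  # skip blank lines, e.g. at the end of the file
            continue
        row, beg = [], 0
        for width in widths:
            row.append(line[beg:beg + width])
            beg += width
        writer.writerow(row)


# In-memory stand-ins for open("data.txt") and open("data.csv", "w", newline="")
src = io.StringIO(
    "212253820000025000.00000002500.00000000375.00111120211105202117\n"
    "212456960000000750.00000000075.00000000011.25111120211102202117\n"
)
dst = io.StringIO(newline="")
fixed_width_to_csv(src, dst, COL_WIDTHS)
print(dst.getvalue())
```

With real files, you would pass the two objects returned by open() instead of the StringIO stand-ins.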
If you have access to the command-line tool awk, you can fix your data with the script below, which works as follows:
substr() gives a portion of the string $0, which is the entire line
you start at char 1 then specify the width of your first column, 8
for the next substr(), you again use $0, you start at 9 (1+8 from the last substr), and give it the second column's width, 13
and repeat for each column, starting at "the start of the last column plus the last column's width"
#!/bin/sh
# Col_widths = [8, 13, 12, 12, 8, 8, 2]
awk '{print substr($0,1,8) "," substr($0,9,13) "," substr($0,22,12) "," substr($0,34,12) "," substr($0,46,8) "," substr($0,54,8) "," substr($0,62,2)}' data.txt > data.csv
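Working out the start positions by hand is easy to get wrong. As a cross-check, the 1-based starts that substr() expects can be derived from the widths with a running sum. A small Python sketch (the variable names are my own); it also prints an awk program equivalent to the one above.

```python
from itertools import accumulate

col_widths = [8, 13, 12, 12, 8, 8, 2]

# awk's substr() is 1-based: each column starts at 1 plus the total
# width of all the columns before it.
starts = [1 + offset for offset in accumulate([0] + col_widths[:-1])]

# Build one substr() call per column and join them with literal commas.
parts = [f"substr($0,{s},{w})" for s, w in zip(starts, col_widths)]
awk_program = "{print " + ' "," '.join(parts) + "}"

print(starts)
print(awk_program)
```

If your widths change, only col_widths needs editing and the positions follow from it.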
I want to import a lot of data from multiple files in multiple subfolders. Luckily, the data is consistent in its output:
Subpro1/data apples 1
Subpro1/data oranges 1
Subpro1/data banana 1
then
Subpro2/data apples 1
Subpro2/data oranges 1
Subpro2/data banana 1
I want to have a DataFileNames array that holds the file names for each set of data I need to read. Then I can extract and store the data in a more local file, process it, and eventually compare 'sub1_apples' to 'sub2_apples'.
I have tried
fid = fopen ("DataFileNames.txt");
DataFileNames = fgets (fid)
fclose (fid);
But this only gives me the first line of 7.
DataFileNames = dlmread('DataFileNames.txt') gives me a 7x3 array, but with only 0 0 1 in each row, because it treats the breaks in the names as delimiters, and I can't change the file names.
DataFileNames = textread("DataFileNames.txt", '%s')
has all the correct information but still the delimiters split it across multiple lines
data
apples
1
data
oranges
1
...
Is there a % format specifier that I am missing? If so, what is it?
I want the output to be:
data apples 1
data oranges 1
data banana 1
With spaces, underscores and everything included so that I can then use this to access the data file.
You can read all lines of the file to a cell array like this:
str = fileread("DataFileNames.txt");
DataFileNames = regexp(str, '\r\n|\r|\n', 'split');
Output:
DataFileNames =
{
[1,1] = data apples 1
[1,2] = data oranges 1
[1,3] = data banana 1
}
In the first option you tried, fgets reads just one line. It's also better to use fgetl, which removes the line ending. To read the file line by line (which takes more code) you need to do:
DataFileNames = {};
fid = fopen ("DataFileNames.txt");
line = fgetl(fid);
% fgetl returns -1 (not a char array) at end of file
while ischar(line)
  if ~isempty(line)
    % append the non-empty line to the cell array
    DataFileNames = [DataFileNames line];
  endif
  line = fgetl(fid);
endwhile
fclose (fid);
The second option you tried, dlmread, is not suitable because it is intended for reading numeric data into a matrix.
The third option you tried, textread, is not suitable either, because it treats all whitespace (spaces, line ends, ...) equally.
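The difference between splitting on any whitespace and splitting on line ends only is easy to see on a small string. The sketch below uses Python rather than Octave purely for illustration, and the sample text is made up from the names in the question:

```python
text = "data apples 1\ndata oranges 1\r\ndata banana 1\n"

# Splitting on any whitespace, which is roughly what textread with '%s' does
tokens = text.split()

# Splitting on line ends only (and dropping empty lines)
lines = [ln for ln in text.splitlines() if ln]

print(tokens)
print(lines)
```

The first list has nine separate tokens, while the second keeps each file name whole.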
I want to read a CSV file in pandas. I used this call:
ace = pd.read_csv('C:\\Users\\C313586\\Desktop\\Daniil\\Daniil\\ACE.csv',sep = '\t')
And as output I got this:
a) First row (should be the header)
_AdjustedNetWorthToTotalCapitalEmployed _Ebit _StTradeRec _StTradePay _OrdinaryCf _CfWorkingC _InvestingAc _OwnerAc _FinancingAc _ProdValueGrowth _NetFinancialDebtTotalAdjustedCapitalEmployed_BanksAndOtherInterestBearingLiabilitiesTotalEquityAndLiabilities _NFDEbitda _DepreciationAndAmortizationProductionValue _NumberOfDays _NumberOfDays360
#other rows separated by tab
0 5390\t0000000000000125\t0\t2013-12-31\t2013\tF...
1 5390\t0000000000000306\t0\t2015-12-31\t2015\tF...
2 5390\t00000000000003VG\t0\t2015-12-31\t2015\tF...
3 5390\t0000000000000405\t0\t2016-12-31\t2016\tF...
4 5390\t00000000000007VG\t0\t2013-12-31\t2013\tF...
5 5390\t0000000000000917\t0\t2015-12-31\t2015\tF...
6 5390\t00000000000009VG\t0\t2016-12-31\t2016\tF...
7 5390\t0000000000001052\t0\t2015-12-31\t2015\tF...
8 5390\t00000000000010SG\t0\t2015-12-31\t2015\tF...
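Do you have any idea why this happens? How can I fix it?
Try the argument sep=r'\t' (note the extra r). The r makes this a raw string, so Python passes the two characters \ and t through unchanged instead of turning them into a single tab character. pandas treats any separator longer than one character as a regular expression, and in a regular expression \t matches a tab character. If your file really contains the two literal characters \ and t between fields, the backslash has to be escaped as well: sep=r'\\t'.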
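pandas interprets a separator longer than one character as a regular expression, so it can help to check what each spelling of the pattern matches. The sketch below uses Python's re module directly; it only illustrates the regular-expression behaviour, it is not pandas' internal code, and the two sample strings are made up:

```python
import re

real_tabs = "5390\t0000000000000125\t0"     # fields separated by tab characters
literal_bt = "5390\\t0000000000000125\\t0"  # fields separated by backslash + 't'

# The regex \t matches a tab character
split_tabs = re.split(r"\t", real_tabs)
unsplit = re.split(r"\t", literal_bt)       # no tab present, so nothing is split

# The regex \\t matches a literal backslash followed by 't'
split_literal = re.split(r"\\t", literal_bt)

print(split_tabs)
print(unsplit)
print(split_literal)
```

Which pattern you need depends on whether the file contains real tab characters or the two-character text \t, which you can check by opening it in a text editor.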
I am trying to create an AppleScript that will build a string out of a CSV file. It is very easy to create a string of the entire file, but I'd like to be able to scan the file for a certain organization. Below are my input and desired output.
My CSV:
Org1 Bobby Bob bobbybob#gmail.com
Org1 Wendy Wen wendywen#gmail.com
Org1 Rachel Rach rachelrach#gmail.com
Org2 Timmy Tim ttim#otheremail.com
Org2 Ronny Ron rron#otheremail.com
Org2 Mike Mik mmik#otheremail.com
My AppleScript:
set csv to read csv_input_file as «class utf8»
set text item delimiters to ","
set csvParagraphs to paragraphs of csv
repeat with current_line in paragraphs of csv
    set organization to text item 1 of current_line
    --THIS IS WHERE I NEED HELP
    set others_in_organization to ...
end repeat
Desired Output:
I would like to create a new string that contains a comma-separated list of all of the other people in the current organization. For example, the fourth entry's organization is "Org2", so the string would be "Timmy Tim, Ronny Ron, Mike Mik".
Any help would be great! Thank you!
Try this...
set myOrg to "Org1"
set otherPeople to ""
set csv to "Org1,Bobby Bob,bobbybob#gmail.com
Org1,Wendy Wen,wendywen#gmail.com
Org1,Rachel Rach,rachelrach#gmail.com
Org2,Timmy Tim,ttim#otheremail.com
Org2,Ronny Ron,rron#otheremail.com
Org2,Mike Mik,mmik#otheremail.com"
set text item delimiters to ","
repeat with aLine in paragraphs of csv
    set textItems to text items of aLine
    -- keep only the people whose organization matches myOrg
    if item 1 of textItems is myOrg then
        set otherPeople to otherPeople & item 2 of textItems & ", "
    end if
end repeat
set text item delimiters to ""
-- strip the trailing ", "
if otherPeople ends with ", " then
    set otherPeople to text 1 thru -3 of otherPeople
end if
return otherPeople
How can I import the following CSV into a MySQL table, using MySQL commands?
##
#File name : proj.csv
#line 1 are the field headers
#record 1 starts at line 2, ends at line 583
#from line 2 "<!DOCTYPE" to line 582 "</html>" are actually text blob of
#record 1 's "html" field
##
line 1: "proj_name","proj_id","url","html","proj_dir"
line 2: "Autorun Virus Remover",1,"http://www.softpedia.com/get/Antivirus/Autorun-Virus-Remover.shtml","<!DOCTYPE HTML PUBLIC ""-//W3C//DTD HTML 4.01 Transitional//EN"" ""http://www.w3.org/TR/html4/loose.dtd"">
line 3: <html>
line 4: <head profile=""http://a9.com/-/spec/opensearch/1.1/"">
...
line 582: </html>
line 583: ","Antivirus/Autorun-Virus-Remover"
The trouble is that the target CSV file has a text blob field (named "html", which contains text spanning multiple lines), so I can't use '\n' as the record separator, or it will say "Row 1 doesn't contain data for all columns". A thousand thanks!