My data looks like this:
212253820000025000.00000002500.00000000375.00111120211105202117
212456960000000750.00000000075.00000000011.25111120211102202117
212387470000010000.00000001000.00000000150.00111120211105202117
I need to add separators so it looks like this:
21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17
21245696,0000000750.00,000000075.00,000000011.25,11112021,11022021,17
21238747,0000010000.00,000001000.00,000000150.00,11112021,11052021,17
The CSV file is long, nearly 20,000 rows. Is there any way to do this?
This question is generally about reading "fixed width data".
If you're stuck with this data, you'll need to parse it line by line then column by column. I'll show you how to do this with Python.
First off, the column widths you counted off in the comment do not match your sample output. You seem to have omitted the last column, which is 2 characters wide.
You'll need accurate column widths to perform the task. I took your sample data and counted the columns for you and got these numbers:
8, 13, 12, 12, 8, 8, 2
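As a quick check before writing any parsing code, those widths should sum to the length of one raw line (a small sketch using the first sample line):

```python
# Sanity check: the column widths must cover one whole line.
col_widths = [8, 13, 12, 12, 8, 8, 2]
sample = "212253820000025000.00000002500.00000000375.00111120211105202117"
print(sum(col_widths), len(sample))  # both print 63
```

If the two numbers disagree, one of the widths (or the column count) is off.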
So, we'll read the input data line by line, and for every line we'll:
- Read 8 chars and save them as a column, then 13 chars, then 12 chars, and so on, until we've read all the specified column widths
- As we move through the line, we'll keep track of our position with the variables beg and end, denoting where a column begins (inclusive) and where it ends (exclusive)
- The end of the first column becomes the beginning of the next, and so on down the line
- We'll store those columns in a list (array) that becomes the new row
- At the end of the line, we'll append the new row to a list of all the rows
- Then we'll repeat the process for the next line
Here's how this looks in Python:
import pprint

Col_widths = [8, 13, 12, 12, 8, 8, 2]

all_rows = []
with open("data.txt") as in_file:
    for line in in_file:
        row = []
        beg = 0
        for width in Col_widths:
            end = beg + width
            col = line[beg:end]
            row.append(col)
            beg = end
        all_rows.append(row)

pprint.pprint(all_rows, width=100)
all_rows is just a list of lists of text:
[['21225382', '0000025000.00', '000002500.00', '000000375.00', '11112021', '11052021', '17'],
['21245696', '0000000750.00', '000000075.00', '000000011.25', '11112021', '11022021', '17'],
['21238747', '0000010000.00', '000001000.00', '000000150.00', '11112021', '11052021', '17']]
With this approach, if you miscounted a column width or the number of columns, you can easily modify Col_widths to match your data.
From here we'll use Python's CSV module to make sure the CSV file is written correctly:
import csv

with open("data.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    writer.writerows(all_rows)
and my data.csv file looks like:
21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17
21245696,0000000750.00,000000075.00,000000011.25,11112021,11022021,17
21238747,0000010000.00,000001000.00,000000150.00,11112021,11052021,17
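As an aside, if you happen to have pandas available (an assumption, not something the question requires), its read_fwf function reads fixed-width files directly. A minimal sketch, using the sample lines in a buffer instead of data.txt:

```python
import io

import pandas as pd

# Sketch assuming pandas is installed; sample lines stand in for data.txt.
# dtype=str preserves the leading zeros in each field.
data = ("212253820000025000.00000002500.00000000375.00111120211105202117\n"
        "212456960000000750.00000000075.00000000011.25111120211102202117\n")
df = pd.read_fwf(io.StringIO(data), widths=[8, 13, 12, 12, 8, 8, 2],
                 header=None, dtype=str)
print(df.to_csv(index=False, header=False))
```

On the real file you would pass "data.txt" in place of the StringIO buffer and write out with df.to_csv("data.csv", index=False, header=False).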
If you have access to the command-line tool awk, you can fix your data like the following:
- substr() gives a portion of the string $0, which is the entire line
- you start at char 1 and take the width of your first column, 8
- for the next substr(), you again use $0, start at 9 (1 + 8 from the last substr), and take the second column's width, 13
- repeat for each column, starting each one at "the start of the previous column plus the previous column's width"
#!/bin/sh
# Col_widths = [8, 13, 12, 12, 8, 8, 2]
awk '{print substr($0,1,8) "," substr($0,9,13) "," substr($0,22,12) "," substr($0,34,12) "," substr($0,46,8) "," substr($0,54,8) "," substr($0,62,2)}' data.txt > data.csv
I want to import a lot of data from multiple files in multiple subfolders. Luckily the data is consistent in its output:
Subpro1/data apples 1
Subpro1/data oranges 1
Subpro1/data banana 1
then
Subpro2/data apples 1
Subpro2/data oranges 1
Subpro2/data banana 1
I want a DataFileNames array that holds the file names for each set of data I need to read. Then I can extract and store the data in a more local file, process it, and eventually compare 'sub1_apples' to 'sub2_apples'.
I have tried
fid = fopen ("DataFileNames.txt");
DataFileNames = fgets (fid)
fclose (fid);
But this only gives me the first of the 7 lines.
DataFileNames = dlmread('DataFileNames.txt') gives me a 7x3 array, but each row is just 0 0 1, because it treats the breaks in the names as delimiters, and I can't change the file names.
DataFileNames = textread("DataFileNames.txt", '%s')
has all the correct information, but the delimiters still split it across multiple lines:
data
apples
1
data
oranges
1
...
Is there a %? specifier that I am missing, and if so, what is it?
I want the output to be:
data apples 1
data oranges 1
data banana 1
With spaces, underscores and everything included so that I can then use this to access the data file.
You can read all lines of the file to a cell array like this:
str = fileread("DataFileNames.txt");
DataFileNames = regexp(str, '\r\n|\r|\n', 'split');
Output:
DataFileNames =
{
[1,1] = data apples 1
[1,2] = data oranges 1
[1,3] = data banana 1
}
In the first option you tried, using fgets, you are reading just one line. Also, it's better to use fgetl, which strips the line ending. To read line by line (which is longer) you need to do:
DataFileNames = {};
fid = fopen("DataFileNames.txt");
line = fgetl(fid);
while ischar(line)
  if ~isempty(line)
    DataFileNames = [DataFileNames line];
  endif
  line = fgetl(fid);
endwhile
fclose(fid);
The second option you tried, dlmread, is not a good fit because it is intended for reading numeric data into a matrix.
The third option you tried, textread, is not so good here because it treats all whitespace (spaces, line ends, ...) equally, so the spaces inside the names split each line apart.
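If you do want to stay with a scanning function, one workaround is to make the newline the only delimiter. A sketch using textscan (which Octave also provides; this is an alternative to the regexp approach above, not something from the question):

```matlab
% Sketch: with '\n' as the only delimiter, each whole line
% (spaces included) becomes one cell.
fid = fopen("DataFileNames.txt");
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
DataFileNames = C{1};
```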
I want to read a CSV file in pandas. I have used:
ace = pd.read_csv('C:\\Users\\C313586\\Desktop\\Daniil\\Daniil\\ACE.csv',sep = '\t')
And as output I got this:
a) First row (should be the header):
_AdjustedNetWorthToTotalCapitalEmployed _Ebit _StTradeRec _StTradePay _OrdinaryCf _CfWorkingC _InvestingAc _OwnerAc _FinancingAc _ProdValueGrowth _NetFinancialDebtTotalAdjustedCapitalEmployed_BanksAndOtherInterestBearingLiabilitiesTotalEquityAndLiabilities _NFDEbitda _DepreciationAndAmortizationProductionValue _NumberOfDays _NumberOfDays360
b) Other rows, separated by tabs:
0 5390\t0000000000000125\t0\t2013-12-31\t2013\tF...
1 5390\t0000000000000306\t0\t2015-12-31\t2015\tF...
2 5390\t00000000000003VG\t0\t2015-12-31\t2015\tF...
3 5390\t0000000000000405\t0\t2016-12-31\t2016\tF...
4 5390\t00000000000007VG\t0\t2013-12-31\t2013\tF...
5 5390\t0000000000000917\t0\t2015-12-31\t2015\tF...
6 5390\t00000000000009VG\t0\t2016-12-31\t2016\tF...
7 5390\t0000000000001052\t0\t2015-12-31\t2015\tF...
8 5390\t00000000000010SG\t0\t2015-12-31\t2015\tF...
Do you have any ideas why it happens? How can I fix it?
You should use the argument sep=r'\t' (note the extra r). This will make pandas search for the pattern \t (the r stands for raw string).
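A small sketch of the effect, using an in-memory buffer and made-up column names in place of the real ACE.csv:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for ACE.csv.
raw = "colA\tcolB\n5390\t125\n5390\t306\n"
ace = pd.read_csv(io.StringIO(raw), sep=r'\t')
print(list(ace.columns))  # ['colA', 'colB'] -- two columns, not one
```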
I have a list "newdetails" that is a list of lists and it needs to be written to a csv file. Each field needs to take up a cell (without the trailing characters and commas) and each sublist needs to go on to a new line.
The code I have so far is:
file = open(s + ".csv","w")
file.write(str(newdetails))
file.write("\n")
file.close()
This however, writes to the csv in the following, unacceptable format:
[['12345670' 'Iphone 9.0' '500' 2 '3' '5'] ['12121212' 'Samsung Laptop' '900' 4 '3' '5']]
The format I wish for it to be in is as shown below:
12345670 Iphone 9.0 500 5 3 5
12121212 Samsung Laptop 900 5 3 5
You can use the csv module to write the information to a CSV file.
Please check the links below:
csv module in Python 2
csv module in Python 3
Code:
import csv

new_details = [['12345670', 'Iphone 9.0', '500', 2, '3', '5'],
               ['12121212', 'Samsung Laptop', '900', 4, '3', '5']]

with open("result.csv", "w", newline='') as fh:
    writer = csv.writer(fh, delimiter=' ')
    for data in new_details:
        writer.writerow(data)
Content of result.csv:
12345670 "Iphone 9.0" 500 2 3 5
12121212 "Samsung Laptop" 900 4 3 5
When I try to store text containing 'C' code in an MS Access table (programmatically), it replaces escape sequences ('\n', '\t') with some question-mark symbol.
Example :
code to store :
#include <stdio.h>

int main()
{
    printf("\n\n\t Hi there...");
    return 0;
}
When I look at the MS Access table for the above inserted code, it shows every newline and '\t' character replaced with a '?'-like symbol.
My question: "Is there any other data type for an MS Access field which stores code as it is, without replacing escape sequences with some symbol?"
and
"Will a 'raw' data type, as present in other DBMSs like MySQL, do the job?"
This is how it shows in Access 2007:
It looks like the line breaks in your source text are not the Windows-standard CRLF (carriage return, line feed). Find out the character codes of those mystery characters.
Using the procedure below, I can feed it a text string, and it will list the code of each character. Here is an example from the Immediate window.
AsciiValues "a" & vbCrLf & "b"
position Asc AscW
1 97 97
2 13 13
3 10 10
4 98 98
If I want to examine the value stored in a table text field, I can use DLookup to fetch that value and feed it to the function.
AsciiValues DLookup("memo_field", "tblFoo", "id=1")
position Asc AscW
1 108 108
2 105 105
3 110 110
4 101 101
5 32 32
Once you determine the codes of the problem characters, you can execute an UPDATE statement to replace the problem character codes with suitable alternatives.
UPDATE YourTable
SET YourField = Replace(YourField, Chr(x), Chr(y));
And this is the procedure ...
Public Sub AsciiValues(ByVal pInput As String)
    Dim i As Long
    Dim lngSize As Long
    lngSize = Len(pInput)
    Debug.Print "position", "Asc", "AscW"
    For i = 1 To lngSize
        Debug.Print i, Asc(Mid(pInput, i, 1)), AscW(Mid(pInput, i, 1))
    Next
End Sub
I'd say it's probably that you're lacking the whole newline pair. A newline in Access consists of a Carriage Return (ASCII 13) AND a Line Feed (ASCII 10). This is abbreviated as CRLF. You probably have only one or the other, but not both.
Use HansUp's AsciiValues procedure to take a look.
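For example, if the diagnosis shows bare line feeds (code 10) with no carriage returns at all, a sketch of the repair could look like this (YourTable and YourField are placeholders; the WHERE clause leaves rows that already contain CRs untouched):

```sql
UPDATE YourTable
SET YourField = Replace(YourField, Chr(10), Chr(13) & Chr(10))
WHERE InStr(YourField, Chr(13)) = 0;
```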