Why doesn't Neo4j add a new line for the \n characters in data coming from a CSV?

I have some data coming from a CSV file that contains \n characters, and I expect Neo4j to add a new line when that string is assigned to a node property. Apparently it's not working: I can see the \n characters stored in the string as-is.
How can I make it work? Thanks in advance.
Here is one such string from the CSV:
Combo 4 4 4 5 \n\nSpare Fiber Inventory. \nMultimode Individual fibers from 9927/9928 to FDB.\nNo available spares from either BTS to FDB - New conduits would be required\n\nFrom FDB to tower top. 9 of 9 Spares available on 2.5 riser cables.
My load command:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS
FROM 'file:///abc.csv' AS line
WITH line WHERE line.parent <> "" AND line.type = 'LSD' AND line.parent_type = 'XYZ'

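This is a workaround I made to replace each occurrence of \n with an actual newline. Because \ is the escape character in Cypher string literals, '\\n' on line 4 matches the literal two-character sequence backslash-n, and the replacement string that opens on line 4 and closes on line 5 contains a real line break. Do not remove line 5 and join it onto line 4.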
LOAD CSV WITH HEADERS
FROM 'file:///abc.csv' AS line
WITH line WHERE line.parent <> ""
WITH replace(line.parent,'\\n',"
") as parent
MERGE (p:Parent {parent: parent})
RESULT:
{
"identity": 16,
"labels": [
"Parent"
],
"properties": {
"parent": "Combo 4 4 4 5
Spare Fiber Inventory.
Multimode Individual fibers from 9927/9928 to FDB.
No available spares from either BTS to FDB - New conduits would be required
From FDB to tower top. 9 of 9 Spares available on 2.5 riser cables."
}
}
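The root cause can be shown with a minimal Python illustration (not Neo4j code): the CSV field contains a backslash followed by n, two ordinary characters, rather than a single line-feed character, so nothing turns it into a line break unless it is replaced. The helper name unescape_newlines is hypothetical; it performs the same substitution as the replace() call in the query above.

```python
# The CSV text contains a backslash followed by "n" (two characters),
# which is different from a real line-feed character (one character).
raw_field = r"Combo 4 4 4 5 \n\nSpare Fiber Inventory."

def unescape_newlines(text: str) -> str:
    """Hypothetical helper: turn each literal backslash-n into a real newline."""
    return text.replace("\\n", "\n")

fixed = unescape_newlines(raw_field)

print(len("\\n"), len("\n"))  # 2 1
print("\n" in raw_field)      # False: no real newline before the replacement
print(fixed)
```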


How to read fixed-width data?

The data looks like this:
212253820000025000.00000002500.00000000375.00111120211105202117
212456960000000750.00000000075.00000000011.25111120211102202117
212387470000010000.00000001000.00000000150.00111120211105202117
I need to add separators like this:
21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17
21245696,0000000750.00,000000075.00,000000011.25,11112021,11022021,17
21238747,0000010000.00,000001000.00,000000150.00,11112021,11052021,17
The file is large, with nearly 20,000 rows. Is there any way to do this?
This question is generally about reading "fixed width data".
If you're stuck with this data, you'll need to parse it line by line then column by column. I'll show you how to do this with Python.
First off, the columns you counted off in the comment do not match your sample output. You seem to have omitted the last column, which is 2 characters wide.
You'll need accurate column widths to perform the task. I took your sample data and counted the columns for you and got these numbers:
8, 13, 12, 12, 8, 8, 2
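One way to double-check those counts is to measure each field of one line of the desired output. This small sketch assumes the sample rows from the question are representative of the whole file.

```python
# One raw line and the matching desired output line, copied from the question.
raw_line = "212253820000025000.00000002500.00000000375.00111120211105202117"
desired = "21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17"

# The width of each column is simply the length of each comma-separated field.
col_widths = [len(field) for field in desired.split(",")]
print(col_widths)  # [8, 13, 12, 12, 8, 8, 2]

# Sanity check: the widths must add up to the length of a raw line.
assert sum(col_widths) == len(raw_line), (sum(col_widths), len(raw_line))
```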
So, we'll read the input data line by line, and for every line we'll:
Read 8 chars and save them as a column, then 13 chars and save them as a column, then 12 chars, and so on, until we've read all the specified column widths
As we move through the line we'll keep track of our position with the variables beg and end to denote where a column begins (inclusive) and where it ends (exclusive)
The end of the first column becomes the beginning of the next, and so on down the line
We'll store those columns in a list (array) that is the new row
At the end of the line we'll save the new row to a list of all the rows
Then, we'll repeat the process for the next line
Here's how this looks in Python:
import pprint

Col_widths = [8, 13, 12, 12, 8, 8, 2]
all_rows = []

with open("data.txt") as in_file:
    for line in in_file:
        row = []
        beg = 0
        for width in Col_widths:
            end = beg + width
            col = line[beg:end]
            row.append(col)
            beg = end
        all_rows.append(row)

pprint.pprint(all_rows, width=100)
all_rows is just a list of lists of text:
[['21225382', '0000025000.00', '000002500.00', '000000375.00', '11112021', '11052021', '17'],
['21245696', '0000000750.00', '000000075.00', '000000011.25', '11112021', '11022021', '17'],
['21238747', '0000010000.00', '000001000.00', '000000150.00', '11112021', '11052021', '17']]
With this approach, if you miscounted the column widths or the number of columns, you can easily modify Col_widths to match your data.
From here we'll use Python's csv module to make sure the CSV file is written correctly:
import csv

with open("data.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    writer.writerows(all_rows)
And my data.csv file looks like this:
21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17
21245696,0000000750.00,000000075.00,000000011.25,11112021,11022021,17
21238747,0000010000.00,000001000.00,000000150.00,11112021,11052021,17
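With about 20,000 rows, holding everything in all_rows is fine, but the two steps can also be combined so that each line is written as soon as it is parsed. The sketch below is one way to do that (split_fixed_width and convert are hypothetical names). It also strips the line ending and raises an error when a line's length does not match the column widths, which catches miscounted widths early. The demo uses io.StringIO in place of data.txt and data.csv.

```python
import csv
import io
from itertools import accumulate

COL_WIDTHS = [8, 13, 12, 12, 8, 8, 2]

def split_fixed_width(line, widths):
    """Cut one line into fields; column boundaries are running sums of widths."""
    line = line.rstrip("\r\n")
    if len(line) != sum(widths):
        raise ValueError(f"expected {sum(widths)} chars, got {len(line)}: {line!r}")
    ends = list(accumulate(widths))  # exclusive end offsets: 8, 21, 33, ...
    starts = [0] + ends[:-1]         # each column begins where the previous ends
    return [line[beg:end] for beg, end in zip(starts, ends)]

def convert(in_file, out_file, widths=COL_WIDTHS):
    """Stream fixed-width lines from in_file to CSV rows in out_file; return row count."""
    writer = csv.writer(out_file)
    count = 0
    for line in in_file:
        if not line.strip():  # skip blank lines, e.g. a trailing empty line
            continue
        writer.writerow(split_fixed_width(line, widths))
        count += 1
    return count

# Demo with in-memory files; for real files use
# open("data.txt") and open("data.csv", "w", newline="") instead.
sample = io.StringIO(
    "212253820000025000.00000002500.00000000375.00111120211105202117\n"
    "212456960000000750.00000000075.00000000011.25111120211102202117\n"
)
out = io.StringIO()
rows_written = convert(sample, out)
print(out.getvalue())
```

The length check is a deliberate choice: slicing alone never fails, so a wrong width would otherwise shift every later column silently.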
If you have access to the command-line tool awk, you can fix your data like the following:
substr() gives a portion of the string $0, which is the entire line
you start at char 1 then specify the width of your first column, 8
for the next substr(), you again use $0, you start at 9 (1+8 from the last substr), and give it the second column's width, 13
and repeat for each column, starting at "the start of the last column plus the last column's width"
#!/bin/sh
# Col_widths = [8, 13, 12, 12, 8, 8, 2]
awk '{print substr($0,1,8) "," substr($0,9,13) "," substr($0,22,12) "," substr($0,34,12) "," substr($0,46,8) "," substr($0,54,8) "," substr($0,62,2)}' data.txt > data.csv
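Working out the awk start positions by hand is error-prone if the widths change. This small Python helper (hypothetical, written only for illustration) computes each start as 1 plus the sum of the preceding widths, the rule described above, and prints the same awk command.

```python
from itertools import accumulate

def awk_program(widths, sep=","):
    """Build an awk program that prints fixed-width columns joined by sep."""
    # awk's substr() is 1-based: each column starts at 1 + sum of earlier widths.
    starts = [1] + [1 + end for end in accumulate(widths)][:-1]
    parts = [f"substr($0,{start},{width})" for start, width in zip(starts, widths)]
    return "{print " + f' "{sep}" '.join(parts) + "}"

col_widths = [8, 13, 12, 12, 8, 8, 2]
program = awk_program(col_widths)
print(f"awk '{program}' data.txt > data.csv")
```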

How can I parse multiple entire lines of text into octave 'matrix'

I want to import a lot of data from multiple files in multiple subfolders. Luckily, the data is consistent in its output:
Subpro1/data apples 1
Subpro1/data oranges 1
Subpro1/data banana 1
then
Subpro2/data apples 1
Subpro2/data oranges 1
Subpro2/data banana 1
I want to have a data file name array that holds the file names for each set of data I need to read. Then I can extract and store the data in a more local file, process it, and eventually compare 'sub1_apples' to 'sub2_apples'.
I have tried
fid = fopen ("DataFileNames.txt");
DataFileNames = fgets (fid)
fclose (fid);
But this only gives me the first line of 7.
DataFileNames = dlmread('DataFileNames.txt') gives me a 7x3 array, but with only 0 0 1 in each row, because it treats the spaces in the names as delimiters, and I can't change the file names.
DataFileNames = textread("DataFileNames.txt", '%s')
has all the correct information, but the delimiters still split it across multiple lines:
data
apples
1
data
oranges
1
...
Is there a format specifier (%?) that I am missing? If so, what is it?
I want the output to be:
data apples 1
data oranges 1
data banana 1
With spaces, underscores and everything included so that I can then use this to access the data file.
You can read all lines of the file to a cell array like this:
str = fileread("DataFileNames.txt");
DataFileNames = regexp(str, '\r\n|\r|\n', 'split');
Output:
DataFileNames =
{
[1,1] = data apples 1
[1,2] = data oranges 1
[1,3] = data banana 1
}
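The order of the alternatives in '\r\n|\r|\n' matters: \r\n has to come first so that a Windows line ending is consumed as one separator rather than two. Python's re module uses the same leftmost-first alternation, so this short sketch (Python rather than Octave, used only to illustrate the pattern) shows the difference.

```python
import re

text = "data apples 1\r\ndata oranges 1\r\ndata banana 1"

good = re.split(r"\r\n|\r|\n", text)  # \r\n tried first: one split per line end
bad = re.split(r"\r|\n|\r\n", text)   # \r matches alone, leaving \n as a 2nd split

print(good)  # ['data apples 1', 'data oranges 1', 'data banana 1']
print(bad)   # ['data apples 1', '', 'data oranges 1', '', 'data banana 1']

# A trailing line ending produces one empty final element, worth filtering out.
lines = [s for s in re.split(r"\r\n|\r|\n", text + "\n") if s]
```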
In the first option you tried, using fgets, you are reading just one line. Also, it's better to use fgetl, which removes the line ending. To read line by line (which takes more code), you need to do:
DataFileNames = {};
fid = fopen ("DataFileNames.txt");
line = fgetl(fid);
while ischar(line)
  if ~isempty(line)
    DataFileNames = [DataFileNames line];
  endif
  line = fgetl(fid);
endwhile
fclose (fid);
The second option you tried, dlmread, is not suitable because it is intended for reading numeric data into a matrix.
The third option you tried, textread, is not ideal because it treats all whitespace (spaces, line endings, ...) the same way, so each word becomes a separate element.

read_csv file in pandas reads whole csv file in one column

I want to read a CSV file with pandas. I used this call:
ace = pd.read_csv('C:\\Users\\C313586\\Desktop\\Daniil\\Daniil\\ACE.csv',sep = '\t')
And as output I got this:
a) First row (should be the header):
_AdjustedNetWorthToTotalCapitalEmployed _Ebit _StTradeRec _StTradePay _OrdinaryCf _CfWorkingC _InvestingAc _OwnerAc _FinancingAc _ProdValueGrowth _NetFinancialDebtTotalAdjustedCapitalEmployed_BanksAndOtherInterestBearingLiabilitiesTotalEquityAndLiabilities _NFDEbitda _DepreciationAndAmortizationProductionValue _NumberOfDays _NumberOfDays360
b) Other rows, separated by tabs:
0 5390\t0000000000000125\t0\t2013-12-31\t2013\tF...
1 5390\t0000000000000306\t0\t2015-12-31\t2015\tF...
2 5390\t00000000000003VG\t0\t2015-12-31\t2015\tF...
3 5390\t0000000000000405\t0\t2016-12-31\t2016\tF...
4 5390\t00000000000007VG\t0\t2013-12-31\t2013\tF...
5 5390\t0000000000000917\t0\t2015-12-31\t2015\tF...
6 5390\t00000000000009VG\t0\t2016-12-31\t2016\tF...
7 5390\t0000000000001052\t0\t2015-12-31\t2015\tF...
8 5390\t00000000000010SG\t0\t2015-12-31\t2015\tF...
Do you have any ideas why it happens? How can I fix it?
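You should use the argument sep=r'\t' (note the extra r, which makes it a raw string). Be aware of how pandas interprets it: a separator longer than one character is treated as a regular expression and forces the Python parsing engine, and the regex \t matches a tab character. If the file actually contains the literal two-character sequence backslash-t instead of real tabs, the backslash must be escaped in the pattern: sep=r'\\t'.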
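Because pandas interprets a sep longer than one character as a regular expression, the difference between r'\t' and r'\\t' can be checked directly with Python's re module. The sample lines below are assumptions modeled on the rows shown in the question, once with real tab characters and once with the literal text backslash-t.

```python
import re

with_tabs = "5390\t0000000000000125\t0\t2013-12-31"      # real tab characters
with_literal = r"5390\t0000000000000125\t0\t2013-12-31"  # backslash + 't' text

# r"\t" is the regex for a TAB character.
print(re.split(r"\t", with_tabs))     # splits into 4 fields
print(re.split(r"\t", with_literal))  # no TAB present: 1 field

# r"\\t" is the regex for a literal backslash followed by 't'.
print(re.split(r"\\t", with_literal))  # splits into 4 fields

# The pandas equivalent would be (not run here):
# pd.read_csv(path, sep=r"\\t", engine="python")
```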

Writing a list of lists to file, removing unwanted characters and a new line for each

I have a list "newdetails" that is a list of lists and it needs to be written to a csv file. Each field needs to take up a cell (without the trailing characters and commas) and each sublist needs to go on to a new line.
The code I have so far is:
file = open(s + ".csv","w")
file.write(str(newdetails))
file.write("\n")
file.close()
However, this writes to the CSV in the following, unacceptable format:
[['12345670' 'Iphone 9.0' '500' 2 '3' '5'] ['12121212' 'Samsung Laptop' '900' 4 '3' '5']]
The format I want is shown below:
12345670 Iphone 9.0 500 5 3 5
12121212 Samsung Laptop 900 5 3 5
You can use the csv module to write the information to a CSV file.
Please check the links below:
csv module in Python 2
csv module in Python 3
Code:
import csv

new_details = [['12345670', 'Iphone 9.0', '500', 2, '3', '5'],
               ['12121212', 'Samsung Laptop', '900', 4, '3', '5']]

with open("result.csv", "w", newline='') as fh:
    writer = csv.writer(fh, delimiter=' ')
    for data in new_details:
        writer.writerow(data)
Content of result.csv:
12345670 "Iphone 9.0" 500 2 3 5
12121212 "Samsung Laptop" 900 4 3 5
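If the goal is for each field to land in its own spreadsheet cell, the csv module's default comma delimiter may fit better: fields that contain spaces, such as Iphone 9.0, are then written without quotes. A sketch, with io.StringIO standing in for result.csv:

```python
import csv
import io

new_details = [['12345670', 'Iphone 9.0', '500', 2, '3', '5'],
               ['12121212', 'Samsung Laptop', '900', 4, '3', '5']]

# For a real file: with open("result.csv", "w", newline="") as fh:
fh = io.StringIO()
writer = csv.writer(fh)        # default delimiter is ','
writer.writerows(new_details)  # one CSV line per sublist

print(fh.getvalue())
# 12345670,Iphone 9.0,500,2,3,5
# 12121212,Samsung Laptop,900,4,3,5
```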

how to store text containing escape sequences in ms access

When I try to store text containing C code in an MS Access table (programmatically), it replaces the escape sequences ('\n', '\t') with a question-mark-like symbol.
Example code to store:
#include<stdio.h>
int main()
{
printf("\n\n\t Hi there...");
return 0;
}
When I look at the MS Access table after inserting the code above, every newline and '\t' character has been replaced with a '?'-like symbol.
My questions: "Is there any other data type for an MS Access field that stores code as it is, without replacing escape sequences with some symbol?"
and
"Would a 'raw' data type in another DBMS, such as MySQL, do the job?"
This is how it appears in Access 2007:
It looks like the line breaks in your source text are not the Windows-standard CRLF (carriage return, line feed). Find out the character codes of those mystery characters.
Using the procedure below, I can feed it a text string, and it will list the code of each character. Here is an example from the Immediate window.
AsciiValues "a" & vbcrlf & "b"
position Asc AscW
1 97 97
2 13 13
3 10 10
4 98 98
If I want to examine the value stored in a table text field, I can use DLookup to fetch that value and feed it to the function.
AsciiValues DLookup("memo_field", "tblFoo", "id=1")
position Asc AscW
1 108 108
2 105 105
3 110 110
4 101 101
5 32 32
Once you determine the codes of the problem characters, you can execute an UPDATE statement to replace the problem character codes with suitable alternatives.
UPDATE YourTable
SET YourField = Replace(YourField, Chr(x), Chr(y));
And this is the procedure ...
Public Sub AsciiValues(ByVal pInput As String)
    Dim i As Long
    Dim lngSize As Long

    lngSize = Len(pInput)
    Debug.Print "position", "Asc", "AscW"
    For i = 1 To lngSize
        Debug.Print i, Asc(Mid(pInput, i, 1)), AscW(Mid(pInput, i, 1))
    Next
End Sub
I'd say you're probably missing part of the newline. A newline in Access consists of a carriage return (ASCII 13) AND a line feed (ASCII 10), abbreviated CRLF. You probably have only one or the other, not both.
Use HansUp's AsciiValues procedure to take a look.
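If the text is inserted from a program, one option is to normalize every line ending to CRLF before it is stored. The sketch below is in Python and only illustrative: normalize_newlines and char_codes are hypothetical helpers, the latter mirroring what AsciiValues prints, and the actual insert into Access (driver, parameterized query) is left out.

```python
import re

def char_codes(text):
    """List (position, code) pairs, 1-based like the AsciiValues procedure."""
    return [(i, ord(ch)) for i, ch in enumerate(text, start=1)]

def normalize_newlines(text):
    """Turn CRLF, lone CR and lone LF into CRLF (CR + LF)."""
    # \r\n is listed first so existing CRLF pairs are not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)

code = 'int main()\n{\n\treturn 0;\n}'
fixed = normalize_newlines(code)

print(char_codes("a\nb"))  # [(1, 97), (2, 10), (3, 98)]: LF only, no CR
print(repr(fixed))
```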