Weka: file not recognized as CSV data files

My line 1 (the header) is:
column0,column1,column2,column3,column4,column5,column6,column7,column8,column9,column10,column11,column12,column13,column14,column15,column16,column17,column18,column19,column20
Line 2 is:
225,1,9d36efa8d56c724ceb5b8834873d5457,38.69.182.103,,,,,,3,62930,0,,,,,6f4b457b6044ccd205dcf5531582af54,Apache-HttpClient%2fUNAVAILABLE%20%28java%201.4%29,1646,,160807,1
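Weka's CSVLoader is stricter than many CSV producers, and runs of empty fields like those in line 2 are a common cause of the "not recognized" error. One workaround (an assumption here, since the full error message isn't shown) is to rewrite the file with Weka's '?' marker in place of empty cells, e.g. with Python's csv module:

import csv

# Hypothetical paths; adjust to the real input and output files.
with open("input.csv", newline="") as src, \
        open("weka_ready.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # Weka expects '?' for a missing value rather than an empty cell.
        writer.writerow([cell if cell != "" else "?" for cell in row])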

Make a list of lists out of the header of a csv file

I want to put the header of the CSV file into a nested list.
It should have an output like this:
[[name], [age], [""], [""]]
How can I do this without reading the line again? I am not allowed to, and I also am not allowed to use the csv module or pandas (all imports except os are forbidden).
Just map each item of the list to a list. Check below:
def value_to_list(tlist):
    # Wrap each element in its own single-item list, in place.
    l = len(tlist)
    for i in range(l):
        tlist[i] = [tlist[i]]
    return tlist

headers = []
with open(r"D:\my_projects\DemoProject\test.csv", "r") as file:
    # rstrip("\n") guards against a trailing newline after the header line.
    headers = value_to_list(file.readline().rstrip("\n").split(","))
The test.csv file contains "col1,col2,col3".
Output:
> python -u "run.py"
[['col1'], ['col2'], ['col3']]
>
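If mutating the list in place is not required, the readline inside the with block collapses to a single list comprehension (equivalent to the function above):

headers = [[h] for h in file.readline().rstrip("\n").split(",")]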

How can I read JSON files inside a tar.gz archive?

I have a huge file in tar.gz format. Inside this file there are 1000 JSON files. How can I read them? I did this:
import tarfile
file = tarfile.open('normalizer_output.tar.gz')
Then if I run file.list() I see the names of these JSON files, like the following:
?rw-rw-r-- ubuntu/ubuntu 6471545 2022-06-02 09:25:53 output/normalized-1054.json
?rw-rw-r-- ubuntu/ubuntu 6535150 2022-06-02 09:26:06 output/normalized-1055.json
?rw-rw-r-- ubuntu/ubuntu 6690476 2022-06-02 09:26:15 output/normalized-1056.json
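A minimal sketch of one way to read them, assuming the members are regular JSON files as in the listing above: iterate over the archive and pass each member to json.load via extractfile, which reads the member without unpacking the archive to disk.

import json
import tarfile

with tarfile.open('normalizer_output.tar.gz', 'r:gz') as tar:
    for member in tar.getmembers():
        # Skip directories and anything that is not a .json file.
        if member.isfile() and member.name.endswith('.json'):
            f = tar.extractfile(member)  # file-like object for this member
            if f is not None:
                data = json.load(f)
                print(member.name, type(data).__name__)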

How to read/write a CSV file row by row with a Lauterbach CMM script

I'm trying to read/write a CSV file row by row using a Lauterbach CMM script, but it's not working.
I thought I could read each row of the CSV file as one line by reading the CSV as a normal text file, but that doesn't hold when a cell contains LF characters: reading line by line then terminates in the middle of a row, so the script reads the row only partially.
Can you please let me know how to read a CSV file row by row?
My code to read the CSV row by row:
OPEN #1 &csv_name /Read
WHILE !FILE.EOF(1)
(
  READ #1 %LINE &csv_row
  PRINT "&csv_row"
)
The way you are using OPEN and READ is intended to read one line of a text file. These commands are not made especially for CSV files. And while a CSV file may have a line-feed character inside a field (when the field is encapsulated in double quotes), a line of a normal text file simply ends with a line feed.
So I think you have two options here: read the file byte by byte as a binary file, or load the whole file into the debugger's virtual memory (VM) and read it from there byte by byte.
Option 1: Read the file byte by byte.

OPEN #1 "&csv_name" /Read /Binary
WHILE !FILE.EOF(1)
(
  PRIVATE &line
  GOSUB getline "1"
  RETURNVALUES &line
  IF "&line"!=""
    ECHO "&line"
)
CLOSE #1
ENDDO

getline:
  ; Collect bytes into &line until a line feed that is outside double quotes.
  PARAMETERS &fh
  PRIVATE &c &last &line &inquotes
  &last=0
  &inquotes=FALSE()
  READ #&fh &c
  WHILE !FILE.EOF(&fh)
  (
    ; Toggle on every quote character; a doubled quote ("") inside a
    ; quoted cell toggles twice and therefore cancels itself out.
    IF &c=='"'
      &inquotes=!&inquotes
    ; A line feed outside quotes terminates the CSV row.
    IF (!&inquotes)&&(&c==CONvert.CHAR(0x0A))
    (
      ; Drop a preceding carriage return (CR LF line endings).
      IF &last==CONvert.CHAR(0x0D)
        &line=STRing.CUT("&line",-1)
      RETURN "&line"
    )
    &line="&line"+CONvert.CHAR(&c)
    &last=&c
    READ #&fh &c
  )
  RETURN "&line"
Option 2: Load the whole file into the debugger's virtual memory (VM) and read it from there byte by byte.

PRIVATE &line &size &pos
&pos=0
&size=OS.FILE.SIZE("&csv_name")
SILENT.Data.LOAD.Binary "&csv_name" VM:&pos
WHILE &pos<&size
(
  GOSUB getline "&pos" "&size"
  RETURNVALUES &line &pos
  IF "&line"!=""
    ECHO "&line"
)
ENDDO

getline:
  ; Same as Option 1, but the bytes come from the debugger's VM
  ; instead of a file handle, so no OPEN/CLOSE is needed here.
  PARAMETERS &pos &size
  PRIVATE &c &last &line &inquotes
  &last=0
  &inquotes=FALSE()
  WHILE &pos<&size
  (
    &c=Data.Byte(VM:&pos)
    &pos=&pos+1
    ; Toggle on every quote character; a doubled quote toggles twice and cancels out.
    IF &c=='"'
      &inquotes=!&inquotes
    ; A line feed outside quotes terminates the CSV row.
    IF (!&inquotes)&&(&c==CONvert.CHAR(0x0A))
    (
      ; Drop a preceding carriage return (CR LF line endings).
      IF &last==CONvert.CHAR(0x0D)
        &line=STRing.CUT("&line",-1)
      RETURN "&line" "&pos"
    )
    &line="&line"+CONvert.CHAR(&c)
    &last=&c
  )
  RETURN "&line" "&pos"

Writing a list of lists to a file, removing unwanted characters and putting each sublist on a new line

I have a list "newdetails" that is a list of lists, and it needs to be written to a CSV file. Each field needs to take up a cell (without the surrounding characters and commas) and each sublist needs to go on to a new line.
The code I have so far is:
file = open(s + ".csv","w")
file.write(str(newdetails))
file.write("\n")
file.close()
This, however, writes to the CSV in the following, unacceptable format:
[['12345670' 'Iphone 9.0' '500' 2 '3' '5'] ['12121212' 'Samsung Laptop' '900' 4 '3' '5']]
The format I wish for it to be in is as shown below:
12345670 Iphone 9.0 500 5 3 5
12121212 Samsung Laptop 900 5 3 5
You can use the csv module to write the information to a CSV file.
Please check the links below:
csv module in Python 2
csv module in Python 3
Code:
import csv

new_details = [['12345670', 'Iphone 9.0', '500', 2, '3', '5'],
               ['12121212', 'Samsung Laptop', '900', 4, '3', '5']]

# newline='' prevents the csv module from writing extra blank lines on Windows.
with open("result.csv", "w", newline='') as fh:
    # A space delimiter puts each field into its own space-separated column.
    writer = csv.writer(fh, delimiter=' ')
    for data in new_details:
        writer.writerow(data)
Content of result.csv:
12345670 "Iphone 9.0" 500 2 3 5
12121212 "Samsung Laptop" 900 4 3 5

How to import the following csv into mysql, using mysql commands?

How can I import the following CSV into a MySQL table, using MySQL commands?
##
# File name: proj.csv
# Line 1 contains the field headers.
# Record 1 starts at line 2 and ends at line 583.
# Lines 2 ("<!DOCTYPE") through 582 ("</html>") are actually the text blob of
# record 1's "html" field.
##
line 1: "proj_name","proj_id","url","html","proj_dir"
line 2: "Autorun Virus Remover",1,"http://www.softpedia.com/get/Antivirus/Autorun-Virus-Remover.shtml","<!DOCTYPE HTML PUBLIC ""-//W3C//DTD HTML 4.01 Transitional//EN"" ""http://www.w3.org/TR/html4/loose.dtd"">
line 3: <html>
line 4: <head profile=""http://a9.com/-/spec/opensearch/1.1/"">
...
line 582: </html>
line 583: ","Antivirus/Autorun-Virus-Remover"
The trouble is that the target CSV file has a text blob field in it (named "html", which contains text spanning multiple lines), so I can't use '\n' as the record separator, or MySQL will say "Row 1 doesn't contain data for all columns". A thousand thanks!
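Since every text field in the file is enclosed in double quotes (with embedded quotes doubled as ""), LOAD DATA INFILE can parse the multi-line "html" blob by itself: with ENCLOSED BY '"' set, a newline inside an enclosed field is treated as part of the field value rather than a record separator. A sketch, assuming a target table named proj whose columns match the header:

LOAD DATA INFILE '/path/to/proj.csv'
INTO TABLE proj
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY ''
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(proj_name, proj_id, url, html, proj_dir);

ESCAPED BY '' is set so that any backslashes inside the HTML blob are taken literally instead of being interpreted as escape sequences.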