How can I import the following CSV into a MySQL table using mysql commands?
##
#File name : proj.csv
#line 1 are the field headers
#record 1 starts at line 2, ends at line 583
#from line 2 "<!DOCTYPE" to line 582 "</html>" are actually text blob of
#record 1 's "html" field
##
line 1: "proj_name","proj_id","url","html","proj_dir"
line 2: "Autorun Virus Remover",1,"http://www.softpedia.com/get/Antivirus/Autorun-Virus-Remover.shtml","<!DOCTYPE HTML PUBLIC ""-//W3C//DTD HTML 4.01 Transitional//EN"" ""http://www.w3.org/TR/html4/loose.dtd"">
line 3: <html>
line 4: <head profile=""http://a9.com/-/spec/opensearch/1.1/"">
...
line 582: </html>
line 583: ","Antivirus/Autorun-Virus-Remover"
The trouble is that the target CSV file has a text blob field (named "html", which contains multi-line text), so I can't use '\n' as the record separator; if I do, MySQL reports "Row 1 doesn't contain data for all columns". A thousand thanks!
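The error comes from splitting on every newline regardless of quoting: in this file a newline inside a double-quoted field is data, not a record boundary, and a doubled quote ("") stands for one literal quote. A quote-aware CSV parser handles both. As a minimal sketch in Python (not the mysql command asked for), the standard csv module reads such a record as a single row. The sample string is a shortened, hypothetical stand-in for proj.csv.

```python
import csv
import io

# Shortened, hypothetical stand-in for proj.csv: the "html" field spans
# several lines and uses doubled quotes ("") for literal quotes.
sample = (
    '"proj_name","proj_id","url","html","proj_dir"\n'
    '"Autorun Virus Remover",1,'
    '"http://www.softpedia.com/get/Antivirus/Autorun-Virus-Remover.shtml",'
    '"<!DOCTYPE HTML PUBLIC ""-//W3C//DTD HTML 4.01 Transitional//EN"">\n'
    '<html>\n'
    '</html>\n'
    '","Antivirus/Autorun-Virus-Remover"\n'
)

# newline="" passes embedded newlines to the csv module unchanged.
reader = csv.reader(io.StringIO(sample, newline=""))
header = next(reader)
rows = list(reader)

print(header)
print(len(rows), "record(s)")
print(repr(rows[0][3]))  # the multi-line html field, kept as one value
```

To read the real file, replace the StringIO object with open('proj.csv', newline=''). The parsed rows could then be passed to parameterized INSERT statements through whichever MySQL client library is in use.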
Related
I am trying to read data of the following format with textscan:
date,location,new_cases,new_deaths,total_cases,total_deaths
2019-12-31,Afghanistan,0,0,0,0
2020-01-01,Afghanistan,0,0,0,0
2020-01-02,Afghanistan,0,0,0,0
2020-01-03,Afghanistan,0,0,0,0
2020-01-04,Afghanistan,0,0,0,0
...
(Full data file available here: https://covid.ourworldindata.org/data/ecdc/full_data.csv)
My code is:
# Whitespace replaced with _
file_name = "full_data.csv";
fid = fopen(file_name, "rt");
data= textscan(fid, "%s%s%d%d%d%d", "Delimiter", ",", "HeaderLines", 1, ...
"ReturnOnError", 0);
fclose(fid);
Text scan terminates with an error:
error: textscan: Read error in field 3 of row 421
Row 421 is the center row in the example below:
2020-01-12,Australia,0,0,0,0
2020-01-13,Australia,0,0,0,0
2020-01-14,Australia,0,0,0,0
2020-01-15,Australia,0,0,0,0
2020-01-16,Australia,0,0,0,0
2020-01-17,Australia,0,0,0,0
2020-01-18,Australia,0,0,0,0
I've checked the row it complains about, and it is no different from the rows in the example above. I've also replaced all spaces in the file with underscores. Am I doing something wrong with textscan?
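One way to narrow this down outside Octave is to check which rows fail the integer conversion that %d expects. The sketch below is Python and uses a small hypothetical sample in place of full_data.csv. The empty field in it is invented for illustration and is not known to be the actual cause. It reports the first non-integer numeric field of each offending row.

```python
import csv
import io

# Hypothetical sample; the empty new_cases value on the third data row
# is invented to show what the check reports.
sample = """date,location,new_cases,new_deaths,total_cases,total_deaths
2020-01-14,Australia,0,0,0,0
2020-01-15,Australia,0,0,0,0
2020-01-16,Australia,,0,0,0
2020-01-17,Australia,0,0,0,0
"""

def find_bad_rows(lines):
    """Return (row_number, field_number, row) for rows with a non-integer field.

    Row numbers count data rows from 1 (the header is skipped) and field
    numbers count from 1, in the style of the textscan error message.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header line
    bad = []
    for row_no, row in enumerate(reader, start=1):
        for field_no, value in enumerate(row, start=1):
            if field_no <= 2:
                continue  # date and location are text fields
            try:
                int(value)
            except ValueError:
                bad.append((row_no, field_no, row))
                break  # one report per row is enough
    return bad

bad_rows = find_bad_rows(io.StringIO(sample))
for row_no, field_no, row in bad_rows:
    print(f"row {row_no}, field {field_no}: {row}")
```

To check the real file, pass open('full_data.csv', newline='') instead of the StringIO object.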
I want to read a CSV file with pandas. I used this call:
ace = pd.read_csv('C:\\Users\\C313586\\Desktop\\Daniil\\Daniil\\ACE.csv',sep = '\t')
And as output I got this:
a) First row (should be the header)
_AdjustedNetWorthToTotalCapitalEmployed _Ebit _StTradeRec _StTradePay _OrdinaryCf _CfWorkingC _InvestingAc _OwnerAc _FinancingAc _ProdValueGrowth _NetFinancialDebtTotalAdjustedCapitalEmployed_BanksAndOtherInterestBearingLiabilitiesTotalEquityAndLiabilities _NFDEbitda _DepreciationAndAmortizationProductionValue _NumberOfDays _NumberOfDays360
#other rows separated by tab
0 5390\t0000000000000125\t0\t2013-12-31\t2013\tF...
1 5390\t0000000000000306\t0\t2015-12-31\t2015\tF...
2 5390\t00000000000003VG\t0\t2015-12-31\t2015\tF...
3 5390\t0000000000000405\t0\t2016-12-31\t2016\tF...
4 5390\t00000000000007VG\t0\t2013-12-31\t2013\tF...
5 5390\t0000000000000917\t0\t2015-12-31\t2015\tF...
6 5390\t00000000000009VG\t0\t2016-12-31\t2016\tF...
7 5390\t0000000000001052\t0\t2015-12-31\t2015\tF...
8 5390\t00000000000010SG\t0\t2015-12-31\t2015\tF...
Do you have any ideas why it happens? How can I fix it?
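If the file really contains the two literal characters \ and t between fields (rather than tab characters), the separator has to match that text. Note that pandas interprets a separator longer than one character as a regular expression, so sep=r'\t' (the r makes it a raw string) still matches a tab character. To match a literal backslash followed by t, escape the backslash in the pattern: sep=r'\\t'. If the fields are separated by real tabs, sep='\t' is already correct and the cause lies elsewhere.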
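A caveat worth checking on the raw-string separator: as far as I know, pandas treats a separator longer than one character as a regular expression, and as a regex \t matches a tab character, not the text backslash-t. The sketch below uses only the standard re module (not pandas itself) to show what each pattern matches. The two sample lines are shortened from the output in the question.

```python
import re

tab_line = "5390\t0000000000000125\t0"        # fields separated by real tab characters
literal_line = "5390\\t0000000000000125\\t0"  # fields separated by backslash + t

tab_regex = r"\t"       # as a regex: matches one tab character
literal_regex = r"\\t"  # as a regex: matches a backslash followed by t

split_tabs = re.split(tab_regex, tab_line)
split_wrong = re.split(tab_regex, literal_line)
split_literal = re.split(literal_regex, literal_line)

print(split_tabs)     # three fields
print(split_wrong)    # one unsplit field: the pattern found no tab
print(split_literal)  # three fields
```

So a file with literal backslash-t text would need the pattern r'\\t', while a file with real tabs works with either '\t' or r'\t'.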
my line 1 is:
column0,column1,column2,column3,column4,column5,column6,column7,column8,column9,column10,column11,column12,column13,column14,column15,column16,column17,column18,column19,column20
line 2 is:
225,1,9d36efa8d56c724ceb5b8834873d5457,38.69.182.103,,,,,,3,62930,0,,,,,6f4b457b6044ccd205dcf5531582af54,Apache-HttpClient%2fUNAVAILABLE%20%28java%201.4%29,1646,,160807,1
I have a txt file containing the output of several commands executed on a piece of networking equipment. I want to parse this file so that I can sort the data and print it on an HTML page.
What is the best/easiest way to do this? Should I export every command's output to an array and then print the sorted array in the HTML code?
Each command's output sits between ruler lines and is tabular data. Example:
*********************************************************************
# command 1
*********************************************************************
Object column1 column2 Total
-------------------------------------------------------------------
object 1 526 9484 10010
object 2 2 10008 10010
Object 3 0 20000 20000
*********************************************************************
# command 2
*********************************************************************
(... tabular data ...)
Can someone suggest some code, or point me to a file that shows how to make this work?
Thanks!
This can be easily done in Python with this example code:
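rulers = 0
table = []
with open('input.txt') as f:
    for line in f:
        if '****' in line:
            rulers += 1
            if rulers == 2:
                # The second ruler closes the "# command" banner: start a new table.
                table = []
            elif rulers > 2:
                # A third ruler opens the next banner: print the finished table
                # and count this ruler as the first one of the new banner.
                print(table)
                rulers = 1
            continue
        # Skip blank lines, the dashed separator, and the "# command" line.
        if line == '\n' or '----' in line or line.startswith('#'):
            continue
        table.append(line.split())
# Print the last table, which has no ruler after it.
print(table)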
It just prints each table as a list of lists of the tabular values, but the result can be formatted as HTML or whatever other format you need.
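As a rough sketch of that formatting step, assuming each table is a list of rows of strings with the header in the first row, a small helper can render it as an HTML table. The name table_to_html is hypothetical and not part of the answer above.

```python
from html import escape

def table_to_html(table):
    """Render a list of rows (each a list of strings) as an HTML table.

    The first row is treated as the header row.
    """
    if not table:
        return "<table></table>"
    header, *body = table
    lines = ["<table>"]
    lines.append("  <tr>" + "".join(f"<th>{escape(cell)}</th>" for cell in header) + "</tr>")
    for row in body:
        lines.append("  <tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)

# Rows as line.split() would produce them for "command 1" in the question.
example = [
    ["Object", "column1", "column2", "Total"],
    ["object", "1", "526", "9484", "10010"],
    ["object", "2", "2", "10008", "10010"],
]
html_text = table_to_html(example)
print(html_text)
```

Note that line.split() breaks a name such as "object 1" into two cells, so the data rows end up one cell longer than the header. Splitting on runs of two or more spaces, if the real file is aligned that way, would keep such names together. Sorting can be done on the body rows before rendering.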
Import into your spreadsheet software. Export to HTML from there, and modify as needed.
I have a txt file with the following structure:
I also want to add to the end of each long line, the data (after the comma) of the short lines above them, without the description (STN_NO, STN_ID, INST_HT), like this:
Is it possible? Any ideas?
P.S. I am using Python Version 3.3.
Alternatively, you could use a simpler (albeit longer) solution that does not involve regex.
with open('file.txt') as f:
    for line in f:
        line = line.replace('\n', '')
        if 'STN_NO' in line:
            stn_no = line.split(',')[-1]
            print(line)
        elif 'STN_ID' in line:
            stn_id = line.split(',')[-1]
            print(line)
        elif 'INST_HT' in line:
            inst_ht = line.split(',')[-1]
            print(line)
        else:
            # Long line: drop its last character and append the three stored values.
            print(line[:-1] + ',' + stn_no + ',' + stn_id + ',' + inst_ht)
Note that this puts the semicolon from the INST_HT line back at the end of every long line. If not desired, it can be removed with inst_ht[:-1].
Let's assume this simplified version of the file in your image:
STN_NO, 41943043
STN_ID, KAST
INST_HT, 1.01500;
Line 1
Line 2
Line 3
STN_NO, 41943062
STN_ID, S2
INST_HT, 0.75;
Line 4
Line 5
Line 6
STN_NO, 123456
STN_ID, XXX
INST_HT, 0.99;
Line 7
Line 8
Line 9
You can use a regex to capture the pattern in blocks and combine:
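import re

fn = 'file.txt'  # placeholder path; replace with your input file

# Groups 1-3 capture the STN_NO, STN_ID and INST_HT values;
# group 4 captures the lines that follow, up to the next STN_NO or the end.
pat = re.compile(r'^STN_NO,\s+([^\n]+)$\s*^STN_ID,\s+([^\n]+)$\s*^INST_HT,\s+([^;]+);\s*(.*?)(?=^STN_NO|\Z)', re.S | re.M)

with open(fn) as f:
    txt = f.read()

for mg in pat.finditer(txt):
    for line in mg.group(4).splitlines():
        print(line + ',' + ','.join([mg.group(1), mg.group(2), mg.group(3)]))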
Prints:
Line 1,41943043,KAST,1.01500
Line 2,41943043,KAST,1.01500
Line 3,41943043,KAST,1.01500
Line 4,41943062,S2,0.75
Line 5,41943062,S2,0.75
Line 6,41943062,S2,0.75
Line 7,123456,XXX,0.99
Line 8,123456,XXX,0.99
Line 9,123456,XXX,0.99
If your file is too big to fit in memory, use mmap to map it into virtual memory instead of reading it all at once.
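A minimal sketch of the mmap variant, assuming the same block layout as the simplified file: the pattern becomes a bytes pattern because a memory-mapped file exposes bytes, and the matches are decoded at the end. The function name combine and the temporary demo file are illustrative only.

```python
import mmap
import os
import re
import tempfile

# Same pattern as the regex answer, written as a bytes pattern for mmap.
pat = re.compile(
    rb'^STN_NO,\s+([^\n]+)$\s*^STN_ID,\s+([^\n]+)$\s*'
    rb'^INST_HT,\s+([^;]+);\s*(.*?)(?=^STN_NO|\Z)',
    re.S | re.M,
)

def combine(path):
    """Return the combined lines of the file at path, matched through mmap."""
    out = []
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for mg in pat.finditer(mm):
                suffix = b','.join(mg.group(1, 2, 3))
                for line in mg.group(4).splitlines():
                    out.append((line + b',' + suffix).decode())
    return out

# Demo on a small temporary file (written in binary mode to keep '\n' endings).
sample = (
    b"STN_NO, 41943043\nSTN_ID, KAST\nINST_HT, 1.01500;\nLine 1\nLine 2\n"
    b"STN_NO, 41943062\nSTN_ID, S2\nINST_HT, 0.75;\nLine 3\n"
)
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
try:
    result = combine(tmp.name)
finally:
    os.remove(tmp.name)
print("\n".join(result))
```

The operating system pages the file in on demand, so the whole text never has to be held as one Python string. Each matched block is still copied when its groups are read.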