CSV parsing through Python

I have a sample CSV file. I need to parse only the IP address, port and QID related columns from the CSV file and store them in JSON format.
Please help.

Code:
with open("your_csv_file.csv") as _file:
    ip_addresses = [line.split(",")[0] for line in _file]
Assuming the IP address is the first column, hence index 0. It also looks as if your CSV needs some cleaning up. Keep in mind that CSVs are just text files with values separated by commas.
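If you also need the port and qid columns and want JSON output, something along these lines should work with the standard csv and json modules. The column names "ip_address", "port" and "qid" are guesses, since the screenshot of the file is not shown here, so adjust them to whatever your real header says:

import csv
import json

# hypothetical column names; adjust to match the actual header of your file
WANTED = ["ip_address", "port", "qid"]

with open("your_csv_file.csv", newline="") as csv_file:
    reader = csv.DictReader(csv_file)
    # keep only the wanted columns from each row
    records = [{name: row[name] for name in WANTED} for row in reader]

with open("output.json", "w") as json_file:
    json.dump(records, json_file, indent=2)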

Related

How can I escape characters when creating a CSV file in Data Fusion?

I am creating a pipeline in Google Data Fusion that should read records from a source database and write them to a target CSV file in Cloud Storage.
The problem is that in the resulting file the separator character is a comma (","), and some fields are strings that contain phrases with commas. When I try to load the resulting file in Wrangler as a CSV, I get an error because the number of fields in the CSV does not match the number of fields in the schema (due to the fields that contain commas).
How can I escape these special characters in the pipeline?
Thanks and regards
Try writing the data as TSV instead of CSV (set the format of the sink plugin to tsv). Then load the data as tsv in Wrangler.
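If it helps to see why the tab delimiter side-steps the problem, here is a small standalone Python sketch (not Data Fusion itself) with a made-up record that contains commas; written as TSV, the embedded commas no longer collide with the separator:

import csv

# toy rows; the second field deliberately contains commas
rows = [
    ["1", "a string, with, commas"],
    ["2", "plain text"],
]

with open("records.tsv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")  # tab-separated output
    writer.writerows(rows)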

Structure in getMetadata activity for CSV file dataset ignores an extra comma in the file's first row

I am using a reference CSV file with just the correct number and name of columns and want to compare its structure with that of incoming CSV files before proceeding to use Copy Data to import the incoming CSV data into Azure SQL. Files arriving in blob storage trigger this pipeline.
The need to validate the structure has arisen because random files arrive with a trailing comma in the header row, which causes a failure in the Copy Data pipeline, as it sees the trailing comma as an extra column.
I have set up a getMetadata for both the reference file & the incoming files. Using an If Condition, I compare schemas.
The problem I have is that the output of getMetadata is ignoring the trailing comma.
I have tried both the 'column count' and 'structure' arguments. The problem is the same either way, as getMetadata fails to see the trailing comma as an issue.
Any help appreciated
I tried with extra commas in the header of a CSV file, and it is not ignoring them; it reads those extra commas as columns as well.
Please check the screenshots below.
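If it is useful as a workaround outside Data Factory, the check you describe is easy to reproduce with a small script; a rough sketch, assuming two local files named reference.csv and incoming.csv, where a trailing comma in the header shows up as an extra empty column:

import csv

def read_header(path):
    # read only the first row of the file
    with open(path, newline="") as f:
        return next(csv.reader(f))

ref_header = read_header("reference.csv")
incoming_header = read_header("incoming.csv")

# a trailing comma in the header row appears as an extra, empty column
if len(incoming_header) != len(ref_header):
    print(f"Schema mismatch: expected {len(ref_header)} columns, got {len(incoming_header)}")
else:
    print("Header matches the reference file")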

Problems copying a CSV file from S3 to Redshift

I am getting the following error when I run a COPY command to copy the contents of a .csv file in S3 to a table in Redshift.
Error: "String length exceeds DDL length".
I am using the following COPY command:
COPY enjoy FROM 's3://nmk-redshift-bucket/my_workbook.csv' CREDENTIALS 'aws_access_key_id=****;aws_secret_access_key=****' CSV QUOTE '"' DELIMITER ',' NULL AS '\0'
I figured I would open the link given by S3 for my file through the AWS console.
The link for the workbook is:
link to my s3bucket csv file
The above file is filled with many weird characters I really don't understand.
The COPY command is taking these characters instead of the information I entered in my CSV file, hence the "string length exceeds DDL length" error.
I use SQL Workbench to query. My 'stl_load_errors' table in Redshift has a raw_field_values component similar to the characters in the link I mentioned above; that's how I found out what input it is actually reading.
I am new to AWS and UTF-8 configs, so I would appreciate any help on this.
The link you provided points to an .xlsx file (saved with a .csv extension instead of .xlsx), which is actually a zip file.
That is why you see those strange characters; the first two bytes are 'PK', which means it is a zip file.
So you will have to export to .csv first, before using the file.
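A quick way to confirm this on a downloaded copy of the file (the filename below is just the one from your COPY command) is to look at the first two bytes:

# .xlsx files are zip archives and start with the bytes b"PK"
with open("my_workbook.csv", "rb") as f:
    magic = f.read(2)

if magic == b"PK":
    print("This is a zip/xlsx file with a .csv extension; export it to a real CSV first")
else:
    print("Looks like a plain text file")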

Importing pipe-delimited txt format into MySQL via phpMyAdmin

I am importing some thousands of lines of data from a .txt file containing two columns, and the format is as follows:
A8041550408#=86^:|blablablablablablablablablablablablablablablablablablablabla1
blablablablablablablablablablablablablablablablablablablabla2
blablablablablablablablablablablablablablablablablablablabla3
A8041550408#=86^:|blablablablablablablablablablablablablablablablablablablabla1
blablablablablablablablablablablablablablablablablablablabla2
A8041550408#=86^:|blablablablablablablablablablablablablablablablablablablabla1
blablablablablablablablablablablablablablablablablablablabla2
blablablablablablablablablablablablablablablablablablablabla3
blablablablablablablablablablablablablablablablablablablabla4
etc....
What I have done so far is create a table with the two fields, but when I try to import the .txt file as a CSV with "Columns separated by: |", I get an error:
"Invalid column count in CSV input on line 2."
Which is quite obvious, since the second line of the .txt file is empty.
Moreover, I have tried importing the file as a CSV using LOAD DATA, and that didn't work either; it just filled up the table with random words and phrases from the .txt file.
So my question is : How can I import the data from this file ?
You have to fix your file; in its current state you cannot expect the import module to be able to understand it. First step would be to remove the empty lines: How to remove blank lines from a Unix file
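As a rough sketch of that first step in Python, assuming the source file is called input.txt, you can simply drop the blank lines before retrying the import:

# remove empty lines from input.txt before importing it into MySQL
with open("input.txt") as src, open("cleaned.txt", "w") as dst:
    for line in src:
        if line.strip():  # skip empty or whitespace-only lines
            dst.write(line)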

Why are Dataset columns not recognised with CSV on Azure Machine Learning?

I have created a CSV file with no header.
It's 1496 rows of data, in 2 columns of the form:
Real; String
example:
0.24; "Some very long string"
I go to New - Dataset - From local file,
pick my file, and choose the "No header" CSV format.
But after it's done loading I get an error message I can't decipher:
Dataset upload failed. Internal Service Error. Request ID:
ca378649-009b-4ee6-b2c2-87d93d4549d7 2015-06-29 18:33:14Z
Any idea what is going wrong?
At this time Azure Machine Learning only accepts comma-separated, American-style CSV.
You will need to convert your file to a comma-separated CSV.
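As a rough conversion sketch, assuming the semicolon-delimited file is called data.csv and the output should go to data_comma.csv; the csv writer will quote any field that itself contains a comma, so the long strings stay intact:

import csv

with open("data.csv", newline="") as src, open("data_comma.csv", "w", newline="") as dst:
    # skipinitialspace handles the blank after the ";" in rows like: 0.24; "some string"
    reader = csv.reader(src, delimiter=";", skipinitialspace=True)
    writer = csv.writer(dst)  # default dialect writes comma-separated values
    writer.writerows(reader)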