Search and Replace Text in CSV file using Python - csv

I just started with Python 3.4.2 and trying to find and replace text in csv file.
In Details, Input.csv file contain below line:
0,0,0,13,.\New_Path-1.1.12\Impl\Appli\Library\Module_RM\Code\src\Exception.cpp
0,0,0,98,.\Old_Path-1.1.12\Impl\Appli\Library\Prof_bus\Code\src\Wrapper.cpp
0,0,0,26,.\New_Path-1.1.12\Impl\Support\Custom\Vital\Code\src\Interface.cpp
0,0,0,114,.\Old_Path-1.1.12\Impl\Support\Custom\Cust\Code\src\Config.cpp
I maintained my strings to be searched in other file named list.csv
Module_RM
Prof_bus
Vital
Cust
Now I need to go through each line of Input.csvand replace the last column with the matched string.
So my end result should be like this:
0,0,0,13,Module_RM
0,0,0,98,Prof_bus
0,0,0,26,Vital
0,0,0,114,Cust
I read the input files first line as a list. So text which i need to replace came in line[4]. I am reading each module name in the list.csv file and checking if there is any match of text in line[4]. I am not able to make that if condition true. Please let me know if it is not a proper search.
import csv
import re
with open("D:\\My_Python\\New_Python_Test\\Input.csv") as source, open("D:\\My_Python\\New_Python_Test\\List.csv") as module_names, open("D:\\My_Python\\New_Python_Test\\Final_File.csv","w",newline="") as result:
reader=csv.reader(source)
module=csv.reader(module_names)
writer=csv.writer(result)
#lines=source.readlines()
for line in reader:
for mod in module_names:
if any([mod in s for s in line]):
line.replace(reader[4],mod)
print ("YES")
writer.writerow("OUT")
print (mod)
module_names.seek(0)
lines=reader
Please guide me to complete this task.
Thanks for your support!

At-last i succeeded in solving this problem!
The below code works well!
import csv
with open("D:\\My_Python\\New_Python_Test\\Input.csv") as source, open("D:\\My_Python\\New_Python_Test\\List.csv") as module_names, open("D:\\My_Python\\New_Python_Test\\Final_File.csv","w",newline="") as result:
reader=csv.reader(source)
module=csv.reader(module_names)
writer=csv.writer(result)
flag=False
for row in reader:
i=row[4]
for s in module_names:
k=s.strip()
if i.find(k)!=-1 and flag==False:
row[4]=k
writer.writerow(row)
flag=True
module_names.seek(0)
flag=False
Thanks for people who tried to solve! If you have any better coding practices please do share!
Good Luck!

Related

How do I preserve the leading 0 of a number using Unoconv when converting from a .csv file to a .xls file?

I have a 3 column csv file. The 2nd column contains numbers with a leading zero. For example:
044934343
I need to convert a .csv file into a .xls and to do that I'm using the command line tool called 'unoconv'.
It's converting as expected, however when I load up the .xls in Excel instead of showing '04493434', the cell shows '4493434' (the leading 0 has been removed).
I have tried surrounding the number in the .csv file with a single quote and a double quote however the leading 0 is still removed after conversion.
Is there a way to tell unoconv that a particular column should be of a TEXT type? I've tried to read the man page of unocov however the options are little confusing.
Any help would be greatly appreciated.
Perhaps I came too late at the scene, but just in case someone is looking for an answer for a similar question this is how to do:
unoconv -i FilterOptions=44,34,76,1,1/1/2/2/3/1 --format xls <csvFileName>
The key here is "1/1/2/2/3/1" part, which tells unoconv that the second column's type should be "TEXT", leaving the first and third as "Standard".
You can find more info here: https://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options#Token_7.2C_csv_import
BTW this is my first post here...

Read a log file in R

I'm trying to read a log file in R.
It looks like an extract from a JSON file to me, but when trying to read it using jsonlite I get the following error message: "Error: parse error: trailing garbage".
Here is how my log file look like:
{"date":"2017-05-11T04:37:15.587Z","userId":"admin","module":"Quote","action":"CreateQuote","identifier":"-.admin1002"},
{"date":"2017-05-11T05:12:24.939Z","userId":"a145fhyy","module":"Quote","action":"Call","identifier":"RunUY"},
{"date":"2017-05-11T05:12:28.174Z","userId":"a145fhyy","license":"named","usage":"External","module":"Catalog","action":"OpenCatalog","identifier":"wks.klu"},
Has you can see, the column name is precised directly in front of the content for each line (e.g: "date": or "action":)
And some line can skip some columns and add some other.
What I want to get as output would be to have 7 columns with the corresponding data filled in each:
date
userId
license
usage
module
action
identifier
Does anyone has a suggestion about how to get there?
Thanks a lot in advance
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Thanks everyone for your answers. Here are some precisions about my issue:
The data that I gave as example in an extract of one of my log files. I've got a lot of them that I need to read as one unique table.
I haven't added any commas or anything to it.
#r2evans
I've tried the following:
Log3 <-read.table("/Projects/data/analytics.log.agregated.2017-05‌​-11.log") jsonlite::stream_in(textConnection(gsub(",$","",Log3)))
It returns the following error:
Error: lexical error: invalid char in json text.
c(17, 18, 19, 20, 21, 22, 23, 2
(right here) ------^
I'm not sure how to use sed -e 's/,$//g' infile > outfile and Sys.which("sed"), that something I'm not familiar with. I'm looking into it, but if you have anymore precisions to give me about the usage of it that would be great.
I have saved your example as a file "test.json" and was able to read and parse it like this:
library(jsonlite)
rf <- read_file("test.json")
rfr <- gsub("\\},", "\\}", rf)
data <- stream_in(textConnection(rfr))
It parses and simplifies into a neat data frame exactly like you want. What I do is look for "}," rather than ",$", because the very last comma is not (necessarily) followed by a newline character(s).
However, this might not be the best solution for very large files.. For those you may need to first look for a way to modify the text file itself by getting rid of the commas. Or, if that's possible, ask the people who exported this file to export it in a normal ndjson format:-)

Golang CSV read : extraneous " in field error

I am using a simple program to read CSV file, somehow I noticed when I created a CSV using EXCEL or windows based computer go library fails to read it. even when I use cat command it only shows me last line on the terminal. It always results in this error extraneous " in field.
I researched somewhat than I found it is somewhat related to carriage return differences between OS.
But I really want to ask how to make a generic csv reader. I tried reading the same csv using pandas and it was reading successfully. But i am not been able to achieve this using my Go code.
Also screen shot of correct csv Is here
Your file clearly shows that you've got an extra quote at the end of the content. While programs like pandas may be fine with that, I assume it's not valid csv so go does return an error.
Quick example of what's wrong with your data: https://play.golang.org/p/KBikSc1nzD
Update: After your update and a little bit of searching, I have to apoligize, the carriage return does matter and seems to be tha main culprit here, Go seems to be ok handling the \r\n windows variant but not the \r one. In that case what you can do is wrap the bytes.Reader into a custom reader that replaces the \r byte with the \n byte.
Here's an example: https://play.golang.org/p/vNjzwAHmtg
Please note, that the example is just that, an example, it's not handling all the possible cases where \r might be a legit byte.

creating a list from json csv file using python

I am sorry for asking this question, but i already look through but could not find the answer. I am honestly newbie.I am trying to generate a list of whole word from a json csv file. I already created a list of lines, but then i cannot use split() to generate new list containing separate word (later i need to count word occurrence).
My input file contains twitter information:
twitter data
i tried to write simple code:
myfile=open('fileName','r')
words=[]
for line in myfile:
words.append(line.split())
len(words)=82
I also tried reader=csv.reader(myFile) and reader=csv.DictReader(myFile)
but in all I can get each line, but how to further split the string/line into independent word. Sorry and thank you in advanced.
My data #I change to a different example as maybe last one was bad formatted:
id,flags,expiration,cas,value
493926581610364928,0,0,2635740904247446,"{""contributors"":null,""truncated"":false,""text"":""#xaaronh #blueredandgold If Namco Bandai's One Piece Unlimited World is anything to go by, no local retail release means no eShop either =\\"",""in_reply_to_status_id"":493925918998425600,""id"":493926581610364928,""favorite_count"":0,""source"":""Twitter Web Client"",""retweeted"":false,""coordinates"":null,""entities"":{""symbols"":[],""user_mentions"":[{""id"":139852376,""indices"":[0,8],""id_str"":""139852376"",""screen_name"":""xaaronh"",""name"":""Aaron""},{""id"":74393990,""indices"":[9,24],""id_str"":""74393990"",""screen_name"":""blueredandgold"",""name"":""Leigh""}],""hashtags"":[],""urls"":[]},""in_reply_to_screen_name"":""xaaronh"",""in_reply_to_user_id"":139852376,""retweet_count"":0,""id_str"":""493926581610364928"",""favorited"":false,""user"":{""follow_request_sent"":false,""profile_use_background_image"":true,""default_profile_image"":false,""id"":42302246,""profile_background_image_url_hp"":""hp://pbs.twimg.com/profile_background_images/464279459932020736/v1xnMcrV.jpeg"",""verified"":false,""profile_text_color"":""333333"",""profile_image_url_https"":""hp://pbs.twimg.com/profile_images/490791031487463424/udSldTQ3_normal.png"",""profile_sidebar_fill_color"":""DDEEF6"",""entities"":{""description"":{""urls"":[{""url"":""hp:tttt"",""indices"":[67,89],""expanded_url"":""hp://infernalmonkey.com"",""display_url"":""infernalmonkey.com""}]}},""followers_count"":506,""profile_sidebar_border_color"":""000000"",""id_str"":""42302246"",""profile_background_color"":""1A1B1F"",""listed_count"":22,""is_translation_enabled"":false,""utc_offset"":36000,""statuses_count"":8676,""description"":""I probably tweet about video games and onaholes. Let's be friends! (NSFW)"",""friends_count"":261,""location"":""Sydney, Australia"",""profile_link_color"":""2FC2EF"",""profile_image_url"":""hp://pbs.twimg.com/profile_images/490791031487463424/udSldTQ3_normal.png"",""following"":false,""geo_enabled"":false,""profile_banner_url"":""hp://pbs.twimg.com/profile_banners/42302246/1406105444"",""profile_background_image_url"":""hp://pbs.twimg.com/profile_background_images/464279459932020736/v1xnMcrV.jpeg"",""screen_name"":""infernal_monkey"",""lang"":""en"",""profile_background_tile"":false,""favourites_count"":2018,""name"":""Lance McGill"",""notifications"":false,""url"":null,""created_at"":""Sun May 24 23:20:25 +0000 2009"",""contributors_enabled"":false,""time_zone"":""Sydney"",""protected"":false,""default_profile"":false,""is_translator"":false},""geo"":null,""in_reply_to_user_id_str"":""139852376"",""lang"":""en"",""_id"":""493926581610364928"",""created_at"":""Tue Jul 29 01:10:48 +0000 2014"",""in_reply_to_status_id_str"":""493925918998425600"",""place"":null,""metadata"":{""iso_language_code"":""en"",""result_type"":""recent""}}"
This is not the best solution, just an effort from a noob (me), definitely need further editing for better output. I am using windows OS.
import csv
import json
abc=[]
myList=[]
myDict={}
myFile=open('fileName.csv','r',encoding='utf-8')
myReader=csv.reader(myFile)
header=next(myReader)
for line in myReader:
abc=json.loads(line[4])
myDict=abc
myList.append(myDict['text'])
dct={}
for eachLine in myList:
item=eachLine.split()
for one in item:
if one in dct:
dct[one]+=1
else:
dct[one]=1
finalList=list(dct.items())
finalList.sort()

Output a *.csv file created from a list of lists

Firstly, I'm just learning Python (which is my first language) so, while I recognise there are numerous websites that address this, I've spent a weekend trying to get my head around implementing a solution and got nowhere with it. So, I'm hoping someone here can help me :)
The problem is simple: I've created a list of lists in a Python program, and I need to output them to a *.csv file so I can import it into Excel etc.
The list looks like this:
[['title1','title2','title3'],['date1','info1','category1'],['date2','info2','category3'],...]
I've found solutions where the elements in each list are integers, I can't get them to work with strings.
Any help on this would be much appreciated!
Thanks,
Adam
There's a CSV module that can do this:
import csv
data = [['title1','title2','title3'],['date1','info1','category1'],['date2','info2','category3']]
with open('stuff.csv', 'wb') as csvfile:
writer = csv.writer(csvfile)
for line in data:
writer.writerow(line)