import csv
with open('thefile.csv', 'rb') as f:
    data = list(csv.reader(f))
import collections
counter = collections.defaultdict(int)
for row in data:
    counter[row[10]] += 1
with open('/pythonwork/thefile_subset11.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    for row in data:
        if counter[row[10]] >= 504:
            writer.writerow(row)
This code reads thefile.csv, makes changes, and writes the results to thefile_subset11.csv.
However, when I open the resulting csv in Microsoft Excel, there is an extra blank line after each record!
Is there a way to make it not put an extra blank line?
csv.writer directly controls line endings and writes \r\n into the file itself. In Python 3 the file must be opened in untranslated text mode with the parameters 'w', newline='' (empty string); otherwise it will write \r\r\n on Windows, where the default text mode translates each \n into \r\n.
#!python3
with open('/pythonwork/thefile_subset11.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
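To see the \r\r\n effect without a Windows machine, the sketch below simulates Windows' default text mode by wrapping an in-memory buffer in io.TextIOWrapper with newline='\r\n' (on Windows, newline=None translates each '\n' to os.linesep, which is '\r\n'). The csv_bytes helper is a hypothetical name used only for this demonstration.

```python
import csv
import io


def csv_bytes(rows, newline):
    # Write rows through a text layer with the given newline setting and
    # return the bytes that would land in the file.
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    csv.writer(text).writerows(rows)
    text.flush()
    data = raw.getvalue()
    text.detach()  # unhook the wrapper without closing the buffer
    return data


rows = [["a", "b"], ["c", "d"]]
# newline="\r\n" mimics Windows text mode: each "\n" csv writes becomes "\r\n".
print(csv_bytes(rows, "\r\n"))  # b'a,b\r\r\nc,d\r\r\n'
# newline="" disables translation, so csv's own "\r\n" is written unchanged.
print(csv_bytes(rows, ""))      # b'a,b\r\nc,d\r\n'
```

The doubled \r in the first result is the "extra blank line" Excel shows.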
In Python 2, use binary mode to open outfile with mode 'wb' instead of 'w' to prevent Windows newline translation. Python 2 also has problems with Unicode and requires other workarounds to write non-ASCII text. See the Python 2 link below and the UnicodeReader and UnicodeWriter examples at the end of the page if you have to deal with writing Unicode strings to CSVs on Python 2, or look into the 3rd party unicodecsv module:
#!python2
with open('/pythonwork/thefile_subset11.csv', 'wb') as outfile:
    writer = csv.writer(outfile)
Documentation Links
https://docs.python.org/3/library/csv.html#csv.writer
https://docs.python.org/2/library/csv.html#csv.writer
Opening the file in binary mode "wb" will not work in Python 3+: csv.writer produces str, so writing to a binary file raises a TypeError unless you convert your data to bytes yourself. That's just a hassle.
Instead, you should keep it in text mode, but override the newline as empty. Like so:
with open('/pythonwork/thefile_subset11.csv', 'w', newline='') as outfile:
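A quick sketch of both points, assuming a throwaway file in a temporary directory (the helper names are made up for the demo): writing through csv.writer to a 'wb' file raises TypeError because the writer hands str objects to a file that expects bytes, while 'w' with newline='' writes the \r\n row terminator unchanged.

```python
import csv
import os
import tempfile

# Throwaway file for the demo.
path = os.path.join(tempfile.mkdtemp(), "thefile_subset11.csv")


def binary_mode_works(path):
    # csv.writer produces str, so writing it to a 'wb' file fails on Python 3.
    try:
        with open(path, "wb") as f:
            csv.writer(f).writerow(["Spam", "Eggs"])
    except TypeError:
        return False
    return True


def text_mode_bytes(path):
    # Text mode with newline='' passes csv's '\r\n' through untouched.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(["Spam", "Eggs"])
    with open(path, "rb") as f:
        return f.read()


print(binary_mode_works(path))  # False
print(text_mode_bytes(path))    # b'Spam,Eggs\r\n'
```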
Note: It seems this is not the preferred solution because of how the extra line was being added on a Windows system. As stated in the Python documentation:
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
Windows is one such platform where that makes a difference. While changing the line terminator as I described below may have fixed the problem, the problem can be avoided altogether by opening the file in binary mode. One might say this solution is more "elegant". "Fiddling" with the line terminator would likely have resulted in code that is not portable between systems, whereas opening a file in binary mode on a Unix system has no effect, i.e. it results in cross-platform code.
From the Python docs:
On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
Original:
As part of the optional parameters for csv.writer, if you are getting extra blank lines you may have to change the lineterminator (see the csv docs). The example below is adapted from the csv docs on the Python site. Change it from '\n' to whatever it should be. As this is just a stab in the dark at the problem, it may or may not work, but it's my best guess.
>>> import csv
>>> spamWriter = csv.writer(open('eggs.csv', 'w'), lineterminator='\n')
>>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
>>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
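To see what lineterminator actually changes, the sketch below writes one row to an in-memory io.StringIO, which performs no newline translation by default: the default terminator is '\r\n', and lineterminator='\n' swaps it for a bare '\n'. The bytes that finally reach a real file still depend on the platform's text-mode translation. row_text is a hypothetical helper name.

```python
import csv
import io


def row_text(row, **fmt):
    # io.StringIO performs no newline translation, so this is exactly what
    # csv.writer hands to the file object.
    buf = io.StringIO()
    csv.writer(buf, **fmt).writerow(row)
    return buf.getvalue()


print(repr(row_text(["Spam", "Baked Beans"])))                       # 'Spam,Baked Beans\r\n'
print(repr(row_text(["Spam", "Baked Beans"], lineterminator="\n")))  # 'Spam,Baked Beans\n'
```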
On Python 2, the simple answer is that CSV files should always be opened in binary mode, whether for input or output, as otherwise on Windows there are problems with the line endings. Specifically, on output the csv module writes \r\n (the standard CSV row terminator) and then, in text mode, the runtime replaces the \n with \r\n (the Windows standard line terminator), giving a result of \r\r\n.
Fiddling with the lineterminator is NOT the solution.
A lot of the other answers have become out of date in the ten years since the original question. For Python3, the answer is right in the documentation:
If csvfile is a file object, it should be opened with newline=''
The footnote explains in more detail:
If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
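The embedded-newline part of the footnote can be checked with a small round trip. The sketch writes a field containing '\r\n' with newline='' and reads it back twice: with newline='' the field comes back unchanged, while the default (newline=None, universal newlines) turns the embedded '\r\n' into '\n'. The temporary path and the round_trip helper are hypothetical.

```python
import csv
import os
import tempfile

ROW = ["id-1", "line one\r\nline two"]  # the second field contains a line break
path = os.path.join(tempfile.mkdtemp(), "embedded.csv")  # throwaway file

with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(ROW)  # csv quotes the field because it holds \r\n


def round_trip(read_newline):
    # Read the first row back with the given newline setting.
    with open(path, newline=read_newline, encoding="utf-8") as f:
        return next(csv.reader(f))


print(round_trip(""))    # ['id-1', 'line one\r\nline two']  (unchanged)
print(round_trip(None))  # ['id-1', 'line one\nline two']    (\r lost)
```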
Use the method defined below to write data to the CSV file.
open('outputFile.csv', 'a',newline='')
Just add an additional newline='' argument to the open() call:
import csv
def writePhoneSpecsToCSV():
    rowData = ["field1", "field2"]
    with open('outputFile.csv', 'a', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(rowData)
This will write CSV rows without creating additional rows!
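The same idea as a reusable helper, as a sketch (append_row is a hypothetical name): two rows are appended in separate open/close cycles and read back, and each row ends in exactly one '\r\n', so no blank rows appear between them.

```python
import csv
import os
import tempfile


def append_row(path, row):
    # Hypothetical helper: open in append mode with newline='' for every write.
    with open(path, "a", newline="", encoding="utf-8") as csv_file:
        csv.writer(csv_file).writerow(row)


path = os.path.join(tempfile.mkdtemp(), "outputFile.csv")  # throwaway file
append_row(path, ["field1", "field2"])
append_row(path, ["field3", "field4"])

with open(path, "rb") as f:
    print(f.read())  # b'field1,field2\r\nfield3,field4\r\n'
```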
I'm writing this answer with respect to Python 3, as I initially had the same problem.
I was supposed to get data from an Arduino using PySerial and write it to a .csv file. Each reading in my case ended with '\r\n', so a newline was always separating each line.
In my case, the newline='' option didn't work, because it raised an error like this:
with open('op.csv', 'a',newline=' ') as csv_file:
ValueError: illegal newline value: ''
So it seemed that newline could not be left empty here.
Based on one of the answers here, I specified lineterminator in the writer object, like this:
writer = csv.writer(csv_file, delimiter=' ',lineterminator='\r')
and that worked for me for skipping the extra newlines.
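For reference, open() only accepts None, '', '\n', '\r' or '\r\n' as the newline argument. The call above passes ' ' (a single space), which is what triggers the ValueError; newline='' with nothing between the quotes is accepted. A quick check, using a hypothetical temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "op.csv")  # throwaway file


def newline_accepted(value):
    # Try opening with the given newline argument; ValueError means "illegal".
    try:
        with open(path, "a", newline=value):
            pass
    except ValueError:
        return False
    return True


for value in (None, "", "\n", "\r", "\r\n", " "):
    print(repr(value), newline_accepted(value))
# Only the last one, ' ' (a space), prints False.
```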
with open(destPath+'\\'+csvXML, 'a+') as csvFile:
    writer = csv.writer(csvFile, delimiter=';', lineterminator='\r')
    writer.writerows(xmlList)
Setting lineterminator='\r' moves on to the next row without leaving an empty row between two rows.
Borrowing from another answer, it seems like the cleanest solution is to use io.TextIOWrapper. I managed to solve this problem for myself as follows:
from io import TextIOWrapper
...
with open(filename, 'wb') as csvfile, TextIOWrapper(csvfile, encoding='utf-8', newline='') as wrapper:
    csvwriter = csv.writer(wrapper)
    for data_row in data:
        csvwriter.writerow(data_row)
The above answer is not compatible with Python 2. For compatibility, I suppose one would simply need to wrap all the writing logic in an if block:
import sys
if sys.version_info < (3,):
    pass  # Python 2 way of handling CSVs
else:
    pass  # The above logic
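One way to fill in that if block, as a sketch: a hypothetical open_csv_for_writing helper that picks binary mode on Python 2 and newline='' on Python 3, so the calling code can use csv.writer the same way on both. Python 2's csv module still has the Unicode caveats mentioned earlier in this thread.

```python
import csv
import os
import sys
import tempfile


def open_csv_for_writing(path):
    # Hypothetical helper: pick the open() call that stops csv.writer from
    # producing blank lines on Windows, depending on the Python version.
    if sys.version_info < (3,):
        return open(path, "wb")  # Python 2: binary mode
    return open(path, "w", newline="", encoding="utf-8")  # Python 3: no translation


path = os.path.join(tempfile.mkdtemp(), "out.csv")  # throwaway file
with open_csv_for_writing(path) as f:
    csv.writer(f).writerow(["a", "b"])
```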
I used writerow:
import csv
from itertools import permutations
def write_csv(writer, var1, var2, var3, var4):
    """
    write four variables into a csv file
    """
    writer.writerow([var1, var2, var3, var4])
numbers = set([1, 2, 3, 4, 5, 6, 7, 2, 4, 6, 8, 10, 12, 14, 16])
rules = list(permutations(numbers, 4))
#print(rules)
selection = []
with open("count.csv", 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for rule in rules:
        number1, number2, number3, number4 = rule
        if (number1 + number2 + number3 + number4) % 5 == 0:
            #print(rule)
            selection.append(rule)
            write_csv(writer, number1, number2, number3, number4)
When using Python 3, the empty lines can be avoided by using the codecs module. As stated in the documentation, codecs.open always opens the underlying file in binary mode, so no change to the newline keyword argument is necessary. I ran into the same issue recently and this worked for me:
import codecs
import csv
with codecs.open(csv_file, mode='w', encoding='utf-8') as out_csv:
    # DictWriter requires fieldnames; `fieldnames` stands for your list of column names
    csv_out_file = csv.DictWriter(out_csv, fieldnames=fieldnames)
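A small self-contained version of this approach, assuming a temporary output file and made-up fieldnames ("name", "value"): because codecs.open works on the file in binary mode, the '\r\n' that DictWriter emits is encoded and written as-is, with no extra '\r'.

```python
import codecs
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "codecs_demo.csv")  # throwaway file
fieldnames = ["name", "value"]  # made-up column names for the demo

# codecs.open opens the underlying file in binary mode, so the '\r\n'
# that DictWriter writes is encoded as-is, without newline translation.
with codecs.open(path, mode="w", encoding="utf-8") as out_csv:
    csv_out_file = csv.DictWriter(out_csv, fieldnames=fieldnames)
    csv_out_file.writeheader()
    csv_out_file.writerow({"name": "spam", "value": 1})

with open(path, "rb") as f:
    print(f.read())  # b'name,value\r\nspam,1\r\n'
```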
When I press Ctrl+Shift+F to search across all the files in the current scope, I get a new window listing all the files that contain that search term.
How can I quickly open all of these files?
Hold down the F4 key from the Search Results screen, and it will "Navigate to next match" - which causes it to open each file listed in the results.
Just a small note: if you are getting 10+ matches per file, this method starts to fail because it gets slow.
Sublime doesn't have the ability to do this out of the box; however the plugin API gives you the power to create a plugin to do something like this fairly simply (depending on how you ultimately want it to work).
I assume there are plugins available for something like this, but for reference purposes here is a simple example:
import sublime
import sublime_plugin
class OpenAllFoundFilesCommand(sublime_plugin.TextCommand):
    def run(self, edit, new_window=False):
        # Collect all found filenames
        positions = self.view.find_by_selector("entity.name.filename.find-in-files")
        if len(positions) > 0:
            # Set up the window to open the files in
            if new_window:
                sublime.run_command("new_window")
                window = sublime.active_window()
            else:
                window = self.view.window()
            # Open each file in the chosen window
            for position in positions:
                window.run_command('open_file', {'file': self.view.substr(position)})
        else:
            self.view.window().status_message("No find results")
This provides a command named open_all_found_files which could be bound to a key, added to a menu, added to the command palette, etc.
Using the notion that sublime has a custom syntax for the find results with a scope dedicated to the matching filenames, this collects all such regions and then opens the associated files.
The optional command argument new_window can be passed and set to true to open the files in a new window; leaving it off or setting it to false opens the files in the same window as the find results. You can of course change the default as you see fit.
You can't do that from within Sublime Text.
If you are using Linux/UNIX/OSX you can open all the files that contain a particular string or matching regex by using a combination of grep and xargs on the command line using a command like this:
grep -rlZ "search_str_or_regex" /path/to/search/* | xargs -0 subl
# Command line options (may vary between OSes):
#
# grep -r      Recurse directories
# grep -l      Output only the filenames of the files which contain the search pattern
# grep -Z      Output null terminated filenames
# xargs -0     Input filenames are null terminated
# xargs subl   Sublime Text executable
#
# The combination of -Z and -0 allows filenames containing spaces to be handled
The files will be opened in the most recently used Sublime Text window. Add -n or --new-window after subl to have them opened in a new window.
If you are using Windows, consider using GOW or Cygwin.
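Another cross-platform option, since Python is already in use here, is a small script that does the grep part itself and then launches Sublime. This is a sketch: files_containing is a hypothetical helper, it reads every file as UTF-8 while ignoring undecodable bytes, and it assumes the subl launcher is on your PATH. Passing the file list to subprocess.run as a list means filenames with spaces need no special handling.

```python
import os
import re
import subprocess
import sys


def files_containing(root, pattern):
    # Walk `root` recursively and return the sorted paths whose text
    # matches the regular expression `pattern` (like grep -rl).
    regex = re.compile(pattern)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if regex.search(f.read()):
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(matches)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python open_matches.py "search_str_or_regex" /path/to/search
    found = files_containing(sys.argv[2], sys.argv[1])
    if found:
        subprocess.run(["subl", *found])  # assumes `subl` is on PATH
```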
I had to change OdatNurd's code: under macOS Big Sur, I had problems with the ":" and the " " at the end of the filenames.
import sublime
import sublime_plugin
class OpenAllFoundFilesCommand(sublime_plugin.TextCommand):
    """
    Collect the names of all files from a Find in Files result and open them
    all at once, optionally in a new window.
    """
    def run(self, edit, new_window=False):
        # Collect all found filenames
        positions = self.view.find_by_selector("entity.name.filename.find-in-files")
        if len(positions) > 0:
            # Set up the window to open the files in
            if new_window:
                sublime.run_command("new_window")
                window = sublime.active_window()
            else:
                window = self.view.window()
            # Open each file in the chosen window
            for position in positions:
                file = self.view.substr(position)
                # Remove the ":" and surrounding spaces that ended up in the filename
                file = file.replace(":", "")
                window.run_command('open_file', {'file': file.strip()})
        else:
            self.view.window().status_message("No find results")
    def is_enabled(self):
        return self.view.match_selector(0, "text.find-in-files")
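If the trailing colon is the only problem, a slightly narrower cleanup is possible. The hypothetical helper below strips whitespace and a single trailing ':' instead of removing every colon, so a Windows path such as C:\work\notes.txt would keep its drive letter. It is plain Python, so it can be tested outside Sublime and then called in place of the replace()/strip() lines.

```python
def clean_find_result_filename(raw):
    # Hypothetical helper: drop surrounding whitespace and one trailing ':'
    # from a filename taken from the Find Results view, leaving other colons
    # (such as a Windows drive letter) alone.
    name = raw.strip()
    if name.endswith(":"):
        name = name[:-1].rstrip()
    return name


print(clean_find_result_filename("/Users/me/notes.txt: "))  # /Users/me/notes.txt
print(clean_find_result_filename("C:\\work\\notes.txt:"))   # C:\work\notes.txt
```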
I'm extracting a certain part of an HTML document (to be fair, the basis for this is an iXBRL document, which means there is a lot of formatting code inside) and writing my output, the original file without the extracted part, to a .txt file. My aim is to measure the difference in document size (how many KB of the original document belong to the extracted part). As far as I know there shouldn't be any size difference between the HTML and text formats, so my difference should be reliable even though I am comparing two different document formats. My code so far is:
import glob
import os
import contextlib
import re
@contextlib.contextmanager
def stdout2file(fname):
    import sys
    f = open(fname, 'w')
    sys.stdout = f
    yield
    sys.stdout = sys.__stdout__
    f.close()
def extractor():
    os.chdir(r"F:\Test")
    with stdout2file("FileShortened.txt"):
        for file in glob.iglob('*.html', recursive=True):
            with open(file) as f:
                contents = f.read()
            extract = re.compile(r'(This is the beginning of).*?Until the End', re.I | re.S)
            cut = extract.sub('', contents)
            print(file.split(os.path.sep)[-1], end="| ")
            print(cut, end="\n")
extractor()
Note: I am NOT using BS4 or lxml because I am not only interested in HTML text but actually in ALL lines between my start and end-RegEx incl. all formatting code lines.
My code works without problems; however, as I have a lot of files, my FileShortened.txt document is quickly going to be massive. My problem is not with the file or the extraction, but with redirecting my output to separate .txt files. For now, I am getting everything into one file; what I need is some kind of "for each file searched, create a new .txt file with the same name as the original document" condition (arcpy module?!).
Something like:
File1.html --> File1Short.txt
File2.html --> File2Short.txt
...
Is there an easy way (without changing my code too much) to invert my code in the sense of printing the "RegEx Match" to a new .txt file instead of "everything except my RegEx match"?
Any help appreciated!
Ok, I figured it out.
Final Code is:
import glob
import os
import re
def extractor():
    os.chdir(r"F:\Test")  # the directory containing my html
    for file in glob.glob("*.html"):  # iterates over all files in the directory ending in .html
        with open(file) as f, open((file.rsplit(".", 1)[0]) + ".txt", "w") as out:
            contents = f.read()
            extract = re.compile(r'Start.*?End', re.I | re.S)
            cut = extract.sub('', contents)
            out.write(cut)
extractor()
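The final code still writes everything except the match. For the inverted version asked about in the question (write only the matched part, one <name>Short.txt per input file) plus the size comparison, a sketch along these lines may help. write_matches is a hypothetical helper; it collects every match with re.findall and measures sizes as UTF-8 bytes of the decoded text, which can differ slightly from os.path.getsize on disk (for example when Windows line endings are translated on read).

```python
import glob
import os
import re

# Same pattern as the final code; adjust to your real start/end markers.
EXTRACT = re.compile(r"Start.*?End", re.I | re.S)


def write_matches(directory):
    # Hypothetical helper: for each *.html file in `directory`, write only the
    # matched text to <name>Short.txt and return
    # {html_filename: (original_size, extracted_size)} in UTF-8 bytes.
    sizes = {}
    for path in glob.glob(os.path.join(directory, "*.html")):
        with open(path, encoding="utf-8") as f:
            contents = f.read()
        extracted = "".join(EXTRACT.findall(contents))  # every match, in order
        with open(os.path.splitext(path)[0] + "Short.txt", "w", encoding="utf-8") as out:
            out.write(extracted)
        sizes[os.path.basename(path)] = (
            len(contents.encode("utf-8")),
            len(extracted.encode("utf-8")),
        )
    return sizes
```

Calling write_matches(r"F:\Test") would then produce File1Short.txt, File2Short.txt, and so on, and return the sizes needed for the comparison.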
I want to build Haml in Sublime Text 2 without switching to iTerm, so I built a simple plugin for Sublime, like this:
import os
import sublime, sublime_plugin
class HamlToHtmlCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        source = self.view.file_name()
        filefullname = source.split('/')[-1]
        filename = filefullname.split('.')[0]
        target = "/".join(source.split('/')[0:-1])
        com = "haml " + source + " > " + target + "/" + filename + '.html'
        os.system(com)
    def is_enabled(self):
        return True
But the problem is that when I build in Sublime, the target HTML file is empty.
For example, com is "haml /Users/latpaw/login_register/login.haml > /Users/latpaw/login_register/login.html". But if I run os.system(com) in the Python CLI, it works.
So what is really happening here?
By the way, I really don't know Haml, so at first I just tried to fix your code. But I think you can build Haml directly from ST, by going to Tools→Build (⌥B). Or create a new build system Tools→Build Systems→New build system. Apparently you have the following variables that you could use:
$file The full path to the current file, e.g., C:\Files\Chapter1.txt.
$file_path The directory of the current file, e.g., C:\Files.
$file_name The name portion of the current file, e.g., Chapter1.txt.
$file_extension The extension portion of the current file, e.g., txt.
$file_base_name The name only portion of the current file, e.g., Document.
$packages The full path to the Packages folder.
$project The full path to the current project file.
$project_path The directory of the current project file.
$project_name The name portion of the current project file.
$project_extension The extension portion of the current project file.
$project_base_name The name only portion of the current project file.
Apparently you have an issue with path escaping. You probably have a space somewhere in the path of your file, as I did when I tried it. As you probably know, commands like cat ~/Sublime Text/test do not work, because the shell splits the path at the space. You can either:
cat ~/Sublime\ Text/test
cat ~/'Sublime Text/test'
I would advise the latter and do this:
com = "haml '" + source + "' > '" + target + "/" + filename + ".html'"
PS: arguably, a better solution would be to add these single quotes directly when defining source, target and filename.
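Alternatively, the shell and its quoting rules can be avoided altogether by passing an argument list to subprocess and redirecting stdout from Python. A sketch with hypothetical helper names; it sticks to calls that also exist in the Python 2.6 bundled with Sublime Text 2, and it assumes the haml executable is on the PATH that Sublime sees. Inside the plugin, run() would then just call build_haml(self.view.file_name()).

```python
import os
import subprocess


def haml_command(source):
    # Hypothetical helper: the argv for haml and the .html path next to `source`.
    target = os.path.splitext(source)[0] + ".html"
    return ["haml", source], target


def build_haml(source):
    argv, target = haml_command(source)
    # An argument list goes straight to the program, so spaces in paths need
    # no quoting; Python itself redirects haml's stdout into the target file.
    with open(target, "w") as out:
        return subprocess.call(argv, stdout=out)
```

A side benefit: if haml cannot be found, subprocess.call raises an OSError that shows up in the Sublime console instead of failing silently.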
In Sublime Text, is there a way I can extract a selected piece of text into a separate file?
I do this often in LaTeX. Consider the following file:
main.tex
\section{Introduction}
...
...
\section{Conclusion}
I want to be able to select the text starting from Introduction until one line before the Conclusion, right-click and then say "Extract to file" (somewhat similar to how "Extract method" works in Visual Studio). Is there a way to achieve this using any shortcuts?
Bonus: Once the extraction is complete, substitute the extracted text with custom text such as \input{introduction} where introduction is the name of the file that the text was extracted into.
Nothing built in, but it's easily doable with a plugin. Note that the following is minimally tested and won't handle everything in ST well. That being said, it should be a good base for you to start with. Just to be safe, I'd put everything into a local git repo before using this too much; I'd hate for this to lead to lost work. I copy the content being replaced to the clipboard just to be safe, but if you feel confident with it, you can remove sublime.set_clipboard(content).
import sublime
import sublime_plugin
import os
import re
class ExtractAndInput(sublime_plugin.TextCommand):
    def run(self, edit):
        view = self.view
        self.region = view.sel()[0]
        content = view.substr(self.region)
        sublime.set_clipboard(content)
        match = re.search(r"\\section{(.+?)}", content)
        if match:
            replace = "\\input{%s}" % match.group(1)
            view.replace(edit, view.sel()[0], replace)
            current = view.file_name()
            new_file = "%s.tex" % match.group(1)
            path = os.path.normpath(os.path.join(current, "..", new_file))
            with open(path, "a") as file_obj:
                file_obj.write("% Generated using ExtractAndInput Plugin\n")
                file_obj.write(content)
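The non-Sublime part of the plugin can also be pulled out into a plain function, which makes it easy to test before letting it touch real files. plan_extraction is a hypothetical helper that mirrors the plugin's logic (same regex, new file placed next to the current one) and returns what would be inserted and written, or None when the selection has no \section{...}.

```python
import os
import re

SECTION = re.compile(r"\\section{(.+?)}")  # same pattern as the plugin


def plan_extraction(content, current_file):
    # Hypothetical helper: return (replacement_text, new_file_path) for the
    # selected LaTeX, or None if it contains no \section{...}.
    match = SECTION.search(content)
    if not match:
        return None
    name = match.group(1)
    replacement = "\\input{%s}" % name
    new_path = os.path.normpath(os.path.join(os.path.dirname(current_file), name + ".tex"))
    return replacement, new_path


print(plan_extraction("\\section{Introduction}\n...", "/docs/main.tex"))
# ('\\input{Introduction}', '/docs/Introduction.tex')
```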
After saving the plugin, you can create a key binding to extract_and_input. You can also add a context menu by creating a Context.sublime-menu in Packages/User with the following content.
[
{ "caption": "Extract to File", "command": "extract_and_input"}
]