'b' Added to file name when trying to load data file in Jupyter - csv

When trying to load a data file into a Jupyter notebook I get the following error message
File b'data_file.csv' does not exist: b'data_file.csv'
Following the suggestions I could find online for this problem, I tried the following variations, including specifying the full path and UTF-8 encoding:
pd.read_csv("data_file.csv")
pd.read_csv("C:\\FULL_PATH\\EBI\\data_file.csv")
pd.read_csv(r"data_file.csv")
pd.read_csv(r"C:\\FULL_PATH\\EBI\\data_file.csv")
pd.read_csv("data_file.csv",encoding='utf-8')
pd.read_csv("C:\\FULL_PATH\\EBI\\data_file.csv",encoding='utf-8')
pd.read_csv(r"data_file.csv",encoding='utf-8')
pd.read_csv(r"C:\\FULL_PATH\\EBI\\data_file.csv",encoding='utf-8')
as well as
pd.read_csv('C:\\FULL_PATH\\EBI\\"data_file.csv"')
However, all of these yield the same error message:
File b'data_file.csv' does not exist: b'data_file.csv'
In case it is helpful: the Jupyter notebook is running on Windows Server 2012, and I verified with os.getcwd() that the working directory is indeed the full path quoted above.
Any suggestions would be much appreciated!

Assuming the file is in your working directory, could you try:
import os
import pandas as pd

# Build an absolute path from the current working directory
file = os.path.join(os.getcwd(), "data_file.csv")
df = pd.read_csv(file)
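If that still fails, a quick sanity check is to print what directory the kernel is actually running in and whether the file is visible there; a minimal sketch, using the file name from the question:
import os

print(os.getcwd())                      # directory the kernel is running in
print(os.listdir(os.getcwd()))          # files actually visible there
print(os.path.isfile("data_file.csv"))  # True only if the relative path resolves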

Related

How should one zip a large folder in Windows 10, upload it to GDrive, then unzip it?

I have a directory consisting of 22 sub-directories. Altogether, the directory is about 750GB in size and I need this data on GDrive so that I can work with it in Google Colab. Obviously uploading this takes an absolute age (particularly with my slow connection) so I would like to zip it, upload it, then unzip it in the cloud.
I am using 7zip and zipping each subdirectory using the zip format and "normal" compression level. (EDIT: Can now confirm that I get the same error for 7z and tar format). Each subdirectory ends up between 14 and 20GB in size. I then upload this and attempt to unzip it in Google Colab using the following code:
from google.colab import drive
drive.mount('/content/gdrive/')
!apt-get install p7zip-full
!7za x "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip" -o"/content/gdrive/My Drive/unzipped_av_tfrecords/" -aos
This extracts some portion of the zip file before throwing an error. There are a variety of errors and sometimes the code will not even begin unzipping the file before throwing an error. This is the most common error:
Can not open the file as archive
ERROR: Unknown error -2147024891
Archives with Errors: 1
If I then attempt to rerun the !7za command, it may extract one or two more files from the zip archive before throwing this error:
terminate called after throwing an instance of 'CInBufferException'
It may also complain about particular files within the zip archive:
ERROR: Headers Error : drumming/yt-g0fi0iLRJCE_23.tfrecords
I have also tried using:
!unzip -n "/content/gdrive/My Drive/av_tfrecords/drumming_7zip.zip" -d "/content/gdrive/My Drive/unzipped_av_tfrecords/"
But that just begins throwing errors:
file #254: bad zipfile offset (lseek): 8137146368
file #255: bad zipfile offset (lseek): 8168710144
file #256: bad zipfile offset (lseek): 8207515648
Although I would prefer a solution in Colab, I have also tried an app available in GDrive named "Zip Extractor", but that too throws an error and has a data quota.
This has now happened across 4 zip files, and each time I try something new it takes a long time to try out because of the upload speeds. Any explanation of why this is happening and how I can resolve it would be greatly appreciated. I also understand there are probably alternatives to what I am trying to do; those would be appreciated too, even if they do not directly answer the question. Thank you!
I had the same problem and solved it by passing the command as an array of arguments rather than a single string:
new ProcessBuilder(new String[] {"7z", "x", fPath, "-o" + dir})
Use a command-line array, not one full line. Good luck!
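The same advice carries over to Python if you are scripting the extraction: pass the command as a list of arguments instead of one shell string. A minimal sketch, assuming 7za is on the PATH and using placeholder paths:
import subprocess

zip_path = "/path/to/drumming_7zip.zip"  # placeholder
out_dir = "/path/to/output"              # placeholder

# Each argument is its own list element, so no shell quoting is needed
subprocess.run(["7za", "x", zip_path, "-o" + out_dir, "-aos"], check=True)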

LOAD CSV command keeps using old file: location, ignores command input

I am using Community edition 3.0.5 on Windows 10. I made multiple attempts to execute a LOAD CSV command before being told that such files cannot reside on an external drive. When I moved the file to users/user/ and tried the LOAD CSV command again, I got the same message "Couldn't load the external resource at: file:/F:/Neo4j%20DBs/Data.gov%20Consumer%20Complaints/Consumer%20Complaints%20DB/import/Users/CharlieOh/Consumer_Complaints.csv", in spite of the fact that the command I entered was
"LOAD CSV WITH HEADERS FROM
'file:///Users/CharlieOh/Consumer_Complaints.csv' AS line
WITH line
LIMIT 1
RETURN line"
I tried to locate the file neo4j.conf and could only find C:\Program Files (x86)\Neo4j Community 3.2.2\Neo4j Community.install4j\i4jparams.conf. I even deleted the old DB and recreated the small amount of data, and I got the same error, which seems to indicate that LOAD CSV is failing across all my Neo4j databases. BTW, the %20 in the file specification was due to suggestions on Stack Overflow, as was using underscores to avoid blank spaces in the file specification. None of it worked, and now that I believe I may have solved the problem by putting the csv file in the user directory, the LOAD CSV function won't let me do it. One last thing: I am following the YouTube video https://www.youtube.com/watch?v=Eh_79goBRUk to learn how to load a csv file into neo4j.
The csv file needs to go in the import directory of the specific database. With Neo4j Desktop this is easy to identify by clicking on the Manage button of the database and then the open folder button. It looks like you've found it.
Once the database import directory is located, you reference files relative to it: the statement is LOAD CSV WITH HEADERS FROM 'file:///FN', where FN is your file name, including the .csv extension. You do NOT use the full path; that part is assumed.
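For what it's worth, the same statement can also be issued from Python with the official neo4j driver (pip install neo4j); a minimal sketch, with placeholder URI and credentials:
from neo4j import GraphDatabase

# Placeholder connection details; substitute your own
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = (
    "LOAD CSV WITH HEADERS FROM 'file:///Consumer_Complaints.csv' AS line "
    "WITH line LIMIT 1 RETURN line"
)
with driver.session() as session:
    for record in session.run(query):
        print(record["line"])
driver.close()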

Error when loading shape files into Bluemix dashDB

I am running into the following error when I am loading my shape files through the DashDB console:
My shape files are the following:
Would anyone have experience working with DashDB and ran into a similar problem?
UPDATE:
I downloaded a separate dataset with the following files, and I am still running into the same error:
Please find the sample files here: https://www.dropbox.com/s/bkrac971g9uc02x/deng.zip?dl=0
I brought the Shapefile into QGIS easily, so I knew the format was OK. I unzipped the Shapefile, changed the file names to lower-case and re-zipped it up. Then I was able to get further in the dashDB upload UI. I got to a message saying the SRS was unknown. I then used QGIS to convert the SRS (spatial reference system) into a known one -- EPSG:4269, NAD83, and I was then able to upload it into dashDB. Here's the version of your file that works:
https://dl.dropboxusercontent.com/u/8196680/dc.zip
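If you would rather script the reprojection than do it in QGIS, the same fix can be sketched in Python, assuming geopandas is installed and using placeholder file names:
import geopandas as gpd

# Read the original shapefile (its .dbf/.shx sidecar files must sit next to it)
gdf = gpd.read_file("dc.shp")

# Reproject to a known SRS (EPSG:4269, NAD83) and write a new shapefile
gdf.to_crs(epsg=4269).to_file("dc_nad83.shp")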

neo4j LOAD CSV returns Couldn't Load external resource

Trying CSV import to Neo4j - doesn't seem to be working.
I'm loading a local file using the syntax:
LOAD CSV WITH HEADERS FROM "file:///location/local/my.csv" AS csvDoc
Am wondering if there's something wrong with my CSV file, or if there's some syntax problem here.
If you didn't read the title, the error is:
Couldn't load the external resource at: file:/location/local/my.csv
[Neo.TransientError.Statement.ExternalResourceFailure]
Neo4j seems to need a full path spec to get a file on the local system.
On Linux or Mac, try
LOAD CSV FROM "file:/Users/you/location/local/my.csv"
On Windows, try
LOAD CSV FROM "file://c:/location/local/my.csv"
In the browser interface (Neo4j 3.0.3, MacOS 10.11) it looks like Neo4j prefixes your file path with $path_to_graph_database/import. So you could move your files there. If you are using a command line tool, then see this SO question.
Easy solution:
Once you have chosen your database location (in my case ReactomeGraphDB60, which is where I placed my database), go to that folder and create a folder inside it called "import".
Then, in the Cypher query, write (as an example):
LOAD CSV WITH HEADERS FROM "file:///ILClasiffStruct.csv" AS row
CREATE (n:Interleukines)
SET n = row
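If you prefer to script that step, copying the file into the import folder is straightforward in Python; a sketch with a placeholder database path:
import os
import shutil

db_root = r"C:\path\to\ReactomeGraphDB60"  # placeholder database location
import_dir = os.path.join(db_root, "import")

os.makedirs(import_dir, exist_ok=True)     # create the import folder if missing
shutil.copy("ILClasiffStruct.csv", import_dir)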

get data from .csv file, analyze, produce output - python3

I am trying to complete an assignment in Python3. It is very similar to the pdf found here
I have a few questions about how to get the information I need and, if possible, some code that could move me along. I am new to Python. Right now, with the code I have, I keep getting a "directory not found" error after running a function that tries to read the data. I know the .csv file should be in the directory where I saved it in WingIDE, but I can't get it to work correctly.
My first question: after reading each line of the .csv file from my get_file_list, what is the best way to take each category and feed it into an efficiency equation?
Here is my get_data_list function:
def get_data_list(filename):
    data_file = open(filename, "r")
    data_list = []
    for line_str in data_file:
        # Strip the newline and split the CSV line on commas
        data_list.append(line_str.strip().split(','))
    data_file.close()
    return data_list
When I run get_data_list("player_regular_season.csv"), I get the following error:
builtins.IOError: [Errno 2] No such file or directory:'player_regular_season.csv'
As a first step, put the data file in the same directory as the Python program and launch the program from that directory.
Also try a single-purpose script to learn how to work with directories. Learn the functions from the standard docs: 15.1.5. Files and Directories, namely os.getcwd() and os.chdir(path), and then 10.1. os.path — Common pathname manipulations, namely os.path.isfile(path).
But also read the docs for the other functions there to learn what is available.
Once you know how to work with filenames and paths, have a look at 13.1. csv — CSV File Reading and Writing. To avoid being overwhelmed by all the material, start from the end: 13.1.5. Examples of using the csv module.
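Putting that advice together, a minimal sketch (using the file name from the question) that first checks where Python is looking and then reads the file with the csv module:
import csv
import os

filename = "player_regular_season.csv"

print(os.getcwd())               # the directory Python is actually running in
print(os.path.isfile(filename))  # False means the file is not in that directory

with open(filename, newline="") as data_file:
    data_list = list(csv.reader(data_file))  # each row becomes a list of fields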