I am using django-storages with boto. Everything works fine if I let storages handle S3 file uploads in my model as public.
However, when I set the ACL to private on save/update, I get this error message:
S3ResponseError: 404 Not Found
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>https:/s3.amazonaws.com/mahbuckit/mods/1366814943/1363379259-re6pc-x-l4d2-the-witch-psd-jpgcopy.zip</Key><RequestId>9631D1222C18F323</RequestId><HostId>bmMgn75bqITigKJWM7L7JrjN2TcsPCslOt9d3LX6WvzxWbHcdBfeqBIdFSZsmhXW</HostId></Error>
This happens on add/update of a record.
This is the save() method of the model with the FileField; I override it to set the ACL to private.
def save(self, *args, **kwargs):
    super(MyModel, self).save(*args, **kwargs)
    if self.file:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket = conn.create_bucket(settings.AWS_STORAGE_BUCKET_NAME)
        k = boto.s3.key.Key(bucket)
        k.key = settings.MEDIA_URL + self.file.name
        k.set_acl('private')
However, the file itself saves fine; it's just these errors.
I found the problem. Credit to the author whose work I used and modified, http://www.gyford.com/phil/writing/2012/09/26/django-s3-temporary.php, who pointed out on Twitter that I was constructing the Key with a URL, which is the whole reason for the error message. Blame lack of sleep: the error message clearly says the key is missing, yet shows a URL instead of a key. That was my problem right there. The key should be a file name, or a path plus file name.
From what I've tested, k.key needs to reflect self.file.name:
k.key = self.file.name
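For completeness, a minimal sketch of the corrected override. It mirrors the code above, except that the key is built from the file name rather than a URL (and I've assumed the bucket already exists, hence get_bucket instead of create_bucket):

def save(self, *args, **kwargs):
    super(MyModel, self).save(*args, **kwargs)
    if self.file:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket = conn.get_bucket(settings.AWS_STORAGE_BUCKET_NAME)  # assumes the bucket already exists
        k = boto.s3.key.Key(bucket)
        k.key = self.file.name  # key = path + file name inside the bucket, not a URL
        k.set_acl('private')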
I can't find an option or workaround for this using pyarrow.csv.read_csv, and there are many other reasons why using pandas doesn't work for us.
We have CSV files whose final columns are effectively optional, and the source data doesn't always include empty cells for them. For example:
name,date,serial_number,prior_name,comments
A,2021-01-01,1234
B,2021-01-02,1235,A,Name changed for new version
C,2021-01-02,1236,B
This fails with an error like pyarrow.lib.ArrowInvalid: CSV parse error: Expected 5 columns, got 3:
I've got to assume that pyarrow can handle this, but I can't see how. Even the invalid row handler doesn't appear to let me return the "appropriate" value, only to "skip" these rows. That would even be okay if I could save them and append later, but as arrow tables are immutable, it just seems like there should be a more straightforward way to handle these cases.
As Pace noted, this is unfortunately not presently available in pyarrow. We actually process the "csv" files on the fly after extracting from a zip, and so creating intermediate files wasn't an option.
For anyone else needing a quick-ish way to handle this, I was able to get around it (and a couple of other issues) by creating a class that wraps the stream and overrides read(size, *args, **kwargs) to perform the stripping. Even with the middleman class, it's faster than attempting to load the data in pandas (and there are several other reasons why we aren't using pandas here).
Here's a template example:
class StreamWrapper:
    def __init__(self, obj=None):
        self.delegation = obj
        self.header = None

    def __getattr__(self, name):
        # any other call is delegated to the wrapped stream/object
        return getattr(self.delegation, name)

    def read(self, size=None, *args, **kwargs):
        bytedata = self.delegation.read(size, *args, **kwargs)
        # .. the rest of the logic pre-processes the byte data,
        # identifies the header and number of columns (which are retained persistently),
        # and then strips out extra columns when encountered
        return bytedata
This allows our call to be:
df = pyarrow.csv.read_csv(
    StreamWrapper(zipfile_stream),
    parse_options = ...
)
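For reference, here is a rough sketch of what the body of read() could look like for the exact case in the question, padding short rows out to the header's column count. This padding and line-buffering logic is my own assumption rather than the code described above (which strips extra columns); the naive comma split does not handle quoted fields, and it assumes each block pyarrow requests contains at least one newline, which holds for its default 1 MB blocks:

class PaddingStreamWrapper:
    # Hypothetical variant of the wrapper above: pads short CSV rows with
    # empty trailing fields so every row matches the header's column count.
    def __init__(self, obj=None):
        self.delegation = obj
        self.n_cols = None  # learned from the header line
        self.tail = b""     # possibly incomplete last line from the previous read

    def __getattr__(self, name):
        # everything else is delegated to the wrapped stream
        return getattr(self.delegation, name)

    def read(self, size=-1, *args, **kwargs):
        data = self.delegation.read(size, *args, **kwargs)
        raw = self.tail + data
        self.tail = b""
        if data and not raw.endswith(b"\n"):
            # hold back the incomplete final line until the next read
            raw, _, self.tail = raw.rpartition(b"\n")
        out = []
        for line in raw.splitlines():
            fields = line.split(b",")
            if self.n_cols is None:
                self.n_cols = len(fields)  # first complete line is the header
            elif len(fields) < self.n_cols:
                fields += [b""] * (self.n_cols - len(fields))
            out.append(b",".join(fields))
        return b"\n".join(out) + (b"\n" if out else b"")

pyarrow.csv.read_csv(PaddingStreamWrapper(zipfile_stream)) would then see a rectangular file.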
I have a Django application. Sometimes in production I get an error when uploading data that one of the values is too long. It would be very helpful for debugging if I could see which value was the one that went over the limit. Can I configure this somehow? I'm using MySQL.
It would also be nice if I could enable/disable this on a per-model or column basis so that I don't leak user data to error logs.
When creating model instances from outside sources, one must take care to validate the input or have other guarantees that this data cannot violate constraints.
When not calling at least full_clean() on the model but directly calling save(), one bypasses Django's validators and will only get alerted to the problem by the database driver, at which point it's harder to obtain diagnostics:
import json

from django.core.exceptions import ValidationError
from django.db import models


class JsonImportManager(models.Manager):
    # 'import' is a reserved word in Python, so the method needs another name.
    def import_json(self, json_string: str) -> int:
        data_list = json.loads(json_string)  # list of objects => list of dicts
        failed = 0
        for data in data_list:
            obj = self.model(**data)
            try:
                obj.full_clean()
            except ValidationError as e:
                print(e.message_dict)  # or use a better formatting function
                failed += 1
            else:
                obj.save()
        return failed
This is of course very simple, but it's a good boilerplate to get started with.
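For example, here is a rough sketch of how the manager might be wired up and what the output looks like; the MyModel fields and the JSON payload are made up for illustration:

class MyModel(models.Model):
    name = models.CharField(max_length=10)

    objects = JsonImportManager()

# ValidationError.message_dict maps field names to their errors, so a
# too-long value surfaces as something like
# {'name': ['Ensure this value has at most 10 characters (it has 26).']}
# instead of an opaque MySQL "Data too long" error.
failed = MyModel.objects.import_json('[{"name": "this value is far too long"}]')
print(failed)  # number of objects that failed validation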
I am a student struggling with an assignment. Our professor wants us to create a server using CherryPy, and her test script sends JSON data to our server, which we must decode. She gave us a line to help us, but I have no idea how to use it. This is the relevant part of my code. What does the cherrypy.request line do, and how does it accept incoming JSON data? Thanks :-)
def GET(self):
    output = {'result': 'success'}
    # this is the line she wants us to use below:
    # it reads exactly Content-Length bytes of the raw request body
    body = cherrypy.request.body.read(int(cherrypy.request.headers['Content-Length']))
    try:
        body = json.loads(body)  # parse the raw bytes as JSON
    except KeyError as ex:
        output['results'] = 'error'
        output['message'] = str(ex)
    return json.dumps(body)
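To unpack what that line does: cherrypy.request.headers['Content-Length'] is the body size the client declared, and cherrypy.request.body.read(n) returns that many raw bytes, which json.loads() then turns into Python objects. A minimal, self-contained sketch under those assumptions (the Echo class, mount point, and config are made up for illustration, and I've used POST since that's where a JSON body normally arrives):

import json
import cherrypy

class Echo(object):
    exposed = True  # required when using the MethodDispatcher

    def POST(self):
        # Content-Length tells us how many bytes the client sent in the body;
        # body.read(n) returns exactly those raw bytes.
        length = int(cherrypy.request.headers['Content-Length'])
        raw = cherrypy.request.body.read(length)
        data = json.loads(raw)  # decode the JSON payload into Python objects
        return json.dumps({'result': 'success', 'echo': data})

if __name__ == '__main__':
    conf = {'/': {'request.dispatch': cherrypy.dispatch.MethodDispatcher()}}
    cherrypy.quickstart(Echo(), '/', conf)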
I am making an R-script to get all of the mentions (#username) of a specific set of users.
My first issue isn't a big deal. I work at home as well as at work. At work, the code works fine; at home, I get Error 32 - Could not authenticate you from OAuth, using the exact same code, key, secret, and token. I have tried resetting my secret key/token, with the same result. Not a showstopper, since I can log in remotely, but it's frustrating.
The REAL issue here...
I construct a URL (ex: final_url = "https://api.twitter.com/1.1/search/tweets.json?q=#JimFKenney&until=2015-10-25&result_type=recent&count=100")
Then I search twitter for my query of #usernameDesired to get all the comments where they were mentioned.
mentions = GET(final_url, sig)
This works fine, but then I want my data in a usable format so I do...
library(rjson)
#install.packages("jsonlite", repos="http://cran.rstudio.com/")
library(jsonlite)
#install.packages("bit64", repos="http://cran.rstudio.com/")
json = content(mentions)
I then get the following error -
$statuses
Error in .subset2(x, i, exact = exact) : subscript out of bounds
I don't have the first idea what could be causing this.
Any help is greatly appreciated.
EDIT 1: For clarity, I get the error when trying to see what is in json. The line json = content(mentions) executes fine; I then type json to see what is in the variable, and I get the above error, which starts with $statuses.
I am new to CouchDB. I need to get 60 or more JSON files in a minute from a server.
I have to upload these JSON files to CouchDB individually as soon as I receive them.
I installed CouchDB on my Linux machine.
I hope someone can help me with my requirement, if possible with pseudocode.
My idea:
Write a Python script that uploads the JSON files to CouchDB. Every JSON file should become its own document, and the data present in the JSON must be inserted into CouchDB exactly as it appears in the file (the specified format with values).
Note:
These JSON files are transactional; one file is generated every second, so I need to read each file, upload it to CouchDB in the same format, and on successful upload archive the file into a different folder on the local system.
A Python program to parse the JSON and insert it into CouchDB:
import errno
import glob
import json
from pprint import pprint

import couchdb

# Connect to CouchDB; assumes it is running on localhost:5984.
couch = couchdb.Server('http://localhost:5984/')
#couch.resource.credentials = (USERNAME, PASSWORD)
db = couch['mydb']

path = 'C:/Users/Desktop/CouchDB_Python/Json_files/*.json'

for name in glob.glob(path):  # 'file' is a builtin type, so 'name' is a less ambiguous variable name.
    try:
        with open(name) as f:  # No need to specify 'r': it is the default mode.
            data = json.load(f)
        db.save(data)   # Each JSON file becomes its own CouchDB document.
        pprint(data)
    except IOError as exc:
        if exc.errno != errno.EISDIR:  # Do not fail if a directory is found, just ignore it.
            raise                      # Propagate other kinds of IOError.
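The "archive on successful upload" requirement from the question isn't covered by the script above. A small sketch of that step, assuming a hypothetical archive folder next to the source files:

import os
import shutil

ARCHIVE_DIR = 'C:/Users/Desktop/CouchDB_Python/Archived_files'  # hypothetical folder

def archive(name):
    # Move a successfully uploaded JSON file into the archive folder.
    os.makedirs(ARCHIVE_DIR, exist_ok=True)
    shutil.move(name, os.path.join(ARCHIVE_DIR, os.path.basename(name)))

# Call archive(name) right after db.save(data) succeeds in the loop above.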
I would use the CouchDB bulk API, even though you have specified that you need to send the documents to the DB one by one. For example, implementing a simple queue that gets flushed every, say, 5-10 seconds via a bulk-doc call will greatly increase the performance of your application.
There is one quirk: for GETs you need to know the IDs of the docs you want from the DB. But for PUTs it is perfect. (That is not entirely true; you can get ranges of docs with a bulk operation if the IDs you use for your docs sort nicely.)
From my experience working with CouchDB, I have a hunch that you are dealing with transactional documents in order to compile them into some sort of summary result and act on that data accordingly (maybe creating the next transactional doc in the series). For that you can rely on CouchDB by using 'reduce' functions on the views you create. It takes a little practice to get a reduce function working properly, and it depends heavily on what you actually want to achieve and what data you are prepared to emit from the view, so I can't really provide more detail on that.
So in the end the app logic would go something like this:
get _design/someDesign/_view/yourReducedView
calculate new transaction
add transaction to queue
onTimeout
send all in transaction queue
If I got the first part wrong about why you are using transactional docs, all that would really change in my app logic is the step where you get those transactional docs.
Also, before writing your own 'reduce' function, have a look at the built-in ones (they are a lot faster than anything outside the DB engine can do):
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
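As a rough illustration of the queue-and-flush idea described above (the class name and interval are made up; Database.update() is couchdb-python's wrapper around the _bulk_docs API):

import threading
import time

import couchdb

class BulkQueue:
    # Hypothetical helper: buffers incoming docs and flushes them to CouchDB
    # in a single bulk request every few seconds.
    def __init__(self, db, interval=5.0):
        self.db = db
        self.interval = interval
        self.pending = []
        self.lock = threading.Lock()

    def add(self, doc):
        with self.lock:
            self.pending.append(doc)

    def flush(self):
        with self.lock:
            docs, self.pending = self.pending, []
        if docs:
            # Database.update() sends all docs in one _bulk_docs request and
            # yields (success, doc_id, rev_or_exception) tuples.
            for success, doc_id, rev_or_exc in self.db.update(docs):
                if not success:
                    print('failed to save %s: %s' % (doc_id, rev_or_exc))

    def run(self):
        while True:
            time.sleep(self.interval)
            self.flush()

# usage sketch
# db = couchdb.Server('http://localhost:5984/')['mydb']
# queue = BulkQueue(db)
# threading.Thread(target=queue.run, daemon=True).start()
# queue.add({'type': 'transaction', 'amount': 42})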
EDIT:
Since you are just getting started, I strongly recommend having a look at the CouchDB Definitive Guide.
NOTE FOR LATER:
Here is one hidden pitfall (well, maybe not so hidden, but not an obvious thing for a newcomer to look out for): when you write a reduce function, make sure it does not produce too much output for an unbounded query. That will slow the entire view down dramatically, even when you pass reduce=false while querying it.
So you need to get JSON documents from a server and send them to CouchDB as you receive them. A Python script would work fine. Here is some pseudo-code:
loop (until no more docs)
get new JSON doc from server
send JSON doc to CouchDB
end loop
In Python, you could use requests to send the documents to CouchDB and probably to get the documents from the server as well (if it is using an HTTP API).
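A minimal sketch of that loop with requests; the database name and the get_next_doc() helper are placeholders for wherever your JSON actually comes from:

import requests

COUCH_URL = 'http://localhost:5984/mydb'  # assumed database name

def get_next_doc():
    # Placeholder: fetch the next JSON document from your source server,
    # or return None when there are no more documents.
    raise NotImplementedError

while True:
    doc = get_next_doc()
    if doc is None:
        break
    # POSTing a JSON body to the database URL creates a new document
    # with an auto-generated _id.
    resp = requests.post(COUCH_URL, json=doc)
    resp.raise_for_status()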
You might want to check out the pycouchdb module for Python 3. I've used it myself to upload lots of JSON objects into a CouchDB instance. My project does pretty much the same thing as you describe, so you can take a look at my project Pyro on GitHub for details.
My class looks like this:
import sys

import pycouchdb


class MyCouch:
    """ COMMUNICATES WITH COUCHDB SERVER """

    def __init__(self, server, port, user, password, database):
        # ESTABLISHING CONNECTION
        self.server = pycouchdb.Server("http://" + user + ":" + password + "@" + server + ":" + port + "/")
        self.db = self.server.database(database)

    def check_doc_rev(self, doc_id):
        # CHECKS REVISION OF SUPPLIED DOCUMENT
        try:
            doc = self.db.get(doc_id)
            return doc["_rev"]
        except Exception:
            return -1

    def update(self, all_computers):
        # UPDATES DATABASE WITH JSON STRING
        try:
            result = self.db.save_bulk(all_computers, transaction=False)
            sys.stdout.write(" Updating database\n")
            sys.stdout.flush()
            return result
        except Exception as ex:
            sys.stdout.write("Updating database failed\n")
            sys.stdout.write("Exception: ")
            print(ex)
            sys.stdout.flush()
            return None
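A quick usage sketch; the host, credentials, database name, and documents below are placeholders:

couch = MyCouch('localhost', '5984', 'admin', 'secret', 'mydb')
docs = [{'_id': 'host-01', 'os': 'linux'}, {'_id': 'host-02', 'os': 'windows'}]
result = couch.update(docs)            # bulk-saves the whole list in one request
print(couch.check_doc_rev('host-01'))  # prints the revision, or -1 if the doc does not exist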
Let me know if you have any questions; I will be more than glad to help if you find some of my code usable.