Jython: test to prevent exception "Cannot create PyString with non-byte value"?

Jython 2.7.0 (final release). OS: Windows 7 (64-bit).
This code:
keys = javax.swing.UIManager.getDefaults().keys()
while keys.hasMoreElements():
    key = keys.nextElement()
    logger.info( "=== key %s" % str( key ) )
    try:
        value = javax.swing.UIManager.get(key)
    except java.lang.Throwable, t:
        logger.error( "=== thrown %s" % str( t ) )
produces all sorts of keys... until it outputs
=== key PasswordField.echoChar
it then throws
java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
I'm aware this is a known bug in Jython ... just wondering if there is a way of testing for this before the exception is thrown?

For me, this is triggered when calling print() directly on a Java HashMap that contains any value with a Unicode character that does not fit in one byte (code point above 255). A simple Python version of isBytes from the PyString class is one way to detect it, but frankly I don't think that is a good option unless you (a) know which element of the data is triggering the issue and/or (b) intend to mask or fix the offending values. Probably the best solution is to just catch the exception.
This definitely affects Jython >= 2.7 and is a very annoying bug when troubleshooting. I just commented out the IllegalArgumentException in the PyString class code and recompiled. Now Jython will happily print out HashMaps directly and replace the Unicode characters with ?, just like it did in previous versions. I would guess that this causes issues somewhere, maybe in code that deals with a lot of Unicode, but I haven't found any so far.
Catch Exception:
from java.lang import IllegalArgumentException
keys = javax.swing.UIManager.getDefaults().keys()
while keys.hasMoreElements():
    key = keys.nextElement()
    logger.info( "=== key %s" % str( key ) )
    try:
        value = javax.swing.UIManager.get(key)
    except java.lang.Throwable, t:
        try:
            logger.error( "=== thrown %s" % str( t ) )
        except IllegalArgumentException:
            # fix it or w/e
            pass
Python version of isBytes:
def test_if_char(value):
    for e in value:
        if ord(e) > 255: return False
    return True
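
If you do go the "mask the values" route mentioned above, the check can be paired with a small masking helper. This is only a sketch: is_byte_safe and mask_non_bytes are names made up for illustration, and it assumes the value has already reached Python as a unicode string (it does not touch Jython's PyString internals).

```python
def is_byte_safe(text):
    """Return True if every character has a code point of 255 or less."""
    return all(ord(ch) <= 0xFF for ch in text)


def mask_non_bytes(text, placeholder=u"?"):
    """Replace every character above U+00FF with a placeholder."""
    return u"".join(ch if ord(ch) <= 0xFF else placeholder for ch in text)


# Hypothetical values, chosen only to exercise both branches.
print(is_byte_safe(u"PasswordField.echoChar"))
print(mask_non_bytes(u"echo\u25cfChar"))
```

Masking loses information, so it is probably only worth doing for log output, not for values you pass on.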

If you are on Jython 2.7.0, you can use the following (Java) call to create a string from arbitrary Unicode text:
PyString str = Py.newStringOrUnicode("颜军");


Python 3 psycopg2 COPY from stdin failed: error in .read()

I am trying to apply the code found on this page, in particular part 'Copy Data from String Iterator' of the Table of Contents, but run into an issue with my code.
Since not all lines coming from the generator (here log_lines) can be imported into the PostgreSQL database, I try to filter the correct lines (here row) using itertools.filterfalse like in the codeblock below:
def copy_string_iterator(connection, log_lines) -> None:
    with connection.cursor() as cursor:
        create_staging_table(cursor)
        log_string_iterator = StringIteratorIO((
            '|'.join(map(clean_csv_value, (
                row['date'],
                row['time'],
                row['cs_uri_query'],
                row['s_contentpath'],
                row['sc_status'],
                row['s_computername'],
                ...
                row['sc_substates'],
                row['s_port'],
                row['cs_version'],
                row['c_protocol'],
                row.update({'cs_cookie':'x'}),
                row['timetakenms'],
                row['cs_uri_stem'],
            ))) + '\n')
            for row in filterfalse(lambda line: "#" in line.get('date'), log_lines)
        )
        cursor.copy_from(log_string_iterator, 'log_table', sep='|')
When I run this, cursor.copy_from() gives me the following error:
QueryCanceled: COPY from stdin failed: error in .read() call
CONTEXT: COPY log_table, line 112910
I understand why this error happens: in the test file I use, only 112909 lines meet the filterfalse condition. But why does it try to copy line 112910 and throw the error instead of just stopping?
Since Python doesn't have a null-coalescing operator, add something like:
(map(clean_csv_value, (
    row['date'] if 'date' in row else None,
    :
    row['cs_uri_stem'] if 'cs_uri_stem' in row else None,
))) + '\n')
for each of your fields so you can handle any fields missing from the JSON file. Of course, the fields should be nullable in the database if you use None; otherwise, replace None with some default value for that field.
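
The same idea can be written more compactly with dict.get, which returns None for a missing key. The sketch below is not the asker's code: clean_csv_value here is a hypothetical stand-in for their helper, FIELDS is a shortened field list, and '\N' is assumed as the NULL marker (the default for PostgreSQL's text COPY format). It also guards the comment filter, since "#" in None raises a TypeError when 'date' is missing.

```python
from itertools import filterfalse

# Shortened, illustrative field list; the real table has more columns.
FIELDS = ('date', 'time', 'cs_uri_stem')


def clean_csv_value(value):
    """Hypothetical stand-in for the question's helper."""
    if value is None:
        return r'\N'  # assumed NULL marker for COPY in text format
    return str(value).replace('\n', '\\n')


def is_comment(row):
    """True for header/comment rows; tolerates a missing or None 'date'."""
    return '#' in (row.get('date') or '')


def row_to_line(row, fields=FIELDS, sep='|'):
    """Build one COPY line, substituting None for any missing field."""
    return sep.join(clean_csv_value(row.get(name)) for name in fields) + '\n'


log_lines = [
    {'date': '#Fields'},
    {'date': '2020-01-01', 'time': '12:00'},  # 'cs_uri_stem' is missing
]
lines = [row_to_line(row) for row in filterfalse(is_comment, log_lines)]
print(lines)
```

Whether a missing field should become NULL or a default value depends on the column definitions, as noted above.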

cythonize under py3.6.4 Cannot convert 'basestring' object to bytes implicitly. This is not portable

This code snippet works just fine under Python 3.6.4 but triggers a portability error when placed in a .pyx file. I could use some help figuring out how best to format bytes (Python 3.5.1+) in Cython.
EDIT: Changing this in light of DavidW's comment.
The following works in Python 3.6.4 under IPython:
def py_foo():
    bytes_1 = b'bytes 1'
    bytes_2 = b'bytes 2'
    return b'%(bytes_1)b %(bytes_2)b' % {
        b'bytes_1': bytes_1,
        b'bytes_2': bytes_2}
As hoped this results in:
print(py_foo())
b'bytes 1 bytes 2'
In the Cython version, the only changes to the code are the function name, a declared return type, and declarations for the two variables:
%load_ext Cython
# Cython==0.28
followed by:
%%cython
cpdef bytes cy_foo():
    cdef:
        bytes bytes_1, bytes_2
    bytes_1 = b'bytes 1'
    bytes_2 = b'bytes 2'
    return b'%(bytes_1)b %(bytes_2)b' % {
        b'bytes_1': bytes_1,
        b'bytes_2': bytes_2}
Results in:
Error compiling Cython file:
....
return b'%(bytes_1)b %(bytes_2)b' % {
^
..._cython_magic_b0aa5be86bdfdf75b98df1af1a2394af.pyx:7:38: Cannot convert 'basestring' object to bytes implicitly. This is not portable.
-djv
I'm not sure if this is a useful answer or just a more detailed diagnosis, but: the issue is with the return type. If you do:
cpdef cy_foo1(): # no return type specified
    # everything else exactly the same
then it's happy. If you do
cpdef bytes cy_foo2():
    # everything else the same
    return bytes(b'%(bytes_1)b %(bytes_2)b' % {
        b'bytes_1': bytes_1,
        b'bytes_2': bytes_2})
then it's happy. If you do
def mystery_function_that_returns_not_bytes():
    return 1
cpdef bytes cy_foo3():
    return mystery_function_that_returns_not_bytes()
then it compiles happily but gives a runtime exception (as you would expect).
The issue seems to be that Cython knows bytes % something returns a basestring, but it isn't confident that it returns bytes and isn't prepared to leave the check until runtime (unlike the cases where it's totally sure, or completely unsure, when it will leave it until runtime).
The above examples show a couple of ways of working round it. Personally, I'd just remove the return type - you don't get a lot of benefit from typing Python objects such as bytes anyway. You should probably also report this as a bug to https://github.com/cython/cython/issues
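
As a plain-Python sanity check (this is not Cython, and it says nothing about how Cython infers types), you can confirm that under CPython 3.5+ the formatting expression really does produce bytes at runtime, which is consistent with the problem being in compile-time inference:

```python
def py_foo():
    bytes_1 = b'bytes 1'
    bytes_2 = b'bytes 2'
    # %-formatting on bytes with a mapping; the keys must be bytes too.
    return b'%(bytes_1)b %(bytes_2)b' % {
        b'bytes_1': bytes_1,
        b'bytes_2': bytes_2}


result = py_foo()
print(type(result).__name__, result)
```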

Exception handling using mysql with twisted adbapi and scrapy

I'm using this scrapy pipeline. If there is any error in the SQL in the _insert_record function, it fails silently. For example, if a column name is misspelled, like this
def _insert_record(self, tx, item):
    print "before tx.execute"
    result = tx.execute(
        """ INSERT INTO table(col_one, col_typo, col_three) VALUES (1,2,3)"""
    )
    print "after tx.execute"
    if result > 0:
        self.stats.inc_value('database/items_added')
then nothing is output after "before tx.execute". There is a handle_error method, but that's not called either. How can I catch and handle such errors?
I just needed to surround it with try...except:
try:
    result = tx.execute(
        """INSERT INTO table(col_one, col_typo, col_three) VALUES (1,2,3)"""
    )
except Exception,e:
    print str(e)
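
For anyone who wants to try the pattern without a MySQL server, here is a self-contained sketch of the same try/except idea. It is not the pipeline's code: sqlite3 from the standard library is swapped in for MySQL and adbapi, the items table and insert_record function are made up for illustration, and it uses Python 3 syntax and logging instead of print.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def insert_record(cursor):
    """Run the INSERT; log and report failure instead of failing silently."""
    try:
        cursor.execute(
            "INSERT INTO items (col_one, col_typo, col_three) VALUES (1, 2, 3)"
        )
    except sqlite3.Error as exc:
        # The misspelled column ends up here rather than vanishing.
        logger.error("insert failed: %s", exc)
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (col_one, col_two, col_three)")
ok = insert_record(conn.cursor())
print(ok)
```

Catching the driver's own error class (here sqlite3.Error) is a bit narrower than a bare Exception, which makes it less likely that unrelated bugs get swallowed.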

Error in fromJSON(paste(raw.data, collapse = "")) : unclosed string

I am using the R package rjson to download weather data from Wunderground.com. Often I leave the program to run and there are no problems, with the data being collected fine. However, often the program stops running and I get the following error message:
Error in fromJSON(paste(raw.data, collapse = "")) : unclosed string
In addition: Warning message:
In readLines(conn, n = -1L, ok = TRUE) :
incomplete final line found on 'http://api.wunderground.com/api/[my_API_code]/history_20121214pws:1/q/pws:IBIRMING7.json'
Does anyone know what this means, and how I can avoid it since it stops my program from collecting data as I would like?
Many thanks,
Ben
I can recreate your error message using the rjson package.
Here's an example that works.
rjson::fromJSON('{"x":"a string"}')
# $x
# [1] "a string"
If we omit a double quote from the value of x, then we get the error message.
rjson::fromJSON('{"x":"a string}')
# Error in rjson::fromJSON("{\"x\":\"a string}") : unclosed string
The RJSONIO package behaves slightly differently. Rather than throwing an error, it silently returns a NULL value.
RJSONIO::fromJSON('{"x":"a string}')
# $x
# NULL

Is there a tool to check database integrity in Django?

The MySQL database powering our Django site has developed some integrity problems; e.g. foreign keys that refer to nonexistent rows. I won't go into how we got into this mess, but I'm now looking at how to fix it.
Basically, I'm looking for a script that scans all models in the Django site, and checks whether all foreign keys and other constraints are correct. Hopefully, the number of problems will be small enough so they can be fixed by hand.
I could code this up myself but I'm hoping that somebody here has a better idea.
I found django-check-constraints but it doesn't quite fit the bill: right now, I don't need something to prevent these problems, but to find them so they can be fixed manually before taking other steps.
Other constraints:
Django 1.1.1; upgrading has been determined to break things
MySQL 5.0.51 (Debian Lenny), currently with MyISAM tables
Python 2.5, might be upgradable but I'd rather not right now
(Later, we will convert to InnoDB for proper transaction support, and maybe foreign key constraints on the database level, to prevent similar problems in the future. But that's not the topic of this question.)
I whipped up something myself. The management script below should be saved in myapp/management/commands/checkdb.py. Make sure that intermediate directories have an __init__.py file.
Usage: ./manage.py checkdb for a full check; use --exclude app.Model or -e app.Model to exclude the model Model in the app app.
from django.core.management.base import BaseCommand, CommandError
from django.core.management.base import NoArgsCommand
from django.core.exceptions import ObjectDoesNotExist
from django.db import models
from optparse import make_option
from lib.progress import with_progress_meter
def model_name(model):
    return '%s.%s' % (model._meta.app_label, model._meta.object_name)
class Command(BaseCommand):
    args = '[-e|--exclude app_name.ModelName]'
    help = 'Checks constraints in the database and reports violations on stdout'
    option_list = NoArgsCommand.option_list + (
        make_option('-e', '--exclude', action='append', type='string', dest='exclude'),
    )
    def handle(self, *args, **options):
        # TODO once we're on Django 1.2, write to self.stdout and self.stderr instead of plain print
        exclude = options.get('exclude', None) or []
        failed_instance_count = 0
        failed_model_count = 0
        for app in models.get_apps():
            for model in models.get_models(app):
                if model_name(model) in exclude:
                    print 'Skipping model %s' % model_name(model)
                    continue
                fail_count = self.check_model(app, model)
                if fail_count > 0:
                    failed_model_count += 1
                    failed_instance_count += fail_count
        print 'Detected %d errors in %d models' % (failed_instance_count, failed_model_count)
    def check_model(self, app, model):
        meta = model._meta
        if meta.proxy:
            print 'WARNING: proxy models not currently supported; ignored'
            # Return an explicit count so the caller always gets a number
            return 0
        # Define all the checks we can do; they return True if they are ok,
        # False if not (and print a message to stdout)
        def check_foreign_key(model, field):
            foreign_model = field.related.parent_model
            def check_instance(instance):
                try:
                    # name: name of the attribute containing the model instance (e.g. 'user')
                    # attname: name of the attribute containing the id (e.g. 'user_id')
                    getattr(instance, field.name)
                    return True
                except ObjectDoesNotExist:
                    print '%s with pk %s refers via field %s to nonexistent %s with pk %s' % \
                        (model_name(model), str(instance.pk), field.name, model_name(foreign_model), getattr(instance, field.attname))
                    # Explicit False, matching the contract described above
                    return False
            return check_instance
        # Make a list of checks to run on each model instance
        checks = []
        for field in meta.local_fields + meta.local_many_to_many + meta.virtual_fields:
            if isinstance(field, models.ForeignKey):
                checks.append(check_foreign_key(model, field))
        # Run all checks
        fail_count = 0
        if checks:
            for instance in with_progress_meter(model.objects.all(), model.objects.count(), 'Checking model %s ...' % model_name(model)):
                for check in checks:
                    if not check(instance):
                        fail_count += 1
        return fail_count
I'm making this a community wiki because I welcome any and all improvements to my code!
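
If loading every instance through the ORM is too slow, a different technique (not what the command above does) is to let the database find dangling references with a LEFT JOIN. The sketch below only illustrates that idea: it uses sqlite3 so it runs anywhere, the author and book tables are made up, and find_orphans is a hypothetical helper. Table and column names are interpolated into the SQL, so only pass trusted identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO author (id) VALUES (1), (2);
    INSERT INTO book (id, author_id) VALUES (10, 1), (11, 2), (12, 99);
""")


def find_orphans(conn, child, fk_column, parent, parent_pk="id"):
    """Return (child pk, fk value) pairs whose fk matches no parent row."""
    sql = (
        "SELECT c.id, c.{fk} FROM {child} AS c "
        "LEFT JOIN {parent} AS p ON c.{fk} = p.{pk} "
        "WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    ).format(child=child, fk=fk_column, parent=parent, pk=parent_pk)
    return conn.execute(sql).fetchall()


orphans = find_orphans(conn, "book", "author_id", "author")
print(orphans)  # [(12, 99)]
```

On the real site, the table and column names would have to come from each model's metadata, and the query would run against MySQL rather than SQLite.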
Thomas' answer is great but is now a bit out of date.
I have updated it as a gist to support Django 1.8+.