IMDbPY Importing to MySQL Problem

I am having a problem importing IMDb data from text files into a MySQL database using the imdbpy2sql.py script.
It throws the following error. It looks as if the exception-handling code tries to insert a duplicate record into the cast_info table with an existing primary key.
Can anyone help me fix this problem or suggest a workaround?
SCANNING actor: Hall, Stephan
SCANNING actor: Halsey, William F.
* FLUSHING CharactersCache...
* TOO MANY DATA (100000 items in CharactersCache), recursion: 1
* SPLITTING (run 1 of 2), recursion: 1
* FLUSHING CharactersCache...
Traceback (most recent call last):
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 2951, in <module>
run()
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 2812, in run
castLists(_charIDsList=characters_imdbIDs)
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 1576, in castLists
doCast(f, roleid, rolename)
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 1535, in doCast
cid = CACHE_CID.addUnique(role)
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 957, in addUnique
else: return self.add(key, miscData)
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 950, in add
self[key] = c
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 860, in __setitem__
self.flush()
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 912, in flush
self.flush(quiet=quiet, _recursionLevel=_recursionLevel)
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 883, in flush
self._toDB(quiet)
File "D:\project\IMDB\IMDbPY-4.7\bin\imdbpy2sql.py", line 1186, in _toDB
CURS.executemany(self.sqlstr, self.converter(l))
File "C:\Python27\lib\site-packages\MySQLdb\cursors.py", line 206, in executemany
r = r + self.execute(query, a)
File "C:\Python27\lib\site-packages\MySQLdb\cursors.py", line 174, in execute
self.errorhandler(self, exc, value)
File "C:\Python27\lib\site-packages\MySQLdb\connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
_mysql_exceptions.IntegrityError: (1062, "Duplicate entry '745684' for key 'PRIMARY'")

A lot of similar problems were fixed in the development version.
Please try again with the version in the Mercurial repository:
https://bitbucket.org/alberanid/imdbpy/
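The error itself means that one batch tried to insert a primary key that an earlier batch had already written. Below is a minimal sketch of that failure mode, assuming a cache that flushes the same IDs twice, with sqlite3 standing in for MySQL so it runs without a server (SQLite raises IntegrityError where MySQL reports error 1062). The table name, row layout and the FlushOnceCache class are hypothetical illustrations, not IMDbPY code and not a patch for imdbpy2sql.py.

```python
import sqlite3

# Hypothetical table; IMDbPY's real schema is different.
INSERT_SQL = "INSERT INTO char_name (id, name) VALUES (?, ?)"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE char_name (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


class FlushOnceCache:
    """Buffer (id, name) rows and write each id to the database at most once."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = {}
        self.flushed_ids = set()

    def add(self, row_id, name):
        # Ignore ids that an earlier flush already inserted.
        if row_id not in self.flushed_ids:
            self.pending[row_id] = name

    def flush(self):
        rows = [(i, n) for i, n in self.pending.items() if i not in self.flushed_ids]
        self.conn.executemany(INSERT_SQL, rows)
        self.flushed_ids.update(i for i, _ in rows)
        self.pending.clear()


# Naive version: flushing the same batch twice hits the primary key.
conn = make_db()
batch = [(745684, "Character A"), (745685, "Character B")]
conn.executemany(INSERT_SQL, batch)
duplicate_error = None
try:
    conn.executemany(INSERT_SQL, batch)  # second flush of the same ids
except sqlite3.IntegrityError as exc:
    duplicate_error = exc

# Guarded version: the repeated id is skipped instead of re-inserted.
cache = FlushOnceCache(make_db())
cache.add(745684, "Character A")
cache.add(745685, "Character B")
cache.flush()
cache.add(745684, "Character A")
cache.flush()
```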

Related

Getting "make_aware expects a naive datetime" while migrating

I have developed an application with Django.
It works fine on my PC with the SQLite backend.
But when I try to go live on a Linux server with a MySQL backend, I get the error below during the first migration.
(env-bulkmailer) [root#localhost bulkmailer]# python3 manage.py migrate
Traceback (most recent call last):
File "/var/www/bulkmailer-folder/bulkmailer/manage.py", line 22, in <module>
main()
File "/var/www/bulkmailer-folder/bulkmailer/manage.py", line 18, in main
execute_from_command_line(sys.argv)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/__init__.py", line 446, in execute_from_command_line
utility.execute()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/__init__.py", line 440, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/base.py", line 402, in run_from_argv
self.execute(*args, **cmd_options)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/base.py", line 448, in execute
output = self.handle(*args, **options)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/base.py", line 96, in wrapped
res = handle_func(*args, **kwargs)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/core/management/commands/migrate.py", line 114, in handle
executor = MigrationExecutor(connection, self.migration_progress_callback)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/executor.py", line 18, in __init__
self.loader = MigrationLoader(self.connection)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/loader.py", line 58, in __init__
self.build_graph()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/loader.py", line 235, in build_graph
self.applied_migrations = recorder.applied_migrations()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/migrations/recorder.py", line 82, in applied_migrations
return {
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/query.py", line 394, in __iter__
self._fetch_all()
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/query.py", line 1866, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/query.py", line 117, in __iter__
for row in compiler.results_iter(results):
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/models/sql/compiler.py", line 1336, in apply_converters
value = converter(value, expression, connection)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/db/backends/mysql/operations.py", line 331, in convert_datetimefield_value
value = timezone.make_aware(value, self.connection.timezone)
File "/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/utils/timezone.py", line 291, in make_aware
raise ValueError("make_aware expects a naive datetime, got %s" % value)
ValueError: make_aware expects a naive datetime, got 2022-11-20 12:39:18.866299+00:00
In settings:
USE_TZ = True
I have also run mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql, as the Django docs describe.
I am using Django 4.1.3 and MySQL Community 8.0.30.
Thanks in advance.
Ran into the same issue. At some point, Django assumes that the data is timezone-naive without checking. Here's the workaround.
Update the make_aware function listed in your stack trace, in:
/var/www/bulkmailer-folder/env-bulkmailer/lib64/python3.9/site-packages/django/utils/timezone.py (line 291, in make_aware)
Instead of raising an error if the value is already aware, just return the aware value. See the last else branch below.
def make_aware(value, timezone=None, is_dst=NOT_PASSED):
    """Make a naive datetime.datetime in a given time zone aware."""
    if is_dst is NOT_PASSED:
        is_dst = None
    else:
        warnings.warn(
            "The is_dst argument to make_aware(), used by the Trunc() "
            "database functions and QuerySet.datetimes(), is deprecated as it "
            "has no effect with zoneinfo time zones.",
            RemovedInDjango50Warning,
        )
    if timezone is None:
        timezone = get_current_timezone()
    if _is_pytz_zone(timezone):
        # This method is available for pytz time zones.
        return timezone.localize(value, is_dst=is_dst)
    else:
        # Check that we won't overwrite the timezone of an aware datetime.
        if is_aware(value):
            # ADD THIS
            return value
            # REMOVE THE FOLLOWING LINE
            # raise ValueError("make_aware expects a naive datetime, got %s" % value)
        # This may be wrong around DST changes!
        return value.replace(tzinfo=timezone)
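The patched non-pytz branch boils down to one rule: leave an aware datetime alone and attach the zone only to a naive one. Here is a standalone, stdlib-only sketch of that rule for illustration; make_aware_lenient is a hypothetical name, not part of Django's API, while is_aware uses the same utcoffset() test Django's own helper uses.

```python
from datetime import datetime, timedelta, timezone


def is_aware(value):
    # A datetime is aware when it carries a usable UTC offset.
    return value.utcoffset() is not None


def make_aware_lenient(value, tz):
    """Attach tz to a naive datetime; return an aware datetime unchanged."""
    if is_aware(value):
        return value
    return value.replace(tzinfo=tz)


# The value from the traceback, as MySQL handed it back already aware.
naive = datetime(2022, 11, 20, 12, 39, 18, 866299)
aware = naive.replace(tzinfo=timezone.utc)
plus_five = timezone(timedelta(hours=5))
```

With this rule, make_aware_lenient(aware, plus_five) returns aware untouched instead of raising, and make_aware_lenient(naive, timezone.utc) yields a value equal to aware.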

How to solve a duplicate column name error in web2py

I'm running web2py on PythonAnywhere. I've started getting a "duplicate column name" error when the db.py file runs. I've tried restoring the database from a backup and also dropping the table and adding it back, without success. At this point I'm completely locked out of my app.
I've tried removing all but the last field in the table, but then the problem appears in the next table.
I'm wondering if web2py uses a cache that needs to be cleared.
Here is the relevant portion of my db.py file:
db.define_table('library',
    Field('title', 'string'),
    Field('created','datetime'),
    Field('duration','float'),
    Field('error_message','string'),
    Field('external_id','string'),
    Field('hosting_type','string'),
    Field('source_id','string', unique = True),
    Field('last_modified','datetime'),
    Field('media_type','string'),
    Field('mime_type','string'),
    Field('relationships','string'),
    Field('source_schema','string'),
    Field('source_url','string'),
    Field('status','string'),
    Field('trim_in_point','string'),
    Field('trim_out_point','string'),
    Field('source_type','string'),
    Field('poster','string'),
    Field('background_poster_filename','string'),
    Field('background_poster','upload', required=False, requires = IS_EMPTY_OR(IS_IMAGE(extensions=('png', 'jpg', 'jpeg'), maxsize=(1920, 1080)))),
    Field('sources','list:string',length = 4096),
    Field('tracks','string'),
    singular="Library",
    plural="Library"
)
Here is the error ticket:
Ticket ID
67.0.14.100.2022-06-23.16-49-09.ecf75fd3-1bb4-4f76-b557-95e072c6681f
<class 'gluon.contrib.pymysql.err.InternalError'> (1060, "Duplicate column name 'title'")
Version
web2py™ Version 2.21.1-stable+timestamp.2020.11.28.04.10.44
Python Python 3.7.10: /usr/local/bin/uwsgi (prefix: /home/ghdev/.virtualenvs/ghdevvirtualenv)
Traceback
Traceback (most recent call last):
File "/home/ghdev/web2py/gluon/restricted.py", line 219, in restricted
exec(ccode, environment)
File "/home/ghdev/web2py/applications/ghrokucms/models/db.py", line 283, in <module>
plural="Library"
File "/home/ghdev/web2py/gluon/packages/dal/pydal/base.py", line 660, in define_table
table = self.lazy_define_table(tablename, *fields, **kwargs)
File "/home/ghdev/web2py/gluon/packages/dal/pydal/base.py", line 701, in lazy_define_table
polymodel=polymodel,
File "/home/ghdev/web2py/gluon/packages/dal/pydal/adapters/base.py", line 920, in create_table
return self.migrator.create_table(*args, **kwargs)
File "/home/ghdev/web2py/gluon/packages/dal/pydal/migrator.py", line 376, in create_table
fake_migrate=fake_migrate,
File "/home/ghdev/web2py/gluon/packages/dal/pydal/migrator.py", line 544, in migrate_table
self.adapter.execute(sub_query)
File "/home/ghdev/web2py/gluon/packages/dal/pydal/adapters/__init__.py", line 69, in wrap
return f(*args, **kwargs)
File "/home/ghdev/web2py/gluon/packages/dal/pydal/adapters/base.py", line 468, in execute
rv = self.cursor.execute(command, *args[1:], **kwargs)
File "/home/ghdev/web2py/gluon/contrib/pymysql/cursors.py", line 166, in execute
result = self._query(query)
File "/home/ghdev/web2py/gluon/contrib/pymysql/cursors.py", line 322, in _query
conn.query(q)
File "/home/ghdev/web2py/gluon/contrib/pymysql/connections.py", line 835, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/home/ghdev/web2py/gluon/contrib/pymysql/connections.py", line 1019, in _read_query_result
result.read()
File "/home/ghdev/web2py/gluon/contrib/pymysql/connections.py", line 1302, in read
first_packet = self.connection._read_packet()
File "/home/ghdev/web2py/gluon/contrib/pymysql/connections.py", line 981, in _read_packet
packet.check_error()
File "/home/ghdev/web2py/gluon/contrib/pymysql/connections.py", line 393, in check_error
err.raise_mysql_exception(self._data)
File "/home/ghdev/web2py/gluon/contrib/pymysql/err.py", line 107, in raise_mysql_exception
raise errorclass(errno, errval)
gluon.contrib.pymysql.err.InternalError: (1060, "Duplicate column name 'title'")
I'm at a loss to figure out how to resolve this. Any help would be greatly appreciated.
Thanks.
I solved this problem. In addition to dropping the table, I needed to delete the associated .table file in the app's web2py databases directory. I then had to manually add the table back to the MySQL database.
SOLVED.
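web2py's DAL records what it believes each table looks like in a .table metadata file; when that file disagrees with the real MySQL table, the migration tries to ADD columns that already exist. The sketch below only locates the file for one table so it can be backed up and removed. It assumes the naming scheme '<uri_hash>_<tablename>.table' inside applications/<app>/databases/; find_table_files is a hypothetical helper, not part of web2py.

```python
from pathlib import Path


def find_table_files(databases_dir, tablename):
    """Return .table metadata files that belong to exactly `tablename`.

    Assumes names of the form '<uri_hash>_<tablename>.table', where the hash
    contains no underscore, so 'abc_my_library.table' does not match 'library'.
    """
    target = f"{tablename}.table"
    matches = []
    for path in Path(databases_dir).glob("*.table"):
        _uri_hash, sep, rest = path.name.partition("_")
        if sep and rest == target:
            matches.append(path)
    return sorted(matches)
```

For example, find_table_files("applications/ghrokucms/databases", "library") lists the candidate files; back them up before calling unlink() on them. web2py also documents a fake_migrate=True option on define_table (and fake_migrate_all=True on DAL) that rebuilds this metadata from the model without issuing ALTER TABLE, which may avoid deleting files by hand.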

Error while using INSERT INTO table ON DUPLICATE KEY, using a for loop array

I am updating a MySQL database using the PySpark framework, running on AWS Glue.
I have a dataframe as follows:
df2= sqlContext.createDataFrame([("xxx1","81A01","TERR NAME 55","NY"),("xxx2","81A01","TERR NAME 55","NY"),("x103","81A01","TERR NAME 01","NJ")], ["zip_code","territory_code","territory_name","state"])
# Print out information about this data
df2.show()
+--------+--------------+--------------+-----+
|zip_code|territory_code|territory_name|state|
+--------+--------------+--------------+-----+
| xxx1| 81A01| TERR NAME 55| NY|
| xxx2| 81A01| TERR NAME 55| NY|
| x103| 81A01| TERR NAME 01| NJ|
+--------+--------------+--------------+-----+
ZIP_CODE is the primary key, and I need to make sure there are no duplicate keys or primary key exceptions, so I am using INSERT INTO ... ON DUPLICATE KEY UPDATE.
Since I have more than one row to insert or update, I loop over the collected rows in Python and run an INSERT for each record. The code is as follows:
sarry = df2.collect()
for r in sarry:
    db = MySQLdb.connect("xxxx.rds.amazonaws.com", "username", "password",
                         "databasename")
    cursor = db.cursor()
    insertQry = ("INSERT INTO ZIP_TERR(zip_code, territory_code, territory_name, "
                 "state) VALUES(r.zip_code, r.territory_code, r.territory_name, r.state) ON "
                 "DUPLICATE KEY UPDATE territory_name = VALUES(territory_name), state = "
                 "VALUES(state);")
    n = cursor.execute(insertQry)
    db.commit()
    db.close()
When running the insert above, I get the following error and can't figure out what it means. Please help.
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-2291407229037300959.py", line 367, in <module>
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-2291407229037300959.py", line 360, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 8, in <module>
File "/usr/local/lib/python2.7/site-packages/pymysql/cursors.py", line 170, in execute
result = self._query(query)
File "/usr/local/lib/python2.7/site-packages/pymysql/cursors.py", line 328, in _query
conn.query(q)
File "/usr/local/lib/python2.7/site-packages/pymysql/connections.py", line 893, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/usr/local/lib/python2.7/site-packages/pymysql/connections.py", line 1103, in _read_query_result
result.read()
File "/usr/local/lib/python2.7/site-packages/pymysql/connections.py", line 1396, in read
first_packet = self.connection._read_packet()
File "/usr/local/lib/python2.7/site-packages/pymysql/connections.py", line 1059, in _read_packet
packet.check_error()
File "/usr/local/lib/python2.7/site-packages/pymysql/connections.py", line 384, in check_error
err.raise_mysql_exception(self._data)
File "/usr/local/lib/python2.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
raise errorclass(errno, errval)
InternalError: (1054, u"Unknown column 'r.zip_code' in 'field list'")
If I simply print the values for one row, they are printed as follows:
print('zip_code_new: ', r.zip_code, r.territory_code, r.territory_name, r.state)
zip_code_new: xxx1 81A01 TERR NAME 55 NY
Thanks. I am working on AWS Glue/PySpark, so I need to use native Python libraries.
The following insert query works, with a for loop.
insertQry="INSERT INTO ZIP_TERR(zip_code, territory_code, territory_name, state) VALUES(%s, %s, %s, %s) ON DUPLICATE KEY UPDATE territory_name = %s, state = %s;"
n=cursor.execute(insertQry, (r.zip_code, r.territory_code, r.territory_name, r.state, r.territory_name, r.state))
print (" CURSOR status :", n)
Result output:
CURSOR status : 2
Thanks. I hope this helps others.
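The original query failed because r.zip_code and friends were written inside the SQL text, so MySQL looked for a column literally named r.zip_code; the working version passes the values as parameters instead. The sketch below shows both behaviours plus a single batched upsert over one connection. It uses sqlite3 as a stand-in so it runs without a server, which means SQLite's ON CONFLICT ... DO UPDATE with excluded.col and ? placeholders replaces MySQL's ON DUPLICATE KEY UPDATE with VALUES(col) and %s. The Row namedtuple is a hypothetical stand-in for what df2.collect() returns.

```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-in for the pyspark Rows returned by df2.collect().
Row = namedtuple("Row", "zip_code territory_code territory_name state")
rows = [
    Row("xxx1", "81A01", "TERR NAME 55", "NY"),
    Row("xxx2", "81A01", "TERR NAME 55", "NY"),
    Row("x103", "81A01", "TERR NAME 01", "NJ"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ZIP_TERR (zip_code TEXT PRIMARY KEY, territory_code TEXT,"
    " territory_name TEXT, state TEXT)"
)

# Wrong: the attribute names become part of the SQL text, so the database
# treats r.zip_code as a column reference and rejects it.
embedded_error = None
try:
    conn.execute(
        "INSERT INTO ZIP_TERR VALUES"
        " (r.zip_code, r.territory_code, r.territory_name, r.state)"
    )
except sqlite3.OperationalError as exc:
    embedded_error = str(exc)

# Right: placeholders in the SQL, values passed separately, one batch call.
UPSERT_SQL = (
    "INSERT INTO ZIP_TERR (zip_code, territory_code, territory_name, state) "
    "VALUES (?, ?, ?, ?) "
    "ON CONFLICT(zip_code) DO UPDATE SET "
    "territory_name = excluded.territory_name, state = excluded.state"
)
conn.executemany(UPSERT_SQL, [tuple(row) for row in rows])
# Re-sending an existing key updates the row instead of failing.
conn.executemany(UPSERT_SQL, [tuple(Row("xxx1", "81A01", "TERR NAME 99", "CT"))])
conn.commit()
```

Against MySQL with PyMySQL, the same idea would be one connection, sql = "INSERT INTO ZIP_TERR(zip_code, territory_code, territory_name, state) VALUES(%s, %s, %s, %s) ON DUPLICATE KEY UPDATE territory_name = VALUES(territory_name), state = VALUES(state)", then cursor.executemany(sql, [tuple(r) for r in df2.collect()]) and a single commit, which avoids reconnecting for every row.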

Django/MySQL: warning when saving an object

I was writing a small app that uses Django to store words in my MySQL database. I read the data from a text file that is extremely well organised; it looks like this:
DELUGE
DELUSION
DELVE
DEMAGOGUE
DEMANDING
DEMOLITION
DEMONSTRATE
DEMORALIZE
DEMOTIC
DEMUR
DENIGRATE
DENOUEMENT
DENOUNCE
DENT
DENUDE
DEPLETE
DEPLORE
DEPLOY
Then I read the data from it using open('thefile').readlines(), like this:
>>> for line in open('/home/jacos/sorted-gre.txt').readlines():
...     if line:
...         p = Word(word_spelling = line)
...         p.save()
The word_spelling field is the primary key.
Then came this warning:
Traceback (most recent call last):
File "<console>", line 4, in <module>
File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 460, in save
self.save_base(using=using, force_insert=force_insert, force_update=force_update)
File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 553, in save_base
result = manager._insert(values, return_id=update_pk, using=using)
File "/usr/local/lib/python2.7/dist-packages/django/db/models/manager.py", line 195, in _insert
return insert_query(self.model, values, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 1436, in insert_query
return query.get_compiler(using=using).execute_sql(return_id)
File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 791, in execute_sql
cursor = super(SQLInsertCompiler, self).execute_sql(None)
File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 735, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/util.py", line 34, in execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 86, in execute
return self.cursor.execute(query, args)
File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py", line 176, in execute
if not self._defer_warnings: self._warning_check()
File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py", line 92, in _warning_check
warn(w[-1], self.Warning, 3)
Data truncated for column 'word_spelling' at row 1
As a result, only part of these words were stored in MySQL. I'd like to know why.
CharFields have a max_length attribute. What max_length did you set for word_spelling when you generated the database, and what do you get for Word.objects.get(pk=1).word_spelling?
Unrelated to your warning, but it's recommended to close the file, or open it with a with statement:
with open('/home/jacos/sorted-gre.txt') as f:
    for line in f:
        if line:
            p = Word(word_spelling=line)
            p.save()
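Two details matter for the truncation warning: every line read from the file still ends in "\n" (so if line: is always true and the newline is saved as part of the word), and any word longer than the column's max_length gets cut by MySQL. A minimal stdlib sketch that cleans the lines and separates words that would be truncated; clean_words is a hypothetical helper, and the max_length value you pass must match your model.

```python
def clean_words(lines, max_length):
    """Strip newlines/whitespace, skip blank lines, and split by length.

    Returns (ok, too_long): words that fit a CharField of `max_length`,
    and words MySQL would truncate ("Data truncated for column ...").
    """
    ok, too_long = [], []
    for line in lines:
        word = line.strip()  # drops the trailing "\n" that readlines() keeps
        if not word:
            continue
        (ok if len(word) <= max_length else too_long).append(word)
    return ok, too_long


ok, too_long = clean_words(["DELUGE\n", "\n", "DEMONSTRATE\n"], 10)
```

In Django, the limit can be read from the model with Word._meta.get_field('word_spelling').max_length, and only the words in ok would then be saved.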

Mercurial push error on Google Code: ValueError

I am trying to learn Mercurial by pushing to Google Code.
I have two config files: $PROJECT_DIR/.hg/.hrgc and $HOME/.hgrc. I keep them separate because I did not want to put the password in the central repository.
Here is the content of $PROJECT_DIR/.hg/.hrgc:
[ui]
usermane=Venkat S. Rao <vrao423#gmail.com>
verbose=true
[paths]
default-push =https:vrao423:gc4yy3vB3mc4#//personal-site423.googlecode.com/hg/us
Here is the content of $HOME/.hgrc:
[ui]
username= Venkat Rao <vrao423#gmail.com>
verbose=True
[auth]
project.prefix=https://personal-site423.googlecode.com/hg/
password=###
username=vrao423
For username I use my Gmail ID.
I can commit changes to my local repository, but when I try hg push I get this error.
** unknown exception encountered, details follow
** report bug details to http://mercurial.selenic.com/bts/
** or mercurial#selenic.com
** Mercurial Distributed SCM (version 1.4.3)
** Extensions loaded:
Traceback (most recent call last):
File "/usr/bin/hg", line 27, in <module>
mercurial.dispatch.run()
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 16, in run
sys.exit(dispatch(sys.argv[1:]))
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 30, in dispatch
return _runcatch(u, args)
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 46, in _runcatch
return _dispatch(ui, args)
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 454, in _dispatch
return runcommand(lui, repo, cmd, fullargs, ui, options, d)
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 324, in runcommand
ret = _runcommand(ui, options, cmd, d)
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 505, in _runcommand
return checkargs()
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 459, in checkargs
return cmdfunc()
File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 453, in <lambda>
d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 386, in check
return func(*args, **kwargs)
File "/usr/lib/pymodules/python2.6/mercurial/commands.py", line 2345, in push
other = hg.repository(cmdutil.remoteui(repo, opts), dest)
File "/usr/lib/pymodules/python2.6/mercurial/hg.py", line 63, in repository
repo = _lookup(path).instance(ui, path, create)
File "/usr/lib/pymodules/python2.6/mercurial/httprepo.py", line 263, in instance
inst.between([(nullid, nullid)])
File "/usr/lib/pymodules/python2.6/mercurial/httprepo.py", line 184, in between
d = self.do_read("between", pairs=n)
File "/usr/lib/pymodules/python2.6/mercurial/httprepo.py", line 128, in do_read
fp = self.do_cmd(cmd, **args)
File "/usr/lib/pymodules/python2.6/mercurial/httprepo.py", line 80, in do_cmd
resp = self.urlopener.open(urllib2.Request(cu, data, headers))
File "/usr/lib/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib/pymodules/python2.6/mercurial/url.py", line 455, in https_open
self.auth = self.pwmgr.readauthtoken(req.get_full_url())
File "/usr/lib/pymodules/python2.6/mercurial/url.py", line 141, in readauthtoken
group, setting = key.split('.', 1)
ValueError: need more than 1 value to unpack
Please help me. I have tried reading the hgrc man page, but it is just gibberish to me.
Thank You
Venkat
I'm a Mercurial developer. Please report problems with our man page on the mailing list or on our bug tracker. I would love to hear from you so that we can make the man page better, so please write to us and tell us which part you found to be "gibberish".
In this particular case, the problem is that you need to write your auth section like this:
[auth]
project.prefix=https://personal-site423.googlecode.com/hg/
project.password=###
project.username=vrao423
where I would replace project with googlecode or something similar. We should of course report something sensible instead of a traceback, and I can see that this particular bug is already fixed in Mercurial 1.5.
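The traceback ends in readauthtoken running group, setting = key.split('.', 1) on every [auth] key, so a bare key such as password splits into a single item and the two-name unpack fails. The sketch below mirrors that grouping with configparser to flag such keys before Mercurial sees them. It is an illustration of the rule the answer describes, not Mercurial's own parser, and group_auth_settings is a hypothetical name.

```python
import configparser


def group_auth_settings(hgrc_text):
    """Group [auth] keys by their '<group>.' prefix, as the traceback does.

    Returns (groups, bad_keys): groups maps a group name such as 'project'
    to its settings; bad_keys lists keys without a group prefix, which is
    what made Mercurial 1.4.3 fail with 'need more than 1 value to unpack'.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    groups, bad_keys = {}, []
    if not parser.has_section("auth"):
        return groups, bad_keys
    for key, value in parser.items("auth"):
        parts = key.split(".", 1)
        if len(parts) != 2:
            bad_keys.append(key)  # e.g. 'password' with no 'project.' prefix
            continue
        group, setting = parts
        groups.setdefault(group, {})[setting] = value
    return groups, bad_keys
```

Run on the question's [auth] section, it reports password and username as ungrouped; with the corrected section, every key lands under the project group.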