Python crashes running assemble_data.py for Flickr Style dataset - caffe

I'm trying to download the Flickr Style dataset using assemble_data.py provided in the examples folder. However, whenever I run it, Python crashes with the error 'python quit unexpectedly'.
It seems to be related to multiprocessing and urllib. When I replace pool.map with a single-threaded loop it works, but it is very slow. Also, if I run with multiprocessing but remove urlretrieve, it seems to work too.

Answering my own question here... I resolved this by using urllib3 instead.
import hashlib
import os

import urllib3
from skimage import io

# MISSING_IMAGE_SHA1 is defined elsewhere in assemble_data.py.
http = urllib3.PoolManager(10)


def download_image(args_tuple):
    "For use with multiprocessing map. Returns False on failure."
    url, filename = args_tuple
    try:
        if not os.path.exists(filename):
            print(url + ' -> ' + filename)
            # Don't redirect.
            response = http.request('GET', url, redirect=False)
            with open(filename, 'wb') as f:
                f.write(response.data)
        # Read as bytes so the SHA-1 is computed over the raw file content.
        with open(filename, 'rb') as f:
            assert hashlib.sha1(f.read()).hexdigest() != MISSING_IMAGE_SHA1
        test_read_image = io.imread(filename)
        return True
    except KeyboardInterrupt:
        raise Exception()  # multiprocessing doesn't catch keyboard exceptions
    except:
        # Remove the partial or invalid download, if one was written.
        if os.path.exists(filename):
            os.remove(filename)
        return False
Gist here.

Fixed web socket endpoint for Chrome CDP (Chrome DevTools Protocol)

I use Chrome CDP to automate some tasks.
Chrome first has to be started with remote debugging enabled:
chromium-browser --remote-debugging-port=9222
and it reports something like
DevTools listening on ws://127.0.0.1:9222/devtools/browser/3e3152c6-20fc-4cea-a9d2-60e4e6b8ad70
I have to copy the ws://... URL into my config file manually before I can proceed with my task. I could probably work around this by launching Chrome with Python's subprocess.Popen and extracting the URL from its output, but isn't there a way to make this URL configurable, or at least fixed?
Thanks to wOxxOm! The endpoint can indeed be read from http://127.0.0.1:9222/json/version (Documentation).
As an alternative, I wrote a Python script that launches Chrome and extracts the endpoint as well:
from subprocess import Popen, PIPE


class Browser:
    BANNER = "DevTools listening on "

    def __init__(self, path="/usr/bin/chromium-browser",
                 port=9222, ignore_tls_errors=False):
        cmd = [path, f"--remote-debugging-port={port}"]
        if ignore_tls_errors:
            cmd.append("--ignore-certificate-errors")
        self.process = Popen(cmd, stdout=PIPE, stderr=PIPE, universal_newlines=True)
        self.url = None  # stays None if the banner never appears
        # The banner is read from stderr, line by line, until it shows up.
        output = ""
        for line in self.process.stderr:
            output += line
            if self.BANNER in output:
                start_pos = output.find(self.BANNER) + len(self.BANNER)
                end_pos = output.find("\n", start_pos)
                self.url = output[start_pos:end_pos]
                break

    def close(self):
        self.process.terminate()


if __name__ == "__main__":
    # Create the browser outside the try block so that `b` always exists in finally.
    b = Browser()
    try:
        print("URL:", b.url)
    finally:
        b.close()
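
For completeness, the /json/version route mentioned above could be sketched roughly like this. It assumes Chrome is already running with --remote-debugging-port=9222 and that the JSON response carries the endpoint under the webSocketDebuggerUrl key; the function names parse_ws_url and fetch_ws_url are made up for this example.

```python
import json
from urllib.request import urlopen


def parse_ws_url(body):
    """Extract the browser WebSocket endpoint from a /json/version response body."""
    return json.loads(body)["webSocketDebuggerUrl"]


def fetch_ws_url(host="127.0.0.1", port=9222, timeout=5):
    """Ask an already running Chrome instance for its current endpoint."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
        return parse_ws_url(resp.read().decode("utf-8"))


# Usage (needs a running Chrome with remote debugging enabled):
# print(fetch_ws_url())
```

This way nothing has to be copied by hand: the script looks the endpoint up at start-up, so it does not matter that the browser ID in the URL changes on every launch.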

Creating an exe - file for data exchange with server in Tcl

I am completely lost and I do not know how to approach the following problem which my boss assigned to me.
I have to create an exe file containing code that works as follows when I run it: it sends a certain file, say file_A, to a server. When the server receives this file, it sends back a JSON file, say file_B, which contains a URL. More precisely, an attribute of the JSON file holds the URL. The code should then open that URL in a browser.
And here are the details:
The code (one version in Tcl) must accept three required parameters and an optional fourth one. The three required parameters are server, type and file:
server: the address of the server. For example, https://localhost:0008.
type: the type of the file (file_A) to be sent to the server: xdt / png
file: the path to the file (file_A) to be sent to the server.
The optional fourth parameter is:
wksName: if this parameter is given, the URL should be opened with it (as a wksName query parameter) in the browser.
I got example code for the above procedure written in Python. It should serve as an orientation. I do not know anything about this language, but to a large extent I understand the code. It looks as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import platform
import sys
import webbrowser

import requests

args_dict = {}
for arg in sys.argv[1:]:
    if '=' in arg:
        sep = arg.find('=')
        key, value = arg[:sep], arg[sep + 1:]
        args_dict[key] = value

server = args_dict.get('server', 'http://localhost:0008')
request_url = server + '/NAME_List_number'
type = args_dict.get('type', 'xdt')
file = args_dict.get('file', 'xdtExamples/xdtExample.gdt')
wksName = args_dict.get('wksName', platform.node())

try:
    with open(file, 'rb') as f:
        try:
            r = requests.post(request_url, data={'type': type}, files={'file': f})
            request_url = r.json()['url'] + '?wksName=' + wksName
            webbrowser.open(url=request_url, new=2)
        except Exception as e:
            print('Error:', e)
            print('Request url:', request_url)
except:
    print('File \'' + file + '\' not found')
As you can see, the crucial part of the above code is this:
try:
    with open(file, 'rb') as f:
        try:
            r = requests.post(request_url, data={'type': type}, files={'file': f})
            request_url = r.json()['url'] + '?wksName=' + wksName
            webbrowser.open(url=request_url, new=2)
        except Exception as e:
            print('Error:', e)
            print('Request url:', request_url)
except:
    print('File \'' + file + '\' not found')
Everything above it is just definitions. If possible, I would like to translate the above code into Tcl. Could you please help me with this issue?
It doesn't have to be a 1-1 "tcl-translation" as long as it works as described above, and hopefully as simple as the above one.
I am not familiar with concepts such as sending/receiving data to/from servers, reading json-files etc.
Any help is welcome.

Inserting to MySQL with mysql.connector - good practice/efficiency

I am working on a personal project and was wondering if my solution for inserting data to a MySQL database would be considered "pythonic" and efficient.
I have written a separate class for that, which will be called from an object which holds a dataframe. From there I am calling my save() function to write the dataframe to the database.
The script will be running once a day where I scrape some data from some websites and save it to my database. So it is important that it really runs through completely even when I have bad data or temporary connection issues (script and database run on different machines).
import mysql.connector

# custom logger
from myLog import logger
# custom class for formatting the data, a lot of potential errors are handled here
from myFormat import myFormat
# insert strings to mysql are stored and referenced here
import sqlStrings


class saveSQL:
    def __init__(self):
        self.frmt = myFormat()
        self.host = 'XXX.XXX.XXX.XXX'
        self.user = 'XXXXXXXX'
        self.password = 'XXXXXXXX'
        self.database = 'XXXXXXXX'

    def save(self, payload, type):
        match type:
            case 'First':
                return self.__first(payload)
            case 'Second':
                ...
            case _:
                logger.error('Undefined Input for Type!')

    def __first(self, payload):
        try:
            self.mydb = mysql.connector.connect(host=self.host, user=self.user, password=self.password, database=self.database)
            mycursor = self.mydb.cursor()
        except mysql.connector.Error as err:
            logger.error('Couldn\'t establish connection to DB!')
        try:
            tmpList = payload.values.tolist()
        except ValueError:
            logger.error('Value error in converting dataframe to list: %s' % payload)
        try:
            mycursor.executemany(sqlStrings.First, tmpList)
            self.mydb.commit()
            dbWrite = mycursor.rowcount
        except mysql.connector.Error as err:
            logger.error('Error in writing to database: %s' % err)
            # Fallback: retry row by row so that one bad row doesn't block the rest.
            dbWrite = 0
            for ele in tmpList:
                try:
                    mycursor.execute(sqlStrings.First, ele)
                    self.mydb.commit()
                    dbWrite = dbWrite + mycursor.rowcount
                except mysql.connector.Error as err:
                    logger.error('Error in writing to database: %s \n ele: %s' % (err, ele))
                    continue
                pass
        mycursor.close()
        return dbWrite
Things I am wondering about:
Is the match case a good option to distinguish between writing to different tables depending on the data?
Are the different try/except blocks really necessary or are there easier ways of handling potential errors?
Do I really need the pass command at the end of the for-loop?
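
One possible way to look at the first and third questions, as a rough sketch rather than a definitive answer: a dictionary that maps each type to its handler method is a common alternative to match/case, and a loop body that already contains real statements does not need a trailing pass. Everything below (the SaveSQL stand-in, the table names, the _write placeholder) is hypothetical and touches no database.

```python
import logging

logger = logging.getLogger(__name__)


class SaveSQL:
    """Hypothetical stand-in for the saveSQL class; no database is involved."""

    def __init__(self):
        # Map each payload type to the method that handles it.
        self._handlers = {
            'First': self._first,
            'Second': self._second,
        }

    def save(self, payload, kind):
        handler = self._handlers.get(kind)
        if handler is None:
            logger.error('Undefined input for type: %r', kind)
            return 0
        return handler(payload)

    def _first(self, payload):
        return self._insert_rows('first_table', payload)

    def _second(self, payload):
        return self._insert_rows('second_table', payload)

    def _insert_rows(self, table, rows):
        written = 0
        for row in rows:
            try:
                self._write(table, row)
            except ValueError as err:
                logger.error('Skipping row %r: %s', row, err)
                continue  # skip the counter update for this row
            written += 1  # the loop body ends here, no `pass` required
        return written

    def _write(self, table, row):
        # Placeholder for cursor.execute(); rejects rows that contain None.
        if None in row:
            raise ValueError('row contains None')
```

With this layout, supporting another table means adding one dictionary entry and one small method, and an unknown type returns 0 instead of None.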

How to use mock_open with json.load()?

I'm trying to get a unit test working that validates a function that reads credentials from a JSON-encoded file. Since the credentials themselves aren't fixed, the unit test needs to provide some and then test that they are correctly retrieved.
Here is the credentials function:
def read_credentials():
    basedir = os.path.dirname(__file__)
    with open(os.path.join(basedir, "authentication.json")) as f:
        data = json.load(f)
        return data["bot_name"], data["bot_password"]
and here is the test:
def test_credentials(self):
    with patch("builtins.open", mock_open(
        read_data='{"bot_name": "name", "bot_password": "password"}\n'
    )):
        name, password = shared.read_credentials()
    self.assertEqual(name, "name")
    self.assertEqual(password, "password")
However, when I run the test, the json code blows up with a decode error. Looking at the json code itself, I'm struggling to see why the mock test is failing because json.load(f) simply calls f.read() then calls json.loads().
Indeed, if I change my authentication function to the following, the unit test works:
def read_credentials():
    # Read the authentication file from the current directory and create a
    # HTTPBasicAuth object that can then be used for future calls.
    basedir = os.path.dirname(__file__)
    with open(os.path.join(basedir, "authentication.json")) as f:
        content = f.read()
        data = json.loads(content)
        return data["bot_name"], data["bot_password"]
I don't necessarily mind leaving my code in this form, but I'd like to understand if I've got something wrong in my test that would allow me to keep my function in its original form.
Stack trace:
Traceback (most recent call last):
  File "test_shared.py", line 56, in test_credentials
    shared.read_credentials()
  File "shared.py", line 60, in read_credentials
    data = json.loads(content)
  File "/home/philip/.local/share/virtualenvs/atlassian-webhook-basic-3gOncDp4/lib/python3.6/site-packages/flask/json/__init__.py", line 205, in loads
    return _json.loads(s, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 367, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I had the same issue and got around it by mocking json.load and builtins.open:
import json
from unittest.mock import patch, MagicMock

# I don't care about the actual open
p1 = patch("builtins.open", MagicMock())
m = MagicMock(side_effect=[{"foo": "bar"}])
p2 = patch("json.load", m)
with p1 as p_open:
    with p2 as p_json_load:
        f = open("filename")
        print(json.load(f))
Result:
{'foo': 'bar'}
I had the exact same issue and solved it. Full code below, first the function to test, then the test itself.
The original function I want to test loads a json file that is structured like a dictionary, and checks to see if there's a specific key-value pair in it:
def check_if_file_has_real_data(filepath):
    with open(filepath, "r") as f:
        data = json.load(f)
    if "fake" in data["the_data"]:
        return False
    else:
        return True
But I want to test this without loading any actual file, exactly as you describe. Here's how I solved it:
from my_module import check_if_file_has_real_data
import mock


# Decorators are applied bottom-up, so the patched open is the first argument.
@mock.patch("my_module.json.load")
@mock.patch("my_module.open")
def test_check_if_file_has_real_data(mock_open, mock_json_load):
    mock_json_load.return_value = dict({"the_data": "This is fake data"})
    assert check_if_file_has_real_data("filepath") == False
    mock_json_load.return_value = dict({"the_data": "This is real data"})
    assert check_if_file_has_real_data("filepath") == True
The mock_open object isn't called explicitly in the test function, but if you don't include that decorator and argument you get a filepath error when the with open part of the check_if_file_has_real_data function tries to run using the actual open function rather than the MagicMock object that's been passed into it.
Then you overwrite the response provided by the json.load mock with whatever you want to test.
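
As a point of comparison, here is a minimal sketch that keeps json.load in the function and only patches builtins.open. It assumes the standard-library json module and a current Python 3; the traceback in the question passes through flask/json/__init__.py, so the json name in shared.py may be Flask's wrapper, which could behave differently. The path parameter and the credentials_from_mock helper are made up for this example.

```python
import json
from unittest.mock import mock_open, patch


def read_credentials(path="authentication.json"):
    """Simplified stand-in for the function under test (path is hypothetical)."""
    with open(path) as f:
        data = json.load(f)
    return data["bot_name"], data["bot_password"]


def credentials_from_mock():
    fake_file = mock_open(
        read_data='{"bot_name": "name", "bot_password": "password"}\n'
    )
    # Patch the built-in open so that no real file is touched.
    with patch("builtins.open", fake_file):
        result = read_credentials()
    # The mock records how open() was called.
    fake_file.assert_called_once_with("authentication.json")
    return result
```

Because json.load only calls read() on the file object, the handle produced by mock_open is enough here, and json.load itself does not need to be mocked.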

Python 3.6 asyncio send json message error

I'm trying to set-up a TCP echo client and server that can exchange messages using the JSON format.
I took the code from the documentation and modified it as follows:
Edit: the code below includes the fix, and both server and client now send JSON-style messages.
import asyncio
# https://docs.python.org/3/library/asyncio-stream.html
import json


async def handle_echo(reader, writer):
    data = await reader.read(100)
    message = json.loads(data.decode())
    addr = writer.get_extra_info('peername')
    print("Received %r from %r" % (message, addr))
    print("Send: %r" % json.dumps(message))  # message
    json_mess_en = json.dumps(message).encode()
    writer.write(json_mess_en)
    #writer.write(json_mess) # not working
    #writer.write(json.dumps(json_mess)) # not working
    # Yielding from drain() gives the opportunity for the loop to schedule the write operation
    # and flush the buffer. It should especially be used when a possibly large amount of data
    # is written to the transport, and the coroutine does not yield-from between calls to write().
    #await writer.drain()
    #print("Close the client socket")
    writer.close()


loop = asyncio.get_event_loop()
coro = asyncio.start_server(handle_echo, '0.0.0.0', 9090, loop=loop)
server = loop.run_until_complete(coro)
# Serve requests until Ctrl+C is pressed
print('Serving on {}'.format(server.sockets[0].getsockname()))
try:
    loop.run_forever()
except KeyboardInterrupt:
    pass
# Close the server
server.close()
loop.run_until_complete(server.wait_closed())
loop.close()
and the client code:
import asyncio
import json


async def tcp_echo_client(message, loop):
    reader, writer = await asyncio.open_connection('0.0.0.0', 9090,
                                                   loop=loop)
    print('Send: %r' % message)
    writer.write(json.dumps(message).encode())
    data = await reader.read(100)
    data_json = json.loads(data.decode())
    print('Received: %r' % data_json)
    print(data_json['welcome'])
    print('Close the socket')
    writer.close()


message = {'welcome': 'Hello World!'}
loop = asyncio.get_event_loop()
loop.run_until_complete(tcp_echo_client(message, loop))
loop.close()
Error
TypeError: data argument must be a bytes-like object, not 'str'
Should I use another function than writer.write to encode for JSON? Or any suggestions?
Found the solution. json.dumps returns a str, but writer.write expects bytes, so replace:
writer.write(json.dumps(json_mess))
with:
# encode as UTF-8 (the default for str.encode)
json_mess_en = json.dumps(json_mess).encode()
writer.write(json_mess_en)
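
The same idea can be wrapped in two small helpers. This is only a sketch and differs from the code above in a few ways that are my own assumptions: it needs Python 3.7+ (asyncio.run, no loop argument), it ends each message with a newline so the receiver can use readline() instead of read(100), and it binds to 127.0.0.1 on an OS-chosen port for the demo. The names encode_message, decode_message and round_trip are made up.

```python
import asyncio
import json


def encode_message(obj):
    """Serialize obj to JSON and return newline-terminated UTF-8 bytes."""
    return (json.dumps(obj) + "\n").encode("utf-8")


def decode_message(data):
    """Inverse of encode_message: bytes in, Python object out."""
    return json.loads(data.decode("utf-8"))


async def handle_echo(reader, writer):
    # readline() returns one newline-terminated message
    # (up to the stream's buffer limit).
    data = await reader.readline()
    writer.write(encode_message(decode_message(data)))
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def round_trip(message):
    # Port 0 lets the OS pick a free port (a choice made for this demo only).
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]
    async with server:
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(encode_message(message))
        await writer.drain()
        reply = decode_message(await reader.readline())
        writer.close()
        await writer.wait_closed()
    return reply
```

Calling asyncio.run(round_trip({'welcome': 'Hello World!'})) should hand the same dictionary back. The newline framing is a design choice: read(100) returns at most 100 bytes, so a longer JSON message would arrive truncated and fail to decode.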