BaseHTTPRequestHandler hangs on self.rfile.read()

I implemented a Python server using BaseHTTPRequestHandler, and it often hangs while reading from the socket file object. It doesn't seem to matter how many bytes I read: sometimes I can read 30k bytes without a hang, and sometimes it hangs on 7k bytes. It is reading a Base64-encoded image string, so I understand if it takes a second or two to read, but it literally just hangs.
And then sometimes, when I press CTRL-C, it unhangs and magically reads everything. It's really bizarre. Any help would be appreciated. Thanks. Also, this is Python 2.7.
Code:
def do_POST(self):
    print self.rfile
    # Processing HTTP POST request data
    content_len = int(self.headers.getheader('content-length'))
    print 'Reading from HTTP header. Size: %s' % (content_len)
    # THIS IS WHERE IT HANGS
    post_body_json = self.rfile.read(content_len)
    print 'Got it. Moving on, now.'
    post_body = json.loads(post_body_json)
    image_data = post_body.get('img_string_b64', 'No Image String')
    print 'Decoding image string.'
    # Processing image data
    image_name = 'image.jpg'
    decoded_str = base64.decodestring(image_data)
    self.write_image_to_system(decoded_str, image_name)
    print 'Getting text translation.'
    opencv_handler = OpenCVHandler()
    # Get translation from OpenCV then play text audio
    text_trans = opencv_handler.get_text_translation_from_image(image_name)
    opencv_handler.play_audio_translation_from_text(text_trans)
    # Responding to the POST requester.
    # text_trans = 'Translated'
    response = text_trans
    self.send_response(200)  # OK
    self.send_header('Content-type', 'text/html')
    self.end_headers()
    self.wfile.write(response)
    return

I had the same issue with self.rfile.read(length) blocking on AJAX POST requests.
It was caused by an earlier statement:
form = cgi.FieldStorage(
    fp=self.rfile,
    headers=self.headers,
    environ={'REQUEST_METHOD': 'POST'}
)
FieldStorage had already consumed the request body from self.rfile, so the later read blocked waiting for bytes that would never arrive. Once this was removed, it worked.
I hope it helps.
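For what it's worth, a minimal Python 2 sketch of the pattern this implies: read the body from self.rfile exactly once and drop the cgi.FieldStorage call. The handler class and response are illustrative, not the asker's actual code.
import json
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        content_len = int(self.headers.getheader('content-length'))
        # Read the body exactly once; a second read on self.rfile would block,
        # because the client never sends more than Content-Length bytes.
        raw_body = self.rfile.read(content_len)
        post_body = json.loads(raw_body)
        self.send_response(200)
        self.send_header('Content-type', 'text/plain')
        self.end_headers()
        self.wfile.write('ok')

if __name__ == '__main__':
    HTTPServer(('', 8000), Handler).serve_forever()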

I had a similar problem. I found out I had to convert the data to a string, like: post_body_json = str(self.rfile.read(content_len)). That kept it from hanging there.

Related

how to update progress bar in callback?

I need to create a heatmap. Before plotting it, I have to download a lot of data from a database, which takes about 5 minutes, so I'd like to show a progress bar while the data is being downloaded from the Oracle database so I know the download is in progress.
I googled a lot and fortunately found a website that uses dbc.Progress() and shows how to update the progress bar by connecting tqdm to a file. But I'm still not sure how to do it for my own example. I tried and it doesn't work; could anyone help me with that? Thank you so much for your help.
https://towardsdatascience.com/long-callbacks-in-dash-web-apps-72fd8de25937
Here is my code.
I defined one tab that includes a progress bar using dbc.Progress() and a graph:
progress_bar_heatmap = dbc.Progress(value=25, striped=True, animated=True,
                                    children=['25%'], color='success',
                                    style={'height': '20px'},
                                    id="progress_bar_heatmap")
loading_timer_progress = dcc.Interval(id='loading_timer_progress',
                                      interval=1000)
heatmap_graph = dcc.Graph(id="heatmap-graph", **graph_kwargs)
# wrap contour in dcc.Loading's children so we can see the loading signal
heatmap_loading = dcc.Loading(
    id='loading-heatmap',
    type='default',
    children=heatmap_graph  # wrap contour in loading's children
)
dcc.Tab(
    [progress_bar_heatmap, loading_timer_progress, heatmap_loading],
    label=label,
    value='heatmap',
    id='heatmap-tab',
    className="single-tab",
    selected_className="single-tab--selected",
)
In the callback, I copied some code from the website above:
@app.callback(
    [
        Output("heatmap-graph", "figure"),
        Output("progress_bar_dts_heatmap", "value"),
    ],
    [
        Input("plot-dts", "n_clicks"),
        Input('loading_timer_progress', 'n_intervals'),
    ],
    prevent_initial_call=True,  # disable output on the first load
)
def change_plot(n_clicks, n_intervals):
    progress_bar_value = 0
    import sys
    try:
        with open('progress.txt', 'r') as file:
            str_raw = file.read()
        last_line = list(filter(None, str_raw.split('\n')))[-1]
        percent = float(last_line.split('%')[0])
    except:  # no progress file created yet, meaning it is being created
        percent = 0
        std_err_backup = sys.stderr
        file_prog = open('progress.txt', 'w')
        sys.stderr = file_prog
        df = time_consuming_function()
        result_str = f'Long callback triggered by {btn_name}. Result: {x:.2f}'
        file_prog.close()
        sys.stderr = std_err_backup
    finally:  # must do under all circumstances
        text = f'{percent:.0f}%'
        fig = create_fig(df)
Inside time_consuming_function:
def time_consuming_function():
    download_data_from_oracle()
    # after that, I added the below as the website did
    for i in tqdm(range(20)):
        time.sleep(0.5)
    return df
The above doesn't work, and I'm not sure which part is wrong.
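Not the asker's app, just a minimal sketch of the pattern the linked article builds on, assuming Dash 2.x and dash-bootstrap-components: the heavy download runs in a background thread and writes progress.txt, while a separate dcc.Interval callback only reads that file and updates the bar. The component ids and the slow_download stand-in are made up for illustration.
import threading
import time

import dash
import dash_bootstrap_components as dbc
from dash import dcc, html
from dash.dependencies import Input, Output

app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
app.layout = html.Div([
    html.Button("Plot", id="plot-btn"),
    dbc.Progress(id="progress-bar", value=0, style={"height": "20px"}),
    dcc.Interval(id="progress-timer", interval=1000),
])

def slow_download():
    # Stand-in for the Oracle download; it reports its own progress to a file.
    for i in range(100):
        time.sleep(0.5)
        with open("progress.txt", "w") as f:
            f.write("%d%%" % (i + 1))

@app.callback(
    [Output("progress-bar", "value"), Output("progress-bar", "children")],
    Input("progress-timer", "n_intervals"),
)
def update_progress(_):
    # This callback only reads the progress file; the slow work never runs here,
    # so it returns quickly on every interval tick.
    try:
        with open("progress.txt") as f:
            percent = float(f.read().rstrip("%"))
    except (FileNotFoundError, ValueError):
        percent = 0
    return percent, f"{percent:.0f}%"

@app.callback(
    Output("plot-btn", "disabled"),
    Input("plot-btn", "n_clicks"),
    prevent_initial_call=True,
)
def start_download(_):
    # Kick off the long-running job in a background thread so this callback
    # returns immediately and the interval keeps polling the progress file.
    threading.Thread(target=slow_download, daemon=True).start()
    return True

if __name__ == "__main__":
    app.run_server(debug=True)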

How do I read keyboard events from file?

I have read this question, which is similar and gets me most of the way.
The code for the answer isn't posted, but I believe I have followed the instructions and managed to get it working, except after the recording has been saved and opened again.
It works perfectly fine immediately after recording. However, I want to save the data and read it back for later use, literally every time I run the program, without having to re-record it each time.
import keyboard
import threading
from keyboard import KeyboardEvent
import time
import json

def record(file='record.txt'):
    f = open(file, 'w+')
    keyboard_events = []
    keyboard.start_recording()
    starttime = time.time()
    keyboard.wait('esc')
    keyboard_events = keyboard.stop_recording()
    print(starttime, file=f)
    for kevent in range(0, len(keyboard_events)):
        print(keyboard_events[kevent].to_json(), file=f)
    f.close()

def play(file="record.txt", speed=1):
    f = open(file, 'r')
    lines = f.readlines()
    f.close()
    keyboard_events = []
    for index in range(1, len(lines)):
        keyboard_events.append(keyboard.KeyboardEvent(**json.loads(lines[index])))
    starttime = float(lines[0])
    keyboard_time_interval = keyboard_events[0].time - starttime
    keyboard_time_interval /= speed
    k_thread = threading.Thread(
        target=lambda: time.sleep(keyboard_time_interval) == keyboard.play(keyboard_events, speed_factor=speed))
    k_thread.start()
    k_thread.join()
I am not especially new to coding, or to Python, but this problem perplexes me. I've tested all the variables and none of them persist outside of the record function.
(I don't fully understand lambda, threading, or **json.loads, but I don't think that's the problem.)
What's going on here?
For extra bonus points, if this is possible to do asynchronously, that'd be amazing. One problem at a time, though.
Just in case anyone else ever has the same problem as me: just add this at the start of your code. No idea why it works, but it does.
keyboard.start_recording()
temp = keyboard.stop_recording()
You can forget about the temp variable immediately.
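In context, the workaround would sit right before the replay. A small sketch; play() is the function from the question and record.txt is assumed to exist from an earlier run:
import keyboard

# Prime the keyboard module's recording machinery once before replaying.
keyboard.start_recording()
keyboard.stop_recording()

play('record.txt')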

Weird KeyError (Python)

So, I have to work with this JSON (from a URL):
{'player': {'racing': 25260.154000000017, 'player': 259114.57700000296}, 'farming': {'fishing': 33783.390999999414, 'mining': 29048.60500000002, 'farming': 25334.504000000023}, 'piloting': {'piloting': 25570.18800000001, 'cargos': 3080.713000000036, 'heli': 10433.977000000004}, 'physical': {'strength': 198358.86700000675}, 'business': {'business': 50922.88500000005}, 'trucking': {'mechanic': 2724.5620000000004, 'garbage': 755.642999999997, 'trucking': 223784.99700000713, 'postop': 1411.4190000000006}, 'train': {'bus': 669.1940000000001, 'train': 1363.805999999999}, 'ems': {'fire': 25449.43400000001, 'ems': 13844.628000000012}, 'hunting': {'skill': 4179.033000000316}, 'casino': {'casino': 18545.526000000027}}
It is indeed one line. I am trying to get, for example, racing, which is the first one you see. For this, you need to go into player first, and then you can get to racing. How do I do this?
My current code:
def allthethings():
    # Grab all the skills
    geturl = ("http://server.tycoon.community:30120/status/data/" + str(setting_playerid))
    print(geturl)
    a = requests.get(geturl, headers={"X-Tycoon-Key": setting_apikeyTT}).json()
    jsonconverted = (a["data"]["gaptitudes_v"])
    print(jsonconverted)
    # Convert JSON into many, many variables
    Raw_RACR = jsonconverted['player.racing']
    print(Raw_RACR)
I believe this is all the code that is needed.
Also, this is the error:
KeyError: 'player.racing'
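The structure shown above is a dict of dicts, so the value sits under two keys rather than one dotted key. A small sketch with the relevant slice of the data:
jsonconverted = {'player': {'racing': 25260.154000000017, 'player': 259114.57700000296}}

# 'player.racing' is not a key; index the outer dict, then the inner one.
Raw_RACR = jsonconverted['player']['racing']
print(Raw_RACR)  # 25260.154000000017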

How can I display a TemporaryUploadedFile from Django in HTML as an image?

In Django, I have programmed a form in which you can upload one image. After uploading, the image is passed to another method as a TemporaryUploadedFile, and after that method runs the result is given to the HTML page.
What I would like to do is display that TemporaryUploadedFile as an image in HTML. It sounds quite simple to me, but I could not find an answer on Stack Overflow or Google to the question: how do I display a TemporaryUploadedFile in HTML without having to save it first? Hence my question.
All help is appreciated.
Edit 1:
To give some more information about the code and the variables while debugging.
input_image = next(iter(request.FILES.values()))
output_b64 = (input_image.content_type, str(base64.b64encode(input_image.read()), 'utf8'))
Well, you can encode the image to base64 and use a data url as the value for src.
A base64 data url looks like this:
<img src="data:image/png;base64,SGLAFdsfsafsf098sflf">
               \_______/        \__________________/
                   |                      |
               File type        base64 encoded data
Read the Mozilla docs for more on data urls.
Here's some relevant code:
import base64
def my_view(request):
# assuming `image` is a <TemporaryUploadedFile object>
image_b64 = base64.b64encode(image.read())
image_b64 = image_b64.decode('utf8') # convert bytes to string
image_type = image.content_type # png or jpeg or something else
return render('template', {'image_b64': image_b64, 'image_type': image_type})
Then in your template:
<img src="data:{{ image_type }};base64,{{ image_b64 }}">
I want to thank xyres for pushing me in the right direction. As you can see, I used some parts of his solution in the code below:
# As input I take one image from the form.
temp_uploaded_file = next(iter(request.FILES.values()))
# The TemporaryUploadedFile is converted to a Pillow Image.
input_image = pil_image.open(temp_uploaded_file)
# The input image does not have a name, so I set it afterwards. (This step, of course, is not mandatory.)
input_image.filename = temp_uploaded_file.name
# The image is saved to an in-memory file.
output = BytesIO()
input_image.save(output, format=input_image.format)
# Then the in-memory file is encoded.
img_data = str(base64.b64encode(output.getvalue()), 'utf8')
output_b64 = ('image/' + input_image.format, img_data)
# Pass it to the template.
return render(request, 'visualsearch/similarity_output.html', {
    "output_image": output_b64
})
In the template:
<img id="output_image" src="data:{{ output_image.0 }};base64,{{ output_image.1 }}">
The current solution works, but I don't think it is perfect; I expect that it can be done with less code and faster, so if you know how this can be done better, you are welcome to post your answer here.
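One possible shortening, in the spirit of the asker's own Edit 1: skip the Pillow round trip and base64-encode the uploaded bytes directly. A sketch; the view name is made up, and it assumes the upload is an image whose content type Django has already detected, reusing the template context from the answer above.
import base64

from django.shortcuts import render

def similarity_view(request):
    uploaded = next(iter(request.FILES.values()))
    # Encode the raw uploaded bytes directly; Django already knows the content type.
    output_b64 = (uploaded.content_type,
                  base64.b64encode(uploaded.read()).decode('utf8'))
    return render(request, 'visualsearch/similarity_output.html',
                  {"output_image": output_b64})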

How to obtain a list of titles of all Wikipedia articles

I'd like to obtain a list of all the titles of all Wikipedia articles. I know there are two possible ways to get content from a Wikimedia-powered wiki: the API, or a database dump.
I'd prefer not to download the wiki dump. First, it's huge, and second, I'm not really experienced with querying databases. The problem with the API, on the other hand, is that I couldn't figure out a way to retrieve only a list of the article titles, and even if I could, it would need more than 4 million requests, which would probably get me blocked from any further requests anyway.
So my questions are:
Is there a way to obtain only the titles of Wikipedia articles via the API?
Is there a way to combine multiple requests/queries into one? Or do I actually have to download a Wikipedia dump?
The allpages API module allows you to do just that. Its limit (when you set aplimit=max) is 500, so to query all 4.5M articles, you would need about 9000 requests.
But a dump is a better choice, because there are many different dumps, including all-titles-in-ns0 which, as its name suggests, contains exactly what you want (59 MB of gzipped text).
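For illustration, a small sketch of that paging loop with the requests library; the parameter names come from the MediaWiki allpages documentation, and error handling and rate limiting are omitted.
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def iter_article_titles():
    # Page through list=allpages in namespace 0, skipping redirects.
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apnamespace": 0,
        "apfilterredir": "nonredirects",
        "aplimit": "max",
    }
    while True:
        data = requests.get(API_URL, params=params).json()
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        params["apcontinue"] = data["continue"]["apcontinue"]

for title in iter_article_titles():
    print(title)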
Right now, as per the current statistics, the number of articles is around 5.8M.
To get the list of pages I used the AllPages API. However, the number of pages I get is around 14.5M, which is about three times what I was expecting. I restricted myself to namespace 0 to get the list. Following is the sample code that I am using:
# get the list of all wikipedia pages (articles) -- English
import sys
from simplemediawiki import MediaWiki

listOfPagesFile = open("wikiListOfArticles_nonredirects.txt", "w")
wiki = MediaWiki('https://en.wikipedia.org/w/api.php')

continueParam = ''
requestObj = {}
requestObj['action'] = 'query'
requestObj['list'] = 'allpages'
requestObj['aplimit'] = 'max'
requestObj['apnamespace'] = '0'

pagelist = wiki.call(requestObj)
pagesInQuery = pagelist['query']['allpages']

for eachPage in pagesInQuery:
    pageId = eachPage['pageid']
    title = eachPage['title'].encode('utf-8')
    writestr = str(pageId) + "; " + title + "\n"
    listOfPagesFile.write(writestr)

numQueries = 1

while len(pagelist['query']['allpages']) > 0:
    requestObj['apcontinue'] = pagelist["continue"]["apcontinue"]
    pagelist = wiki.call(requestObj)
    pagesInQuery = pagelist['query']['allpages']
    for eachPage in pagesInQuery:
        pageId = eachPage['pageid']
        title = eachPage['title'].encode('utf-8')
        writestr = str(pageId) + "; " + title + "\n"
        listOfPagesFile.write(writestr)
        # print writestr
    numQueries += 1
    if numQueries % 100 == 0:
        print "Done with queries -- ", numQueries
        print numQueries

listOfPagesFile.close()
The number of queries fired is around 28900, which results in approx. 14.5M page names.
I also tried the all-titles link mentioned in the answer above. In that case as well, I get around 14.5M pages.
I thought this overestimate of the actual number of pages was caused by redirects, so I added the 'nonredirects' option to the request object:
requestObj['apfilterredir'] = 'nonredirects'
After doing that I get only 112340 pages, which is far too small compared to 5.8M.
With the above code I was expecting roughly 5.8M pages, but that doesn't seem to be the case.
Is there any other option I should try to get the actual (~5.8M) set of page names?
Here is an asynchronous program that will generate MediaWiki page titles:
async def wikimedia_titles(http, wiki="https://en.wikipedia.org/"):
    log.debug('Started generating asynchronously wiki titles at {}', wiki)
    # XXX: https://www.mediawiki.org/wiki/API:Allpages#Python
    url = "{}/w/api.php".format(wiki)
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apfilterredir": "nonredirects",
        "apfrom": "",
    }
    while True:
        content = await get(http, url, params=params)
        if content is None:
            continue
        content = json.loads(content)
        for page in content["query"]["allpages"]:
            yield page["title"]
        try:
            apcontinue = content['continue']['apcontinue']
        except KeyError:
            return
        else:
            params["apfrom"] = apcontinue
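A possible way to drive that generator, with the assumptions made explicit: the answer's undefined get(http, url, params=params) helper is sketched here with aiohttp, and its log object is stood in by a plain logging logger.
import asyncio
import json
import logging

import aiohttp

log = logging.getLogger(__name__)  # stand-in for the answer's `log`

async def get(http, url, params=None):
    # Minimal stand-in for the helper the answer assumes: fetch the URL and
    # return the response body as text, or None on a non-200 status.
    async with http.get(url, params=params) as resp:
        if resp.status != 200:
            return None
        return await resp.text()

async def main():
    async with aiohttp.ClientSession() as http:
        async for title in wikimedia_titles(http):
            print(title)

asyncio.run(main())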