Chrome refuses to cache large binary .data files

I've written a primitive HTTP server for testing my Emscripten apps. It serves static files from the current directory. The specific issue is that I have large binary files (.data and .wasm), some of which rarely change, so it makes sense to have the browser cache them indefinitely.
Chrome sends If-None-Match for the .html and .js files, but not for .data (in my case ~70 MB). The .html and .js files therefore get 304 responses, while .data gets a 200 and the whole large file is sent again, which is slow-ish even on localhost.
How do I force Chrome to cache large binary files?
import os
import hashlib
import http.server

root = '.'

mime = {
    '.manifest': 'text/cache-manifest',
    '.html': 'text/html',
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.svg': 'image/svg+xml',
    '.css': 'text/css',
    '.js': 'application/x-javascript',
    '.wasm': 'application/wasm',
    '.data': 'application/octet-stream',
}
mime_fallback = 'application/octet-stream'

def md5(file_path):
    hash = hashlib.md5()
    with open(file_path, 'rb') as f:
        hash.update(f.read())
    return hash.hexdigest()

# Precompute an ETag (MD5 of the contents) for every known static file.
cache = {os.path.join(root, f): md5(os.path.join(root, f)) for f in os.listdir(root) if any(map(f.endswith, mime)) and os.path.isfile(f)}

class EtagHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self, body=True):
        self.protocol_version = 'HTTP/1.1'
        self.path = os.path.join(root, self.path.lstrip('/') + ('index.html' if self.path == '/' else ''))
        if not os.path.exists(self.path) or not os.path.isfile(self.path):
            self.send_response(404)
            self.end_headers()
        elif self.path not in cache or cache[self.path] != self.headers.get('If-None-Match'):
            content_type = ([content_type for ext, content_type in sorted(mime.items(), reverse=True) if self.path.endswith(ext)] + [mime_fallback])[0]
            with open(self.path, 'rb') as f:
                content = f.read()
            self.send_response(200)
            self.send_header('Content-Length', len(content))
            self.send_header('Content-Type', content_type)
            if self.path in cache:  # files with unknown extensions have no precomputed ETag
                self.send_header('ETag', cache[self.path])
            self.end_headers()
            self.wfile.write(content)
        else:
            self.send_response(304)
            self.send_header('ETag', cache[self.path])
            self.end_headers()

if __name__ == '__main__':
    PORT = 8080
    print("serving at port", PORT)
    httpd = http.server.HTTPServer(("", PORT), EtagHandler)
    httpd.serve_forever()
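For what it's worth, the server-side half of this can be checked from Python as well. The sketch below (assuming the server above is running on localhost:8080, and with app.data as a placeholder filename) replays the conditional request a caching browser would send, which helps separate the server's ETag handling from Chrome's behaviour:

import urllib.error
import urllib.request

url = 'http://localhost:8080/app.data'  # placeholder: use a file the server actually has

# First request: grab the ETag the server sends back.
first = urllib.request.urlopen(url)
etag = first.headers.get('ETag', '')  # assumes the server sent an ETag
first.read()

# Second request: replay it conditionally, the way a caching browser would.
req = urllib.request.Request(url, headers={'If-None-Match': etag})
try:
    urllib.request.urlopen(req)
    print('200: full body was re-sent')
except urllib.error.HTTPError as e:
    # urllib surfaces non-2xx responses as HTTPError; 304 here means the
    # server-side ETag handling is fine
    print(e.code)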

Related

serving html with the socket module

I opened port 8080 using python socket:
Sk.bind(ip_addrr, 8080)
But I want it to serve an HTML page on that port, so that when I navigate to :8080 in the browser I get a web page. Any ideas?
My problem is that I need an HTML page which I created to be displayed on port 8080. For example, I have index.html on port 80; similarly, I need an HTML page on port 8080. How do I do that?
Here is a very minimal example I was able to put together after some research. You can also find it on https://replit.com/#bluebrown/python-socket-html
import socket

s = socket.socket()
s.bind(('0.0.0.0', 8080))
s.listen(1)

with open('index.html', 'rb') as file:
    html = file.read()

while True:
    conn, addr = s.accept()
    with conn:
        print('Connected by', addr)
        req = conn.recv(1024)
        print('request:', req)
        # HTTP header lines are terminated with CRLF; a blank line ends the headers
        conn.send('HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n'.encode())
        conn.sendall(html)
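A quick way to exercise it from a second shell, assuming it is running locally on port 8080 (just a sketch):

import urllib.request

print(urllib.request.urlopen('http://localhost:8080/').read().decode())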
Based on your request in the comments, I have taken it a bit further. Not much, though: just enough to give you an idea of what it takes to serve different pages.
import socket

def parseRequest(request):
    output = {}
    r = request.decode("utf-8").split("\r\n")
    parts = r[0].split(' ')
    output["method"] = parts[0]
    output["path"] = parts[1]
    output["protocol"] = parts[2]
    output["headers"] = {kv.split(':')[0]: kv.split(':')[1].strip() for kv in r[1:] if len(kv.split(':')) > 1}
    return output

s = socket.socket()
s.bind(('0.0.0.0', 8080))
s.listen(1)

while True:
    conn, addr = s.accept()
    with conn:
        print('Connected by', addr)
        req = conn.recv(1024)
        r = parseRequest(req)
        path = r["path"][1:]
        if path == "":
            path = "index"
        with open(f'{path}.html', 'rb') as file:
            html = file.read()
        conn.send('HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n'.encode())
        conn.sendall(html)
You can check the updated repl.
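One thing the snippet above does not handle is a request for a page that does not exist: open() raises FileNotFoundError and the loop dies. A possible extension (a sketch, not part of the original answer) that returns a 404 instead:

import socket

def parseRequest(request):
    # same idea as the parser above, trimmed to what this sketch needs
    method, path, protocol = request.decode("utf-8").split("\r\n")[0].split(' ')
    return {"method": method, "path": path, "protocol": protocol}

s = socket.socket()
s.bind(('0.0.0.0', 8080))
s.listen(1)

while True:
    conn, addr = s.accept()
    with conn:
        req = conn.recv(1024)
        if not req:
            continue
        path = parseRequest(req)["path"][1:] or "index"
        try:
            with open(f'{path}.html', 'rb') as file:
                html = file.read()
            conn.send('HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n'.encode())
            conn.sendall(html)
        except FileNotFoundError:
            # unknown path: answer with a minimal 404 instead of crashing
            conn.send('HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n'.encode())
            conn.sendall(b'<h1>404 Not Found</h1>')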

User to download file to their local directory

Backend: I wrote a Python script that creates a CSV file after some aggregation.
Frontend: Once the method has finished running and the .csv file has been generated and saved to a directory on the server, I want to prompt the user to save the .csv file to their local computer (just like the Windows prompt you get when you press "Save as..." on a webpage).
This is an example of what I've done so far, based on what I learned in "Return Excel file in Flask app" and "Download a file when button is pressed on web application?":
Sample code:
with open(save_path + unique_filename + ".csv", 'w', encoding='utf8') as g:
    writer = csv.writer(g, lineterminator='\n')
    writer.writerow(['name', 'place', 'location'])
HTML:
@app.route('/login', method='POST')
def do_login():
    category = request.forms.get('category')
    return '''
    <html><body>
        Hello. Save Results
    </body></html>
    '''

@app.route("/getCSV", methods=['GET', 'POST'])
def getPlotCSV():
    return send_from_directory(save_path + unique_filename + ".csv", as_attachment=True)

if __name__ == "__main__":
    run(app, host='localhost', port=8000)
My questions are:
1) send_from_directory is from Flask; what is the Bottle equivalent?
2) Where in the code do I place the csv I created so the user can download it to their local machine?
3) What else is wrong with my code?
Bottle example, from https://bottlepy.org/docs/dev/tutorial.html:
@route('/download/<filename:path>')
def download(filename):
    return static_file(filename, root='/path/to/static/files', download=filename)
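Applied to the getCSV route from the question, that could look roughly like this (a sketch only: save_path and unique_filename are the question's variables and are assumed to already be defined, with save_path pointing at the directory the CSV was written to):

from bottle import route, static_file

@route("/getCSV")
def getPlotCSV():
    # static_file is Bottle's counterpart to Flask's send_from_directory;
    # download=... sets the filename offered in the browser's save dialog
    return static_file(unique_filename + ".csv", root=save_path,
                       download=unique_filename + ".csv")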

ROS service failed to save files

I want to have a service 'save_readings' that automatically saves data from a rostopic into a file. But each time the service gets called, it doesn't save any file.
I've tried running the file-saving code in Python without using a ROS service, and it works fine.
I don't understand why this is happening.
#!/usr/bin/env python
# license removed for brevity
import rospy, numpy
from std_msgs.msg import String, Int32MultiArray, Float32MultiArray, Bool
from std_srvs.srv import Empty, EmptyResponse
import geometry_msgs.msg
from geometry_msgs.msg import WrenchStamped
import json
# import settings

pos_record = []
wrench_record = []

def ftmsg2listandflip(ftmsg):
    return [ftmsg.wrench.force.x, ftmsg.wrench.force.y, ftmsg.wrench.force.z,
            ftmsg.wrench.torque.x, ftmsg.wrench.torque.y, ftmsg.wrench.torque.z]

def callback_pos(data):
    global pos_record
    pos_record.append(data.data)

def callback_wrench(data):
    global wrench_record
    ft = ftmsg2listandflip(data)
    wrench_record.append([data.header.stamp.to_sec()] + ft)

def exp_listener():
    stop_sign = False
    rospy.Subscriber("stage_pos", Float32MultiArray, callback_pos)
    rospy.Subscriber("netft_data", WrenchStamped, callback_wrench)
    rospy.spin()

def start_read(req):
    global pos_record
    global wrench_record
    pos_record = []
    wrench_record = []
    return EmptyResponse()

def save_readings(req):
    global pos_record
    global wrench_record
    filename = rospy.get_param('save_file_name')
    output_data = {'pos_list': pos_record, 'wrench_list': wrench_record}
    rospy.loginfo("output_data %s", output_data)
    with open(filename, 'w') as outfile:  # write data to 'data.json'
        print('dumping json file')
        json.dump(output_data, outfile)  # TODO: find out why failing to save the file.
        outfile.close()
    print("file saved")
    rospy.sleep(2)
    return EmptyResponse()

if __name__ == '__main__':
    try:
        rospy.init_node('lisener_node', log_level=rospy.INFO)
        s_1 = rospy.Service('start_read', Empty, start_read)
        s_1 = rospy.Service('save_readings', Empty, save_readings)
        exp_listener()
        print('mylistener ready!')
    except rospy.ROSInterruptException:
        pass
Got it. I need to specify a path for the file to be saved.
save_path = '/home/user/catkin_ws/src/motionstage/'
filename = save_path + filename
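A slightly more defensive version of the same fix (a sketch; the directory is only an example, and 'save_file_name' is the parameter name from the question) builds the absolute path with os.path, so the result does not depend on the node's working directory, which is often ~/.ros when the node is started via roslaunch:

import os
import rospy

# Build an absolute path so the JSON file does not silently land in the node's
# working directory. The directory below is an example; adjust as needed.
save_dir = os.path.expanduser('~/catkin_ws/src/motionstage')
filename = os.path.join(save_dir, rospy.get_param('save_file_name', 'data.json'))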

Failing to store data in a CSV file while scraping

I am trying to scrape a webpage, extract data, and then store all of the data in a CSV file. Before adding the ScrapeCallback class and calling it, everything works fine. However, after adding the new class, nothing except the header row gets stored in the CSV file. Can anyone help me figure out the problem?
import re
import urlparse
import urllib2
import time
from datetime import datetime
import robotparser
import Queue
import csv
import lxml.html

class ScrapeCallback:
    # extract and store all data in a csv file
    def __init__(self):
        self.writer = csv.writer(open('countries.csv', 'w'))
        self.fields = ('area', 'population', 'iso', 'country', 'capital', 'continent', 'tld', 'currency_code', 'currency_name', 'phone', 'postal_code_format', 'postal_code_regex', 'languages', 'neighbours')
        self.writer.writerow(self.fields)

    def __call__(self, url, html):
        if re.search('/view/', url):
            tree = lxml.html.fromstring(html)
            row = []
            for field in self.fields:
                row.append(tree.cssselect('table > tr#places_{}__row > td.w2p_fw'.format(field))[0].text_content())
            print row
            self.writer.writerow(row)

def link_crawler(seed_url, link_regex=None, delay=5, max_depth=-1, max_urls=-1, headers=None, user_agent='wswp', proxy=None, num_retries=1, scrape_callback=None):
    """Crawl from the given seed URL following links matched by link_regex
    """
    # the queue of URL's that still need to be crawled
    crawl_queue = [seed_url]
    # the URL's that have been seen and at what depth
    seen = {seed_url: 0}
    # track how many URL's have been downloaded
    num_urls = 0
    rp = get_robots(seed_url)
    throttle = Throttle(delay)
    headers = headers or {}
    if user_agent:
        headers['User-agent'] = user_agent

    while crawl_queue:
        url = crawl_queue.pop()
        depth = seen[url]
        # check url passes robots.txt restrictions
        if rp.can_fetch(user_agent, url):
            throttle.wait(url)
            html = download(url, headers, proxy=proxy, num_retries=num_retries)
            links = []
            if scrape_callback:
                links.extend(scrape_callback(url, html) or [])
            if depth != max_depth:
                # can still crawl further
                if link_regex:
                    # filter for links matching our regular expression
                    links.extend(link for link in get_links(html) if re.match(link_regex, link))
                for link in links:
                    link = normalize(seed_url, link)
                    # check whether already crawled this link
                    if link not in seen:
                        seen[link] = depth + 1
                        # check link is within same domain
                        if same_domain(seed_url, link):
                            # success! add this new link to queue
                            crawl_queue.append(link)
            # check whether have reached downloaded maximum
            num_urls += 1
            if num_urls == max_urls:
                break
        else:
            print 'Blocked by robots.txt:', url

class Throttle:
    """Throttle downloading by sleeping between requests to same domain
    """
    def __init__(self, delay):
        # amount of delay between downloads for each domain
        self.delay = delay
        # timestamp of when a domain was last accessed
        self.domains = {}

    def wait(self, url):
        """Delay if have accessed this domain recently
        """
        domain = urlparse.urlsplit(url).netloc
        last_accessed = self.domains.get(domain)
        if self.delay > 0 and last_accessed is not None:
            sleep_secs = self.delay - (datetime.now() - last_accessed).seconds
            if sleep_secs > 0:
                time.sleep(sleep_secs)
        self.domains[domain] = datetime.now()

def download(url, headers, proxy, num_retries, data=None):
    print 'Downloading:', url
    request = urllib2.Request(url, data, headers)
    opener = urllib2.build_opener()
    if proxy:
        proxy_params = {urlparse.urlparse(url).scheme: proxy}
        opener.add_handler(urllib2.ProxyHandler(proxy_params))
    try:
        response = opener.open(request)
        html = response.read()
        code = response.code
    except urllib2.URLError as e:
        print 'Download error:', e.reason
        html = ''
        if hasattr(e, 'code'):
            code = e.code
            if num_retries > 0 and 500 <= code < 600:
                # retry 5XX HTTP errors
                html = download(url, headers, proxy, num_retries-1, data)
        else:
            code = None
    return html

def normalize(seed_url, link):
    """Normalize this URL by removing hash and adding domain
    """
    link, _ = urlparse.urldefrag(link)  # remove hash to avoid duplicates
    return urlparse.urljoin(seed_url, link)

def same_domain(url1, url2):
    """Return True if both URL's belong to same domain
    """
    return urlparse.urlparse(url1).netloc == urlparse.urlparse(url2).netloc

def get_robots(url):
    """Initialize robots parser for this domain
    """
    rp = robotparser.RobotFileParser()
    rp.set_url(urlparse.urljoin(url, '/robots.txt'))
    rp.read()
    return rp

def get_links(html):
    """Return a list of links from html
    """
    # a regular expression to extract all links from the webpage
    webpage_regex = re.compile('<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)
    # list of all links from the webpage
    return webpage_regex.findall(html)

if __name__ == '__main__':
    # link_crawler('http://example.webscraping.com', '/(index|view)', delay=0, num_retries=1, user_agent='BadCrawler')
    # link_crawler('http://example.webscraping.com', '/(index|view)', delay=0, num_retries=1, max_depth=1, user_agent='GoodCrawler')
    link_crawler('http://example.webscraping.com', '/(index|view)', max_depth=2, scrape_callback=ScrapeCallback())
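One way to narrow this down (a sketch, not part of the question) is to run the callback on a single /view/ page outside the crawler, so that selector or CSV problems can be separated from crawling problems. The URL below is just an example country page on the same site:

import urllib2

test_url = 'http://example.webscraping.com/view/Afghanistan-1'  # example /view/ page
test_html = urllib2.urlopen(test_url).read()

cb = ScrapeCallback()
cb(test_url, test_html)  # should print and write one row if the CSS selectors match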

Preferred method for downloading a file generated on the fly in Flask

I have a page that displays a list of files in a directory. When the user clicks on the Download button, all of these files are zipped into a single file, which is then offered for download. I know how to send this file to the browser when the button is clicked, and I know how to reload the current page (or redirect to a different one), but is it possible to do both in the same step? Or would it make more sense to redirect to a different page with a download link?
My download is initiated with the Flask API's send_from_directory. Relevant test code:
@app.route('/download', methods=['GET', 'POST'])
def download():
    error = None
    # ...
    if request.method == 'POST':
        if download_list == None or len(download_list) < 1:
            error = 'No files to download'
        else:
            timestamp = dt.now().strftime('%Y%m%d:%H%M%S')
            zfname = 'reports-' + str(timestamp) + '.zip'
            zf = zipfile.ZipFile(downloaddir + zfname, 'a')
            for f in download_list:
                zf.write(downloaddir + f, f)
            zf.close()
            # TODO: remove zipped files, move zip to archive
            return send_from_directory(downloaddir, zfname, as_attachment=True)
    return render_template('download.html', error=error, download_list=download_list)
Update: As a workaround, I am now loading a new page with the button click, which lets the user initiate the download (using send_from_directory) before returning to the updated listing.
Are you running the Flask app behind a front-end web server such as nginx or Apache? That would be the best way to handle downloading the files. If you're using nginx, you can use the 'X-Accel-Redirect' header. For this example I'll use /srv/static/reports as the directory you're creating the zip files in and want to serve them out of.
nginx.conf
in the server section
server {
    # add this to your current server config
    location /reports/ {
        internal;
        root /srv/static;
    }
}
Your Flask method
Send the header that tells nginx to serve the file and where to find it:
from flask import make_response

@app.route('/download', methods=['GET', 'POST'])
def download():
    error = None
    # ..
    if request.method == 'POST':
        if download_list == None or len(download_list) < 1:
            error = 'No files to download'
            return render_template('download.html', error=error, download_list=download_list)
        else:
            timestamp = dt.now().strftime('%Y%m%d:%H%M%S')
            zfname = 'reports-' + str(timestamp) + '.zip'
            zf = zipfile.ZipFile(downloaddir + zfname, 'a')
            for f in download_list:
                zf.write(downloaddir + f, f)
            zf.close()
            # TODO: remove zipped files, move zip to archive
            # tell nginx to serve the file and where to find it
            response = make_response()
            response.headers['Cache-Control'] = 'no-cache'
            response.headers['Content-Type'] = 'application/zip'
            # use the bare zip name here (zf.filename would include downloaddir)
            response.headers['X-Accel-Redirect'] = '/reports/' + zfname
            return response
If you're using Apache, you can use its EnableSendfile directive: http://httpd.apache.org/docs/2.0/mod/core.html#enablesendfile
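For an app-controlled Apache equivalent of X-Accel-Redirect, the usual approach is the X-Sendfile header from the third-party mod_xsendfile module rather than the core directive linked above. A rough sketch of the Flask side, assuming that module is installed and XSendFilePath allows the reports directory (route name and paths are illustrative only):

from flask import Flask, make_response

app = Flask(__name__)

@app.route('/download-apache/<zfname>')
def download_apache(zfname):
    # mod_xsendfile intercepts the X-Sendfile header and streams the file itself;
    # a real implementation should validate zfname before using it in a path
    response = make_response()
    response.headers['Content-Type'] = 'application/zip'
    response.headers['X-Sendfile'] = '/srv/static/reports/' + zfname
    return response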