Python 3.9 Download and Save HTML - File is garbled - html

I have a simple Python script that successfully retrieves a web page and saves it as an .html file on my local PC. When I open the local .html file, regardless of the URL or web site, it contains garbage. This looks like an encoding problem or a binary-to-text problem, but I am not sure how to fix it.
cache = scrape_cache_dir + 'cache_' + symbol + '_' + cachetime + '.html'
try:
    response = urlopen(URL_TARGET)
except urllib.error.HTTPError as err:
    print(err.code)
html_doc = response.read()
fh = open(cache,'w')
fh.write(str(html_doc))
fh.close()
Here is a sample of the resulting file that was saved:
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xec\xbd\xebv\xdbF\xb20\xfa\x7f\x9e\x02bb\x9b\xdc& \xdcy3\xed\x91%9\xf6\x8cd\xe9\x93\x94xg\xbc\xbd\xb9#\x00\x14\x11\x83\x00\x03\x80\xbaX\xe1\xb7\xcek\x9c\xd7;Or\xaa\xaa\x1b n\x94(Y\xcexf\xcd8#\x02\x8d\xbeVU\xd7\xa5\xbb\xab\xfa\xc5\xd6\xde\xd1\xee\xd9\xaf\xc7\xfb\xc24\x99\xf9/\xe0\xc1s\x86\r+\tg\x9e\xdd\x10l\xdf\x8a\xe3a\xe3}\xf8\xb7X\xb0\xa7Q8s\x05\xc7\x8d?'\xe1\\x98X\x9e\x1f[\x13\xb7!\xf8Vp>l\xb8\x81\xf8\xf3i\x03\xeap-G\x98G\xee\xc4\xbb\x1a6\xc2\xf3>T\x9d\xcc\xfb\xdb\xdb\xe1\xf9\\x9a\xb9\xdbA\xfc\x03d\x8a\xed\xc8\x9b'//\xbd\xc0\t/\xa5\xb9\x1bM\xc2hf\x05\xb6+
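The sample starts with b'\x1f\x8b', which is the gzip magic number, and the b'...' wrapper is what str() produces when it is called on a bytes object. A minimal sketch of one way to handle this, assuming the server sent a gzip-compressed body and the page is UTF-8 encoded (both are assumptions; body_to_text and save_html are illustrative names, not part of the original script):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def body_to_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decompress the body if it looks like gzip, then decode it to text."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # errors="replace" keeps going if the assumed encoding is wrong
    return raw.decode(encoding, errors="replace")


def save_html(raw: bytes, path: str, encoding: str = "utf-8") -> None:
    """Write the decoded page to path as text."""
    with open(path, "w", encoding=encoding) as fh:
        fh.write(body_to_text(raw, encoding))
```

With the script above this would be called as save_html(response.read(), cache) in place of the open/write/close lines.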

Related

Django, How to Download a Docx file, or any file in general

My current website creates a Docx file based on the current user's inputs/information and saves it to the server so the user can access it later. So I am assuming the docx file can be considered static? Anyway, I am having trouble getting the download to work.
I have looked at many different threads on how to get a Docx to download and none have worked for me so far.
1. Downloadable docx file in Django
2. Django create .odt or .docx documents to download
3. Generate the MS word document in django
The closest I have gotten was a docx file that downloaded, but its content was the path and not the actual docx file that I wanted. Hoping someone can help. Thanks.
Code:
response = HttpResponse('docx_temps/extracted3/test.docx', content_type='application/vnd')
response['Content-Disposition'] = 'attachment; filename=test.doc'
return response
Code for the link (still cannot get it to work):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Download</title>
</head>
<body>
<a href="users/Tyler/Desktop/Django_Formatically/mysite/Formatically/docx_temps/extracted3/test.docx"
download="Test.docx">
Test.docx
</a>
</body>
</html>
Solution A - static files
For a file considered static to be downloadable, something has to serve it. In a production environment this task will likely be handled by a web server such as Apache or nginx.
To serve your media root via Django development server you can add the following pattern to your urls.py:
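# urls.py (old-style syntax; patterns() was removed in Django 1.10)
from django.conf import settings
from django.conf.urls import include, patterns

if settings.DEBUG:
    urlpatterns = patterns('',
        (r'^media/(?P<path>.*)$', 'django.views.static.serve',
         {'document_root': settings.MEDIA_ROOT, 'show_indexes': True}),
        (r'', include('django.contrib.staticfiles.urls')),
    ) + urlpatterns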
So any path below /media/ will be served directly. You also have to make sure you have correctly set MEDIA_ROOT and MEDIA_URL in your settings:
# settings.py
MEDIA_ROOT = '/Users/Tyler/Desktop/Django_Formatically/mysite/media/'  # must be an absolute path
MEDIA_URL = '/media/'
However, this approach bypasses Django's view layer, so you cannot, for example, check user permissions or track/log requests in Django. Anyone who knows the URL of a file can access it.
Solution B - through Django view
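# views.py
import os
from django.http import HttpResponse

def file_view(request):
    filename = '<path to your file>'
    # Send the file's bytes; passing the path string would send the path itself
    with open(filename, 'rb') as f:
        data = f.read()
    response = HttpResponse(data, content_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document')
    response['Content-Disposition'] = 'attachment; filename="%s"' % os.path.basename(filename)
    response['Content-Length'] = os.path.getsize(filename)
    return response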
This is the simplest approach, but it has drawbacks. The whole file content is loaded into memory in Python, so it is not very efficient for large files or a lot of requests. A solution using FileWrapper can be found here: Serving large files ( with high loads ) in Django
Or you could use django-sendfile which allows easy use of Apache mod_xsendfile or nginx XSendfile.
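The memory drawback of Solution B comes from calling read() on the whole file at once. A minimal sketch of the alternative, assuming the response object accepts an iterator of byte chunks (Django's StreamingHttpResponse does); iter_file_chunks and the 8192-byte chunk size are illustrative choices, not part of the original answer:

```python
from typing import BinaryIO, Iterator


def iter_file_chunks(fileobj: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the file's content in fixed-size chunks instead of one big read."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty bytes object means end of file
            break
        yield chunk
```

In a view, the generator could be handed to StreamingHttpResponse(iter_file_chunks(open(filename, 'rb')), content_type=...), so only one chunk is held in memory at a time. Offloading to the web server is still likely to scale better under heavy load.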

Saving web page as an html file on computer

I want to save the source code of my website's page to my computer. I know that I have to use an HTTP request to download the page's source as an HTML file. I want to run a diff to track changes between two HTML files for a web page. How can I implement a program that saves a web page as an HTML file on my computer? I want to solve the problem programmatically. While researching this topic I found that httpget and Selenium scripts can achieve this, but I am struggling with the implementation.
On Linux you can just use wget:
wget http://google.com
That will save a file called index.html on your computer.
Programmatically you can use python:
from urllib.request import urlopen  # Python 3 (this was urllib2 in Python 2)

# create a list of urls that you want to download
urls = ['http://example.com', 'http://google.com']
# loop over the urls
for url in urls:
    # make the request
    response = urlopen(url)
    # make the filename valid (you can change this to suit your needs)
    filename = url.replace('http://', '')
    filename = filename.replace('/', '-')
    # write the raw bytes to a file ('wb' avoids any text re-encoding)
    with open(filename + '.html', 'wb') as f:
        f.write(response.read())
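For the diff step mentioned in the question, Python's standard difflib module can compare two saved copies. A minimal sketch, assuming both snapshots are already in memory as text; html_diff and the default file labels are illustrative names:

```python
import difflib


def html_diff(old_text: str, new_text: str,
              old_name: str = "old.html", new_name: str = "new.html") -> str:
    """Return a unified diff of two HTML snapshots (empty string if identical)."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines,
                             fromfile=old_name, tofile=new_name)
    )
```

Reading two of the saved .html files and printing html_diff(old, new) would show removed lines prefixed with - and added lines prefixed with +.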

Read a CSV file from a stream using Roo in Rails 4

I have another question on this here Open a CSV file from S3 using Roo on Heroku but I'm not getting any bites - so a reword:
I have a CSV file in an S3 bucket
I want to read it using Roo in a Heroku based app (i.e. no local file access)
How do I open the CSV file from a stream?
Or is there a better tool for doing this?
I am using Rails 4, Ruby 2. Note that I can successfully open the CSV for reading if I post it from a form. How can I adapt this to fetch the file from an S3 bucket?
Short answer - don't use Roo.
I ended up using the standard CSV library. When working with small CSV files you can simply read the file contents into memory using something like this:
require 'csv'

body = file.read
CSV.parse(body, col_sep: ",", headers: true) do |row|
  row_hash = row.to_hash
  field = row_hash["FieldName"]
end
When reading a file passed in from a form, just reference the params:
file = params[:file]
body = file.read
To read from S3 you can use the AWS gem:
s3 = AWS::S3.new(access_key_id: ENV['AWS_ACCESS_KEY_ID'], secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'])
bucket = s3.buckets['BUCKET_NAME']
# check each object in the bucket
bucket.objects.each do |obj|
  import_file = obj.key
  body = obj.read
  # call the same style import code as above...
end
I put some code together based on this:
Make Remote Files Local With Ruby Tempfile
and Roo seems to work OK when handed a temp file. I couldn't get it to work with S3 directly. I don't particularly like the copy approach, but my processing runs on delayed job, and I want to keep the Roo features a little more than I dislike the file copy. Plain CSV files work without fishing out the encoding info, but XLS files would not.

Download files from folder with webmatrix

I have a folder that stores the files users upload. However, when I try to download the files, some extensions don't download.
I use this code to list all files in the folder, and I can see all the files with no problem.
@foreach (string fullFilePath in Directory.GetFiles(Path.Combine(Server.MapPath("~/uploadedFiles"),"Ticket Id - "+@id)))
{
    <div class="linkFicheiros">@Path.GetFileName(fullFilePath)</div>
}
With this line
@Path.GetFileName(fullFilePath)
I can download files with extensions such as ".zip" and ".xls", but for ".msg" files (which users sometimes need to upload) I get the error "The page cannot be found". A ".jpg" file opens in the browser instead of downloading.
I think the problem is in how I am trying to reach the file, but I can't find a solution.
Any thoughts?
The browser always tries to display a file's content if it can, as it does for an image file such as .jpg. A .zip file is offered for download instead, because the browser can't open it.
You need to push the file to the user. For that you can try the following code:
var file = Server.MapPath("~/images/" + Request["img"]);
Response.AppendHeader(
    "content-disposition",
    "attachment; filename=" + Request["img"]);
Response.ContentType = "application/octet-stream";
Response.TransmitFile(file);
In this code, file is a variable holding the file's path on the file system. Note that the file name comes from a query string parameter (img) that is sent along with the URL, for example:
Download Image
The content-disposition header is appended, and
Response.ContentType = "application/octet-stream"
is used to force the browser to show the Open/Save dialog.
Response.TransmitFile then sends the file for download.
To only execute the code when needed
To run that download block only on request, wrap it in a condition. For example, you can use a query string parameter to decide whether to download the file. Try something simple like:
<a href="~/download_file/image_link.png?download=true">Download</a>
Then on the code behind use this:
var download = Request.QueryString["download"];
if(download == "true") {
/* place the code here */
}
Now it would only execute if the condition is true, otherwise it would skip that part.

jRecorder - jQuery

I want to record voice using HTML5, and I have tried jRecorder-jQuery too. The documentation mentions that the binary file is saved in the browser cache. My question is: where is it? I have checked Chrome's cache but cannot see the temp file.
host (Mandatory): The PHP file http location where the recorded WAV file is posted.
That is from the jRecorder documentation (http://www.sajithmr.me/jrecorder/index.html). It seems that the file is not saved locally but is sent through a POST request to the PHP page named in the host setting.
Add this in the jRecorder settings:
'host': 'acceptfile.php?filename=hello.wav'
Then change acceptfile.php to your own PHP script that will handle the posted file.
Example PHP script for handling the wav file (also from the documentation):
$upload_path = dirname(__FILE__). '/';
// here we assume that the filename parameter is passed; or you can write $filename = 'test.wav';
$filename = $_REQUEST['filename'];
$fp = fopen($upload_path."/".$filename.".wav", "wb");
fwrite($fp, file_get_contents('php://input'));
fclose($fp);
exit('done');
This script will save the audio file (wav) in the same folder as the script.