Saving web page as an html file on computer - html

I want to save the source code of a page on my website to my computer. I know I have to make an HTTP request to download the page's source and save it as an HTML file. My goal is to run a diff to track changes between two saved HTML versions of a page. How can I write a program that saves a web page as an HTML file on my computer? I want to solve this programmatically. While researching, I found that httpget and Selenium scripts can do this, but I am struggling with the implementation. Any help is really appreciated.

On Linux you can just use wget:
wget http://google.com
That will save a file called index.html in the current directory.
Programmatically, you can use Python (shown here for Python 3, where urllib2 became urllib.request):
import urllib.request

# create a list of urls that you want to download
urls = ['http://example.com', 'http://google.com']
# loop over the urls
for url in urls:
    # make the request
    response = urllib.request.urlopen(url)
    # make the filename valid (you can change this to suit your needs)
    filename = url.replace('http://', '')
    filename = filename.replace('/', '-')
    # write the raw bytes to a file
    with open(filename + '.html', 'wb') as f:
        f.write(response.read())
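The question also mentions running a diff between two saved copies of a page. A minimal sketch of that step, using Python's standard difflib module, could look like the following. The function name diff_html_files and the file names in the usage note are illustrative assumptions, not part of the original answer.

```python
import difflib
from pathlib import Path


def diff_html_files(old_path, new_path):
    """Return a unified diff (list of lines) between two saved HTML files."""
    old_lines = Path(old_path).read_text(
        encoding="utf-8", errors="replace").splitlines(keepends=True)
    new_lines = Path(new_path).read_text(
        encoding="utf-8", errors="replace").splitlines(keepends=True)
    # unified_diff yields nothing when the two inputs are identical
    return list(difflib.unified_diff(
        old_lines, new_lines, fromfile=str(old_path), tofile=str(new_path)))
```

For example, print(''.join(diff_html_files('example.com-monday.html', 'example.com-tuesday.html'))) would print the changed lines prefixed with - and +, assuming those two snapshot files exist.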

Related

Importing a file to a web page [SELENIUM]

I am creating a selenium script to automatically log in to a webpage, and then import a .xls file. However I am stuck in the part when the webpage prompts me to select a file from my computer. How do I code to send the keys with the file path and press "enter"? Thanks in advance!
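To upload a file with Selenium, you can send the file's path to the input element with type "file" on the web page.
If, for example, your file path is C:\your_file.xls, your code will look something like this:
from selenium.webdriver.common.by import By
file_path = r'C:\your_file.xls'  # raw string keeps the backslash literal
# Selenium 4 syntax; older releases used driver.find_element_by_xpath(...)
upload_input = driver.find_element(By.XPATH, '//input[@type="file"]')
upload_input.send_keys(file_path)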

Is there a good alternative for embedding a PDF with HTML next to using a local file path, online file path or data source as base64-string?

I am building a web app and I would like to show PDF files to my users. My files are mainly stored as byte arrays in the database as they are generated in the backend. I am using the embed element and have found three ways to display a PDF:
1. Local file path in src attribute: Works, but I need to generate a file from the database byte array, which is not desirable as I have to manage routines to delete the files once they are no longer needed.
2. Online file path in src attribute: Not possible since my files may not be hosted anywhere but on the server. Also has the same issues as the previous method anyway.
3. Data as base64 string in src attribute: Current method, but I ran into a problem for larger files (>2MB). Edge and Chrome will not display a PDF when I convert a PDF of this size to a base64 string (no error, but the docs reveal that there is a limit for the data in the src attribute). It works on Firefox, but I cannot restrict my users to Firefox.
Is there any other way to transmit valid PDF data from a byte array out of the database without generating a file locally?
You have made the common mistake of thinking of URLs and file paths as the same thing; but a URL is just a string that's sent to the server, and some content is sent back. Just as you wouldn't save an HTML file to disk for every dynamic page on the site, you don't have to write to the file system to display a dynamic PDF.
So the solution to this is to have a script on your server that takes the identifier of a PDF in your system, maybe does some access checking, and outputs it to the browser.
For example, if you were using PHP, you might write the HTML with <embed src="/loadpdf.php?id=42"> and then in loadpdf.php you would write something like this:
$pdfContent = load_pdf_from_database((int)$_GET['id']);
header('Content-Type: application/pdf');
echo $pdfContent;
Loading /loadpdf.php?id=42 directly in the browser would then render the PDF just the same as if it was a "real" file, and embedding it should work the same way too.
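The same idea can be sketched in Python as a small WSGI application using only the standard library. Everything named here is an assumption for illustration: PDF_STORE stands in for the database, load_pdf_from_database mirrors the hypothetical PHP helper above, and pdf_app is an invented name.

```python
from urllib.parse import parse_qs

# Hypothetical stand-in for the database: maps a PDF id to its raw bytes.
PDF_STORE = {42: b"%PDF-1.4\n% placeholder bytes, not a complete PDF\n"}


def load_pdf_from_database(pdf_id):
    """Return the PDF bytes for pdf_id, or None if there is no such record."""
    return PDF_STORE.get(pdf_id)


def pdf_app(environ, start_response):
    """WSGI app: answer a request carrying ?id=<n> with the stored PDF bytes."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        pdf_id = int(query.get("id", [""])[0])
    except ValueError:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"invalid id"]

    # An access check for the current user would go here.
    pdf_bytes = load_pdf_from_database(pdf_id)
    if pdf_bytes is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    start_response("200 OK", [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(pdf_bytes))),
    ])
    return [pdf_bytes]
```

To try it locally, wsgiref.simple_server.make_server('', 8000, pdf_app).serve_forever() serves the app, and the page would then use something like <embed src="http://localhost:8000/?id=42">. No file is ever written to disk; the bytes go straight from the store to the response.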

Import files into a directory on a HTML document

I am wondering if I can have a webpage where I can tell it to grab my file and put it in a directory, such as: "http://example.ex/folder". Meaning the file I provided is put into the "folder" folder.
Overall process:
Button says: "Import file"
I select a file, and my file is "text.txt"
It takes my file "text.txt" and adds it to the local system/directory of the website.
You can do this using jQuery File Upload and then adding a backend service that receives the file and saves it.
For example, here is a repository that has a basic Python (Flask) server integrated with jQuery File Upload that will take an uploaded file and place it on the server:
https://github.com/ngoduykhanh/flask-file-uploader
I'd put the rest of the code here, but it is a lot - and requires HTML, JavaScript and a back-end language (like Python).
Here is the documentation on jQuery File Upload: https://github.com/blueimp/jQuery-File-Upload
As a word of caution, DO NOT TRUST ANYTHING UPLOADED TO YOUR SERVER. Meaning, do not put it out on the open internet without some sort of authentication or checks in place to make sure only files you intend are uploaded. Otherwise, people will find it and upload scripts turning your device into a Bitcoin miner, spam relay, or bot host.
Instead of doing it this way, why not use SFTP to upload it to your server to host? At least that way you can lock down access.
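If the HTTP upload route is kept, the kind of check the caution above calls for might look like this minimal sketch. It is not taken from either linked project; save_upload, ALLOWED_EXTENSIONS, and the allow-list policy itself are assumptions for illustration, and authentication would still have to be handled separately.

```python
import os

# Assumed policy for this sketch: only these extensions are accepted.
ALLOWED_EXTENSIONS = {".txt", ".csv"}


def save_upload(upload_dir, client_filename, data, allowed=ALLOWED_EXTENSIONS):
    """Save uploaded bytes under upload_dir and return the path written.

    Raises ValueError if the name is empty, hidden, or has a disallowed
    extension. An existing file with the same name is overwritten.
    """
    # Drop any directory components the client sent (e.g. "../../etc/passwd").
    name = os.path.basename(client_filename.replace("\\", "/"))
    extension = os.path.splitext(name)[1].lower()
    if not name or name.startswith(".") or extension not in allowed:
        raise ValueError("rejected upload: %r" % client_filename)

    os.makedirs(upload_dir, exist_ok=True)
    target = os.path.join(upload_dir, name)
    with open(target, "wb") as f:
        f.write(data)
    return target
```

With this, save_upload('folder', 'text.txt', b'...') writes folder/text.txt, while a name such as miner.php raises ValueError before anything touches the disk.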

Django, How to Download a Docx file, or any file in general

My current website creates and saves a Docx file to the server based on the current user's inputs/information. I have the program saving it to the server, so the user can access it later. So I am assuming the docx file can be considered static? Well anyways, I am having trouble getting the download to work.
I have looked at many different threads on how to get a Docx to download and none have worked for me so far.
1. Downloadable docx file in Django
2. Django create .odt or .docx documents to download
3. Generate the MS word document in django
The closest I have gotten, was a docx file that downloaded, but the content was the path and not the actual docx file that I wanted. Hoping someone can help, Thanks.
Code:
response = HttpResponse('docx_temps/extracted3/test.docx', content_type='application/vnd')
response['Content-Disposition'] = 'attachment; filename=test.doc'
return response
Code for Link, Still cannot get it to work.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Download</title>
</head>
<body>
<a href="users/Tyler/Desktop/Django_Formatically/mysite/Formatically/docx_temps/extracted3/test.docx"
download="Test.docx">
Test.docx
</a>
</body>
</html>
Solution A - static files
For a file to be downloadable as a static file, something has to serve it. In a production environment this task will most likely be handled by a web server like Apache or nginx.
To serve your media root via Django development server you can add the following pattern to your urls.py:
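# urls.py
from django.conf import settings
from django.conf.urls import include, patterns

# Note: patterns() and dotted-string view names only exist in older Django
# releases (patterns() was removed in Django 1.10).
if settings.DEBUG:
    urlpatterns = patterns('',
        (r'^media/(?P<path>.*)$', 'django.views.static.serve',
         {'document_root': settings.MEDIA_ROOT, 'show_indexes': True}),
        (r'', include('django.contrib.staticfiles.urls')),
    ) + urlpatterns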
So any path below /media/ will be served directly. You also have to make sure you have correctly set MEDIA_ROOT and MEDIA_URL in your settings:
# settings.py
# MEDIA_ROOT must be an absolute filesystem path
MEDIA_ROOT = '/Users/Tyler/Desktop/Django_Formatically/mysite/media/'
MEDIA_URL = '/media/'
However - this approach does not allow you to interact on the Django level. So you cannot e.g. check for user permissions or track/log requests in Django. Every user knowing the URL to a file is able to access it.
Solution B - through Django view
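# views.py
import os
from django.http import HttpResponse

def file_view(request):
    filename = '<path to your file>'
    # Read the file in binary mode; the with block closes it afterwards
    with open(filename, 'rb') as f:
        data = f.read()
    # MIME type for .docx files; adjust it for other file types
    response = HttpResponse(data, content_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document')
    response['Content-Length'] = os.path.getsize(filename)
    return response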
This is the simplest approach, but it has drawbacks. The whole file content is loaded into memory in Python, so it is not very efficient when sending large files or handling a lot of requests. A solution using FileWrapper can be found here: Serving large files (with high loads) in Django
Or you could use django-sendfile, which allows easy use of Apache mod_xsendfile or nginx XSendfile.
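To illustrate the drawback mentioned for Solution B, here is a minimal sketch of reading a file in fixed-size chunks instead of all at once. The name iter_file_chunks and the 8192-byte default are assumptions for illustration; this is not the FileWrapper implementation from the linked answer.

```python
def iter_file_chunks(path, chunk_size=8192):
    """Yield the file at path in chunks of at most chunk_size bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty read means end of file
                break
            yield chunk
```

Django's StreamingHttpResponse accepts an iterator like this one, so only one chunk needs to be held in memory at a time rather than the whole document.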

Save a file after a perl script runs on a web page

I have a web page that will be used to create KML files with a Perl script. I want the user to enter some data into a form that my Perl script will use. When the form is submitted, it will run the script, create a KML file, and then prompt the user to save the file. The only part I am not sure about is how to have the user save the file after the script has created the KML. Should the Perl script prompt the download, or should something on the HTML page prompt it? I am not sure of the best way to do this.
If you have a link or a form for telling the server to build the KML then just generate the KML normally and send it back to the browser with some extra HTTP headers. The headers you want are:
Content-disposition set to attachment;filename=whatever.kml where "whatever.kml" is what you want the file to be called.
Content-type set to application/vnd.google-earth.kml+xml.
The Content-disposition should tell the browser to download the KML instead of trying to handle it.
So the Perl script will be prompting the browser to prompt the download.
Assuming the contents of the kml file are in $kml then you'd want to do something like:
use CGI;
my $cgi = CGI->new;
print $cgi->header('-Content-disposition' => 'attachment;filename=kml.xml',
                   '-Content-type'        => 'application/vnd.google-earth.kml+xml');
print $kml;
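The same two headers can be shown without any CGI library. The following Python sketch assembles the raw response a CGI script would write to standard output; build_kml_response is a hypothetical helper, and it assumes the file name is plain ASCII without quote characters.

```python
def build_kml_response(kml, filename="whatever.kml"):
    """Return a complete CGI response (headers, blank line, body) as bytes."""
    body = kml.encode("utf-8")
    headers = [
        "Content-Type: application/vnd.google-earth.kml+xml",
        # "attachment" asks the browser to save the body instead of showing it
        'Content-Disposition: attachment; filename="%s"' % filename,
        "Content-Length: %d" % len(body),
    ]
    # Headers end with an empty line, then the body follows
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body
```

In a Python CGI script this could be sent with sys.stdout.buffer.write(build_kml_response(kml)), where kml holds the generated document as a string.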