Display first page of PDF as Image - html

I am creating a web application where I display images/PDFs in thumbnail format. On clicking the respective image/PDF, it opens in a new window.
For PDFs, I have (this is the code of the new window)
<iframe src="images/testes.pdf" width="800" height="200" />
Using this I can see the whole PDF in the web browser. However, for thumbnail purposes I want to display only the first page of the PDF as an image.
I tried
<h:graphicImage value="images/testes.pdf" width="800" height="200" />
however it is not working. Any idea how to get this done?
Update 1
I am providing the path of the PDF file for example purposes. However, I have the images in a database. My actual code is as below.
<iframe src="#{PersonalInformationDataBean.myAttachmentString}" width="800" height="200" />
Update 2
For the sake of the thumbnail, what I am using is
<h:graphicImage height="200" width="200" value="....">
however, I need to achieve the same for PDFs also.
Hope it is clear what I am expecting...

I'm not sure if all browsers display your embedded PDF (done via <h:graphicImage value="some.pdf" ... />) equally well.
Extracting 1st Page as PDF
If you insist on using PDF, I'd recommend one of these 2 commandline tools to extract the first page of any PDF:
pdftk
Ghostscript
Both are available for Linux, Mac OS X and Windows.
pdftk command
pdftk input.pdf cat 1 output page-1-of-input.pdf
Ghostscript command
gs -o page-1-of-input.pdf -sDEVICE=pdfwrite -dPDFLastPage=1 input.pdf
(On Windows use gswin32c.exe or gswin64c.exe instead of gs.)
pdftk used to be slightly faster than Ghostscript when it comes to page extraction, but as of the most recent released version, v9.05, that is no longer true: I found that Ghostscript (including all startup overhead) requires ~1 second to extract the 1st page from the 756-page PDF specification, while pdftk needed ~11 seconds.
Converting 1st Page to JPEG
If you want to be sure that even older browsers can display your 1st page well, then convert it to JPEG. Ghostscript is your friend here (ImageMagick cannot do it by itself; it needs the help of Ghostscript anyway):
gs -o page-1-of-input-PDF.jpeg -sDEVICE=jpeg -dLastPage=1 input.pdf
Should you need page 33, you can do it like this:
gs -o page-33-of-input-PDF.jpeg -sDEVICE=jpeg -dFirstPage=33 -dLastPage=33 input.pdf
If you need a range of pages, like pages 17-23, try this:
gs -o page-16+%03d-of-input-PDF.jpeg -sDEVICE=jpeg -dFirstPage=17 -dLastPage=23 input.pdf
Note that the %03d notation increments with each page processed, starting with 1. So your first JPEG's name would be page-16+001-of-input-PDF.jpeg.
Maybe PNG is better?
Be aware that JPEG isn't a format well suited for images containing high black+white contrast and sharp edges like text pages. PNG is much better for this.
Creating a PNG from the 1st PDF page with Ghostscript is easy:
gs -o page-1-of-input-PDF.png -sDEVICE=pngalpha -dLastPage=1 input.pdf
The same options as with JPEGs apply when it comes to extracting ranges of pages.
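For the thumbnail use case from the question, you can also fix the output size in the same command. A sketch (the output filename is a placeholder; -g sets the device size in pixels, and -dPDFFitPage scales the page to fit it):
gs -o thumbnail.png -sDEVICE=pngalpha -dLastPage=1 -dPDFFitPage -g200x283 input.pdf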

Warning: Don't use Ma9ic's script (posted in another answer) unless you want to...
...make the PDF->JPEG conversion consume much more time + resources than it should
...give up your own control over the PDF->JPEG conversion process altogether.
While it may work well for you, there are many problems hidden in these 8 little lines of Bash.
First,
it uses identify to extract the number of pages from the input PDF. However, identify (part of ImageMagick) is completely unable to process PDFs all by itself. It has to run Ghostscript as a 'delegate' to handle PDF input. It would be much more efficient to use Ghostscript directly instead of running it indirectly, via ImageMagick.
Second,
it uses convert for the PDF->JPEG conversion. The same remark as above applies: convert uses Ghostscript anyway, so why not run it directly?
Third,
it loops over the pages and runs a separate convert process for every single page of the PDF, that is 100 convert runs for a 100-page PDF file. That means it also runs 100 Ghostscript commands to produce 100 JPEGs.
Fourth,
Fahim Parkar's question was to get a thumbnail from the first page of the PDF, not from all of them.
The script runs at least 201 different commands for a 100-page PDF, when it could all be done in just 1 command. If you run Ghostscript directly...
...not only will it run faster and more efficiently,
...but also it will give you more fine-grained and better control over the JPEGs' quality settings.
Use the right tool for the job, and use it correctly!
Update:
Since I was asked, here is my alternative implementation to Ma9ic's script.
#!/bin/bash
infile=${1}
# Quote the output name so filenames with spaces don't break the command.
gs -q -o "$(basename "${infile}")_p%04d.jpeg" -sDEVICE=jpeg "${infile}"
# To get thumbnail JPEGs with a width of 200 pixels use the following command:
# gs -q -o name_200px_p%04d.jpg -sDEVICE=jpeg -dPDFFitPage -g200x400 "${infile}"
# To get higher quality JPEGs (but also bigger-in-size ones) with a
# resolution of 300 dpi use the following command:
# gs -q -o name_300dpi_p%04d.jpg -sDEVICE=jpeg -dJPEGQ=100 -r300 "${infile}"
echo "Done"
I've even run a benchmark on it. I converted the 756-page PDF-1.7 specification to JPEGs with both scripts:
Ma9ic's version needs 1413 seconds to generate the 756 JPEGs.
My version saves 93% of that time and takes 91 seconds.
Moreover, on my system Ma9ic's script produces mostly black JPEG images; mine are OK.

This is what I used (the Document, Page and GraphicsRenderingHints classes appear to come from the ICEpdf library; the imports below assume that):
import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.awt.image.RenderedImage;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

Document document = new Document();
try {
    document.setFile(myProjectPath);
    System.out.println("Parsed successfully...");
} catch (PDFException ex) {
    System.out.println("Error parsing PDF document " + ex);
} catch (PDFSecurityException ex) {
    System.out.println("Error encryption not supported " + ex);
} catch (FileNotFoundException ex) {
    System.out.println("Error file not found " + ex);
} catch (IOException ex) {
    System.out.println("Error handling PDF document " + ex);
}

// Save page captures to file.
float scale = 1.0f;
float rotation = 0f;
System.out.println("scale == " + scale);

// Paint each page's content to an image and write the image to file;
// the loop only runs for i == 0, i.e. the first page.
InputStream fis2 = null;
File file = null;
for (int i = 0; i < 1; i++) {
    BufferedImage image = (BufferedImage) document.getPageImage(i,
            GraphicsRenderingHints.SCREEN,
            Page.BOUNDARY_CROPBOX, rotation, scale);
    RenderedImage rendImage = image;
    // Capture the page image to file.
    try {
        System.out.println("\t capturing page " + i);
        file = new File(myProjectActualPath + "myImage.png");
        ImageIO.write(rendImage, "png", file);
        fis2 = new BufferedInputStream(new FileInputStream(myProjectActualPath + "myImage.png"));
    } catch (IOException ioe) {
        System.out.println("IOException :: " + ioe);
    } catch (Exception e) {
        System.out.println("Exception :: " + e);
    }
    image.flush();
}

Related

Espressif ESP32 web server HTML example

I'm working on an embedded ESP32 design using one of the web server examples included in esp-idf. I'm able to get the device into soft-AP mode and display a simple web page. Now that I have that working, I'm trying to build a page with a graphic.
I'm using the Linux hex tool "xxd -i" to convert the HTML file into a hex dump array for the C include file. It works fine if the document is just HTML, but I'm stuck on trying to do this with an image.
I went as far as using xxd on both the HTML file and the image file and using "netconn_write" to write out both files. I also tried combining them into a single hex dump file. At this point I'm not sure how to proceed; any help is greatly appreciated.
You can use this utility to embed any number of binary files in your executable. Don't forget to set a correct mime type. Also, if the file is big, you have to rate limit the sending, which might become a non-trivial task.
Therefore I suggest using a filesystem and an embedded web server to do the job. Take a look at https://github.com/cesanta/mongoose-os/tree/master/fw/examples/mjs_hello (disclaimer: I am one of the developers). It'll take you a few minutes to get a firmware with a working HTTP server, ready for your prototypes.
You can use the EMBED_FILES directive directly in CMakeLists.txt. For example, to add the favicon.jpg image, in my CMakeLists.txt, in the same directory as main.c:
idf_component_register(SRCS "main.c"
                       INCLUDE_DIRS "."
                       EMBED_FILES "favicon.jpg")
And somewhere in the main.c:
/* The favicon. EMBED_FILES exposes the file contents through
   _binary_<mangled filename>_start/_end symbols. */
static esp_err_t favicon_handler(httpd_req_t *req)
{
    extern const char favicon_start[] asm("_binary_favicon_jpg_start");
    extern const char favicon_end[] asm("_binary_favicon_jpg_end");
    size_t favicon_len = favicon_end - favicon_start;
    httpd_resp_set_type(req, "image/jpeg");
    httpd_resp_send(req, favicon_start, favicon_len);
    return ESP_OK;
}

static const httpd_uri_t favicon_uri = {
    .uri      = "/favicon.ico",
    .method   = HTTP_GET,
    .handler  = favicon_handler,
    .user_ctx = NULL
};
You can add as many files as you need in this way: text, HTML, JSON, etc. (respecting device memory, of course).
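For completeness, a sketch of how the handler above gets wired up, assuming the default esp_http_server setup (the start_webserver function name is my own):
#include "esp_http_server.h"

static httpd_handle_t start_webserver(void)
{
    httpd_handle_t server = NULL;
    httpd_config_t config = HTTPD_DEFAULT_CONFIG();
    if (httpd_start(&server, &config) == ESP_OK) {
        /* favicon_uri is the httpd_uri_t defined above */
        httpd_register_uri_handler(server, &favicon_uri);
    }
    return server;
}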

Extracting the outputs/results from an executed .pexe file

My goal is to convert a C++ program into a .pexe file in order to execute it later on a remote computer. The .pexe file will contain some mathematical formulas or functions to be calculated on a remote computer, so I'll basically be using the computational power of the remote computer. For all this I'll be using the nacl_sdk with the Pepper library, and I'd be grateful if someone could clarify some things for me:
Is it possible to save the outputs of the executed .pexe file on the remote computer into a file? If so, how, and which file formats are supported?
Is it possible to send the outputs of the executed .pexe file on the remote computer automatically to the host computer? If so, how?
Do I have to install anything for this to work on the remote computer?
Any suggestion will be appreciated.
From what I've tried, it seems you can't capture what your pexe writes to stdout - it just goes to the stdout of the browser. (It took me hours to realize that it does go somewhere - I followed a bad tutorial that had me believe the pexe's stdout was going to be posted to the JavaScript side, and I was wondering why it "did nothing".)
I'm currently porting my stuff to .pexe as well, and it turned out to be quite simple, but that has to do with the way I write my programs:
I write my (C++) programs such that all code-parts read inputs only from an std::istream object and write their outputs to some std::ostream object. Then I just pass std::cin and std::cout to the top-level call and can use the program interactively in the shell. But then I can easily swap out the top-level call to use an std::ifstream and std::ofstream to use the program for batch-processing (without pipes from cat and redirecting to files, which can be troublesome under some circumstances).
Since I write my programs like that, I can just implement the message handler like
class foo : public pp::Instance {
    // ... ctor, dtor, ...
    virtual void HandleMessage(const pp::Var& msg) override {
        std::stringstream i, o;
        i << msg.AsString();
        toplevelCall(i, o);
        PostMessage(o.str());
    }
};
so the data I get from the browser is put into a stringstream, which the rest of the code can use for input. It gets another stringstream where the rest of the code can write its output to, and then I just send that output back to the browser. (The downside is you have to wait for the program to finish before you get to see the result - you could derive a class from ostream and have the << operator post to the browser directly... NaCl should come with a class that does that - I don't know if it actually does.)
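A sketch of that ostream idea, assuming the Pepper C++ API (the PostMessageBuf class name is my own invention): a stringbuf whose sync(), which runs on std::flush or std::endl, posts the buffered text to the browser.
#include <sstream>
#include "ppapi/cpp/instance.h"
#include "ppapi/cpp/var.h"

class PostMessageBuf : public std::stringbuf {
public:
    explicit PostMessageBuf(pp::Instance* instance) : instance_(instance) {}
protected:
    int sync() override {
        instance_->PostMessage(pp::Var(str()));  // forward buffered output
        str("");                                 // reset the buffer
        return 0;
    }
private:
    pp::Instance* instance_;
};

// Usage: std::ostream out(&buf); out << "partial result\n" << std::flush;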
On the HTML/JS side, you can then have a textarea and a pre (which I like to call stdin and stdout ;-) ) and a button which posts the content of the textarea to the pexe - and an event handler that writes the messages from the pexe to the pre, like this:
<embed id='pnacl' type='application/x-pnacl' src='manifest.nmf' width='0' height='0'/>
<textarea id="stdin">Type your input here...</textarea>
<pre id='stdout' width='80' height='25'></pre>
<script>
var pnacl = document.getElementById('pnacl');
var stdout = document.getElementById('stdout');
var stdin = document.getElementById('stdin');
pnacl.addEventListener('message', function(ev){stdout.textContent += ev.data;});
</script>
<button onclick="pnacl.postMessage(stdin.value);">Submit</button>
Congratulations! Your program now runs in the browser!
I'm not through with porting my compilers yet, but it seems like this would even work for stuff that uses flex & bison (you only have to copy FlexLexer.h to the include directory of the pnacl SDK and ignore the warnings about the "register" storage class specifier :-)
Are you using the .pexe in a browser? That's the usual case.
I recommend using nacl_io to emulate POSIX in the browser (also look at file_io). This will allow you to save files locally and retrieve them, in any format you fancy.
To send the output, use the browser's usual capabilities such as XMLHttpRequest (a sketch follows below). You need PNaCl to talk to JavaScript for this; you may want to look at some of the examples.
A regular web server will do; it really depends on what you're doing.
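For the XMLHttpRequest route, a minimal JavaScript sketch (the '/results' endpoint and the resultFromPexe variable are placeholders):
var xhr = new XMLHttpRequest();
xhr.open('POST', '/results');
xhr.setRequestHeader('Content-Type', 'text/plain');
xhr.send(resultFromPexe);  // e.g. the string received from the pexe's message event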

Importing local json file using d3.json does not work

I'm trying to import a local .json file using d3.json().
The file filename.json is stored in the same folder as my HTML file.
Yet the (json) parameter is null.
d3.json("filename.json", function(json) {
root = json;
root.x0 = h / 2;
root.y0 = 0;});
. . .
}
My code is basically the same as in this d3.js example
If you're running in a browser, you cannot load local files.
But it's fairly easy to run a dev server: on the command line, simply cd into the directory with your files, then:
python -m SimpleHTTPServer
(or python -m http.server using python 3)
Now in your browser, go to localhost:8000 (or whatever port is shown on the command line).
The following used to work in older versions of d3:
var json = {"my": "json"};
d3.json(json, function(json) {
    root = json;
    root.x0 = h / 2;
    root.y0 = 0;
});
In d3 v5, you should do it as
d3.json("file.json").then(function(data){ console.log(data)});
Similarly, with csv and other file formats.
You can find more details at https://github.com/d3/d3/blob/master/CHANGES.md
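Since d3.json now returns a promise, you can also surface loading errors explicitly; a minimal sketch:
d3.json("file.json")
    .then(function(data) { console.log(data); })
    .catch(function(error) { console.error(error); });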
Adding to the previous answers, it's simpler to use the HTTP server available on most Linux/Mac machines (just by having Python installed).
Run the following command in the root of your project
python -m SimpleHTTPServer
Then, instead of accessing file://.....index.html, open your browser at http://localhost:8000 (or the port reported by the server). This way the browser will fetch all the files in your project without being blocked.
http://bl.ocks.org/eyaler/10586116
Refer to this code; it reads from a file and creates a graph.
I also had the same problem, but later I figured out that the problem was in the JSON file I was using (an extra comma). If you are getting null here, try printing the error you are getting, maybe like this:
d3.json("filename.json", function(error, graph) {
alert(error)
})
This works in Firefox; in Chrome, somehow, it's not printing the error.
Loading a local CSV or JSON file with (d3)js is not considered safe, so browsers prevent you from doing it. There are some solutions to get it working, though. The following line basically does not work (csv or json) because it is a local import:
d3.csv("path_to_your_csv", function(data) {console.log(data) });
Solution 1:
Disable the security in your browser
Different browsers have different security settings that you can disable. This solution can work and you can load your files. Disabling is, however, not advisable: it will make you vulnerable to all kinds of threats. On the other hand, who is going to use your software if you tell them to manually disable the security?
Disable the security in Chrome:
--disable-web-security
--allow-file-access-from-files
Solution 2:
Load your csv/json file from a website.
This may seem like a weird solution, but it will work. It is an easy fix but can be impractical. See here for an example. Check out the page source. This is the idea:
d3.csv("https://path_to_your_csv", function(data) {console.log(data) });
Solution 3:
Start your own server, e.g. with Python.
A page served over HTTP does not run into the browser's local-file restrictions. This may be a solution when you experiment with your code on your own machine; in many cases it is not the solution when you have users. This example will serve HTTP on port 8888 unless it is already taken:
python -m http.server 8888         # Python 3
python -m SimpleHTTPServer 8888 &  # Python 2
Open the (Chrome) browser address bar and type the address below. This will open index.html. In case you have a different name, type the path to that local HTML page.
localhost:8888
Solution 4:
Use localhost and CORS
You may be able to use localhost and CORS, but the approach is not user-friendly, because setting this up may not be so straightforward.
Solution 5:
Embed your data in the HTML file
I like this solution the most. Instead of loading your csv, you can write a script that embeds your data directly in the HTML. This will allow users to use their favorite browser, and there are no security issues. This solution may not be so elegant, because your HTML file can grow very large depending on your data, but it will work. See here for an example. Check out the page source.
Remove this line:
d3.csv("path_to_your_csv", function(data) { })
Replace with this:
var data =
[
$DATA_COMES_HERE$
]
You can't readily read local files, at least not in Chrome, and possibly not in other browsers either.
The simplest workaround is to include your JSON data in your script file, then get rid of your d3.json call and keep the code in the callback you pass to it.
Your code would then look like this:
json = { ... };
root = json;
root.x0 = h / 2;
root.y0 = 0;
...
I have used this
d3.json("graph.json", function(error, xyz) {
    if (error) throw error;
    // the rest of my d3 graph code here
});
so you can refer to your JSON data using the variable xyz; graph is the name of my local JSON file.
Use the resource as a local variable:
var filename = {x0: 0, y0: 0};
// you can choose a different name for the function than json
d3.json = (x, cb) => cb.call(null, x);
d3.json(filename, function(json) {
    root = json;
    root.x0 = h / 2;
    root.y0 = 0;
});
//...

Perl HTML file upload issue. File has zero size

I have a perl CGI script, that works, to upload a file from a PC to a Linux server.
It works exactly as intended when I write the call to the CGI in my own HTML form and then execute it, but when I put the same call into an existing application, the file is created on the server but does not get the data; it is size zero.
I have compared environment variables (those I can extract from %ENV) and nothing there looks like a cause. I actually tried changing several of the ENV values in my own HTML script to the values the existing application was using, and this did not reveal the problem.
Nothing in the log gives me a clue, the upload operation thinks it was successful.
The user is the same for both tests. If permissions were an issue, then the file would not even be created on the server.
Results are the same in IE as in Chrome (works from my own HTML script, not from within the application).
What specific set up should I be looking at, to compare?
This is the upload code:
if (open(UPLOADFILE, ">$upload_dir/$fname")) {
    binmode UPLOADFILE;
    while (<$from_fh>) {
        print UPLOADFILE;
    }
    close UPLOADFILE;
    $out_msg = "Done with Upload: upload_dir=$upload_dir fname=$fname";
}
else {
    $out_msg = "ERROR opening for upload: upload_dir=$upload_dir filename=$filename";
}
I did verify that:
It does NOT enter the while loop when running from inside the application.
It does enter the while loop when called from my own HTML script.
The value of $from_fh is the same for both runs.
All values used in the above block are exactly the same for both runs.
You could check the error result of your open:
my $err;
open(my $uploadfile, ">", "$upload_dir/$fname") or $err = $!;
if (!$uploadfile) {
    my $out_msg = "ERROR opening for upload: upload_dir=$upload_dir filename=$filename: $err";
}
else {
    ### Stuff
    ...;
}
My guess, based on the fact you are embedding it in another application, is that all the input has been read already by some functionality that is part of the other application. For example, if I tried to use this program as part of a CGI script, and I had used the param() function from CGI.pm, then the entire file upload would have been read already. So if my own code tried to read the file again, it would receive zero data, because the data would have been read already.
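If the surrounding application uses CGI.pm, a minimal sketch of obtaining the upload filehandle ('uploaded_file' is a hypothetical form field name):
use strict;
use warnings;
use CGI;

my $q = CGI->new;
# upload() returns a filehandle for the uploaded file, but only if
# nothing else has consumed the request body first.
my $from_fh = $q->upload('uploaded_file');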

Proper way to convert HTML to PDF

I want to convert HTML page to PDF. There are several options, but they have some problems.
Print HTML page in IE through PDFCreator (too cumbersome)
Use wkhtmltopdf (low quality)
Use PhantomJS (low quality)
Maybe I can use a combined solution? Print with PhantomJS through PDFCreator, improve the quality of wkhtmltopdf, or maybe something else?
Maybe you can try with Amyuni WebkitPDF. It's not open source, but it's free for commercial use, and it can be used from C#.
Sample code for C# from the documentation:
static private void SaveToFile(string url, string file)
{
    // Store the WebkitPDFContext returned value in an IntPtr
    IntPtr context = IntPtr.Zero;
    // Open the URL. The WebkitPDFContext returned value will be stored in
    // the passed-in IntPtr
    int ret = WKPDFOpenURL(url, out context, 0, false);
    if (ret == 0)
    {
        // if ret is 0, then we succeeded in opening the URL.
        // Save the result as PDF to a file. Use the obtained context value
        ret = WKPDFSaveToFile(file, context);
    }
    if (ret != 0)
        Debug.WriteLine("Failed to run SaveToFile on '" + url + "' to generate file '" + file + "'");
    // Make sure to close the WebkitPDFContext because otherwise the
    // internal PDFCreator as well as other objects will not be released
    WKPDFCloseContext(context);
}
Usual disclaimer applies.
You can properly convert HTML to PDF using GroupDocs.Conversion for .NET API.
Have a look at the code:
// Set up the conversion configuration and initialize the ConversionHandler
ConversionConfig config = new ConversionConfig();
config.StoragePath = "source file storage path";

// Initialize ConversionHandler
ConversionHandler conversionHandler = new ConversionHandler(config);

// Convert and save converted document
var convertedDocumentPath = conversionHandler.Convert("sample.html", new PdfSaveOptions {});
convertedDocumentPath.Save("result-" + Path.GetFileNameWithoutExtension("sample.html") + ".pdf");
Disclosure: I work as Developer Evangelist at GroupDocs.
patched wkhtmltopdf (a very good WebKit-based command line tool, fast) with the --print-media-type --no-stop-slow-scripts options
chromium --headless --no-zygote --single-process ... --print-to-pdf= ... (slower, Portrait orientation only)
headless Chromium via the DevTools Protocol (slow, and only a few programming languages have bindings for it)
a wrapper around the Blink engine (e.g., Qt5: https://code.qt.io/cgit/qt/qtwebengine.git/tree/examples/webenginewidgets/html2pdf?h=5.15)
If you believe in containers: https://github.com/thecodingmachine/gotenberg (internally, headless Chromium via the DevTools Protocol)
Google Chrome's "Save as PDF": the output looks exactly the same as rendered by Chrome.
Here I use Puppeteer to automate the process, for a single file or a whole folder:
https://github.com/FuPeiJiang/puppeteer-pdf
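For reference, a minimal Puppeteer sketch of the same idea (URL and output path are placeholders):
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com', { waitUntil: 'networkidle0' });
    await page.pdf({ path: 'output.pdf', format: 'A4', printBackground: true });
    await browser.close();
})();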