Generate HTML with images from PDF using Linux (poppler-utils, pdftohtml)

I am currently working with pdftohtml (from poppler-utils) on CentOS. The concept is simple: a user uploads a PDF file and sees the HTML version of that file. I first used the simple command:
$> pdftohtml source.pdf target.html
but it doesn't work. I then tried to create the HTML using the complex switch with no frames:
$> pdftohtml -c -noframes source.pdf target.html
Still no luck. The problem is that the images embedded in the PDF file either don't appear in the HTML or, sometimes, overlap. Any ideas?
Here is the PHP code:
Add.php
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<link href="css/style.css" rel="stylesheet" type="text/css"/>
<title>CompleteView</title>
</head>
<body>
<form method="post" action="save.php" enctype="multipart/form-data">
<input type="hidden" name="action" value="add">
<tr class="dark_bgcolor text-content">
<td align="left" width="20%">Upload</td>
<td align="left" width="1%">:</td>
<td align="left">
<input type="file" name="img_full" class="look" size="50">
(Only .pdf)
</td>
</tr>
<tr class="bottom_bgcolor">
<td align="center" colspan="3"><input type="submit" name="" value="Upload" class="look"></td>
</tr>
</form>
</body>
</html>
Save.php
<?php
$myNewFolderPath=rand();
mkdir($myNewFolderPath);
$fname="full_".uniqid("");
$filename=$fname.'.pdf';
//$uploadpath=SPL_IMG_UPLOADPATH.$filename;
move_uploaded_file($_FILES['img_full']['tmp_name'], $myNewFolderPath.'/'.$filename);
chmod($myNewFolderPath.'/'.$filename, 0777);
echo ('/usr/local/bin/pdftohtml '.$myNewFolderPath.'/'.$filename);
exec('/usr/local/bin/pdftohtml -c -noframes'.$myNewFolderPath.'/'.$filename);
header('Location:'.$fname.'.html');
//exec('/usr/local/bin/pdftohtml 2098602105/EssentialC.pdf');
?>
One more thing: the pdftohtml version is 0.36.

$ pdftohtml -c source.pdf target.html
This produces output in complex mode. According to the man page, you can't use -noframes together with the complex (-c) flag:
$ man pdftohtml
-noframes generate no frames. Not supported in complex output mode.
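Note also that Save.php builds the command by string concatenation, and there '-noframes' is joined directly to the folder name with no space in between. A minimal sketch of building the argument list explicitly instead, written in Python rather than PHP. The helper names `build_pdftohtml_command` and `convert` and their parameters are hypothetical; only the `-c` and `-noframes` flags and the binary path come from the post, and the rejected combination reflects the man page excerpt quoted above.

```python
import subprocess


def build_pdftohtml_command(pdf_path, html_path, complex_mode=False,
                            noframes=False,
                            binary="/usr/local/bin/pdftohtml"):
    """Return the argument list for a pdftohtml call (hypothetical helper)."""
    if complex_mode and noframes:
        # Per the man page excerpt quoted above, -noframes is not
        # supported in complex output mode.
        raise ValueError("-noframes cannot be combined with -c")
    args = [binary]
    if complex_mode:
        args.append("-c")
    if noframes:
        args.append("-noframes")
    # Each element is passed as a separate argument, so an option can
    # never fuse with the path the way it can in a concatenated string.
    args += [pdf_path, html_path]
    return args


def convert(pdf_path, html_path, **options):
    """Run pdftohtml; raises CalledProcessError on a non-zero exit status."""
    command = build_pdftohtml_command(pdf_path, html_path, **options)
    subprocess.run(command, check=True)
```

Calling `convert("source.pdf", "target.html", complex_mode=True)` would then correspond to the `pdftohtml -c source.pdf target.html` command shown above.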

Related

Table in HTML not presented correctly

My code generates the following HTML file. As you can see, the table tag is at the beginning of the code, but when it is displayed in a browser, the table is always at the end of the page. Can anyone explain why this happens?
<table border = '1'><tr><td>count(*)</td></tr><tr><td>3616</td> </tr>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Project 1B</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head>
<body>
<form method="GET">
<p><h3>Project 1B: Queries</h3></p>
<p>Type an SQL query in the following box:
</p>
<TEXTAREA NAME="area" ROWS=20 COLS=100>select count(*) from Movie;
</TEXTAREA>
<p>
<input type="submit" name="Submit" value="Submit" />
</p>
</form>
<hr />
<h3>Report:</h3>
</body>
</html>
I imagine it's because your <table> tag appears before your <html> tag. Try changing your code to generate the <html> then the <table> tags.
Thanks everyone, it turns out that I forgot to close the table tag, and that caused all the weird behavior.
As I said, browsers will actually fix many errors while parsing the HTML.
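Because browsers repair markup like this silently, an unclosed tag is easier to catch before the page reaches the browser. A minimal sketch using Python's standard `html.parser` module that compares the number of opening and closing tags. The names `TagBalance` and `unclosed_tags` are hypothetical, and the check is deliberately naive: it only compares counts and does not validate nesting.

```python
from collections import Counter
from html.parser import HTMLParser


class TagBalance(HTMLParser):
    """Count opening tags minus closing tags for every tag name."""

    def __init__(self):
        super().__init__()
        self.balance = Counter()

    def handle_starttag(self, tag, attrs):
        self.balance[tag] += 1

    def handle_endtag(self, tag):
        self.balance[tag] -= 1


def unclosed_tags(html, tags=("table",)):
    """Return {tag: surplus_of_open_tags} for the watched tags."""
    parser = TagBalance()
    parser.feed(html)
    parser.close()
    return {tag: parser.balance[tag] for tag in tags if parser.balance[tag] > 0}
```

Running `unclosed_tags` on the generated page before sending it would flag the missing `</table>` from the question.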

Do not see changes made in HTML file in Eclipse

I have created a simple login.html file in a Dynamic Web Project (Maven + GlassFish) in Eclipse.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Login Page</title>
<style type="text/css">
form {text-align: center}
</style>
</head>
<body>
<form action="LoginServlet" method="post">
USERNAME:<input type="text" name="name" value=""/><br>
PASSWORD:<input type="password" name="password" value=""/><br>
<input type="submit" value="ENTER"/>
<input type="reset" value="CLEAR"/>
</form>
</body>
</html>
I launch the login.html file via Run as -> Run on server and it looks OK.
But when I change something in the HTML and run it on the server again, I do not see my changes.
I tried mvn clean and mvn install.
What should I do to see changes in html file when I Run it on Server? (In Preview I see changes)
When you launch your *.html file on the server, it opens in the Eclipse browser. To see your changes, you need to click the green triangle, "Go to the selected URL", and the page will be updated.

HTML file with Dojo refuses to work from my local machine

I am using some CDN links in a file copied directly from a dojo tutorial. I can't figure out why it won't work. There is supposed to be a dialog when you click the Show Me! button but the parts of the dialog are just visible on the page and the button does nothing.
<!DOCTYPE html>
<html >
<head>
<link href="http://ajax.googleapis.com/ajax/libs/dojo/1.7.4/dijit/themes/claro/claro.css" type="text/css" rel="stylesheet" />
<script>dojoConfig = {parseOnLoad: true}</script>
<script src="//ajax.googleapis.com/ajax/libs/dojo/1.7.4/dojo/dojo.js"
data-dojo-config="async: true"></script>
<script>
require(["dijit/Dialog", "dijit/form/TextBox", "dijit/form/Button"]);
</script>
</head>
<body class="claro">
<div data-dojo-type="dijit/Dialog" data-dojo-id="myDialog" title="Name and Address">
<table class="dijitDialogPaneContentArea">
<tr>
<td><label for="name">Name:</label></td>
<td><input data-dojo-type="dijit/form/TextBox" name="name" id="name"></td>
</tr>
<tr>
<td><label for="address">Address:</label></td>
<td><input data-dojo-type="dijit/form/TextBox" name="address" id="address"> </td>
</tr>
</table>
<div class="dijitDialogPaneActionBar">
<button data-dojo-type="dijit/form/Button" type="submit" id="ok">OK</button>
<button data-dojo-type="dijit/form/Button" type="button" data-dojo-props="onClick:function(){myDialog.hide();}"
id="cancel">Cancel</button>
</div>
</div>
<button data-dojo-type="dijit/form/Button" type="button" onClick="myDialog.show();">
Show me!
</button>
</body>
</html>
You've always got to use a webserver even if you're viewing locally. It simply won't run from your local filesystem.
Run your source code from a web server, not the file system, even if the web server is
running on your development machine. The browser's handling of HTTP requests from the local
file system are more restrictive than from a web server, even when it's running in the same
machine. For consistent results, you should always run Dojo from within any HTTP web server
(Apache, nginx, Tomcat, IIS, Jetty, etc.).
As for Lucian's remark about the missing http:, that isn't a problem:
You may also load Dojo from a CDN. This is useful for quickly using Dojo, as it doesn't
require you to host your own copy of Dojo. You'll notice in many of our tutorials that we
show protocol-less URLs. This allows you to use Dojo within http and https
applications without adjusting the URL. For more information, check out the Dojo CDN
tutorial
See: Dojo Start
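If no web server is installed on the development machine, a small local one is enough for testing. A minimal sketch using Python's standard `http.server` module; this is an assumption on my part and not one of the servers listed in the quote above. The function name `make_local_server` and the default port 8000 are hypothetical.

```python
import functools
import http.server


def make_local_server(directory, port=8000):
    """Create (but do not start) an HTTP server for `directory` on localhost."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to the loopback address only, so the files are not exposed
    # to other machines on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (serve_forever blocks until interrupted with Ctrl+C):
#   server = make_local_server(".")
#   server.serve_forever()
# Then open http://127.0.0.1:8000/ followed by the name of the HTML file.
```

The page is then requested over HTTP instead of being opened from the file system, which is what the quoted advice asks for.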
First, you had an "http:" missing in the second CDN link. Beyond that, it seems that the dijit dialog files somehow aren't loading. You could load them manually, or
you could switch the Dojo version from 1.7.4 to 1.8 (which I would advise) by loading the 1.8 files; then it should work. Here is the code:
<!DOCTYPE html>
<html >
<head>
<link href="http://ajax.googleapis.com/ajax/libs/dojo/1.8/dijit/themes/claro/claro.css" type="text/css" rel="stylesheet" />
<script>dojoConfig = {parseOnLoad: true}</script>
<script src="http://ajax.googleapis.com/ajax/libs/dojo/1.8/dojo/dojo.js"
data-dojo-config="async: true"></script>
<script>
require(["dijit/Dialog", "dijit/form/TextBox", "dijit/form/Button"]);
</script>
</head>
<body class="claro">
<div data-dojo-type="dijit/Dialog" data-dojo-id="myDialog" title="Name and Address">
<table class="dijitDialogPaneContentArea">
<tr>
<td><label for="name">Name:</label></td>
<td><input data-dojo-type="dijit/form/TextBox" name="name" id="name"></td>
</tr>
<tr>
<td><label for="address">Address:</label></td>
<td><input data-dojo-type="dijit/form/TextBox" name="address" id="address"> </td>
</tr>
</table>
<div class="dijitDialogPaneActionBar">
<button data-dojo-type="dijit/form/Button" type="submit" id="ok">OK</button>
<button data-dojo-type="dijit/form/Button" type="button" data-dojo-props="onClick:function(){myDialog.hide();}"
id="cancel">Cancel</button>
</div>
</div>
<button data-dojo-type="dijit/form/Button" type="button" onClick="myDialog.show();">
Show me!
</button>
</body>
</html>

Why is `?` displayed instead of non-English characters under cPanel, but not under localhost or DirectAdmin? Using CodeIgniter and UTF-8; no MySQL used

I first created a simple HTML page that uses UTF-8 as its character encoding.
Then I copied the HTML content into a view in CodeIgniter and it was still fine (I had used non-English characters, which caused no problem, as always).
I added a simple dynamic feature (a contact form that emails users' feedback to the site admins).
There was still no problem on localhost (a LAMP server running on Ubuntu 12.04 LTS).
I have not even connected to MySQL or any other DBMS.
Strangely, when I uploaded the app to the server, only ??????? is shown instead of the Persian characters (for example, نام, which means "Name", became ???, and so on).
I should add that I have also used the session library to thank the user after submitting feedback, nothing else.
Note that this is the only page in this little project (it's more of an under-construction page) and nothing else is used in it.
I really have no idea where the problem comes from!
UPDATE after comments
Now I can see that the problem occurs only with cPanel. On DirectAdmin (and the local LAMP server, of course) everything is normal. I have not tested other hosting control panels.
Both Chromium and Firefox DO understand UTF-8 as the page's character encoding. They are both set as Auto-detect encoding.
URL is http://WEBSITE.COM/dmf/dynamic/ (dmf is the abbreviation of the project name!). There is nothing non-english in the URL.
The page's code is as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>نام وب‌سایت</title>
<link rel="stylesheet" type="text/css" href="<?php echo base_url('template/css/style.css'); ?>" />
<!-- 1. jquery library -->
<script
src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js">
</script>
<!-- 2. flowplayer -->
<script src="http://releases.flowplayer.org/5.1.1/flowplayer.min.js"></script>
<!-- 3. skin -->
<link rel="stylesheet" type="text/css" href="http://releases.flowplayer.org/5.1.1/skin/minimalist.css" />
</head>
<body>
<div id="wrapper">
<header>
<h1>نام وب‌سایت</h1>
</header>
<section id="box-container">
<?php
echo form_open('contact', "id='contact-us'");
echo form_fieldset('تماس با ما');
if ($this->session->userdata('mailsent'))
{
echo '<div>باتشکر، پیام شما ارسال شد</div>';
$this->session->sess_destroy();
}
echo '<input tabindex="1" id="name-in" value="نام" type="text" name="name"/>
<input tabindex="2" id="mail-in" value="ایمیل" type="email" name="email"/>
<textarea tabindex="3" id="content-in" name="message">توضیحات</textarea>
<input tabindex="4" id="submit" type="submit" value="ارسال" />';
echo '<div class="clear"></div>';
echo form_fieldset_close();
echo form_close();
?>
<div id="sms-comp">
<h2>مسابقه پیامکی</h2>
<p>
<span id="comp-title">یک متن تستی</span>
سامی برندگان هفته پیش
</p>
</div>
<div id="last-program">
<h2>دریافت آخرین برنامه</h2>
<div class="flowplayer">
<video id="my_video_1" width="212" height="126"
poster="<?php echo base_url('template/images/img.jpg'); ?>" controls="controls"
src="http://archive.org/download/Pbtestfilemp4videotestmp4/video_test.ogv" type='video/mp4'>
</video>
</div>
</div>
<div class="clear"></div>
</section>
</div>
<footer>
طراحی و توسعه : پاورد بای
</footer>
</body>
</html>

Bad conversion from HTML to PDF with htmldoc

I'm trying to convert HTML to PDF using htmldoc, but even basic HTML does not convert properly. I have this HTML:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>pdf test</title>
</head>
<body>
<table border="1">
<tr>
<td width="50%">
a
</td>
<td>
<p>
some address
</p>
<p>
some other text
</p>
</td>
</tr>
<tr>
<td>
test<br>
test2<br>
asdfasdf<br>
qwerqwer<br>
fasdfasdf
</td>
<td>
bla
</td>
</tr>
</table>
</body>
</html>
but it renders like this (test.pdf) when using this command:
htmldoc --webpage --color --charset utf-8 -t pdf14 --size a4 test.html -f test.pdf
This is HTMLDOC version 1.9svn. I tried changing the charset, adding thead, tbody, etc., and nothing helped. Do you know what the problem could be?
It also doesn't accept style="padding: 10px" on those paragraphs.
The command:
htmldoc --size universal --webpage -t pdf --firstpage p1 -f test.pdf test.html
renders the page well for me. It is unclear from the original question whether the options you entered for utf-8, color, and PDF type are actually needed for your result or are in fact the cause of the incorrect rendering.
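One way to find out which option is responsible is to rerun the conversion with one option removed at a time. A minimal sketch that only generates those command variants; the helper name `drop_one_variants` is hypothetical, and the option values are the ones from the question's command.

```python
def drop_one_variants(base, option_groups, tail):
    """Yield (dropped_group, command) pairs, each omitting one option group."""
    for i, dropped in enumerate(option_groups):
        kept = [
            arg
            for j, group in enumerate(option_groups)
            if j != i
            for arg in group
        ]
        yield dropped, base + kept + tail


# Options from the question's command, grouped so a flag stays with its value.
QUESTION_OPTIONS = [
    ["--color"],
    ["--charset", "utf-8"],
    ["-t", "pdf14"],
    ["--size", "a4"],
]

for dropped, command in drop_one_variants(
    ["htmldoc", "--webpage"], QUESTION_OPTIONS, ["-f", "test.pdf", "test.html"]
):
    # Print each candidate command; run them by hand and compare the PDFs.
    print("without", " ".join(dropped), "->", " ".join(command))
```

If the output becomes correct after dropping a particular option, that option is the likely cause.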