I have a process where HTML is stored in a database along with image links, and the images themselves are stored in the database as well. I've created a controller action that reads an image from the database. The path I'm generating looks something like /File/Image?path=Root/test.jpg.
This image path is embedded in the HTML in an img tag, like <img alt="logo" src="/File/Image?path=Root/001.jpg" />
I'm trying to use iTextSharp to read the HTML from the database and create a PDF document:
string _html = GenerateDocumentHelpers.CommissioningSheet(fleetId);
Document _document = new Document(PageSize.A4, 80, 50, 30, 65);
MemoryStream _memStream = new MemoryStream();
PdfWriter _writer = PdfWriter.GetInstance(_document, _memStream);
StringReader _reader = new StringReader(_html);
HTMLWorker _worker = new HTMLWorker(_document);
_document.Open();
_worker.Parse(_reader);
_document.Close();
Response.Clear();
Response.AddHeader("content-disposition", "attachment; filename=Commissioning.pdf");
Response.ContentType = "application/pdf";
Response.Buffer = true;
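// ToArray() returns only the bytes written; GetBuffer() also includes unused buffer capacity
byte[] _pdfBytes = _memStream.ToArray();
Response.OutputStream.Write(_pdfBytes, 0, _pdfBytes.Length);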
Response.OutputStream.Flush();
Response.End();
return new FileStreamResult(Response.OutputStream, "application/pdf");
This code gives me an illegal character error. It comes from the image tag: the ? and = characters in the src are not recognized. Is there a way to render this HTML, including the img tag, so that the resulting PDF contains both the HTML and the images from the database? If iTextSharp can't do it, can you suggest any other third-party open source tool that can accomplish this task?
If the image source isn't a fully qualified URL including protocol then iTextSharp assumes that it is a file-based URL. The solution is to just convert all image links to absolute in the form http://YOUR_DOMAIN/File/Image?path=Root/001.jpg.
You can also set a global property on the parser that works pretty much the same as the HTML <BASE> tag:
//Create a provider collection to set various processing properties
System.Collections.Generic.Dictionary<string, object> providers = new System.Collections.Generic.Dictionary<string, object>();
//Set the image base. This will be prepended to the SRC so watch your forward slashes
providers.Add(HTMLWorker.IMG_BASEURL, "http://YOUR_DOMAIN");
//Bind the providers to the worker
worker.SetProviders(providers);
worker.Parse(reader);
Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.2.0 that shows how to use a relative image and set its base using the global provider. Everything is pretty much the same as your code, although I threw in a bunch of using statements to ensure proper cleanup. Make sure to watch the leading and trailing forward slashes on everything: the base URL gets prepended directly onto the SRC attribute, and you might end up with double slashes if it's not done correctly. I'm hard-coding a domain in here, but you should be able to easily use the System.Web.HttpContext.Current.Request object.
using System;
using System.IO;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
string html = @"<img src=""/images/home_mississippi.jpg"" />";
string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "HtmlTest.pdf");
using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.TABLOID)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
using (StringReader reader = new StringReader(html)) {
using (HTMLWorker worker = new HTMLWorker(doc)) {
//Create a provider collection to set various processing properties
System.Collections.Generic.Dictionary<string, object> providers = new System.Collections.Generic.Dictionary<string, object>();
//Set the image base. This will be prepended to the SRC so watch your forward slashes
providers.Add(HTMLWorker.IMG_BASEURL, "http://www.vendiadvertising.com");
//Bind the providers to the worker
worker.SetProviders(providers);
worker.Parse(reader);
}
}
doc.Close();
}
}
}
this.Close();
}
}
}
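The double-slash caveat above can also be avoided by normalizing each image link before the HTML reaches the parser. The following is a minimal sketch in Java (the language of the iText snippets further down this thread; the logic ports directly to C#). The class and method names are hypothetical helpers, not part of iText or iTextSharp.

```java
// Hypothetical helper, not part of iText or iTextSharp.
final class ImageUrlJoiner {

    // Joins a base URL and an image src with exactly one slash between them.
    static String join(String baseUrl, String src) {
        // Fully qualified sources are returned unchanged.
        if (src.startsWith("http://") || src.startsWith("https://")) {
            return src;
        }
        // Drop one trailing slash from the base, if present.
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        // Make sure the relative part starts with a slash.
        String path = src.startsWith("/") ? src : "/" + src;
        return base + path;
    }

    public static void main(String[] args) {
        System.out.println(join("http://YOUR_DOMAIN/", "/File/Image?path=Root/001.jpg"));
    }
}
```

With a helper like this, it no longer matters whether the configured domain ends in a slash or whether the stored src begins with one.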
I have a need to embed MS-Office documents (Excel, Word) into AutoCAD using Design Automation. Searching around the web, it seems that this is not possible because the MS-Office applications, which would act as an OLE Client, would need to be running on the Forge Server. Could someone confirm that this is the case?
If I am correct in the above statement, my next best alternative would be to embed .EMF files created from each page of the document I want to embed; using raster images would also be acceptable. Creating the .EMF or raster files is not a problem. I just can't find a way of embedding the files that does not involve copying them to the clipboard and using the PASTECLIP command. That approach works for me in the AutoCAD application using a C# AutoCAD.NET plugin (an OLE2Frame object is created), but it fails in accoreconsole because PASTECLIP uses a UI class which is not available there. This leads me to think that the same would occur while running the bundle in Design Automation.
The best I have been able to achieve so far is to write raster image files to the working directory and link them to the AutoCAD document using RasterImageDef and RasterImage (code below). Is this the only way I can do this? Can I do something similar using an EMF image, which is vector based, instead of a raster image? Or is there a way to actually embed an EMF (preferred) or raster image instead of just linking the images?
The code below fails if I use .EMF files. Is that because RasterImageDef and RasterImage do not support EMF, since EMF is a vector format rather than a raster format?
[CommandMethod("TEST")]
public void Test()
{
Document doc = Application.DocumentManager.MdiActiveDocument;
Database db = doc.Database;
Editor ed = doc.Editor;
// Get the file name of the image using the editor to prompt for the file name
// Create the prompt
PromptOpenFileOptions options = new PromptOpenFileOptions("Enter Sequence file path");
options.PreferCommandLine = true;
// Get the file name, use no quotes
PromptFileNameResult result = null;
try { result = ed.GetFileNameForOpen(options); }
catch (System.Exception ex)
{
DisplayLogMessage($"Could not get sequence file location. Exception: {ex.Message}.", ed);
return;
}
// Get the rtf filename from the results
string filename = result.StringResult;
DisplayLogMessage($"Got sequence filename: {filename}", ed);
// Load the Sequence.rtf document
Aspose.Words.Document seq;
using (FileStream st = new FileStream(filename, FileMode.Open))
{
seq = new Aspose.Words.Document(st);
st.Close();
}
DisplayLogMessage($"Aspose.Words Loaded: {filename}", ed);
Transaction trans = db.TransactionManager.StartTransaction();
// Get or create the image dictionary
ObjectId imageDictId = RasterImageDef.GetImageDictionary(db);
// ObjectId is a struct, so test IsNull; create the dictionary only when it does not exist yet
if (imageDictId.IsNull)
imageDictId = RasterImageDef.CreateImageDictionary(db);
// Open the Image Dictionary
DBDictionary imageDict = (DBDictionary)trans.GetObject(imageDictId, OpenMode.ForRead);
double x = 0.0;
double y = 0.0;
try
{
// For each page in the Sequence.
for (int i = 0; i < seq.PageCount; i++)
{
DisplayLogMessage($"Starting page {i + 1}", ed);
// extract the page.
Aspose.Words.Document newSeq = seq.ExtractPages(i, 1);
Aspose.Words.Saving.ImageSaveOptions imgOptions = new Aspose.Words.Saving.ImageSaveOptions(Aspose.Words.SaveFormat.Emf);
imgOptions.Resolution = 300;
DisplayLogMessage($"Extracted page {i + 1}", ed);
string dictName = Guid.NewGuid().ToString();
filename = Path.Combine(Path.GetDirectoryName(doc.Name), dictName + ".Emf");
// Save the image
SaveOutputParameters sp = newSeq.Save(filename, imgOptions);
DisplayLogMessage($"Saved {dictName}.Emf", ed);
RasterImageDef imageDef = null;
ObjectId imageDefId;
// see if my guid is in there
if (imageDict.Contains(dictName))
imageDefId = (ObjectId)imageDict.GetAt(dictName);
else
{
// Create an image def
imageDef = new RasterImageDef();
imageDef.SourceFileName = $"./{dictName}.Emf";
// load the image
imageDef.Load();
imageDict.UpgradeOpen();
imageDefId = imageDict.SetAt(dictName, imageDef);
trans.AddNewlyCreatedDBObject(imageDef, true);
}
// create raster image to reference the definition
RasterImage image = new RasterImage();
image.ImageDefId = imageDefId;
// Prepare orientation
Vector3d uCorner = new Vector3d(8.5, 0, 0);
Vector3d vOnPlane = new Vector3d(0, 11, 0);
Point3d ptInsert = new Point3d(x, y, 0);
x += 8.5;
CoordinateSystem3d coordinateSystem = new CoordinateSystem3d(ptInsert, uCorner, vOnPlane);
image.Orientation = coordinateSystem;
// some other stuff
image.ImageTransparency = true;
image.ShowImage = true;
// Add the image to ModelSpace
BlockTable bt = (BlockTable)trans.GetObject(db.BlockTableId, OpenMode.ForRead);
BlockTableRecord btr = (BlockTableRecord)trans.GetObject(bt[BlockTableRecord.ModelSpace], OpenMode.ForWrite);
btr.AppendEntity(image);
trans.AddNewlyCreatedDBObject(image, true);
// Create a reactor between the RasterImage
// and the RasterImageDef to avoid the "Unreferenced"
// warning the XRef palette
RasterImage.EnableReactors(true); // in the original was true
image.AssociateRasterDef(imageDef);
}
trans.Commit();
}
catch (System.Exception ex)
{
DisplayLogMessage("ERROR: " + ex.Message,ed);
trans.Abort();
}
}
Raster images are always linked. There's no way to embed them. The only way to embed an image is to use AcDbOle2Frame (C++) or Autodesk.AutoCAD.DatabaseServices.Ole2Frame (C#). In theory, it is possible to create these objects without the "OLE server" being present but I haven't tried so I don't know if enough APIs are exposed to make it happen.
You should try it and see how far you can get.
Albert
There is a way to embed a raster image, but it is not straightforward: you need to use the C++ ObjectARX API. Please refer to https://github.com/MadhukarMoogala/EmbedRasterImage/tree/EmbedRasterImageUsingDBX
I'm using itextpdf-5.0.6.jar (Java 8), and when I try to export HTML containing an img tag with a base64 source I get a file not found exception.
If I remove the image tag everything works great!
I found a few solutions that override the image tag processor, but most of them are old and not compatible with version 5.0.6.
Here is the HTML I send:
"<!doctype html>\n<html lang=\"en\">\n<head>\n
<meta charset=\"UTF-8\">\n
<title>Test PDF</title>\n</head>\n<body>\n\n
<div class=\"pdf-header\">\n\n
<img src=\"\"> \n\n\n</div>\n\n<div class=\"main\">\n<div class=\"canvas\">\nHellow world</div></div></body>\n</html>"
part of my code:
fileOutputStream = new FileOutputStream(file);
Document document = new Document();
PdfWriter.getInstance(document, fileOutputStream);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
StringReader stringReader = new StringReader(htmlCode);
htmlWorker.parse(stringReader);
document.close();
fileOutputStream.close();
Any help will be appreciated.
Thanks
Please stop using HTMLWorker. As repeated many times on StackOverflow, the HTMLWorker class was abandoned in favor of XML Worker a long time ago. We won't invest in further development of HTMLWorker, so it's a very bad choice to use it. Please switch to XML Worker.
Also upgrade to the latest iText version. The version you are using dates from February 4, 2011, and many bugs have been fixed in the 4 years that have passed. Make sure you have both the iText jar and the XML Worker jar with the same version number.
Base64 images aren't supported yet, but I have made you a very simple Proof of Concept, showing how easy it is to add support for such images. Take a look at the ParseHtml4 example and the resulting PDF: html_4.pdf.
To achieve this, you need to write an implementation of the ImageProvider interface. I have done this by extending the AbstractImageProvider class:
class Base64ImageProvider extends AbstractImageProvider {
@Override
public Image retrieve(String src) {
int pos = src.indexOf("base64,");
try {
if (src.startsWith("data") && pos > 0) {
byte[] img = Base64.decode(src.substring(pos + 7));
return Image.getInstance(img);
}
else {
return Image.getInstance(src);
}
} catch (BadElementException ex) {
return null;
} catch (IOException ex) {
return null;
}
}
@Override
public String getImageRootPath() {
return null;
}
}
As you can see, I check for the existence of "base64," in whatever is passed to XML Worker through the src attribute of the img tag. If that String is present, I decode whatever follows that "base64," and I return an Image object that is created using the resulting bytes.
Once you have this ImageProvider implementation, it's only a matter of passing it to XML Worker.
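The data-URI check inside retrieve() can be tried in isolation, without iText on the classpath. The sketch below makes one substitution: it uses the JDK's java.util.Base64 (available since Java 8) instead of the Base64 class the provider above calls, and it returns raw bytes rather than an iText Image. The class name DataUriDecoder is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-alone version of the check done in Base64ImageProvider.retrieve().
final class DataUriDecoder {

    // Returns the decoded bytes of a base64 data URI, or null when src is not one.
    static byte[] decode(String src) {
        int pos = src.indexOf("base64,");
        if (src.startsWith("data") && pos > 0) {
            // Skip the 7 characters of "base64," and decode the rest.
            // The basic decoder throws IllegalArgumentException on invalid input.
            return Base64.getDecoder().decode(src.substring(pos + 7));
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] bytes = decode("data:text/plain;base64,SGVsbG8=");
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints "Hello"
    }
}
```

A src that is not a data URI yields null here, which corresponds to the branch where the provider falls back to Image.getInstance(src).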
I have an SQL statement that selects a table of data which I want to export to Excel in the .xls format.
I added this table to a GridView, then rendered that GridView through an HTML writer and wrote the result to an Excel file using ASP.NET.
But I keep getting a warning that the file format and extension do not match.
The issue is that the file you are creating is not a genuine Excel file. It's HTML with a .xls extension.
Please, I need to know the best way to export the selected data to an .xls file without the warning.
I have also tried exporting from the DataTable directly, but I still get the warning when trying to open the file in Excel.
// these namespaces need to be added to your code behind file
using System.Configuration;
using System.Data.SqlClient;
using System.Data;
namespace MySpot.UserPages
{
public partial class Journal : System.Web.UI.Page
{
SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["MySpotDBConnStr"].ConnectionString);
DataTable dt = new DataTable();
// regular page_load from .aspx file
protected void Page_Load(object sender, EventArgs e)
{
if (!IsPostBack)
{
}
}
// added a button with ID=btnDownload and double clicked it's onclick event to auto create method
protected void btnDownload_Click(object sender, EventArgs e)
{
string queryStr = "SELECT * from table";
SqlDataAdapter sda = new SqlDataAdapter(queryStr, conn);
sda.Fill(dt);
ExportTableData(dt);
}
// this does all the work to export to excel
public void ExportTableData(DataTable dtdata)
{
string attach = "attachment;filename=journal.xls";
Response.ClearContent();
Response.AddHeader("content-disposition", attach);
Response.ContentType = "application/ms-excel";
if (dtdata != null)
{
foreach (DataColumn dc in dtdata.Columns)
{
Response.Write(dc.ColumnName + "\t");
//sep = ";";
}
Response.Write(System.Environment.NewLine);
foreach (DataRow dr in dtdata.Rows)
{
for (int i = 0; i < dtdata.Columns.Count; i++)
{
Response.Write(dr[i].ToString() + "\t");
}
Response.Write("\n");
}
Response.End();
}
}
}
}
http://blogs.msdn.com/b/vsofficedeveloper/archive/2008/03/11/excel-2007-extension-warning.aspx
The current design does not allow you to open HTML content from a web site in Excel unless the extension of the URL is .HTM/.HTML/.MHT/.MHTML. So ASP pages that return HTML and set the MIME type to something like XLS to try to force the HTML to open in Excel instead of the web browser (as expected) will always get the security alert since the content does not match the MIME type. If you use an HTML MIME type, then the web browser will open the content instead of Excel. So there is no good workaround for this case because of the lack of a special MIME type for HTML/MHTML that is Excel specific. You can add your own MIME type if you control both the web server and the client desktops that need access to it, but otherwise the best option is to use a different file format or alert your users of the warning and tell them to select Yes to the dialog.
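One candidate for the "different file format" mentioned above is plain CSV served with a .csv extension, so that content and extension agree. That choice is an assumption on my part, not something the quoted post prescribes. The sketch below shows only the field escaping such an export needs, written in Java like the iText snippets elsewhere in this thread; the same logic could replace the tab-separated Response.Write loop in the question. CsvRows is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that builds one CSV line from a list of field values.
final class CsvRows {

    // Quotes a field when it contains a comma, a quote or a line break,
    // and doubles any embedded quotes.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Joins the escaped fields with commas (no trailing separator).
    static String toLine(List<String> fields) {
        return fields.stream().map(CsvRows::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toLine(Arrays.asList("Id", "Note")));
        System.out.println(toLine(Arrays.asList("1", "said \"hi\", left")));
    }
}
```

Unlike the loop in the question, this does not emit a trailing separator after the last column.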
I need to generate a PDF from HTML dynamically using ASP.NET. The HTML is stored in a database and contains tables and CSS, up to 10 pages. I have tried iTextSharp by passing the HTML directly, but it produces a PDF that will not open. Destination pdf.codeplex.com has no documentation, and it produces a PDF with styles from the parent page.
Any other solution will be helpful.
I've tried many HTML to PDF solutions, including iTextSharp, wkhtmltopdf and ABCpdf (paid).
I've currently settled on PhantomJS, a headless, open-source, WebKit-based browser. It is scriptable with a JavaScript API which is reasonably well documented.
The only disadvantage I found was that attempting to use stdin to pass HTML into the process was unsuccessful because the REPL still has some bugs. I also found that using stdout seemed to be a lot slower than simply allowing the process to write to disk.
The code below avoids stdin and stdout by creating the javascript input as a temp file, executing PhantomJS, copying the output file to a MemoryStream and cleaning up the temporary files at the end.
using System.IO;
using System.Drawing;
using System.Diagnostics;
using System.Text; // needed for StringBuilder
public Stream HTMLtoPDF (string html, Size pageSize) {
string path = "C:\\dev\\";
string inputFileName = "tmp.js";
string outputFileName = "tmp.pdf";
StringBuilder input = new StringBuilder();
input.Append("var page = require('webpage').create();");
input.Append(String.Format("page.viewportSize = {{ width: {0}, height: {1} }};", pageSize.Width, pageSize.Height));
input.Append("page.paperSize = { format: 'Letter', orientation: 'portrait', margin: '1cm' };");
input.Append("page.onLoadFinished = function() {");
input.Append(String.Format("page.render('{0}');", outputFileName));
input.Append("phantom.exit();");
input.Append("};");
// html is being passed into a string literal so make sure any double quotes are properly escaped
input.Append("page.content = \"" + html.Replace("\"", "\\\"") + "\";");
File.WriteAllText(path + inputFileName, input.ToString());
Process p;
ProcessStartInfo psi = new ProcessStartInfo();
psi.FileName = path + "phantomjs.exe";
psi.Arguments = inputFileName;
psi.WorkingDirectory = Path.GetDirectoryName(psi.FileName);
psi.UseShellExecute = false;
psi.CreateNoWindow = true;
p = Process.Start(psi);
p.WaitForExit(10000);
Stream strOut = new MemoryStream();
Stream fileStream = File.OpenRead(path + outputFileName);
fileStream.CopyTo(strOut);
fileStream.Close();
strOut.Position = 0;
File.Delete(path + inputFileName);
File.Delete(path + outputFileName);
return strOut;
}
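The code above escapes only double quotes before placing the HTML inside a JavaScript string literal, so HTML containing backslashes or line breaks would still produce an invalid script. A slightly fuller escaping routine might look like the sketch below. It is written in Java, the language of the iText snippets elsewhere in this thread, and ports directly to the C# above. JsStringLiteral is a hypothetical helper, and the set of characters it handles is an assumption about what typical HTML needs, not an exhaustive list.

```java
// Hypothetical helper that turns arbitrary text into a double-quoted JavaScript string literal.
final class JsStringLiteral {

    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash must be handled first
                case '"':  sb.append("\\\""); break; // would otherwise end the literal
                case '\n': sb.append("\\n");  break; // raw line breaks are not allowed in a literal
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String html = "<p class=\"x\">line one\nline two</p>";
        System.out.println("page.content = " + quote(html) + ";");
    }
}
```

The result of quote(html) already includes the surrounding double quotes, so it can be appended directly after "page.content = ".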
I'm trying to create a "report" by generating a PDF based on HTML.
At first, I simply attempted to write raw encoded HTML to a document and then print that document using Javascript. However, this gave me little to no control involving headers and footers.
I attempted using thead and tfoot elements, which worked reasonably well in most browsers, however I wasn't able to get the formatting that I was looking for.
Currently - I am trying to work on a server-side solution using iTextSharp in MVC3, however I am a bit lost as to how to proceed, having not worked with iTextSharp much.
Input and Description of Output:
There will be 4 items used in creating the Report:
Report Content (which is currently encoded HTML, as I am unsure if decoding will change any formatting)
Report Title (will simply be the name of the PDF generated)
Report Header (will be displayed at the upper-left of each page)
Report Footer (will be displayed at the lower-left of each page)
Controller Action:
//This will be accessed by a jQuery Post
[HttpPost]
public FileStreamResult GeneratePDF(string id)
{
//Grab Report Item
ReportClass report = reportingAgent.GetReportById(id);
Document doc = new Document();
//Do I need to decode the HTML or is it possible to use the encoded HTML?
//Adding Headers / Footers
//Best method of returning the PDF?
}
iTextSharp cannot convert HTML to PDF. It's not what it was designed to do. It was designed to create PDF files from scratch, not to convert various formats into PDF. If you want to convert HTML into PDF you could, for example, use the flying-saucer library, which is based on iText. I have blogged about how this could be done in .NET using the IKVM.NET Bytecode Compiler (ikvmc.exe).
So your controller action might look something along the lines of:
[HttpPost]
public ActionResult GeneratePDF(string id)
{
ReportClass report = reportingAgent.GetReportById(id);
// PdfResult derives from ActionResult, so it must be instantiated and returned as such
return new PdfResult(report.Html);
}
where PdfResult could be a custom action result taking the raw HTML and outputting the PDF into the response stream:
public class PdfResult : ActionResult
{
private readonly string _html;
public PdfResult(string html)
{
_html = html;
}
public override void ExecuteResult(ControllerContext context)
{
var response = context.HttpContext.Response;
response.ContentType = "application/pdf";
var builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
using (var bais = new ByteArrayInputStream(Encoding.UTF8.GetBytes(_html)))
using (var bao = new ByteArrayOutputStream())
{
var doc = builder.parse(bais);
var renderer = new ITextRenderer();
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(bao);
var buffer = bao.toByteArray();
response.OutputStream.Write(buffer, 0, buffer.Length);
}
}
}