How would you simplify this process?

I have a batch of over 1,000 HTML files containing only simple text: each one is just text inside a <table>. It's an internal batch of documents, not for web production.
Our job is to convert them into JPEG files using Photoshop and the old copy-and-paste method. It's tedious.
How would you make this process more efficient or simpler?
I thought about converting the HTML into Excel and then mail-merging it into Word to print as JPEG, but I can't find (and rightly so) anything to convert HTML to XLSX.
Thoughts? Or is this just a manual job?

Here's a little something I created to convert a single HTML file to JPEG. It's not pretty (to say the least), but it works fine with a table larger than my screen. Put it inside a Windows Forms project. You can add more checks and call this program in a loop, or refactor it to work on multiple HTML files.
Ideas and techniques taken from:
Finding the needed size - http://social.msdn.microsoft.com/Forums/ie/en-US/f6f0c641-43bd-44cc-8be0-12b40fbc4c43/webbrowser-object-use-to-find-the-width-of-a-web-page
Creating the graphics - http://cplus.about.com/od/learnc/a/How-To-Save-Web-Page-Screen-Grab-csharp.htm
Example table - an enlarged copy-paste of http://www.w3schools.com/html/html_tables.asp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Windows.Forms;
static class Program
{
    static WebBrowser webBrowser = new WebBrowser();
    private static string m_fileName;
    [STAThread]
    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            MessageBox.Show("Usage: [fileName]");
            return;
        }
        // new Uri() needs an absolute path, so resolve relative file names first
        m_fileName = Path.GetFullPath(args[0]);
        webBrowser.DocumentCompleted += (a, b) => webBrowser_DocumentCompleted();
        webBrowser.ScrollBarsEnabled = false; // Don't want them rendered
        webBrowser.Navigate(new Uri(m_fileName));
        Application.Run();
    }
    static void webBrowser_DocumentCompleted()
    {
        // Get the needed size of the control
        webBrowser.Width = webBrowser.Document.Body.ScrollRectangle.Width + webBrowser.Margin.Horizontal;
        webBrowser.Height = webBrowser.Document.Body.ScrollRectangle.Height + webBrowser.Margin.Vertical;
        // Create the graphics and save the image (both are disposed afterwards)
        using (var graphics = webBrowser.CreateGraphics())
        using (var bitmap = new Bitmap(webBrowser.Size.Width, webBrowser.Size.Height, graphics))
        {
            webBrowser.DrawToBitmap(bitmap, webBrowser.ClientRectangle);
            string newFileName = Path.ChangeExtension(m_fileName, ".jpg");
            bitmap.Save(newFileName, ImageFormat.Jpeg);
        }
        // Shamefully exit the application
        Application.ExitThread();
    }
}

You can load all the files into one page and use the html2canvas library to convert them.
You can run this in the background using Node.js with node-canvas, or make it a desktop app with node-webkit.

In case anyone is looking for an answer that works, I ended up using a program called Prince: https://www.princexml.com
It works amazingly well; you just have to target the HTML with CSS or JS to make the output look the way you want!


QWebEngineView - loading content larger than 2 MB

So, in PyQt5's QWebEngineView, the .setHtml and .setContent methods have a 2 MB size limitation. When googling for solutions around this, I found two methods:
Use SimpleHTTPServer to serve the file. This, however, gets blocked by a firewall used in the company.
Use file URLs and point to local files. This, however, is a rather bad solution, as the HTML contains confidential data and I can't leave it on the hard drive under any circumstances.
The best solution I currently see is to use file URLs and delete the file on program exit or when loadFinished reports that loading is done, whichever comes first.
This is not a great solution either, so is there a better one I'm overlooking?
Why not load (or link) most of the content through a custom URL scheme handler?
webEngineView->page()->profile()->installUrlSchemeHandler("app", new UrlSchemeHandler(e));
class UrlSchemeHandler : public QWebEngineUrlSchemeHandler
{
    Q_OBJECT
public:
    explicit UrlSchemeHandler(QObject *parent = nullptr)
        : QWebEngineUrlSchemeHandler(parent) {}
    void requestStarted(QWebEngineUrlRequestJob *request) override
    {
        QUrl url = request->requestUrl();
        QString filePath = url.path().mid(1);
        // get the data for this url
        QByteArray data = ..
        //
        if (!data.isEmpty())
        {
            QMimeDatabase db;
            QString contentType = db.mimeTypeForFileNameAndData(filePath, data).name();
            // Parent the buffer to the job so it is deleted together with the request
            QBuffer *buffer = new QBuffer(request);
            buffer->setData(data);
            buffer->open(QIODevice::ReadOnly);
            request->reply(contentType.toUtf8(), buffer);
        } else {
            request->fail(QWebEngineUrlRequestJob::UrlNotFound);
        }
    }
};
You can then load a page with webEngineView->load(QUrl("app://start.html"));
All relative paths inside that page will also be forwarded to your UrlSchemeHandler.
And remember to add the respective includes:
#include <QWebEngineUrlRequestJob>
#include <QWebEngineUrlSchemeHandler>
#include <QWebEngineProfile>
#include <QWebEnginePage>
#include <QMimeDatabase>
#include <QBuffer>
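Since the question itself uses PyQt5, here is a hedged, Qt-free Python sketch of the lookup a scheme handler's requestStarted could do: resolve the request path against an in-memory store (so the confidential HTML never touches the disk) and pick a MIME type, returning None where the C++ handler calls fail(UrlNotFound). InMemoryResources and its methods are hypothetical names, and Python's mimetypes module stands in for QMimeDatabase (it guesses from the file name only, not the content):

```python
import mimetypes


class InMemoryResources:
    """Hypothetical in-memory store that a custom 'app:' scheme handler reads from."""

    def __init__(self):
        self._files = {}

    def add(self, path, data):
        # Store under the path without its leading slash
        self._files[path.lstrip("/")] = data

    def resolve(self, url_path):
        """Return (mime_type, data), or None so the caller can report 'not found'."""
        key = url_path.lstrip("/")  # comparable to url.path().mid(1) in the C++ handler
        data = self._files.get(key)
        if not data:
            return None
        mime, _ = mimetypes.guess_type(key)
        return (mime or "application/octet-stream", data)
```

In a real PyQt5 handler, the returned bytes would go into a QBuffer passed to the request job's reply(), as in the C++ version.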
One way around this is to fetch the page with requests and write it into the page with QWebEnginePage's runJavaScript method:
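import json
import requests
from PyQt5.QtWebEngineWidgets import QWebEngineView
# (assumes a QApplication has already been created)
web_engine = QWebEngineView()
web_page = web_engine.page()
web_page.setHtml('')
url = 'https://youtube.com'
page_content = requests.get(url).text
# document.write writes a string of text to a document stream
# https://developer.mozilla.org/en-US/docs/Web/API/Document/write
# json.dumps turns the HTML into a valid JavaScript string literal, so quotes,
# backticks and ${...} sequences in the page cannot break the injected script
web_page.runJavaScript('document.write({});'.format(json.dumps(page_content)))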

Get HTML from a Frame using the WebBrowser control - UnauthorizedAccessException

I'm looking for a free tool or DLLs that I can use to write my own .NET code to process some web requests.
Let's say I have a URL with some query string parameters, such as http://www.example.com?param=1. When I open it in a browser, several redirects occur and eventually an HTML page with a frameset is rendered; one frame's inner HTML contains a table with the data I need. I want to store this data in an external file in CSV format. The data differs depending on the query string parameter param; let's say I want to run the application and generate 1000 CSV files for param values from 1 to 1000.
I have good knowledge of .NET, JavaScript, and HTML, but the main problem is how to get the final HTML in the server code.
What I tried: I created a new Windows Forms application, added a WebBrowser control, and used code like this:
private void FormMain_Shown(object sender, EventArgs e)
{
    var param = 1; //test
    var url = string.Format(Constants.URL_PATTERN, param);
    WebBrowserMain.Navigated += WebBrowserMain_Navigated;
    WebBrowserMain.Navigate(url);
}
void WebBrowserMain_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
    if (e.Url.OriginalString == Constants.FINAL_URL)
    {
        var document = WebBrowserMain.Document.Window.Frames[0].Document;
    }
}
Unfortunately, I receive an UnauthorizedAccessException, probably because the frame and the main document are in different domains. Does anybody have an idea how to work around this, or perhaps a completely different approach to implementing this kind of functionality?
Thanks to Noseratio's comments, I managed to do this with the WebBrowser control. Here are some key points that might help others with similar questions:
1) Use the DocumentCompleted event; in the Navigated event, the document body is still null.
2) The following answer helped a lot: WebBrowserControl: UnauthorizedAccessException when accessing property of a Frame
3) I was not aware of IHTMLWindow2 and similar interfaces. To use them, I added references to the following COM libraries: Microsoft Internet Controls (SHDocVw) and Microsoft HTML Object Library (MSHTML).
4) I grabbed the HTML of the frame with the following code:
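// Requires the COM references from point 3 and: using mshtml;
void WebBrowserMain_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (e.Url.OriginalString == Constants.FINAL_URL)
    {
        try
        {
            var doc = (IHTMLDocument2) WebBrowserMain.Document.DomDocument;
            var frame = (IHTMLWindow2) doc.frames.item(0);
            // CrossFrameIE is the helper from the answer linked in point 2
            var document = CrossFrameIE.GetDocumentFromWindow(frame);
            var html = document.body.outerHTML;
            var dataParser = new DataParser(html);
            //my logic here
        }
        catch (Exception ex)
        {
            // The original catch block was not shown; logging here is a placeholder
            System.Diagnostics.Trace.TraceError(ex.Message);
        }
    }
}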
5) For working with the HTML, I used the excellent HTML Agility Pack, which has pretty good XPath support.
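The goal was one CSV file per param value. The author parsed the frame HTML with HTML Agility Pack in C#; as a swapped-in illustration only, here is a minimal Python sketch of the table-to-CSV step using the standard library's html.parser, assuming the data sits in plain <tr>/<td> rows with no nested tables. It tolerates missing </td> and </tr> end tags, which older pages often omit:

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently open, or None
        self._cell = None  # text fragments of the cell currently open, or None

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # an unclosed previous row ends here
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # an unclosed previous cell ends here
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_html_to_csv(html_text):
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    parser._end_row()  # flush a row left open by missing end tags
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

The csv module handles quoting, so a cell such as "Apple, red" comes out as a single quoted field.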

How to browse the mobile directory in Flex?

I have captured three videos on my phone, which are stored by default in the phone's gallery (Gallery/videos/). I have to play these three videos in one of my Flex mobile applications. How can I get the videos into the Flex project? If I need to browse the phone's directories, please help me with some code to do so.
I too am looking for an answer to this question. Right now, based on other Stack Overflow discussions, an exhaustive perusal of tutorials and Adobe documentation, and the comments on both (often the more useful resource), I'm coming to the conclusion that it's not possible:
You can use CameraRoll.browseForImage() to open the iOS photo gallery and see all entities of MediaType.IMAGE, but it will not show you MediaType.VIDEO.
You can use CameraUI to launch the system camera by delegation, and it returns a MediaPromise, but as far as I can tell it does not save the captured video anywhere, and I cannot find a way to access the captured video through the MediaPromise (at least not with the Loader class).
Here's my code as a hint in that direction. The first block launches CameraUI for video; the second uses CameraRoll to browseForImage(), but there is no browseForVideo() in the API.
if (CameraUI.isSupported)
{
    camera = new CameraUI();
    camera.addEventListener(MediaEvent.COMPLETE, videoMediaEventComplete);
    camera.addEventListener(Event.CANCEL, cameraCanceled);
    camera.addEventListener(ErrorEvent.ERROR, cameraError);
    camera.launch(MediaType.VIDEO);
}
else
{
    statusText.text = "Camera not supported on this device.";
    startTimer();
}
if (CameraRoll.supportsBrowseForImage)
{
    roll = new CameraRoll();
    roll.addEventListener(MediaEvent.SELECT, cameraRollEventComplete);
    roll.addEventListener(Event.CANCEL, cameraCanceled);
    roll.addEventListener(ErrorEvent.ERROR, cameraError);
    roll.browseForImage();
}
else
{
    statusText.text = "Camera roll not supported on this device.";
    startTimer();
}
I've since found that videos captured using the delegated system camera are stored in a temporary storage location that iOS DOES allow access to. (I was pleasantly shocked.)
The captured video is not added to the device's Camera Roll, unlike videos captured with the iOS Camera app, so it's not enough to capture video and expect to be able to access it later (if, for instance, CameraRoll.browseForVideo() is ever added to the API).
Therefore, you have to 'get while the getting is good' and move the file from the temporary storage location to some non-volatile location such as ApplicationStorageDirectory or the user's Documents directory (the only options in iOS, I think).
The MediaPromise is, I think, completely useless for accessing the video via any direct progressive loader/streamer method, but it still provides the location/URL/path/filename of the temporary file, so you can perform File operations on it.
It's ironic that there are tutorials for getting around the lack of a file location/URL/path/filename in the MediaPromise when using CameraRoll.browseForImage() (the method is to use a Loader to load the image content, which you can then write out to a file), yet when taking video, the video content is not accessible and a file location/URL/path/filename is provided instead. It's also ironic that I could find almost no resources to help with this. Grumble.
I'm going to include some code chunks without really editing out the extraneous bits, because it's way past when I need to be in bed, but I wanted you to have this. I may come back and clean it up later.
This section is in a Spark SkinnablePopUpContainer, and I use the same click handler for several buttons, so the 'case' below is part of the switch-case in that event handler function.
In case you are not familiar with it, 'close(true, data)' is the method that closes the SkinnablePopUpContainer and tells the parent/owner that the container was closed purposefully and that it should look for the data object being passed back (i.e., there are changes to be committed).
case "cameraVideo":
{
    if (CameraUI.isSupported)
    {
        camera = new CameraUI();
        camera.addEventListener(MediaEvent.COMPLETE, videoMediaEventComplete);
        camera.addEventListener(Event.CANCEL, cameraCanceled);
        camera.addEventListener(ErrorEvent.ERROR, cameraError);
        camera.launch(MediaType.VIDEO);
    }
    else
    {
        statusText.text = "Camera not supported on this device.";
        startTimer();
    }
    break;
}
protected function cameraCanceled(event:Event):void
{
    statusText.text = "Camera access canceled by user.";
    startTimer();
}
protected function cameraError(event:ErrorEvent):void
{
    statusText.text = "There was an error while trying to use the camera.";
    startTimer();
}
protected function videoMediaEventComplete(event:MediaEvent):void
{
    statusText.text = "Preparing captured video...";
    camera.removeEventListener(MediaEvent.COMPLETE, videoMediaEventComplete);
    camera.removeEventListener(Event.CANCEL, cameraCanceled);
    camera.removeEventListener(ErrorEvent.ERROR, cameraError);
    var media:MediaPromise = event.data;
    data.MediaType = MediaType.VIDEO;
    data.MediaPromise = media;
    data.source = "camera video";
    close(true, data);
}
This section is the ActionScript in the close handler of the SkinnablePopUpContainer's parent/owner (truncated once the useful code is included):
private function choosePictureLightboxClosed(event:PopUpEvent):void
{
    imageButtonsActive = false;
    if (event.commit)
    {
        this.data = event.data as Object;
        filters = new Array();
        selection = true;
        switch (data.MediaType)
        {
            case MediaType.VIDEO:
            {
                mediaType = "video";
                trace(data.MediaPromise.file.url + " - " + data.MediaPromise.relativePath + " - " + data.MediaPromise.mediaType);
                var sourceFile:File = new File(data.MediaPromise.file.url);
                var destinationFile:File = File.applicationStorageDirectory.resolvePath("User" + parentApplication.userid);
                if (destinationFile.exists && !destinationFile.isDirectory)
                {
                    destinationFile.deleteFile();
                }
                destinationFile.createDirectory();
                destinationFile = destinationFile.resolvePath("Videos");
                if (destinationFile.exists && !destinationFile.isDirectory)
                {
                    destinationFile.deleteFile();
                }
                destinationFile.createDirectory();
                destinationFile = destinationFile.resolvePath(parentApplication.userid + "Video" + new Date().getTime() + ".mov");
                trace(destinationFile.nativePath);
                // Move the video out of volatile temporary storage (true = overwrite)
                sourceFile.moveTo(destinationFile, true);
                break;
            }
            // ... (other cases and the rest of the handler truncated)
        }
    }
}
I sure do hope this helps. This has been a very frustrating experience (and a costly one, since our project is government-grant funded and had deadlines we utterly failed to meet), and I very much hope that these hard-won solutions might help others avoid the same experience.

Rendering an email throws a TemplateCompilationException using RazorEngine 3 in a non-MVC project

I am trying to render emails in a Windows service host.
I use the RazorEngine 3 fork by coxp, which supports Razor 2:
https://github.com/coxp/RazorEngine/tree/release-3.0/src
This works fine for a couple of email templates, but one of them is causing me problems:
@model string
<a href="@Model">Click here</a> to enter a new password for your account.
This throws a CompilationException: The name 'WriteAttribute' does not exist in the current context. So passing in a string as the model and putting it in the href attribute causes problems.
I can make it work by changing that line to:
@Raw(string.Format("<a href=\"{0}\">Click here</a>.", Model))
but this makes the template much less readable and harder to hand over to a marketing department for further styling.
I'd like to add that referencing RazorEngine through the NuGet package is not a solution, since it is based on Razor 1, and somewhere along the way the System.Web.Razor DLL gets replaced by version 2, which breaks any code using RazorEngine. Using Razor 2 seems more interesting anyway, to benefit from the new features and stay up to date.
Any suggestions on how to fix this would be great. Sharing your experiences is also very welcome.
UPDATE 1
It seems that calling SetTemplateBaseType might help, but this method no longer exists, so how can I set the template base type?
//Missing method in the new RazorEngine build from coxp.
Razor.SetTemplateBaseType(typeof(HtmlTemplateBase<>));
I use Windsor to inject the template service rather than using the Razor object. Here is a simplified part of the code that shows how to set the base template type.
private static ITemplateService CreateTemplateService()
{
    var config = new TemplateServiceConfiguration
    {
        BaseTemplateType = typeof(HtmlTemplateBase<>),
    };
    return new TemplateService(config);
}
RazorEngine 3.1.0
A slightly modified example based on coxp's answer, without the injection:
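using System;
using System.IO;
using RazorEngine;
using RazorEngine.Configuration;
using RazorEngine.Templating;
public static class EmailTemplates // wrapper class name is illustrative
{
    private static bool _razorInitialized;
    private static void InitializeRazor()
    {
        if (_razorInitialized) return;
        _razorInitialized = true;
        Razor.SetTemplateService(CreateTemplateService());
    }
    private static ITemplateService CreateTemplateService()
    {
        var config = new TemplateServiceConfiguration
        {
            // Compile templates against HtmlTemplateBase<> (the fix from coxp's answer)
            BaseTemplateType = typeof(HtmlTemplateBase<>),
        };
        return new TemplateService(config);
    }
    public static string ParseTemplate(string name, object model)
    {
        InitializeRazor();
        // HttpContext.Current is null outside ASP.NET (e.g. in a Windows service),
        // so resolve the template folder relative to the application's base directory
        var path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "EmailTemplates", name + ".cshtml");
        var template = File.ReadAllText(path);
        return Razor.Parse(template, model);
    }
}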

How To Generate an HTML Report From A String Of URLs

In my program I have a string that contains URLs separated by \n (one per line).
Let's say the string is called "links". I want to take this string and generate an HTML file that automatically opens in my default browser, with each URL as a hyperlink (one per line). How would I make such a report without any third-party components, using WPF and C# 4.0? I want the report to be generated by clicking a button called "Export".
There are plenty of ways to do this, but here is a quick-and-dirty example (some debugging may be necessary, since I wrote this on the fly). [Edit: Now uses Uri objects to build the actual address.]
// Requires: using System; using System.Collections.Generic; using System.Diagnostics;
//           using System.IO; using System.Net; using System.Windows;
private void export_Click(object sender, RoutedEventArgs e)
{
    // Fully qualified: WPF code-behind often also imports System.Windows.Shapes.Path
    string tempFileName = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "list.html");
    string links = "http://www.google.com/#sclient=psy&hl=en&site=&source=hp&q=test+me&aq=f&aqi=&aql=&oq=&gs_rfai=&pbx=1&fp=ddfbf15c2e2f4021\nhttp://www.testme.com/Test-Prep.html?afdt=Q3RzePF0jU8KEwja-5WM7PqkAhUUiZ0KHaoG_wcYASAAMJbwoAM4MEC4w6uX7dS53gdQlvCgA1CEra8PUJzr_xNQg73wFVCKttweUJStzNoBUNv67ZsD";
    List<Uri> uriCollection = new List<Uri>();
    foreach (string url in links.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
    {
        uriCollection.Add(new Uri(url));
    }
    // Create temporary file.
    try
    {
        using (TextWriter writer = new StreamWriter(tempFileName))
        {
            writer.WriteLine("<html>");
            writer.WriteLine("<head><title>Links</title></head>");
            writer.WriteLine("<body>");
            writer.WriteLine("<p>");
            foreach (Uri uri in uriCollection)
            {
                // {0} = full address (HTML-encoded for the attribute), {1} = host shown as link text
                writer.WriteLine("<a href=\"{0}\">{1}</a><br />",
                    WebUtility.HtmlEncode(uri.OriginalString), WebUtility.HtmlEncode(uri.Host));
            }
            writer.WriteLine("</p>");
            writer.WriteLine("</body>");
            writer.WriteLine("</html>");
        } // Dispose() closes the writer, so no explicit Close() is needed
    }
    catch (Exception ex)
    {
        Trace.TraceError(ex.Message);
        return; // don't open a half-written report
    }
    // Open browser with temporary file.
    if (File.Exists(tempFileName))
    {
        Process.Start(tempFileName);
    }
}
The 'Export' button is wired to the event handler 'export_Click'. I hard-coded the string with '\n' separators for the example. Simply split it apart and write a temporary file containing the HTML you need. Once the file is complete, you can open it with the Process.Start() method.
Ideally, if opening a browser window were not required, this could be done with data binding and other WPF elements, which would also remove any external dependencies the program may have.
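For comparison, a minimal Python sketch of the same report generator (not WPF): it splits on '\n', drops empty entries, HTML-escapes each URL for the href attribute, uses the host as the link text like the C# version, and opens the file with the standard webbrowser module. The file name list.html is carried over from the example; writing it to the temp folder is an assumption:

```python
import html
import tempfile
import webbrowser
from pathlib import Path
from urllib.parse import urlsplit


def build_report(links):
    """Turn a newline-separated string of URLs into an HTML page of hyperlinks."""
    items = []
    for url in links.split("\n"):
        url = url.strip()
        if not url:
            continue  # same effect as StringSplitOptions.RemoveEmptyEntries
        host = urlsplit(url).hostname or url  # link text: the host, as in the C# version
        items.append(f'<a href="{html.escape(url, quote=True)}">{html.escape(host)}</a><br />')
    return ("<html>\n<head><title>Links</title></head>\n<body>\n<p>\n"
            + "\n".join(items) + "\n</p>\n</body>\n</html>\n")


def export_report(links):
    """Write the report to a temp file and open it in the default browser."""
    path = Path(tempfile.gettempdir()) / "list.html"
    path.write_text(build_report(links), encoding="utf-8")
    webbrowser.open(path.as_uri())
    return path
```

Keeping build_report free of file and browser side effects makes the HTML generation easy to test on its own.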