Problem starting a new JRuby ScriptingContainer and setting Output & Error PrintStreams - jruby

I am embedding JRuby using Red Bridge into an app and seem to have problems setting out and err. My tests use a PrintStream that writes to a StringBuilder; after doing some stuff with JRuby, I then check what was printed to out with puts. The problem is that only the first setOutput takes effect: everything gets printed to that stream.
The simple sample below demonstrates the problem. It uses PrintStreams that write to different files. Looking at your file system will show that everything goes to the first file, when one would expect files 2 and 3 to also be written.
package XXX.jruby;
import java.io.PrintStream;
import org.jruby.embed.LocalContextScope;
import org.jruby.embed.LocalVariableBehavior;
import org.jruby.embed.ScriptingContainer;
public class JRubyShellPutsProblem {
public static void main(String[] args) throws Exception {
for (int i = 0; i < 3; i++) {
final ScriptingContainer container = new ScriptingContainer(LocalContextScope.CONCURRENT, LocalVariableBehavior.PERSISTENT);
// container.resetWriter();
// container.resetErrorWriter();
final PrintStream printStream = new PrintStream("d:\\out" + i + ".txt");
container.setOutput(printStream);
container.setError(printStream);
container.runScriptlet("puts \"hello" + i + "\"\n");
printStream.flush();
container.terminate();
}
}
}
I've tried ScriptingContainer.resetWriter but that does not work as one would expect. Sometimes stuff gets written to files 1 and 2 but nothing to 3.

I changed LocalContextScope to THREADSAFE from CONCURRENT and this seemed to fix the problem.
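A JDK-only sketch of the in-memory capture that the question's tests rely on. `CapturedOutput` is a made-up name and nothing in it is JRuby API. With JRuby on the classpath, the idea would be to create one capture per container and pass `capture.stream()` to `setOutput` and `setError`, together with the THREADSAFE scope that fixed the problem.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Keeps everything printed to its PrintStream in memory so a test can assert on it.
// Hypothetical helper, not part of JRuby.
final class CapturedOutput {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final PrintStream stream;

    CapturedOutput() {
        try {
            // autoFlush = true: println() pushes text into the buffer immediately
            stream = new PrintStream(buffer, true, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // The stream to hand to whatever produces output (e.g. a script container).
    PrintStream stream() {
        return stream;
    }

    // Everything written so far, decoded as UTF-8.
    String text() {
        stream.flush();
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Because each capture owns its own buffer, two captures never see each other's output; if a script's output still lands in the wrong capture, the shared runtime (the context scope) is the place to look.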

Windows 8 app FileOpenPicker: no file info

I'm trying to get some information about a file the user selects with the FileOpenPicker, but all the information, like the path and name, is empty. When I try to view the object at a breakpoint I get the following message:
file = 0x03489cd4 <Information not available, no symbols loaded for shell32.dll>
I use the following code for calling the FileOpenPicker and handling the file:
#include "pch.h"
#include "LocalFilePicker.h"
using namespace concurrency;
using namespace Platform;
using namespace Windows::Storage;
using namespace Windows::Storage::Pickers;
const int LocalFilePicker::AUDIO = 0;
const int LocalFilePicker::VIDEO = 1;
const int LocalFilePicker::IMAGES = 2;
LocalFilePicker::LocalFilePicker()
{
_init();
}
void LocalFilePicker::_init()
{
_openPicker = ref new FileOpenPicker();
_openPicker->ViewMode = PickerViewMode::Thumbnail;
}
void LocalFilePicker::askFile(int categorie)
{
switch (categorie)
{
case 0:
break;
case 1:
_openPicker->SuggestedStartLocation = PickerLocationId::VideosLibrary;
_openPicker->FileTypeFilter->Append(".mp4");
break;
case 2:
break;
default:
break;
}
create_task(_openPicker->PickSingleFileAsync()).then([this](StorageFile^ file)
{
if (file)
{
int n = 0;
wchar_t buf[1024];
_snwprintf_s(buf, 1024, _TRUNCATE, L"Test: '%s'\n", file->Path);
OutputDebugString(buf);
}
else
{
OutputDebugString(L"canceled");
}
});
}
Can anybody see what's wrong with the code, or some problem with the app's settings, that explains why it isn't working as expected?
First, an explanation of why you are having trouble debugging; this is going to happen a lot more when you write WinRT programs. Make sure that you have the correct debugging engine enabled: Tools + Options, Debugging, General. Ensure that "Use Managed Compatibility Mode" is turned off.
You can now inspect the "file" variable in the debugger.
What it shows is hard to interpret, of course. What you are looking at is a proxy. That is a COM term: a wrapper for COM objects that are not thread-safe or that live in another process or on another machine. The proxy implementation lives in shell32.dll, hence the confusing diagnostic message. You can't see the actual object at all; accessing its properties requires calling proxy methods. That is something the debugger cannot do: a proxy marshals the call from one thread to another, and that other thread is frozen while the debugger break is active.
That makes you pretty blind. In tough cases you may want to write a little helper code to store the property in a local variable, like:
auto path = file->Path;
No trouble inspecting or watching that one. You should now be confident that there's nothing wrong with file and that you get a perfectly good path. Note how writing const wchar_t* path = file->Path; gets you a loud complaint from the compiler.
That helps you find the bug: you can't pass a Platform::String to a printf()-style function, just as you can't with, say, std::wstring. You need to use an accessor function to convert it. Fix:
_snwprintf_s(buf, 1024, _TRUNCATE,
L"Test: '%s'\n",
file->Path->Data());

Mallet Api - Get consistent results

I am new to LDA and Mallet. I have the following question.
I tried running Mallet LDA from the command line, and by setting --random-seed to a fixed value I was able to get consistent results across multiple runs of the algorithm.
However, when I tried the Mallet Java API, I got different output every time I ran the program.
I googled around and found out that the random seed needs to be fixed. I have it fixed in my Java code, but I still get different results.
Could anyone let me know what other parameters I need to consider to get consistent results across multiple runs?
I should add that running train-topics multiple times (command line) yields the same result. However, when I rerun import-dir and then run train-topics, the results do not match the previous ones (probably as expected).
I am OK with running import-dir just once and then experimenting with different numbers of topics and iterations by running train-topics.
Similarly, what needs to be changed or kept constant if I want to replicate this with the Java API?
I was able to solve this.
I will respond in detail here:
There are two ways in which Mallet could be run.
a. Command mode
b. Using Java API
To get consistent results for different runs, we need to fix the 'random seed' and in the command line we have an option of setting it. We have no surprises there.
However, when using the API, although we have an option of setting the random seed, it needs to be done at the proper point, or else it does not work (see code).
I have pasted the code here, which creates a model (read: InstanceList) file from the data.
We can then use the same model file, set the random seed, and make sure that we get consistent (read: the same) results every time we run.
Creating and saving the model for later use:
Note: Follow this link to know the format of input file.
http://mallet.cs.umass.edu/ap.txt
public void getModelReady(String inputFile) throws IOException {
if(inputFile != null && (! inputFile.isEmpty())) {
List<Pipe> pipeList = new ArrayList<Pipe>();
pipeList.add(new Target2Label());
pipeList.add(new Input2CharSequence("UTF-8"));
pipeList.add(new CharSequence2TokenSequence());
pipeList.add(new TokenSequenceLowercase());
pipeList.add(new TokenSequenceRemoveStopwords());
pipeList.add(new TokenSequence2FeatureSequence());
Reader fileReader = new InputStreamReader(new FileInputStream(new File(inputFile)), "UTF-8");
CsvIterator ci = new CsvIterator (fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
3, 2, 1); // data, label, name fields
InstanceList instances = new InstanceList(new SerialPipes(pipeList));
instances.addThruPipe(ci);
ObjectOutputStream oos;
oos = new ObjectOutputStream(new FileOutputStream("Resources\\Input\\Model\\Model.vectors"));
oos.writeObject(instances);
oos.close();
}
}
Once the model file is saved, the following uses it to generate topics:
public void applyLDA(ParallelTopicModel model) throws IOException {
InstanceList training = InstanceList.load (new File("Resources\\Input\\Model\\Model.vectors"));
logger.debug("InstanceList Data loaded.");
if (training.size() > 0 &&
training.get(0) != null) {
Object data = training.get(0).getData();
if (! (data instanceof FeatureSequence)) {
logger.error("Topic modeling currently only supports feature sequences.");
System.exit(1);
}
}
// IT HAS TO BE SET HERE, BEFORE CALLING THE addInstances METHOD.
model.setRandomSeed(5);
model.addInstances(training);
model.estimate();
model.printTopWords(new File("Resources\\Output\\OutputFile\\topic_keys_java.txt"), 25,
false);
model.printDocumentTopics(new File ("Resources\\Output\\OutputFile\\document_topicssplit_java.txt"));
}
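To see why the seed has to be set before `addInstances`, here is a toy model in plain Java (a hypothetical `ToyTopicModel`, not Mallet's `ParallelTopicModel`). It assumes, as the answer's result suggests, that the seeded random generator is consumed when instances are added and given their initial random topic assignments; a seed set afterwards cannot change those draws.

```java
import java.util.Arrays;
import java.util.Random;

// Toy stand-in for a topic model (NOT Mallet): the seed is only read
// when addInstances() draws the random initial topic assignments.
final class ToyTopicModel {
    private final int numTopics;
    private int randomSeed = -1; // -1 means "unseeded" (a convention of this sketch)
    private int[] assignments = new int[0];

    ToyTopicModel(int numTopics) {
        this.numTopics = numTopics;
    }

    void setRandomSeed(int seed) {
        this.randomSeed = seed;
    }

    // Assigns a random topic to every token. The RNG is created here, so a
    // seed set after this call has no effect on these assignments.
    void addInstances(int numTokens) {
        Random random = (randomSeed == -1) ? new Random() : new Random(randomSeed);
        assignments = new int[numTokens];
        for (int i = 0; i < numTokens; i++) {
            assignments[i] = random.nextInt(numTopics);
        }
    }

    int[] assignments() {
        return Arrays.copyOf(assignments, assignments.length);
    }
}
```

Two models seeded before `addInstances` start from identical assignments; two models seeded afterwards start from unrelated ones, which is the inconsistency described in the question.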

How would you simplify this process?

I have a bunch (over 1000) of HTML files with just simple text. Each one is just a combination of text within a <table>. It's an internal batch of documents, not for web production.
The job we have is to convert them into JPEG files using Photoshop and the old copy paste method. It's tedious.
Is there a way you would do this process to make it more efficient/easier/simple?
I thought about trying to convert the HTML into Excel and then mail-merging it into Word to print as JPEG. But I can't find (and rightly so) anything to convert HTML to XLSX.
Thoughts? Or is this just a manual job?
Here's a little something I created to convert a single HTML file to JPEG. It's not pretty (to say the least), but it works fine with a table larger than my screen. Put it inside a Windows Forms project. You can add more checks and call this program in a loop, or refactor it to work on multiple HTML files.
Ideas and techniques taken from -
Finding the needed size - http://social.msdn.microsoft.com/Forums/ie/en-US/f6f0c641-43bd-44cc-8be0-12b40fbc4c43/webbrowser-object-use-to-find-the-width-of-a-web-page
Creating the graphics - http://cplus.about.com/od/learnc/a/How-To-Save-Web-Page-Screen-Grab-csharp.htm
A table for example - copy-paste enlarged version of http://www.w3schools.com/html/html_tables.asp
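using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Windows.Forms;
static class Program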
{
static WebBrowser webBrowser = new WebBrowser();
private static string m_fileName;
[STAThread]
static void Main(string[] args)
{
if (args.Length != 1)
{
MessageBox.Show("Usage: [fileName]");
return;
}
// new Uri(...) needs an absolute path, so resolve relative file names first
m_fileName = Path.GetFullPath(args[0]);
webBrowser.DocumentCompleted += (a, b) => webBrowser_DocumentCompleted();
webBrowser.ScrollBarsEnabled = false; // Don't want them rendered
webBrowser.Navigate(new Uri(m_fileName));
Application.Run();
}
static void webBrowser_DocumentCompleted()
{
// Get the needed size of the control
webBrowser.Width = webBrowser.Document.Body.ScrollRectangle.Width + webBrowser.Margin.Horizontal;
webBrowser.Height = webBrowser.Document.Body.ScrollRectangle.Height + webBrowser.Margin.Vertical;
// Create the graphics and save the image
using (var graphics = webBrowser.CreateGraphics())
using (var bitmap = new Bitmap(webBrowser.Size.Width, webBrowser.Size.Height, graphics))
{
webBrowser.DrawToBitmap(bitmap, webBrowser.ClientRectangle);
string newFileName = Path.ChangeExtension(m_fileName, ".jpg");
bitmap.Save(newFileName, ImageFormat.Jpeg);
}
// Shamefully exit the application
Application.ExitThread();
}
}
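To drive a single-file converter like this over 1000+ files, a small driver can enumerate the HTML files and derive each output name. Here is a Java sketch; the class name `HtmlBatch` is hypothetical, and `jpegFor` only approximates C#'s `Path.ChangeExtension`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class HtmlBatch {
    // Replaces the last extension with ".jpg" (roughly Path.ChangeExtension).
    static Path jpegFor(Path html) {
        String name = html.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return html.resolveSibling(base + ".jpg");
    }

    // Collects *.html and *.htm files in one directory (not recursive),
    // sorted so the batch always runs in the same order.
    static List<Path> htmlFilesIn(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{html,htm}")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    // Prints one "input -> output" pair per file; each line is where the
    // single-file converter would be invoked.
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path html : htmlFilesIn(dir)) {
            System.out.println(html + " -> " + jpegFor(html));
        }
    }
}
```

Replacing the print with a call to the converter (for example via `ProcessBuilder`) gives the loop the answer suggests.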
You can load all the files in one page and use the html2canvas library to convert them.
You can run it in the background with Node.js and node-canvas, or make it a desktop app with node-webkit.
In case anyone is looking for an answer that works, I ended up using a program called Prince: https://www.princexml.com
It works amazingly well; you just have to target the HTML with CSS or JS to make it match your output!

Rendering an email throws a TemplateCompilationException using RazorEngine 3 in a non-MVC project

I am trying to render emails in a windows service host.
I use RazorEngine 3 forked by coxp which has support for Razor 2.
https://github.com/coxp/RazorEngine/tree/release-3.0/src
This works fine for a couple of email templates, but there is one causing me problems.
@model string
Click here to enter a new password for your account.
This throws a CompilationException: The name 'WriteAttribute' does not exist in the current context. So passing in a string as model and putting it in the href-attribute causes problems.
I can make it work by changing this line to:
@Raw(string.Format("Click here.", @Model))
but this makes the template very unreadable and harder to pass along to a marketing department for further styling.
I'd like to add that referencing RazorEngine through the NuGet package is not a solution, since it is based on Razor 1, and somewhere along the process the System.Web.Razor DLL gets replaced by version 2, which breaks any code using RazorEngine. It seems more worthwhile to use Razor 2, to benefit from the new features and to be up to date.
Any suggestions on how to fix this would be great. Sharing your experiences is also very welcome.
UPDATE 1
It seems like calling SetTemplateBaseType might help, but this method does not exist anymore, so I wonder how to be able to bind the templatebasetype?
//Missing method in the new RazorEngine build from coxp.
Razor.SetTemplateBaseType(typeof(HtmlTemplateBase<>));
I use Windsor to inject the template service rather than using the Razor object. Here is a simplified part of the code that shows how to set the base template type.
private static ITemplateService CreateTemplateService()
{
var config = new TemplateServiceConfiguration
{
BaseTemplateType = typeof (HtmlTemplateBase<>),
};
return new TemplateService(config);
}
RazorEngine 3.1.0
A slightly modified example based on coxp's answer, without the injection:
private static bool _razorInitialized;
private static void InitializeRazor()
{
if (_razorInitialized) return;
_razorInitialized = true;
Razor.SetTemplateService(CreateTemplateService());
}
private static ITemplateService CreateTemplateService()
{
var config = new TemplateServiceConfiguration
{
BaseTemplateType = typeof (HtmlTemplateBase<>),
};
return new TemplateService(config);
}
public static string ParseTemplate(string name, object model)
{
InitializeRazor();
var appFileName = "~/EmailTemplates/" + name + ".cshtml";
var template = File.ReadAllText(HttpContext.Current.Server.MapPath(appFileName));
return RazorEngine.Razor.Parse(template, model);
}

Calling wkhtmltopdf to generate PDF from HTML

I'm attempting to create a PDF file from an HTML file. After looking around a little, I've found wkhtmltopdf to be perfect. I need to call this .exe from the ASP.NET server. I've attempted:
Process p = new Process();
p.StartInfo.UseShellExecute = false;
p.StartInfo.FileName = HttpContext.Current.Server.MapPath("wkhtmltopdf.exe");
p.StartInfo.Arguments = "TestPDF.htm TestPDF.pdf";
p.Start();
p.WaitForExit();
No files are created on the server. Can anyone give me a pointer in the right direction? I put the wkhtmltopdf.exe file in the top-level directory of the site. Is there anywhere else it should be held?
Edit: If anyone has better solutions to dynamically create pdf files from html, please let me know.
Update:
My answer below creates the PDF file on disk. I then streamed that file to the user's browser as a download. Consider using something like Hath's answer below to get wkhtmltopdf to output to a stream instead and then send that directly to the user; that will bypass lots of issues with file permissions etc.
My original answer:
Make sure you've specified an output path for the PDF that is writeable by the ASP.NET process of IIS running on your server (usually NETWORK_SERVICE I think).
Mine looks like this (and it works):
/// <summary>
/// Convert Html page at a given URL to a PDF file using open-source tool wkhtml2pdf
/// </summary>
/// <param name="Url"></param>
/// <param name="outputFilename"></param>
/// <returns></returns>
public static bool HtmlToPdf(string Url, string outputFilename)
{
// assemble destination PDF file name
string filename = ConfigurationManager.AppSettings["ExportFilePath"] + "\\" + outputFilename + ".pdf";
// get proj no for header
Project project = new Project(int.Parse(outputFilename));
var p = new System.Diagnostics.Process();
p.StartInfo.FileName = ConfigurationManager.AppSettings["HtmlToPdfExePath"];
string switches = "--print-media-type ";
switches += "--margin-top 4mm --margin-bottom 4mm --margin-right 0mm --margin-left 0mm ";
switches += "--page-size A4 ";
switches += "--no-background ";
switches += "--redirect-delay 100";
p.StartInfo.Arguments = switches + " " + Url + " " + filename;
p.StartInfo.UseShellExecute = false; // needs to be false in order to redirect output
p.StartInfo.RedirectStandardOutput = true;
p.StartInfo.RedirectStandardError = true;
p.StartInfo.RedirectStandardInput = true; // redirect all 3, as it should be all 3 or none
p.StartInfo.WorkingDirectory = System.IO.Path.GetDirectoryName(p.StartInfo.FileName);
p.Start();
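// drain stderr in the background (.NET 4.5+) so a full stderr pipe cannot block the process
var errorTask = p.StandardError.ReadToEndAsync();
// read the output before waiting: if nobody reads it, a full stdout pipe blocks the process
string output = p.StandardOutput.ReadToEnd();
// ...then wait up to 60 s for exit; ExitCode is only available once the process has ended
if (!p.WaitForExit(60000))
{
p.Kill();
p.Close();
return false;
}
// read the exit code, close process
int returnCode = p.ExitCode;
p.Close();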
// if 0 or 2, it worked (not sure about other values, I want a better way to confirm this)
return (returnCode == 0 || returnCode == 2);
}
I had the same problem when I tried using MSMQ with a Windows service, but it was very slow for some reason (the process part).
This is what finally worked:
private void DoDownload()
{
var url = Request.Url.GetLeftPart(UriPartial.Authority) + "/CPCDownload.aspx?IsPDF=False&UserID=" + this.CurrentUser.UserID.ToString();
var file = WKHtmlToPdf(url);
if (file != null)
{
Response.ContentType = "Application/pdf";
Response.BinaryWrite(file);
Response.End();
}
}
public byte[] WKHtmlToPdf(string url)
{
var fileName = " - ";
var wkhtmlDir = "C:\\Program Files\\wkhtmltopdf\\";
var wkhtml = "C:\\Program Files\\wkhtmltopdf\\wkhtmltopdf.exe";
var p = new Process();
p.StartInfo.CreateNoWindow = true;
p.StartInfo.RedirectStandardOutput = true;
p.StartInfo.RedirectStandardError = true;
p.StartInfo.RedirectStandardInput = true;
p.StartInfo.UseShellExecute = false;
p.StartInfo.FileName = wkhtml;
p.StartInfo.WorkingDirectory = wkhtmlDir;
string switches = "";
switches += "--print-media-type ";
switches += "--margin-top 10mm --margin-bottom 10mm --margin-right 10mm --margin-left 10mm ";
switches += "--page-size Letter ";
p.StartInfo.Arguments = switches + " " + url + " " + fileName;
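p.Start();
// drain stderr in the background (.NET 4.5+) so a full stderr pipe cannot block wkhtmltopdf
var errorTask = p.StandardError.ReadToEndAsync();
//read output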
byte[] buffer = new byte[32768];
byte[] file;
using(var ms = new MemoryStream())
{
while(true)
{
int read = p.StandardOutput.BaseStream.Read(buffer, 0,buffer.Length);
if(read <=0)
{
break;
}
ms.Write(buffer, 0, read);
}
file = ms.ToArray();
}
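// wait up to 60 s for exit; ExitCode is only available once the process has ended
if (!p.WaitForExit(60000))
{
p.Kill();
p.Close();
return null;
}
// read the exit code, close process
int returnCode = p.ExitCode;
p.Close();
return returnCode == 0 ? file : null;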
}
Thanks Graham Ambrose and everyone else.
OK, so this is an old question, but an excellent one. And since I did not find a good answer, I made my own :) Also, I've posted this super simple project to GitHub.
Here is some sample code:
var pdfData = HtmlToXConverter.ConvertToPdf("<h1>SOO COOL!</h1>");
Here are some key points:
No P/Invoke
No creating of a new process
No file-system (all in RAM)
Native .NET DLL with intellisense, etc.
Ability to generate a PDF or PNG (HtmlToXConverter.ConvertToPng)
Check out the C# wrapper library (using P/Invoke) for the wkhtmltopdf library: https://github.com/pruiz/WkHtmlToXSharp
There are many reasons why this is generally a bad idea. How are you going to control executables that get spawned but end up living on in memory after a crash? What about denial-of-service attacks, or something malicious getting into TestPDF.htm?
My understanding is that the ASP.NET user account will not have the rights to logon locally. It also needs to have the correct file permissions to access the executable and to write to the file system. You need to edit the local security policy and let the ASP.NET user account (maybe ASPNET) logon locally (it may be in the deny list by default). Then you need to edit the permissions on the NTFS filesystem for the other files. If you are in a shared hosting environment it may be impossible to apply the configuration you need.
The best way to use an external executable like this is to queue jobs from the ASP.NET code and have some sort of service monitor the queue. If you do this, you will protect yourself from all sorts of bad things happening. The maintenance issues with changing the user account are not worth the effort in my opinion, and whilst setting up a service or scheduled job is a pain, it's just a better design. The ASP.NET page should poll a result queue for the output, and you can present the user with a wait page. This is acceptable in most cases.
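A minimal sketch of that queue-and-worker design in Java with `java.util.concurrent`. The class `ConversionQueue` and the `convert` function are hypothetical stand-ins: in a real setup `convert` would launch wkhtmltopdf, and the worker would usually live in a separate service rather than in the web process.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// The web tier only enqueues jobs and polls for results; one background
// worker owns the slow, risky conversion step.
final class ConversionQueue {
    private static final class Job {
        final long id;
        final String input;

        Job(long id, String input) {
            this.id = id;
            this.input = input;
        }
    }

    private final BlockingQueue<Job> pending = new LinkedBlockingQueue<>();
    private final Map<Long, String> results = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();
    private final Thread worker;

    // 'convert' is a placeholder for the real work (e.g. running wkhtmltopdf).
    ConversionQueue(Function<String, String> convert) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    Job job = pending.take(); // blocks until a job arrives
                    String out;
                    try {
                        out = convert.apply(job.input);
                    } catch (RuntimeException e) {
                        out = "ERROR: " + e.getMessage(); // a failed job must not kill the worker
                    }
                    results.put(job.id, out == null ? "" : out); // the map rejects null values
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested
            }
        }, "conversion-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Called by the page: enqueue the job and return a ticket immediately.
    long submit(String input) {
        long id = nextId.incrementAndGet();
        pending.add(new Job(id, input));
        return id;
    }

    // Called by the "please wait" page: null means "not finished yet".
    String poll(long id) {
        return results.remove(id);
    }

    void shutdown() {
        worker.interrupt();
    }
}
```

A single worker thread means at most one conversion process runs at a time, which guards against piled-up or runaway executables. Results are removed from the map once they are polled.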
You can tell wkhtmltopdf to send its output to stdout by specifying "-" as the output file.
You can then read the output from the process into the response stream and avoid the permissions issues with writing to the file system.
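The same "write the PDF to stdout" idea, sketched in Java with `ProcessBuilder`. The class name is hypothetical, and the executable path and switches are placeholders you would supply. Stdout is read fully before waiting for exit, and stderr is discarded so it cannot fill its pipe and stall the child.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class WkHtmlToPdfStdout {
    // Builds: <exe> [switches...] <url> -   ("-" as the output file = PDF on stdout)
    static List<String> command(String exe, List<String> switches, String url) {
        List<String> cmd = new ArrayList<>();
        cmd.add(exe);
        cmd.addAll(switches);
        cmd.add(url);
        cmd.add("-");
        return cmd;
    }

    // Reads a stream to the end into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[32768];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Runs the command and returns stdout (the PDF), or null on failure or timeout.
    static byte[] run(List<String> cmd, long timeoutSeconds) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd)
                .redirectError(ProcessBuilder.Redirect.DISCARD) // Java 9+
                .start();
        p.getOutputStream().close(); // nothing to send on stdin
        byte[] pdf = readAll(p.getInputStream()); // read before waiting, or a full pipe can deadlock
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            return null;
        }
        return p.exitValue() == 0 ? pdf : null;
    }
}
```

`readAll` blocks until wkhtmltopdf closes stdout, so the timeout only guards the final exit; a hard time limit would need the read to run on another thread.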
My take on this with 2018 stuff.
I am using async, and I am streaming both to and from wkhtmltopdf. I created a new StreamWriter because wkhtmltopdf expects UTF-8 by default, but the process's standard input is set to a different encoding when the process starts.
I didn't include a lot of arguments, since those vary from user to user. You can add what you need using additionalArgs.
I removed p.WaitForExit(...) since I wasn't handling the case where it fails, and it would hang anyway on await tStdOut. If a timeout is needed, you would have to call Wait(...) on the different tasks with a cancellation token or timeout and handle it accordingly.
public async Task<byte[]> GeneratePdf(string html, string additionalArgs)
{
ProcessStartInfo psi = new ProcessStartInfo
{
FileName = @"C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe",
UseShellExecute = false,
CreateNoWindow = true,
RedirectStandardInput = true,
RedirectStandardOutput = true,
RedirectStandardError = true,
Arguments = "-q -n " + additionalArgs + " - -"
};
using (var p = Process.Start(psi))
using (var pdfStream = new MemoryStream())
using (var utf8Writer = new StreamWriter(p.StandardInput.BaseStream,
Encoding.UTF8))
{
// start draining stdout/stderr before writing, so neither pipe can fill up and stall the process
var tStdOut = p.StandardOutput.BaseStream.CopyToAsync(pdfStream);
var tStdError = p.StandardError.ReadToEndAsync();
await utf8Writer.WriteAsync(html);
utf8Writer.Close(); // closing stdin tells wkhtmltopdf the input is complete
await tStdOut;
string errors = await tStdError;
if (!string.IsNullOrEmpty(errors)) { /* deal/log with errors */ }
return pdfStream.ToArray();
}
}
Things I haven't included in there but could be useful if you have images, css or other stuff that wkhtmltopdf will have to load when rendering the html page:
you can pass the authentication cookie using --cookie
in the head of the HTML page, you can set the base tag with an href pointing to the server, and wkhtmltopdf will use that if need be
Thanks for the question / answer / all the comments above. I came upon this when I was writing my own C# wrapper for WKHTMLtoPDF and it answered a couple of the problems I had. I ended up writing about this in a blog post - which also contains my wrapper (you'll no doubt see the "inspiration" from the entries above seeping into my code...)
Making PDFs from HTML in C# using WKHTMLtoPDF
Thanks again guys!
The ASP .Net process probably doesn't have write access to the directory.
Try telling it to write to %TEMP%, and see if it works.
Also, make your ASP .Net page echo the process's stdout and stderr, and check for error messages.
Generally, the return code is 0 if the PDF file was created properly. If it wasn't created, the value is negative.
using System;
using System.Diagnostics;
using System.Web;
public partial class pdftest : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
}
private void fn_test()
{
try
{
string url = HttpContext.Current.Request.Url.AbsoluteUri;
Response.Write(url);
ProcessStartInfo startInfo = new ProcessStartInfo();
startInfo.FileName =
@"C:\PROGRA~1\WKHTML~1\wkhtmltopdf.exe";//"wkhtmltopdf.exe";
startInfo.Arguments = url + @" C:\test"
+ Guid.NewGuid().ToString() + ".pdf";
Process.Start(startInfo);
}
catch (Exception ex)
{
string xx = ex.Message.ToString();
Response.Write("<br>" + xx);
}
}
protected void btn_test_Click(object sender, EventArgs e)
{
fn_test();
}
}