Mercurial merge awesomeness - what am I missing? - mercurial

I've been using Mercurial for a while, and there's one "fact" that is repeated many times.
It struck me again yesterday while watching a video made by Fog Creek, this one: Fog Creek Kiln: Unlock the power of DVCS for your company. There seems to be something here that doesn't work for me.
At around 1:39 in that video and onwards, it makes the point that while other version control systems track revisions (i.e. snapshots), DVCSs like Mercurial track changesets (i.e. what happened between the snapshots).
This supposedly gives them an edge in merging scenarios, and the video then shows an example: if you move a function in one branch, and change the same function in another branch, Mercurial is able to merge that.
I've seen this mentioned elsewhere too, though I can't find any direct links now.
This doesn't seem to work for me.
Edit: This turned out to be a problem with the default "beyondcompare3" merge tool configuration for TortoiseHg. I added the configuration below to my Mercurial.ini file, and now it works as expected. It will still punt to the GUI tool if it can't automerge, but the merge described in this question now runs without any prompts and just does the right thing out of the box:
[ui]
merge = bc3
[merge-tools]
bc3.executable = C:\Program Files (x86)\Beyond Compare 3\bcomp.exe
bc3.args = $local $other $base $output /automerge /reviewconflicts /closescript
bc3.priority = 1
bc3.premerge = True
bc3.gui = True
To test this, I committed this file to a repository:
void Main()
{
    Function1();
    Function2();
}
public void Function1()
{
    Debug.WriteLine("Function 1");
    for (int index = 0; index < 10; index++)
        Debug.WriteLine("f1: " + index);
}
public void Function2()
{
    Debug.WriteLine("Function 1");
}
Then, in two parallel changesets branching out from this one, I made the following two changes:
I moved the Function1 function to the bottom of the file
I changed the message inside Function1
I then tried to merge, and Mercurial gave me a merge conflict window while it tried to figure out what I did.
Basically, it tried to change the text in Function2, which is now in the position Function1 was in before it was moved.
This was not supposed to happen!
Here are the source files for reproducing my example:
Batch file for producing the repository:
@echo off
setlocal
if exist repo rd /s /q repo
hg init repo
cd repo
copy ..\example1.linq example.linq
hg commit -m "initial commit" --addremove --user "Bob" --date "2010-01-01 18:00:00"
copy ..\example2.linq example.linq
hg commit -m "moved function" --user "Bob" --date "2010-01-01 19:00:00"
hg update 0
copy ..\example3.linq example.linq
hg commit -m "moved function" --user "Alice" --date "2010-01-01 20:00:00"
The 3 versions of the file, example1.linq, example2.linq and example3.linq:
Example1.linq:
<Query Kind="Program" />
void Main()
{
    Function1();
    Function2();
}
public void Function1()
{
    Debug.WriteLine("Function 1");
    for (int index = 0; index < 10; index++)
        Debug.WriteLine("f1: " + index);
}
public void Function2()
{
    Debug.WriteLine("Function 1");
}
Example2.linq:
<Query Kind="Program" />
void Main()
{
    Function1();
    Function2();
}
public void Function2()
{
    Debug.WriteLine("Function 1");
}
public void Function1()
{
    Debug.WriteLine("Function 1");
    for (int index = 0; index < 10; index++)
        Debug.WriteLine("f1: " + index);
}
Example3.linq:
<Query Kind="Program" />
void Main()
{
    Function1();
    Function2();
}
public void Function1()
{
    Debug.WriteLine("Function 1b");
    for (int index = 0; index < 10; index++)
        Debug.WriteLine("f1: " + index);
}
public void Function2()
{
    Debug.WriteLine("Function 1");
}

Well, you're currently hitting one of the limitations of basically ANY current VCS (DVCS or not, it does not matter).
The thing is that VCSs are language agnostic: the basis of their merge algorithm is a textual diff. That means they look for what changed and for its surrounding context.
The important part here is the context. It is nothing more than some lines before and after the changes you made.
This means they are really bad at dealing with code re-organisation within the same file, because you're basically destroying all the context they rely on.
Typically, in your example, by switching the two functions you not only completely inverted the context between the two changesets but, worse, by having no extra lines after the last function you implicitly reduced the context of the latest change, diminishing the chances that a merge algorithm could figure out what you really did.
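To picture the raw material a textual merge works with, here is a rough hand-written sketch (illustrative only, not actual Mercurial output) of the hunk recorded for the message change; the unchanged lines around the edit are the only anchor the algorithm has:
 public void Function1()
 {
-    Debug.WriteLine("Function 1");
+    Debug.WriteLine("Function 1b");
     for (int index = 0; index < 10; index++)
After the move, the closest textual match for that anchor at the old position is Function2's nearly identical header and body, which is why a purely line-based merge ends up trying to apply the change to the wrong function.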
I currently only know of one diff tool, from Microsoft, for XML, that tries to deal with the semantics of your change and not just its textual representation.
I also know that the people at PlasticSCM are trying to implement such a feature for some mainstream languages, but it is really an area where there is room for improvement.

Related

Enforce Microsoft.Build to reload the project

I'm trying to iteratively (as part of an automation):
Create a backup of the projects in the solution (the physical files on the filesystem)
Use Microsoft.Build to programmatically load and change projects inside the solution (references, includes, some other properties)
Build it with a console call to msbuild
Restore the projects (physically overwriting the patched versions from the backups)
This approach works well for the first iteration, but on the second it appears that it does not load the restored projects and instead works with the values that I patched on the first iteration. It looks like the projects are cached: inside the csproj files I see the correct values, but in code I see the previously patched values.
My best guess is that Microsoft.Build is caching the solution/projects in the context of the current process.
Here is code that is responsible to load project and call method to update project information:
// Requires Microsoft.Build.Construction (SolutionFile, ProjectRootElement)
// and Microsoft.Build.Exceptions (InvalidProjectFileException).
private static void ForEachProject(string slnPath, Action<ProjectRootElement> patchProject)
{
    SolutionFile slnFile = SolutionFile.Parse(slnPath);
    var filteredProjects = slnFile
        .ProjectsInOrder
        .Where(prj => prj.ProjectType == SolutionProjectType.KnownToBeMSBuildFormat);
    foreach (ProjectInSolution projectInfo in filteredProjects)
    {
        try
        {
            ProjectRootElement project = ProjectRootElement.Open(projectInfo.AbsolutePath);
            patchProject(project);
            project.Save();
        }
        catch (InvalidProjectFileException ex)
        {
            Console.WriteLine("Failed to patch project '{0}' with error: {1}", projectInfo.AbsolutePath, ex);
        }
    }
}
There is a Reload method on ProjectRootElement that can be called before interacting with the content of the project.
It forces Microsoft.Build to read the latest information from the file.
Code that is working for me:
private static void ForEachProject(string slnPath, Action<ProjectRootElement> patchProject)
{
    SolutionFile slnFile = SolutionFile.Parse(slnPath);
    var filteredProjects = slnFile
        .ProjectsInOrder
        .Where(prj => prj.ProjectType == SolutionProjectType.KnownToBeMSBuildFormat);
    foreach (ProjectInSolution projectInfo in filteredProjects)
    {
        try
        {
            ProjectRootElement project = ProjectRootElement.Open(projectInfo.AbsolutePath);
            project.Reload(false); // Ignore cached state, read the actual content from the file
            patchProject(project);
            project.Save();
        }
        catch (InvalidProjectFileException ex)
        {
            Console.WriteLine("Failed to patch project '{0}' with error: {1}", projectInfo.AbsolutePath, ex);
        }
    }
}
Note: It is better to use custom properties inside the project and supply them on each msbuild call instead of physically patching the project files. Please consider that the better solution and use it if possible.
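For illustration, a minimal sketch of that property-driven approach (the property name MyCustomFlag, the solution path, and the msbuild location are assumptions for this example): the value is supplied on the command line, so nothing on disk has to be rewritten or restored.
// Hypothetical sketch: pass values as MSBuild properties instead of patching the .csproj files.
var psi = new System.Diagnostics.ProcessStartInfo
{
    FileName = "msbuild", // assumes msbuild.exe is on the PATH
    Arguments = "MySolution.sln /p:Configuration=Release /p:MyCustomFlag=true",
    UseShellExecute = false
};
using (var build = System.Diagnostics.Process.Start(psi))
{
    build.WaitForExit();
}
Inside the project file, a property such as <MyCustomFlag Condition="'$(MyCustomFlag)' == ''">false</MyCustomFlag> gives it a default when no value is passed on the command line.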

How to find the Flat file which is currently updating record into it

In SSIS
In a folder there are many flat files, and using a For Each Loop container we process them one by one. If a new file is placed in the folder while it is still being copied, we should not pick it up for further processing; only fully copied files should go on to the next step.
How can we achieve this? Please give your suggestions.
The best way I have done this in the past is to use a C# Script Task and try to open the file - if the file is still being copied, you will get an error (which you catch). Then you can set a Boolean variable to conditionally process the file if the open worked.
E.g.:
bool b = true;
try
{
    // Opening with FileShare.None fails while another process still holds the file.
    using (var f = new FileStream("C:\\Test\\Test.txt", FileMode.Open, FileAccess.ReadWrite, FileShare.None))
    {
        // The file opened exclusively, so the copy has finished.
    }
}
catch (IOException)
{
    // Still locked by the copy operation - skip it this time around.
    b = false;
}
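To act on the result in the package, the script can hand b back to SSIS through a package variable and use it in a precedence constraint (the variable name User::IsFileReady is an assumption for illustration; it must be listed in the task's ReadWriteVariables):
// Hypothetical variable name - expose it via ReadWriteVariables on the Script Task.
Dts.Variables["User::IsFileReady"].Value = b;
Dts.TaskResult = (int)ScriptResults.Success;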

Trigger SQL Server Job by placing file in monitored folder [duplicate]

The requirement is to execute an SSIS package when a file arrives in a folder; I do not want to start the package manually.
The file arrival timing is not known, and files can arrive multiple times. Whenever a file arrives, it has to be loaded into a table. Solutions like the File Watcher Task still seem to expect the package to be started first.
The way I have done this in the past is with an infinite loop package called from SQL Server Agent, for example;
This is my infinite loop package:
Set these variables:
IsFileExists - Boolean - 0
FolderLocation - String - C:\Where the file is to be put in\
For the For Loop container:
Set the IsFileExists variable as above.
Setup a C# script task with the ReadOnlyVariable as User::FolderLocation and have the following:
public void Main()
{
    int fileCount = 0;
    string[] FilesToProcess;
    while (fileCount == 0)
    {
        try
        {
            System.Threading.Thread.Sleep(10000);
            FilesToProcess = System.IO.Directory.GetFiles(Dts.Variables["FolderLocation"].Value.ToString(), "*.txt");
            fileCount = FilesToProcess.Length;
            if (fileCount != 0)
            {
                for (int i = 0; i < fileCount; i++)
                {
                    try
                    {
                        // Try to open each file; failure means it is still being copied.
                        System.IO.FileStream fs = new System.IO.FileStream(FilesToProcess[i], System.IO.FileMode.Open);
                        fs.Close();
                    }
                    catch (System.IO.IOException)
                    {
                        // File still locked - reset the count so the outer loop sleeps and retries.
                        fileCount = 0;
                        continue;
                    }
                }
            }
        }
        catch (Exception)
        {
            throw;
        }
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
What this will do is essentially keep an eye on the folder location for a .txt file. If no file is there, it will sleep for 10 seconds (you can increase this if you want). If a file does exist, the script will complete and the package will then execute the load package. However, it will continue to run, so the next time a file is dropped in, it will execute the load package again.
Make sure to run this forever-loop package as a SQL Server Agent job so it will run all the time; we have a similar package running and it has never caused any problems.
Also, make sure your input package moves/archives the file away from the drop folder location.
As others have already suggested, using either a WMI task or an infinite loop are two options to achieve this, but IMO SSIS is resource intensive. If you let a package run constantly in the background, it could eat up a lot of memory and CPU and cause performance issues for other packages, depending on how many you have running. So another option you may want to consider is to schedule an Agent job every 5 or 10 minutes and call your package from the job, configuring the package to continue only when a file is there and to quit otherwise.
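A minimal sketch of that check-once-and-quit variant as a Script Task body (reusing the FolderLocation and IsFileExists variables from the example above; treat it as an illustration, not a drop-in):
public void Main()
{
    // Check once, set the flag, and let a precedence constraint end the package when no file is there.
    string folder = Dts.Variables["FolderLocation"].Value.ToString();
    bool anyFiles = System.IO.Directory.GetFiles(folder, "*.txt").Length > 0;
    Dts.Variables["IsFileExists"].Value = anyFiles;
    Dts.TaskResult = (int)ScriptResults.Success;
}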
You can create a Windows service that uses WMI to detect file arrival and launch packages. Details on how to are located here: http://msbimentalist.wordpress.com/2012/04/27/trigger-ssis-package-when-files-available-in-a-folder-part2/?relatedposts_exclude=330
What about the SSIS File Watcher Task?

Mallet Api - Get consistent results

I am new to LDA and Mallet, and I have the following query.
I tried running Mallet LDA from the command line, and by setting --random-seed to a fixed value I was able to get consistent results across multiple runs of the algorithm.
However, when I tried the Mallet Java API, every time I run the program I get different output.
I googled around and found that the random seed needs to be fixed, and I have fixed it in my Java code. I am still getting different results.
Could anyone let me know what other parameters I need to consider for consistent results (when run multiple times)?
I might add that train-topics, when run multiple times from the command line, yields the same result. However, when I rerun import-dir and then run train-topics, the results do not match the previous ones (probably as expected).
I am OK with running import-dir just once and then experimenting with different numbers of topics and iterations by running train-topics.
Similarly, what needs to be changed or kept constant if I want to replicate the same behaviour with the Java API?
I was able to solve this, and I will respond in detail here.
There are two ways in which Mallet can be run:
a. Command line
b. Using the Java API
To get consistent results across runs, we need to fix the random seed, and on the command line we have an option for setting it. No surprises there.
However, while using the API, though we have an option of setting the random seed, we need to know that it has to be done at the proper point, or else it does not work (see the code).
I have pasted the code here which creates a model (read: InstanceList) file from the data; we can then use the same model file, set the random seed, and confirm that we get consistent (read: the same) results every time we run.
Creating and saving the model for later use.
Note: follow this link for the format of the input file:
http://mallet.cs.umass.edu/ap.txt
public void getModelReady(String inputFile) throws IOException {
    if (inputFile != null && (! inputFile.isEmpty())) {
        List<Pipe> pipeList = new ArrayList<Pipe>();
        pipeList.add(new Target2Label());
        pipeList.add(new Input2CharSequence("UTF-8"));
        pipeList.add(new CharSequence2TokenSequence());
        pipeList.add(new TokenSequenceLowercase());
        pipeList.add(new TokenSequenceRemoveStopwords());
        pipeList.add(new TokenSequence2FeatureSequence());

        Reader fileReader = new InputStreamReader(new FileInputStream(new File(inputFile)), "UTF-8");
        CsvIterator ci = new CsvIterator(fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
                3, 2, 1); // data, label, name fields

        InstanceList instances = new InstanceList(new SerialPipes(pipeList));
        instances.addThruPipe(ci);

        ObjectOutputStream oos;
        oos = new ObjectOutputStream(new FileOutputStream("Resources\\Input\\Model\\Model.vectors"));
        oos.writeObject(instances);
        oos.close();
    }
}
Once the model file is saved, the following uses it to generate topics:
public void applyLDA(ParallelTopicModel model) throws IOException {
    InstanceList training = InstanceList.load(new File("Resources\\Input\\Model\\Model.vectors"));
    logger.debug("InstanceList Data loaded.");

    if (training.size() > 0 && training.get(0) != null) {
        Object data = training.get(0).getData();
        if (! (data instanceof FeatureSequence)) {
            logger.error("Topic modeling currently only supports feature sequences.");
            System.exit(1);
        }
    }

    // IT HAS TO BE SET HERE, BEFORE CALLING THE addInstances METHOD.
    model.setRandomSeed(5);

    model.addInstances(training);
    model.estimate();
    model.printTopWords(new File("Resources\\Output\\OutputFile\\topic_keys_java.txt"), 25, false);
    model.printDocumentTopics(new File("Resources\\Output\\OutputFile\\document_topicssplit_java.txt"));
}

Calling wkhtmltopdf to generate PDF from HTML

I'm attempting to create a PDF file from an HTML file. After looking around a little I've found wkhtmltopdf, which seems perfect. I need to call this .exe from the ASP.NET server. I've attempted:
Process p = new Process();
p.StartInfo.UseShellExecute = false;
p.StartInfo.FileName = HttpContext.Current.Server.MapPath("wkhtmltopdf.exe");
p.StartInfo.Arguments = "TestPDF.htm TestPDF.pdf";
p.Start();
p.WaitForExit();
With no success: no files are created on the server. Can anyone give me a pointer in the right direction? I put the wkhtmltopdf.exe file in the top-level directory of the site. Is there anywhere else it should be held?
Edit: If anyone has better solutions to dynamically create PDF files from HTML, please let me know.
Update:
My answer below creates the PDF file on disk. I then streamed that file to the user's browser as a download. Consider using something like Hath's answer below to get wkhtmltopdf to output to a stream instead and then send that directly to the user - that will bypass lots of issues with file permissions etc.
My original answer:
Make sure you've specified an output path for the PDF that is writable by the ASP.NET process of the IIS instance running on your server (usually NETWORK SERVICE, I think).
Mine looks like this (and it works):
/// <summary>
/// Convert the HTML page at a given URL to a PDF file using the open-source tool wkhtmltopdf.
/// </summary>
/// <param name="Url"></param>
/// <param name="outputFilename"></param>
/// <returns></returns>
public static bool HtmlToPdf(string Url, string outputFilename)
{
    // assemble destination PDF file name
    string filename = ConfigurationManager.AppSettings["ExportFilePath"] + "\\" + outputFilename + ".pdf";
    // get proj no for header
    Project project = new Project(int.Parse(outputFilename));
    var p = new System.Diagnostics.Process();
    p.StartInfo.FileName = ConfigurationManager.AppSettings["HtmlToPdfExePath"];
    string switches = "--print-media-type ";
    switches += "--margin-top 4mm --margin-bottom 4mm --margin-right 0mm --margin-left 0mm ";
    switches += "--page-size A4 ";
    switches += "--no-background ";
    switches += "--redirect-delay 100";
    p.StartInfo.Arguments = switches + " " + Url + " " + filename;
    p.StartInfo.UseShellExecute = false;       // needs to be false in order to redirect output
    p.StartInfo.RedirectStandardOutput = true;
    p.StartInfo.RedirectStandardError = true;
    p.StartInfo.RedirectStandardInput = true;  // redirect all 3, as it should be all 3 or none
    p.StartInfo.WorkingDirectory = StripFilenameFromFullPath(p.StartInfo.FileName);
    p.Start();
    // read the output here...
    string output = p.StandardOutput.ReadToEnd();
    // ...then wait n milliseconds for exit (as after exit, it can't read the output)
    p.WaitForExit(60000);
    // read the exit code, close process
    int returnCode = p.ExitCode;
    p.Close();
    // if 0 or 2, it worked (not sure about other values, I want a better way to confirm this)
    return (returnCode == 0 || returnCode == 2);
}
I had the same problem when I tried using MSMQ with a Windows service, but it was very slow for some reason (the process part).
This is what finally worked:
private void DoDownload()
{
    var url = Request.Url.GetLeftPart(UriPartial.Authority) + "/CPCDownload.aspx?IsPDF=False?UserID=" + this.CurrentUser.UserID.ToString();
    var file = WKHtmlToPdf(url);
    if (file != null)
    {
        Response.ContentType = "Application/pdf";
        Response.BinaryWrite(file);
        Response.End();
    }
}

public byte[] WKHtmlToPdf(string url)
{
    var fileName = " - ";
    var wkhtmlDir = "C:\\Program Files\\wkhtmltopdf\\";
    var wkhtml = "C:\\Program Files\\wkhtmltopdf\\wkhtmltopdf.exe";
    var p = new Process();

    p.StartInfo.CreateNoWindow = true;
    p.StartInfo.RedirectStandardOutput = true;
    p.StartInfo.RedirectStandardError = true;
    p.StartInfo.RedirectStandardInput = true;
    p.StartInfo.UseShellExecute = false;
    p.StartInfo.FileName = wkhtml;
    p.StartInfo.WorkingDirectory = wkhtmlDir;

    string switches = "";
    switches += "--print-media-type ";
    switches += "--margin-top 10mm --margin-bottom 10mm --margin-right 10mm --margin-left 10mm ";
    switches += "--page-size Letter ";

    p.StartInfo.Arguments = switches + " " + url + " " + fileName;
    p.Start();

    // read output
    byte[] buffer = new byte[32768];
    byte[] file;
    using (var ms = new MemoryStream())
    {
        while (true)
        {
            int read = p.StandardOutput.BaseStream.Read(buffer, 0, buffer.Length);
            if (read <= 0)
            {
                break;
            }
            ms.Write(buffer, 0, read);
        }
        file = ms.ToArray();
    }

    // wait or exit
    p.WaitForExit(60000);

    // read the exit code, close process
    int returnCode = p.ExitCode;
    p.Close();

    return returnCode == 0 ? file : null;
}
Thanks Graham Ambrose and everyone else.
OK, so this is an old question, but an excellent one. And since I did not find a good answer, I made my own :) Also, I've posted this super simple project to GitHub.
Here is some sample code:
var pdfData = HtmlToXConverter.ConvertToPdf("<h1>SOO COOL!</h1>");
Here are some key points:
No P/Invoke
No creating of a new process
No file-system (all in RAM)
Native .NET DLL with intellisense, etc.
Ability to generate a PDF or PNG (HtmlToXConverter.ConvertToPng)
Check out the C# wrapper library (using P/Invoke) for the wkhtmltopdf library: https://github.com/pruiz/WkHtmlToXSharp
There are many reasons why this is generally a bad idea. How are you going to control the executables that get spawned off but end up living on in memory if there is a crash? What about denial-of-service attacks, or something malicious getting into TestPDF.htm?
My understanding is that the ASP.NET user account will not have the right to log on locally. It also needs the correct file permissions to access the executable and to write to the file system. You need to edit the local security policy and let the ASP.NET user account (maybe ASPNET) log on locally (it may be in the deny list by default). Then you need to edit the permissions on the NTFS filesystem for the other files. If you are in a shared hosting environment, it may be impossible to apply the configuration you need.
The best way to use an external executable like this is to queue jobs from the ASP.NET code and have some sort of service monitor the queue. If you do this, you will protect yourself from all sorts of bad things happening. The maintenance issues with changing the user account are not worth the effort in my opinion, and while setting up a service or scheduled job is a pain, it's just a better design. The ASP.NET page should poll a result queue for the output, and you can present the user with a wait page. This is acceptable in most cases.
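For illustration, a minimal sketch of the queuing side (the queue path and message shape are assumptions; a database table polled by a Windows service works just as well):
// Hypothetical MSMQ queue - the monitoring service dequeues the job and runs wkhtmltopdf itself.
using (var queue = new System.Messaging.MessageQueue(@".\Private$\PdfJobs"))
{
    queue.Send("TestPDF.htm"); // enqueue the conversion request from the ASP.NET page
}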
You can tell wkhtmltopdf to send its output to stdout by specifying "-" as the output file.
You can then read the output from the process into the response stream and avoid the permissions issues that come with writing to the file system.
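A minimal sketch of that approach, assuming the default install path and an ASP.NET page context (the URL and path are illustrative):
// "-" as the output file makes wkhtmltopdf write the PDF bytes to standard output.
var p = new System.Diagnostics.Process();
p.StartInfo.FileName = @"C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe"; // assumed install path
p.StartInfo.Arguments = "http://example.com/TestPDF.htm -";
p.StartInfo.UseShellExecute = false;
p.StartInfo.RedirectStandardOutput = true;
p.Start();
Response.ContentType = "application/pdf";
p.StandardOutput.BaseStream.CopyTo(Response.OutputStream); // stream straight to the client
p.WaitForExit();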
My take on this with 2018 stuff:
I am using async, and I am streaming both to and from wkhtmltopdf. I created a new StreamWriter because wkhtmltopdf expects UTF-8 by default, but standard input is set to something else when the process starts.
I didn't include a lot of arguments, since those vary from user to user; you can add what you need using additionalArgs.
I removed p.WaitForExit(...) since I wasn't handling the case where it fails, and it would hang anyway on await tStdOut. If a timeout is needed, you would have to call Wait(...) on the different tasks with a cancellation token or a timeout, and handle it accordingly.
public async Task<byte[]> GeneratePdf(string html, string additionalArgs)
{
    ProcessStartInfo psi = new ProcessStartInfo
    {
        FileName = @"C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe",
        UseShellExecute = false,
        CreateNoWindow = true,
        RedirectStandardInput = true,
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        Arguments = "-q -n " + additionalArgs + " - -"
    };

    using (var p = Process.Start(psi))
    using (var pdfStream = new MemoryStream())
    using (var utf8Writer = new StreamWriter(p.StandardInput.BaseStream, Encoding.UTF8))
    {
        // Feed the HTML to wkhtmltopdf via stdin; closing the writer signals end of input.
        await utf8Writer.WriteAsync(html);
        utf8Writer.Close();

        var tStdOut = p.StandardOutput.BaseStream.CopyToAsync(pdfStream);
        var tStdError = p.StandardError.ReadToEndAsync();

        await tStdOut;
        string errors = await tStdError;
        if (!string.IsNullOrEmpty(errors)) { /* deal with/log errors */ }

        return pdfStream.ToArray();
    }
}
Things I haven't included there, but that could be useful if you have images, CSS or other assets that wkhtmltopdf has to load when rendering the HTML page:
you can pass the authentication cookie using --cookie
in the head of the HTML page, you can set the base tag with href pointing to the server, and wkhtmltopdf will use it if need be
Both are sketched in the example below.
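For example (the cookie name below is ASP.NET's default forms-authentication cookie, and the host and variable names are made up):
// Forward the auth cookie so wkhtmltopdf can fetch protected images/CSS (value read from Request.Cookies):
psi.Arguments = "-q -n --cookie .ASPXAUTH " + authCookieValue + " - -";
And on the HTML side, a base tag in the page's head lets relative URLs resolve against the server:
<base href="https://myserver.example.com/" />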
Thanks for the question / answer / all the comments above. I came upon this when I was writing my own C# wrapper for WKHTMLtoPDF and it answered a couple of the problems I had. I ended up writing about this in a blog post - which also contains my wrapper (you'll no doubt see the "inspiration" from the entries above seeping into my code...)
Making PDFs from HTML in C# using WKHTMLtoPDF
Thanks again guys!
The ASP.NET process probably doesn't have write access to the directory.
Try telling it to write to %TEMP%, and see if it works.
Also, make your ASP.NET page echo the process's stdout and stderr, and check for error messages.
Generally, the return code is 0 if the PDF file was created properly and correctly; if it was not created, the value is in the negative range.
using System;
using System.Diagnostics;
using System.Web;

public partial class pdftest : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
    }

    private void fn_test()
    {
        try
        {
            string url = HttpContext.Current.Request.Url.AbsoluteUri;
            Response.Write(url);

            ProcessStartInfo startInfo = new ProcessStartInfo();
            startInfo.FileName = @"C:\PROGRA~1\WKHTML~1\wkhtmltopdf.exe"; // "wkhtmltopdf.exe";
            startInfo.Arguments = url + @" C:\test" + Guid.NewGuid().ToString() + ".pdf";
            Process.Start(startInfo);
        }
        catch (Exception ex)
        {
            string xx = ex.Message.ToString();
            Response.Write("<br>" + xx);
        }
    }

    protected void btn_test_Click(object sender, EventArgs e)
    {
        fn_test();
    }
}