Mallet API - Get consistent results - LDA

I am new to LDA and Mallet, and I have the following question.
I ran Mallet LDA from the command line, and by setting --random-seed to a fixed value I was able to get consistent results across multiple runs of the algorithm.
However, when I use the Mallet Java API, I get different output every time I run the program.
I searched around and found that the random seed needs to be fixed. I have fixed it in my Java code, but I am still getting different results.
Could anyone tell me which other parameters I need to consider to get consistent results across multiple runs?
I should add that train-topics, when run multiple times from the command line, yields the same result. However, when I rerun import-dir and then run train-topics, the results do not match the previous ones (probably as expected).
I am OK with running import-dir just once and then experimenting with different numbers of topics and iterations by running train-topics.
Similarly, what needs to be changed or kept constant if I want to replicate this with the Java API?

I was able to solve this, and I will respond in detail here.
There are two ways in which Mallet can be run:
a. From the command line
b. Using the Java API
To get consistent results across runs, we need to fix the random seed. The command line has an option for setting it, so there are no surprises there.
With the API, however, although there is an option to set the random seed, it has to be set at the proper point or it has no effect: before the instances are added to the model (see the comment in the second method).
The first method below creates a model file (read: a serialized InstanceList) from the data.
The second method then loads that same model file, sets the random seed, and produces consistent (read: identical) results every time it is run.
Creating and saving the model for later use:
Note: See this file for the format of the input file.
http://mallet.cs.umass.edu/ap.txt
public void getModelReady(String inputFile) throws IOException {
    if (inputFile != null && !inputFile.isEmpty()) {
        // Pipeline that turns each input line into a feature sequence.
        List<Pipe> pipeList = new ArrayList<Pipe>();
        pipeList.add(new Target2Label());
        pipeList.add(new Input2CharSequence("UTF-8"));
        pipeList.add(new CharSequence2TokenSequence());
        pipeList.add(new TokenSequenceLowercase());
        pipeList.add(new TokenSequenceRemoveStopwords());
        pipeList.add(new TokenSequence2FeatureSequence());
        Reader fileReader = new InputStreamReader(new FileInputStream(new File(inputFile)), "UTF-8");
        CsvIterator ci = new CsvIterator(fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
                3, 2, 1); // data, label, name fields
        InstanceList instances = new InstanceList(new SerialPipes(pipeList));
        instances.addThruPipe(ci);
        // Serialize the InstanceList so that later runs can reuse it unchanged.
        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("Resources\\Input\\Model\\Model.vectors"));
        oos.writeObject(instances);
        oos.close();
    }
}
Once the model file is saved, the following method loads it and generates the topics:
public void applyLDA(ParallelTopicModel model) throws IOException {
    InstanceList training = InstanceList.load(new File("Resources\\Input\\Model\\Model.vectors"));
    logger.debug("InstanceList Data loaded.");
    if (training.size() > 0 && training.get(0) != null) {
        Object data = training.get(0).getData();
        if (!(data instanceof FeatureSequence)) {
            logger.error("Topic modeling currently only supports feature sequences.");
            System.exit(1);
        }
    }
    // The seed has to be set here, BEFORE calling addInstances().
    model.setRandomSeed(5);
    model.addInstances(training);
    model.estimate();
    model.printTopWords(new File("Resources\\Output\\OutputFile\\topic_keys_java.txt"), 25, false);
    model.printDocumentTopics(new File("Resources\\Output\\OutputFile\\document_topicssplit_java.txt"));
}
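Why the position of the seed call matters can be illustrated with a toy class. This is only a sketch under one assumption: that the model draws its random initial topic assignments while the instances are being added, so a seed supplied afterwards arrives too late. ToyModel and its methods are hypothetical stand-ins and are not Mallet's implementation.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical toy model, NOT Mallet's implementation: it only mimics a
// sampler that draws its initial topic assignments inside addInstances().
public class SeedOrderDemo {
    static class ToyModel {
        private long seed = -1;      // -1 means "no fixed seed"
        private int[] assignments;

        void setRandomSeed(long seed) { this.seed = seed; }

        // The random initial assignments are drawn here, so the seed
        // must already be set when this method is called.
        void addInstances(int numTokens, int numTopics) {
            Random random = (seed == -1) ? new Random() : new Random(seed);
            assignments = new int[numTokens];
            for (int i = 0; i < numTokens; i++) {
                assignments[i] = random.nextInt(numTopics);
            }
        }

        int[] assignments() { return assignments.clone(); }
    }

    // Seed first, then add instances: repeatable.
    public static int[] seedThenAdd(long seed) {
        ToyModel m = new ToyModel();
        m.setRandomSeed(seed);
        m.addInstances(1000, 20);
        return m.assignments();
    }

    // Add instances first, then seed: the seed is never used for initialization.
    public static int[] addThenSeed(long seed) {
        ToyModel m = new ToyModel();
        m.addInstances(1000, 20);
        m.setRandomSeed(seed);
        return m.assignments();
    }

    public static void main(String[] args) {
        System.out.println("seed before add, equal runs: "
                + Arrays.equals(seedThenAdd(5), seedThenAdd(5)));
        System.out.println("seed after add, equal runs:  "
                + Arrays.equals(addThenSeed(5), addThenSeed(5)));
    }
}
```

The first line should report equal runs. The second will almost certainly report a difference, because each unseeded Random starts from a different state.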


Enforce Microsoft.Build to reload the project

As part of an automation task, I'm trying to iteratively:
Create a backup of the projects in the solution (physical files on the filesystem)
Using Microsoft.Build, programmatically load and change the projects inside the solution (references, includes, some other properties)
Build the solution with a console call to msbuild
Restore the projects (physically overwriting the patched versions from the backups)
This approach works well for the first iteration, but on the second iteration it appears that the restored projects are not loaded, and the code works with the values I patched in the first iteration. It looks like the projects are cached: inside the csproj files I see the correct values, but in code I see the previously patched values.
My best guess is that Microsoft.Build caches the solution/projects for the lifetime of the current process.
Here is the code that loads each project and calls a method to update the project information:
private static void ForEachProject(string slnPath, Action<ProjectRootElement> patchProject)
{
    SolutionFile slnFile = SolutionFile.Parse(slnPath);
    var filtredProjects = slnFile
        .ProjectsInOrder
        .Where(prj => prj.ProjectType == SolutionProjectType.KnownToBeMSBuildFormat);
    foreach (ProjectInSolution projectInfo in filtredProjects)
    {
        try
        {
            ProjectRootElement project = ProjectRootElement.Open(projectInfo.AbsolutePath);
            patchProject(project);
            project.Save();
        }
        catch (InvalidProjectFileException ex)
        {
            Console.WriteLine("Failed to patch project '{0}' with error: {1}", projectInfo.AbsolutePath, ex);
        }
    }
}
ProjectRootElement has a Reload method that can be called before interacting with the content of the project.
It forces Microsoft.Build to read the latest information from the file.
Code that is working for me:
private static void ForEachProject(string slnPath, Action<ProjectRootElement> patchProject)
{
    SolutionFile slnFile = SolutionFile.Parse(slnPath);
    var filtredProjects = slnFile
        .ProjectsInOrder
        .Where(prj => prj.ProjectType == SolutionProjectType.KnownToBeMSBuildFormat);
    foreach (ProjectInSolution projectInfo in filtredProjects)
    {
        try
        {
            ProjectRootElement project = ProjectRootElement.Open(projectInfo.AbsolutePath);
            // Discard the cached state and re-read the file; false = do not throw on unsaved changes
            project.Reload(false);
            patchProject(project);
            project.Save();
        }
        catch (InvalidProjectFileException ex)
        {
            Console.WriteLine("Failed to patch project '{0}' with error: {1}", projectInfo.AbsolutePath, ex);
        }
    }
}
Note: It is better to use custom properties inside the project and supply their values on each msbuild call instead of physically patching the project files. Please consider that the better solution and use it if possible.
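The property-based alternative from the note can be sketched as follows. This is a minimal illustration, written in Java only to show how the argument list might be assembled: msbuild accepts property overrides as /p:Name=Value on the command line, while the solution name MySolution.sln and the property CustomReferencePath are hypothetical and would have to match something the project files actually read, for example via $(CustomReferencePath).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MsBuildArgs {
    // Builds an msbuild argument list that passes each property as /p:Name=Value.
    public static List<String> command(String solutionPath, Map<String, String> properties) {
        List<String> args = new ArrayList<>();
        args.add("msbuild");
        args.add(solutionPath);
        for (Map.Entry<String, String> p : properties.entrySet()) {
            args.add("/p:" + p.getKey() + "=" + p.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Configuration", "Release");
        props.put("CustomReferencePath", "libs/v2"); // hypothetical custom property
        List<String> cmd = command("MySolution.sln", props); // hypothetical solution name
        System.out.println(String.join(" ", cmd));
        // To actually run the build, the list could be handed to
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

With this approach the project files on disk never change between iterations, so there is nothing to back up, restore, or reload.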

Windows 8 app FileOpenPicker no file info

I'm trying to get some information about a file that the user selects with the FileOpenPicker, but all of the information, such as the path and name, is empty. When I try to view the object at a breakpoint, I get the following message:
file = 0x03489cd4 <Information not available, no symbols loaded for shell32.dll>
I use the following code to call the FileOpenPicker and handle the file:
#include "pch.h"
#include "LocalFilePicker.h"
using namespace concurrency;
using namespace Platform;
using namespace Windows::Storage;
using namespace Windows::Storage::Pickers;
const int LocalFilePicker::AUDIO = 0;
const int LocalFilePicker::VIDEO = 1;
const int LocalFilePicker::IMAGES = 2;
LocalFilePicker::LocalFilePicker()
{
    _init();
}
void LocalFilePicker::_init()
{
    _openPicker = ref new FileOpenPicker();
    _openPicker->ViewMode = PickerViewMode::Thumbnail;
}
void LocalFilePicker::askFile(int categorie)
{
    switch (categorie)
    {
    case 0:
        break;
    case 1:
        _openPicker->SuggestedStartLocation = PickerLocationId::VideosLibrary;
        _openPicker->FileTypeFilter->Append(".mp4");
        break;
    case 2:
        break;
    default:
        break;
    }
    create_task(_openPicker->PickSingleFileAsync()).then([this](StorageFile^ file)
    {
        if (file)
        {
            int n = 0;
            wchar_t buf[1024];
            _snwprintf_s(buf, 1024, _TRUNCATE, L"Test: '%s'\n", file->Path);
            OutputDebugString(buf);
        }
        else
        {
            OutputDebugString(L"canceled");
        }
    });
}
Can anybody see what's wrong with the code, or whether some app setting is preventing it from working as expected?
First, an explanation of why you are having trouble debugging, since this is going to happen a lot more when you write WinRT programs. Do make sure that you have the correct debugging engine enabled: Tools + Options, Debugging, General. Ensure that "Use Managed Compatibility Mode" is turned off.
You can now inspect the "file" object, it should resemble this:
Hard to interpret of course. What you are looking at is a proxy. It is a COM term, a wrapper for COM objects that are not thread-safe or live in another process or machine. The proxy implementation lives in shell32.dll, thus the confuzzling diagnostic message. You can't see the actual object at all, accessing its properties requires calling proxy methods. The debugger is not capable of doing that: a proxy marshals the call from one thread to another, and that other thread is frozen while the debugger break is active.
That makes you pretty blind. In tough cases you may want to write a little helper code to store the property in a local variable, like:
auto path = file->Path;
No trouble inspecting or watching that one. You should now have confidence that there's nothing wrong with file and that you get a perfectly good path. Note how writing const wchar_t* path = file->Path; gets you a loud complaint from the compiler.
That helps you find the bug: you can't pass a Platform::String to a printf()-style function, just like you can't with, say, std::wstring. You need to use an accessor function to convert it. Fix:
_snwprintf_s(buf, 1024, _TRUNCATE,
L"Test: '%s'\n",
file->Path->Data());

Three.js r68 on chrome shadows and uniforms failure to render mesh

I have a WebGL app using three.js. It has been running fine for months until today, September 7th, 2014.
It no longer runs on Chrome, however, it still continues to run on Firefox and Safari.
I have traced the problem to a specific condition.
I load objects from json files like this:
loader.load("assets/Tables.json", callback_mesh, "assets/maps/", parent.texturesTable);
The callback_mesh function looks like this:
var callback_mesh = function (result, materials, userData) {
    for (var i = 0; i < materials.length; i++) {
        if (materials[i].uniforms != undefined) {
            materials[i].uniforms.tDiffuse.value = userData[TEXTURE_DIFFUSE];
            materials[i].uniforms.tNormal.value = userData[TEXTURE_NORMAL];
            materials[i].uniforms.tSpecular.value = userData[TEXTURE_SPECULAR];
        }
    }
    var mesh = new THREE.Mesh(result, new THREE.MeshFaceMaterial(materials));
    mesh.scale.set(1, 1, 1);
    mesh.receiveShadow = true;
    mesh.castShadow = true;
    objects.push(mesh);
}
The above code no longer runs in Chrome.
If I remove the line "mesh.receiveShadow = true", it works fine.
If I create a new material, instead of using the materials from the json file and modifying their parameters/uniforms, I can leave the receiveShadow set to true and it works.
So the rather specific issue is when I import an object from a json file and assign the materials array that comes from the json file to the mesh as a MeshFaceMaterial and turn on receiveShadow, the object does not load and I get the following error message:
THREE.WebGLProgram: gl.getProgramInfoLog() (260,64-140): warning X3550: sampler array index must be a literal expression, forcing loop to unroll
(89,12): error X6077: texld/texldb/texldp/dsx/dsy instructions with r# as source cannot be used inside dynamic conditional 'if' blocks, dynamic conditional subroutine calls, or loop/rep with break*.
Failed to create D3D shaders. three.js:25545
(59) WebGL: INVALID_OPERATION: getUniformLocation: program not linked three.js:25283
(21) WebGL: INVALID_OPERATION: getAttribLocation: program not linked
And, again, I never saw this prior to today, maybe there was a Chrome update. And it works fine under Firefox and Safari, the mesh loads with receiveShadows on and the shadows are rendered.
In the current Chrome, it works only if I turn off receiveShadows.
Anyone have this same problem or have any thoughts on what might be causing this?
Thanks.
-mat

Deleted files status unreliably reported in the new Google Drive Android API (GDAA)

This issue has been bugging me since the inception of the new Google Drive Android Api (GDAA).
First discussed here, I hoped it would go away in later releases, but it is still there (as of 2014/03/19). The user-trashed (referring to the 'Remove' action in 'drive.google.com') files/folders keep appearing in both the
Drive.DriveApi.query(_gac, query), and
DriveFolder.queryChildren(_gac, query)
as well as
DriveFolder.listChildren(_gac)
methods, even if used with
Filters.eq(SearchableField.TRASHED, false)
query qualifier, or if I use a filtering construct on the results
for (Metadata md : result.getMetadataBuffer()) {
if ((md == null) || (!md.isDataValid()) || md.isTrashed()) continue;
dMDs.add(new DrvMD(md));
}
Using
Drive.DriveApi.requestSync(_gac);
has no impact. The delay between the removal and the status change varies wildly; in my last case it was over 12 HOURS, and it appears to be completely random.
What's worse, I can't even rely on EMPTY TRASH in 'drive.google.com'; it does not yield any predictable results. Sometimes the file status changes to 'isTrashed()', sometimes the file disappears from the result list.
As I kept fiddling with this issue, I ended up with the following superawfulhack:
find file with TRASH status equal FALSE
if (file found and is not trashed) {
    try to write content
    if (write content fails)
        create a new file
}
Not even this helps. The file shows up as healthy even if it is in the trash (and its status was double-filtered, by the query and by the metadata test). It can even be happily written into, and when inspected in the trash, it is modified.
The conclusion here is that a fix should get higher priority, since the problem renders multi-platform use of Drive unreliable. Developers will discover it right away during development and debugging, and it will steer them away.
While waiting for any acknowledgement from the support team, I devised a HACK that works around this problem. Using the same principle as in SO 22295903, the logic involves falling back to the RESTful API, basically dropping the LIST / QUERY functionality of GDAA.
The high level logic is:
query the RESTful API to retrieve the ID/IDs of file(s) in question
use retrieved ID to get GDAA's DriveId via 'fetchDriveId()'
Here are the code snippets to document the process:
1/ initialize both GDAA's 'GoogleApiClient' and RESTful's 'services.drive.Drive'
GoogleApiClient _gac;
com.google.api.services.drive.Drive _drvSvc;
void init(Context ctx, String email){
    // build GDAA GoogleApiClient
    _gac = new GoogleApiClient.Builder(ctx).addApi(com.google.android.gms.drive.Drive.API)
        .addScope(com.google.android.gms.drive.Drive.SCOPE_FILE).setAccountName(email)
        .addConnectionCallbacks(ctx).addOnConnectionFailedListener(ctx).build();
    // build RESTFul (DriveSDKv2) service to fall back to
    GoogleAccountCredential crd = GoogleAccountCredential
        .usingOAuth2(ctx, Arrays.asList(com.google.api.services.drive.DriveScopes.DRIVE_FILE));
    crd.setSelectedAccountName(email);
    _drvSvc = new com.google.api.services.drive.Drive.Builder(
        AndroidHttp.newCompatibleTransport(), new GsonFactory(), crd).build();
}
2/ method that queries the Drive RESTful API, returning GDAA's DriveId to be used by the app.
String qry = "title = 'MYFILE' and mimeType = 'text/plain' and trashed = false";
DriveId findObject(String qry) throws Exception {
    DriveId dId = null;
    try {
        final FileList gLst = _drvSvc.files().list().setQ(qry).setFields("items(id)").execute();
        if (gLst.getItems().size() == 1) {
            String sId = gLst.getItems().get(0).getId();
            dId = Drive.DriveApi.fetchDriveId(_gac, sId).await().getDriveId();
        } else if (gLst.getItems().size() > 1)
            throw new Exception("more than one folder/file found");
    } catch (Exception e) {} // note: swallows every failure, including the exception thrown above, so null is returned
    return dId;
}
The findObject() method above (again, I'm using the 'await()' flavor for simplicity) returns the Drive objects correctly, reflecting the trashed status with no noticeable delay (run it on a non-UI thread).
Again, I would strongly advise AGAINST leaving this in code longer than necessary, since it is a HACK with unpredictable effects on the rest of the system.
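One detail of the lookup may be worth separating out: because of its empty catch block, findObject() returns null both when nothing matches and when the query is ambiguous or fails. A minimal sketch of a helper that keeps those cases apart is shown here, with plain strings standing in for the IDs returned by the REST query. UniqueMatch and single() are hypothetical names and not part of either Drive API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class UniqueMatch {
    // Returns the single ID if exactly one was found, an empty Optional if
    // none was found, and throws if the query matched more than one item.
    public static Optional<String> single(List<String> ids) {
        if (ids.size() > 1) {
            throw new IllegalStateException("more than one folder/file found: " + ids.size());
        }
        return ids.isEmpty() ? Optional.<String>empty() : Optional.of(ids.get(0));
    }

    public static void main(String[] args) {
        // No match: the caller can decide to create a new file.
        System.out.println(single(Collections.<String>emptyList()).isPresent());
        // Exactly one match: the ID can be passed on to fetchDriveId().
        System.out.println(single(Arrays.asList("abc123")).get());
        // Ambiguous match: surfaced to the caller instead of being swallowed.
        try {
            single(Arrays.asList("abc123", "def456"));
        } catch (IllegalStateException e) {
            System.out.println("ambiguous: " + e.getMessage());
        }
    }
}
```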

QR decode exceptions using ZXing.NET in Unity

I'm currently trying to make an application in Unity (for iOS) that allows a user to scan a QR code.
I am using the ZXing.NET library which has been optimized for Unity.
This is the current decode thread I am using
void DecodeQR()
{
    // create a reader with a custom luminance source
    var barcodeReader = new BarcodeReader {AutoRotate=false, TryHarder=false};
    while (true)
    {
        if (isQuit)
            break;
        try
        {
            string result = "Cry if you see this.";
            // decode the current frame
            if (c != null){
                print ("Start Decode!");
                result = barcodeReader.Decode(c, W, H).Text; //This line of code is generating unknown exceptions for some arcane reason
                print ("Got past decode!");
            }
            if (result != null)
            {
                LastResult = result;
                print(result);
            }
            // Sleep a little bit and set the signal to get the next frame
            c = null;
            Thread.Sleep(200);
        }
        catch
        {
            continue;
        }
    }
}
The execution reaches the "Start Decode!" print statement, but fails to reach the "Got past decode!" statement.
This is because the Decode() method is generating an unknown exception every time, even when the camera is looking at a very clear QR code.
For reference:
c is of type Color32[] and is generated using WebCamTexture.GetPixels32()
W, H are integers representing the width and height of the camera texture.
For some reason, I cannot catch a generic Exception within the catch clause, meaning I cannot determine what kind of exception the Decode() method is generating.
EDIT: The code I have used is adapted from the Unity demo available from the ZXing.NET project. I am using the current version of ZXing.NET. I should also mention that I am currently testing this on an iMac, not on an iOS device or the simulator. I have tried running the Unity demo from scratch and I obtain the same result.
This is how the c variable (Color32[]) is updated:
void Update()
{
    if (c == null)
    {
        c = camTexture.GetPixels32();
        H = camTexture.height;
        W = camTexture.width;
    }
}
EDIT 2: I have separated the decode stage into two bits, firstly generating the result object then retrieving the text property of the result as shown:
if (c != null){
    print ("Start Decode!");
    var initResult = barcodeReader.Decode(c, W, H);
    print ("Got past decode!");
    result = initResult.Text; //This line of code is generating unknown exceptions for some arcane reason
    print ("Got past text conversion!");
}
The error is raised when the Text value of the result is retrieved. I still do not know how to fix it, though.
Can someone please advise me?
Thanks
The code looks like the Unity demo from the ZXing.Net project.
I tried the demo again with the current version 0.10.0.0. It works like a charm for me.
But I can only test it with Unity on Windows. I don't have a chance to try it with iOS.
Did you try the latest version of ZXing.Net? What version of Unity do you use?
Is the variable c correctly set within the Update method?
I have solved the problem. The problem was that the texture being obtained from the camera was not in the correct aspect ratio and was 'squished'. As a result, the ZXing library could not properly recognise the code.
After correcting this issue, recognition worked flawlessly.
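A plausible reading of the exception seen in EDIT 2, assuming that Decode() returns null when no code is recognised in the frame, is that reading the text of a missing result is what threw. The guard below is a minimal sketch of that idea in Java; Result, decode(), and textOrNull() are hypothetical stand-ins and not the ZXing.Net API.

```java
public class NullSafeDecode {
    // Hypothetical stand-in for a decoder result; only the text matters here.
    public static final class Result {
        final String text;
        Result(String text) { this.text = text; }
    }

    // Hypothetical decoder: returns null when no code is recognised in the
    // frame, which is the behaviour assumed in the paragraph above.
    public static Result decode(boolean frameContainsCode) {
        return frameContainsCode ? new Result("hello") : null;
    }

    // Reads the text only when a result exists, instead of dereferencing null.
    public static String textOrNull(Result r) {
        return (r == null) ? null : r.text;
    }

    public static void main(String[] args) {
        // A frame without a readable code yields null rather than an exception.
        System.out.println(textOrNull(decode(false)));
        // A frame with a readable code yields its text.
        System.out.println(textOrNull(decode(true)));
    }
}
```

With a guard like this, a distorted frame simply produces no result for that iteration, and the loop can move on to the next frame.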