I have a pipeline of transformations. Can the description of a column that is defined upstream be propagated to the same column downstream across the pipeline?
That way, one could add the description once upstream and have it propagated downstream automatically.
As of right now there's nothing that does this for you automatically. However, since you can read and write the descriptions from your Python code, you can do this on your own.
Using the "scm.exe" command-line utility, is it possible to list the changed components for a given snapshot UUID?
I can get a list of components, and
I can get a list of changes,
but I really can't find a way to get a "list of changed components" within a given snapshot.
Thank you.
A snapshot is a static thing, it does not contain any notion of any previous state or snapshot. When you say "changed components", the question is really: changed compared to what?
You should be able to use the compare command to see what is different between the snapshot and either a second snapshot, or some other workspace or stream.
I have added all the KNIME plugins to Eclipse and I want to create my own custom node, but I don't understand how to pass data from one node to another.
I looked at one node provided by KNIME itself, the "File Reader" node. I would like the source code or the JAR file for this node, but I cannot find it.
I searched for a similar name in the Eclipse plugins folder, but still could not find it.
Can someone please tell me how to pass data from one node to another, and how to find the classes, JAR and source code for any node provided by KNIME?
Assuming that your data is a standard datatable, then you need to subclass NodeModel, with a call to the supertype constructor:
public MyNodeModel() {
    // One incoming table, one outgoing table
    super(1, 1);
}
You need to override the default #execute(BufferedDataTable[] inData, ExecutionContext exec) method - this is where the meat of the node work is done and the output table created. Ideally, if your input and output table have a one-to-one row mapping then use a ColumnRearranger class (because this reduces disk IO considerably, and if you need it, allows simple parallelisation of your node), otherwise your execute method needs to iterate through the incoming datatable and generate an output table.
The #configure(DataTableSpec[] inSpecs) method needs to be implemented to provide, at the least, a spec for the output table if this can be determined before the node is executed (it normally can, and this allows downstream nodes to be configured too, but the 'Transpose' node is an example of a node which cannot do so).
There are various other methods which you also need to implement, but in some cases these will be empty methods.
In addition to the NodeModel, you need to implement some other classes too - a NodeFactory, optionally a NodeSettingsPane and optionally a NodeView.
In Eclipse you can view the sources for many nodes, and the KNIME community 'book' pages all have a link to their source code. Take a look at https://tech.knime.org/developer-guide and https://tech.knime.org/developer/example for a step-by-step guide. Also, questions to the KNIME forums (including a developer forum) generally get rapid responses, and KNIME run a Developer Training Course a few times a year if you want to spend a few days learning more. Last but not least, it is worth familiarising yourself with the noding guidelines, which describe the best practice for how your node should behave.
Source code for KNIME nodes is now available on GitHub.
Alternatively, in the Eclipse KNIME SDK you can look under your project > Plugin Dependencies > knime-base.jar > org.knime.base.node.io.filereader for the File Reader source code.
knime-base.jar is added to your project by default when the project is created with the KNIME SDK.
I have a small problem (I assume...)
I'm loading a flat file (CSV) and I want to add a row number to the data flow. Using the RowNumber transformation works well for both output paths (source and error) individually. But what if you want to use the same row number in both paths, to be able to track where (in the file) an error occurred? I have scratched my head long enough now, and I'm just throwing it out here since I'm pretty sure other people have stumbled across this one...
I have tried the script transformation which seems to work for a while but then it hangs the load.
Any suggestion on how to solve this issue is greatly appreciated.
If I understand you correctly, dynamically generating the number with a script component for the dataflow is not a problem for you.
What I would recommend is adopting the following philosophy for stable ETL processes that load from files:
Never cast anything in the connector; just import the fields as nvarchars of the maximum length they can reach.
Cast and control each column to your specification.
If a row cannot be read, you will not know the index, but you will know that the file is malformed (extremely rare in my experience, for half transferred files), and it should be rejected anyway.
A quick screenshot of a part of a file loading process shows how the rejection (after assigning row_id) can work (link to dataflow image). To this you can add further countless checks (duplicates...) and even have a repository for the loaded files to check upon the rejects and whatever else you might want to control (Link to control flow image).
In some of my processes, I even use a flat file connector and just import each row as a bulk text and then split it in columns with an intermediate script component, allowing for different versions of the columns in the files.
Anyway, sorry not to be more detailed (due to my status I can't add more links or any images), but I hope that you understand the concept.
Regards,
Francisco.
I'm working on a Mercurial GUI client that interacts with hg.exe through the command line (the preferred high-level API, as I understand it).
However, I am having trouble determining the possible outputs of each command. I can see several outputs by simulating situations, but I was wondering if there is a complete reference of the possible outputs for each command.
For instance, for the command hg fetch, some possible outputs are:
pulling from https://User#server.com/Repo
searching for changes
no changes found
if there are no changes, or:
abort: outstanding uncommitted changes
or one of several other messages, depending on the situation.
I would like to structure my program to handle as many of these cases as possible, but it's hard for me to know in advance what they all are.
Is there a documented reference for the command-line? I have not been able to find one with The Google.
Look through the translation strings file. Then you'll know you have every message handled and be able to see what parts of it vary.
Also, fetch is just a convenience wrapper around pull/update/merge. If you're invoking Mercurial programmatically you probably want to keep those three very different operations separate when you run them, so you know which part failed. In your example above it's the 'update' that fails, so the 'pull' would have succeeded, and knowing that it was the 'update' that failed would allow you to give the user a better message.
(fetch is an abomination, which is part of why it's disabled by default)
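One way to keep the steps separate is sketched below. This is only an illustration under stated assumptions: CommandRunner, ProcessRunner and HgSteps are hypothetical names, hg is assumed to be on the PATH, and a nonzero exit code is treated as "this step needs attention" without interpreting the text output. A merge step could be added in the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction so the process launcher can be swapped out (e.g. in tests).
interface CommandRunner {
    int run(List<String> command); // returns the process exit code
}

// Launches a real process and waits for it; output goes to the parent's console.
final class ProcessRunner implements CommandRunner {
    @Override
    public int run(List<String> command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for hg", e);
        }
    }
}

final class HgSteps {
    // Runs pull, then update, as separate hg invocations.
    // Returns the name of the first step with a nonzero exit code, or null if all passed.
    static String firstFailure(CommandRunner runner, String repoDir) {
        Map<String, List<String>> steps = new LinkedHashMap<>();
        steps.put("pull", Arrays.asList("hg", "--cwd", repoDir, "pull"));
        steps.put("update", Arrays.asList("hg", "--cwd", repoDir, "update"));
        for (Map.Entry<String, List<String>> step : steps.entrySet()) {
            if (runner.run(step.getValue()) != 0) {
                return step.getKey();
            }
        }
        return null;
    }
}
```

A GUI client could call HgSteps.firstFailure(new ProcessRunner(), repoPath) and choose its message based on whether "pull" or "update" is returned.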
Is this what you were looking for: https://www.mercurial-scm.org/wiki/MercurialBook ?
Mercurial 1.9 introduced a command server, which is stable (in the sense that the API doesn't change much) and low-overhead (there is no need to start an hg process for every command). Communication is done via a pipe.
I read the following in an article:
Immutable objects are particularly handy for implementing certain common idioms such as undo/redo and abortable transactions. Take undo for example. A common technique for implementing undo is to keep a stack of objects that somehow know how to run each command in reverse (the so-called "Command Pattern"). However, figuring out how to run a command in reverse can be tricky. A simpler technique is to maintain a stack of immutable objects representing the state of the system between successive commands. Then, to undo a command, you simply revert back to the previous system state (and probably store the current state on the redo stack).
However, the article does not show a good practical example of how immutable objects could be used to implement "undo" operations. For example... deleting 10 emails from a gmail inbox. Once you do that, it has an undo option. How would an immutable object help in this regard?
The immutable objects would hold the entire state of the system, so in this case you'd have object A that contains the original inbox, and then object B that contains the inbox with ten e-mails deleted, and (in effect) a pointer back from B to A indicating that, if you do one "undo", then you stop using B as the state of the system and start using A instead.
However, Gmail inboxes are far too large to use this technique. You'd use it on documents that can actually be stored in a fairly small amount of memory, so that you can keep many of them around for multi-level undo.
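As a minimal sketch of the snapshot idea, assuming a state small enough to copy: Inbox and SnapshotHistory below are hypothetical classes made up for illustration, not taken from the article. Each action produces a new immutable Inbox, and undo or redo only swaps which object counts as current.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical immutable state: a fixed list of message subjects.
final class Inbox {
    private final List<String> messages;

    Inbox(List<String> messages) {
        // Defensive copy so later changes to the caller's list cannot leak in.
        this.messages = Collections.unmodifiableList(new ArrayList<>(messages));
    }

    List<String> messages() {
        return messages;
    }

    // Returns a new Inbox; the receiver is never modified.
    Inbox without(String message) {
        List<String> copy = new ArrayList<>(messages);
        copy.remove(message);
        return new Inbox(copy);
    }
}

// Keeps earlier states on a stack, so undo/redo just swaps references.
final class SnapshotHistory {
    private final Deque<Inbox> undoStack = new ArrayDeque<>();
    private final Deque<Inbox> redoStack = new ArrayDeque<>();
    private Inbox current;

    SnapshotHistory(Inbox initial) {
        this.current = initial;
    }

    Inbox current() {
        return current;
    }

    void apply(Inbox next) {
        undoStack.push(current);
        redoStack.clear(); // a new action invalidates the redo history
        current = next;
    }

    boolean undo() {
        if (undoStack.isEmpty()) return false;
        redoStack.push(current);
        current = undoStack.pop();
        return true;
    }

    boolean redo() {
        if (redoStack.isEmpty()) return false;
        undoStack.push(current);
        current = redoStack.pop();
        return true;
    }
}
```

Because no Inbox is ever modified after construction, the objects on the stacks stay valid no matter what later actions do.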
If you want to keep ten levels of undo, you can potentially save memory by only keeping two immutable objects - one that is current, and one that is from ten "undos" ago - and a list of Commands that were applied between them.
To do an "undo", you re-execute all but the last Command object, use that as the new current object, and erase the last Command (or save it as a "Redo" object). Every time you do a new action, you update the current object, add the associated Command to the list, and then (if the list is more than ten Commands long) you execute the first Command on the object from the start of the undo list and throw away the first Command on the list.
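The bounded scheme just described could be sketched roughly as follows. ReplayHistory is a hypothetical name, the state type S is assumed to be immutable, and each Command is modelled as a plain function from state to state that only ever runs forward. Redo is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Bounded undo: one old checkpoint plus the commands applied since then.
final class ReplayHistory<S> {
    private final int maxUndo;
    private S checkpoint; // state from up to maxUndo actions ago
    private S current;
    private final List<UnaryOperator<S>> commands = new ArrayList<>();

    ReplayHistory(S initial, int maxUndo) {
        this.checkpoint = initial;
        this.current = initial;
        this.maxUndo = maxUndo;
    }

    S current() {
        return current;
    }

    void apply(UnaryOperator<S> command) {
        current = command.apply(current);
        commands.add(command);
        if (commands.size() > maxUndo) {
            // Fold the oldest command into the checkpoint and forget it.
            checkpoint = commands.remove(0).apply(checkpoint);
        }
    }

    boolean undo() {
        if (commands.isEmpty()) return false;
        commands.remove(commands.size() - 1);
        // Rebuild the current state by replaying the remaining commands forward.
        S state = checkpoint;
        for (UnaryOperator<S> c : commands) {
            state = c.apply(state);
        }
        current = state;
        return true;
    }
}
```

The trade-off is time for memory: each undo replays up to maxUndo - 1 commands, but only two full copies of the state are kept.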
You can do various other checkpointing systems as well, involving a variable number of complete representations of the system as well as a variable number of Commands between them. But it gets further and further from the original idea that you cited and becomes more and more like a typical mutable system. It does, however, avoid the problem of making Commands consistently reversible; you need only ever apply Commands to an object forward and not reverse.
SVN and other version control systems are effectively a disk- or network-based form of undo-and-redo.