How do I replace values in a column in KNIME?

I have a column of countries with 50 different values that I want to reduce to United States and Other.
Can someone help me with that?
Another example is Age which has 48 values that I'd like to reduce to only 4 like 1 to 18 = youth, 18-27 = starting, etc.
I've actually got about 5 columns that I want to reduce the values of. So would I need to repeat the process multiple times in KNIME or can I accomplish multiple column value replacements at once?

The latter one can easily be achieved with the Rule Engine:
$Col0$ >= 1 AND $Col0$ < 18 => "youth"
For the first problem I'd use a String Replace (Dictionary).
I don't think you can replace all columns at once, but you can loop over the columns.

For the second case I would use Numeric Binner:
For each column a number of intervals - known as bins - can be
defined. Each of these bins is given a unique name (for this column),
a defined range, and open or closed interval borders. They
automatically ensure that the ranges are defined in descending order
and that interval borders are consistent. In addition, each column is
either replaced with the binned, string-type column, or a new binned,
string-type column is appended.
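
Outside KNIME, the logic both answers describe can be sketched in plain Python to sanity-check the mapping before configuring the nodes. This is only an illustration, not KNIME code: the `youth` and `starting` borders come from the question, while the last border (50) and the labels `established` and `senior` are made-up placeholders.

```python
from bisect import bisect_right

def collapse_country(value: str) -> str:
    """Keep 'United States' and map every other country to 'Other'."""
    return value if value == "United States" else "Other"

# Upper-exclusive bin borders. 18 and 27 come from the question;
# 50 and the last two labels are hypothetical placeholders.
AGE_BORDERS = [18, 27, 50]
AGE_LABELS = ["youth", "starting", "established", "senior"]

def bin_age(age: int) -> str:
    """Return the label of the bin that contains `age`."""
    return AGE_LABELS[bisect_right(AGE_BORDERS, age)]

# Made-up sample rows: (country, age)
rows = [("United States", 17), ("Canada", 18), ("France", 64)]
reduced = [(collapse_country(c), bin_age(a)) for c, a in rows]
print(reduced)
```

Treating each border as the start of the next bin avoids the overlap at 18 in the question's wording (1 to 18 and 18-27).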

Find Row Where Sum is Reached from Single Joined Column (not a range of cells)

I'm trying to run a formula to identify in which row a total sum is reached.
I've been able to do that calculation when I have an entire range of cells to work with, however, I'm doing a filter / join calculation because I need to do this from an individual row with all the data instead of an entire range of cells.
Here is an example google sheet (EDITABLE - feel free) where you can see the range and working formula (both below). Help getting this from the single-cell versions on the top would be very helpful. The error I get with both row() & index() formulas is that the "argument must be a range".
If there's another way to do this besides the single-cell I had that doesn't require referencing the range (e.g. using FILTER) then I'm open to it.
My desired result is to be able to pull the second column (date) at the point when the sum is reached (can be via the INDEX & MATCH formula I used or an alternative). This will tell me the earliest date that feeds into the desired sum.
Yes, unfortunately you can't do that trick with SUMIFS to get a running total unless the column being totalled is an actual range.
The only approach I know is to multiply successive values by a triangular array like this:
1 0 0 ...
1 1 0 ...
1 1 1 ...
so you get just the sum of the first value, the first 2 values, then 3 values up to n.
This is the formula in F5:
=ArrayFormula(match(E14,mmult(IF(ROW(A1:INDEX(A1:ALL1000,COUNT(split(A5,",")),COUNT(split(A5,","))))>=
COLUMN(A1:INDEX(A1:ALL1000,COUNT(split(A5,",")),COUNT(split(A5,",")))),1,0),TRANSPOSE(SPLIT(A5,",")))))
And the formula in F6 is just
=to_date(INDEX(TRANSPOSE(SPLIT(B5,",")),F5,1))
EDIT
You might have guessed that the above formula was adapted from Excel, where you try to avoid volatile functions like Offset and Indirect.
I have realised since posting this answer that it could be improved in two ways:
(1) By using Offset or Indirect, thus avoiding the need to define a range of arbitrary size like A1:ALL1000
(2) By implying a 2D array by comparing a row and column vector, rather than actually defining a 2D array. This would give you something like this in F5:
=ArrayFormula(match(E14,mmult(IF(ROW(indirect("A1:"&address(COUNT(split(A5,",")),1)))>=
COLUMN(indirect("A1:"&address(1,COUNT(split(A5,","))))),1,0),TRANSPOSE(SPLIT(A5,",")))))
which could be further simplified to:
=ArrayFormula(match(E14,mmult(IF(ROW(indirect("A1:A"&COUNT(split(A5,","))))>=
COLUMN(indirect("A1:"&address(1,COUNT(split(A5,","))))),1,0),TRANSPOSE(SPLIT(A5,",")))))
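
The triangular-array idea behind these formulas can be sketched outside Sheets in Python. The sample amounts and dates are made up, and the lookup step here returns the first position whose running total is at least the target, which is one reading of the question; MATCH in the formulas above is called without a match-type argument and may resolve non-exact targets differently.

```python
def running_totals(values):
    """Running totals via a lower-triangular 0/1 matrix, as in the MMULT formula."""
    n = len(values)
    # tri[r][c] is 1 when c <= r, mirroring IF(ROW(...) >= COLUMN(...), 1, 0)
    tri = [[1 if c <= r else 0 for c in range(n)] for r in range(n)]
    # Matrix-vector product: row r sums the first r + 1 values
    return [sum(t * v for t, v in zip(row, values)) for row in tri]

def first_position_reaching(values, target):
    """1-based position where the running total first reaches `target`, else None."""
    for pos, total in enumerate(running_totals(values), start=1):
        if total >= target:
            return pos
    return None

# Hypothetical cell contents standing in for A5 (amounts) and B5 (dates)
amounts = [int(x) for x in "5,10,20,15".split(",")]
dates = "2020-01-01,2020-01-05,2020-02-01,2020-03-01".split(",")

totals = running_totals(amounts)
pos = first_position_reaching(amounts, 30)
hit_date = dates[pos - 1]
print(totals, pos, hit_date)
```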

Access 2013 Count

I am working on a report in Access 2013. I need to separate the first 20 records in a column that contain a value and assign a name to them. For example, records 1-20 should get Lot 1, records 21-40 should get Lot 2, etc. The report needs to be separated into lots of 20. I can also just insert a line when it reaches sets of 20, without a name, if that makes it easier. I just need something to show a break at sets of 20.
Example: As you can see, the report is separated by welder stencil. When the count in the VT column reaches 20, I need to enter a line or some type of divider to separate the data. Our client is asking that we separate the VT in sets of 20. I don't know what the easiest way to accomplish this is. I have researched it but haven't found anything.
Example Report with Divisions
Update the report's RecordSource query by adding "Lot" values for each row. There are multiple ways of doing this, but the easiest will be if your records already have a sequential, continuous numerical key. If they do not have such a key, you can research generating such sequential numbers for your query, but it is beyond the scope of this question and no details about the actual data schema were supplied in the question.
Let's imagine that you have such a key column [Seq]. You can use the modulo (Mod) and/or integer division (\, backslash) operators to find the rows that start a new group of 20, e.g. ([Seq] - 1) Mod 20 = 0.
Generate a lot value for each row. An example SQL snippet (the + 1 makes numbering start at Lot 1 rather than Lot 0): SELECT ("Lot " & ((([Seq] - 1) \ 20) + 1)) As LotNumber ...
Utilize Access report sorting and grouping features --grouping on the new Lot field-- to print a line and/or label at the start of each group. You can also have the report start a new page at the beginning or end of such a group.
The details about grouping can be found elsewhere in tutorials and Access documentation and are beyond the scope of this question.
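
The arithmetic behind the lot assignment can be checked quickly in Python. This is a sketch, not Access code; it assumes a 1-based, gap-free sequence number like the hypothetical [Seq] column above and adds 1 so that numbering starts at Lot 1 as the question asks.

```python
LOT_SIZE = 20

def lot_label(seq: int, lot_size: int = LOT_SIZE) -> str:
    """Label for a 1-based sequence number: 1-20 -> 'Lot 1', 21-40 -> 'Lot 2'."""
    return f"Lot {(seq - 1) // lot_size + 1}"

def starts_new_lot(seq: int, lot_size: int = LOT_SIZE) -> bool:
    """True for the first row of each lot (1, 21, 41, ...)."""
    return (seq - 1) % lot_size == 0

labels = {seq: lot_label(seq) for seq in (1, 20, 21, 40, 41)}
print(labels)
```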

Sorting/Ordering sequenced pairs of data in MySQL?

I am trying to determine if there's a way to sort rows of a MySQL table that consists of start/finish columns. (Could also be thought of as parent/child relations or other linked list arrangement)
Here's an example of how the data is currently stored:
id start finish
2 stepthree stepfour
6 stepfive stepsix
9 stepone steptwo
78 stepfour stepfive
121 steptwo stepthree
(The id numbers in this are not relevant, just using them to indicate additional columns of arbitrary data)
I want to sort/display these rows in order, presuming I always start with "stepone", by traversing the start -> finish chain so that each "finish" is followed by the row that has it as its "start".
desired output
9 stepone steptwo
121 steptwo stepthree
2 stepthree stepfour
78 stepfour stepfive
6 stepfive stepsix
There shouldn't be any branching/splits normally, just a sequential series of steps or states. I can't use simple alpha sorting (in my case the start and finish values are codes created by a customer), but can't figure out any other way to order these using SQL. I could programmatically do it using most languages, but stumped about doing it just with SQL.
Any clever ideas?
I would recommend having another table that has each step mapped to its precedence order.
Then you can write a query to sort each row in the order of precedence of the start step.
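
A minimal sketch of that suggestion, using Python's built-in sqlite3 as a stand-in for MySQL. The table names `transitions` and `step_order` and the column `precedence` are hypothetical; the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transitions (id INTEGER, start TEXT, finish TEXT);
    INSERT INTO transitions VALUES
        (2, 'stepthree', 'stepfour'), (6, 'stepfive', 'stepsix'),
        (9, 'stepone', 'steptwo'), (78, 'stepfour', 'stepfive'),
        (121, 'steptwo', 'stepthree');
    -- Hypothetical lookup table: one row per step with its position.
    CREATE TABLE step_order (step TEXT PRIMARY KEY, precedence INTEGER);
    INSERT INTO step_order VALUES
        ('stepone', 1), ('steptwo', 2), ('stepthree', 3),
        ('stepfour', 4), ('stepfive', 5);
""")

# Sort each row by the precedence of its start step.
ordered = conn.execute("""
    SELECT t.id, t.start, t.finish
    FROM transitions AS t
    JOIN step_order AS s ON s.step = t.start
    ORDER BY s.precedence
""").fetchall()
ordered_ids = [row[0] for row in ordered]
print(ordered_ids)
```

The trade-off is that the lookup table has to be maintained whenever the customer's step codes change.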

MySQL BETWEEN Index

I have to figure out a way whether a number (stored as string because of leading zeros) falls in a specific range. The ranges look like this:
12 - 14
3456 - 4567
1233435 would be considered to fall in the first range (matching is from the left). The number can have a maximum of 20 digits, and I have a file which has all the ranges included. I imported the ranges, adding trailing zeros to the lower bound and trailing nines to the upper bound to reach a length of 20. This is to be able to handle variable-length numbers: they are padded with zeros on the right so that I can do the following query:
SELECT * FROM ranges WHERE 'my padded number' BETWEEN bound_lower AND bound_upper
Since I have a couple of thousand ranges I would like to put an index on the table but I am not sure how I can achieve this.
Thanks,
Mendel
That seems like a valid approach that you've taken. To add the index, you just have to:
CREATE INDEX between_index on ranges (bound_lower, bound_upper);
You can verify that it is working by using EXPLAIN.
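
The padded comparison that the BETWEEN query relies on can be sketched in Python to check the import logic. This only illustrates the string comparison, not the index; the function names are hypothetical and the width of 20 comes from the question.

```python
WIDTH = 20  # maximum number of digits, from the question

def pad_lower(bound: str) -> str:
    """Pad a lower bound with trailing zeros to the full width."""
    return bound.ljust(WIDTH, "0")

def pad_upper(bound: str) -> str:
    """Pad an upper bound with trailing nines to the full width."""
    return bound.ljust(WIDTH, "9")

def find_range(number: str, ranges):
    """Return the first (lower, upper) pair whose padded bounds contain the number."""
    padded = number.ljust(WIDTH, "0")
    for lower, upper in ranges:
        # Fixed-width digit strings compare lexicographically like numbers
        if pad_lower(lower) <= padded <= pad_upper(upper):
            return (lower, upper)
    return None

ranges = [("12", "14"), ("3456", "4567")]
match = find_range("1233435", ranges)
print(match)
```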

How do I store orders?

I have an app which has tasks in it, and you can reorder them. Now I was wondering how best to store them. Should I have a column for the order number and recalculate all of them every time I change one? Please suggest a version which doesn't require me to update all order numbers, since that is very time-consuming (in terms of execution).
This is especially bad if I have to put one that is at the very top of the order and then drag it down to the bottom.
Name (ordernumber)
--
1Example (1)
2Example (2)
3Example (3)
4Example (4)
5Example (5)
--
2Example (1) *
3Example (2) *
4Example (3) *
5Example (4) *
1Example (5) *
*have to be changed in the database
also some tasks may get deleted due to them being done
You may keep orders as literals, and use lexical sort:
1. A
2. Z
Add a task:
1. A
3. L
2. Z
Add more:
1. A
4. B
3. L
2. Z
Move 2 between 1 and 4:
1. A
2. AL
4. B
3. L
etc.
You update only one record at a time: just take a letter midway between the first letters that differ. If you insert between A and C, you take B; if you insert between ALGJ and ALILFG, you take ALH.
When the first differing letters are adjacent, keep the lower key's letter and treat the upper bound as the position one past Z. I.e. if you need to insert between ABHDFG and ACSDF, you treat it as between ABH and AB(Z+), and write AB(letter 35/2), that is ABP.
If you run out of string length, you may always perform a full reorder.
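
A rough Python sketch of picking a key between two existing keys, assuming keys use only the letters A-Z. It is one possible implementation of the idea rather than the answerer's exact rule, so it may pick a different (but still valid) letter than the examples above; `key_between` is a hypothetical name.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def _digit(key: str, i: int) -> int:
    """Letter index at position i; a key that has ended counts as trailing 'A's."""
    return ALPHABET.index(key[i]) if i < len(key) else 0

def key_between(lo: str, hi: str) -> str:
    """Return a key that sorts strictly between lo and hi (requires lo < hi)."""
    if not lo < hi:
        raise ValueError("lo must sort before hi")
    out = []
    i = 0
    # Copy the shared prefix up to the first position where the keys differ.
    while True:
        if i >= len(hi):
            # hi is lo followed only by 'A's, so nothing fits in between
            raise ValueError("no key fits between these two")
        lo_d, hi_d = _digit(lo, i), _digit(hi, i)
        if lo_d != hi_d:
            break
        out.append(ALPHABET[lo_d])
        i += 1
    if hi_d - lo_d > 1:
        # Room at this position: take the middle letter.
        out.append(ALPHABET[(lo_d + hi_d) // 2])
        return "".join(out)
    # Adjacent letters: keep lo's letter, then go past the rest of lo,
    # treating the upper bound as one past 'Z'.
    out.append(ALPHABET[lo_d])
    i += 1
    while True:
        d = _digit(lo, i)
        if d < 25:
            out.append(ALPHABET[(d + 26) // 2])
            return "".join(out)
        out.append("Z")
        i += 1

print(key_between("A", "C"), key_between("ALGJ", "ALILFG"))
```

Keys produced this way never end in A, which keeps room below them for later inserts.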
Update:
You can also keep your data as a linked list.
See the article in my blog on how to do it in MySQL:
Sorting Lists
In a nutshell:
/* This just returns all records in no particular order */
SELECT *
FROM t_list
id parent
------- --------
1 0
2 3
3 4
4 1
/* This returns all records in intended order */
SELECT @r AS _current,
       @r := (
           SELECT id
           FROM t_list
           WHERE parent = _current
       )
FROM (
    SELECT @r := 0
) vars,
t_list
_current id
------- --------
0 1
1 4
4 3
3 2
When moving the items, you'll need to update at most 4 rows.
This seems to be the most efficient way to keep an ordered list that is updated frequently.
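
The traversal that the query performs can be sketched in Python, using the same id/parent rows as above. This is an illustration of walking the list in application code, assuming exactly one row per parent and a head whose parent is 0; `ordered_ids` is a hypothetical helper.

```python
def ordered_ids(rows, head_parent=0):
    """Walk the parent -> id links starting from the row whose parent is head_parent."""
    child_of = {parent: item_id for item_id, parent in rows}
    order = []
    current = head_parent
    # The length check guards against looping forever on cyclic data.
    while current in child_of and len(order) < len(rows):
        current = child_of[current]
        order.append(current)
    return order

# (id, parent) pairs from the t_list example
rows = [(1, 0), (2, 3), (3, 4), (4, 1)]
order = ordered_ids(rows)
print(order)
```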
Normally I'll add an int or smallint column named something like 'Ordinal' or 'PositionOrdinal' as you suggest, and with the exact caveat you mention — the need to update a potentially significant number of records every time a single record is re-ordered.
The benefit is that given a key for a specific task and a new position for that task, the code to move an item is just two statements:
UPDATE `Tasks` SET Ordinal = Ordinal + 1 WHERE Ordinal >= @NewPosition
UPDATE `Tasks` SET Ordinal = @NewPosition WHERE TaskID = @TaskID
There are other suggestions for a doubly linked list or lexical order. Either can be faster, but at the cost of much more complicated code, and the performance will only matter when you have a lot of items in the same group.
Whether performance or code complexity is more important will depend on your situation. If you have millions of records, the extra complexity might be worth it. However, I normally prefer the simpler code because users normally only order small lists by hand. If there aren't all that many items in the list, the extra updates won't matter. This can typically handle thousands of records without any noticeable impact on performance.
The one thing to keep in mind with your updated example is that the column is only used for sorting and not otherwise shown directly to the user. Thus, when dragging an item from the top to the bottom as shown the only thing you need to change is that one record. It doesn't matter that you'll leave the first position empty. This means there is a small potential to overflow your integer sort with enough re-ordering, but let me say again: users normally only order small lists by hand. I've never heard of this risk actually causing a problem.
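
The two-statement move can be tried out with Python's built-in sqlite3 standing in for the real database. The table layout and the `move_task` helper are assumptions for illustration; only the two UPDATE statements come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tasks (TaskID INTEGER PRIMARY KEY, Name TEXT, Ordinal INTEGER)"
)
conn.executemany(
    "INSERT INTO Tasks VALUES (?, ?, ?)",
    [(i, f"{i}Example", i) for i in range(1, 6)],
)

def move_task(conn, task_id, new_position):
    """Shift everything at or after new_position down by one, then place the task."""
    with conn:  # run both updates in one transaction
        conn.execute(
            "UPDATE Tasks SET Ordinal = Ordinal + 1 WHERE Ordinal >= ?",
            (new_position,),
        )
        conn.execute(
            "UPDATE Tasks SET Ordinal = ? WHERE TaskID = ?",
            (new_position, task_id),
        )

move_task(conn, 5, 2)  # move the last task to second place
names = [r[0] for r in conn.execute("SELECT Name FROM Tasks ORDER BY Ordinal")]
print(names)
```

The ordinals end up with a gap where the moved task used to be, which is harmless because the column is only used for sorting.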
Out of your answers I came up with a mixture which goes as follows:
Say we have:
1Example (1)
2Example (2)
3Example (3)
4Example (4)
5Example (5)
Now if I sort something between 4 and 5 it would look like this:
2Example (2)
3Example (3)
4Example (4)
1Example (4.5)
5Example (5)
Now again, moving something between 1 and 5:
3Example (3)
4Example (4)
1Example (4.5)
2Example (4.75)
5Example (5)
It always takes the midpoint of the two neighbouring numbers.
I hope that works; please do correct me ;)
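
A small Python sketch of this midpoint scheme, reproducing the two moves above. The helper names are hypothetical. One caveat worth checking: a float has finite precision, so after enough halvings inside the same gap no new midpoint exists and the list has to be renumbered; `has_room` tests for that.

```python
def position_between(before: float, after: float) -> float:
    """Midpoint of the two neighbouring order numbers."""
    return (before + after) / 2

def has_room(before: float, after: float) -> bool:
    """False once the midpoint is no longer distinct from its neighbours."""
    return before < position_between(before, after) < after

positions = {f"{i}Example": float(i) for i in range(1, 6)}

# Move 1Example between 4Example and 5Example
positions["1Example"] = position_between(positions["4Example"], positions["5Example"])
# Move 2Example between 1Example and 5Example
positions["2Example"] = position_between(positions["1Example"], positions["5Example"])

order = sorted(positions, key=positions.get)
print(order, positions["1Example"], positions["2Example"])
```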
We do it with a Sequence column in the database.
We use sparse numbering (e.g. 10, 20, 30, ...), so we can "insert" one between existing values. If the adjacent rows have consecutive numbers we renumber the minimum number of rows we can.
You could probably use Decimal numbers: take the average of the Sequence numbers of the rows adjacent to where you are inserting, and then you only have to update the row being "moved".
This is not an easy problem. If you have a low number of sortable elements, I would just reset all of them to their new order.
Otherwise, it seems it would take just as much work or more to "test-and-set" to modify only the records that have changed.
You could delegate this work to the client side. Have the client maintain the old sort order and the new sort order and determine which row[sort-order] values should be updated, then pass those tuples to the PHP-MySQL interface.
You could enhance this method in the following way (doesn't require floats):
If all sortable elements in a list are initialized to a sort-order according to their position in the list, set the sort-order of every element to something like row[sort-order] = row[sort-order] * K, where K is some number > the average number of times you expect the list to be reordered. This is O(N), N = number of elements, but it increases insertion capacity by at least N*K, with at least K open slots between each existing pair of elements.
Then if you want to insert an element between two others, it's as simple as changing its sort-order to one that is > the lower element and < the upper. If there is no "room" between the elements, you can simply reapply the "spread" algorithm presented in the previous paragraph. The larger K is, the less often it will be applied.
The K algorithm would be selectively applied in the PHP script while the choosing of the new sort-order's would be done by the client (Javascript, perhaps).
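
A minimal Python sketch of the spread-and-insert idea, kept in one place for brevity rather than split between client and server. The value K = 10 and the function names are assumptions for illustration.

```python
K = 10  # hypothetical gap size

def spread(ids_in_order, k=K):
    """Renumber the whole list with gaps of k (the O(N) step)."""
    return {item: (pos + 1) * k for pos, item in enumerate(ids_in_order)}

def move_between(orders, item, lower, upper, k=K):
    """Place item between lower and upper; respread only when no integer slot is free."""
    lo, hi = orders[lower], orders[upper]
    if hi - lo < 2:
        # No room: rebuild the intended order, then respread every element.
        ranked = sorted((i for i in orders if i != item), key=orders.get)
        ranked.insert(ranked.index(upper), item)
        return spread(ranked, k)
    updated = dict(orders)
    updated[item] = (lo + hi) // 2  # only this one row changes
    return updated

orders = spread(["a", "b", "c", "d"])          # a=10, b=20, c=30, d=40
orders = move_between(orders, "d", "a", "b")   # d gets 15
print(sorted(orders, key=orders.get))
```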
I'd recommend having an order column in the database. When an object is reordered, swap the order value in the database between the object you reordered and the object that currently has that order value; that way you don't have to reorder the entire set of rows.
hope that makes sense...of course, this depends on your rules for re-ordering.