Is there a function in GREL to remove many columns at once based on their headers in OpenRefine? - multiple-columns

I have a file with 76 columns, out of which 52 columns are irrelevant and should be removed based on their column headers (i.e. string of names). OpenRefine offers the possibility to manually Re-order/remove columns but I was wondering if there is a GREL way to match header names and remove many columns at once, as I was not able to find a remove function similar to replace.

If the columns you want to remove are contiguous, you can use Transpose -> Transpose cells across columns into rows.
Then set the following:
From Column: the first column you want to delete
To Column: the last column you want to delete
Transpose Into: Two new columns
Set the Key column to anything, e.g. A
Set the Value column to anything, e.g. B
Now all of the columns you want to delete have been turned into rows and replaced by A and B. Finally, delete A and B and you're done.

Did you try the menu options on the All column?
All --> Edit Columns --> Re-order/Remove columns...
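GREL itself works on cell values, so a scripted alternative is to generate the operation history that OpenRefine can replay. The following is a minimal sketch, assuming the operation name core/column-removal and its columnName field match what your OpenRefine version exports (removing one column by hand and checking Undo / Redo -> Extract... would confirm the format). The header names are hypothetical.

```python
import json

def column_removal_ops(all_headers, unwanted):
    """Build one column-removal operation per header found in the unwanted list."""
    unwanted_set = set(unwanted)
    return [
        {
            # Assumed operation name, as seen in extracted OpenRefine histories.
            "op": "core/column-removal",
            "columnName": header,
            "description": f"Remove column {header}",
        }
        for header in all_headers
        if header in unwanted_set
    ]

# Hypothetical headers; replace with the real column names of the project.
headers = ["id", "name", "notes", "temp_1", "temp_2"]
to_drop = ["notes", "temp_1", "temp_2"]

ops = column_removal_ops(headers, to_drop)
print(json.dumps(ops, indent=2))
```

The printed JSON can be pasted into Undo / Redo -> Apply... to remove all listed columns in one step.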

Related

SSIS: I need to apply fuzzy lookup on multiple table columns

I need to apply fuzzy lookup on multiple table columns. For example, I have table A, which contains 4 columns (50% matched data) that look up 4 different tables containing 100% matched data. I want to apply fuzzy lookup on the 4 data sets so that they are matched against the 4 different tables and give me correct data for table A. How can I do this?
In Edit Queries, go to Merge Queries > Merge as New, check the "Use fuzzy matching" option for the merge in the pane (there are also some fuzzy merge options here, for example the match percentage) and hit OK.
If you have more tables to match with, just repeat the first step on the newly created table.
You can also pass a transformation table where you can specify some matching criteria.
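To illustrate the idea of matching each column against its own reference table with a similarity threshold, here is a rough sketch in Python using the standard library's difflib. This is not how the fuzzy merge above is implemented; the table contents, column names and the 0.8 threshold are all made up for the example.

```python
from difflib import get_close_matches

def fuzzy_lookup(value, reference_values, threshold=0.8):
    """Return the closest reference value at or above the threshold, else None."""
    matches = get_close_matches(value, reference_values, n=1, cutoff=threshold)
    return matches[0] if matches else None

# Hypothetical table A with imperfect values in each column.
table_a = [{"name": "Jon Smith", "city": "Londn"}]

# One hypothetical reference list per column (the "100% matched" tables).
reference = {
    "name": ["John Smith", "Jane Doe"],
    "city": ["London", "Paris"],
}

# Look each column up in its own reference list, one merge per column.
cleaned = [
    {col: fuzzy_lookup(val, reference[col]) for col, val in row.items()}
    for row in table_a
]
print(cleaned)
```

Values with no sufficiently close match come back as None, which makes unmatched cells easy to review afterwards.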

Two-way table in KNIME?

I am new to KNIME and I have a question. I have a Column Splitter node that outputs one column and one row, so there is naturally a single value in the cell. I want to feed this value into a column of a table in KNIME. How do I do this?
I don't see two-way tables in KNIME.
You can use the Cross Joiner node to append the constant column. (There is also the Table Row to Variable and Constant Value Column combination if the constant value is one of the primitive types (String, Double, Int).)
You may need the RowID node to replace/restore the row ids.

Is it possible with HTML to specify a table/grid organized by columns?

I have a table/grid with two columns and I want to be able to add/remove rows from each column. Currently I'm using a regular table, and if I want to add a cell to the left column, I have to go through each of the rows and move the data from all the other cells in that column down into the subsequent rows.
I was wondering if there is a way, either with tables or CSS tables, to organize the data by column and then just add/remove rows/divs in the appropriate column grouping without having to deal with visual rows.

Add a variable number of empty columns to a tablix

I have a tablix with a column group so that it will create a column for that field provided any row has data for that column.
I need to create a version of this report that contains some empty columns on the end.
I want the number of columns to be added to be based on some factor of the non empty columns with some min/max constraints also. (my question is not how to get the number of columns required)
So far I've tried:
1 - Adding individual empty columns to the tablix and setting the visibility condition on each column.
A bit long-winded and a bit of a faff.
2 - Creating another column group and grouping on the same field, this creates
can't vary the number of columns returned.
Am I missing a simple way of adding x empty rows or columns to a tablix, where x can be calculated somehow from the values in the dataset?
You will have to fib this scenario in your data. The column-wise grouping works with known data including those falling in the column group value with null values. There is no way to grow your groups without data unless you add some column group footer logic that would be pretty weird.
I would look into producing phantom NULL value records that will push out your columns.
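As a rough illustration of what the phantom records would look like, here is a small Python sketch. In practice this padding would more likely be done in the dataset query itself; the field names row, col and value and the blank_ labels are hypothetical.

```python
def pad_with_phantom_columns(records, extra_columns, prefix="blank_"):
    """Append one NULL-valued record per extra column so the column group grows."""
    phantom = [
        # Each phantom record carries a new column-group value and no data.
        {"row": None, "col": f"{prefix}{i}", "value": None}
        for i in range(1, extra_columns + 1)
    ]
    return list(records) + phantom

# Hypothetical dataset rows feeding the tablix.
records = [
    {"row": "r1", "col": "Jan", "value": 10},
    {"row": "r1", "col": "Feb", "value": 12},
]

padded = pad_with_phantom_columns(records, extra_columns=2)
for record in padded:
    print(record)
```

Since the column group is driven by distinct col values, each phantom record pushes out one more column, and the count can be computed from the real data before padding.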

SQL : should I insert another column or parse every single row

Assume millions of lines of traffic data in SQL format.
From the column URL and for each row of given range, I want to get a substring text that matches the target tag.
For example, from the column URL, I have the following texts:
Column: `URL`
Row 1: http://www.google.com/abcdeft?&QQ=123&AA=america&YY=111
Row 2: http://www.google.com/abcdeft?&QQ=123&AA=asia&YY=111
Row 3: http://www.google.com/abcdeft?&QQ=123&AA=africa&YY=111
Row 4: http://www.google.com/abcdeft?&QQ=123&AA=south&YY=111
Row 5: http://www.google.com/abcdeft?&QQ=123&AA=south&YY=111
Row 6: http://www.google.com/abcdeft?&QQ=123&AA=&YY=111
Row 7: http://www.google.com/abcdeft?&QQ=123
...
Row 99999999: http://www.google.com/abcdeft?&QQ=123&AA=ddd&YY=111
Data keep being loaded with lots of updates. So performance does matter. My goal is to:
Identify each row with its unique key-tag &AA=. Basically I need to get the string in the tag &AA= from every single row. For example, I want africa from ~~&AA=africa&~~. None if there is no &AA= but still need to read every single row.
Identify duplicate rows that contain the same tag in &AA=. e.g. row 4 and 5 are duplicates because they have same AA tags of south.
Question: which would be the best way for future data processing?
Option 1. Without a new column
Read every single row in URL column
Parse each row for the tag &AA= using urlparse library
Need a separate script to find duplicate rows with the same AA tag. e.g. using Python, I need to make a list of all items(all tags) and find the duplicate items in the list.
Need a separate query to find the rows that contain duplicate tags. e.g. query the rows that contain the duplicate items in the column URL
Creating separate column specifically for this task seems relatively doable.
Option 2. Insert another new column AA for tag &AA= and start filling out the new column when updating traffic data.
In this way:
No need to Read the column URL
No need to Parse the text in URL to get the tag &AA=
No need to Find duplicate items from one query
No need to retrieve rows with duplicate items from another query
In this way, we can easily:
Get &AA= data just selecting the column AA
SELECT duplicate rows using COUNT function in SQL
Which one would perform better?
If you can stand the extra space cost of an additional column, then that would be the optimal approach. If there are a lot of duplicates of AA, you might consider putting those values in another table and joining to it for queries. That would cut down on the space cost and still give you all the flexibility. It would be even easier (and faster to query) if you were querying on an ID instead of the textual value of AA.
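A minimal sketch of the extra-column approach, assuming Python with the standard library's urllib.parse and an in-memory SQLite database standing in for the real one. The table name traffic, the column name aa and the index name are made up for the example; the URLs are the sample rows from the question.

```python
import sqlite3
from urllib.parse import urlsplit, parse_qs

def extract_aa(url):
    """Return the value of the AA query parameter, or None if missing or empty."""
    values = parse_qs(urlsplit(url).query).get("AA")
    return values[0] if values else None

urls = [
    "http://www.google.com/abcdeft?&QQ=123&AA=america&YY=111",
    "http://www.google.com/abcdeft?&QQ=123&AA=asia&YY=111",
    "http://www.google.com/abcdeft?&QQ=123&AA=africa&YY=111",
    "http://www.google.com/abcdeft?&QQ=123&AA=south&YY=111",
    "http://www.google.com/abcdeft?&QQ=123&AA=south&YY=111",
    "http://www.google.com/abcdeft?&QQ=123&AA=&YY=111",
    "http://www.google.com/abcdeft?&QQ=123",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (id INTEGER PRIMARY KEY, url TEXT, aa TEXT)")

# Fill the AA column once, at load time, instead of parsing on every query.
conn.executemany(
    "INSERT INTO traffic (url, aa) VALUES (?, ?)",
    [(u, extract_aa(u)) for u in urls],
)

# An index on the new column keeps duplicate lookups cheap as the table grows.
conn.execute("CREATE INDEX idx_traffic_aa ON traffic (aa)")

# Find AA values that occur in more than one row.
dupes = conn.execute(
    "SELECT aa, COUNT(*) FROM traffic "
    "WHERE aa IS NOT NULL GROUP BY aa HAVING COUNT(*) > 1"
).fetchall()
print(dupes)
```

Rows with no &AA= tag or an empty one are stored as NULL here, so they are still loaded but do not show up as duplicates of each other.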