I was trying to develop U-SQL user-defined operators (UDOs) using this link. It looks like a UDO can read one row, process it, and write it out as a single row.
In my scenario I have to read multiple consecutive rows and write multiple consecutive rows, and that does not seem possible using the help provided in the blog.
In another scenario, I have to process a single row, break it into multiple rows, and then write them to the output.
I am wondering if it is possible to process multiple rows using U-SQL UDO or if there is any other way to do it in U-SQL?
You can write a custom applier to take a single row and return several rows. You invoke it with CROSS APPLY.
You can write a custom reducer (or a user-defined aggregator) to take several rows (cells) and return a single row (cell).
What are you trying to do by reading several rows, seeing them all, and then returning several rows? Would that be similar to a self-join (you could use a combiner)?
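To make the two shapes concrete, here is a minimal sketch in plain Python (not U-SQL or its C# UDO interfaces). The function names, the `tags` column and the sample rows are all made up for illustration; it only shows what "one row in, several rows out" and "several rows in, one row out" look like.

```python
from itertools import groupby
from operator import itemgetter

def apply_split(row):
    """Applier-style: one input row in, several output rows out."""
    for tag in row["tags"].split(";"):
        yield {"id": row["id"], "tag": tag}

def reduce_group(key, group_rows):
    """Reducer-style: several input rows (one group) in, one output row out."""
    return {"id": key, "tag_count": len(list(group_rows))}

# Hypothetical input rows
rows = [{"id": 1, "tags": "a;b;c"}, {"id": 2, "tags": "d"}]

# One-to-many step (what CROSS APPLY with an applier would give you)
exploded = [out for row in rows for out in apply_split(row)]

# Many-to-one step: group by key, then reduce each group to a single row
by_id = sorted(exploded, key=itemgetter("id"))
reduced = [reduce_group(key, grp) for key, grp in groupby(by_id, key=itemgetter("id"))]
```

Here `exploded` holds four rows (three for id 1, one for id 2), and `reduced` collapses them back to one row per id.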
Related
I have a huge SQL query with multiple sub-tables (sub-table output is not retained) that is generating a final output in a specific format (format is fixed as it is expected by another service). I want to also generate some counts of the sub-tables in the query. A simple solution would be to re-write and run the query again, and get the counts.
However, I want to avoid double executions/computations of the same things and don't want to store the sub-tables either. I want to be able to either append these counts to the data output (will need to modify the service to ignore these count values accordingly) OR be able to write these counts to another location.
I'm using the 'unload to S3' command (https://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html), where all the results are stored in an S3 location.
Is it possible to achieve this? If so, how?
I have a table that contains data from repeated experiments (for example, site A has one sample, and the lab processed the sample three times, obtaining slightly different values). I need to average these results in a separate table, but from what I have read on the Microsoft support site, a query that pulls data into another table with a calculated field is not possible in Access.
Can I query multiple data points from one table into a single calculated field in another table? Thank you.
UPDATE
I ended up doing a lot of manual adjustments of the file format to create a calculated field in the existing table that averages each site's data, so my problem is, for my current purposes, solved. However, I would still like to understand. Following up with you both, I think the problem was that I had repeated non-unique IDs between rows when I probably should have made data columns with unique variable names so that I could query each variable name for an average.
So, instead of putting each site separately on the y axis, I formatted it by putting the sample number for each site on the x-axis:
I was able to at least create a calculated field using this second format in order to create an average value for each site.
Would there have been a way to write a query using the first method? Luckily, my data set was not very hefty, so I could handle a reformat manually, but with thousands of data entries I couldn't have done that.
Also, here is the link to the site I mentioned originally https://support.office.com/en-ie/article/add-a-calculated-field-to-a-table-14a60733-2580-48c2-b402-6de54fafbde3.
Thanks all.
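For what it's worth, the first layout (one row per site per repeat) is the kind of shape a grouped average handles directly. A minimal sketch, using SQLite from Python rather than Access, with a made-up `measurements` table and made-up values; an Access totals query with GROUP BY and Avg follows the same idea, but I have not tested this against Access itself.

```python
import sqlite3

# In-memory database; table name, column names and values are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (site TEXT, run INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("A", 1, 10.0), ("A", 2, 11.0), ("A", 3, 12.0), ("B", 1, 20.0), ("B", 2, 22.0)],
)

# One output row per site, averaging over all repeated runs for that site
averages = conn.execute(
    "SELECT site, AVG(value) FROM measurements GROUP BY site ORDER BY site"
).fetchall()
conn.close()
```

The repeated site IDs are not a problem here: they are exactly what the grouping works on, so no manual reshaping into one column per repeat is needed.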
I have a grid view where I need to show data of more than a table, each record in a row (Not relations).
Thus, one group of grid rows may come from table one, another group of rows from table two, etc.
For example, I need to select rows r1 to r10 from tbl1 and rows r11 to r20 from tbl2. Rows r1 to r10 and r11 to r20 may share a lot of ids (because they come from different tables). I want to show all these records in a single grid view, with search and actions enabled.
I have made an attempt to get the data in an arrayDataProvider, and it worked perfectly.
The problems I am trying to fix are two:
1. Enabling the searchModel in the grid. (For that, I have also got all the data in the search model as an arrayDataProvider, but I still need to enable search.)
2. I need to know which record is selected (for view, update, or delete) and take action based on the selection, because the same id may exist in the grid multiple times, each from a different table.
To Enable Search:
1. I have used all search models to return arrays based on filtering queries.
2. I have used a basic search model that includes common attributes between all tables, it calls functions from other search models to get array from them, then it concatenates all these arrays and returns them as an array data provider.
3. Dealing with parameters in the search models needed some attention: they use the same model as the basic one, but they have more fields.
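The idea behind the steps above, sketched in plain Python rather than in the framework's own search models. All names here (`tag_rows`, `search`, `row_key`, the sample tables) are hypothetical; the point is only that each row is tagged with its source table so that a repeated id stays unambiguous, and that the filter runs over the common attributes of the concatenated array.

```python
def tag_rows(source, rows):
    """Attach the source table and a composite key to each row."""
    return [{**row, "source": source, "row_key": f"{source}:{row['id']}"} for row in rows]

def search(all_rows, **filters):
    """Keep rows whose attributes contain every non-empty filter value (case-insensitive)."""
    active = {k: str(v).lower() for k, v in filters.items() if v not in (None, "")}
    return [
        row for row in all_rows
        if all(needle in str(row.get(attr, "")).lower() for attr, needle in active.items())
    ]

# Hypothetical rows from two tables that share id 1
tbl1 = [{"id": 1, "name": "Alpha"}, {"id": 2, "name": "Beta"}]
tbl2 = [{"id": 1, "name": "Alphabet"}, {"id": 3, "name": "Gamma"}]

merged = tag_rows("tbl1", tbl1) + tag_rows("tbl2", tbl2)
hits = search(merged, name="alpha")
```

With the composite `row_key` (for example `tbl2:1`), a view, update or delete action can tell which table the selected row came from even when the plain id is duplicated.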
When I have dealt with ids and actions on rows, I will post the method.
If anyone has the same issue and needs help, I will be glad to help :-)
I have a job in Talend that is designed to bring together some data from different databases: one is a MySQL database and the other an MSSQL database.
What I want to do is match a selection of loan numbers from the MySQL database (about 82,000 loan numbers) to the corresponding information we have housed in the MSSQL database.
However, the tables in MSSQL to which I am joining the data from MySQL are much larger (~ 2 million rows), are quite wide, and thus cost much more time to query. Ideally I could perform an inner join between the two tables based on the loan number, but since they are in different databases this is not possible. The inner join that is performed inside a tMap occurs after the Lookup input has already returned its data set, which is quite large (especially since this particular MSSQL query will execute a user-defined function for each loan number).
Is there any way to create a global variable out of the output from the MySQL query (namely, the loan numbers selected by the MySQL query) and use that global variable as an IN clause in the MSSQL query?
This should be possible. I'm not working in MySQL but I have something roughly equivalent here that I think you should be able to adapt to your needs.
I've never actually answered a Stackoverflow question and while I was typing this the page started telling me I need at least 10 reputation to post more than 2 pictures/links here and I think I need 4 pics, so I'm just going to write it out in words here and post the whole thing complete with illustrations on my blog in case you need more info (quite likely, I should think!)
As you can see, I've got some data coming out of the table and getting filtered by tFilterRow_1 to only show the rows I'm interested in.
The next step is to limit it to just the field I want to use in the variable. I've used tMap_3 rather than a tFilterColumns because the field I'm using is a string and I wanted to be able to concatenate single quotes around it, but if you're using an integer you might not need to do that. And of course, if you have a lot of repeated values, you might also want to get a tUniqueRows in there as well to avoid passing duplicates along.
The next step is the one that does the magic. I've got a list like this:
'A1'
'A2'
'B1'
'B2'
etc, and I want to turn it into 'A1','A2','B1','B2' so I can slot it into my where clause. For this, I've used tAggregateRow_1, selecting "list" as the aggregate function to use.
Next up, we want to take this list and put it into a context variable (I've already created the context variable in the metadata - you know how to do that, right?). Use another tMap component, feeding into a tContextLoad widget. tContextLoad always has two columns in its schema, so map the output of the tAggregateRow to the "value" column and enter the name of the variable in the "key". In this example, my context variable is called MyList.
Now your list is loaded as a text string and stored in the context variable, ready for retrieval. So open up a new input and embed the variable in the SQL code like this:
"SELECT distinct MY_COLUMN
from MY_SECOND_TABLE where the_selected_row in ("+
context.MyList+")"
It should be as easy as that, and when I whipped it up it worked first time, but let me know if you have any trouble and I'll see what I can do.
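In case it helps to see the whole flow in one place, here is what the job does to the values, sketched in plain Python rather than Talend components. `build_in_list` is a made-up helper, and doubling embedded single quotes is my own addition, not something the job above does; the table and column names are the placeholders from the query above.

```python
def build_in_list(values):
    """Deduplicate, wrap each value in single quotes, and join with commas."""
    unique = list(dict.fromkeys(values))  # drops repeats, keeps first-seen order
    quoted = ["'" + v.replace("'", "''") + "'" for v in unique]  # escape embedded quotes
    return ",".join(quoted)

# Filtered values coming out of the first query (sample data)
my_list = build_in_list(["A1", "A2", "B1", "A2", "B2"])

# Same shape as the query string built from context.MyList
query = (
    "SELECT DISTINCT MY_COLUMN "
    "FROM MY_SECOND_TABLE WHERE the_selected_row IN (" + my_list + ")"
)
```

One thing to watch with a very large selection: the list is pasted into the SQL text as one long string, so if the database complains about the statement size it may be worth splitting the values into several smaller batches.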
I am trying to use the Pentaho Kettle software for a few transformations on my large tables. I want to perform an operation that displays the contents of alternate rows in two different tables, and then I wish to join the two tables later for further transformation.
The scripting option in the tool helps me with executing SQL scripts for a single row or multiple rows.
Can anyone help me with how to select the rows for this purpose?
It's not very clear what you're trying to achieve, but the individual elements are pretty straightforward when you break them down into discrete steps.
I would use the following steps:
Table Input - allows you to make a query to a database connection with a SQL statement.
Filter Rows - allows you to split a row of data to two separate paths based on selected criteria in the data row.
You can achieve a union of two or more separate paths by connecting them to any step type; that is, a step will process each data row from any number of input paths separately and send it down the output path. In effect, all steps perform union operations.
One critical principle to keep in mind when using Pentaho Kettle is never assume that operations happen sequentially (i.e. process 1st row, then 2nd, then 3rd, etc.). Operations happen in parallel by data row; so the 1st row can be sent down the path after the 2nd row.
Hope that helps...
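If "alternate rows" means odd rows down one path and even rows down the other, the logic could look roughly like this. It is a plain Python sketch, not Kettle configuration, and the sample rows are made up. The explicit sequence number is the important part: since rows can arrive out of order, the position has to travel with the row instead of being implied by arrival order.

```python
rows = ["r1", "r2", "r3", "r4", "r5"]  # hypothetical input rows

# Attach an explicit sequence number to every row before splitting
numbered = list(enumerate(rows, start=1))

# Split into two paths on the parity of the sequence number (a filter condition)
odd_path = [(seq, row) for seq, row in numbered if seq % 2 == 1]
even_path = [(seq, row) for seq, row in numbered if seq % 2 == 0]

# Paths may be merged in any order; sorting on the sequence number restores it
merged = sorted(even_path + odd_path)
```

In a transformation, the sequence number would need to be added as a field before the Filter Rows step, and the later join or sort would use that field rather than relying on row order.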