I want to output multiple files using U-SQL according to the number of rows

I want to output a single table into multiple files in U-SQL according to the number of rows.
If my table has 500 rows, I have to generate 5 files with 100 rows in each file.
I followed this post: U-SQL Output in Azure Data Lake

In order to generate separate files based on the number of rows, you would have to add a ROW_NUMBER() to each row. Then generate a script (for example with U-SQL; see U-SQL Output in Azure Data Lake as an example) that creates an OUTPUT statement for each of the row regions. Note that the script-generation step probably uses an inner join with a SELECT COUNT(*) FROM #data to generate the right number of OUTPUT statements. Also, you want the first statement in the generated script to be the one that adds the ROW_NUMBER() to the rowset that you then output.
Once you have generated the script that does that, you can download it and submit it.
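The script-generation step described above can be sketched in Python. This is a minimal sketch, not the original post's code: the rowset name (@numbered), the row-number column (rn), and the output path are hypothetical placeholders to adapt to your own script.

```python
# Sketch: generate one U-SQL OUTPUT statement per fixed-size row region.
# @numbered is assumed to be the rowset that already has a ROW_NUMBER()
# column named rn; the /output/ path is a placeholder.

def generate_output_statements(total_rows: int, rows_per_file: int) -> str:
    """Build one OUTPUT statement per ROW_NUMBER() region."""
    statements = []
    file_index = 1
    for start in range(1, total_rows + 1, rows_per_file):
        end = min(start + rows_per_file - 1, total_rows)
        statements.append(
            f"OUTPUT (SELECT * FROM @numbered WHERE rn BETWEEN {start} AND {end})\n"
            f"TO \"/output/part_{file_index}.csv\"\n"
            f"USING Outputters.Csv();"
        )
        file_index += 1
    return "\n\n".join(statements)

script = generate_output_statements(500, 100)
print(script)  # 5 OUTPUT statements for 500 rows at 100 rows per file
```

The SELECT COUNT(*) mentioned in the answer supplies total_rows, from which the number of regions (and hence OUTPUT statements) follows.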


Azure Data Factory For Each Avoid Array

I'm quite new to ADF, so here's my challenge.
I have a pipeline that consists of a Lookup activity and a ForEach activity with a Copy activity inside it.
When I run this pipeline, the first output of the Lookup activity looks like this:
The output contains 11 different values. From my perspective, I only see 11 records that need to be copied to my sink, which is an Azure SQL DB.
The input of the ForEach activity looks like this:
While running, the pipeline copies 11 times, and my SQL database now has 121 records. This amount is 11 rows multiplied by 11 iterations. This is not the output I expected.
I only expect 11 rows in my sink table. How can I change this pipeline to achieve the expected outcome of only 11 rows?
Many thanks!
In order to copy data, the Lookup activity and the Copy activity's source should not be given the same configuration. If they are, duplicate rows will be copied.
I tried to repro the same in my environment.
If there are 3 records in the source data, 3 × 3 = 9 records will be copied.
To avoid the duplicates, we can use only the Copy activity to copy data from source to sink.
Only 3 records end up in the target table.
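The multiplication effect can be illustrated outside ADF with a small sketch, assuming (as in the question) a source of 11 rows: the ForEach iterates once per Lookup row, and a Copy activity whose source is the same rowset re-reads all rows on every iteration.

```python
# Sketch: why a Copy activity inside a ForEach, with its source set to
# the same rowset the Lookup returned, multiplies the output rows.

source_rows = [f"row{i}" for i in range(1, 12)]  # 11 rows, as in the question

# Misconfigured: the ForEach runs once per Lookup row, and each
# iteration copies the entire source again.
sink_wrong = []
for _ in source_rows:               # 11 ForEach iterations
    sink_wrong.extend(source_rows)  # Copy re-reads all 11 rows each time

# Correct: a single Copy activity moves the source once.
sink_right = list(source_rows)

print(len(sink_wrong), len(sink_right))  # 121 vs 11
```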

Is it possible to print both the data and count of the data in SQL query?

I have a huge SQL query with multiple sub-tables (the sub-table output is not retained) that generates a final output in a specific format (the format is fixed, as it is expected by another service). I also want to generate some counts of the sub-tables in the query. A simple solution would be to rewrite and run the query again to get the counts.
However, I want to avoid executing/computing the same things twice, and I don't want to store the sub-tables either. I want to either append these counts to the data output (I will need to modify the service to ignore these count values accordingly) or write these counts to another location.
I'm using the 'UNLOAD to S3' command (https://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html), which stores all the results in an S3 location.
Is it possible to achieve this? If so, how?

How to insert data into one table which is coming from 2 different csv files using conditional split transformation?

I have 3 CSV files: 1 teacher file and 2 student files. I have to insert the teacher data into one table, and the data of students who got more than 50 marks into another table from the 2 student CSV files. Please explain how to use the Conditional Split transformation on those 2 student files to put the data into one table.
Are you sure you want to use the Conditional Split? You need to combine the student flat files into one table, right? If so, what you want to use is a Merge Join transformation.
You can read more about how to use the Merge Join here.
Not sure if I have understood the question correctly. My assumptions:
Teacher data is moved from CSV to table 1 with no conditions.
Student files (CSV) contain only unique records.
Records where the student achieved a score greater than or equal to 50 are inserted into table 2.
If the above assumptions are correct, the simplest way will be to use a loop container to loop through the student files, with one workflow that does the following:
Reads the student file
Passes the file to the Conditional Split
Writes to the destination table
Conditional split task allows one to configure the conditions and outputs on those conditions.
If the file contains a column called StudentScore, then in the Conditional Split the first condition should be set as in the attached screenshot. Please note that because StudentScore is set to a string in the source file, it has to be converted to an integer, hence the (DT_I4) cast; if it is set to an integer in the source file, this conversion is redundant.
I have also given the output a name, StudentScore; this output will then be linked to the destination. I hope this helps.
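The split condition itself can be sketched outside SSIS as a plain filter. This is only an illustration of the logic, assuming the StudentScore column arrives as a string (hence the cast, which mirrors the (DT_I4) conversion); the column and row values are made up.

```python
# Sketch of the Conditional Split logic: cast the string StudentScore
# to int, then route rows meeting the condition to the "pass" output.

rows = [
    {"StudentName": "A", "StudentScore": "72"},
    {"StudentName": "B", "StudentScore": "41"},
    {"StudentName": "C", "StudentScore": "50"},
]

# Condition: (DT_I4)StudentScore >= 50  ->  rows sent to the destination table
passed = [r for r in rows if int(r["StudentScore"]) >= 50]

print([r["StudentName"] for r in passed])  # ['A', 'C']
```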

Split the Table into multiple excel files using ssis

There is a table with 5000 records. I need to split it into 10 Excel files with names like
Jan_Dept_Records.xlsx, Feb_Deptname_Records.xlsx, etc. How can I achieve this with SSIS?
Here the "Dept" part of the Excel file name would come from the source table's dept column.
I understand that a Foreach Loop container is used, with a Data Flow Task inside it.
You should use a Conditional Split, and in it you can write the cases for the number of records and then pass them to your Excel destinations; just replace the derived columns with the sample Excel destination. Insert an identity column; on the basis of that, you can differentiate the records:
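The chunking and file-naming logic can be sketched in Python. This is a minimal sketch, not SSIS itself: the row data, the fixed month list, and the 500-row chunk size (5000 rows / 10 files) are assumptions, and .csv is used in place of .xlsx for illustration.

```python
# Sketch: split 5000 rows into 10 chunks of 500 and derive each file
# name from the month and the dept column, per the question's pattern.

def chunk(rows, size):
    """Yield consecutive slices of `rows` of length `size`."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

rows = [{"dept": "Sales", "value": n} for n in range(5000)]  # stand-in data

files = []
for idx, part in enumerate(chunk(rows, 500)):
    dept = part[0]["dept"]  # "Dept" part of the name comes from the table
    files.append((f"{months[idx]}_{dept}_Records.csv", part))

print([name for name, _ in files])  # Jan_..., Feb_..., up to Oct_...
```

In SSIS terms, the loop body corresponds to the Foreach Loop's Data Flow Task, and the identity-column ranges from the answer play the role of the chunk boundaries.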

MySQL Select Query using criteria from external file?

I'm not sure if what I'm trying to do is possible at all. I'd like to be able to run a query in MySQL that does a simple SELECT, but takes the search/select criteria from an external file.
e.g.
SELECT * FROM lookup_table WHERE id IN (<import each individual id from external file>);
The IDs will be placed in a file. The reason I'm asking is that I can't create temporary tables in the database, but I can execute queries. The external file can be in any format required, but it will contain a few hundred IDs that I need to look up.
Is anything like this possible?
You could place all the IDs in a list in a .txt file and then use file_get_contents to get the contents. Once you've done that, you can explode the contents by newline (or whatever you separated the values by).
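The same idea in Python rather than PHP: read the IDs from a text file (one per line) and build a parameterized IN (...) query. This is a sketch under assumptions: the file contents are inlined here, and %s placeholders are the style used by common MySQL drivers such as mysql-connector-python.

```python
# Sketch: read newline-separated IDs and build a parameterized IN query.

ids_text = "101\n102\n103\n"  # stands in for open("ids.txt").read()
ids = [line.strip() for line in ids_text.splitlines() if line.strip()]

placeholders = ", ".join(["%s"] * len(ids))  # one placeholder per ID
query = f"SELECT * FROM lookup_table WHERE id IN ({placeholders})"

print(query)  # execute with: cursor.execute(query, ids)
```

Using placeholders rather than splicing the IDs into the SQL string keeps the query safe even if the file contents are not trusted.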