Using ColdFusion, I am reading a CSV file and then compiling multiple .txt files from it, depending on the value of each row in the CSV file.
I would like to do a SELECT DISTINCT on the CSV for the Bank Name column, if that is possible; I want to get the distinct values in that column within the CSV. I then also want to count how many rows there are for each distinct value (how many times it appears in the CSV file). Finally, I want to get the SUM of the Amount column for each distinct Bank Name.
I am not really sure how to go about this and would appreciate any input and thank you in advance!
Read your CSV file using cfhttp. The name attribute creates a query variable from the file contents, which enables you to use a query of queries. Details are in the documentation for the cfhttp tag.
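For example, a minimal sketch of that approach (the URL, variable names, and column names below are placeholders, and the CAST assumes a CF version whose query of queries supports it, since cfhttp returns every column as text):

    <!--- Pull the CSV over HTTP; name="bankData" turns the response into a query
          object, and firstrowasheaders="yes" uses row 1 as the column names. --->
    <cfhttp url="http://example.com/banks.csv"
            method="get"
            name="bankData"
            delimiter=","
            textqualifier=""""
            firstrowasheaders="yes">

    <!--- Query of Queries: one row per distinct bank name, with a row count
          and the summed Amount column. --->
    <cfquery name="bankSummary" dbtype="query">
        SELECT  Bank_Name,
                COUNT(*)                      AS BankRows,
                SUM(CAST(Amount AS DECIMAL))  AS TotalAmount
        FROM    bankData
        GROUP BY Bank_Name
    </cfquery>

    <cfoutput query="bankSummary">
        #Bank_Name#: #BankRows# rows, total #TotalAmount#<br>
    </cfoutput>

If the header in the file is literally "Bank Name" with a space, check what column name cfhttp actually gives you (or supply your own names via the columns attribute) before writing the Q of Q.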
You could try using a datasource proxy with a text driver, as described in this post on CSV files and ColdFusion. Because of the 64-bit and 32-bit ODBC issues, you may also need to refer to this post to get such a DSN installed on a modern CF installation. Note that the second post will work through CF 9, but I've not tested the technique on CF 10 or CF 11 (it's a pretty old technique).
I'm not recommending either approach, but assuming you could get it working, it would give you an easy way to use Q of a Q and get distinct values. I'm not sure if either one of them is any better than Ben's way of doing it. However, you can borrow his CFC and simply pass in your columns and data. I'm not sure I understand how that is more work than writing filtering code.
I am looking to refresh a dataset in QuickSight; it is in SPICE. The dataset comes from a CSV file that has been updated and now has more data than the original file I uploaded.
I can't seem to find a way to simply repoint to the same file with the same format. I know how to replace the file, but whenever I do this it states that it can't create some of my calculated fields and so drops multiple rows of data!
I assume I'm missing something obvious, but I can't seem to find the right method or any help on the issue.
Thanks
Unfortunately, QuickSight doesn't support refreshing file data-sets to my knowledge. One solution, however, is to put your CSV in S3 and refresh from there.
The one gotcha with this approach is that you'll need to create a manifest file pointing to your CSV. This isn't too difficult and the QuickSight documentation is pretty helpful.
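For reference, a minimal manifest can be as small as this (the bucket name, key, and settings below are placeholders to adjust for your file):

    {
      "fileLocations": [
        {
          "URIs": [
            "https://s3.amazonaws.com/your-bucket-name/path/to/data.csv"
          ]
        }
      ],
      "globalUploadSettings": {
        "format": "CSV",
        "delimiter": ",",
        "containsHeader": "true"
      }
    }

You then point the new QuickSight dataset at the manifest file rather than at the CSV directly, and refresh the SPICE dataset whenever the CSV in S3 changes.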
You can replace the data source by going into the Analysis and clicking on the pencil icon, as highlighted in Step 1. By replacing the dataset this way, you will not lose any new calculated fields that were already created on the old dataset.
If you try to replace the data source by going into Datasets, as highlighted below, you'll lose all calculated fields, modifications, etc.
I don't know when this was introduced, but you can now do this exact thing through "Edit Dataset", starting either from the Datasets page or from the pencil icon -> Edit dataset inside an Analysis. It is called "Update file" and will result in an updated dataset (additional or different data) without losing anything from your analysis, including calculated fields, filters, etc.
The normal caveat applies: the newer uploaded file MUST contain the same column names and data types as the original, although it can also contain additional columns if needed.
I am doing a project to generate data extracts on a daily basis. I have ten different queries with different columns, and the number of columns also differs between them. The database is MS SQL Server 2008 R2, and I tried an SSIS package to accomplish this. I used a data source component, then a Sort, and fed the result of the Sort into a Merge and then to a text file. But I am getting an error when combining the results, saying the columns are different or something similar. Can anyone suggest a solution, or is there any other way to accomplish this?
Thanks,
Sivajith
Can you please provide the error message? The Merge component can merge data flows with different numbers of columns by letting you select which input columns to map.
First, create a template .csv file which contains all the columns from the queries (i.e. if you have columns A, B, C in the first query; B, E, F in the second; B, X, Y in the third; and so on, make sure your template file has A, B, C, E, F, X, Y).
Make 10 data flow tasks (one for each query). As a source, use a SQL command and write your query. As a destination, use the template file created above. Make sure you uncheck "Overwrite data".
Use the same destination for all the queries.
This should do the trick. I am not sure that I completely understood your question, since it's a little bit vague.
Here is a reference that may help you a bit more:
SQL Server : export query as a .txt file
You will have to make sure you have a proper connection to the SQL Server instance and then run this as a PowerShell script or a .bat file. This can be scheduled to run daily as well.
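As a rough sketch of that scheduled-export idea (the server, database, query, and output path are placeholders, and it assumes sqlcmd is installed on the machine running the job):

    @echo off
    rem One sqlcmd call per extract; repeat (or loop) for each of the ten queries.
    rem -E uses Windows authentication, -s sets the column separator,
    rem -W trims trailing spaces, -h -1 suppresses the header row.
    sqlcmd -S MyServer -d MyDatabase -E ^
           -Q "SELECT ColA, ColB, ColC FROM dbo.SourceTable1" ^
           -o "C:\Extracts\extract1.txt" -s "," -W -h -1

Scheduling the .bat file through Windows Task Scheduler then gives you the daily run.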
I wrote a VBA script that runs in an Access database. The script looks up values on various tables and assigns an attribute to a main table based on the combination of values.
The script works as intended, however, I am working with millions of records so it takes an unacceptably long time.
I would like to break the process up into smaller parts and run the script concurrently on separate threads.
Before I start attempting to build a solution, I would like to know:
Based on your experience, would this increase performance? Or would the process take just as long?
I am looking at using PowerShell or VBScript to accomplish this. Any obstacles to look out for?
Please note: due to the client this will run on, I have to use Access for the back end, and if I use PowerShell it will have to be version 1.0.
I know these are very vague questions but any feedback based on prior experience is appreciated. Thanks
Just wanted to post back with my final solution on this...
I tried the following ways to assign an attribute to a main table based on a combination of values from other tables for a 60,000 record sample size:
Solution 1: Used a combination of SQL queries and FSO Dictionary objects to assign attribute
Result: 60+ minutes to update 60,000 records
Solution 2: Ran script from Solution 1 concurrently from 3 separate instances of Excel
Result: CPU was maxed out (Instance 1 - 50% of CPU, Instances 2 and 3 - 25% each); stopped the code after an hour since it wasn't a viable solution
Solution 3: Tried using SQL UPDATE queries to update main table with the attribute
Result: This failed because apparently Access does not allow for a join on an UPDATE sub-query (or I just stink at writing SQL)
Solution 4 (Best Result): Selected all records from the main table that matched the criteria for each attribute, output those records to a CSV file, and assigned the attribute to all records in that file. This created a separate file for each attribute, all in the same format. I then imported and appended all of the records from the CSV files into a new main table.
Result: 2.5 minutes to update 60,000 records
Special thanks to Pynner and Remou who suggested writing the data out to csv.
I never would have thought that this would be the quickest way to update the records with the attribute. I probably would have scrapped the project thinking it was impossible to accomplish with Access and VBA had you not made this suggestion. Thank you so much for sharing your wisdom!
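For anyone wanting to try the same pattern, here is a rough VBA sketch of the export-then-reimport round trip (the saved query, file path, and table names below are placeholders, not the actual objects from my database):

    ' Rough sketch only: assumes a saved query named "qryMatchesAttributeA" that
    ' selects the records matching one attribute and outputs the attribute as a
    ' literal column, e.g.  SELECT M.*, "AttributeA" AS Attribute FROM tblMain AS M ...
    Public Sub ExportAndReimportAttributeA()
        ' 1) Export the matching records to CSV (True = include field names).
        DoCmd.TransferText acExportDelim, , "qryMatchesAttributeA", _
            "C:\Temp\AttributeA.csv", True

        ' 2) Append the CSV back into the new main table; TransferText appends
        '    when the target table already exists.
        DoCmd.TransferText acImportDelim, , "tblMainNew", _
            "C:\Temp\AttributeA.csv", True
    End Sub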
I have a feed (for employee details) in which one record looks like this:
101EnggAnal
The brief given to me is that the first 3 characters will be the employee ID, the next 4 will be the department, and the last 4 will be the designation. Can I read this using a Flat File Source? If yes, how? Do I have to write a Script Component as a source to get this done?
Unless I'm missing some nuance in your question, you are simply looking at a flat file connection manager with a format of Ragged right, or possibly Fixed width.
I reckon the easiest way is to read it as one column, and in a Data Flow task use Derived Columns on the Source to generate the 3 columns you want via expressions before using those as the columns for the Destination.
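For instance, with the whole record read into a single column (called Line here purely for illustration), the Derived Column expressions would be along these lines, since SUBSTRING in SSIS expressions is 1-based:

    EmployeeID  = SUBSTRING(Line, 1, 3)    (gives "101"  for the sample record)
    Department  = SUBSTRING(Line, 4, 4)    (gives "Engg")
    Designation = SUBSTRING(Line, 8, 4)    (gives "Anal")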
I am writing an SSIS package to import data from *.csv files into a SQL Server 2008 database. The problem is that one of the files contains duplicate records, and I want to extract only the distinct values from that source.
Unfortunately, the generated files are not under my control; they are owned by a third party, and I cannot change the way they are generated.
I did use the Lookup component, but it only checks the incoming data against the existing data; it does not check for duplicate records within the incoming data.
I believe the sort component gives an option to remove duplicate rows.
Depends on how serious you want to get about the duplicates. Do you need a record of what was duplicated, or is it enough to just get rid of them? The Sort component will get rid of dups on the sort field. However, the dups may have different data in the other fields, and then you want a different strategy. Usually I load everything to staging tables and clean up from there. I send the removed dupes to an exception table (we have to answer a lot of questions from our customers about why things don't match what they sent), and I often use a set of business rules (enforced with either an Execute SQL task or data flow tasks) to determine which one to pick if there are duplicates in one area but not another (say, two business addresses when we can only store one). I also make sure the client is aware of how we determine which of the two to pick.
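As a rough T-SQL illustration of that staging-table approach (table and column names are placeholders, with QualificationID + UnitID standing in for whatever defines a duplicate in your file):

    -- Assumes the CSV has been loaded as-is into a staging table.
    -- 1) Copy every row involved in a duplicate to an exception table
    --    so there is a record of what was removed.
    INSERT INTO dbo.StagingImport_Exceptions (QualificationID, UnitID, Amount)
    SELECT QualificationID, UnitID, Amount
    FROM (
        SELECT *,
               COUNT(*) OVER (PARTITION BY QualificationID, UnitID) AS DupCount
        FROM dbo.StagingImport
    ) AS t
    WHERE DupCount > 1;

    -- 2) Delete the extra copies, keeping one row per key.
    WITH Ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY QualificationID, UnitID
                                  ORDER BY (SELECT NULL)) AS rn
        FROM dbo.StagingImport
    )
    DELETE FROM Ranked
    WHERE rn > 1;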
Use the Sort tool from the Toolbox for that, then open it. You will see all available input columns.
Check the column, set the sort type direction, and then check "Remove rows with duplicate sort values".
Bring in the data from the csv file the way it is, then dedup it after it's loaded.
It'll be easier to debug, too.
I used an Aggregate component and grouped by both QualificationID and UnitID. If you want, you can use a Sort component instead. Perhaps my information might help others.