I have an Excel file with 900+ columns that I need to import into Access on a regular basis. Unfortunately I get the Excel file as is and can't change the data structure. The good news is I only need a few of those 900+ columns. Unfortunately MS Access can't work with files that have more than 255 columns.
So the idea is to import it as a CSV file, with all columns of each row in a single text field, and then use VBA in Access with Split to break it out again.
Question:
As I don't need all columns, I want to keep only some items. So as input I have a list of the column numbers I need to keep. The list is dynamic in the sense that it is user defined: there is a table with all the item numbers the user wants to have.
I can split the sourceTbl field relatively easily:
SELECT split(field1, vbTab) from sourceTbl
If I knew I always needed to extract certain columns, I could probably write something like
SELECT getItem(field1, vbTab, 1), getItem(field1, vbTab, 4), ...
where getItem would be a custom function returning item number i. The problem is that which (and how many) columns to retrieve is not static; I read that dynamically from another table that lists the item numbers to keep.
Sample Data:
sourceTbl: field1 = abc;def;rtz;jkl;wertz;hjk
columnsToKeep: 1,4,5
Should output: abc, jkl, wertz
The Excel files have around 20k rows each, about 100 MB of data per file, and there are about 5 files per import. Filtered down to the needed columns, all imported data is about 50 MB.
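A minimal VBA sketch of such a getItem function, assuming the one-text-field staging table described above (the function name matches the query sketch; everything else is illustrative):

```vba
' Returns the 1-based item at position itemNo from a delimited string,
' or an empty string if the position is out of range or the value is Null.
Public Function getItem(ByVal value As Variant, _
                        ByVal delimiter As String, _
                        ByVal itemNo As Long) As String
    Dim parts() As String
    If IsNull(value) Then Exit Function
    parts = Split(CStr(value), delimiter)
    If itemNo >= 1 And itemNo <= UBound(parts) + 1 Then
        getItem = parts(itemNo - 1)   ' Split is 0-based
    End If
End Function
```

The dynamic part can then be handled in VBA as well: loop over the columnsToKeep table, append one `getItem(field1, vbTab, n)` term per stored item number, and execute the assembled SQL string.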
I've got a dimension in my Tableau workbook called discount codes. This dimension holds 30,000 strings. I also have separate CSV files that hold hundreds of discount codes.
In Tableau I want to filter out the values from a single csv file.
I have tried to create a filter and just paste the discount codes in a list:
When I select every single value manually, it works. But when I paste the whole list, Tableau can't match the discount codes.
Is there any way to filter the values without selecting every single value?
You could do this using Excel (for speed) and a calculated field. Use Excel to write the calculated field formula: you already have the list of discount codes in Excel, so use it to build a giant CASE statement.
Assuming your list of exclusions starts in cell A1, cell B1 would be
="WHEN '"&A1&"' THEN 1 "
The formula in cell B2:
=B1 & "WHEN '"&A2&"' THEN 1 "
Drag that formula to the end, and you should then have the contents of a large CASE statement. Copy the final cell as values, then paste the text into a Tableau calculated field.
Start the calculated field with:
CASE [Discount Codes]
*pasted value*
END
All being well, you can use that calculated field as a data source filter and exclude the value 1.
Note I haven't tested this so watch out for bracket errors, etc.
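For two hypothetical codes, the assembled calculated field would end up looking like this (the codes are placeholders; your pasted list supplies the real WHEN lines):

```
CASE [Discount Codes]
WHEN 'CODE-001' THEN 1
WHEN 'CODE-002' THEN 1
END
```

Rows matching a listed code evaluate to 1, so excluding 1 in the filter removes exactly those codes.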
I'm trying to concatenate my search results. I found one article describing this, but couldn't get it to work.
I'm trying to do the following:
- I have created two tables (tblBus and tblJoin). I related the tables (1:M).
- I have created a search form with a few fields to search for data.
- I've also created a query.
For the most part everything works, except when I try to concatenate my data.
Here is an example of what I'm trying to do:
Stop Number - Route Number
110 - 111
110 - 222
115 - 111
115 - 222
I would like to combine the route numbers like this:
Stop Number - Route Number
110 - 111, 222
115 - 111, 222
Both fields are Integer fields.
You will need to use a VBA recordset to create the comma-delimited list of numbers.
The VBA will store the data to be displayed in a temporary table.
Your VBA will open a recordset based on an SQL query that returns your example data. The code will loop through every row, detecting when the number in the first column changes and resetting a string variable to an empty string; as it passes each row, it appends to the comma-delimited string.
Alternatively, you could write a function that builds a single comma-delimited string and is called from a query. The calling query would list only the unique values in the first column. The function may be slower than the recordset method; which one to use depends on the number of rows in your table and the speed you need.
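A sketch of the function approach (the table and field names StopNumber/RouteNumber are assumptions based on the example data; a DAO recordset builds the list for one stop at a time):

```vba
' Builds a comma-delimited list of route numbers for one stop.
' Assumes a table tblJoin with numeric fields StopNumber and RouteNumber.
Public Function RoutesForStop(ByVal stopNo As Long) As String
    Dim rs As DAO.Recordset
    Dim result As String
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT RouteNumber FROM tblJoin " & _
        "WHERE StopNumber = " & stopNo & " ORDER BY RouteNumber")
    Do Until rs.EOF
        If Len(result) > 0 Then result = result & ", "
        result = result & rs!RouteNumber
        rs.MoveNext
    Loop
    rs.Close
    RoutesForStop = result
End Function
```

A query such as `SELECT DISTINCT StopNumber, RoutesForStop(StopNumber) AS Routes FROM tblJoin` would then produce one combined row per stop, as in the desired output above.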
I have a column which contains data like:
Value 1\Value 2
Value 1\ Value 2\ Value 3
I don't know how many "\" separators each row has, and I need to split this data using an SSIS Derived Column.
Could you help me?
The problem you're going to run into is that eventually you must define an upper limit to the number of columns, at least if you're going to use a Data Flow Task, since it does not support dynamic columns.
A Script Task or Script Component will help you with the splitting. The .NET String class has a Split method that takes user-specified delimiters.
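A minimal sketch of that Script Component logic in C# (the input/output column names are assumptions, and MAX_COLS stands for the upper limit you have to choose up front):

```csharp
// Inside a Script Component's row-processing override (names illustrative).
// Splits Row.RawValue on '\' into a fixed set of pre-declared output columns.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string[] parts = Row.RawValue.Split('\\');

    // Assign each piece to its output column; Trim removes the stray
    // spaces after the backslashes seen in the sample data.
    Row.Col1 = parts.Length > 0 ? parts[0].Trim() : "";
    Row.Col2 = parts.Length > 1 ? parts[1].Trim() : "";
    Row.Col3 = parts.Length > 2 ? parts[2].Trim() : "";
    // ...repeat up to your chosen maximum; missing pieces stay empty.
}
```

Any row with more pieces than declared columns silently loses the extras, which is exactly the fixed-upper-limit trade-off described above.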
I have a flat file with the following structure (first 3 lines are information about the file content and data starts at 4th row):
ImportSourceId,ReadTime,Location
ColumnHeader1,ColumnHeader2,ColumnHeader3,ColumnHeader4,ColumnHeader5,ColumnHeader6
Unit1,Unit2,Unit3,Unit4,Unit5,Unit6
DataForColumn1,DataForColumn2,DataForColumn3,DataForColumn4,DataForColumn5,DataForColumn6
I would appreciate suggestions on how to import this data into a target SQL Server table using SSIS. I am thinking along these lines:
- Add a connection manager. Three columns will be created based on the number of values in the first row (ColumnHeader3 through ColumnHeader6 are all being treated as one column by the connection manager at this point). As I want to extract information from the first row, I can't set 'Header Rows To Skip' (?).
- Add a Script Component to read the first 3 rows into a string variable and extract the data as required.
- (Not sure how to split the 3rd column into separate columns at this point.)
Regards,
Mohan.
Assuming the column names are always static:
When importing the file, use a flat file connection:
- Skip the first 3 rows with "Header Rows to skip".
- Uncheck "Column names in the first data row".
- Click "Advanced" and manually set your column names.
I have a couple of questions about the task on which I am stuck and any answer would be greatly appreciated.
I have to extract data from a flat file (CSV) as an input and load the data into the destination table with a specific format based on position.
For example, if I have order_id,Total_sales,Date_Ordered with some data in it, I have to extract the data and load it in a table like so:
The first field has a fixed length of 2 with numeric as a datatype.
total_sales is inserted into the column of total_sales in the table with a numeric datatype and length 10.
date as datetime in a format which would be different than that of the flat file, like ccyy-mm-dd.hh.mm.ss.xxxxxxxx (here x has to be filled up with zeros).
Maybe I don't have the right idea to solve this - any solution would be appreciated.
I have tried using the following ways:
Used a flat file source to read the CSV file and fed it into an OLE DB destination with a table of fixed data types created. The problem here is that the columns are loaded, but I still have to pad them with zeros: both the date when it is loaded and, for most of the other columns, any value that does not use the full length has to be preceded with zeros.
For example, if I have an Orderid of length 4 and in the flat file I have an order id like 201 then it has to be changed to 0201 when it is loaded in the table.
I also tried another way: using a flat file source, I created a variable that takes the entire row as input and tried to separate it with Derived Columns. I was successful to an extent, but in the end the data type in the Derived Column got fixed explicitly to Boolean, which I am not able to change to the data type I want.
Please give me some suggestions on how to handle this issue...
Assuming you have a csv file in the following format
order_id,Total_sales,Date_Ordered
1,123.23,01/01/2010
2,242.20,02/01/2010
3,34.23,3/01/2010
4,9032.23,19/01/2010
I would start by creating a Flat File Source (inside a Data Flow Task), but rather than having it fixed width, set the format to Delimited. Tick the Column names in the first data row. On the column tab, make sure row delimiter is set to "{CR}{LF}" and column delimiter is set to "Comma(,)". Finally, on the Advanced tab, set the data types of each column to integer, decimal and date.
You mention that you want to pad the numeric data types with leading zeros when storing them in the database. Numeric data types in databases tend not to hold leading zeros. So you have two options: either hold the data as the types they are in the target system (int, decimal and datetime), or use the Derived Column control to convert them to strings. If you decide to store them as strings, adding an expression like
RIGHT("00000" + (DT_WSTR, 5)[order_id], 5)
to the Derived Column control will left-pad order id to 5 digits (don't forget to set the column length to 5) and would result in an order id of "00001" for order 1.
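For the timestamp format mentioned in the question (ccyy-mm-dd.hh.mm.ss with a zero-filled fractional part), a similar Derived Column expression can assemble the padded pieces. This is a sketch assuming Date_Ordered is a plain date, so the time and fraction are simply zero-filled:

```
(DT_WSTR, 4)YEAR(Date_Ordered) + "-" +
RIGHT("0" + (DT_WSTR, 2)MONTH(Date_Ordered), 2) + "-" +
RIGHT("0" + (DT_WSTR, 2)DAY(Date_Ordered), 2) +
".00.00.00.00000000"
```

The same RIGHT("0" + ..., 2) pattern used for order_id handles the single-digit months and days seen in the sample file.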
Create your target within a Data Flow Destination and make the table/field mappings accordingly (or let SSIS create a new table / mappings for you).