SSIS Flat File Destination - Columns going out of sync when adding new column

Really hoping someone can help me with this SSIS 2008 (BIDS) issue. I am very new to the software, so I am hoping it will be something simple!
I am attempting to use SSIS to retrieve data from an SQL table. My data flow consists of one OLE DB Source pointing directly to one Flat File Destination.
At the moment I can successfully retrieve 4 columns, which display fine. However, when I attempt to add a fifth column (from data returned from the SQL database), it appears to jumble everything up.
To elaborate: the fifth column should contain a letter between "A" and "F", however instead of pulling this through from my results in the OLE DB Source it appears to be pulling through data from columns 1 - 4.
E.g. before adding this extra column, things look like this:
|----------|------------|-------------|--------|
| Column 1 |Column2 |Column3 |Column4 |
|----------|------------|-------------|--------|
|444456654 |10/01/2015 |User unable |JSMIT14 |
| | |to logout of | |
| | |app without | |
| | |crashing. | |
------------------------------------------------
However, after adding the new end column and mapping it accordingly, everything seems to go out of sync, with added quotation marks:
|----------|------------|-------------|--------|-----------------------|
| Column 1 |Column2 |Column3 |Column4 |Column 5 |
|----------|------------|-------------|--------|-----------------------|
|444456654 |10/01/2015 |User unable |JSMIT14 |444456654","User |
| | |to logout of | |unable to logout of |
| | |app without | |app without |
| | |crashing. | |crashing.","JSMITH14 |
-----------------------------------------------------------------------
I was expecting it to pull through the data as I had mapped it and appear as below:
|----------|------------|-------------|--------|---------|
| Column 1 |Column2 |Column3 |Column4 |Column 5 |
|----------|------------|-------------|--------|---------|
|444456654 |10/01/2015 |User unable |JSMIT14 | F |
| | |to logout of | | |
| | |app without | | |
| | |crashing. | | |
----------------------------------------------------------
I add the extra column by selecting "Flat File Connection Manager" > "Advanced" > "New".
I then map the SQL column to my new column by selecting "Flat File Destination" > "Mappings".
Please note that if I select the data source and preview the SQL code, all appears fine, so I feel that something must be going wrong at the Flat File Destination stage.

Related

How does Foundry Magritte append ingestion handle deleted rows in the data source?

If I have a Magritte ingestion that is set to append, will it detect if rows are deleted in the source data? Will it also delete the rows in the ingested dataset?
For your first question, on whether deletions are detected: this will depend on the database implementation you are extracting from (I'll assume this is JDBC for this answer). If a delete shows up as a modification, and therefore as a new row, then yes, your deletes will show up.
This would look something like the following at first:
| primary_key | val | update_type | update_ts |
|-------------|-----|-------------|-----------|
| key_1 | 1 | CREATE | 0 |
| key_2 | 2 | CREATE | 0 |
| key_3 | 3 | CREATE | 0 |
Followed by some updates (inside a subsequent run, incremental on update_ts):
| primary_key | val | update_type | update_ts |
|-------------|-----|-------------|-----------|
| key_1 | 1 | UPDATE | 1 |
| key_2 | 2 | UPDATE | 1 |
Now your database would have to explicitly mark any DELETE rows and increment the update_ts for this to be brought in:
| primary_key | val | update_type | update_ts |
|-------------|-----|-------------|-----------|
| key_1 | 1 | DELETE | 2 |
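In SQL terms, that is a "soft delete": the source writes a tombstone row instead of physically removing the record. A sketch of what the source system would do, with an illustrative changelog table name:

INSERT INTO source_changelog (primary_key, val, update_type, update_ts)
VALUES ('key_1', 1, 'DELETE', 2);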
After this, you would then be able to detect the deleted records and adjust accordingly. Your full materialized table view will now look like the following:
| primary_key | val | update_type | update_ts |
|-------------|-----|-------------|-----------|
| key_1 | 1 | CREATE | 0 |
| key_2 | 2 | CREATE | 0 |
| key_3 | 3 | CREATE | 0 |
| key_1 | 1 | UPDATE | 1 |
| key_2 | 2 | UPDATE | 1 |
| key_1 | 1 | DELETE | 2 |
If you are running your raw ingestion incrementally, these rows will not be automatically deleted from your dataset; you'll have to explicitly write logic to detect these deleted records and remove them in your output clean step. If these deletes are found, you'll have to SNAPSHOT the output to remove them (unless you're doing lower-level file manipulations where you could perhaps remove the underlying file).
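The detection logic in the clean step can be fairly mechanical. A sketch in SQL, using the example columns above (the raw table name is illustrative): keep only the latest row per key, and drop any key whose latest row is a DELETE:

SELECT t.primary_key, t.val
FROM raw_changelog t
JOIN (
    SELECT primary_key, MAX(update_ts) AS max_ts
    FROM raw_changelog
    GROUP BY primary_key
) latest
  ON latest.primary_key = t.primary_key
 AND latest.max_ts = t.update_ts
WHERE t.update_type <> 'DELETE';

Run against the materialized view above, this keeps key_2 and key_3 at their latest values and drops key_1 entirely.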
It's worth noting you'll want to materialize the DELETES as late as possible (assuming your intermediate logic allows for it) since this will require a snapshot and will kill your overall pipeline performance.
If you aren't dealing with JDBC, then @Kellen's answer will apply.
If this is a file-based ingestion (as opposed to JDBC), Magritte ingestion operates on files, not on rows. If your transaction type for the ingestion is set to UPDATE and you make changes to the file, including deleting rows, then when the ingestion runs the new file will completely replace the existing file in that dataset, so any changes made in the file will be reflected in the dataset.
Two additional notes:
If you have the "exclude files already synced" filter, you will probably want to have the "last modified date" and/or "file size" options enabled, or the modified file won't be ingested.
If your transaction type is set to APPEND and not UPDATE then the ingestion will fail because APPEND doesn't allow changes to existing files.

MySQL structure for storing user specific settings without 100 columns

I'm looking for some guidance on the best way to store user-specific data in an SQL database. I'm a little new to SQL, so I'm hoping this is a fairly easy concept for those familiar with it.
I've been reading about normalisation and other good practices as I'm aware that setting a good foundation for the database is crucial and hard to change later.
I think an easy way to explain my scenario is this:
Each website user can choose to create one or more "projects".
Within each project a user will set an "object". This object can be created by the user or it can be chosen from a list of objects which have been created by other users.
Each object has a variable number of settings. Let's say an object could have between 5 and 25 settings. Each setting could simply be an integer value between 0 and 100.
Originally I thought about doing it this way:
Project Table
+-----------+-------------+------+---------+----------+---------+----------+------+--------+
| ProjectID | ProjectName | User | Object1 | Object2 | SetID | Notes | Date | Photo |
+-----------+-------------+------+---------+----------+---------+----------+------+--------+
| PID0001 | My Project | Bob | OBJ0001 | OBJ00056 | SID0045 | my notes | | |
+-----------+-------------+------+---------+----------+---------+----------+------+--------+
Each user can create a project and reference different objects and object settings profiles within that project.
Object Table
+---------+------------+--------+---------+-------+--------+----------+-------+-------+---------+---------+--------+
| ObjID | ObjName | ObjVer | Date | User | Set1ID | Set1Name | Set1X | Set1Y | Set1Min | Set1Max | Set2ID |
+---------+------------+--------+---------+-------+--------+----------+-------+-------+---------+---------+--------+
| OBJ0001 | My Object | Bob | | Bob | S00013 | Volts | 12 | 52 | 1 | 80 | S00032 |
+---------+------------+--------+---------+-------+--------+----------+-------+-------+---------+---------+--------+
This table would define all the configurable settings for the object. It could range from 1 setting to 25 settings. In this example, each setting the user adds to the object would have 6 parameters, such as min/max allowed values, name, ID, etc.
If I do it this way, I would end up with over 100 columns, many of which could be empty...
Object Settings Table
+---------+-------------+---------+------------+------+---------+---------+---------+
| SetID | Setname | ObjID | Date | User | Set1Val | Set2Val | Set3Val |
+---------+-------------+---------+------------+------+---------+---------+---------+
| SID0045 | My Settings | OBJ0001 | 12-12-2017 | Bob | 12 | 32 | 98 |
+---------+-------------+---------+------------+------+---------+---------+---------+
In this table, each row would define a user's settings profile for that object - basically just the value for the settings which were defined in the object table. Each user could have a different set of settings for the same object when it's used in their project.
So, the above method seems bad to me. It makes sense in my head but the number of columns will get out of control when allowing multiple settings.
I suppose the better way of doing this would be to go vertical by adding a row for each setting or setting column but I'm just not sure how this would look. How can I structure it this way while still allowing the "sharing" of objects between user projects?
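For illustration, the vertical layout might look something like this sketch (MySQL; table and column names are hypothetical, not from the post). Shared objects and their setting definitions live in their own tables, and each project stores one row per setting value, so 5 settings mean 5 rows rather than 5 columns:

CREATE TABLE Objects (                -- shared between users/projects
    ObjID     INT AUTO_INCREMENT PRIMARY KEY,
    ObjName   VARCHAR(100) NOT NULL,
    CreatedBy VARCHAR(50)  NOT NULL
);

CREATE TABLE ObjectSettings (         -- one row per setting an object defines
    SettingID INT AUTO_INCREMENT PRIMARY KEY,
    ObjID     INT NOT NULL,
    SetName   VARCHAR(50) NOT NULL,
    MinVal    INT NOT NULL DEFAULT 0,
    MaxVal    INT NOT NULL DEFAULT 100,
    FOREIGN KEY (ObjID) REFERENCES Objects (ObjID)
);

CREATE TABLE ProjectSettingValues (   -- one row per value a project stores
    ProjectID    INT NOT NULL,
    SettingID    INT NOT NULL,
    SettingValue INT NOT NULL,        -- the 0 - 100 integer
    PRIMARY KEY (ProjectID, SettingID),
    FOREIGN KEY (SettingID) REFERENCES ObjectSettings (SettingID)
);

Because ProjectSettingValues keys on (ProjectID, SettingID), two projects can reference the same shared object yet hold different values for its settings.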

Multiple Data Sources in Microsoft Excel SQL Query

I have a lot of spreadsheets that pull transactional information from our ERP software into Excel using the Microsoft Query that we then perform other calculations on automatically. Recently we upgraded our ERP system, but management made the decision to leave the transactional history in the old databases to have a clean one going forward in the new system. I still need to have some "rolling 12 months" graphs, but if I use only the old database, I'm missing new data and if I use only the new, I'm missing the last 11 months data.
Is there a way that I can write a query in Excel to pull data from the old database PartTran table and merge it with the new database PartTran table without user intervention each time? For instance, I don't want my users (if possible) to have to have two queries that they copy and paste into one Excel table. The schemas of the tables (at least the columns I need) are identically named and defined.
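For what it's worth, if both databases live on the same server and the dialect allows cross-database queries, the merge can happen in the Microsoft Query SQL itself; a sketch using SQL Server three-part names, with illustrative database and column names:

SELECT PartNum, TranDate, TranQty
FROM OldERP.dbo.PartTran
UNION ALL
SELECT PartNum, TranDate, TranQty
FROM NewERP.dbo.PartTran;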
If you want to take a bit of a fun, hacky Excel approach, you could do the "copy-paste" bit FOR your users behind the scenes. Given two similar tables OLD and NEW with structures
+-----+------+-------+------------+
| id | foo | bar | date |
+-----+------+-------+------------+
| 95 | blah | $25 | 2015-06-01 |
| 96 | bork | $12 | 2015-07-01 |
| 97 | bump | $200 | 2015-08-01 |
| 98 | fizz | | 2015-09-01 |
| 99 | buzz | $50 | 2015-10-01 |
| 100 | char | ($1) | 2015-11-01 |
| 101 | mope | | 2015-12-01 |
+-----+------+-------+------------+
and
+----+-----+-------+------------+------+---------+
| id | foo | bar | date | fizz | buzz |
+----+-----+-------+------------+------+---------+
| 1 | cat | ($10) | 2016-01-01 | 285B | 1110111 |
| 2 | dog | $25 | 2016-02-01 | 27F5 | 1110100 |
| 3 | ant | $100 | 2016-03-01 | 1F91 | 1001111 |
+----+-----+-------+------------+------+---------+
... you can union together the data from these two datasets with some prudent Excel wizardry, as below:
Your UNION table (named using Alt+J+T+A) should have the following items:
New natural ID
DataSet pointer ( name of old or new table )
Derived ID from original dataset
Columns of data you want from Old & New DataSets
example:
+---------+------------+------------+----+------+-----+------------+------+------+
| UnionId | SourceName | SourceRank | id | foo | bar | date | fizz | buzz |
+---------+------------+------------+----+------+-----+------------+------+------+
| 1 | OLD | | | | | | | |
| 2 | NEW | | | | | | | |
+---------+------------+------------+----+------+-----+------------+------+------+
You will then make judicious use of Indirect() and VlookUp() to derive the lookup id and column targets. Sample code below
SourceRank - helper column
=COUNTIFS([SourceName],[#SourceName],[UnionId],"<="&[#UnionId])
id - the id from the original DataSet
=SMALL(INDIRECT([#SourceName]&"[id]"),[#SourceRank])
Everything else is just VlookUp madness!! Although I've taken the liberty of copying the sample code below for reference
foo =VLOOKUP([#id],INDIRECT([#SourceName]),MATCH(UNION[[#Headers],[foo]],INDIRECT([#SourceName]&"[#Headers]"),0),0)
bar =VLOOKUP([#id],INDIRECT([#SourceName]),MATCH(UNION[[#Headers],[bar]],INDIRECT([#SourceName]&"[#Headers]"),0),0)
date =VLOOKUP([#id],INDIRECT([#SourceName]),MATCH(UNION[[#Headers],[date]],INDIRECT([#SourceName]&"[#Headers]"),0),0)
fizz =VLOOKUP([#id],INDIRECT([#SourceName]),MATCH(UNION[[#Headers],[fizz]],INDIRECT([#SourceName]&"[#Headers]"),0),0)
buzz =VLOOKUP([#id],INDIRECT([#SourceName]),MATCH(UNION[[#Headers],[buzz]],INDIRECT([#SourceName]&"[#Headers]"),0),0)
Output
You'll likely want to make prudent use of IF() and/or IFERROR() to help your users ignore the new columns' references against the old table and those rows that do not yet have data. Without that, however, you'll end up with something like the below.
This is both ready to accept & read new inputs to both OLD and NEW DataSets and is sortable to get rid of those pesky placeholder rows...
Hope this helps! Happy coding!

Replacing empty values with zeros

I know this question has been asked many times before, but having followed previous solutions I still can't resolve the issue:
I have an SSRS (2008 R2) report that displays data in a matrix. There is a horizontal column group containing values that may or may not exist; when a value doesn't exist, the cell renders empty, and I'd like those empty cells to have 0's in them instead.
The cell expression was set to:
=Sum(Fields!number.Value)
so I tried
=iif (IsNothing(Sum(Fields!number.Value)),0,Sum(Fields!number.Value))
and
=iif (IsNothing(Fields!number.Value),0,Sum(Fields!number.Value))
and
=iif (Sum(Fields!number.Value)>0,Sum(Fields!number.Value),0)
... but the "empty" cells persist. I'm sure I'm doing something daft... but what?
EDIT: To illustrate my situation better, my query produces output (in SSMS) similar to:
File | outcomeID | number
A | Outcome1 | 2
A | Outcome2 | 1
B | Outcome2 | 2
C | Outcome1 | 1
D | Outcome3 | 2
... which would produce the outcome in SSRS of:
File | Outcome1 | Outcome2 | Outcome3
A | 2 | 1 |
B | 2 | |
C | 1 | |
D | | | 2
using a Column Group on outcomeID.
Even if I change the expression to be simply:
=999
I end up with
File | Outcome1 | Outcome2 | Outcome3
A | 999 | 999 |
B | 999 | |
C | 999 | |
D | | | 999
... i.e. lots of blank spaces.
EDIT2: I've uploaded a very small example .rdl file [REMOVED], using the example data above - it reproduces the issue
My first thought was a change to the stored procedure, but since I had a similar report to test something on quickly, I had an idea.
Try this instead:
=0+Sum(Fields!number.Value)
I tend to use the following (on SSRS 2005, however): =CInt(Fields!number.Value). It's possible to use other conversions, like CDbl, as well.
Alternatively, but a tiny bit longer, try =IIf(Fields!number.Value Is Nothing, 0, Fields!number.Value).
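A side note, not from the original answers: the =999 experiment suggests the blank cells never exist at all. A matrix only renders a cell for File/outcomeID combinations that are present in the dataset, so no cell expression can fill a cell that is never created. One way around this is to densify the query itself so every combination exists; a sketch, with an illustrative table name:

SELECT f.[File], o.outcomeID, COALESCE(d.number, 0) AS number
FROM (SELECT DISTINCT [File] FROM MyData) AS f
CROSS JOIN (SELECT DISTINCT outcomeID FROM MyData) AS o
LEFT JOIN MyData AS d
       ON d.[File] = f.[File]
      AND d.outcomeID = o.outcomeID;

With every File/outcomeID pair present and zero-filled, the plain =Sum(Fields!number.Value) expression renders 0 instead of an empty cell.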

Access 2007: crosstab view to insert values into a database

I am stuck and need help/advice. I am pretty sure that I am not the first one to run into this problem, but I can't seem to find the answer on the web.
We are collecting all kinds of data from many factories. This is mainly forecasted values of yearly peak production, etc. This data collection is repeated every year.
We currently keep track of this data in Excel, which has the following structure:
Factory | 2010 | 2011 | 2012 | ..
----------------------------------
A | 20 | 30 | 28 | ..
B | | 39 | 55 | ..
In this example, factory B just starts production in the year 2011. If we collect data for an additional year, we simply add a column. If forecasted data changes, we simply enter the new values and lose the old ones. You can imagine that this way of working has its limitations: the table becomes rather sparse with all the missing data, old values cannot be traced back, and there is no reference to the source of the values.
To satisfy our need for a better system, I put my antique knowledge of databases to work. In Access 2007 I created the following structure:
Table: Factories
FacID | FactoryName
---------------------
1 | A
2 | B
Table: Sources
SouID | Source | SourceDate
---------------------------------
1 | DocumentX | Sep. 2009
2 | DocumentY | Jan. 2010
Table: Parameters
ParID | FacID | SouID | ParamType | Year | Value
------------------------------------------------------
1 | 1 | 1 | PeakProduction | 2010 | 20
2 | 1 | 1 | PeakProduction | 2011 | 30
3 | 1 | 1 | PeakProduction | 2012 | 28
4 | 2 | 1 | PeakProduction | 2011 | 39
5 | 2 | 2 | PeakProduction | 2012 | 55
For each new data collection we just add a new source document and append rows to the Parameters table. In this way, we can always revert to old data. Furthermore, if additional years are collected, there is no need to add additional columns to any table.
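For example, a new collection round might be recorded roughly like this (Access SQL; the values, and the assumption that the new document gets SouID 3, are purely illustrative):

INSERT INTO Sources (Source, SourceDate)
VALUES ('DocumentZ', #1/15/2011#);

INSERT INTO Parameters (FacID, SouID, ParamType, [Year], [Value])
VALUES (1, 3, 'PeakProduction', 2013, 25);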
Although the actual setup is more complex, the above example is sufficient to illustrate the problem that I am running into: to enter data into the database, I would like to have a single form which resembles the original Excel sheet layout, i.e.:
Factory | 2010 | 2011 | 2012 |
------------------------------
A | | | |
B | | | |
Of course, this form will have some drop-down menu to select the source document and parameter type ("PeakProduction" in the example).
My question: with crosstab queries it is easy to create such a view based on existing data in the database; however, entering new values is not allowed. What can I do to make this work, and how?
Should I reconsider the design of my database? Should I work with VBA? Link the Access database with Excel sheets?
Thanks!
Where you are dealing with 2-dimensional data that is normalised into a table, it is problematic for the user to maintain using Access. My approach to this has been to use the appropriate tool for the job, which looks like Excel in this case. I have an Excel spreadsheet template for data entry. The user enters the data into that. Then, in VBA, I open the spreadsheet through Excel automation, retrieve the cell contents, and insert them into the table. Something like below:
' Push the data-entry workbook's cells into an Access table.
Dim myRec As DAO.Recordset
Dim xlApp As Excel.Application
Dim xlWrksht As Excel.Worksheet

Set myRec = CurrentDb.OpenRecordset("NameOfTable")
Set xlApp = CreateObject("Excel.Application")
Set xlWrksht = xlApp.Workbooks.Open("PathOfWorkbook").Worksheets("WorksheetName")

myRec.AddNew
myRec.Fields("NameOfField") = xlWrksht.Cells(1, "A").Value   ' cell A1
' ... repeat for the remaining fields/cells ...
myRec.Update

' Clean up so Excel doesn't linger in the background
xlApp.Quit
Set xlWrksht = Nothing
Set xlApp = Nothing
myRec.Close