I have a data flow with a DB source and a flat text file destination (delimited by pipe '|').
The DB source is picking up the SQL query from a variable.
The problem is that if my DB field sizes for, say, firstname and lastname are 30 characters, I get the output as (spaces represented by dots)
saurabh......................|kumar.......................
What I need is the fields to be trimmed, so that the actual output is
saurabh|kumar
I have more than 40 columns to write, and I would not want to manually insert RTRIM around every column in my BIG SQL query :(
I should add that the source can have up to 50,000 rows returned. I was thinking of putting a script component in between, but processing every row might have a performance impact.
Any ideas?
You have quite a few options, but some will obviously be undesirable or impossible to do because of your situation.
First, I'll assume that trailing spaces in the data are because the data types for the source columns are CHAR or NCHAR. You can change the data types in the source database to VARCHAR or NVARCHAR. This probably isn't a good idea.
If the data types in the source data are VARCHAR or NVARCHAR and the trailing spaces are in the data, you can update the data to remove the trailing spaces. This is probably not appealing either.
So, you have SSIS and the best place to handle this is in the data flow. Unfortunately, you must develop a solution for each column that has the trailing spaces. I don't think you'll find a quick and simple "fix all columns" solution.
You can do the data trimming with a script transformation, but you must write the code to do the work. Or, you can use a Derived Column transformation component. In the Derived Column transformation you would add a derived column for each column that needs trimming. For example, you would have a firstname column and a lastname column. The derived column value would replace the existing column value.
In the Derived Column transformation you would use SSIS expression syntax to trim the data. The firstname and lastname trim expressions would be
RTRIM(firstname)
RTRIM(lastname)
Performance will probably be better for the Derived Column transformation, but it may not differ much from the script solution. However, the Derived Column transformation will probably be easier to read and understand later.
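Since your source query already lives in a variable, one more option (outside the data flow) is to generate the RTRIM-wrapped SELECT list once from the database catalog instead of typing it by hand. This is only a sketch and assumes a SQL Server source; dbo.People is a placeholder for your real source table:

-- Build a SELECT list that wraps every character column in RTRIM().
-- dbo.People is a placeholder; substitute your actual source table.
SELECT STUFF((
    SELECT ', ' +
           CASE WHEN DATA_TYPE IN ('char', 'nchar', 'varchar', 'nvarchar')
                THEN 'RTRIM(' + QUOTENAME(COLUMN_NAME) + ') AS ' + QUOTENAME(COLUMN_NAME)
                ELSE QUOTENAME(COLUMN_NAME)
           END
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo'
      AND TABLE_NAME   = 'People'
    ORDER BY ORDINAL_POSITION
    FOR XML PATH('')), 1, 2, '') AS select_list;

Paste the generated list into your query variable (or build it in a script task), and the flat file destination then receives already-trimmed values with no per-column work in the data flow.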
You could try using a script component in the data flow. Unlike the control flow, a data-flow script component has inputs and outputs.
Look at this example in MSDN: http://msdn.microsoft.com/en-us/library/ms345160.aspx
If you can iterate over each column of the row as it flows through the script component, you could do a .NET Trim() on the column's data, then pass the trimmed row to the output.
The advantage there, of course, is that it will also trim any rows you add later.
Just an idea, I haven't tried this myself. Do post back if it works.
See this:
http://masstrimmer.codeplex.com
It will trim rows by using parallelism.
I'm looking to make it easier for clients to search for things like phone/mobile/fax numbers. For that to happen, I want to strip both the search value and the relevant columns in my database of any non-numeric characters before comparing them. I'm using these functions to get only the numeric elements of the strings in MySQL, but they slow my queries down to a crawl when I use them.
Is there any way to do it without blowing my run times sky high?
The reason your query times are exploding is that any use of such functions prevents you from using an index. Since you are not searching directly on a field, but on the output of a function, there is no way MySQL can use an index to execute the query.
This is in addition to the fact that you have to compute the function output for each record.
The best way around these run times, if you have access and permission to do so, is to add a new column holding the stripped content you're filtering on. Add a write trigger to fill the column with the stripped values, and run a script that updates the field once for all existing records. Then add an index on the new column and, in your application, use that column when searching for a telephone number. The downsides are table schema alterations and added code in the business logic and/or data abstraction layer.
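A rough sketch of that approach in MySQL (the table and column names contacts, phone and phone_digits are placeholders, and strip_non_numeric() stands in for whatever stripping function you are already using):

ALTER TABLE contacts ADD COLUMN phone_digits VARCHAR(32);

-- one-time backfill for existing rows
UPDATE contacts SET phone_digits = strip_non_numeric(phone);

-- keep the column in sync on every write
CREATE TRIGGER contacts_bi BEFORE INSERT ON contacts
FOR EACH ROW SET NEW.phone_digits = strip_non_numeric(NEW.phone);

CREATE TRIGGER contacts_bu BEFORE UPDATE ON contacts
FOR EACH ROW SET NEW.phone_digits = strip_non_numeric(NEW.phone);

-- index the stripped column so equality/prefix searches can use it
CREATE INDEX idx_contacts_phone_digits ON contacts (phone_digits);

The application then strips the search value once and queries the indexed column directly, e.g. WHERE phone_digits = '0123456789', which can use the index.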
This is a question of converting strings from DB2 to SQL Server.
On DB2 you can have a column that contains a mix of string and binary data (e.g. using REDEFINES in COBOL to combine string and decimal values into a single DB2 column).
This will have unpredictable results during data replication as the binary zero (0x00) is treated as string-terminator (in the C family of software languages).
Both SQL Server and DB2 are able to store binary zero in the middle of fixed length char columns without any issue.
Does anyone have any experience with this problem? The way I see it, the only way to fix it is to amend the COBOL program and the database schema, so if you have a column of 14 chars, where the first 10 are a string and the last 4 a decimal, split it up into two columns containing one "part" each.
If you want to just transfer the data 1:1, I'd just create a binary(x) field of equal length, or varbinary(x) in case the length differs.
If you need to easily access the stored string and decimal values, you could create a number of computed columns that extract the string/decimal values from the binary(x) field and represent them as normal columns. This would allow you to do an easy 1:1 migration while having simple and strongly typed access to the contents.
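For example (only a sketch; the table name, the 14-byte length and the 10/4 split are assumptions taken from the example in the question):

CREATE TABLE dbo.LegacyRows
(
    RawValue   BINARY(14) NOT NULL,
    -- string portion: first 10 bytes reinterpreted as CHAR(10)
    TextPart   AS CONVERT(CHAR(10), SUBSTRING(RawValue, 1, 10)),
    -- decimal portion: last 4 bytes, kept raw here
    NumberPart AS SUBSTRING(RawValue, 11, 4)
);

How you decode NumberPart depends on whether the COBOL field is COMP or COMP-3, so this sketch just exposes the raw bytes.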
The optimal way would be to create strongly typed columns on the SQL Server database and then perform the actual migration either in COBOL or whatever script/system is used to perform the one time migration. You could still store a binary(x) to save the original value, in case a conversion error occurs, or you need to present the original value to the COBOL system.
In Access 2003 I need to display numbers like this while keeping the leading zeroes:
080000
090000
070000
What data type should I use for this?
Use a string (or text, or varchar, or whatever string variant your particular RDBMS uses) and pad it with whatever character you need ('0' in this case).
Key question:
Are the leading zeros meaningful data, or just formatting?
For instance, 07086 is my zip code, and the leading zero is meaningful, so US zip codes have to be stored as text.
Are the values '1', '01', '001' and '0001' considered to be unique, legal values or are they considered to be duplicates?
If the leading zero is not meaningful in your table, and is just there for formatting, then store the data as a number and format with leading zeros as needed for display purposes.
You can use the Format() function to do your formatting, as in this example query:
SELECT Format(number_field, "000000") AS number_with_leading_zeroes
FROM YourTable;
Also, number storage and indexing in all database engines I know of are more efficient than text storage and indexing, so with large data sets (100s of thousands of records and more), the performance drag of using text data type for numeric data can be quite large.
Last of all, if you need to do calculations on the data, you want them to be stored as numbers.
The key is to start from how the data is going to be used and choose your data type accordingly. One should worry about formatting only at presentation time (in forms and reports).
Appearance should never drive the choice of data types in the fields in your table.
If your real data looks like your examples and has a fixed number of digits, just store the data in a numeric field and use the format/input mask attributes of the column in Access table design to display it with the padded zeros.
Unless you have a variable number of leading zeros, there is no reason to store them, and it is generally a bad idea. Unnecessarily using a text type can hurt performance, make it easier to introduce anomalous data, and make it harder to query the database.
Fixed-width character with Unicode compression, plus a CHECK constraint to ensure exactly six numeric characters, e.g. ANSI-92 Query Mode syntax:
CREATE TABLE IDs
(
ID CHAR(6) WITH COMPRESSION NOT NULL
CONSTRAINT uq__IDs UNIQUE,
CONSTRAINT ID__must_be_six_numeric_chars
CHECK (ID ALIKE '[0-9][0-9][0-9][0-9][0-9][0-9]')
);
Do you need to retain them as numbers within the table (i.e. do you think you will need to do aggregations within queries, such as SUM etc.)?
If not then a text/string datatype will suffice.
If you DO, then perhaps you need two fields: one to store the number [i.e. 80000] and one to store some metadata about how the value needs to be displayed, perhaps some sort of mask or formatting pattern [e.g. '000000'].
You can then use that pattern string to format the display of the number: if you're using a .NET language you can use System.String.Format() or System.Object.ToString(), and if you're using Access forms/reports then Access uses very similar string formatting patterns in its UI controls.
I'm using MS SQL Server 2008, and I have a column that stores a Word document (.docx).
Within the Word document is a definition (i.e. a term). I need to sort on the definitions when returning a dataset.
so basically...
SELECT * FROM DocumentsTable
ORDER BY DefinitionsColumn ASC
So my problem is how this can be accomplished: the binary column only sorts on the binary value, not on the Word document's content.
I was wondering if fulltext search/index would work. I already have that working, just not sure if I can use it with ORDER BY.
-Thanking all in advance.
I think you'd need to add another column, and populate it with the term from inside the .docx. If it's possible at all to get SQL to read the .docx (maybe with a custom .NET function?), then it's going to be pretty slow.
Better to populate and maintain another column.
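Something along these lines (a sketch only; DefinitionTerm is a made-up column name, and populating it would happen in your application or in a one-off extraction job):

ALTER TABLE DocumentsTable ADD DefinitionTerm NVARCHAR(255) NULL;

-- once DefinitionTerm is populated, sorting is trivial:
SELECT *
FROM DocumentsTable
ORDER BY DefinitionTerm ASC;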
You have a couple of options that may or may not be acceptable.
1. Store the string definition contents of the file in a field alongside the binary file column in the record.
2. Only store the string definition in the record, and build the .docx file at runtime for use within your application.
When I use (in MS Access 2003 SP3):
SELECT * INTO NewTable FROM SomeQuery;
MEMO fields are converted to TEXT fields (which are limited to 255 characters), so longer texts are cut.
The output of the query itself is fine and not truncated; the text is cut only in the new table that is created.
An update: I managed to focus the problem on the IIF statement in my query.
When I remove the IIF, the new table contains the MEMO field, but with the IIF the same field appears as TEXT. The weird thing is that the query output shows the long strings in full, even when the IIF is being used. Only when it is 'copied' to the new table (by the INTO clause) is the text cut.
Do you know of any problems that IIF may cause to MEMO fields?
Thank you for your answers.
Here are the current workarounds for avoiding any truncation of Memo fields.
In your case, the truncation may be the result of the query's Properties Sheet having Unique Values set to Yes (which forces comparison of Memo fields, triggering the truncation), or of the Format property of the field, e.g. forcing display in upper case (>) or lower case (<).
Which version of Access are you using, and what format are you saving your document into?
In an Access 2000 compatible format, the cells are in Excel 5.0/95 format: 255 characters max.
Do you have any other (non-Memo) field with a lengthy value you could try to select, just to see if it also gets truncated?
If the output is fine, but the export in a new table does truncate the Memo fields, could you check the following:
In the export dialog, under Advanced, even though it looks like you can only include the name, if you click very carefully to expand the columns that don't appear, you can change the data type to Memo.
I have just tested in A2K3 with a make table and appending a memo field. I had no difficulty getting it to append full data to a memo field.
Perhaps you could post the SQL for the query you're using to populate your table. If you're sorting (or grouping) on the memo fields that could do it, because sorting on memo fields is supposed to truncate them to 255 characters (though in the test I just ran on A2K3 SP3 with all the latest post-SP3 patches, mere sorting doesn't truncate but GROUP BY does).
Another issue is that it's usually not advisable to have a Make Table query in a production app. Anything that's happening repeatedly enough that you programmed for it really ought to be appending to a pre-defined table, instead of replacing an existing table. For one, a pre-defined table can have indexes defined on it, which makes it much more efficient to use after it's been populated. Sure, you have to delete existing records before appending your new data, but the benefit is pretty big in terms of indexing. And, yes, you could redefine indexes each time you run your Make Table query, but, well, if it's too much trouble to delete existing data, isn't it even more work to add indexes to the newly-created table?
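As a sketch of the append pattern (ID and Comments are placeholder column names; the point is that NewTable is defined once, with its long-text column created as a Memo field):

DELETE FROM NewTable;

INSERT INTO NewTable (ID, Comments)
SELECT ID, Comments
FROM SomeQuery;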
I hardly ever use Make Table queries except when I'm manipulating data that I'm massaging for some other purpose. It's not always predictable what data types you'll end up with in a target table because it is partly dependent on the data in your source table. That alone makes it inadvisable to use them in most situations.
SP3 of Access 2003 is notorious; this may be related to that. There is a hotfix:
http://support.microsoft.com/default.aspx/kb/945674