We have a set of template files which users can copy and modify later. There is a unique constraint on the "name" field, so when a user copies a template file, say "File 1", we add it as "Copy of File 1", and when he copies "File 2" we add it as "Copy of File 2" (i.e. we add a prefix when copying so we don't violate the unique constraint).
But if he copies the same file "File 1" again, we run into a unique constraint violation error. What kind of naming convention should I follow so that it stays intuitive for the end user?
You could add the date the copy was made to the filename. Adjust the precision to the time frame least likely to cause a problem.
e.g. File 1 -> File 1 - Copy 2010-12-29 0017
Also, I prefer suffixes to prefixes for copies so that the copy is stored next to the original.
Do something similar to what the Windows 7 Explorer does when you drag-and-drop-copy a file within the same folder:
index.html
index - Copy.html
index - Copy (2).html
index - Copy (3).html
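For what it's worth, that Explorer-style rule is easy to generate in code. A minimal Python sketch, assuming you can look up the set of existing names first (the function name and inputs are made up for illustration):

def copy_name(original, existing_names):
    # Try "X - Copy" first, then "X - Copy (2)", "X - Copy (3)", ...
    candidate = f"{original} - Copy"
    n = 2
    while candidate in existing_names:
        candidate = f"{original} - Copy ({n})"
        n += 1
    return candidate

existing = {"File 1", "File 1 - Copy"}
print(copy_name("File 1", existing))   # -> File 1 - Copy (2)

In practice you would still want to catch the unique-constraint error and retry, since another user could take the generated name between the check and the insert.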
I have 350 files of data, each containing about 4,000 rows. There are 3,000 unique rows, but some rows are duplicated, e.g.
"2021-02-02",20.1,99,0,3.4
"2021-02-03",22.6,95,0,2.9
"2021-02-04",18.8,90,0,5.2
"2021-02-02",20.1,99,0,3.4
"2021-02-03",22.6,95,0,2.9
"2021-02-05",21.9,96,0.8,4.2
"2021-02-06",20.8,95,0,3.3
I would like to remove only the duplicate lines in each of the 350 files. However, the duplicate lines are different in each file, i.e. some files may have other dates duplicated apart from the sample shown. The duplicate lines are random and not in any particular order. I used Line Operations in Notepad++ to sort the lines in ascending order and then remove duplicates. That works fine for one file, but repeating the step 350 times would take a long time.
As mentioned in the comments, a script in your favorite scripting language is the best way.
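For instance, a small Python sketch that de-duplicates every *.csv file in one directory in place, keeping the first occurrence of each line (the directory and file pattern are assumptions - adjust them, and test on a backup copy first):

import glob

for path in glob.glob(r"C:\data\*.csv"):           # all 350 files in one directory
    seen = set()
    unique_lines = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line not in seen:                   # keep only the first occurrence
                seen.add(line)
                unique_lines.append(line)
    with open(path, "w", encoding="utf-8") as f:   # rewrite the file without duplicates
        f.writelines(unique_lines)

Unlike the sort-then-dedupe route, this also keeps the original row order.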
But you may also have a look at the Notepad++ approach below and try it for your needs.
I assume you have all the files, or part of them, in one directory. Please make a backup copy before you test.
Open one file in your workspace
Open the Find dialog, e.g. with Ctrl+F
In Find what, try: ^(.*?)$\s+?^(?=.*^\1$) (this removes every line that occurs again later in the file, keeping only the last occurrence)
Choose Regular expression as the search mode and tick ". matches newline"
Open the Find in Files tab, e.g. with Ctrl+Shift+F
Replace with: Nothing
Set Filter
Set Directory
Press Replace in Files (at your own risk!)
I have to copy files from source to destination. The tricky part is that I have many source files located in different shared paths, and I need to copy those input files to particular destination folders.
For example:
Spath1/a.txt --> Dpath1/, Spath2/b.txt --> Dpath2/, Spath3/c.txt --> Dpath3/, etc.
Can I use a table to map the input files/source paths to destination folders and use a Foreach Loop to achieve this?
Kindly post your suggestions.
Welcome to SO, Harun.
Yes, you can use a table in this case. First add an object-type variable to the package, let's call it #folders, then a string-type variable, let's call it #currentSrc, and lastly a third string-type variable we call #currentDst. Then add an Execute SQL Task to get the list of folders. Set the result set to "Full result set" and add a reference to the #folders variable in the Result Set tab.
Add a For Each Loop, set the Enumerator to "Foreach ADO Enumerator", #folders as the ADO object source variable and "Rows in the first table" as the enumeration mode. Add references to the #currentSrc and #currentDst variables in the "Variable Mappings" tab.
Now you can use the File System Task to copy files from sources to destinations based on the data you have on a table in the database (make sure to reference #currentSrc and #currentDst variables in the File System task).
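For reference, the logic the package implements is just a mapping-table-driven copy loop. A standalone Python sketch of the same idea (the paths below are invented placeholders; in the package the rows come from the Execute SQL Task, the loop is the Foreach Loop, and the copy is the File System Task):

import os
import shutil

# Stand-in for the mapping table (SourcePath, DestinationFolder)
folder_map = [
    (r"\\share1\Spath1\a.txt", r"\\dest\Dpath1"),
    (r"\\share2\Spath2\b.txt", r"\\dest\Dpath2"),
    (r"\\share3\Spath3\c.txt", r"\\dest\Dpath3"),
]

for src, dst in folder_map:              # Foreach Loop over the result set
    os.makedirs(dst, exist_ok=True)      # make sure the destination folder exists
    shutil.copy2(src, dst)               # File System Task: copy source to destination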
When I create the SSIS package, it requires a file to be referenced to pick up the file's metadata. For example the column headers will be ColumnA, ColumnB.
I have always assumed that these column names need to be present in the file for it to be loaded. Recently the business, for whatever reason, changed one of the column names in the file, so the file contains ColumnA, NotColumnB. When the SSIS package runs it ignores this and loads the file. I assumed that it would fail. Is my assumption correct and something weird is going on, or is my assumption incorrect? If so, please let me know why.
I have changed the column names in a few other packages that load data from a file, and they also don't care what the column names are.
Click on the flat file source and press F4 to show the Properties tab. There is a property called ValidateExternalMetadata; change it to True.
For more information check the following answer:
Detect new column in source not mapped to destination and fail in SSIS
Update 1
It looks like the flat file connection manager has no validation engine, and the metadata defined is used at configuration time to configure the mappings between the data file and the database.
Why Doesn't SSIS Flat File Data Check If Column Names or Order Have Changed? What is best way to check?
Flat file destination columns data types validation
I'm working on an SSIS package and would like to know how I can achieve the following:
I want to move files from a drop folder into a process folder and I want to implement the following rule:
If file does not exist in archive move file to process and archive.
If file exists in archive drop file (don't archive and don't move to process).
The test "if" exists must be based on file name and time stamp (when raw file got created).
Any ideas?
You can do this in a simple way, which I did a few days back.
1) Create the variables FileName (string) and FileExists (boolean).
2) Drag in a File System Task; based on your condition you can copy/move/delete the file or folder.
3) In my case, based on the time frame, I archive the file, i.e. move it from one folder to another, by adding one more variable named DestinationFolder (string).
4) The condition is applied in the Precedence Constraint (double-click it to open the Precedence Constraint Editor, choose "Expression and Constraint" as the evaluation operation, and give the expression as #FileExists == TRUE or FALSE).
This should work just fine.
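If it helps, the decision rule itself boils down to a few lines. A rough Python sketch outside SSIS (the folder names are placeholders, and the file's modification time stands in for the creation time stamp mentioned above):

import os
import shutil

drop, process, archive = r"C:\drop", r"C:\process", r"C:\archive"

for name in os.listdir(drop):
    src = os.path.join(drop, name)
    arch = os.path.join(archive, name)
    if not os.path.isfile(src):
        continue
    # "Exists in archive" = same file name and same time stamp
    already_archived = (
        os.path.exists(arch)
        and int(os.path.getmtime(arch)) == int(os.path.getmtime(src))
    )
    if already_archived:
        os.remove(src)                                   # drop the file
    else:
        shutil.copy2(src, os.path.join(process, name))   # send a copy to process...
        shutil.copy2(src, arch)                          # ...and one to the archive
        os.remove(src)                                   # then clear the drop folder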
Is there a method by which one can read just the first record of a file, i.e., read the header information so that a decision can be made whether or not to process the remainder of the file?
I know that with the Conditional Split transformation component one can write an expression that will ignore all of the rows besides the header based on a key word in the header. I would rather not go that route, as that inefficiently reads every record in the file.
Specifically, is there Script Component logic that I can implement to close the file and end the data flow after the first record has been read?
See this post from Todd McDermid:
Basically, you would set up a Foreach Container to loop over the files in your directory. Inside the Foreach, you would determine the "file type" - perhaps by creating a variable with a long-winded expression on it that pulls apart your file name and assumes a "file type" value - and then pass control on to one of five Data Flows via conditional connectors. (Double-click on the standard green connector, change its Evaluation Operation to Expression and Constraint, and set the expression to be "file_type_variable = ".) Then each Data Flow picks apart one "file type".
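A rough Python sketch of that routing idea, outside SSIS (the directory, the file-name pattern, and the handler names are all assumptions; in the package the equivalents are the Foreach Loop, the file-type variable/expression, and the conditional precedence constraints into the five Data Flows):

import glob
import os

def load_sales(path): print("sales data flow:", path)
def load_inventory(path): print("inventory data flow:", path)

handlers = {"sales": load_sales, "inventory": load_inventory}   # one handler per "file type"

for path in glob.glob(r"C:\drop\*.csv"):                        # Foreach over the directory
    file_type = os.path.basename(path).split("_")[0].lower()    # pull the type out of the name
    handler = handlers.get(file_type)
    if handler:                                                  # conditional connector
        handler(path)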