I'd like to partially automate a procedure. Basically, what I do is import a few spreadsheets from Excel, delete the old spreadsheets that I previously imported, and then change a few queries to reflect the titles of the new imports. Then I change the names of the queries to reflect that I've changed them.
I suppose I could make this a bit easier by giving the imported documents the same names as the old ones, so I'm open to doing that, but that still leaves changing the queries. That's not too difficult, either: the names stay pretty much the same, except the reports I'm working with are dated. I wish I could just do a "find and replace" in the SQL editor, but I don't think there's anything like that.
I'm open to forms, macros, or visual basic. Just about anything.
I've just been doing everything manually.
Assuming I have correctly understood the setup, there are a few ways in which this could be automated, without the need to continually modify the SQL of the queries which operate on the imported spreadsheet.
Following the import, you could execute an append query to transfer the data into a known pre-existing table (after deleting any existing data from that table), avoiding the need to modify any of your other queries. Alternatively, you could rename the imported table.
The task is then reduced to identifying the name of the imported table, given that it will vary for each import.
If the name of the spreadsheet follows logical rules (you mention that the sheets are dated), then perhaps you could calculate the anticipated name based on the date on which the import occurs.
Alternatively, you could store a list of the tables present in your database and then query this list for additions following the import to identify the name of the imported table.
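For illustration, here is a minimal Access VBA sketch of the first (append) approach. The table name Data_Current, the file path, and the dated naming pattern are all assumptions to replace with your own:
Public Sub ImportAndAppend()
    Dim strImported As String
    ' Anticipated table name, derived from today's date (assumed pattern)
    strImported = "Report_" & Format(Date, "yyyymmdd")
    ' Import the spreadsheet under its dated name (assumed path)
    DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12Xml, _
        strImported, "C:\Imports\" & strImported & ".xlsx", True
    ' Empty the permanent table, then append the fresh data into it,
    ' so none of the downstream queries ever needs editing
    CurrentDb.Execute "DELETE FROM Data_Current", dbFailOnError
    CurrentDb.Execute "INSERT INTO Data_Current SELECT * FROM [" & strImported & "]", dbFailOnError
    ' Drop the dated import now that its data is in Data_Current
    DoCmd.DeleteObject acTable, strImported
End Sub
If the sheet names turn out not to be predictable, the same routine is the natural place for the last suggestion: snapshot the names in CurrentDb.TableDefs before the import and diff them afterwards to find the new table.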
What is the best way to organize a big database?
The way it works is that only I am allowed to touch or modify the database, but interns sometimes help collect data. We used to have the whole system Excel-based; back then we had a macro which, given 2 files, would integrate them and mark the changes in colors.
How can I create something friendly to use which will update by pressing a button and also show the changes? I am familiar with the update query; however:
it doesn’t track any changes.
I want to know other options.
To sum up, the processing works like this:
I have the database, and I need to split some data into smaller files for other employees to work on.
Then I will collect the files and integrate them with the existing database, but since we are all human, mistakes can happen; that's why I want to be able to track changes easily.
The updates are going to happen often. When I give an intern a temp table, the possible changes are, for example: address, phone number, price. They will do this research based on the current data, finding out online which information has changed, and they will change the info in the temp table. That is why I want to know exactly what they found out. Let's say Product A (product ID 1234) used to cost $10 and today it's $12 from the same supplier: I want to know and see that the price for product ID 1234 has been changed, not only have it updated in the back-end database. For quality assurance I need to track which new input they made in relation to the product ID. (Sometimes input by someone else that was done in the wrong format or wrong column can have a big effect on the quality of the reports.)
So this was the explanation of what I need the reports for.
So, in order to make those temp tables, I want to create a form where, by choosing region, category, etc. and then clicking a button, it will automatically select the relevant records from the database, create a new table/Access file, and then copy the selected records into the temp table, so someone else can work on it...
The next thing is that it would be nice to know how I can create a template for tables; by template I mean standardizing via validation rules. For some fields I'd like to have a drop-down menu, for others a ready-made input mask for phone numbers... etc.
For the final part, after they have made the changes and saved the file (the temp table they were working on), I want to be able to update the back-end database by clicking a button...
Looking forward to getting the best solution!
Thanks in advance
Michael
Okay, for the temp tables thing:
Why not split your database into a backend part (holding all the tables) and a frontend part which contains the forms and tables the interns need? I'm guessing it is mostly going to be the same, so you can even create multiple different frontends to give to different interns in case they need other tables. There are a lot of articles out there about splitting a database and linking tables.
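If you go the split-database route, re-pointing the frontend's linked tables at the backend file can itself be scripted. A rough VBA sketch (the backend path argument is whatever you choose; nothing here is specific to your setup):
Public Sub RelinkTables(ByVal strBackendPath As String)
    Dim db As DAO.Database
    Dim tdf As DAO.TableDef
    Set db = CurrentDb
    For Each tdf In db.TableDefs
        ' Linked Access tables have a non-empty Connect string;
        ' local and system tables are skipped
        If Len(tdf.Connect) > 0 Then
            tdf.Connect = ";DATABASE=" & strBackendPath
            tdf.RefreshLink
        End If
    Next tdf
End Sub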
Then, about the record changes: I'm not sure if this is what you're looking for, but it could help. I haven't used it myself, so I'm not sure exactly what it does, but this may help you a bit:
http://support.microsoft.com/kb/197592
I would consider taking a look at the BeforeUpdate event for the form. You can trap the old and new values of textboxes if the form is bound to a table. You could loop through all the controls on your form and check for Me.Control <> Me.Control.OldValue. If they don't match, write both values to an auditing table so you can go back and check whenever you want to. I would include the following fields in your auditing table:
ChangeDate
TableName
ControlName
OldValue
NewValue
Then you can query that table any time you want to see what has changed.
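To make that concrete, here is a minimal VBA sketch of the BeforeUpdate idea, assuming an auditing table named tblAudit with the five fields above (in real use, values containing apostrophes would need escaping, ideally via a parameterized query):
Private Sub Form_BeforeUpdate(Cancel As Integer)
    Dim ctl As Control
    For Each ctl In Me.Controls
        ' Only bound textboxes carry an OldValue worth comparing
        If ctl.ControlType = acTextBox Then
            If Nz(ctl.Value, "") <> Nz(ctl.OldValue, "") Then
                CurrentDb.Execute "INSERT INTO tblAudit " & _
                    "(ChangeDate, TableName, ControlName, OldValue, NewValue) " & _
                    "VALUES (Now(), '" & Me.RecordSource & "', '" & ctl.Name & "', '" & _
                    Nz(ctl.OldValue, "") & "', '" & Nz(ctl.Value, "") & "')", dbFailOnError
            End If
        End If
    Next ctl
End Sub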
I have created an SSIS package in Visual Studio 2008 that takes a SQL select statement and populates an Excel sheet; the Excel sheet is duplicated from a template file with all the formatting and cells set up.
The issue I am having is that, no matter what I do, I cannot change the Excel destination formatting to anything other than General; it overrides the destination formatting and writes decimal numbers as '1.50, always adding the ' to fields.
I have tried inserting a row, as per some suggestions, since people think this is where SSIS scans for formatting types. However, the field always comes up as Unicode string [DT_WSTR] in the Advanced Editor and always defaults back if I change it.
Please can someone help! Happy to provide any additional info if I've missed anything. I've seen some posts with the same issue, but none of the solutions seem to be working, or I'm missing something else.
****Update****
I figured out the reason none of the recommended fixes were working: it was due to using a select statement in the Excel destination instead of selecting the table.
This essentially wipes out any formatting change.
So what I decided in the end was to create a data-only sheet (which is hidden) using the basic table data access mode, then reference that in a front-end sheet with all the formatting already in place, using a =VALUE(C1) formula to return just the value. I protected the cells to hide the formulas.
I have found that, when I change a Data Flow Task in SSIS that exports to (or imports from) Excel, I often have to "start over", or SSIS will somehow retain some of the properties of the old Data Flow Task: data types, column positions... For me, that often means:
1) Deleting the Source and Destination objects within the Data Flow Task, AND ALSO deleting/recreating the Connection Object for the Excel spreadsheet. I've done this enough times that I now save myself time by copy/pasting my Source and Destination names to-and-from a Notepad window, and I choose names that remind me of the objects they referred to (the table and file, respectively).
2) Remembering to rebuild the ARROW's metadata, too: after you change and/or recreate the Source object, you have to remember to DOUBLE-CLICK THE ARROW NEXT, before re-creating the Destination. That shows the arrow's metadata, and viewing it is also what creates/updates it.
3) When recreating the destination, DELETE THE SPREADSHEET from prior runs (or rename or move, etc.), and have SSIS recreate it. (In your new destination object, there's a button to create that spreadsheet, using the metadata.)
If you still have problems after the above, take a look at your data types... make sure you've picked SQL datatypes that SSIS supports.
At the link below, about 2/3rds of the way down the page, you'll find a table "Mapping of Integration Services Data Types to Database Data Types", with SSIS data types in the 1st column ("Data Type"), and your T-SQL equivalent data types in the 3rd column ("SQL Server (SqlClient)"):
Integration Services Data Types
Hope that helps...
I have created a CSV from a set of files in a directory that are numbered incrementally:
img1_1.jpg, img1_2.jpg ... img1_1999.jpg, img1_2000.jpg
The CSV output is like so:
filename, datetime
e.g.:
img1_1.JPG,2011-05-11 09:16:33.000000000
img1_3.jpg,2011-05-11 10:10:55.000000000
img1_4.jpg,2011-05-11 10:17:31.000000000
img1_6.jpg,2011-05-11 10:58:37.000000000
The problem is, there are a number of files missing in the listing, as some of the files don't exist. As a result, when imported, the actual row number does not match the file number.
Can anyone think of a reasonably efficient way to insert the missing rows so that the row number and filename matches up other than manually inserting rows for the missing ones? (There are over 800 missing rows).
Background
A previous programmer developed an uploader script and did not save the creation time of the mysql record in the database. I figured the easiest way to find the creation time for the majority of the records would be to output a directory listing of all the files and combine them in a spreadsheet.
You need to do exactly what you wrote in your comment answering @tadman.
Write a text parser script to inject the missing lines with, e.g., a date/time value that marks the record as an empty one, i.e. one with no real data behind it (e.g. date it 1950-01-01 00:00:00). When that is done, bulk import the CSV. I think this is the best and most efficient solution.
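As a rough sketch of what that parser script could look like (VBA here, to stay in the spirit of the thread; the file paths, the img1_ prefix, and the placeholder date are all assumptions):
Public Sub FillCsvGaps()
    Dim f As Integer, fOut As Integer
    Dim strLine As String, n As Long, expected As Long
    f = FreeFile
    Open "C:\data\listing.csv" For Input As #f
    fOut = FreeFile
    Open "C:\data\listing_filled.csv" For Output As #fOut
    expected = 1
    Do While Not EOF(f)
        Line Input #f, strLine
        If InStr(strLine, "_") = 0 Then
            Print #fOut, strLine            ' header line: copy through
        Else
            ' Pull <n> out of "img1_<n>.jpg,..."
            n = CLng(Split(Split(strLine, "_")(1), ".")(0))
            ' Emit a placeholder row for every missing number before this one
            Do While expected < n
                Print #fOut, "img1_" & expected & ".jpg,1950-01-01 00:00:00.000000000"
                expected = expected + 1
            Loop
            Print #fOut, strLine
            expected = n + 1
        End If
    Loop
    Close #fOut
    Close #f
End Sub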
Also, think about any future insert/delete/update events that might occur to your data.
That would possibly break the numbering you initially had, so you might prefer instead to introduce a numeric field for the jpegs' IDs (and index that field), and leave the PK as is (auto increment).
In this case you can avoid CSV manipulation, as well as being chained to your auto-increment PK (meaning you will not get in trouble if a new jpeg arrives with an ID that was previously deleted, an already existing ID, etc.).
So the solution really depends on how you want to use this table in the future. If you give more details, I am sure the community can come up with even more ideas.
If it's a one-time thing, it might be easiest to open up your csv in a spreadsheet.
If your table above is in Sheet1, you could put something like the following in Sheet2 (this is OpenOffice, but there are similar functions in Excel):
pre_filename | filename | datetime
img1_1 | = A2&".JPG" | =OFFSET(Sheet1.$B$1;MATCH(B2;Sheet1.$A$2:$A$4;0);0)
You should be able to select the three cells above and drag them down to however many you need.
I am writing an SSIS package to import the data from *.csv files into a SQL 2008 DB. The problem is that one of the files contains duplicate records, and I want to extract only the distinct values from that source.
Unfortunately, the generated files are not under my control; they are owned by a third party, and I cannot change the way they are generated.
I did use the Lookup component, but it only checks the existing data against the incoming data. It does not check for duplicate records within the incoming data.
I believe the sort component gives an option to remove duplicate rows.
It depends on how serious you want to get about the duplicates. Do you need a record of what was duplicated, or is it enough to just get rid of them? The Sort component will get rid of dups on the sort field. However, the dups may have different data in the other fields, and then you want a different strategy. Usually I load everything to staging tables and clean up from there. I send the removed dupes to an exception table (we have to answer a lot of questions from our customers about why things don't match what they sent), and I often use a set of business rules (enforced with either an Execute SQL task or data flow tasks) to determine which one to pick if there are duplicates in one area but not another (say, two business addresses when we can only store one). I also make sure the client is aware of how we determine which of the two to pick.
Use the Sort tool from the Toolbox, then double-click it. You will see all the available input columns.
Check the column, set the sort type and direction, and then check "remove rows with duplicate sort values".
Bring in the data from the csv file the way it is, then dedup it after it's loaded.
It'll be easier to debug, too.
I used the Aggregate component and grouped by both QualificationID and UnitID. If you want, you can use the Sort component too. Perhaps my information might help others.
What is the best way to re-use reports on different tables / datasets?
I have a number of reports built in BIRT which get their data from a flat (un-normalized) MySQL table, which in turn has been imported from an Excel sheet.
In BIRT, I've constructed my query like this, such that I can change the field names and re-use the report:
SELECT * FROM
(SELECT index as "Index", name as "Name", param1 as "First Parameter" FROM mytable) t
However, when I switch to a new client's data, I need to change the query to point at the new data source, and this doesn't seem sustainable or anywhere near a good practice.
So... what is a good practice?
Is this a reporting issue, or a database-design issue?
Do I create a standard view that the report connects to?
If I have a standard view, do I create a different view with the same structure for each data table, or keep replacing the view with a reference to the correct data table each time I run the report?
What's annoying is that the Excel sheets keep changing - new columns are added, and different clients name their data differently. Even if I can standardize this, I'd store different clients' data in different tables... so would I need to create a different report for each client, or pass the table name in to the report?
There are two ways and the path you choose is really dictated by how much flexibility you have architecturally.
First, you are on the right track by renaming your selected columns to a common name, since that name is what is used to bind the data to the control on the report.
Have you considered a stored procedure to access the data? This removes the query from the report and allows you to set up the stored proc on any database to return the necessary columns. If you cannot off-load to a stored proc, you can always rely on altering the query text at run-time. Because BIRT reports are not compiled (they are XML), you can change the query based on parameters and have it executed for each run of the design. Look at the onCreate event for the Data Set: you can access this.queryText and do any dynamic string substitution you need via JavaScript. Hidden parameters are a good way to help alter/tune the query.
If you build the Data Set correctly, changing the underlying data could be as easy as changing the Data Source and then re-associating the Data Set with the new Data Source (in the edit data set window). I have done this MANY times and it works well.
If you are going down this route, I would add the Data Source(s), Data Set(s) and any controls that they provide data to into a report library. With the library you can use the controls in many reports and maintain them in one spot. If you update the library, all the reports using the library get updated as well.
Alternatively, if you want to really commit to a fully re-usable strategy that allows you to build a library of reusable components, you could check out the free Reusable Component Library at BIRT Exchange. In my opinion this strategy would give you the re-use you are looking for, but at the expense of maintainability. It is abstraction to the point of obfuscation. It requires totally generic names for columns and controls, which makes debugging very difficult. While it would not be my first choice (the option above would be), others have used it successfully, so I thought I would include it here since it directly speaks to your question.