I have a CCNet project that uses a file labeller for reporting the build number. The problem is that the build process increments the value in this file, but the labeller event always occurs at the beginning of the build, no matter where I sequence the task in the config XML file. I suspect that internally CCNet processes the entire file and sequences certain events in a pre-defined order.
Is there a way to sequence this event to occur after the file has been updated? In my particular case I need to run a clean on all .NET projects before changing the value of the target file.
One thought I have but have never played with, is to create a project that cleans the VB.NET projects and updates the build number. Once this event has completed then the main project can kick off. This process would have to check for modifications for CI.
Perhaps someone has a better solution?
Thank you.
Dan
I developed a solution by creating a separate BuildLabeller file and adding a PowerShell routine that increments the build version value in that file and then uses Set-Content to overwrite it. The build labeller file is always one value higher than the current build, so that when the next build starts it already holds the correct value for the CCNet file labeller.
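The increment-and-overwrite step can be sketched as follows. This is a Python illustration rather than the PowerShell routine described above, and it assumes the label file holds either a plain integer or a dotted version whose last component is the build number; the function names are hypothetical.

```python
from pathlib import Path


def bump_label(text: str) -> str:
    """Return the label with its last dot-separated component incremented."""
    parts = text.strip().split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def bump_label_file(path: Path) -> str:
    """Read the label file, increment it, and overwrite the file in place."""
    new_label = bump_label(path.read_text())
    path.write_text(new_label)
    return new_label
```

Run as the last step of a build, this leaves the file one value ahead, ready for the labeller to read at the start of the next build.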
I am using SSIS2017 and part of what I am doing involves running several (30ish) SQL scripts to be output into flat files into the same folder. My question is, to do this do I have to create 30 New File Connections or is there a way to define the folder I want all the outputs to go to, and have them saved there?
I am really only thinking of keeping a tidy Connection Manager tab. If there's a more efficient way to do it than 30-something file connections, that would be great.
A data flow is tightly bound to the columns and types defined within for performance reasons.
If your use case is "I need to generate an extract of sales by year for the past 30ish" then yes, you can make do with a single Flat File Connection Manager because the columns and data types will not change - you're simply segmenting the data.
However, if your use case is "I need to extract Sales, Employees, Addresses, etc" then you will need a Flat File Connection Manager (and preferably a data flow) per entity/data shape.
It's my experience that you would be nicely served by designing this as 30ish packages (SQL Source -> Flat File Destination) with an overall orchestrator package that uses Execute Package Task to run the dependent processes. Top benefits:
You can have a team of developers work on the individual packages
Packages can be re-run individually in the event of failure
Better performance
Being me, I'd also look at Biml and see whether you can't just script all that out.
Addressing comments
To future-proof location info, I'd define a project parameter named something like BaseFilePath. This assumes the most probable change is the base location: in dev I use paths like C:\ssisdata\input\file1.txt and C:\ssisdata\input\file3.csv, while production would use \\server\share\input\file1.txt or E:\someplace\for\data\file1.txt. I would populate the parameter with the dev value C:\ssisdata\input and then assign the value \\server\share\input to the project for production via configuration.
The crucial piece is to ensure that an Expression exists on the Flat File Connection Manager's ConnectionString property so that it is driven, in part, by the parameter's value. Again, being a programmatically lazy person, I have a Variable named CurrentFilePath with an expression like @[$Project::BaseFilePath] + "\\file1.csv"
The FFCM then uses @[User::CurrentFilePath] to ensure I write the file to the correct location. And since I create one package per extract, I don't have to worry about creating a Variable per flat file connection manager, as it's all the same pattern.
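To make the pattern concrete outside SSIS, here is a small Python sketch of the same idea: one environment-specific base path combined with a per-package file name. It is only an analogy for the expression above, and the function name is hypothetical.

```python
from pathlib import PureWindowsPath


def flat_file_path(base_file_path: str, file_name: str) -> str:
    """Join the configured base folder and a file name, Windows-style."""
    return str(PureWindowsPath(base_file_path) / file_name)


# Dev and production differ only in the base path value.
dev_path = flat_file_path(r"C:\ssisdata\input", "file1.csv")
prod_path = flat_file_path(r"\\server\share\input", "file1.csv")
```

Only the base value changes between environments; the file name portion stays fixed per package.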
The git repo for my Django app includes several .tsv files which contain the initial entries to populate my app's database. During app setup, these items are imported into the app's SQLite database. The SQLite database is not stored in the app's git repo.
During normal app usage, I plan to add more items to the database by using the admin panel. However I also want to get these entries saved as fixtures in the app repo. I was thinking that a JSON file might be ideal for this purpose, since it is text-based and so will work with the git version control. These files would then become more fixtures for the app, which would be imported upon initial configuration.
How can I configure my app so that any time I add new entries to the Admin panel, a copy of that entry is saved in a JSON file as well?
I know that you can use the manage.py dumpdata command to dump the entire database to JSON, but I do not want the entire database, I just want JSON for new entries of specific database tables/models.
I was thinking that I could try to hack the save method on the model to try and write a JSON representation of the item to file, but I am not sure if this is ideal.
Is there a better way to do this?
Overriding the save method to do something that can fail, or that can take longer than it should, is not recommended. You usually override save only when the changes are simple and important.
You can use signals, but in your case that's too much work. You can instead write a function to do this for you, though not necessarily right after the data is saved to the database. You could do it right away, but that's a lot of processing unless it's really important for your file to be updated immediately.
I recommend using something like Celery to run a function in the background, separate from the rest of your Django code. You can call it on every data update, or every hour for example, to update your backup file. You can even create a table to monitor the update process.
Which solution is best depends highly on you and on how important the data is. Keep in mind that editing a file can be a heavy process too, so creating a backup once a day, say, might be a better idea anyway.
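Whichever trigger you choose, the file-writing step itself might look like the sketch below. It uses only the standard library and assumes the layout Django's JSON fixtures use (a list of objects with "model", "pk" and "fields" keys). The function name, the fixture path and the example model label are hypothetical, and the caller is assumed to have already converted field values to JSON-friendly types.

```python
import json
from pathlib import Path


def append_fixture_entry(fixture_path, model_label, pk, fields):
    """Add or replace one entry in a JSON fixture file and return all entries."""
    path = Path(fixture_path)
    entries = json.loads(path.read_text()) if path.exists() else []
    # Drop any earlier copy of the same row so edits do not create duplicates.
    entries = [
        e for e in entries
        if not (e["model"] == model_label and e["pk"] == pk)
    ]
    entries.append({"model": model_label, "pk": pk, "fields": fields})
    # Stable formatting keeps the git diffs small.
    path.write_text(json.dumps(entries, indent=2, sort_keys=True) + "\n")
    return entries
```

A background task or a scheduled job could call this once per new row, for example append_fixture_entry("fixtures/items.json", "myapp.item", 7, {"name": "widget"}).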
Overall, I am looking for feedback regarding two different design options of running a master package.
I have one package that Agent calls that runs a bunch of packages that process data (I think we are up to about 50 now).
The original design was to group packages into smaller chunks called directorates which call the actual packages. Sample below:
A few things I see (and have experienced) with this approach are:
1. Every package has to open (even if it is unnecessary to run, i.e. no file is present)
2. #1 adds so much time for the process to complete
3. Runs in parallel for sure
So I developed a new approach which will only run the packages that have the necessary files and logs the attempt if not. It is so much cleaner and you don't need all the file connections for each package to run since you are iterating through them.
I am not sure it runs in parallel (I actually doubt it).
I am adding the dataflow that populates the ADO Object that is being iterated in foreach to demonstrate the files being processed.
Note: Usually in the DEV environment there are not many files to be processed; however, when deploying to TEST and PROD, most of the files will be present to be processed.
Can I get some feedback on these two different approaches?
Anyone who provides productive feedback will receive upvotes!
I would go with a modified first approach, i.e. something like this: inside each package, use a Script Task to check whether the files are present in the destination.
For instance:
Create a Script Task and a variable.
Inside the Script Task, write code similar to the image below (the logic is: if the file is found, set the flag to true; otherwise set it to false):
Now constrain the execution of the DFT by using this flag, as shown below:
The only con is that you'll have to make changes in 50 packages, but this is a one-time activity. Your parallel execution will remain intact.
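The flag logic described above can be sketched in Python as follows. This is not the Script Task code from the answer (an SSIS Script Task would be written in C# or VB.NET); it only illustrates the check-then-run pattern, and the function names and the glob pattern are assumptions.

```python
from pathlib import Path


def file_is_present(folder, pattern):
    """Return True if at least one file in folder matches the pattern."""
    return any(p.is_file() for p in Path(folder).glob(pattern))


def run_if_file_present(folder, pattern, data_flow):
    """Run the data flow only when its input file exists; report whether it ran."""
    flag = file_is_present(folder, pattern)
    if flag:
        data_flow()
    return flag
```

In the package itself, the returned flag corresponds to the variable that the precedence constraint on the DFT evaluates.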
I would go with the 2nd approach, as it's cleaner and easier to debug.
Here are some suggestions to improve the 2nd approach:
Create a Control table with all package names, an Enable/Disable flag, and a FileAvailable flag
Create a Poll package which goes through the files and sets the file flag and package flag accordingly
Loop through this Control table and run only those packages that are enabled and have a file.
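The control-table idea above can be sketched in Python, with an in-memory list standing in for the table. The column names mirror the suggestion, while the class and function names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ControlRow:
    package_name: str
    enabled: bool
    file_path: str
    file_available: bool = False


def poll_files(rows):
    """Poll step: set the FileAvailable flag on every row."""
    for row in rows:
        row.file_available = Path(row.file_path).is_file()


def packages_to_run(rows):
    """Loop step: keep only packages that are enabled and have a file."""
    return [r.package_name for r in rows if r.enabled and r.file_available]
```

The list returned by packages_to_run plays the role of the object variable that the Foreach loop iterates over.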
I would like a way for individual users to send a repo path to a Hudson server and have the server start a build of that repo. I don't want to leave behind a trail of dynamically created job configuration. I'd like to start multiple simultaneous instances of the same job. Obviously this requires that the workspaces be different for the different instances. I believe this isn't possible using any of the current extensions. I'm open to different approaches to what I'm trying to accomplish.
I just want the hudson server to be able to receive requests for builds from outside sources, and start them as long as there are free executors. I want the build configuration to be the same for all the builds except the location of the repo. I don't want to have dozens of identical jobs sitting around with automatically generated names.
Is there anyone out there using Hudson or Jenkins for something like this? How do you set it up? I guess with enough scripting I could dynamically create the necessary job configuration through the CLI API from a script, and then destroy it when it's done. But I want to keep the artifacts around, so destroying the job when it's done running is an issue. I really don't want to write and maintain my own extension.
This should be pretty straightforward to do with Jenkins without requiring any plugins, though it depends on the type of SCM that you use.
It's worth upgrading from Hudson in any case; there have certainly been improvements to the features required to support your use case in the many releases since becoming Jenkins.
You want to pass the repo path as a parameter to your build, so you should select the "This build is parameterized" option in the build config. There you can add a string parameter called REPO_PATH or similar.
Next, where you specify where code is checked-out from, replace the path with ${REPO_PATH}.
If you are checking out the code — or otherwise need access to the repo path — from a script, the variable will automatically be added to your environment, so you can refer to ${REPO_PATH} from your shell script or Ant file.
At this point, when pressing Build Now, you will be prompted to enter a repo path before the build will start. As mentioned in the wiki page above, you can call the buildWithParameters URL to start a build directly with the desired parameter, e.g. http://server/job/myjob/buildWithParameters?REPO_PATH=foo
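For callers that trigger builds from a script, the URL can be assembled as in the Python sketch below. The function name is hypothetical, and the server, job and parameter names are the ones from the example above. Actually sending the request, and any authentication your server requires, is left out.

```python
from urllib.parse import quote, urlencode


def build_trigger_url(server, job, repo_path):
    """Build a buildWithParameters URL with REPO_PATH safely URL-encoded."""
    query = urlencode({"REPO_PATH": repo_path})
    return f"{server.rstrip('/')}/job/{quote(job)}/buildWithParameters?{query}"


url = build_trigger_url("http://server", "myjob", "foo")
```

Encoding matters once the repo path contains characters such as ":" or "/", which is typical for repository URLs.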
Finally, if you want builds to execute concurrently, Jenkins can manage this for you by creating temporary workspaces for concurrent builds. Just enable the option "Execute concurrent builds if necessary" in your job config.
The artifacts will be available, the same as for any other Jenkins build. You will probably want to manage how many recent artifacts are kept, though; this can be done by checking "Discard Old Builds", and then under Advanced… you can enter a value for "Max # of builds to keep with artifacts".
Is it possible to configure Hudson to save a copy of all the files per build? That is, every time a build is triggered, it grabs the files from the repository, stores them in a directory, and builds there. Then, when another build is triggered, it grabs the files from the repository and stores them in a different directory, keeping the build copies separate instead of updating the same copy over and over again?
You can always use the archive plugin and set the filter to include as much as you want or you can use the clone workspace plugin. I don't see too much value in keeping all files, except if you want to run tests on the code that are so time consuming that you want to give a first feedback after the build and afterward run the tests in a separate job.