How do I retrieve the JSON representation of an Azure Data Factory pipeline?

I want to track pipeline changes in source control, and I'm looking for a way to programmatically retrieve the JSON representation from ADF.
The .NET routines return the objects, but sadly ToString() does not return JSON (wouldn't THAT be convenient?), so right now I'm looking at copying the JSON down by hand (shoot me now!), or possibly trying to recreate the JSON from the .NET objects (shoot me later!).
Please tell me I'm being dense and there is an obvious way to do this.

You can serialize the object using Newtonsoft Json.NET.
See https://azure.microsoft.com/en-us/documentation/articles/data-factory-create-data-factories-programmatically/ for how to connect via the ADF SDK.
// Authenticate against Azure AD and create the Data Factory management client
var aadTokenCredentials = new TokenCloudCredentials(ConfigurationManager.AppSettings["SubscriptionId"], GetAuthorizationHeader());
var resourceManagerUri = new Uri(ConfigurationManager.AppSettings["ResourceManagerEndpoint"]);
var manager = new DataFactoryManagementClient(aadTokenCredentials, resourceManagerUri);

// Fetch the pipeline and serialize its definition to indented JSON
var pipeline = manager.Pipelines.Get(resourceGroupName, dataFactoryName, pipelineName);
var pipelineAsJson = JsonConvert.SerializeObject(pipeline.Pipeline, Formatting.Indented);
I was expecting something more complex, but looking at the SDK source on GitHub, it is not doing anything special.
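To track the definitions in source control, you could then write the serialized JSON straight into a local clone of your repository. A minimal sketch building on the snippet above (the output folder is purely illustrative, and System.IO is assumed to be imported):
// Write the serialized pipeline definition into a local git working copy so it can be committed.
// The folder path is illustrative - point it at wherever your repository lives.
var outputPath = Path.Combine(@"C:\source\adf-definitions", pipelineName + ".json");
File.WriteAllText(outputPath, pipelineAsJson);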

Our team has a deployment tool that takes git changes and deploys them appropriately. Everything is done asynchronously and is controlled and versioned through git.
In a nutshell, our deployment has the following flow:
1. Any completed git merge request triggers a VSO build. This simply builds the whole solution via MSBuild.
2. Every successful build is tagged in git to track the Last Known Good.
3. Next (if the build succeeded), our .NET ADFPublisher takes only the changed data factory files and asynchronously publishes them based on their git operation (modified, added, deleted, etc.) - see the sketch below.
4. For some failure cases our ADFPublisher will perform a retry.
This whole process (build + publish) takes ~65 seconds and has already saved us from several bugs. It also allows us to move definitions from one environment to another very easily.
Let me know if you think this is something that you would be interested in, and I will set up a way to share it with you.
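For illustration only, the publish step could look roughly like the sketch below. This is not the actual ADFPublisher code; it reuses the DataFactoryManagementClient from the answer above, and the exact CreateOrUpdate/Delete signatures should be verified against your SDK version:
// Hypothetical dispatcher: publish a changed pipeline definition based on its git operation.
// Assumes the Microsoft.Azure.Management.DataFactories client shown earlier; verify the
// CreateOrUpdate/Delete calls against the SDK version you are using.
void PublishPipelineChange(DataFactoryManagementClient client, string resourceGroup,
                           string dataFactory, string gitOperation, Pipeline pipeline)
{
    if (gitOperation == "delete")
    {
        // The pipeline file was deleted in git, so remove it from the data factory
        client.Pipelines.Delete(resourceGroup, dataFactory, pipeline.Name);
    }
    else
    {
        // "add" or "modify": create or update the pipeline from the definition in git
        client.Pipelines.CreateOrUpdate(resourceGroup, dataFactory,
            new PipelineCreateOrUpdateParameters { Pipeline = pipeline });
    }
}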

Related

Can you preview ASP.NET Core's appsettings.json environment overrides?

In ASP.NET Core, the JsonConfigurationProvider will load configuration from appsettings.json, and then will read in the environment version, appsettings.{Environment}.json, based on what IHostingEnvironment.EnvironmentName is. The environment version can override the values of the base appsettings.json.
Is there any reasonable way to preview what the resulting overridden configuration looks like?
Obviously, you could write unit tests that explicitly test that elements are overridden to your expectations, but that would be a very laborious workaround that needs upkeep every time you change a setting. It's not a good solution if you just want to validate that you didn't misplace a bracket or misspell an element name.
Back in ASP.NET's web.config transforms, you could simply right-click on a transform in Visual Studio and choose "Preview Transform". There are also many other ways to preview an XSLT transform outside of Visual Studio. Even for web.config parameterization with Parameters.xml, you could at least execute Web Deploy and review the resulting web.config to make sure it came out right.
There does not seem to be any built-in way to preview appsettings.{Environment}.json's effects on the base file in Visual Studio. I haven't been able to find anything outside of VS to help with this either. JSON overriding doesn't appear to be all that commonplace, even though it is now an integral part of ASP.NET Core.
I've figured out you can achieve a preview with Json.NET's Merge function after loading the appsettings files into JObjects.
Here's a simple console app demonstrating this. Provide it the path to where your appsettings files are and it will emit previews of how they'll look in each environment.
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class Program
{
    static void Main(string[] args)
    {
        string targetPath = @"C:\path\to\my\app";

        // Parse appsettings.json
        var baseConfig = ParseAppSettings($@"{targetPath}\appsettings.json");

        // Find all appsettings.{env}.json files
        var regex = new Regex(@"appsettings\..+\.json");
        var environmentConfigs = Directory.GetFiles(targetPath, "*.json")
            .Where(path => regex.IsMatch(path));

        foreach (var env in environmentConfigs)
        {
            // Parse appsettings.{env}.json
            var transform = ParseAppSettings(env);

            // Clone baseConfig first, since Merge mutates the object it is called on
            var result = (JObject)baseConfig.DeepClone();

            // Merge the two, making sure to overwrite arrays rather than concatenate them
            result.Merge(transform, new JsonMergeSettings
            {
                MergeArrayHandling = MergeArrayHandling.Replace
            });

            // Write the preview to file
            string dest = $@"{targetPath}\preview-{Path.GetFileName(env)}";
            File.WriteAllText(dest, result.ToString());
        }
    }

    private static JObject ParseAppSettings(string path)
        => JObject.Load(new JsonTextReader(new StreamReader(path)));
}
While this is no guarantee that some other config source won't override these once deployed, it will at least let you validate that the interactions between these two files are handled correctly.
There's not really a way to do that, but a bit of background on how this actually works should help you understand why.
With config transforms, there was literal file modification, so it's easy enough to "preview" that, showing the resulting file. The config system in ASP.NET Core is completely different.
It's basically just a dictionary. During startup, each registered configuration provider is run in the order it was registered. The provider reads its configuration source, whether that be a JSON file, system environment variables, command line arguments, etc. and builds key-value pairs, which are then added to the main configuration "dictionary". An "override", such as appsettings.{environment}.json, is really just another JSON provider registered after the appsettings.json provider, which obviously uses a different source (JSON file). Since it's registered after, when an existing key is encountered, its value is overwritten, as is typical for anything being added to a dictionary.
In other words, the "preview" would be the completed configuration object (dictionary), which is composed of a number of different sources, not just these JSON files. Things like environment variables or command line arguments will override even the environment-specific JSON (since they're registered after it), so you still wouldn't technically know whether the environment-specific JSON applied or not, because the value could be coming from another source that overrode it.
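To make the ordering concrete, here is a minimal sketch of a typical provider registration using the standard Microsoft.Extensions.Configuration builders (environmentName and args are assumed to be in scope):
// Providers are applied in registration order; a later provider overwrites keys from earlier ones.
var config = new ConfigurationBuilder()
    .SetBasePath(Directory.GetCurrentDirectory())
    .AddJsonFile("appsettings.json", optional: false)
    .AddJsonFile($"appsettings.{environmentName}.json", optional: true) // environment-specific override
    .AddEnvironmentVariables()   // overrides both JSON files
    .AddCommandLine(args)        // overrides everything registered before it
    .Build();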
You can use the GetDebugView extension method on the IConfigurationRoot with something like
app.UseEndpoints(endpoints =>
{
    if (env.IsDevelopment())
    {
        endpoints.MapGet("/config", ctx =>
        {
            var config = (Configuration as IConfigurationRoot).GetDebugView();
            return ctx.Response.WriteAsync(config);
        });
    }
});
However, doing this can pose a security risk, as it will expose all of your configuration, including things like connection strings, so you should enable it only in development.
You can refer to this article by Andrew Lock to understand how it works: https://andrewlock.net/debugging-configuration-values-in-aspnetcore/

'import' a cujojs/wire context into another

I'm looking for a way to realize the following use-case:
1. I have many modules, and each one of them has a wire spec that exposes its components.
2. To assemble an application, I select the modules and use their wire specs.
3. The wire spec of the application is the merge of the wire specs of the used modules: (3.1) I start by 'requiring' the wire spec of each module as an object. (3.2) Then I merge the objects. (3.3) Finally, I return the result as the object defining the wire spec of the application.
Here is a sample of an application context-spec:
define(["jquery", "module1-wire-spec", "module2-wire-spec"], function(jquery, module1WireSpec, module2WireSpec) {
return jquery.extend(true, module1WireSpec, module2WireSpec);
});
I have read the wire documentation several times hoping to find a 'native' way to do the above, but so far I have failed to find one.
A 'native' way would be a factory like the 'wire' factory, but instead of creating a child context for each module, I want the components of each module to appear as direct components of the application context.
Spring, for instance, allows importing a context definition into another one and the result is as if the content of the imported context has been inlined with the importing context.
A new feature has been added to cujojs/wire to allow import of contexts.
As of version 0.10.8, the keyword imports accepts:
a string for a single context import,
or an array for a list of context imports.
Check here for more details.

spring batch: Dump a set of queries over a database in parallel to flat files

So my scenario, drilled down to its essence, is as follows:
Essentially, I have a config file containing a set of SQL queries whose result sets need to be exported as CSV files.
Since some queries may return billions of rows, and because something may interrupt the process (bug, crash, ...), I want to use a framework such as Spring Batch, which gives me restartability and job monitoring.
I am using a file-based H2 database for persisting Spring Batch jobs.
So, here are my questions:
Upon creating a Job, I need to provide my RowMapper with some initial configuration. So what happens when a job needs to be restarted after, e.g., a crash? Concretely:
Is the state of the RowMapper automatically persisted, so that upon restart Spring Batch will try to restore the object from its database, or
will the RowMapper object that is part of the original Spring Batch XML config file be used, or
do I have to maintain the RowMapper's state using the step's/job's ExecutionContext?
The above question is related to whether there is magic going on when using the Spring Batch XML configuration, or whether I could just as well create all these beans programmatically:
Since I need to parse my own config format into a Spring Batch job config, I would rather just use Spring Batch's Java classes (beans) and fill them out appropriately than attempt to manually write out valid XML. However, if my Job crashes, I would create all the beans myself again. Does Spring Batch automagically restore the Job state from its database?
If I really need XML, is there a way to serialize a Spring Batch JobRepository (or one of those objects) as a Spring Batch XML config?
Right now, I have tried to configure my Step with the following code - but I am unsure if this is the proper way to do it:
Is TaskletStep the way to go?
Is the way I create the chunked reader/writer correct, or is there some other object which I should use instead?
I would have assumed that opening the reader and writer would occur automatically as part of the JobExecution, but if I don't open these resources prior to running the Job, I get an exception telling me that I need to open them first. Maybe I need to create some other object that manages the resources (JDBC connection and file handle)?
JdbcCursorItemReader<Foobar> itemReader = new JdbcCursorItemReader<Foobar>();
itemReader.setSql(sqlStr);
itemReader.setDataSource(dataSource);
itemReader.setRowMapper(rowMapper);
itemReader.afterPropertiesSet();
ExecutionContext executionContext = new ExecutionContext();
itemReader.open(executionContext);
FlatFileItemWriter<String> itemWriter = new FlatFileItemWriter<String>();
itemWriter.setLineAggregator(new PassThroughLineAggregator<String>());
itemWriter.setResource(outResource);
itemWriter.afterPropertiesSet();
itemWriter.open(executionContext);
int commitInterval = 50000;
CompletionPolicy completionPolicy = new SimpleCompletionPolicy(commitInterval);
RepeatTemplate repeatTemplate = new RepeatTemplate();
repeatTemplate.setCompletionPolicy(completionPolicy);
RepeatOperations repeatOperations = repeatTemplate;
ChunkProvider<Foobar> chunkProvider = new SimpleChunkProvider<Foobar>(itemReader, repeatOperations);
ItemProcessor<Foobar, String> itemProcessor = new ItemProcessor<Foobar, String>() {
    /* Custom implementation */ };
ChunkProcessor<Foobar> chunkProcessor = new SimpleChunkProcessor<Foobar, String>(itemProcessor, itemWriter);
Tasklet tasklet = new ChunkOrientedTasklet<Foobar>(chunkProvider, chunkProcessor); //new SplitFilesTasklet();
TaskletStep taskletStep = new TaskletStep();
taskletStep.setName(taskletName);
taskletStep.setJobRepository(jobRepository);
taskletStep.setTransactionManager(transactionManager);
taskletStep.setTasklet(tasklet);
taskletStep.afterPropertiesSet();
job.addStep(taskletStep);
Most of your questions are really complex, and it is difficult to give a good answer without writing a long paper.
I'm new to Spring Batch like you, and I found a lot of really useful info - and all the answers to your questions - by reading Spring Batch in Action: it's complete, well explained, full of examples, and covers all aspects of the framework (reader/writer/processor, job/tasklet/chunk lifecycle/persistence, tx/resource management, job flow, integration with other services, partitioning, restarting/retry, failure management, and a lot of other interesting things).
Hope this helps.

Logging different project libraries, with a single logging library

I have a project in Apps Script that uses several libraries. The project needed a more complex logger (logging levels, color coding), so I wrote one that outputs to a Google Doc. All is fine and dandy if I immediately print the output to the Google Doc, importing the logger in all of the libraries separately. However, I noticed that when doing a lot of logging it takes much longer than without. So I am looking for a way to write all of the output in a single go at the end, when the main script finishes.
This would require either:
Being able to define the logging library once (in the main file) and somehow accessing this in the attached libs. I can't seem to find a way to get the main project's closure from within the libraries though.
Some sort of singleton logger object. Not sure if this is possible from within a library; I have trouble figuring it out either way.
Extending the built-in Logger to suit my needs, not sure though...
My project looks as follows:
Main Project
Library 1
Library 2
Library 3
Library 4
This is how I use my current logger:
var logger = new BetterLogger(/* logging level */);
logger.warn('this is a warning');
Thanks!
Instead of writing to the file at each logged message (which is the source of your slowdown), you could write your log messages to the Logger library's ScriptDB instance and add a .write() method to your logger that outputs the messages in one go. Your logger constructor can take a messageGroup parameter which can serve as a unique identifier for the lines you would like to write. This would also allow you to use different files for the logging output.
As you build your messages into proper output to write to the file (don't write each line individually - batch operations are your friend), you might want to remove the messages from ScriptDB. However, it might also be a nice place to pull old logs back from.
Your message object might look something like this:
{
  message: "My message",
  color: "red",
  messageGroup: "groupName",
  level: 25,
  timeStamp: new Date().getTime(), // ScriptDB won't take date objects natively
  loggingFile: "Document Key"
}
The query would look like:
var db = ScriptDb.getMyDb();
var results = db.query({messageGroup: "groupName"}).sortBy("timeStamp",db.NUMERIC);

MEF: "Unable to load one or more of the requested types. Retrieve the LoaderExceptions for more information"

Scenario: I am using Managed Extensibility Framework to load plugins (exports) at runtime based on an interface contract defined in a separate dll. In my Visual Studio solution, I have 3 different projects: The host application, a class library (defining the interface - "IPlugin") and another class library implementing the interface (the export - "MyPlugin.dll").
The host looks for exports in its own root directory, so during testing, I build the whole solution and copy Plugin.dll from the Plugin class library bin/release folder to the host's debug directory so that the host's DirectoryCatalog will find it and be able to add it to the CompositionContainer. Plugin.dll is not automatically copied after each rebuild, so I do that manually each time I've made changes to the contract/implementation.
However, a couple of times I've run the host application without having copied (an updated) Plugin.dll first, and it has thrown an exception during composition:
Unable to load one or more of the requested types. Retrieve the LoaderExceptions for more information
This is of course due to the fact that the Plugin.dll it's trying to import from implements a different version of IPlugin, where the property/method signatures don't match. Although it's easy to avoid this in a controlled and monitored environment, by simply avoiding (duh) obsolete IPlugin implementations in the plugin folder, I cannot rely on such assumptions in the production environment, where legacy plugins could be encountered.
The problem is that this exception effectively botches the whole Compose action and no exports are imported. I would have preferred that the mismatching IPlugin implementations are simply ignored, so that other exports in the catalog(s), implementing the correct version of IPlugin, are still imported.
Is there a way to accomplish this? I'm thinking either of several potential options:
There is a flag to set on the CompositionContainer ("ignore failing imports") prior to or when calling Compose
There is a similar flag to specify on the <ImportMany()> attribute
There is a way to "hook" on to the iteration process underlying Compose(), and be able to deal with each (failed) import individually
Using strong name signing to somehow only look for imports implementing the current version of IPlugin
Ideas?
I have also run into a similar problem.
If you are sure that you want to ignore such "bad" assemblies, then the solution is to call AssemblyCatalog.Parts.ToArray() right after creating each assembly catalog. This will trigger the ReflectionTypeLoadException which you mention. You then have a chance to catch the exception and ignore the bad assembly.
When you have created AssemblyCatalog objects for all the "good" assemblies, you can aggregate them in an AggregateCatalog and pass that to the CompositionContainer constructor.
This issue can be caused by several factors (any exception in the loaded assemblies). As the exception says, look at the LoaderExceptions property to (hopefully) get some idea of what went wrong.
Another problem/solution that I found: when using DirectoryCatalog, if you don't specify the second parameter, searchPattern, MEF will load ALL the dlls in that folder (including third-party ones) and start looking for export types, which can also cause this issue. A solution is to have a naming convention for all the assemblies that export types and specify that in the DirectoryCatalog constructor. I use *_Plugin.dll, so that MEF only loads assemblies that contain exported types.
In my case MEF was loading an NHibernate dll and throwing an assembly version error in the LoaderExceptions (this can happen with any of the dlls in the directory); this approach solved the problem.
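A minimal sketch of that search-pattern approach (the folder variable and naming convention are just examples):
// Only assemblies matching the plugin naming convention are loaded and inspected by MEF,
// so unrelated third-party dlls in the same folder are never touched.
var catalog = new DirectoryCatalog(pluginFolderPath, "*_Plugin.dll");
var container = new CompositionContainer(catalog);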
Here is an example of the above-mentioned methods:
var di = new DirectoryInfo(Server.MapPath("../../bin/"));
if (!di.Exists) throw new Exception("Folder does not exist: " + di.FullName);
var dlls = di.GetFileSystemInfos("*.dll");
AggregateCatalog agc = new AggregateCatalog();
foreach (var fi in dlls)
{
    try
    {
        var ac = new AssemblyCatalog(Assembly.LoadFile(fi.FullName));
        var parts = ac.Parts.ToArray(); // throws ReflectionTypeLoadException if the assembly is bad
        agc.Catalogs.Add(ac);
    }
    catch (ReflectionTypeLoadException ex)
    {
        // Log and skip the offending assembly instead of failing the whole composition
        Elmah.ErrorSignal.FromCurrentContext().Raise(ex);
    }
}
CompositionContainer cc = new CompositionContainer(agc);
_providers = cc.GetExports<IDataExchangeProvider>();