Is there a preprocessor for json files? [closed] - json

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
Improve this question
I have some configuration files that I store the complex object values as serialized json. Currently there is a configuration file for each environment (localhost, dev, prod etc.) and for each installation by client. Most of the values are identically for the configurations between environments but not all. So for three environments and four clients I currently have 12 total files to manage.
If this were a web.config file there would be web.config transforms that would solve the problem. If this was c# I'd have compiler preprocessor directives that could be useed to substitute the different values based on the current build configuration.
Does anyone know of anything that works basically this way or have some good suggestion on tried and true ways to proceed? What I would like is to reduce the number of files down to a single instance for each installation that can suffice for each environment.

Configuration of configuration always seems a bit overdone to me, but you could use a properties file for the parts that change, and apache ant's <replace> task to do the substitutions. Something like this:
<replace
file="configure.json"
propertyFile="config-of-config.properties">
<replacefilter
token="#token1#"
property="property.key"/>
</replace>

Jsonnet from Google is a language that with with a super-set syntax based on JSON, adding high level language features that help to model data in JSON fromat. The compilation step produces JSON. I have used it in a project to describe complex deployment environments that inherit from one another at times and that share domain attributes albeit utilizing them differently from one instance to another.
As an example, an instance contains applications, tenant subscriptions for those applications, contracts, destinations and so forth. The values for all of these attributes are objects the recur throughout environments.
Their docs are very thorough and don't miss the std functions because they make for some very powerful data rendering capabilities.

I wrote a Squirrelistic JSON Preprocessor which uses Golang Text Templates syntax to generate JSON files based on parameters provided.
JSON template can include reference to other templates, use conditional logic, comments, variables and everything else which Golang Text templates package provides.

This really comes down to your full stack.
If you're talking about some application that runs solely client-side, with no server-side processing, whatsoever, then there's really no such thing as pre-processing.
You can process the data further before actually using it, but that won't mean that it will be processed prior to the page being served -- it means that people have to sit around, waiting for that to happen before the apps which need that data can be initialized.
The benefit of using JSON, to begin with is that it's just a data-store, and is quite language-agnostic, and quite widely supported, now. So if it's not 100% client-side, there's nothing stopping you from pre-processing in whatever language you're using on the server, and caching those versions of those files, to serve (and cache) to users, based on their need.
If you really, really need a system to do live processing of config-files, on the client-side, and you've gone through the work of creating app-views which load early, but show the user that they're deferring initialization (ie: "loading..."/spinners), then download a second JSON file, which holds all of the needed implementation-specific data (you'll have 12 of these tiny little files, which should be simple to manage), parse both JSON files into JS objects, and extend the large config object with the additional data in the secondary file.
Please note: Use localhost or some other storage facility to cache this, so that for html5-browsers, this longer load only happens one time.

There is one, https://www.npmjs.com/package/json-variables
Conceptually, it is a function which takes a string, JSON contents, sprinkled with specially marked variables and it produces a string with those variables resolved. Same like Sass or Less does for CSS - it's used to DRY up the source code.
Here's an example.
You'd put something like this in JSON file:
{
"firstName": "customer.firstName",
"message": "Hi %%_firstName_%%",
"preheader": "%%_firstName_%%, look what's inside"
}
Notice how it's DRY — single source of truth for the firstName value.
json-variables would process it into:
{
"firstName": "customer.firstName",
"message": "Hi customer.firstName",
"preheader": "customer.firstName, look what's inside"
}
that is, Hi %%_firstName_%% would look for firstName at the root level (but equally, it could be a deeper path, for example, data1.data2.firstName). Resolving also "bubbles up" to the root level, also you can use custom data structures and more.
Missing pieces of a JSON-processing task puzzle are:
Means to merge multiple JSON files, various ways (object-merge-advanced)
Means to orchestrate actions — Gulp is good if you're preferred programming language is JS
Means to get/set values by path (object-path - its notation uses dots only, no brackets key1.key2.array.2 instead of key1.key2.array[2])
Means to maintain the same set of keys across set of JSON files - you add a key in one, it's added on all others (object-fill-missing-keys)
In described case, we can do at least two approaches: one-to-many, or many-to-many.
Former - Gulp could be "baking" many JSON files from one or more JSON-like source files, json-variables DRY-ing up the references.
Later - alternatively, it could be "managed" set of JSON files rendered into set of distribution files — Gulp watches src folder, runs object-fill-missing-keys to normalise schemas, maybe even sorting objects (yes, it's possible, sorted-object).
It all depends how similar is the desired set of JSON files and how values are customised and is it done manually or programmatically.

Related

Questionnaire tool to create config files

I have an application that needs a configuration file with several inputs which depend on the project that is going to be delivered. Things that are included in this conf file are IP's of databases, activating certain functions depending on the customer's needs, changing the values of some title screens, etc... A short example of a file could be something like:
postgresdb=192.156.98.98
transactions.enabled=true
application.name="client-1-logistics"
historicaldb=196.125.125.16
....
This files can become large and it might be difficult to find which parameters must be changed, specially if the configuration process has to be done by an external department.
I was looking into some kind of tool or framework that allows you to create some sort of questionnaire by which the user answers yes or no questions and fills out boxes with specific IP's or messages and get as a result the configuration file needed. This would be much tidier as you could group the questions into sections and has the potential of customising the configuration process with more context on the different parameters.
Does anyone know of such a framework?. How do you handle this kind of complex configuration processes?
The approach I outline below is not exactly what you are looking for, but it might provide some food for thought.
Use a template engine (example, Velocity, or any of the
several dozen listed in Wikipedia) to create a templated
version of your configuration file, containing lots of boilerplate
configuration that won't change, with the occasional
${variable_name} placeholder (the syntax for a placeholder will
vary from one template engine to another).
Write a small metadata file containing variable_name=value
settings.
Write a trivial program that: (a) parses the metadata file and loads
the variable_name=value settings into a Map (the template engine
might refer to the Map as, say, a context object); (b) uses the
template engine to parse the template file; (c)
merges/evaluates/instantiates the parsed template file with the settings in
the Map; and (d) writes the result to the target
configuration file.
You might be able to use steps 1 and 3 above without change. It is only step 2 that you need to adapt to your questionnaire requirements. Instead of a questionnaire, perhaps you could give users a document that explains how to write the metadata file.

best practices for writing to a file from multiple methods

I have a class that contains a bunch of methods for checking data I scrape every week (for things like well-formedness and other errors in gathering the data). Each of these methods performs a test, and then prints out a summary of the test.
I want to print out the output from these tests to a file, but I'm not sure what the best way to do it is. For example...
Should the class hold an instance variable to the file, and each method open/appends/closes the file? (A problem is that methods sometimes call other methods, so this seems kinda messy?)
Should each method get passed the file as a parameter? (Seems messy as well.)
Should each method return a string, and a"central" method that calls all the other tests outputs all these strings to a file?
I'm not really familiar with using logger libraries -- would that be a solution?
My particular context
I have a scraper that pulls data from various websites and stores them in a database. Websites change all the time, so I'm writing a "scrape checker" program that checks my scrapes for various things, like:
number of empty results
length of results
weird characters in results
and so on
So I have methods like:
check_num_empty_results
check_weird_characters
check_scrape (calls a bunch of other checks)
check_scrape_pair (sometimes I want to check pairs of scrapes together, e.g., to match results against each other, so this is different checking each one in isolation)
etc.
I want my "scrape checker" program to print out a file that summarizes all the checks.
Separation of concerns. Write code the focuses on the scraping activity and return the value(s) scraped. Then use aspect oriented programming for logging, which can simplify the problem greatly as the aspect holds the reference to the file or logging API.
Ultimately, it depends on what language you're using.
The first solution makes the most sense if your language permits it. For each instance of the logging class, have a field for the file object that you're reading from/writing to. This is basically equivalent to passing the file object as a parameter to every method.
That said, most mature languages have modules that will do a lot of this work for you; off the top of my sh/awk, Perl, and Python all come to mind as being suited to this task (though if you want to, you could use Java or something else).
Seems like a logging framework would be a perfect solution for this. If you are using Java or .NET, log4j and log4net are pretty much the de-facto standards for that.

What are the benefits and drawbacks of using header files? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I had some experience on programming languages like Java, C#, Scala as well as some lower level programming language like C, C++, Objective - C.
My observation is that low level languages try separate out header files and implementation files while other higher level programming language never separate it out. Those languages use some identifiers like public, private, protected to try to do the jobs of header files. C++ also have both identifiers and header files as well
I saw one benefit of using header file (in some book like Code Complete), they talk about that using header files, people can never look at our implementation file and it helps with encapsulation.
A drawback is that it creates too many files for me. Sometimes, it looks like verbose.
It is just my thought and I don't know if there are any other benefits and drawbacks that people ever see and work with header file
This question may not relate directly to programming but I think that if I can understand better about programming to interface, design software.
They allow you to distribute the API of a library so the compiler can compile correct code.
As in C, rather than including the whole implementation, you just include the definition of what is in the library when linked.
In this sense, the benefits are mainly for the compiler. Hence you installing a binary library into say /lib and headers into your include search path. You are saying, at runtime, expect these symbols with this calling convention to be available.
When they are not required by the compiler/linker/interpreter then the convention for that language is the best way to do it because that's what other programmers expect to find. Conventional is expected.
Languages such as C# include the ability to inspect libraries for information from the binary blob, hence in many of these languages you don't require headers. Tools such as Cecil for C# also allow you to inspect a library yourself (and even modify it).
In short, some languages remove the benefits of headers and allow a library to be inspected for all the compile-time information required to ensure linking code meets the same interface/api specs.
I'm not sure exactly what question you are asking, so I will try to rephrase it:
What is the benefit of putting public information in a separate (header or interface) file, as opposed to simply marking information as public or private wherever it appears?
The main benefit of having a separate interface or header file is that it reduces the cognitive load on the reader. If you are a trying to understand a large system, you can tackle one implementation file at a time, and you need to read only the interfaces of the other implementations/classes/modules it depends on. This is a major benefit, and languages that do not require separate interface files (such as Java) or cannot even express interfaces in separate files (such as Haskell) often provide tools such as Doxygen or Haddock so that a separate interface, for people to read, is generated from the implementation.
I strongly prefer languages like Standard ML, Objective Caml, and Modula-2/3, where there is a separate interface file available for scrutiny. Having separate header files in C is also good, but not quite as good because in general, the header files cannot be checked independently by the compiler. (C++ header files are less good because they allow private information, such as private fields or the implementations of inline methods, to leak out into the header files, and so the public information becomes diluted.)
It's folklore in the language-design world that for typical statically typed languages, only about 10% of the information in a module is public (measured by lines of code). By putting this information in a separate header file, you reduce the reader's workload by roughly a factor of ten.
They use some identifiers like public, private, protected to do the jobs of header files.
I think you're wrong there: C++ for instance still has public private and protected, but it's common to split the implementation from the interface with a header file (although that doesn't go for function-templates).
I think it's in general a good idea to seperate interface from implementation when you're creating libraries, since you then never expose the inner workings of anything to the client and thus the client can never deliberately make code that depends on the implemenation. The choice if you want to split it is up to you. I must admit that for my own code (small programs I write for myself), I don't use it often.
In context of c The header file implementation brings lots of readability in our program and it becomes easy to understand. If it is the way to write our code systematically and header file brings abstraction,standardization and loose coupling between our main function file(.c) and other (.c) files which we are using.

Testing and mocking with Flex

I am developing a "dumb" front-end, it's an AIR application that interacts with a "smart" LiveCycle server. There are currently about 20 request & response pairs for the application. For many reasons (testing, developing outside the corporate network, etc), we have several XML files of fake data, and if a certain configuration flag is set, the files are loaded, a specific file is parsed and used to create a mock response. Each XML file is a set of responses for different situation, all internally consistent. We currently have about 10 XML files, each corresponding to different situation we can run into. This is probably going to grow to 30-50 XML files.
The current system was developed by me during one of those 90-hour-week release cycles, when we were under duress because LiveCycle was down again and we had a deadline to meet. Most of the minor crap has been cleaned up.
The fake data is in an object called FakeData, with properties like customerType1:XML, customerType2:XML, overdueCustomer1:XML, etc. Then in the FakeData constructor, all of the properties are set like this:
customerType1:XML = FileUtil.loadXML(File.applicationDirectory.resolvePath("fakeData/customerType1.xml");
And whenever you need some fake data (this happens in special FakeDelegates that extend the real LiveCycle Delegates), you get it from an instance of FakeData.
This is awful, for many reasons, but it works. One embarrassing part is that every time you create an instance of FakeData, it reloads all the XML files.
I'm trying to figure out if there's a design pattern that is not Singleton that can handle this more elegantly. The constraints are:
No global instances can be required (currently, all the code dealing with the fake data, including the fake delegates, is pulled out of production builds without any side-effects, and it needs to stay that way). This puts the Factory pattern out of the running.
It can handle multiple objects using the XML data without performance issues.
The XML files are read centrally so that the other code doesn't have to know where the XML files are, and so some preprocessing can be done (like creating a map of certain tag values and the associated XML file).
Design patterns, or other architecture suggestions, would be greatly appreciated.
Take a look at ASMock which was developed by a good friend of mine (and a member here Richard Szalay) and is based on .nets Rhino mocks. We've used it in several production environments now so i can vouch for it's stability.
should be able to get rid of any fake tests (more like integration tests) by using the mock object instead.
Wouldn't it make more sense to do traditional mocking with a mocking framework? Depending on your implementation, it might be possible to set up the Expects by reading the fake-data XML files.
Here is a Google Code project that offers mocking for ActionScript.

What does “Data is just dumb code, and code is just smart data” mean? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I just came across an idea in The Structure And Interpretation of Computer Programs:
Data is just dumb code, and code is just smart data
I fail to understand what it means. Can some one help me to understand it better?
This is one of the fundamental lessons of SICP and one of the most powerful ideas of computer science. It works like this:
What we think of as "code" doesn't actually have the power to do anything by itself. Code defines a program only within a context of interpretation -- outside of that context, it is just a stream of characters. (Really a stream of bits, which is really a stream of electrical impulses. But let's keep it simple.) The meaning of code is defined by the system within which you run it -- and this system just treats your code as data that tells it what you wanted to do. C source code is interpreted by a C compiler as data describing an object file you want it to create. An object file is treated by the loader as data describing some machine instructions you want to queue up for execution. Machine instructions are interpreted by the CPU as data defining the sequence of state transitions it should undergo.
Interpreted languages often contain mechanisms for treating data as code, which means you can pass code into a function in some form and then execute it -- or even generate code at run time:
#!/usr/bin/perl
# Note that the above line explicitly defines the interpretive context for the
# rest of this file. Without the context of a Perl interpreter, this script
# doesn't do anything.
sub foo {
my ($expression) = #_;
# $expression is just a string that happens to be valid Perl
print "$expression = " . eval("$expression") . "\n";
}
foo("1 + 1 + 2 + 3 + 5 + 8"); # sum of first six Fibonacci numbers
foo(join(' + ', map { $_ * $_ } (1..10))); # sum of first ten squares
Some languages like scheme have a concept of "first-class functions", which means that you can treat a function as data and pass it around without evaluating it until you really want to.
The upshot is that the division between "code" and "data" is pretty much arbitrary, a function of perspective only. The lower the level of abstraction, the "smarter" the code has to be: it has to contain more information about how it should be executed. On the other hand, the more information the interpreter supplies, the more dumb the code can be, until it starts to look like data with no smarts at all.
One of the most powerful ways to write code is as a simple description of what you need: Data which will be turned into code describing how to get you what you need by the interpretive context. We call this "declarative programming".
For a concrete example, consider HTML. HTML does not describe a Turing-complete programming language. It is merely structured data. Its structure contains some smarts that let it control the behavior of its interpretive context -- but not a lot of smarts. On the other hand, it contains more smarts than the paragraphs of text that appear on an average web page: Those are pretty dumb data.
In the context of security: Due to buffer overflows, what you thought of as data and thus harmless (such as an image) can become executed as code and p0wn your machine.
In the context of software development: Many developers are very afraid of "hardcoding" things and very keen on extracting parameters that might have to change into configuration files. This is often based on the idea that config files are just "data" and thus can be changed easily (perhapy by customers) without raising the issues (compilation, deployment, testing) that changing anything in the code would.
What these developers don't realize is that since this "data" influences the behaviour of the program, it really is code; it could break the program and the only reason not to require complete testing after such a change is that, if done correctly, the configurable values have a very specific, well-documented effect and any invalid value or a broken file structure will be caught by the program.
However, what all too often happens is that the config file structure becomes a programming language in its own right, complete with control flow and everything - one that's badly documented, has a quirky syntax and parser and which only the most experienced developers in the team can touch without breaking the application completely.
So, in a language like Scheme, even code is treated as first class data. You can treat functions and lambda expressions much like you treat other code, say passing them into other functions and lambda expressions. I recommend continuing with the text as this will all become quite clear.
This is something you should come to understand from writing in a compiler.
One common step in compilers is to transform the program into an abstract syntax tree. Representation will often be like trees such as [+, 2, 3] where + is the root, and 2, 3 are the children.
Lisp languages simply treats this as its data. So there is no separation between data and code which are both lists that look like AST trees.
Code is definitely data, but data is definitely not always code. Let's take a basic example - customer name. It's nothing to do with code, it's a functional (essential), as opposed to a technical (accidental) aspect of an application.
You could probably say that any technical/accidental data is code and that functional/essential data is not.