How to create Synthea demographic county files (JSON)?

I am using Synthea to generate synthetic populations using the default county demographics for Massachusetts.
How are these json config files created? Specifically, how could I create them for a different US State?

First, you need to obtain several publicly available data files and place them into your ./resources folder.
I think you answered some of your own question over in this GitHub issue, but in case someone finds the question here, you need four different files:
“subcounty population estimates for towns and cities”
https://www.census.gov/data/datasets/2016/demo/popest/total-cities-and-towns.html
“county population estimates by age, gender, race, ethnicity”
https://www.census.gov/data/datasets/2016/demo/popest/counties-detail.html
“income data”
https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_15_5YR_S1901&prodType=table
“Education data”
https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_15_5YR_S1501&prodType=table
Once you obtain the files for the US State you are interested in, follow these steps (a consolidated command sketch appears after the list):
Copy the files into your ./resources folder.
Run git checkout other_usa_states. The ability to process other states is not currently in the master branch on GitHub.
Run bundle exec rake synthea:census. This will process the files in ./resources (you may need to rename the files) and generate the JSON configuration files into ./config.
Run bundle exec rake synthea:generate[XXX.json], where XXX.json is the name of the county file within the ./config folder that you want to generate.
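Put together, the end-to-end run looks roughly like the sketch below. The census file names here are placeholders for whatever you downloaded; the rake task may expect specific names, so rename as needed:
# Placeholder file names for the four census downloads
cp ~/Downloads/subcounty_population_estimates.csv ./resources/
cp ~/Downloads/county_population_by_age_gender_race.csv ./resources/
cp ~/Downloads/ACS_15_5YR_S1901_income.csv ./resources/
cp ~/Downloads/ACS_15_5YR_S1501_education.csv ./resources/
# Switch to the branch that can process states other than Massachusetts
git checkout other_usa_states
# Build the JSON demographic configs from ./resources into ./config
bundle exec rake synthea:census
# Generate a population for one county config (replace XXX.json with a file from ./config)
bundle exec rake synthea:generate[XXX.json]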
Caveats:
Geospatial lat/lon information will not be generated without additional data.
ZIP codes outside Massachusetts are stubbed to XXXXX.
Hospital and healthcare facilities will seem odd, since only organizations in Massachusetts are included by default; it will look like everyone in town travels across the country to see their doctor in Massachusetts.
Those issues will be fixed in the future.

How will DAML code affecting two distinct DA Nodes be deployed, and how will its integrity be maintained?

I am looking for DA's recommendations/best practices for writing and deploying DAML code and artifacts (.daml and .dar) in production-grade solutions.
Let us take a scenario: a Central Authority (CA) operating a node may issue a new role under a contract to Participant 1 (P1) by writing simple DAML code. Below are a few questions related to DAML deployment:
a. Assuming the DAML code is written by the CA, can we say that only the CA needs to have this code and its build on its node, and that the CA will simply execute the contract workflow, allowing the party on the P1 node to accept/reject the role without having to know the content of the DAML code (business logic and other contract templates) written by the CA?
b. Will the DAML code file (.daml) written by the CA node need to be transmitted to the Participant 1 (P1) node so that P1 can verify and agree with the DAML code (contract templates, parties and choices) and put the code and its build (.dar) onto its node as well?
c. If the answer to the above question is yes, how will the integrity of the DAML code be maintained? For example, what if the DAML code is changed by P1 or the CA at the time of deployment, which may cause a conflict later?
The contract model, in the form of a .dar file, has to be supplied to all nodes that participate in the workflows modeled in that .dar file.
A .dar file can contain multiple DAML "packages", and each package is identified by its name and a hash.
On the ledger, contract types (called templates) are fully qualified, including the package hash. If you change your templates, the package hash changes, and thus the new templates are seen by the ledger as completely different from the old ones.
To change an existing contract model, you have to upgrade existing contracts using a DAML workflow. Of course, all signatories of the existing contracts need to agree to the upgrade workflow. You can only unilaterally upgrade data that you are in full control of. In the cryptocurrency world, you can consider all miners as signatories: either they all agree on an upgrade, or they hard fork, leading to two slightly different models of the same currency.
This sort of model upgrade process in DAML is described in detail here: https://github.com/digital-asset/ex-upgrade
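To make the integrity question (c) concrete: both nodes can check that they hold exactly the same model before deploying it. A minimal sketch, assuming a recent DAML SDK with the daml assistant installed and a placeholder project name my-model:
# Build the model; by default the .dar is written under .daml/dist/
daml build
# List the packages (names and package hashes) contained in the .dar
daml damlc inspect-dar .daml/dist/my-model-1.0.0.dar
# As a coarser check, CA and P1 can compare a checksum of the exact file they deployed;
# any change to the code yields a different hash
sha256sum .daml/dist/my-model-1.0.0.dar
Because the ledger identifies templates by package hash, a modified copy of the code produces contracts of a different type rather than silently replacing the original model.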

What is the difference between Composer Network and Composer Identity, and also between registries and networks?

In Hyperledger Composer network permissions, we always come across these terms:
Composer Network and Composer Identity.
Network access and Business access.
Registries and Networks.
What is the difference between them? As far as I know, registries are for participants and assets; you can access participants and assets using registries and define permissions.
For permissions, you can read about the ACLs here -> https://hyperledger.github.io/composer/reference/acl_language.html
'Composer Network' represents the business network entity. 'Composer Identity' refers to a specific blockchain identity that is mapped to a single Participant, defined in a Participant Registry contained within the business network in question.
Registries maintain a particular type of view of an Asset or Participant. Registries are also maintained by Composer for Identities and for historical transactions. They allow someone in that business network (given the right authority) to see the current state and history of the ledger, and Registries classify that much like a database table might do. The level of detail depends on what is required, e.g. Backroom Traders (Participant), Front Office Traders (Participant), Metal Commodities (Asset), Agricultural Commodities (Asset), and so on, or these could simply be rolled up as 'Traders' (Participant) and 'Commodities' (Asset) types if less detail is required. The salient point is that you store Participant or Asset instances in their respective type registries.
See the tutorials for examples of Assets and Participants in action:
https://hyperledger.github.io/composer/tutorials/queries.html
https://hyperledger.github.io/composer/tutorials/developer-guide.html
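To illustrate how registries are used from application code, here is a minimal Node.js sketch against the composer-client API; the business network card name admin@trading-network and the fully qualified type names org.example.trading.Trader and org.example.trading.Commodity are placeholders for whatever your own business network defines:
// Minimal sketch: read participants and assets from their type registries
const { BusinessNetworkConnection } = require('composer-client');

async function listTradersAndCommodities() {
  const connection = new BusinessNetworkConnection();

  // 'admin@trading-network' is a placeholder business network card name
  await connection.connect('admin@trading-network');

  // Each modelled type has its own registry, much like a database table
  const traderRegistry = await connection.getParticipantRegistry('org.example.trading.Trader');
  const commodityRegistry = await connection.getAssetRegistry('org.example.trading.Commodity');

  const traders = await traderRegistry.getAll();
  const commodities = await commodityRegistry.getAll();
  console.log('Found ' + traders.length + ' traders and ' + commodities.length + ' commodities');

  await connection.disconnect();
}

listTradersAndCommodities().catch(err => { console.error(err); process.exit(1); });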

SpamAssassin: Site Wide Bayes not working?

Long ago I implemented site-wide bayes filtering as per http://wiki.apache.org/spamassassin/SiteWideBayesSetup.
I don’t think it ever worked, and I certainly find that my spam scores are always negative, with BAYES_00 suggesting that Bayes wasn’t used at all.
Here is what I have in my local.cf file:
bayes_path /etc/mail/spamassassin/bayes/bayes
bayes_file_mode 0777
When I run sa-learn I find instead that the tokens are stored in individual home directories.
What is the correct method to get this working?
Supplementary question: if I can get this working, can I combine the various bayes_toks and other files?
If you get BAYES_00 results, then Bayes is indeed working, as it has classified the email as ham. A neutral result would be BAYES_50. You just need to train the Bayes database properly.
If sa-learn creates/updates Bayes files under your home directory, then it is either not reading the desired local.cf file, or bayes_path is being overridden by a user-specific configuration file (e.g. /root/.spamassassin/user_prefs).
You could try one of the following (a concrete sketch follows the list):
run sa-learn under the same user account that spamassassin runs as
specify an explicit database path to sa-learn, i.e.
sa-learn --dbpath /etc/mail/spamassassin/bayes/bayes
use the -D option to see what is really going on, i.e. which configuration files are being read, etc.
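For example, a minimal training session against the site-wide database might look like this; the maildir paths are placeholders, and passing --dbpath explicitly bypasses any per-user configuration that might be overriding bayes_path:
# Train as root (or whichever user spamassassin/spamd runs as),
# pointing sa-learn explicitly at the site-wide database
sa-learn --spam --dbpath /etc/mail/spamassassin/bayes/bayes /path/to/spam-maildir
sa-learn --ham --dbpath /etc/mail/spamassassin/bayes/bayes /path/to/ham-maildir
# Check token and message counts to confirm the right database was updated
sa-learn --dump magic --dbpath /etc/mail/spamassassin/bayes/bayes
# Debug output shows which configuration files are read and which bayes_path wins
spamassassin -D --lint 2>&1 | grep -i bayes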
If/when you get it working, you generally cannot combine the various database files. There are at least a bayes_toks and a bayes_seen file: one contains the learned tokens, and the other holds email Message-IDs and the associated training status (spam/ham). There can also be an optional bayes_journal if you use deferred syncing.
Further details available in the manpage for sa-learn:
https://spamassassin.apache.org/full/3.4.x/doc/sa-learn.html

How does Google Drive store files?

When I upload a file to Google Drive, does Google segment that file into smaller pieces and upload them to different servers? For example, if I upload fileA to my drive, does Google divide fileA into smaller chunks and store them on different servers serverA, serverB, etc.?
Or is a given file uploaded to one particular server without being broken into smaller chunks? For example, if I upload fileA, will it be stored on serverA or some other server, without segmentation?
Or are all the files I upload to my drive stored on one particular server?
The service hosted by Google is supported by GFS, the Google File System.
I was not sure until I spotted this post from someone who appears to be a Google employee.
Edit:
The fact that Google uses GFS is actually an inference based on how well the file system fits the model of Google Drive. An article published on Ars Technica describes this in more detail.
Of course, no one except former and current Google employees knows what is behind it, but the fact that someone who claims to be one of them specifically pointed the OP of the question I linked to GFS is another strong reason to believe that GFS (and maybe BigTable) is backing Google Drive.
Edit 2:
As I was seeing downvotes, and since the topic actually interests me a lot, I decided to enrich this answer with the following arguments:
1) Google's infrastructure strategy is to build clusters of inexpensive commodity hardware, designed with the expectation that it will fail. This was one of the motivations behind GFS.
2) The observation that GFS fits Google's infrastructure, services and internal software so well (MapReduce being an important actor among them) has led other people to conclude that it backs pretty much all of their services. See:
http://www.slideshare.net/hasanveldstra/the-anatomy-of-the-google-architecture-fina-lv11
Also, an interview with Jeff Dean backs up this intuition and explains 1) better.
3) This doesn't prove anything, but I found it amusing: some users actually ended up seeing the .gfs extension in Drive (it would be unexpected, though somewhat unfortunate, if that .gfs actually referred to MS Groove files).
I don't think we'll ever see an actual former/current employee validate this statement, and of course many of the details of what backs Drive will stay hidden, but the intuition has strong roots.

How do I merge 3 files to create panel data?

I have three different Stata files (one for each of three years), and I want to estimate a fixed-effects regression. My guess is that I need to merge those files in order to run my regression, but how do I do that? And how do I add the time identifier for the same variables in each of these files?
Typically, you don't merge such files (putting them side by side) but append them (stacking them on top of one another). Usually the year or wave variable is already included, but when it is not, you need to generate it before appending the files. For more, just type help merge, help append, and help generate in Stata.
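For instance, here is a minimal Stata sketch, assuming hypothetical file names year2010.dta, year2011.dta and year2012.dta, a panel identifier id, and variables y and x; adjust the names to your data:
* Add a year variable to each file (skip this if one is already included)
use year2010, clear
generate year = 2010
save year2010_tagged, replace

use year2011, clear
generate year = 2011
save year2011_tagged, replace

use year2012, clear
generate year = 2012
save year2012_tagged, replace

* Stack the three years into one long (panel) dataset
use year2010_tagged, clear
append using year2011_tagged year2012_tagged

* Declare the panel structure and estimate the fixed-effects model
xtset id year
xtreg y x, fe
Once the data are in long form and xtset is declared, xtreg with the fe option gives the within (fixed-effects) estimates.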
Preparing datasets should be exactly documented, so using the GUI is not the way to do this. Instead, you should do it in a .do file. For a good introduction on how to do good and reproducible research with Stata, see:
Long, J. S. (2009). The workflow of data analysis using Stata. College Station, TX: Stata Press.