MySQL Database Security for Sensitive Data

I am working on enhancing security in our MySQL database. Specifically, the database stores health information for our clients' patients (so-called PHI), and we would like to separate the patients' names and other identifying information from their health data. What would be some approaches to this issue?
I've thought of one idea: maintain one key for tying the various identifying data together, and another key for linking the health information. These would be mapped to one another with a special "coded key" that would be available only when a clinical user is logged in. Does anyone have thoughts on that approach?
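A minimal sketch of that two-key idea, using SQLite in Python as a stand-in for MySQL (all table, column, and key names here are hypothetical): identifying data and health data live in separate tables under unrelated opaque keys, and a mapping table plays the role of the "coded key" that only a clinical session may touch.

```python
import sqlite3

# Minimal sketch of the two-key idea, using SQLite as a stand-in for
# MySQL. All table, column, and key names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Identifying information only, under an opaque identity key.
    CREATE TABLE patient_identity (
        identity_key  TEXT PRIMARY KEY,  -- random, not derived from PII
        full_name     TEXT NOT NULL,
        date_of_birth TEXT NOT NULL
    );

    -- Health data only, under a *different* opaque key, so a breach of
    -- this table alone exposes no identities.
    CREATE TABLE patient_health (
        health_key TEXT PRIMARY KEY,     -- unrelated to identity_key
        diagnosis  TEXT,
        provider   TEXT
    );

    -- The "coded key" mapping. In production this would live in a
    -- separate schema/server with grants limited to the clinical role,
    -- or hold only encrypted pairs decryptable in a clinical session.
    CREATE TABLE key_map (
        identity_key TEXT NOT NULL REFERENCES patient_identity(identity_key),
        health_key   TEXT NOT NULL REFERENCES patient_health(health_key)
    );
""")

conn.execute("INSERT INTO patient_identity VALUES ('id-7f3a', 'Jane Doe', '1980-01-01')")
conn.execute("INSERT INTO patient_health VALUES ('hk-91c2', 'hypertension', 'Dr. Smith')")
conn.execute("INSERT INTO key_map VALUES ('id-7f3a', 'hk-91c2')")

# Only a session with access to key_map can re-join the two halves.
print(conn.execute("""
    SELECT i.full_name, h.diagnosis
    FROM key_map m
    JOIN patient_identity i ON i.identity_key = m.identity_key
    JOIN patient_health   h ON h.health_key   = m.health_key
""").fetchone())  # ('Jane Doe', 'hypertension')
```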

Combining personally identifiable information with health information (dx, symptoms, provider, payment, etc.) is PHI. Here's a more detailed discussion of PHI. PII itself can include all sorts of things.
In terms of protecting PHI, HIPAA is not prescriptive. That's one of the major problems with HIPAA and a reason HITRUST is catching on in the industry. Your reasoning makes a lot of sense from a security standpoint, but ultimately security is different from compliance.
At Catalyze we've been through 2 HIPAA audits and 1 HITRUST audit/assessment, all using 3rd party auditors. We architected our APIs to segment PII and health data similarly to how you described. Our auditors agreed with Ollie that the segmentation was unnecessary but felt it was an additional way to mitigate the risk of a breach of PHI. At the end of the day we treat all data on our platform as PHI and protect it accordingly, so for us it wasn't a matter of segmenting data in order to be compliant. In our final audit reports, segmenting PII from health data did not address specific requirements of HIPAA but did get mentioned as part of our overall security posture.
Hope that helps!

Related

Considerations for binary serializations (Protobuf, CBOR, MessagePack, etc.) for a long-term archive data format

In discussions of a next-generation scientific data format, a need for some kind of JSON-like data structures (logical grouping of fields) has been identified. Additionally, it would be preferable to leverage an existing encoding instead of using a custom binary structure. There are many serialization formats to choose from. Any guidance or insight from those who have experience with these kinds of encodings is appreciated.
Requirements: In our format, data need to be packed in records, normally no bigger than 4096 bytes. Each record must be independently usable. The data must be readable for decades to come. Data archiving and exchange is done by storing and transmitting a sequence of records. Data corruption must only affect the corrupted records, leaving all others in the file/stream/object readable.
Priorities (roughly in order) are:
stability, long term archive usage
performance, mostly read
ability to store opaque blobs
size
simplicity
broad software (aka library) support
stream-ability: records transmitted and readable as they are generated (if possible)
We have started to look at Protobuf (Protocol Buffers), CBOR (RFC 8949), and a bit at MessagePack.
Any information from those with experience that would help us determine the best fit or, more importantly, avoid pitfalls and dead-ends, would be greatly appreciated.
Thanks in advance!
Late answer, but you may want to decide whether you want a schema-based or a self-describing format. The Amazon Ion overview talks about some of the pros and cons of these design decisions, as does this other ION (completely different from Amazon Ion).
Neither of those fully meets your criteria, but these articles should list a few criteria you might want to consider. Obviously, actually being a standard and being widely adopted are far better guarantees of longevity than any technical design criterion.
Your goal of recovering from data corruption is almost certainly something that should be addressed in a separate architectural layer from the encoding of the records. How many records to pack into a blob/file/stream is really more related to how many records you can afford to read through sequentially before finding the one you need.
An optimal solution to storage corruption depends on what kind of corruption you consider likely. For example, if you store data on spinning disks, your best protection might be different from what you would use for tape. But the details are really not an application-level concern; it's better to abstract or outsource them.
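To make that layering concrete, here is a minimal sketch (in Python, with the cbor2 library) of a framing layer that keeps corruption local to one record: each record is a length prefix, a CRC32, and an independently decodable CBOR payload. The framing and names are illustrative assumptions, not a proposed standard.

```python
import io
import struct
import zlib

import cbor2  # third-party CBOR codec (pip install cbor2)

# Illustrative framing, not a proposed standard: each record is a
# 4-byte big-endian length, a 4-byte CRC32, then a CBOR payload that is
# independently decodable. A bad CRC loses only that one record.

def write_record(stream, obj):
    payload = cbor2.dumps(obj)
    assert len(payload) <= 4096, "records are expected to stay small"
    stream.write(struct.pack(">II", len(payload), zlib.crc32(payload)))
    stream.write(payload)

def read_records(stream):
    """Yield (ok, record_or_None); corruption is localized per record."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        length, crc = struct.unpack(">II", header)
        payload = stream.read(length)
        if zlib.crc32(payload) == crc:
            yield True, cbor2.loads(payload)
        else:
            yield False, None  # this record is damaged; keep going

buf = io.BytesIO()
write_record(buf, {"sensor": "t1", "values": [1.5, 2.5]})
write_record(buf, {"sensor": "t2", "values": [3.0]})
buf.seek(0)
for ok, rec in read_records(buf):
    print(ok, rec)
```

A real format would also want a sync marker between records so a reader can resynchronize when the length field itself is corrupted.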
Modern cloud-based data storage services provide extremely robust protection against corruption, measured in the industry as "durability". For example, even Microsoft Azure's lowest-cost storage option, Locally Redundant Storage (LRS), stores at least three copies of any data received and maintains at least that level of protection for as long as you want. If any copy gets damaged, another is made from one of the undamaged ones as soon as possible. That results in an annual durability of 11 nines (99.999999999%), and that's the "low-cost" option at Microsoft. The normal redundancy plan, Geo-Redundant Storage (GRS), offers durability exceeding 16 nines. See Azure Storage redundancy.
According to Wasabi, eleven-nines durability means that if you have 1 million files stored, you might lose one file every 659,000 years. You are about 411 times more likely to be hit by a meteor than to lose a file.
P.S. I previously worked on the Microsoft Azure Storage team, so that's the service I know best. However, I trust that other cloud-storage options (e.g. Wasabi and Amazon's S3) offer similar durability protection; Amazon S3 Standard and Wasabi hot storage are like Azure LRS, with eleven nines of durability. If you are not worried about a meteor strike, you can rest assured that these services won't lose your data anytime soon.

What "other features" could be incorporated into a train database?

This is a mini project for a DBMS course. My task is to develop a database for the management of passenger trains.
I'm designing tables for Customers, Trains, Ticket Booking (via Telephone & Internet), Origins and Destinations.
Our instructor said we are free to incorporate other features into our database model. Some of the features we can include are listed below:
1. Ad-hoc Querying
2. Data Mining
3. Demographic Passenger Mapping
4. Origin and Destination Mapping
I have no clue what these features mean. I know about data mining but am unable to apply it in this context. Can anyone kindly expand on these features or suggest new ideas?
EDIT: What is ad-hoc querying? Please give an example in this context.
Data mining would incorporate extracting useful facts and figures out of the data obtained by your system and stored in the database. For example, data mining might discover that trains between city X and city Y are always 5 minutes late, or are never at more than 50% capacity, etc. So you may wish to develop some tools or scripts that run automatically and generate statistics (graphs are best) which display this information and highlight unusual trends. In the given example, the schedulers could then analyse why the trains are always late (e.g., maybe the train speedos are wrong?).
Both points 3 and 4 are a subset of data mining, IMO. There is a huge number of metrics you could try to measure; it is really whatever you can think of. If you specify what type of data you are going to collect, that will make suggestions easier.
Basically, data mining just means "sort the data to find interesting facts".
Based on the comment below, you could look for things like the following (a sketch of such ad-hoc queries appears after the list):
% of internet vs. phone sales
popular destinations & origins
customers age/sex/location
usage vs. time of day
...
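To illustrate, here is a small ad-hoc-querying sketch in Python with SQLite; the bookings schema and every name in it are hypothetical stand-ins for whatever tables your design ends up with:

```python
import sqlite3

# Hypothetical schema for illustration only: bookings(channel, origin,
# destination, booked_at). "Ad-hoc querying" just means running one-off
# questions like these against the data, not only pre-built reports.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (
        channel     TEXT,  -- 'internet' or 'phone'
        origin      TEXT,
        destination TEXT,
        booked_at   TEXT   -- ISO timestamp
    );
    INSERT INTO bookings VALUES
        ('internet', 'X', 'Y', '2010-05-01T09:15:00'),
        ('phone',    'X', 'Z', '2010-05-01T14:40:00'),
        ('internet', 'Y', 'Z', '2010-05-02T09:05:00');
""")

# % of internet vs. phone sales
print(conn.execute("""
    SELECT channel, 100.0 * COUNT(*) / (SELECT COUNT(*) FROM bookings)
    FROM bookings GROUP BY channel
""").fetchall())

# popular origins & destinations
print(conn.execute("""
    SELECT origin, destination, COUNT(*) AS trips
    FROM bookings GROUP BY origin, destination ORDER BY trips DESC
""").fetchall())

# usage vs. time of day (bookings per hour)
print(conn.execute("""
    SELECT strftime('%H', booked_at) AS hour, COUNT(*)
    FROM bookings GROUP BY hour
""").fetchall())
```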

Can someone explain an Enterprise Service Bus to me in non-buzzspeak?

Some of our partners are telling us that our software needs to interact with an Enterprise Service Bus. After researching this a bit, my instinct is to say that this is just buzz speak for saying that we need to have a platform-independent way to pass messages back and forth. I'm just trying to get a feel for what our partners are telling us. Am I correct in dismissing our partners' request as just trying to get our software to be more buzzword-compliant, or are they telling us something we should listen to (even if encoded in buzzspeak)?
Although ESB is based on messaging, it is not "just" messaging and not just a buzzword.
So if you start with plain old async messaging, the early networks tended to be very point-to-point. You had to wire up (i.e. configure through some admin interface) each connection and each pair of destinations, and if you dared to move anything around, invariably something broke. Because the connection points were wired by hand, these networks never achieved high connection density. The incremental cost was too high and did not scale. There was also a lot of access control and policy embedded in the topology. The lack of connection density actually favors this approach to security, even though it inhibits flexibility.
The ESB attempts to address these issues with...
Run-time resolution of destinations/services/resources
Location transparency
Any-to-any connectivity and maximum connection density
Architected for redundancy, horizontal scalability, failover
Policy, access control, rules externalized from topology
Logical messaging network layer implemented atop the physical messaging network layer
Common namespace
So when your customer asks for ESB compatibility, they want things like the above. From an application standpoint, this also implies the following (a small sketch follows the list)...
Avoiding message affinities such as requirements to process in strict sequence or to address requests only to specific nodes instead of to a generic network destination
Ability to resolve destinations dynamically at run time (i.e. add another instance of a queue and it automatically starts getting traffic, delete one and traffic routes to the remaining nodes)
Requestor and provider apps decoupled from knowing where each other "lives". Requestor makes one connection, regardless of how many services it might need to call
Authorize by policy rather than by topology
Service provider apps able to recognize and handle dupes (as per JMS spec, see "functional duplicate" due to session handling)
Ability to run multiple active instances of a service provider application
Instrument the service provider applications so you can inquire on the status of the network or perform a test without sending an actual transaction
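To make two of these implications concrete (run-time resolution of destinations, and requestors decoupled from provider location), here is a toy in-process sketch; the Bus class and all names are invented, and a real ESB does this across processes and hosts, with policy, failover, and so on:

```python
import itertools

# Toy in-process sketch of two points above: run-time resolution of a
# logical destination, and requestors decoupled from provider location.
# All names are invented for illustration.

class Bus:
    def __init__(self):
        self._providers = {}  # logical name -> list of handlers
        self._rr = {}         # logical name -> round-robin iterator

    def register(self, name, handler):
        self._providers.setdefault(name, []).append(handler)
        self._rr[name] = itertools.cycle(self._providers[name])

    def request(self, name, message):
        # The requestor addresses a logical name, never a host or queue.
        return next(self._rr[name])(message)

bus = Bus()
bus.register("quote.service", lambda msg: f"instance A handled {msg}")
bus.register("quote.service", lambda msg: f"instance B handled {msg}")

print(bus.request("quote.service", "req-1"))  # instance A handled req-1
print(bus.request("quote.service", "req-2"))  # instance B handled req-2
```

Note how the second provider instance starts receiving traffic the moment it registers, with no change to the requestor.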
On the other hand, if your client cannot articulate these things then they may just want to be able to check a box that says "works with the ESB."
I'll try & keep it buzzword free (but a buzz acronym may creep in).
When services/applications/mainframes/etc. want to integrate (i.e. send messages to each other), you can end up with quite a mess. An ESB hides that mess inside of itself (or itselves) so that an organisation can pretend that there isn't a mess and that it has something manageable. It then wraps a whole load of features around this to make the box even more enticing to the senior people in an organisation who'll make the decision to buy such an expensive product. These people typically want to introduce a large initiative which costs a lot of money, to prove that they are 'doing something' and know how to spend large amounts of money. If this is an SOA initiative then various vendors will have told them that an ESB is required to make the vendor's vision of SOA work (typically once the number of services they might want passes a trivial number).
So an ESB is:
A vehicle for vendors to make lots of money;
A vehicle for consultants to make lots of money;
A way for senior executives (IT Directors & the like) to show they can spend lots of money;
A box to hide a mess in;
A total PITA for a technical team to work with.
After researching this a bit, my instinct is to say that this is just buzz speak for saying that we need to have a platform-independent way to pass messages back and forth
You are correct, partially because ESB is a term that always fits nicely with another buzzword, legitimate or not: governance (i.e. it helps you manage who is accessing your endpoints and report metrics; metrics, by the way, are what all the suits like to see, so that may be a contributor).
Another reason they might want a platform-neutral device is so that any services they consume are always exposed as endpoints from a central location, instead of from a specific machine resource. The ESB makes the actual physical endpoints of your services irrelevant to them (which they shouldn't care much about anyway), and it enables you to move services around while they continue to consume only the ESB endpoint.
Apart from being a centralized repository for discovery, an ESB also makes side-by-side versioning of services easier. If I had a choice and my company had the budget, we would have purchased IBM's x150 appliance :(
Thirdly, a lot of the more advanced buses, like Software AG's product if I recall correctly, are natively able to expose legacy data, such as data sitting on mainframes, as services without the need for coding, via adapters.
I don't know if their intent is to leverage all the benefits an ESB provides, or as you said, make it buzzword compliant.
After researching this a bit, my instinct is to say that this is just buzz speak for saying that we need to have a platform-independent way to pass messages back and forth
That's about right. Sometimes an ESB will go a little bit further and include additional features like message delivery guarantees, confirmation/acknowledgement messages, and so on. The presence of an ESB also usually explicitly or implicitly creates a new protocol where none previously existed, which is another important consideration. (That is, some sort of standard or interface has to be set regarding the format of the messages.)
Am I correct in dismissing our partners' request as just trying to get our software to be more buzzword-compliant, or are they telling us something we should listen to (even if encoded in buzzspeak)?
You should always listen to your customers, even if a request initially sounds silly. It's usually worth at least spending the effort to figure out what's going on. Reading between the lines, what your partners probably mean is that they want a way for your service to integrate more easily with their own services and products.
An enterprise service bus handles the messaging between systems in a standard way. This allows you to communicate with the bus in exactly the same way across all your platforms, while the bus handles translating to the individual communication mechanism needed for each specific endpoint. This means you write all your code to talk to the bus using a common messaging scheme, and the bus takes your common scheme and translates it so the endpoint understands it.
The simplest explanation is to explain what it provides:
For many years companies acquired different platforms and technologies to achieve specific functions in their business, from finance to HR. These systems needed to talk to each other to share data, so middleware became the glue that allowed them to connect. Before the business knew it, they were paying for support and maintenance on each of these systems and on the middleware. As needs in the business changed, departments decided to create their own custom solutions to address special needs rather than try to make the aging solutions flexible enough to meet them. Before they knew it, they were paying to support and maintain the legacy systems, the middleware, and the custom solutions.

With new laws like Sarbanes-Oxley, companies need to have better information available for reporting purposes. A single view requires that they capture data from all of the systems. In addition, CIOs are now being pressured to lower costs and increase customer service. One obvious solution is to eliminate redundant systems, expensive support and maintenance contracts, and high-cost legacy solutions which require specialists to support. Moving to a new platform allows for this, but there needs to be a transition, and there are no turnkey solutions that can replicate what the business does.

To address the need to move information around, they go with SOA, because it allows for information access through a generic entity. If I ask for AllEmployees from the service bus, it gets them whether they come from 15 HR systems or 1. When the 15 HR systems become 1 system, the call and the result do not change, just how the work was done behind the scenes. The service bus concept standardizes the flow of information and allows IT managers to conduct transitions behind the bus with no long-term effect on upstream users.
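As a sketch of that AllEmployees point (every name here is invented for illustration): the caller's request stays the same while what happens behind the bus changes.

```python
# Sketch of the AllEmployees point above; every name here is invented.
# The caller's request never changes, while the implementation behind
# the bus does.

def hr_system_a():  # one of the 15 legacy HR systems
    return [{"name": "Alice"}]

def hr_system_b():  # another legacy HR system
    return [{"name": "Bob"}]

class ServiceBus:
    def __init__(self):
        self._services = {}

    def expose(self, name, implementation):
        self._services[name] = implementation

    def call(self, name):
        return self._services[name]()

bus = ServiceBus()

# Today: AllEmployees is stitched together from many systems.
bus.expose("AllEmployees", lambda: hr_system_a() + hr_system_b())
print(bus.call("AllEmployees"))

# After consolidation: same call, one backend, no upstream changes.
bus.expose("AllEmployees", hr_system_a)
print(bus.call("AllEmployees"))
```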

How to use a DHT for a social trading environment

I'm trying to understand if a DHT can be used to solve a problem I'm working on:
I have a trading environment where professional option traders can get an increase in their risk limit by requesting that fellow traders lend them some of their risk limit. The lending trader can either search for traders with certain risk parameters, which are part of every trader's profile (e.g. Greeks), or subscribe to requests from certain traders who are looking for risk.
I want this environment to be scalable and decentralized, but I don't know how traders can search for specific profile parameters when the data is contained in a DHT. Could anybody explain how this can be done?
Update:
An example that might make it easier to understand might be SO, but instead of running as a web application, the Risk Exchange runs as a desktop application on each trader's workstation. The requests for risk are like questions (which may be tagged by contract, exchange, etc.), and each user has a profile which shows their history of requests, their return on borrowed risk, etc.
Obviously the "exchange" can be run on a server, but I was hoping to decentralize it and make it scalable so that the system may support an arbitrary number of traders. How can I search for keywords, tags, and other data pertaining to a trader's profile if this information is stored in a distributed hash table?
To my ears, your question holds a contradiction. A DHT is a great way of distributing data in a decentralized manner, but it cannot provide the nodes with an overview of the information. This means that any overview action, such as querying the network for certain data, will have to be done at a centralized collection point. Solutions to this contradiction have been created, but their fault tolerance does not match what a critical system such as financial trading requires.
So my answer would be to use a centralized server to hold an overview cache of the DHT network.
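A minimal sketch of that hybrid (all class, key, and field names invented): full profiles live in the DHT, while a central overview cache indexes only the searchable keywords.

```python
# Minimal sketch of that hybrid; all class, key, and field names are
# invented. Full profiles live in the DHT; a central cache indexes only
# the searchable keywords.

class Dht:
    """Stand-in for a real DHT (e.g. Kademlia): plain key/value get/put."""
    def __init__(self):
        self._store = {}
    def put(self, key, value):
        self._store[key] = value
    def get(self, key):
        return self._store.get(key)

class OverviewCache:
    """Centralized search index over data that lives in the DHT."""
    def __init__(self):
        self._index = {}  # keyword/tag -> set of DHT keys

    def publish(self, dht_key, keywords):
        for kw in keywords:
            self._index.setdefault(kw, set()).add(dht_key)

    def search(self, keyword):
        return self._index.get(keyword, set())

dht = Dht()
cache = OverviewCache()

# A trader's node stores its full profile in the DHT...
dht.put("trader:42", {"name": "T42", "delta": 0.3, "tags": ["SPX", "high-vega"]})
# ...and publishes only searchable keywords to the central cache.
cache.publish("trader:42", ["SPX", "high-vega"])

# A lender searches centrally, then fetches full profiles from the DHT.
for key in cache.search("high-vega"):
    print(dht.get(key))
```

Since the cache holds only derived keywords, it can in principle be rebuilt by having nodes re-publish, which limits how much the system depends on it.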

How do I explain APIs to a non-technical audience?

A little background: I have the opportunity to present the idea of a public API to the management of a large car-sharing company in my country. Currently, the only options to book a car are a very slow web interface and a hard-to-reach call center. So I'm excited about the possibility of writing my own search interface, integrating this functionality into other products and applications, etc.
The problem: Due to the special nature of this company, I'll first have to get my proposal through a commission, which is made up entirely of non-technical and rather conservative people. How do I explain the concept of an API to such an audience?
Don't explain technical details like an API. State the business problem and your solution to the business problem - and how it would impact their bottom line.
For years, salespeople have based pitches on two things: features and benefits. Each feature should have an associated benefit (to somebody, and preferably everybody). In this case, you're apparently planning to break what's basically a monolithic application into (at least) two pieces: a front end and a back end. The obvious benefits are that 1) each works independently, so development of each is easier; 2) different people can develop the different pieces; and 3) it's easier to increase capacity by simply buying more hardware.
Though you haven't said it explicitly, I'd guess one intent is to publicly document the API. This allows outside developers to take over (at least some) development of the front-end code (often for free, no less) while you retain control over the parts that are crucial to your business process. You can more easily [allow others to] add new front-end code to address new market segments while retaining security/certainty that the underlying business process won't be disturbed in the process.
HardCode's answer is correct in that you should really should concentrate on the business issues and benefits.
However, if you really feel you need to explain something, you could use the medical receptionist analogy.
A medical practice has its own patient database and appointment scheduling system used by its admin and medical staff. This might be pretty complex internally.
However, when you want to book an appointment as a patient, you talk to the receptionist with a simple set of commands - 'I want an appointment', 'I want to see doctor X', 'I feel sick' - and they interface with their systems, based on your medical history, the symptoms presented, and resource availability, to give you an appointment - '4:30pm tomorrow' - in simple language.
So, roughly speaking using the receptionist is analogous to an exterior program using an API. It allows you to interact with a complex system to get the information you need without having to deal with the internal complexities.
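If anyone on the commission wants one step more detail, the analogy translates directly into code (a sketch; all names invented for illustration):

```python
# The receptionist analogy in code; names are invented for illustration.

class SchedulingSystem:
    """The practice's internal system: complex, never shown to patients."""
    def __init__(self):
        self._slots = ["4:30pm tomorrow", "9:00am Friday"]
        self._histories = {}  # patient -> list of complaints

    def book(self, patient, complaint):
        self._histories.setdefault(patient, []).append(complaint)
        return self._slots.pop(0)  # resource availability, simplified

class Receptionist:
    """The 'API': a few simple requests the outside world can make,
    hiding the internal system entirely."""
    def __init__(self, system):
        self._system = system

    def request_appointment(self, patient, complaint="I feel sick"):
        return f"{patient}: {self._system.book(patient, complaint)}"

front_desk = Receptionist(SchedulingSystem())
print(front_desk.request_appointment("Jane"))  # Jane: 4:30pm tomorrow
```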
They'll be able to understand the benefit of having a mobile phone app that can interact with the booking system, and an API is a necessary component of that. The second benefit of the API being public is that you won't necessarily have to write that app, someone else will be able to (whether or not they actually do is another question, of course).
You should explain which use cases will be improved by your project proposal and what benefits they can expect, such as better customer satisfaction.