AES Key Generation using HSM and HKDF

We have a SafeNet HSM. Our system requires us to generate AES-256 keys. Which approach should I take?
1. Instruct the HSM to generate the AES keys.
2. Use the HSM to create input keying material and use HKDF to derive keys.
The HSM is supposed to be able to generate high-quality keys. Is there a need for the second approach?

Looking at RFC 5869 for HKDF, in the Introduction:
HKDF follows the "extract-then-expand" paradigm, where the KDF logically consists of two modules. The first stage takes the input keying material and "extracts" from it a fixed-length pseudorandom key K. The second stage "expands" the key K into several additional pseudorandom keys (the output of the KDF).
This implies that if you took the output of the HSM's hardware random number generator (HRNG) and then ran it through HKDF, you would be putting a pseudorandom derivation step on top of key material that is already random. HKDF cannot add entropy, so for plain key generation the extra step buys you nothing.
Option #1 is then the obvious choice; otherwise much of the point of using the HSM's RNG is lost.

Before getting to the purpose of each approach: yes, the HSM can already generate high-quality AES keys. Internally, the HSM uses its own random number generator as its entropy source. However, you can seed the HSM with your own random numbers (typically from a true random number generator such as a QRNG from ID Quantique). Using hardware like a QRNG can increase the randomness of your keys.
Next, a derivation function such as HKDF can generate strong keys as well, but in general derivation techniques are used to generate symmetric session keys, i.e. keys used to perform some cryptographic operation (like encryption/decryption) for a particular context or entity.
For example: you have a master key (such as an AES key, in turn generated by a key exchange mechanism), and you derive session keys from this master key to encrypt/decrypt data for different entities. This way you use different session keys to perform cryptographic operations depending on the context.
So for the 2nd point: either you use your own keying material (hoping your key algorithms are strong enough) or you use the master key approach outlined above.
In any case, you need to choose the mechanism based on the goal you are trying to achieve.
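As a rough illustration of the master key approach, here is a minimal sketch of the extract-then-expand steps from RFC 5869 using only the Python standard library. The master key is a random stand-in for material that would really come from the HSM, and the names derive_session_key and the "entity:..." context labels are made up for this example. In a real deployment the derivation would normally run inside the HSM so that the master key never leaves it.

```python
import hashlib
import hmac
import os

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Extract step: condense the input keying material into a fixed-length PRK."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869: a missing salt defaults to HashLen zero bytes
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand step: stretch the PRK into `length` bytes bound to `info`."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_key(master_key: bytes, context: bytes) -> bytes:
    """Derive a 32-byte (AES-256 sized) key for one context (hypothetical helper)."""
    prk = hkdf_extract(b"", master_key)
    return hkdf_expand(prk, context, 32)


# Stand-in for key material that would really be generated by the HSM.
master_key = os.urandom(32)
key_for_alice = derive_session_key(master_key, b"entity:alice")
key_for_bob = derive_session_key(master_key, b"entity:bob")
```

The same master key and context always give the same session key, so nothing per-entity has to be stored; different contexts give unrelated-looking keys.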

Related

Saving Thales keys to different location than KMDATA/local

I'm generating cryptographic keys using Thales HSMs. The encrypted keys are stored in /opt/nfast/kmdata/local. Since I may need to generate a very large number of keys (over 20,000), I thought storing all the keys in a single directory wouldn't be the best option (I'm mainly afraid of performance issues).
I would like to either split the local directory into subdirectories or, ideally, store the keys in an RDBMS database.
Is there any "standard" way to change the default HSM behavior?
Ask Thales support about it (my bet is that it is not possible to change).
There are some ideas how you might deal with this situation:
1. OS level
use OS and filesystem suitable for storing that many files in a single directory
2. Application level
use key diversification instead of key generation (if possible for your use case) -- i.e. if you need to provide keys for thousands of entities use a master key and diversify all keys for your entities using this master key and some diversification data (e.g. entity identity / serial number / etc.). This way you don't need to store thousands of keys as you simply diversify them on-demand. Remember to carefully analyze if you can use key diversification at all (as there are some consequences)
store those keys encrypted outside the security world (if possible for your use case) -- i.e. import/generate each key as a temporary object and immediately wrap it using some persistent wrapping key (you could call this key LMK). Safely store the wrapped value in an RDBMS (or anywhere else) and delete the temporary object. Later, when you need to access this particular key, you simply unwrap it back into a temporary object and use it. Again, this approach has some consequences and you must analyze thoroughly if it can be used in your situation
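The storage half of the wrapped-key idea could look roughly like the sketch below. It only shows keeping opaque wrapped blobs in a database; the wrap and unwrap operations themselves happen in the HSM and are not shown. sqlite3 stands in for the RDBMS, and the table, column, and function names are assumptions made for this example.

```python
import os
import sqlite3

# sqlite3 stands in for the RDBMS; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wrapped_keys ("
    " key_id TEXT PRIMARY KEY,"
    " wrapped BLOB NOT NULL)"
)


def store_wrapped_key(key_id: str, wrapped: bytes) -> None:
    """Persist the opaque wrapped blob returned by the HSM."""
    with conn:
        conn.execute(
            "INSERT INTO wrapped_keys (key_id, wrapped) VALUES (?, ?)",
            (key_id, wrapped),
        )


def load_wrapped_key(key_id: str) -> bytes:
    """Fetch the blob so it can be handed back to the HSM for unwrapping."""
    row = conn.execute(
        "SELECT wrapped FROM wrapped_keys WHERE key_id = ?", (key_id,)
    ).fetchone()
    if row is None:
        raise KeyError(key_id)
    return row[0]


# Placeholder bytes: in a real system this blob comes from the HSM's wrap operation.
fake_wrapped_blob = os.urandom(40)
store_wrapped_key("customer-0001", fake_wrapped_blob)
```

With this layout only the wrapping key has to live in the security world; the database holds nothing usable without it.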
Good luck with your project!
Disclaimer: I am no crypto expert so please do validate my thoughts.

Using Couchbase with thousands of different schemas

Consider a multi-tenant application in which tenants are free to model their own schemas. I.e.: backend-as-a-service.
With these requirements a bucket per 'table' is not doable. Instead, I'm thinking of simply having an attribute 'schema-id' define the id of the schema. Each 'schema-id' is a compound key based on tenantId + schemaId.
As far as retrieval goes only 'get by id' should be supported. In that sense I'm only using Couchbase as a k/v store instead of a documents store.
Any caveats to the above? Would the sheer number of entities per bucket be a problem? Any other things to think about?
The key pattern idea sounds great to me. You will have to make sure your cluster is sized correctly and stays sized correctly over time.
If you wanted to really control everything tightly, you could even front the whole thing with a simple REST API. Then you could control access tightly, control that key pattern, etc. Each user of the service would get an API key that would give them a session.
Going with different buckets for different schemas will not scale, because I think there is a restriction of only 10 buckets in CB.
Since the key is known by the client, we can map the data from CB to a particular class, because the key tells us what type of schema it will be.
For example, if the key is PRODUCT_1234 or USER_12345, then we know that the data for the first key is of type PRODUCT and for the second of type USER.
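The key pattern discussed above can be sketched as a pair of small helpers that build and parse a compound key. The ":" separator, the three-part layout, and the function names are assumptions for illustration, not anything Couchbase prescribes.

```python
from typing import NamedTuple

SEPARATOR = ":"  # assumed delimiter; it must never appear inside a key part


class DocumentKey(NamedTuple):
    tenant_id: str
    schema_id: str
    entity_id: str


def build_key(tenant_id: str, schema_id: str, entity_id: str) -> str:
    """Join tenant, schema and entity ids into one document key."""
    parts = (tenant_id, schema_id, entity_id)
    for part in parts:
        if not part or SEPARATOR in part:
            raise ValueError(f"invalid key part: {part!r}")
    return SEPARATOR.join(parts)


def parse_key(key: str) -> DocumentKey:
    """Split a document key back into its parts so the schema type is known."""
    parts = key.split(SEPARATOR)
    if len(parts) != 3:
        raise ValueError(f"unexpected key format: {key!r}")
    return DocumentKey(*parts)


key = build_key("tenant42", "PRODUCT", "1234")
parsed = parse_key(key)
```

Because the schema id is part of the key, a client can pick the class to deserialize into before it even fetches the document.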

External data (keys) mapping within own database

I am currently working on a project of a new system.
This system will be using several different web-services to produce composite data.
Some data is compound and I use relational SQL tables (server, in particular, is MySQL) to compose data for further usage.
My problem is that I have to implement some data mapping.
Take countries for example.
Within our system, countries are keyed on ISO 3166-1 alpha-2 (the primary key is a CHAR(2) ASCII column).
One web-service provides data in the very same format, while several others have their own unique integer identifiers.
As I am about to implement in-code mapping, I would like to have a possibility to dynamically update mapping tables, without making changes to the code.
Thus I am thinking about mappings table.
I may produce a table service_mappings, that would contain arbitrary length columns such as service_id (my own identifier for particular service), ref_id (datum provided by web-service), model (what data I am mapping this to in my system), key (what key this [service_id, ref_id] correspond to in my model).
On the other hand, I may choose something like a mapping table for each separate model, that would contain less keys (take model from previous table, as it would be defined by table name). This could be more feasible to use with ORMs of some kind.
So, my question is as follows: what is the correct approach, which is the most efficient, and is there maybe some completely different technique?
Cache hint
In response to recent answer by Alexey.
We are likely to use some caching technique (such as memcache), although for primary data source we would like to rely on MySQL as we have methods in-place for creating and restoring back-ups, and we would have to think how to implement them.
Also, MySQL seems to offer some methods for faster access, and according to research by DeNA it may actually be faster than NoSQL alternatives on primary-key/unique-key look-ups.
In your case, I would keep the model column in a single mapping table instead of creating separate tables, because it will be much easier to find the proper mapping. If you want more efficiency, you may use some NoSQL storage for this mapping (such as Redis or MemcacheDB), which is often much faster and reliable.
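A single mapping table along these lines might look like the sketch below. sqlite3 is used as a stand-in for MySQL so the example is self-contained; the column is called local_key rather than key because KEY is a reserved word in MySQL, and the service names and reference ids are made up.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute(
    """
    CREATE TABLE service_mappings (
        service_id TEXT NOT NULL,   -- our identifier for the web service
        model      TEXT NOT NULL,   -- which local model the mapping is for
        ref_id     TEXT NOT NULL,   -- identifier as provided by the service
        local_key  TEXT NOT NULL,   -- key of the row in our own model
        PRIMARY KEY (service_id, model, ref_id)
    )
    """
)
conn.executemany(
    "INSERT INTO service_mappings VALUES (?, ?, ?, ?)",
    [
        ("svc_a", "country", "1001", "DE"),  # made-up reference ids
        ("svc_a", "country", "1002", "FR"),
        ("svc_b", "country", "17", "DE"),
    ],
)


def resolve(service_id: str, model: str, ref_id: str) -> Optional[str]:
    """Translate a service-specific id into our own key, or None if unmapped."""
    row = conn.execute(
        "SELECT local_key FROM service_mappings"
        " WHERE service_id = ? AND model = ? AND ref_id = ?",
        (service_id, model, ref_id),
    ).fetchone()
    return row[0] if row else None
```

The composite primary key makes every lookup a unique-key look-up, and new mappings are plain inserts that need no code change.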

Should I obscure primary key values?

I'm building a web application where the front end is a highly-specialized search engine. Searching is handled at the main URL, and the user is passed off to a sub-directory when they click on a search result for a more detailed display. This hand-off is being done as a GET request with the primary key being passed in the query string. I seem to recall reading somewhere that exposing primary keys to the user was not a good idea, so I decided to implement reversible encryption.
I'm starting to wonder if I'm just being paranoid. The reversible encryption (base64) is probably easily broken by anybody who cares to try, makes the URLs very ugly, and also longer than they otherwise would be. Should I just drop the encryption and send my primary keys in the clear?
What you're doing is basically obfuscation. A reversibly encrypted (and base64 doesn't really count as encryption) primary key is still a primary key.
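To see why base64 does not count as encryption, note that anyone can reverse it without a key. A minimal sketch, assuming the primary key is encoded as its decimal string:

```python
import base64

primary_key = 12345

# "Obscure" the key the way the question describes.
encoded = base64.urlsafe_b64encode(str(primary_key).encode("ascii")).decode("ascii")

# Anyone who sees the URL can undo it with one call and no secret.
decoded = int(base64.urlsafe_b64decode(encoded).decode("ascii"))

print(encoded)  # MTIzNDU=
print(decoded)  # 12345
```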
What you were reading comes down to this: you generally don't want to have your primary keys have any kind of meaning outside the system. This is called a technical primary key rather than a natural primary key. That's why you might use an auto number field for Patient ID rather than SSN (which is called a natural primary key).
Technical primary keys are generally favoured over natural primary keys because things that seem constant do change and this can cause problems. Even countries can come into existence and cease to exist.
If you do have technical primary keys you don't want to make them de facto natural primary keys by giving them meaning they didn't otherwise have. I think it's fine to put a primary key in a URL but security is a separate topic. If someone can change that URL and get access to something they shouldn't have access to then it's a security problem and needs to be handled by authentication and authorization.
Some will argue they should never be seen by users. I don't think you need to go that far.
On the dangers of exposing your primary key, you'll want to read "autoincrement considered harmful", by Joshua Schachter.
URLs that include an identifier will let you down for three reasons.
The first is that given the URL for some object, you can figure out the URLs for objects that were created around it. This exposes the number of objects in your database to possible competitors or other people you might not want having this information (as famously demonstrated by the Allies guessing German tank production levels by looking at the serial numbers.)
Secondly, at some point some jerk will get the idea to write a shell script with a for-loop and try to fetch every single object from your system; this is definitely no fun.
Finally, in the case of users, it allows people to derive some sort of social hierarchy. Witness the frequent hijacking and/or hacking of high-prestige low-digit ICQ ids.
If you're worried about someone altering the URL to try and look at other values, then perhaps you need to look at token generation.
For instance, instead of giving the user a 'SearchID' value, you give them a SearchToken, which is some long unique pseudo-random value (read: GUID), which you then map to the SearchID internally.
Of course, you'll still need to apply session security and so forth - because even a unique URL with a non-sequential ID isn't protected against sniffing by anything between your server and the user.
If you're obscuring the primary keys for a security reason, don't do it. That's called security by obscurity and there is a better way. Having said that, there is at least one valid reason to obscure primary keys, and that's to prevent someone from scraping all your content by simply examining a query string in a URL and determining that they can simply increment an id value and pull down every record. A determined scraper may still be able to discover your means of obscuring and do this despite your best efforts, but at least you haven't made it easy.
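The token idea can be sketched as a small in-memory lookup. The class name SearchTokenStore and the choice of secrets.token_urlsafe instead of a GUID are assumptions for this example; a real application would keep the mapping in its database or session store.

```python
import secrets
from typing import Dict, Optional


class SearchTokenStore:
    """Maps unguessable public tokens to internal SearchID values."""

    def __init__(self) -> None:
        self._token_to_id: Dict[str, int] = {}

    def issue(self, search_id: int) -> str:
        """Create a fresh random token for an internal id."""
        token = secrets.token_urlsafe(16)  # 128 bits from the OS CSPRNG
        self._token_to_id[token] = search_id
        return token

    def resolve(self, token: str) -> Optional[int]:
        """Look the id up again; unknown tokens simply resolve to None."""
        return self._token_to_id.get(token)


store = SearchTokenStore()
token = store.issue(42)
```

The URL then carries only the token, so neighbouring ids cannot be guessed by incrementing a number.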
PostgreSQL provides multiple solutions for this problem, and they could be adapted for other RDBMSs:
hashids : https://hashids.org/postgresql/
Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.
It converts numbers like 347 into strings like “yr8”, or array of numbers like [27, 986] into “3kTMd”.
You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.
optimus is similar to hashids but provides only integers as output: https://github.com/jenssegers/optimus
skip32 at https://wiki.postgresql.org/wiki/Skip32_(crypt_32_bits):
It may be used to generate series of unique values that look random, or to obfuscate a SERIAL primary key without loosing its unicity property.
pseudo_encrypt() at https://wiki.postgresql.org/wiki/Pseudo_encrypt:
pseudo_encrypt(int) can be used as a pseudo-random generator of unique values. It produces an integer output that is uniquely associated to its integer input (by a mathematical permutation), but looks random at the same time, with zero collision. This is useful to communicate numbers generated sequentially without revealing their ordinal position in the sequence (for ticket numbers, URLs shorteners, promo codes...)
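The permutation idea behind functions of this kind can be sketched in Python as a small Feistel network over 32-bit integers. This is an illustration of the technique only: it is not claimed to reproduce the output of the PostgreSQL function, the round constants are arbitrary, and it is obfuscation rather than cryptographic protection.

```python
MASK16 = 0xFFFF


def _round(half: int) -> int:
    # Arbitrary mixing function; a Feistel round function need not be invertible.
    return ((1366 * half + 150889) % 714025) * 32767 // 714025


def obfuscate_id(value: int) -> int:
    """Map a 32-bit integer to another 32-bit integer with no collisions."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    left, right = (value >> 16) & MASK16, value & MASK16
    for _ in range(3):
        left, right = right, left ^ _round(right)
    # Swapping the halves on output makes the function its own inverse.
    return (right << 16) | left


public_id = obfuscate_id(1001)
original_id = obfuscate_id(public_id)  # applying it twice returns the input
```

Because every round uses the same function and the halves are swapped at the end, the mapping is a permutation that undoes itself, so no lookup table is needed to get back to the sequential id.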
this article gives details on how this is done at Instagram: https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c and it boils down to:
We’ve delegated ID creation to each table inside each shard, by using PL/PGSQL, Postgres’ internal programming language, and Postgres’ existing auto-increment functionality.
Each of our IDs consists of:
41 bits for time in milliseconds (gives us 41 years of IDs with a custom epoch)
13 bits that represent the logical shard ID
10 bits that represent an auto-incrementing sequence, modulus 1024. This means we can generate 1024 IDs, per shard, per millisecond
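The bit layout quoted above (41 + 13 + 10 = 64 bits) can be sketched with plain shifts and masks. The epoch value and the function names are hypothetical, and the real system does this inside PL/pgSQL rather than in application code.

```python
CUSTOM_EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch, in milliseconds
TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10


def make_id(now_ms: int, shard_id: int, sequence: int) -> int:
    """Pack time, shard and sequence into one 64-bit integer."""
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard id outside the 13-bit range")
    seq = sequence % (1 << SEQ_BITS)  # auto-increment value, modulus 1024
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def split_id(id_: int) -> tuple:
    """Unpack an id into (elapsed_ms, shard_id, sequence)."""
    seq = id_ & ((1 << SEQ_BITS) - 1)
    shard_id = (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = id_ >> (SHARD_BITS + SEQ_BITS)
    return elapsed, shard_id, seq


example_id = make_id(CUSTOM_EPOCH_MS + 5, shard_id=7, sequence=1025)
```

Since the time occupies the most significant bits, ids sort roughly by creation time across shards.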
Just send the primary keys. As long as your database operations are sealed off from the user interface, this is no problem.
For your purposes (building a search engine), the security benefit of encrypting database primary keys is negligible. Base64 encoding isn't encryption - it's security through obscurity and won't even be a speedbump to an attacker.
If you're trying to secure database query input, just use parametrized queries. There's no reason at all to hide primary keys just because the public can manipulate them.
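A minimal sketch of a parametrized lookup, using sqlite3 as a stand-in database with a made-up results table. The id taken from the query string is validated and then bound as a parameter, never concatenated into the SQL text.

```python
import sqlite3
from typing import Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO results (id, title) VALUES (?, ?)",
    [(1, "first"), (2, "second")],
)


def fetch_result(raw_id: str) -> Optional[Tuple[int, str]]:
    """Look up one row by the id taken from the query string."""
    try:
        result_id = int(raw_id)  # reject anything that is not a plain integer
    except ValueError:
        return None
    # The ? placeholder keeps the value out of the SQL text entirely.
    return conn.execute(
        "SELECT id, title FROM results WHERE id = ?", (result_id,)
    ).fetchone()
```

A tampered value such as "2 OR 1=1" is rejected at validation, and even without that check it would only ever be compared as data.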
When you see base64 in the URL, you are pretty much guaranteed the developers of that site don't know what they are doing and the site is vulnerable.
URLs that include an identifier will let you down for three reasons.
Wrong, wrong, wrong.
First - every request has to be validated, regardless of it coming in the form of a HTTP GET with an id, or a POST, or a web service call.
Second - a properly made web-site needs protection against bots which relies on IP address tracking and request frequency analysis; hiding ids might stop some people from writing a shell script to get a sequence of objects, but there are other ways to exploit a web site by using a bruteforce attack of some sort.
Third - ICQ ids are valuable but only because they're related to users and are a user's primary means of identification; it's a one-of-a-kind approach to user authentication, not used by any other service, program or web-site.
So, to conclude.. Yes, you need to worry about scrapers and DDOS attacks and data protection and a whole bunch of other stuff, but hiding ids will not properly solve any of those problems.
When I need a query string parameter to be able to identify a single row in a table, I normally add a GUID column to that table, and then pass the GUID in the query string instead of the row's primary key value.

What exactly is GUID? Why and where I should use it?

I've seen references to GUIDs in a lot of places, and on Wikipedia, but it is not very clear about where to use them.
If someone could answer this, it would be nice.
Thanks
GUID technically stands for globally unique identifier. What it is, actually, is a 128 bit structure that is unlikely to ever repeat or create a collision. If you do the maths, the domain of values is in the undecillions.
Use GUIDs when you have multiple independent systems or clients generating IDs that need to be unique.
For example, if I have 5 client apps creating and inserting transactional data into a table that has a unique constraint on the ID, then use guids. This prevents having to force a client to request an issued ID from the server first.
This is also great for object factories and systems that have numerous object types stored in different tables where you don't want any 2 objects to have the same ID. This makes caching and scavenging schemas much easier to implement.
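The multi-client scenario can be sketched as follows: each client creates its own ids locally and the rows are merged into one table with a unique constraint, without asking the server for an id first. sqlite3 and the table layout are stand-ins chosen for this example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id TEXT PRIMARY KEY, client TEXT, amount INTEGER)"
)


def new_transaction(client: str, amount: int) -> tuple:
    """Create a row on the client side, including its own random id."""
    return (str(uuid.uuid4()), client, amount)


# Two independent clients prepare their rows without talking to each other.
batches = {
    name: [new_transaction(name, amount) for amount in range(3)]
    for name in ("client_a", "client_b")
}

# The server inserts everything; the primary key would reject any duplicate id.
with conn:
    for rows in batches.values():
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```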
A GUID is a "Globally Unique IDentifier". You use it anywhere that you need an identifier that is guaranteed to be different from every other.
Usually, you only need a value to be "locally unique" -- the primary key identity in a database table, for example, needs only to be different from the other rows in that table, but can be the same as the ID in other tables (no need for a GUID here).
GUIDs are generally used when you will be defining an ID that must be different from an ID that someone else (outside of your control) will be defining. One such place is the interface identifier on ActiveX controls. Anyone can create an ActiveX control without knowing which other controls it will be used with -- and there's nothing to stop everyone from giving their controls the same name. GUIDs keep them distinct.
Time-based (version 1) GUIDs are a combination of the time (in very small fractions of a second), so a GUID is different from any defined before or later, and a number defining your location (sometimes taken from the MAC address of your network card), so it is different from any other GUID defined right now by someone else. Other GUID versions are built from random numbers or name hashes instead.
They are also sometimes known as UUIDs (universally unique ID).
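For comparison, Python's standard uuid module exposes both the time-and-node based kind described above (version 1) and the purely random kind (version 4). A short sketch:

```python
import uuid

time_based = uuid.uuid1()    # built from a timestamp plus a node id (often the MAC address)
random_based = uuid.uuid4()  # built from random bits

print(time_based, "version", time_based.version)
print(random_based, "version", random_based.version)

# Both are 128-bit values: 16 raw bytes, or 36 characters in the usual text form.
raw_length = len(random_based.bytes)
text_length = len(str(random_based))
```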
As addition to all the other answers, here is an online GUID generator:
http://www.guidgenerator.com/
What is a GUID?
GUID (or UUID) is an acronym for 'Globally Unique Identifier' (or 'Universally Unique Identifier'). It is a 128-bit integer number used to identify resources. The term GUID is generally used by developers working with Microsoft technologies, while UUID is used everywhere else.
How unique is a GUID?
128-bits is big enough and the generation algorithm is unique enough that if 1,000,000,000 GUIDs per second were generated for 1 year the probability of a duplicate would be only 50%. Or if every human on Earth generated 600,000,000 GUIDs there would only be a 50% probability of a duplicate.
How are GUIDs used?
GUIDs are used in software development as database keys, component identifiers, or just about anywhere else a truly unique identifier is required. GUIDs are also used to identify all interfaces and objects in COM programming.
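The 50% figures quoted above can be sanity-checked with the usual birthday approximation, p ≈ 1 - exp(-n² / 2N). The sketch below assumes random (version 4) GUIDs with 122 random bits; other GUID versions behave differently, so treat the result as an order-of-magnitude check rather than a statement about any particular generator.

```python
import math


def collision_probability(n_ids: float, random_bits: int = 122) -> float:
    """Approximate chance of at least one duplicate among n_ids random values."""
    space = 2.0 ** random_bits
    return -math.expm1(-(n_ids * n_ids) / (2.0 * space))


def ids_for_probability(p: float, random_bits: int = 122) -> float:
    """Approximate number of ids needed to reach collision probability p."""
    space = 2.0 ** random_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


one_year_at_1e9_per_sec = 1e9 * 365 * 24 * 3600
p_one_year = collision_probability(one_year_at_1e9_per_sec)
n_half = ids_for_probability(0.5)

print(f"p after one year at 1e9/s: {p_one_year:.2e}")
print(f"ids needed for a 50% chance: {n_half:.2e}")
```

Under these assumptions the number of ids needed for a 50% chance is on the order of 10^18. That is in the same range as the per-human figure in the quote, but far more than one year of generation at a billion per second, so the quoted numbers are best read as rough illustrations.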
A GUID is a "Globally Unique ID". Also called a UUID (Universally Unique ID).
It's basically a 128 bit number that is generated in a way (see RFC 4122 http://www.ietf.org/rfc/rfc4122.txt) that makes it nearly impossible for duplicates to be generated. This way, I can generate GUIDs without some third party organization having to give them to me to ensure they are unique.
One widespread use of GUIDs is as identifiers for COM entities on Windows (classes, typelibs, interfaces, etc.). Using GUIDs, developers could build their COM components without going to Microsoft to get a unique identifier. Even though identifying COM entities is a major use of GUIDs, they are used for many things that need unique identifiers. Some developers will generate GUIDs for database records to provide them an ID that can be used even when they must be unique across many different databases.
Generally, you can think of a GUID as a serial number that can be generated by anyone at anytime and they'll know that the serial number will be unique.
Other ways to get unique identifiers include getting a domain name. To ensure the uniqueness of domain names, you have to get it from some organization (ultimately administered by ICANN).
Because GUIDs can be unwieldy (from a human readable point of view they are a string of hexadecimal numbers, usually grouped like so: aaaaaaaa-bbbb-cccc-dddd-ffffffffffff), some namespaces that need unique names across different organizations use another scheme (often based on Internet domain names).
So, the namespace for Java packages by convention starts with the organization's domain name (reversed) followed by names that are determined in some organization-specific way. For example, a Java package might be named:
com.example.jpackage
This means that dealing with name collisions becomes the responsibility of each organization.
XML namespaces are also made unique in a similar way - by convention, someone creating an XML namespace is supposed to make it 'underneath' a registered domain name under their control. For example:
xmlns="http://www.w3.org/1999/xhtml"
Another way that unique IDs have been managed is for Ethernet MAC addresses. A company that makes Ethernet cards has to get a block of addresses assigned to them by the IEEE (I think it's the IEEE). In this case the scheme has worked pretty well, and even if a manufacturer screws up and issues cards with duplicate MAC addresses, things will still work OK as long as those cards are not on the same subnet, since beyond a subnet, only the IP address is used to route packets. Although there are some other uses of MAC addresses that might be affected - one of the algorithms for generating GUIDs uses the MAC address as one parameter. This GUID generation method is not as widely used anymore because it is considered a privacy threat.
One example of a scheme to come up with unique identifiers that didn't work very well was the Microsoft provided ID's for 'VxD' drivers in Windows 9x. Developers of third party VxD drivers were supposed to ask Microsoft for a set of IDs to use for any drivers the third party wrote. This way, Microsoft could ensure there were not duplicate IDs. Unfortunately, many driver writers never bothered, and simply used whatever ID was in the example VxD they used as a starting point. I'm not sure how much trouble this caused - I don't think VxD ID uniqueness was absolutely necessary, but it probably affected some functionality in some APIs.
GUID or UUID (globally vs Universally) Unique IDentifier is, well, a unique ID :) When you need something really unique machine generated, there are libraries to get you one.
See GUID on wikipedia for details.
As to when you don't need a GUID, it is when a counter that you control (one way or another, like a SERIAL SQL type or a sequence) gets incremented. Indexing a "text" value (GUID in textual form) or a 128 bit binary value (which a GUID is) is far more expensive than an integer.
Someone said they are conceptually 128-bit random values, and that is substantially true, but having done a little reading on UUID (GUID usually refers to Microsoft's implementation of UUID), I see that there are several different UUID versions, and most of them are not actually random. So it is possible to generate a UUID for a machine (or something else) and be able to reliably repeat that process to obtain the same UUID down the road, which is important for some applications.
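The repeatable kind mentioned here corresponds to the name-based UUID versions. A minimal sketch with Python's uuid.uuid5, using made-up host names and a hypothetical helper called machine_uuid:

```python
import uuid


def machine_uuid(hostname: str) -> uuid.UUID:
    """Derive a stable, name-based (version 5) UUID from a host name."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, hostname)


first = machine_uuid("build-server.example.com")
second = machine_uuid("build-server.example.com")
other = machine_uuid("db-server.example.com")

print(first == second)  # the same name always yields the same UUID
print(first == other)   # a different name yields a different UUID
```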
For me it's easier to think of them as simply "128-bit random values". Which is essentially what they are. There are some algorithms for including a bit of information in a few digits of your GUID (thus the random part gets a bit smaller), but still they are pretty large almost-random values.
Since they are so large, it is extremely unlikely that two GUIDs will ever be generated that are the same. For all practical purposes, every GUID ever generated is unique in the world.
I'll leave it to you to figure out where to use them, but other answers already have some examples. Let your imagination run wild. :)
Can be a hard thing to understand because of all the maths that goes on behind generating them. Think of it as a unique id. You can get Visual Studio to generate one for you, or .NET if you happen to be using C# or one of the many other applications or websites. They are considered unique because there is such a silly small chance you'll see the same one twice that it isn't worth considering.
128-bit Globally Unique ID. You can generate GUIDs from now until sunset and you'll never generate the same GUID twice, and neither will anyone else. They are used a lot with COM.
As for example of something you would use them for, we use them in one of our products. Our users can generate categories and cards on various devices. We want to make sure that we don't confuse a category made on one device with a category created on a different one, so it's important that IDs are unique no matter who generates them, where they generate them, and when they generate them. So we use GUIDs (actually we use our own scheme using 64-bit numbers but they are similar to GUIDs).
I worked on an ACD call center system a few years back where we wanted to gather call detail records from multiple call processors into a single database. I set up a column in MS SQL to generate a GUID for the database key rather than using a system-generated sequential ID (identity column). Back then, this required setting the default value to NEWID() (or generating it in the code, but the NEWID() function was safer). Of course, having a large value for a key may raise a few eyebrows, but I would rather give up the space than risk a collision.
I didn't see anyone address using a GUID as a database key so I thought it might help to know you could do that too.
GUID stands for "Globally Unique Identifier" and you use it when you want to have, erm, a Globally Unique Identifier.
In RSS feeds, for example, you should have a GUID for each item in the feed. That way, the feed reader software can keep track of whether you have read that item or not. Without a GUID, it would be impossible to tell.
A GUID differs from something like a database ID in that no matter who creates an object -- you, me, the guy down the street -- our GUIDs will always be different. There should be no collisions using a GUID.
You'll also see the term UUID, which stands for "Universally Unique Identifier." There is essentially no difference between the two. UUID is the more appropriate term. GUID is the term used by Microsoft.
If you need to generate an identifier that needs to be unique during the whole lifetime of your application, you use a GUID.
Imagine you have a server with sessions, if you give each session a GUID, you are certain that it will be unique for every session ever created by your server. This is useful for tracing bugs.
One particularly useful application of GUIDs that I've found is using them to track unique visitors in webapps where the visitors are anonymous (i.e. not logged in or registered).
GUID = Global Unique IDentifier.
Use it when you want to uniquely identify something in a global context.
This generator can be handy.
The Wikipedia article on GUIDs is pretty clear on what they are used for - maybe rephrasing your question would help - what do you need a GUID for?
To actually see what it looks like on a windows computer, go to cmd or powershell.
Powershell => [guid]::NewGuid()
CMD => powershell [guid]::NewGuid()