Azure EventHub: How can I get a list of partition keys for an existing EventHub - partitioning

I have an Azure EventHub which already has records. I need to get a list of the unique partition keys within this EventHub, to use in some custom logic in my processor/consumer. I'd appreciate any suggestions or workarounds.

I'm not sure if you're possibly conflating a partition id and partition key; the former is a core concept for Event Hubs and can be queried using the SDKs for any of the supported languages, the command line, or the REST interface directly. An example for the .NET client library can be found here.
The Event Hubs service does not persist or expose the partition keys used when publishing an event. When an event batch with a partition key is published, the service produces a hash based on that key and uses that hash value to select a partition to which the event should be routed. The same key is guaranteed to produce the same hash and route to the same partition. Since the hash value is stable for a given key and the key itself does not have meaning to the service, it is calculated on-demand.
In your case, it sounds like your downstream consumer needs to know which key was used when an event was published, as it reads that event from the service. I'd recommend using the Properties bag of the event to hold onto the key that you've chosen and associate it with the event.
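To illustrate why a stable hash is enough for routing, and why the key cannot be recovered from the partition afterwards, here is a minimal Python sketch. It is not the service's actual algorithm: the use of SHA-256 and the function name `pick_partition` are assumptions made only for illustration.

```python
import hashlib


def pick_partition(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition index with a stable hash (illustrative only).

    SHA-256 is an assumption here; the real service uses its own hash.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce it
    # modulo the number of partitions.
    return int.from_bytes(digest[:8], "big") % partition_count


# The same key always lands on the same partition, so nothing about the
# key needs to be stored in order to route later events consistently.
first = pick_partition("customer-42", 4)
second = pick_partition("customer-42", 4)
```

Many different keys map to the same partition index, so knowing the partition tells you nothing definite about the key that was used.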
The Properties are intended to hold arbitrary data that is meaningful to your application and bundle it with your event as it passes through the system. An example of including custom metadata in the Properties using the .NET client library can be found here.
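A rough Python sketch of that idea follows. The `Event` class is a stand-in for whatever event type your client library provides, and the property name `partition-key` is a hypothetical choice; neither comes from the Event Hubs SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Set

# Hypothetical property name chosen by the application, not by the service.
PARTITION_KEY_PROPERTY = "partition-key"


@dataclass
class Event:
    """Stand-in for an SDK event: a body plus an application property bag."""
    body: bytes
    properties: Dict[str, Any] = field(default_factory=dict)


def make_event(body: bytes, partition_key: str) -> Event:
    """Publisher side: copy the partition key into the event's properties."""
    return Event(body=body, properties={PARTITION_KEY_PROPERTY: partition_key})


def unique_partition_keys(events: Iterable[Event]) -> Set[str]:
    """Consumer side: collect the distinct keys carried by the events read."""
    return {
        event.properties[PARTITION_KEY_PROPERTY]
        for event in events
        if PARTITION_KEY_PROPERTY in event.properties
    }
```

Note that this only works for events published after the key is added to the properties; events already in the hub carry no record of their key.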


At what scope are Azure event log IDs unique?

I have an application that monitors activity logs from multiple different and unrelated Azure platforms using Microsoft’s Management Activity API. According to the Common Schema documentation, event IDs are “Unique identifier[s] of an audit record.” but it does not specify a scope. Are they globally unique across all Azure instances, or is it possible I will have an ID collision between two unrelated instances?
Thanks!
GUIDs can be assumed to be unique; a collision is highly unlikely. Refer to this SO answer, which covers this eloquently.

Kafka: using both partitioning and a consumer group

Can we use both partitioning and a consumer group for the same topic? We want to create a topic which has partitions and then attach multiple consumers to it. Two or three of them will be generic, listening to all messages (they need to be in a consumer group so that a message is not processed multiple times), and one further consumer will read from a specific partition.
Are partitioning and consumer groups mutually exclusive?
With the high level consumer API you can't pin a consumer instance to a particular partition, but there is nothing preventing you from having one set of consumers using the high level API and another set of consumers using the simple API for the same topic.
With this you could have a simple consumer consuming from a specific partition and a set of high level consumers in a consumer group consuming messages across all partitions.
High Level Consumer for kafka 0.8.x doesn't allow you to specify partition. It reads the data from all partitions and does complex failure detection and rebalancing. Probably it will be supported in future versions according to the API redesign - https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design#ConsumerClientRe-Design-Allowmanualpartitionassignment.
If you need to read from a specific topic and partition, use SimpleConsumer (it requires manual leader/offset/exclusivity/error handling).
Or you can use the high level consumer and filter the data by partition (if you accept the overhead).
Also, as an option, you can redesign your topics to write that data to a separate topic instead of a 'specific partition'.

How to separately assign data to each connected user inside Socket.IO?

I am trying to make a card game using Socket.IO, and I am having problems assigning user-specific data (in my case, the cards that each user has).
I'm familiar with JavaScript, but I'm just not sure about whether or not there is a specific feature in Socket.IO for assigning user-specific data, or whether or not I have to store the information in a database / array of sorts.
There are ways to attach data to each socket in socket.io, but it's probably easier to put your data in an associative array, where the keys are the socket IDs. Just create the key-value pair upon connection, and make sure you delete the pair on the disconnect event with the delete statement.
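The pattern can be sketched as follows. This is written in Python rather than JavaScript and is not tied to any Socket.IO library: the handler names `on_connect`, `deal_card` and `on_disconnect` are hypothetical, and in a real server you would call them from the library's connection and disconnect events.

```python
from typing import Dict, List

# Per-user game state, keyed by socket ID (the "associative array").
hands: Dict[str, List[str]] = {}


def on_connect(socket_id: str) -> None:
    """Create an empty hand when a socket connects."""
    hands[socket_id] = []


def deal_card(socket_id: str, card: str) -> None:
    """Add a card to the hand belonging to one specific socket."""
    hands[socket_id].append(card)


def on_disconnect(socket_id: str) -> None:
    """Remove the entry so data for departed users does not pile up."""
    hands.pop(socket_id, None)
```

Using `pop` with a default keeps the disconnect handler safe even if it fires for a socket that was never registered.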

Why does the Couchbase Server API require a name for new documents

When you create a document using the Couchbase Server API, one of the arguments is a document name. What is this used for and why is it needed?
When using Couchbase Lite you can create an empty document and it is assigned an _id and _rev. You do not need to give it a name. So what is this argument for in Couchbase Server?
In Couchbase Server it is a design decision that all objects are identified by the object ID, key or name (all the same thing by different names), and those are not auto-assigned. The reason for this is that keys are not embedded in the document itself, and key lookups are the fastest way to get that object; the technology dictates this under the hood of the server. Getting a document by ID is much faster than querying for it. Querying means you are asking a question, whereas getting the object by ID means you already know the answer and are just telling the DB to go get it for you, which is therefore faster.
If the ID is something that is random, then more than likely you must query the DB, and that is less efficient. Couchbase Mobile's sync_gateway together with Couchbase Lite handles this on your behalf if you want it to, as it can have its own keyspace and key pattern that it manages for key lookups. If you are going straight to the DB on your own with the Couchbase SDK though, knowing that key will be the fastest way to get that object. Like I said, Couchbase Sync_Gateway handles this lookup for you, as it is the app server. When you go direct with the SDKs you get more control and different design patterns emerge.
Many people in Couchbase Server create a key pattern that means something to their application. As an example for a user profile store I might consider breaking up the profile into three separate documents with a unique username (in this example hernandez94) for each document:
1) login-data::hernandez94 is the object that has the encrypted password since I need to query that all of the time and want it in Couchbase's managed cache for performance reasons.
2) sec-questions::hernandez94 is the object that has the user's 3 security questions; since I do not use that very often, I do not care if it is in the managed cache.
3) main::hernandez94 is the user's main document that has everything else that I might need to query often, but not nearly as often as other times.
This way I have tailored my keyspace naming to my application's access patterns and therefore get only the data I need, exactly when I need it, for best performance. If I want, since these key names are standardized in my app, I could do a parallelized bulk get on all three of these documents, since my app can construct the names, and it would be VERY fast. Again, I am not querying for the data; I have the keys, so just go get them. I could normalize this keyspace naming further depending on the access patterns of my application: email-addresses::hernandez94, phones::hernandez94, appl-settings::hernandez94, etc.
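As a small illustration of constructing those keys in the application, here is a Python sketch. The `store` dictionary and the `bulk_get` helper are stand-ins for a real bucket and the SDK's multi-get call; they are assumptions for the example, not the Couchbase API.

```python
from typing import Dict, List

# Key prefixes taken from the example above.
KEY_PREFIXES = ["login-data", "sec-questions", "main"]


def profile_keys(username: str) -> List[str]:
    """Build every document key for one user from the naming pattern."""
    return [f"{prefix}::{username}" for prefix in KEY_PREFIXES]


def bulk_get(store: Dict[str, dict], keys: List[str]) -> Dict[str, dict]:
    """Stand-in for an SDK bulk get: fetch by key, with no query involved."""
    return {key: store[key] for key in keys if key in store}


# A plain dict plays the role of the bucket in this sketch.
store = {
    "login-data::hernandez94": {"password_hash": "..."},
    "main::hernandez94": {"display_name": "Hernandez"},
}
docs = bulk_get(store, profile_keys("hernandez94"))
```

Because the application can compute every key from the username alone, no lookup or index is needed before fetching.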

Storing encryption keys -- best practices?

I have a web application that uses a symmetric encryption algorithm.
How would you store the secret key and initialization vector? Storing as a literal in the code seems like a bad idea. How about app settings? What is the best practice here?
One standard approach in the webapp world is to split the key and put it in different places. E.g., you might split the key and put part of it in the filesystem (outside of the 'webapps' directory), part of it in the JNDI configuration (or .net equivalent), and part of it in the database. Getting any single piece isn't particularly hard if you're compromised, e.g., examining backup media or SQL injection, but getting all of the pieces will require a lot more work.
You can split a key by XOR-ing it with random numbers of the same size. (Use a cryptographically strong random number generator!) You can repeat this process several times if you want to split the key into multiple pieces. At the end of the process you want, e.g., three partial keys such that p1 ^ p2 ^ p3 = key. You might need to base64-encode some of the partial keys so they can be stored properly, e.g., in a JNDI property.
(There are more sophisticated ways to split a key, e.g., an n-of-m algorithm where you don't require all of the pieces to recreate the key, but that's -far- beyond what you need here.)
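The XOR split described above can be sketched in Python as follows. The function names are hypothetical; `secrets.token_bytes` is used as the cryptographically strong random source.

```python
import secrets
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, parts: int = 3) -> List[bytes]:
    """Split key into `parts` shares such that XOR-ing all shares gives key."""
    if parts < 2:
        raise ValueError("need at least two parts")
    # All shares but the last are random values the same size as the key.
    shares = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
    # The last share is the key XOR-ed with every random share.
    last = reduce(xor_bytes, shares, key)
    return shares + [last]


def join_key(shares: List[bytes]) -> bytes:
    """Recombine the shares: p1 ^ p2 ^ ... ^ pn = key."""
    return reduce(xor_bytes, shares)
```

Each share on its own is indistinguishable from random bytes, which is why every piece is needed to recover the key. Shares that must live in text-only storage can be base64-encoded first.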
If you can require the user to actively enter the password, there are PBE (password-based encryption) algorithms that convert a password to a good symmetric key. You want to find one that requires an external file as well. Again, the point is that neither the tape backups nor the password alone is enough; an attacker needs both. You could also use this to split the password into two pieces with JNDI: you can use a plaintext passphrase in JNDI and an initialization file somewhere in the filesystem.
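One way to sketch this in Python is with PBKDF2 from the standard library, treating the external file as the place where a random salt is stored. That reading of "external file", the function names, and the iteration count are all assumptions for illustration.

```python
import hashlib


def load_salt(path: str) -> bytes:
    """Read the salt from a file kept apart from the password (assumed layout)."""
    with open(path, "rb") as handle:
        return handle.read()


def derive_key(password: str, salt: bytes,
               iterations: int = 600_000, length: int = 32) -> bytes:
    """Derive a symmetric key from a password and salt with PBKDF2-HMAC-SHA256.

    The iteration count is an illustrative value; tune it for your hardware.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=length
    )
```

The same password and salt always give the same key, so nothing needs to be stored except the salt file; without that file the password alone does not reproduce the key.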
Finally, whatever you do be sure you can 'rekey' your application fairly easily. One approach is to use the password obtained above to decrypt another file that contains the actual encryption key. This makes it easy to change the password if you think it's been compromised without requiring a massive reencryption of all of the data - just reencrypt your actual key.
Is it possible for you to enter a password interactively whenever the application starts up? That way you don't have to store the key, or at least any keys (whether they are symmetric or private keys) can be encrypted with this "bootstrap" password.
If not, store your secret key in a file by itself and modify its permissions to make it accessible only to the user running the web application.
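On a POSIX system, the restricted key file might be created as in the following sketch. The function name and the path you pass are hypothetical; on Windows, access would be controlled through ACLs instead of mode bits.

```python
import os


def write_key_file(path: str, key: bytes) -> None:
    """Create a key file readable and writable only by the owning user."""
    # O_EXCL makes the call fail if the file already exists, and the
    # 0o600 mode is applied at creation, so the key is never briefly
    # readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key)
```

Run this as the account that runs the web application so that the file is owned by that user.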
These approaches are platform-agnostic. For more concrete suggestions, information about your platform would be helpful.
By the way, an initialization vector should be used for only one message. And IVs do not have to be kept secret, so you could store it anywhere, but storing it with the one message that uses it is customary.
I have used an approach where my application requires a symmetric key when it starts and looks for it in a certain file. Once the application has started up I remove the file. A copy of the file is kept remotely for any required restarts. Obviously this approach is not viable if your application has frequent restarts.
Another alternative would be a certificate manager such as the Windows Certificate Store. It can store certificates and their keys securely and it's also possible to mark private keys as non-exportable so it would require some serious hacking to get the key out. Your application could load its certificate from the Certificate Store and be able to call operations to sign requests or generate new symmetric keys. In addition you can assign permissions to different certificate stores so that only certain privileged accounts would be able to access the certificate.
Stick it in the web.config and encrypt that section.
This SO question talks more about web.config encryption
This should help ...
http://msdn.microsoft.com/en-us/library/ms998280.aspx
But, you really should consider going to PKI if you are serious about protecting your data.
We have a slightly different, but related issue. We have keys generated every few days, and when decrypting, we have to try all our keys because we do not know which day the encryption took place. What we did was to encrypt the keys one more time and store them as secrets. This way, we only have one set of keys to manage.
For secure storage of encryption keys you can use the AWS KMS service. It is intended for storing confidential keys like these. The URL for the KMS service is below.
documentation : https://aws.amazon.com/kms/