Delete from Redis Sorted Set based on JSON Property

I have a large number of items stored in a Redis Sorted Set (on the order of 100,000) that get updated fairly frequently. These items are objects encoded as JSON strings, and the rank for sorting in the set is derived (on insert, by my code) from a date/time property on the object.
Each item in the set has an Id property (which is a Guid encoded as a string) which uniquely identifies the item within the system.
When these items are updated, I need to either update the item within the sorted set, or delete and reinsert the item. The problem I have is how to find that item to perform the operation.
What I'm currently doing is loading the entire contents of the sorted set into memory, operating on that collection in my code and then writing the complete collection back to Redis. Whilst this works, it's not particularly efficient and won't scale well as the set grows.
Would anybody have any suggestions as to how to do this in a more efficient manner? The only unique identifier I have for the items is the Id property as encoded in the item.
Many Thanks,
Richard.

This is most likely a design problem.
You shouldn't store JSON strings in the sorted set: store only identifiers there, and keep the whole serialized objects in a hash.
That way, when you need to update an object, you overwrite its entry in the hash with HSET, and you can locate the whole object by its unique identifier.
On the other hand, every field in the hash must also be present in your sorted set: when you add an object to the sorted set, you add its unique identifier.
When you need to list your objects in a particular order, you perform two operations:
1. Get a page of identifiers from the sorted set (for example, using ZRANGE).
2. Fetch all objects for that page by passing their identifiers to a single HMGET command.
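A minimal sketch of this layout, using the ioredis client; the key names items:data and items:byDate, and the Item shape, are assumptions for illustration:

    import Redis from "ioredis";

    const redis = new Redis(); // assumes a local Redis instance

    const DATA_KEY = "items:data";    // hash: id -> JSON
    const INDEX_KEY = "items:byDate"; // sorted set: id scored by timestamp

    interface Item {
      id: string;        // Guid encoded as a string
      updatedAt: string; // ISO date/time used for ranking
      payload: string;
    }

    // Insert or update: one HSET for the body, one ZADD for the rank.
    async function upsertItem(item: Item): Promise<void> {
      const score = Date.parse(item.updatedAt);
      await redis
        .multi()
        .hset(DATA_KEY, item.id, JSON.stringify(item))
        .zadd(INDEX_KEY, score, item.id)
        .exec();
    }

    // Delete: remove from both structures by id alone.
    async function deleteItem(id: string): Promise<void> {
      await redis.multi().hdel(DATA_KEY, id).zrem(INDEX_KEY, id).exec();
    }

    // Page through items in date order: ZRANGE for ids, HMGET for bodies.
    async function getPage(offset: number, count: number): Promise<Item[]> {
      const ids = await redis.zrange(INDEX_KEY, offset, offset + count - 1);
      if (ids.length === 0) return [];
      const rows = await redis.hmget(DATA_KEY, ...ids);
      return rows
        .filter((r): r is string => r !== null)
        .map((r) => JSON.parse(r) as Item);
    }

Updating an item then costs one O(log N) ZADD plus one O(1) HSET, instead of a round trip of the whole set.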

Related

IndexedDB - IDBKeyRange on simple index with arbitrary key list

I have an object store in an IDB that has a simple (non-compound) index on a field X. This index is not unique (many items may have the same value for X).
I'd like to query the IDB to return all items that have an X value of either "foo", "bar", or "bat".
According to the documentation, index getAll takes either a key (in my case a string) or an IDBKeyRange. However, it's not obvious to me how to construct an IDBKeyRange with an arbitrary set of keys, and get the union of all results based on those keys.
You cannot do this in a single request; IndexedDB does not currently support "OR"-style queries.
An alternative is to issue one request per value: for each value, call getAll on the index, then concatenate the resulting arrays into one (merging any duplicates). Since you are using getAll rather than cursors, this is still only a handful of round trips against the DB. To set this up, imagine a store of, say, "things", where each thing has a property such as "tags" holding an array of strings; the index you create on the "tags" property should be flagged as a multi-entry index.
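A sketch of that per-value union, assuming a "things" store keyed by id with a multi-entry index on "tags":

    // In the upgrade handler: a multi-entry index turns each element of the
    // "tags" array into its own index entry.
    function setupDb(db: IDBDatabase): void {
      const store = db.createObjectStore("things", { keyPath: "id" });
      store.createIndex("tags", "tags", { unique: false, multiEntry: true });
    }

    // One getAll per value; merge the results, de-duplicated by primary key.
    function getThingsWithAnyTag(db: IDBDatabase, tags: string[]): Promise<any[]> {
      return new Promise((resolve, reject) => {
        if (tags.length === 0) return resolve([]);
        const index = db.transaction("things").objectStore("things").index("tags");
        const seen = new Map<IDBValidKey, any>();
        let pending = tags.length;
        for (const tag of tags) {
          const req = index.getAll(tag);
          req.onsuccess = () => {
            for (const thing of req.result) seen.set(thing.id, thing);
            if (--pending === 0) resolve([...seen.values()]);
          };
          req.onerror = () => reject(req.error);
        }
      });
    }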
There are, of course, creative hacky solutions. Here is one; keep in mind it is only useful when you want things whose tag set matches the query exactly, not things that merely share some tags or carry extra ones.

Consider each distinct set of values, ignoring order, and call these groups: foo is 1, bar is 2, bat is 3, foo-bar is 4, foo-bat is 5, bar-bat is 6, and so on. Give each group a key, like the numeric counter in that example, and store the group key as a property in the object. Each time you store an object, calculate its group key: you can precalculate all group keys, or use a hash-style function that derives a key from an arbitrary set of string values. You pay a little more upfront at storage time and when building the query, but you save a great deal of processing because IndexedDB does everything after that, so you want a simple, fast hash. Modify it to sort the value set lexicographically before hashing, so that a difference in value order does not change the hash.

More concretely: each thing object gets a property called "tags-hash", with a basic index on it (not unique, not multi-entry). Each time you put a thing in the store, compute the value of tags-hash and set the property before calling put. Then, each time you want to query, compute the hash of the array of tags you are querying by and call getAll(calculated-hash-value); it will return all things that have exactly those tags.
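A sketch of such a hash: sort the tags first so that order doesn't matter, then apply any cheap string hash (a djb2-style one here); the "tags-hash" property name follows the description above:

    // Order-insensitive key for a set of tags: sort, join, then hash.
    function tagsHash(tags: string[]): string {
      const canonical = [...tags].sort().join("\u0000"); // assumes NUL never occurs in a tag
      let h = 5381; // djb2-style string hash
      for (let i = 0; i < canonical.length; i++) {
        h = ((h << 5) + h + canonical.charCodeAt(i)) | 0;
      }
      return h.toString(36);
    }

    // Before put():  thing["tags-hash"] = tagsHash(thing.tags);
    // To query:      store.index("tags-hash").getAll(tagsHash(queryTags));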

Storing attributes with multiple integer values

I need to store a dynamic number of integer attributes (1-8). I'm storing them in individual columns in the database table, like:
attribute_1, attribute_2, ..., attribute_8
This makes for a fairly messy model when methods need to reference these, as well as an unwieldy database table and schema.
These are assigned default values (but are overridable on a form), and represent unique identifiers for the user.
For instance, a Brew is composed of up to eight batches before they are mixed together in a fermenter. The brewer might want to go back and refer to any one of these by its unique identifying number. I'm assigning these unique values based on the last highest value when creating a new Brew. However, the user may want to override these default values to some other numbers.
In most cases (smaller breweries), they'll probably only use the first two, but some larger breweries would use all eight.
There must be a better way to store these than having eight different attributes with the same name and a number at the end.
I'm using MySQL. Is there an easy/concise way to store an array or a JSON hash but still be able to edit these values on a form?
I would not store attributes like that; it will limit you in the future. Say you want to know which brews have used attribute_4: you would have to scan the entire brews table, open the attributes field and deconstruct it to see if 4 is in there.
Much better to separate Brew and Attributes into two tables and link them, like so:
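For example (a sketch with hypothetical table and column names, queried here through the mysql2 client):

    import { createConnection } from "mysql2/promise";

    // Hypothetical normalized layout: one row per attribute, not eight columns.
    //   CREATE TABLE brews (
    //     id   INT AUTO_INCREMENT PRIMARY KEY,
    //     name VARCHAR(100) NOT NULL
    //   );
    //   CREATE TABLE brew_attributes (
    //     brew_id  INT NOT NULL,
    //     position TINYINT NOT NULL,  -- batch slot 1..8
    //     value    INT NOT NULL,      -- the user-visible identifying number
    //     PRIMARY KEY (brew_id, position),
    //     FOREIGN KEY (brew_id) REFERENCES brews(id)
    //   );

    // "Which brews have used the value 4?" becomes an indexed lookup.
    async function brewsUsingValue(value: number) {
      const conn = await createConnection({ host: "localhost", database: "brewery" });
      const [rows] = await conn.execute(
        "SELECT DISTINCT brew_id FROM brew_attributes WHERE value = ?",
        [value]
      );
      await conn.end();
      return rows;
    }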
Another benefit is that you can add attributes easily.
Storing JSON is OK, as #max pointed out. I just propose the normalized database way of doing it.

Get maximum value of number of keys (hlen) for hashes that match given pattern in Redis

I have many records that start with the pattern as shown below:
user:8d6120be2e7247e49545502092c389fd and
user:000935dc3bb16bd2e0de50988751acfd
Though each hash represents a user object, one hash may have more keys than another. For instance, if a user is a Manager, they may have a few additional keys like Reportees, Benefits, etc. Without actually looking into all the records, is there a way to know the maximum number of keys in any hash? I am in the process of converting the Redis structure into a relational schema, and this gives me an idea of which columns should be present.
Just use HLEN if your user:<hash> key is a hash. Most Redis data structures have a command for getting their length:
LLEN for a LIST
SCARD for a SET
ZCARD for a SORTED SET
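There is no command that aggregates over a key pattern, so you still have to visit each key once; a sketch using SCAN plus HLEN with the ioredis client:

    import Redis from "ioredis";

    // Walk all user:* hashes with SCAN and track the largest field count.
    async function maxHashLen(pattern = "user:*"): Promise<number> {
      const redis = new Redis();
      let cursor = "0";
      let max = 0;
      do {
        const [next, keys] = await redis.scan(cursor, "MATCH", pattern, "COUNT", 1000);
        cursor = next;
        for (const key of keys) {
          max = Math.max(max, await redis.hlen(key));
        }
      } while (cursor !== "0");
      await redis.quit();
      return max;
    }

SCAN iterates incrementally, so this won't block the server the way KEYS user:* would.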

Optimal Way to Store/Retrieve Array in Table

I currently have a table in MySQL that stores values normally, but I want to add a field to that table that stores an array of values, such as cities. Should I simply store that array as a CSV? Each row will need its own array, so I feel uneasy about making a new table and inserting 2-5 rows for each row inserted in the previous table.
I feel like this situation should have a name, I just can't think of it :)
Edit
Number of elements: 2-5 (a selection from a dynamic list of cities; the array references that list, which is itself a table).
This field would not need to be searchable, simply retrieved alongside other data.
The "right" way would be to have another table that holds each value but since you don't want to go that route a delimited list should work. Just make sure that you pick a delimiter that won't show up in the data. You can also store the data as XML depending on how you plan on interacting with the data this may be a better route.
I would go with the idea of a field containing your comma- (or other logical delimiter) separated values. Just make sure the field is big enough to hold your maximum array size. Then, when you pull the field out, it is easy to run explode() on the long string using your delimiter, which immediately populates the array in your code.
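The same round trip in TypeScript, with split() playing the role of explode(); the pipe delimiter is an assumption about what can never appear in a city name:

    const DELIM = "|"; // pick a character your data cannot contain

    // Pack a small array into a single VARCHAR column.
    function packCities(cities: string[]): string {
      return cities.join(DELIM);
    }

    // Unpack it again after a SELECT.
    function unpackCities(field: string): string[] {
      return field === "" ? [] : field.split(DELIM);
    }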
Maybe the word you're looking for is "normalize". As in, move the array to a separate table, linked to the first by means of a key. This offers several advantages:
The array size can grow almost indefinitely
Efficient storage
Ability to search for values in the array without having to use LIKE
Of course, the decision of whether to normalize this data depends on many factors that you haven't mentioned, like the number of elements, whether or not the number is fixed, whether the elements need to be searchable, etc.
Is your application PHP? It might be worth investigating the functions serialize and unserialize.
These two functions allow you to easily store an array in the database, then recreate that array at a later time.
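The equivalent idea in TypeScript is a JSON round trip, which has the advantage over PHP's native serialize() format of being readable from any language:

    const cities = ["Austin", "Boston", "Chicago"];

    const stored: string = JSON.stringify(cities); // write this to a TEXT column
    const restored: string[] = JSON.parse(stored); // read it back into an array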
As others have mentioned, another table is the proper way to go.
But if you really don't want to do that(?), and assuming you're using PHP with MySQL, why not use serialize() and store the serialized value?

Does MySQL store it in an optimal way if the same string is stored in multiple rows?

I have a table where one of the columns is a sort of id string used to group several rows from the table. Let's say the column name is "map" and one of the values for map is e.g. "walmart". The column has an index on it, because I use it to filter the rows which belong to a certain map.
I have lots of such maps, and I don't know how much space the different map values take up in the table. Does MySQL recognize that the same map value is stored for multiple rows, storing it only once internally and referencing it with an internal numeric id?
Or, if I want to decrease the size of the table, do I have to replace the map string with a numeric id explicitly and use a separate table to pair map strings with ids?
MySQL will store the whole data for every row, regardless of whether the data already exists in a different row.
If you have a limited set of options, you could use an ENUM field; otherwise, you could pull the names into another table and join on it.
I think MySQL will duplicate your content each time: it stores data row by row, unless you explicitly specify otherwise (putting the data in another table, as you suggested).
Using another table will mean some of your queries need a JOIN; you might want to weigh the size of your data (is it really that big?) against the (small?) performance loss that join may cause.
Another solution would be an ENUM datatype, at least if you know in advance which strings you will have in your table, and there are only a few of them.
Finally, another option is to store an integer "code" corresponding to each string, and have those codes translated to strings by your application, entirely outside of the database (or use a table to store the correspondences, but have that table cached by your application instead of using joins in SQL queries).
It would not be as "clean", but might be better for performance -- still, this may be the kind of micro-optimization that is not necessary in your case...
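A sketch of that last approach, assuming a hypothetical maps lookup table that the application loads once and caches in memory:

    import { createConnection, Connection, RowDataPacket } from "mysql2/promise";

    // Hypothetical lookup table: maps(id INT PRIMARY KEY, name VARCHAR(255) UNIQUE).
    // The main table then stores only the integer map id.
    const mapNames = new Map<number, string>();

    async function loadMapNames(conn: Connection): Promise<void> {
      const [rows] = await conn.execute<RowDataPacket[]>("SELECT id, name FROM maps");
      for (const row of rows) {
        mapNames.set(row.id as number, row.name as string);
      }
    }

    // Translate codes in the application instead of JOINing on every query.
    function mapName(code: number): string {
      return mapNames.get(code) ?? `unknown(${code})`;
    }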
If you are using the same values over and over again, then there is a good functional reason to move it to a separate table, totally aside from disk space considerations: To avoid problems with inconsistent data.
Suppose you have a table of Stores, which includes a column for StoreName. Among the values in StoreName "WalMart" occurs 300 times, and then there's a "BalMart". Is that just a typo for "WalMart", or is that a different store?
Also, if there's other data associated with a store that would be constant across the chain, you should store it just once and not repeatedly.
Of course, if you're just showing locations on a map and you really don't care what they are, it's just a name to display, then this would all be irrelevant.
And if that's the case, then buying a bigger disk is probably a simpler solution than redesigning your database just to save a few bytes per record. Because if we're talking arbitrary strings for place names here, then trying to find duplicates and have look-ups for them is probably a lot of work for very little gain.