Redis strings vs Redis hashes to represent JSON: efficiency? - json

I want to store a JSON payload into redis. There's really 2 ways I can do this:
One using a simple string keys and values.
key:user, value:payload (the entire JSON blob which can be 100-200 KB)
SET user:1 payload
Using hashes
HSET user:1 username "someone"
HSET user:1 location "NY"
HSET user:1 bio "STRING WITH OVER 100 lines"
Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above.
Which is more memory efficient? Using string keys and values, or using a hash?

This article can provide a lot of insight here: http://redis.io/topics/memory-optimization
There are many ways to store an array of Objects in Redis (spoiler: I like option 1 for most use cases):
Store the entire object as JSON-encoded string in a single key and keep track of all Objects using a set (or list, if more appropriate). For example:
INCR id:users
SET user:{id} '{"name":"Fred","age":25}'
SADD users {id}
Generally speaking, this is probably the best method in most cases. If there are a lot of fields in the Object, your Objects are not nested with other Objects, and you tend to only access a small subset of fields at a time, it might be better to go with option 2.
Advantages: considered a "good practice." Each Object is a full-blown Redis key. JSON parsing is fast, especially when you need to access many fields for this Object at once. Disadvantages: slower when you only need to access a single field.
Store each Object's properties in a Redis hash.
INCR id:users
HMSET user:{id} name "Fred" age 25
SADD users {id}
Advantages: considered a "good practice." Each Object is a full-blown Redis key. No need to parse JSON strings. Disadvantages: possibly slower when you need to access all/most of the fields in an Object. Also, nested Objects (Objects within Objects) cannot be easily stored.
Store each Object as a JSON string in a Redis hash.
INCR id:users
HMSET users {id} '{"name":"Fred","age":25}'
This allows you to consolidate a bit and only use two keys instead of lots of keys. The obvious disadvantage is that you can't set the TTL (and other stuff) on each user Object, since it is merely a field in the Redis hash and not a full-blown Redis key.
Advantages: JSON parsing is fast, especially when you need to access many fields for this Object at once. Less "polluting" of the main key namespace. Disadvantages: About same memory usage as #1 when you have a lot of Objects. Slower than #2 when you only need to access a single field. Probably not considered a "good practice."
Store each property of each Object in a dedicated key.
INCR id:users
SET user:{id}:name "Fred"
SET user:{id}:age 25
SADD users {id}
According to the article above, this option is almost never preferred (unless the property of the Object needs to have specific TTL or something).
Advantages: Object properties are full-blown Redis keys, which might not be overkill for your app. Disadvantages: slow, uses more memory, and not considered "best practice." Lots of polluting of the main key namespace.
Overall Summary
Option 4 is generally not preferred. Options 1 and 2 are very similar, and they are both pretty common. I prefer option 1 (generally speaking) because it allows you to store more complicated Objects (with multiple layers of nesting, etc.) Option 3 is used when you really care about not polluting the main key namespace (i.e. you don't want there to be a lot of keys in your database and you don't care about things like TTL, key sharding, or whatever).
If I got something wrong here, please consider leaving a comment and allowing me to revise the answer before downvoting. Thanks! :)

It depends on how you access the data:
Go for Option 1:
If you use most of the fields on most of your accesses.
If there is variance on possible keys
Go for Option 2:
If you use just single fields on most of your accesses.
If you always know which fields are available
P.S.: As a rule of the thumb, go for the option which requires fewer queries on most of your use cases.

Some additions to a given set of answers:
First of all if you going to use Redis hash efficiently you must know
a keys count max number and values max size - otherwise if they break out hash-max-ziplist-value or hash-max-ziplist-entries Redis will convert it to practically usual key/value pairs under a hood. ( see hash-max-ziplist-value, hash-max-ziplist-entries ) And breaking under a hood from a hash options IS REALLY BAD, because each usual key/value pair inside Redis use +90 bytes per pair.
It means that if you start with option two and accidentally break out of max-hash-ziplist-value you will get +90 bytes per EACH ATTRIBUTE you have inside user model! ( actually not the +90 but +70 see console output below )
# you need me-redis and awesome-print gems to run exact code
redis = Redis.include(MeRedis).configure( hash_max_ziplist_value: 64, hash_max_ziplist_entries: 512 ).new
=> #<Redis client v4.0.1 for redis://127.0.0.1:6379/0>
> redis.flushdb
=> "OK"
> ap redis.info(:memory)
{
"used_memory" => "529512",
**"used_memory_human" => "517.10K"**,
....
}
=> nil
# me_set( 't:i' ... ) same as hset( 't:i/512', i % 512 ... )
# txt is some english fictionary book around 56K length,
# so we just take some random 63-symbols string from it
> redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), 63] ) } }; :done
=> :done
> ap redis.info(:memory)
{
"used_memory" => "1251944",
**"used_memory_human" => "1.19M"**, # ~ 72b per key/value
.....
}
> redis.flushdb
=> "OK"
# setting **only one value** +1 byte per hash of 512 values equal to set them all +1 byte
> redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), i % 512 == 0 ? 65 : 63] ) } }; :done
> ap redis.info(:memory)
{
"used_memory" => "1876064",
"used_memory_human" => "1.79M", # ~ 134 bytes per pair
....
}
redis.pipelined{ 10000.times{ |i| redis.set( "t:#{i}", txt[rand(50000), 65] ) } };
ap redis.info(:memory)
{
"used_memory" => "2262312",
"used_memory_human" => "2.16M", #~155 byte per pair i.e. +90 bytes
....
}
For TheHippo answer, comments on Option one are misleading:
hgetall/hmset/hmget to the rescue if you need all fields or multiple get/set operation.
For BMiner answer.
Third option is actually really fun, for dataset with max(id) < has-max-ziplist-value this solution has O(N) complexity, because, surprise, Reddis store small hashes as array-like container of length/key/value objects!
But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much
But you should not worry, you'll break hash-max-ziplist-entries very fast and there you go you are now actually at solution number 1.
Second option will most likely go to the fourth solution under a hood because as question states:
Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above.
And as you already said: the fourth solution is the most expensive +70 byte per each attribute for sure.
My suggestion how to optimize such dataset:
You've got two options:
If you cannot guarantee max size of some user attributes then you go for first solution, and if memory matter is crucial then
compress user json before storing in redis.
If you can force max size of all attributes.
Then you can set hash-max-ziplist-entries/value and use hashes either as one hash per user representation OR as hash memory optimization from this topic of a Redis guide: https://redis.io/topics/memory-optimization and store user as json string. Either way you may also compress long user attributes.

we had a similar issue in our production env , we have came up with an idea of gzipping the payload if it exceeds some threshold KB.
I have a repo only dedicated to this Redis client lib here
what is the basic idea is to detect the payload if the size is greater than some threshold and then gzip it and also base-64 it and then keep the compressed string as a normal string in the redis. on retrieval detect if the string is a valid base-64 string and if so decompress it.
the whole compressing and decompressing will be transparent plus you gain close to 50% network traffic
Compression Benchmark Results
BenchmarkDotNet=v0.12.1, OS=macOS 11.3 (20E232) [Darwin 20.4.0]
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.201
[Host] : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT DEBUG
Method
Mean
Error
StdDev
Gen 0
Gen 1
Gen 2
Allocated
WithCompressionBenchmark
668.2 ms
13.34 ms
27.24 ms
-
-
-
4.88 MB
WithoutCompressionBenchmark
1,387.1 ms
26.92 ms
37.74 ms
-
-
-
2.39 MB

To store JSON in Redis you can use the Redis JSON module.
This gives you:
Full support for the JSON standard
A JSONPath syntax for selecting/updating elements inside documents
Documents stored as binary data in a tree structure, allowing fast access to sub-elements
Typed atomic operations for all JSON values types
https://redis.io/docs/stack/json/
https://developer.redis.com/howtos/redisjson/getting-started/
https://redis.com/blog/redisjson-public-preview-performance-benchmarking/

You can use the json module: https://redis.io/docs/stack/json/
It is fully supported and allows you to use json as a data structure in redis.
There is also Redis Object Mappers for some languages: https://redis.io/docs/stack/get-started/tutorials/

Related

Is gzip compression efficient for repetitive text content? Multiple terms that show up multiple times in the string that is being compressed

I have this products-like page that I'm using SSR with NextJS. Users can search, filter, pagination, etc.
It's not a huge amount of items, so I decided to send them all as props to the page, doing all that functionality on client-side only. Raw props data is now roughly ~350kb and gzipped is ~75kb.
I think this is worth it, because I can save a lot on database reads, and I didn't have to setup any search/cache server to implement the searching/filtering functionality. It's all done on the client, becuase it has all the items in-memory.
The product object has a shape similar to this:
{
longPropertyName: value,
stringEnum: 'LONG_STRING_VALUE'
}
What I could do to optmize data traffic would be to shorten property names, and refactor string enums into numeric enums, so it would be:
{
shortProp: value,
numericEnum: 1
}
This for sure would reduce the uncompressed data size from ~350kb to maybe ~250kb.
But I'm not sure if it's worth doing it, because I think the gzipped size would remain very much the same, because I'm assuming the gzip compression is very good in compressing repetitive text content, like property names and string enum values that show up multiple times in the data.
Would I get a reduction on the gzip size by the same factor? Or will it be just the same value?
gzip will find repeated strings that are no farther than 32K bytes from each other. It will encode up to 258 bytes of match at a time as a very compact length and distance. That is well suited to your application, whether or not you tokenize the information. Tokenization will improve the compression, both by making the matches shorter, permitting more text per match, but also by supporting more distant matches by bringing what was further away into that 32K.
You can also try more modern compressors such as zstd or lzma2 (xz), which support much farther distances and longer matches.
Just tested it locally and while I was able to remove 20kb from uncompressed data by replacing the string enums with numeric enums, the change in the compressed data was only 1.1kb.
It means that gzip is already doing 95% of the work for me.
Note: I tested the gzip size with:
gzip -c filename.min.js | wc -c
I ended up writing a minification/expand logic for those objects. I'm doing SSR, so I minify on the server, and expand on the client. That reduced the uncompressed props data from 350kb to 168kb and the final compressed page size from 75kb to 45kb.
Simple logic like:
On server:
{ longProp: value } => { a: value }
{ stringEnum: STRING_VALUE => { b: 5 }
On client:
{ a: value } => { longProp: value }
{ b: 5 } => { stringEnum: STRING_VALUE

How many tasks are created when spark read or write from mysql?

As far as I know, Spark executors handle many tasks at the same time to guarantee processing data parallelly.Here comes the question. When connecting to external data storage,say mysql,how many tasks are there to finishi this job?In other words,are multiple tasks created at the same time and each task reads all data ,or data is read from only one task and is distributed to the cluster in some other way? How about writing data to mysql,how many connections are there?
Here is some piece of code to read or write data from/to mysql:
def jdbc(sqlContext: SQLContext, url: String, driver: String, dbtable: String, user: String, password: String, numPartitions: Int): DataFrame = {
sqlContext.read.format("jdbc").options(Map(
"url" -> url,
"driver" -> driver,
"dbtable" -> s"(SELECT * FROM $dbtable) $dbtable",
"user" -> user,
"password" -> password,
"numPartitions" -> numPartitions.toString
)).load
}
def mysqlToDF(sparkSession:SparkSession, jdbc:JdbcInfo, table:String): DataFrame ={
var dF1 = sparkSession.sqlContext.read.format("jdbc")
.option("url", jdbc.jdbcUrl)
.option("user", jdbc.user)
.option("password", jdbc.passwd)
.option("driver", jdbc.jdbcDriver)
.option("dbtable", table)
.load()
// dF1.show(3)
dF1.createOrReplaceTempView(s"${table}")
dF1
}
}
here is a good article which answers your question:
https://freecontent.manning.com/what-happens-behind-the-scenes-with-spark/
In simple words: the workers separate the reading task into several parts and each worker only read a part of your input data. The number of tasks divided depends on your ressources and your data volume. The writing is the same principle: Spark writes the data to a distributed storage system, such as Hdfs and in Hdfs the data is stored in a ditributed way: each worker writes its data to some storage node in Hdfs.
By default data from jdbc source are loaded by one thread so you will have one task processed by one executor and thats the case you may expect in your second function mysqlToDF
In the first function "jdbc" you are closer to parallel read but still some parameters are needed, numPartitions is not enough, spark need some integer/date column and lower/upper bounds to be able to read in paralell (it will execute x queries for partial results)
Spark jdb documentation
In this docu you will find:
partitionColumn, lowerBound, upperBound (none) These options must
all be specified if any of them is specified. In addition,
numPartitions must be specified. They describe how to partition the
table when reading in parallel from multiple workers. partitionColumn
must be a numeric, date, or timestamp column from the table in
question. Notice that lowerBound and upperBound are just used to
decide the partition stride, not for filtering the rows in table. So
all rows in the table will be partitioned and returned. This option
applies only to reading.
numPartitions (none) The maximum
number of partitions that can be used for parallelism in table reading
and writing. This also determines the maximum number of concurrent
JDBC connections. If the number of partitions to write exceeds this
limit, we decrease it to this limit by calling coalesce(numPartitions)
before writing. read/write
regarding write
How about writing data to mysql,how many connections are there?
As stated in docu it also depends on numPartitions, if number of partitions when writing will be higher than numPartitions Spark will figure it out and call coalesce. Remember that coalesce may generate skew so sometimes it may be better to repartition it explicitly with repartition(numPartitions) to distribute data equally before write
If you don't set numPartitions number of paralell connections on write may be the same as number of active tasks in given moment so be aware that with to high parallelism and no upper bound you may choke source server

Output index of ELKI

I am using ELKI to cluster data from CSV file
I use
-resulthandler ResultWriter
-out folder/
to save the outputdata
But as an output I have some strange indexes
ID=2138 0.1799 0.2761
ID=2137 0.1797 0.2778
ID=2136 0.1796 0.2787
ID=2109 0.1161 0.2072
ID=2007 0.1139 0.2047
The ID is more than 2000 despite I have less than 100 training samples
DBIDs are internal; the documentation clearly says that you shouldn't make too much assumptions on them because their implementation may change. The only reason they are written to the output at all is because some methods (such as OPTICS) may require cross-referencing objects by this unique ID.
Because they are meant to be unique identifiers, they are usually continuously incremented. The next time you click on "run" in the MiniGUI, you will get the next n IDs... so clearly, you clicked run more than once.
The "Tips & Tricks" in the ELKI DBID documentation probably answer your underlying question - how to use map DBIDs to line numbers of your input file. The best way is to if you want to have object identifiers, assign object identifiers yourself by using an identifier column (and configuring it to be an external identifier).
For further information, see the documentation: https://elki-project.github.io/dev/dbids

How to best validate JSON on the server-side

When handling POST, PUT, and PATCH requests on the server-side, we often need to process some JSON to perform the requests.
It is obvious that we need to validate these JSONs (e.g. structure, permitted/expected keys, and value types) in some way, and I can see at least two ways:
Upon receiving the JSON, validate the JSON upfront as it is, before doing anything with it to complete the request.
Take the JSON as it is, start processing it (e.g. access its various key-values) and try to validate it on-the-go while performing business logic, and possibly use some exception handling to handle vogue data.
The 1st approach seems more robust compared to the 2nd, but probably more expensive (in time cost) because every request will be validated (and hopefully most of them are valid so the validation is sort of redundant).
The 2nd approach may save the compulsory validation on valid requests, but mixing the checks within business logic might be buggy or even risky.
Which of the two above is better? Or, is there yet a better way?
What you are describing with POST, PUT, and PATCH sounds like you are implementing a REST API. Depending on your back-end platform, you can use libraries that will map JSON to objects which is very powerful and performs that validation for you. In JAVA, you can use Jersey, Spring, or Jackson. If you are using .NET, you can use Json.NET.
If efficiency is your goal and you want to validate every single request, it would be ideal if you could evaluate on the front-end if you are using JavaScript you can use json2.js.
In regards to comparing your methods, here is a Pro / Cons list.
Method #1: Upon Request
Pros
The business logic integrity is maintained. As you mentioned trying to validate while processing business logic could result in invalid tests that may actually be valid and vice versa or also the validation could inadvertently impact the business logic negatively.
As Norbert mentioned, catching the errors before hand will improve efficiency. The logical question this poses is why spend the time processing, if there are errors in the first place?
The code will be cleaner and easier to read. Having validation and business logic separated will result in cleaner, easier to read and maintain code.
Cons
It could result in redundant processing meaning longer computing time.
Method #2: Validation on the Go
Pros
It's efficient theoretically by saving process and compute time doing them at the same time.
Cons
In reality, the process time that is saved is likely negligible (as mentioned by Norbert). You are still doing the validation check either way. In addition, processing time is wasted if an error was found.
The data integrity can be comprised. It could be possible that the JSON becomes corrupt when processing it this way.
The code is not as clear. When reading the business logic, it may not be as apparent what is happening because validation logic is mixed in.
What it really boils down to is Accuracy vs Speed. They generally have an inverse relationship. As you become more accurate and validate your JSON, you may have to compromise some on speed. This is really only noticeable in large data sets as computers are really fast these days. It is up to you to decide what is more important given how accurate you think you data may be when receiving it or whether that extra second or so is crucial. In some cases, it does matter (i.e. with the stock market and healthcare applications, milliseconds matter) and both are highly important. It is in those cases, that as you increase one, for example accuracy, you may have to increase speed by getting a higher performant machine.
Hope this helps.
The first approach is more robust, but does not have to be noticeably more expensive. It becomes way less expensive even when you are able to abort the parsing process due to errors: Your business logic usually takes >90% of the resources in a process, so if you have an error % of 10%, you are already resource neutral. If you optimize the validation process so that the validations from the business process are performed upfront, your error rate might be much lower (like 1 in 20 to 1 in 100) to stay resource neutral.
For an example on an implementation assuming upfront data validation, look at GSON (https://code.google.com/p/google-gson/):
GSON works as follows: Every part of the JSON can be cast into an object. This object is typed or contains typed data:
Sample object (JAVA used as example language):
public class someInnerDataFromJSON {
String name;
String address;
int housenumber;
String buildingType;
// Getters and setters
public String getName() { return name; }
public void setName(String name) { this.name=name; }
//etc.
}
The data parsed by GSON is by using the model provided, already type checked.
This is the first point where your code can abort.
After this exit point assuming the data confirmed to the model, you can validate if the data is within certain limits. You can also write that into the model.
Assume for this buildingType is a list:
Single family house
Multi family house
Apartment
You can check data during parsing by creating a setter which checks the data, or you can check it after parsing in a first set of your business rule application. The benefit of first checking the data is that your later code will have less exception handling, so less and easier to understand code.
I would definitively go for validation before processing.
Let's say you receive some json data with 10 variables of which you expect:
the first 5 variables to be of type string
6 and 7 are supposed to be integers
8, 9 and 10 are supposed to be arrays
You can do a quick variable type validation before you start processing any of this data and return a validation error response if one of the ten fails.
foreach($data as $varName => $varValue){
$varType = gettype($varValue);
if(!$this->isTypeValid($varName, $varType)){
// return validation error
}
}
// continue processing
Think of the scenario where you are directly processing the data and then the 10th value turns out to be of invalid type. The processing of the previous 9 variables was a waste of resources since you end up returning some validation error response anyway. On top of that you have to rollback any changes already persisted to your storage.
I only use variable type in my example but I would suggest full validation (length, max/min values, etc) of all variables before processing any of them.
In general, the first option would be the way to go. The only reason why you might need to think of the second option is if you were dealing with JSON data which was tens of MBs large or more.
In other words, only if you are trying to stream JSON and process it on the fly, you will need to think about second option.
Assuming that you are dealing with few hundred KB at most per JSON, you can just go for option one.
Here are some steps you could follow:
Go for a JSON parser like GSON that would just convert your entire
JSON input into the corresponding Java domain model object. (If GSON
doesn't throw an exception, you can be sure that the JSON is
perfectly valid.)
Of course, the objects which were constructed using GSON in step 1
may not be in a functionally valid state. For example, functional
checks like mandatory fields and limit checks would have to be done.
For this, you could define a validateState method which repeatedly
validates the states of the object itself and its child objects.
Here is an example of a validateState method:
public void validateState(){
//Assume this validateState is part of Customer class.
if(age<12 || age>150)
throw new IllegalArgumentException("Age should be in the range 12 to 120");
if(age<18 && (guardianId==null || guardianId.trim().equals(""))
throw new IllegalArgumentException("Guardian id is mandatory for minors");
for(Account a:customer.getAccounts()){
a.validateState(); //Throws appropriate exceptions if any inconsistency in state
}
}
The answer depends entirely on your use case.
If you expect all calls to originate in trusted clients then the upfront schema validation should be implement so that it is activated only when you set a debug flag.
However, if your server delivers public api services then you should validate the calls upfront. This isn't just a performance issue - your server will likely be scrutinized for security vulnerabilities by your customers, hackers, rivals, etc.
If your server delivers private api services to non-trusted clients (e.g., in a closed network setup where it has to integrate with systems from 3rd party developers), then you should at least run upfront those checks that will save you from getting blamed for someone else's goofs.
It really depends on your requirements. But in general I'd always go for #1.
Few considerations:
For consistency I'd use method #1, for performance #2. However when using #2 you have to take into account that rolling back in case of non valid input may become complicated in the future, as the logic changes.
Json validation should not take that long. In python you can use ujson for parsing json strings which is a ultrafast C implementation of the json python module.
For validation, I use the jsonschema python module which makes json validation easy.
Another approach:
if you use jsonschema, you can validate the json request in steps. I'd perform an initial validation of the most common/important parts of the json structure, and validate the remaining parts along the business logic path. This would allow to write simpler json schemas and therefore more lightweight.
The final decision:
If (and only if) this decision is critical I'd implement both solutions, time-profile them in right and wrong input condition, and weight the results depending on the wrong input frequency. Therefore:
1c = average time spent with method 1 on correct input
1w = average time spent with method 1 on wrong input
2c = average time spent with method 2 on correct input
2w = average time spent with method 2 on wrong input
CR = correct input rate (or frequency)
WR = wrong input rate (or frequency)
if ( 1c * CR ) + ( 1w * WR) <= ( 2c * CR ) + ( 2w * WR):
chose method 1
else:
chose method 2

How can I store an array of boolean values in a MySql database?

In my case, every "item" either has a property , or not. The properties can be some hundreds, so I will need , say, max 1000 true/false bits per item.
Is there a way to store those bits in one field of the item ?
If you're looking for a way to do this in a way that's searchable, then no.
A couple searchable methods (involving more than 1 column and/or table):
Use a bunch of SET columns. You're limited to 64 items (on/offs) in a set, but you cna probably figure out a way to group them.
Use 3 tables: Items (id, ...), FlagNames(id, name), and a pivot table ItemFlags(item_id, flag_id). You can then query for items with joins.
If you don't need it to be searchable, then all you need is a method to serialize your data before you put it in the database, and a unserialize it when you pull it out, then use a char, or varchar column.
Use facilities built in to your language (PHP's serialize/unserialize).
Concatenate a series of "y" and "n" characters together.
Bit-pack your values into a string (8 bits per character) in the client before making a call to the MySQL database, and unpack them when retrieving data out of the database. This is the most efficient storage mechanism (if all rows are the same, use char[x], not varchar[x]) at the expense of the data not being searchable and slightly more complicated code.
I would rather go with something like:
Properties
ID, Property
1, FirsProperty
2, SecondProperty
ItemProperties
ID, Property, Item
1021, 1, 10
1022, 2, 10
Then it would be easy to retrieve which properties are set or not with a query for any particular item.
At worst you would have to use a char(1000) [ynnynynnynynnynny...] or the like. If you're willing to pack it (for example, into hex isn't too bad) you could do it with a char(64) [hexadecimal chars].
If it is less than 64, then the SET type will work, but it seems like that's not enough.
You could use a binary type, but that's designed more for stuff like movies, etc.. so I'd not.
So yeah, it seems like your best bet is to pack it into a string, and then store that.
It should be noted that a VARCHAR would be wasting space, since you do know precisely how much space your data will take, and can allocate it exactly. (Having fixed-width rows is a good thing)
Strictly speaking you can accomplish this using the following:
$bools = array(0,1,1,0,1,0,0,1);
$for_db = serialize($array);
// Insert the serialized $for_db string into the database. You could use a text type
// make certain it could hold the entire string.
// To get it back out:
$bools = unserialize($from_db);
That said, I would strongly recommend looking at alternative solutions.
Depending on the use case you might try creating an "item" table that has a many-to-many relationship with values from an "attributes" table. This would be a standard implementation of the common Entity Attribute Value database design pattern for storing variable points of data about a common set of objects.