Zabbix Memory Used key - zabbix

I have an item that has vm.memory.size[used] as a key, this returns the memory used, this also included the cached and the buffers.
I need to subtract vm.memory.size[cached] and vm.memory.size[buffers] from the vm.memory.size[used] to get the value that I need.
How can I do this please since I cannot find a way to do this, this what I tried lately but deos not work.

If you want to calculate it in a separate item, you must have used, cached and buffers already monitored as normal items. Once you have them, the calculated item formula would be last(vm.memory.size[used])-last(vm.memory.size[cached])-last(vm.memory.size[[buffers]) .
You can also calculate that directly in a trigger, removing the need for the calculated item.
And maybe even simpler than that - vm.memory.size[available] and vm.memory.size[pavailable] item keys can give you the (raw and percentage, respectively) amount of the available memory - already excluding cache & buffers - that you might be able to alert on directly.

Related

What's the best way to read in an entire LOB using ODBC?

Reading in an entire LOB whose size you don't know beforehand (without a max allocation + copy) should be a fairly common problem, but finding good documentation and/or examples on the "right" way to do this has proved utterly maddening for me.
I wrestled with SQLBindCol but couldn't see any good way to make it work. SQLDescribeCol and SQLColAttribute return column metadata that seemed to be a default or an upper bound on the column size and not the current LOB's actual size. In the end, I settled on using the following:
1) Put any / all LOB columns as the highest numbered columns in your SELECT statement
2) SQLPrepare the statement
3) SQLBindCol any earlier non-LOB columns that you want
4) SQLExecute the statement
5) SQLFetch a result row
6) SQLGetData on your LOB column with a buffer of size 0 just to query its actual size
7) Allocate a buffer just big enough to hold your LOB
8) SQLGetData again on your LOB column with your correctly sized allocated buffer this time
9) Repeat Steps 6-8 for each later LOB column
10) Repeat Steps 5-9 for any more rows in your result set
11) SQLCloseCursor when you are done with your result set
This seems to work for me, but also seems rather involved.
Are the calls to SQLGetData going back to the server or just processing the results already sent to the client?
Are there any gotchas where the server and/or client will refuse to process very large objects this way (e.g. - some size threshold is exceeded so they generate an error instead)?
Most importantly, is there a better way to do this?
Thanks!
I see several improvements to be done.
If you need to allocate a buffer then you should do it once for all the records and columns. So, you could use the technique suggested by #RickJames, improved with a MAX like this:
SELECT MAX(LENGTH(blob1)) AS max1, MAX(LENGTH(blob2)) AS max2, ...
You could use max1 and max2 to upfront allocate the buffers, or maybe only the largest one for all columns.
The length of the buffer returned at 1. might be too large for your application. You could decide at runtime how large the buffer would be. Anyway, SQLGetData is designed to be called multiple times for each column. Just by calling it again, with the same column number, it will fetch the next chunk. The count of available bytes will be saved where StrLen_or_IndPtr (the last argument) points. And this count will decrease after each call with the amount of bytes fetched.
And certainly there will be roundtrips to the server for each call because the purpose of all this is to prevent the driver from fetching more than the application can handle.
The trick with passing NULL as buffer pointer in order to get the length is prohibited in this case, check SQLGetData on Microsoft's Docs.
However, you could allocate a minimal buffer, say 8 bytes, pass it and its length. The function will return the count of bytes written, 7 in our case because the function add a null char, and will put at StrLen_or_IndPtr the count of remaining bytes. But you probably won't need this if you allocate the buffer as explained above.
Note: The LOBs need to be at the end of the select list and must be fetched in that order precisely.
SQLGetData
SQLGetData get the result of already fetched result. For example, if you have SQLFetch the first row of your table, SQLData will send you back the first row. It is used if you don't know if you can SQLBindCol the result.
But the way it is handle depends on your driver and is not describe in the standards. If your database is a SQL database, cursor cannot go backward, so the result may be still in the memory.
Large object query
The server may refuse to process large object according to the server standard and your ODBC Driver standard. It is not described in the ODBC standard.
To avoid a max-allocation, doing an extra copy, and to be efficient:
Getting the size first is not a bad approach -- it takes virtually no extra time to do
SELECT LENGTH(your_blob) FROM ...
Then do the allocation and actually fetch the blob.
If there are multiple BLOB columns, grab all the lengths in a single pass:
SELECT LENGTH(blob1), LENGTH(blob2), ... FROM ...
In MySQL, the length of a BLOB or TEXT is readily available in front of the bytes. But, even if it must read the column to get the length, think of that as merely priming the cache. That is, the overall time is not hurt much in either case.

What is the most efficient way of partitioning my buffer which favours a small memory footprint?

I have a large set of data, represented as a 1D array. Each individual entry is around 50 bytes in size. I would like to partition this data over a given number of buckets on the GPU using CUDA.
In order to be able to fit as much data in a single batch processed on the GPU in one go, I would like the buffer containing the bucket contents to be as small as possible, given that making copies of my input data is not really viable given the memory constraints. Using integer indices referencing the large input buffer seems to be the way to go here.
A naive approach is to allocate a constant number of these references in each bucket. Unfortunately, the input data set is not entirely distributed evenly. This may cause some buckets to end up being full and overflowing, while others may not even have a single entry inside of them. Unfortunately, for my application the buckets have a practical meaning, so entries which are put into a bucket really have to end up in said bucket.
The most memory-efficient possible way as far as I can tell is to allocate an array containing one index per bucket. You subsequently allocate a second buffer containing structs, which in turn contain a pointer to the entry in the large input data buffer, as well as an index to a subsequent element (thus creating a linked list for each entry in the large buffer for each bucket).
This makes the total necessary size of both buffers known on beforehand, allowing for efficient allocation. Unfortunately, while constructing the buffer it is necessary to keep track how much of the buffer has been used up to that point. To avoid race conditions, it's necessary to use an atomic increment on a single variable.
This means many threads will need to wait on a single variable to allocate their next entry in the bucket's linked list. Which I expect will bring performance to its knees in an instant.
I therefore thought it'd be better to keep the idea of using linked lists to define the contents of each bucket, but expand each entry to hold not one but multiple references to the large input buffer, to reduce the number of times atomic operations need to be performed. This comes at the cost for some wasted memory when linked list entries are not entirely filled themselves, though I think this should be acceptable.
I have two questions about this strategy:
Is this a sound strategy?
If not, is there a better one, or one which is used more commonly in similar situations?
You already have a one-to-one correspondence between the structs in your buffer and the original data items. So instead of allocating structs in the buffer via incrementing an atomic counter, you can use the index of the original item to locate the struct in the buffer.
If you index the buffer with the same index as your original data, you also don't need to keep pointers to the original data items, and the buffer of structs becomes a simple array of integers to hold the next entry in the linked list.
Alternatively, a two pass approach over the data works as well: In the first pass, just count the number of items in each bucket. Then you can allocate the right amount of memory for each bucket via an inclusive scan before the second pass to actually fill the buckets.

redis as write-back view count cache for mysql

I have a very high throughput site for which I'm trying to store "view counts" for each page in a mySQL database (for legacy reasons they must ultimately end up in mySQL).
The sheer number of views is making it impractical to do SQL "UPDATE ITEM SET VIEW_COUNT=VIEW_COUNT+1" type of statements. There are millions of items but most are only viewed a small number of times, others are viewed many times.
So I'm considering using Redis to gather the view counts, with a background thread that writes the counts to mySQL. What is the recommended method for doing this? There are some issues with the approach:
how often does the background thread run?
how does it determine what to write back to mySQL?
should I store a Redis KEY for every ITEM that gets hit?
what TTL should I use?
is there already some pre-built solution or powerpoint presentation that gets me halfway there, etc.
I have seen very similar questions on StackOverflow but none with a great answer...yet! Hoping there's more Redis knowledge out there at this point.
I think you need to step back and look at some of your questions from a different angle to get to your answers.
"how often does the background thread run?"
To answer this you need to answer these questions: How much data can you lose? What is the reason for the data being in MySQL, and how often is that data accessed? For example, if the DB is only needed to be consulted once per day for a report, you might only need it to be updated once per day. On the other hand, what if the Redis instance dies? How many increments can you lose and still be "ok"? These will provide the answers to the question of how often to update your MySQL instance and aren't something we can answer for you.
I would use a very different strategy for storing this in redis. For the sake of the discussion let us assume you decide you need to "flush to db" every hour.
Store each hit in hashes with a key name structure along these lines:
interval_counter:DD:HH
interval_counter:total
Use the page id (such as MD5 sum of the URI, the URI itself, or whatever ID you currently use) as the hash key and do two increments on a page view; one for each hash. This provides you with a current total for each page and a subset of pages to be updated.
You would then have your cron job run a minute or so after the start of the hour to pull down all pages with updated view counts by grabbing the previous hour's hash. This provides you with a very fast means of getting the data to update the MySQL DB with while avoiding any need to do math or play tricks with timestamps etc.. By pulling data from a key which is no longer bing incremented you avoid race conditions due to clock skew.
You could set an expiration on the daily key, but I'd rather use the cron job to delete it when it has successfully updated the DB. This means your data is still there if the cron job fails or fails to be executed. It also provides the front-end with a full set of known hit counter data via keys that do not change. If you wanted, you could even keep the daily data around to be able to do window views of how popular a page is. For example if you kept the daily hash around for 7 days by setting an expire via the cron job instead of a delete, you could display how much traffic each page has had per day for the last week.
Executing two hincr operations can be done either solo or pipelined still performs quite well and is more efficient than doing calculations and munging data in code.
Now for the question of expiring the low traffic pages vs memory use. First, your data set doesn't sound like one which will require huge amounts of memory. Of course, much of that depends on how you identify each page. If you have a numerical ID the memory requirements will be rather small. If you still wind up with too much memory, you can tune it via the config, and if needs be could even use a 32 bit compile of redis for a significant memory use reduction. For example, the data I describe in this answer I used to manage for one of the ten busiest forums on the Internet and it consumed less than 3GB of data. I also stored the counters in far more "temporal window" keys than I am describing here.
That said, in this use case Redis is the cache. If you are still using too much memory after the above options you could set an expiration on keys and add an expire command to each ht. More specifically, if you follow the above pattern you will be doing the following per hit:
hincr -> total
hincr -> daily
expire -> total
This lets you keep anything that is actively used fresh by extending it's expiration every time it is accessed. Of course, to do this you'd need to wrap your display call to catch the null answer for hget on the totals hash and populate it from the MySQL DB, then increment. You could even do both as an increment. This would preserve the above structure and would likely be the same codebase needed to update the Redis server from the MySQL Db if you the Redis node needed repopulation. For that you'll need to consider and decide which data source will be considered authoritative.
You can tune the cron job's performance by modifying your interval in accordance with the parameters of data integrity you determine from the earlier questions. To get a faster running cron nob you decrease the window. With this method decreasing the window means you should have a smaller collection of pages to update. A big advantage here is you don't need to figure out what keys you need to update and then go fetch them. you can do an hgetall and iterate over the hash's keys to do updates. This also saves many round trips by retrieving all the data at once. In either case if you will likely want to consider a second Redis instance slaved to the first to do your reads from. You would still do deletes against the master but those operations are much quicker and less likely to introduce delays in your write-heavy instance.
If you need disk persistence of the Redis DB, then certainly put that on a slave instance. Otherwise if you do have a lot of data being changed often your RDB dumps will be constantly running.
I hope that helps. There are no "canned" answers because to use Redis properly you need to think first about how you will access the data, and that differs greatly from user to user and project to project. Here I based the route taken on this description: two consumers accessing the data, one to display only and the other to determine updating another datasource.
Consolidation of my other answer:
Define a time-interval in which the transfer from redis to mysql should happen, i.e. minute, hour or day. Define it in a way so that fast and easyly an identifying key can be obtained. This key must be ordered, i.e. a smaller time should give a smaller key.
Let it be hourly and the key be YYYYMMDD_HH for readability.
Define a prefix like "hitcount_".
Then for every time-interval you set a hash hitcount_<timekey> in redis which contains all requested items of that interval in the form ITEM => count.
There exists two parts of the solution:
The actual page that has to count:
a) get the current $timekey, i.e. by date- functions
b) get the value of $ITEM
b) send the redis-command HINCRBY hitcount_$timekey $ITEM 1
A cronjob which runs in that given interval, not too close to the limit of that intervals (in example: not at the full hour). This cronjob:
a) Extracts the current time-key (for now it would be 20130527_08)
b) Requests all matching keys from redis with KEYS hitcount_* (those should be a small number)
c) compares every such hash against the current hitcount_<timekey>
d) if that key is smaller than current key, then process it as $processing_key:
read all pairs ITEM => counter by HGETALL $processing_key as $item, $cnt
update the database with `UPDATE ITEM SET VIEW_COUNT=VIEW_COUNT+$cnt where ITEM=$item"
delete that key from the hash by HDEL $processing_key $item
no need to del the hash itself - there are no empty hashes in redis as far as I tried
If you want to have a TTL involved, say if the cleanup-cronjob may be not reliable (as might not run for many hours), then you could create the future hashes by the cronjob with an appropriate TTL, that means for now we could create a hash 20130527_09 with ttl 10 hours, 20130527_10 with TTL 11 hours, 20130527_11 with TTL 12 hours. Problem is that you would need a pseudokey, because empty hashes seem to be deleted automatically.
See EDIT3 for current state of the A...nswer.
I would write a key for every ITEM. A few tenthousand keys are definitely no problem at all.
Do the pages change very much? I mean do you get a lot of pages that will never be called again? Otherwise I would simply:
add the value for an ITEM on page request.
every minute or 5 minutes call a cronjob that reads the redis-keys, read the value (say 7) and reduce it by decrby ITEM 7. In MySQL you could increment the value for that ITEM by 7.
If you have a lot of pages/ITEMS which will never be called again you could make a cleanup-job once a day to delete keys with value 0. This should be locked against incrementing that key again from the website.
I would set no TTL at all, so the values should live forever. You could check the memory usage, but I see a lot of different possible pages with current GB of memory.
EDIT: incr is very nice for that, because it sets the key if not set before.
EDIT2: Given the large amount of different pages, instead of the slow "keys *" command you could use HASHES with incrby (http://redis.io/commands/hincrby). Still I am not sure if HGETALL is much faster then KEYS *, and a HASH does not allow a TTL for single keys.
EDIT3: Oh well, sometimes the good ideas come late. It is so simple: Just prefix the key with a timeslot (say day-hour) or make a HASH with name "requests_". Then no overlapping of delete and increment may happen! Every hour you take the possible keys with older "day_hour_*" - values, update the MySQL and delete those old keys. The only condition is that your servers are not too different on their clock, so use UTC and synchronized servers, and don't start the cron at x:01 but x:20 or so.
That means: a called page converts a call of ITEM1 at 23:37, May 26 2013 to Hash 20130526_23, ITEM1. HINCRBY count_20130526_23 ITEM1 1
One hour later the list of keys count_* is checked, and all up to count_20130523 are processed (read key-value by hgetall, update mysql), and deleted one by one after processing (hdel). After finishing that you check if hlen is 0 and del count_...
So you only have a small amount of keys (one per unprocessed hour), that makes keys count_* fast, and then process the actions of that hour. You can give a TTL of a few hours, if your cron is delayed or timejumped or down for a while or something like that.

cache behaviour on redundant writes

Edit - I guess the question I asked was too long so I'm making it very specific.
Question: If a memory location is in the L1 cache and not marked dirty. Suppose it has a value X. What happens if you try to write X to the same location? Is there any CPU that would see that such a write is redundant and skip it?
For example is there an optimization which compares the two values and discards a redundant write back to the main memory? Specifically how do mainstream processors handle this? What about when the value is a special value like 0? If there's no such optimization even for a special value like 0, is there a reason?
Motivation: We have a buffer that can easily fit in the cache. Multiple threads could potentially use it by recycling amongst themselves. Each use involves writing to n locations (not necessarily contiguous) in the buffer. Recycling simply implies setting all values to 0. Each time we recycle, size-n locations are already 0. To me it seems (intuitively) that avoiding so many redundant write backs would make the recycling process faster and hence the question.
Doing this in code wouldn't make sense, since branch instruction itself might cause an unnecessary cache miss (if (buf[i]) {...} )
I am not aware of any processor that does the optimization you describe - eliminating writes to clean cache lines that would not change the value - but it's a good question, a good idea, great minds think alike and all that.
I wrote a great big reply, and then I remembered: this is called "Silent Stores" in the literature. See "Silent Stores for Free", K. Lepak and M Lipasti, UWisc, MICRO-33, 2000.
Anyway, in my reply I described some of the implementation issues.
By the way, topics like this are often discussed in the USEnet newsgroup comp.arch.
I also write about them on my wiki, http://comp-arch.net
Your suggested hardware optimization would not reduce the latency. Consider the operations at the lowest level:
The old value at the location is loaded from the cache to the CPU (assuming it is already in the cache).
The old and new values are compared.
If the old and new values are different, the new value is written to the cache. Otherwise it is ignored.
Step 1 may actually take longer time than steps 2 and 3. It is because steps 2 and 3 cannot start until the old value from step 1 has been brought into the CPU. The situation would be the same if it was implemented in software.
Consider if we simply write the new values to the cache, without checking the old value. It is actually faster than the three-step process mentioned above, for two reasons. Firstly, there is no need to wait for the old value. Secondly, the CPU can simply schedule the write operation in an output buffer. The output buffer can perform the cache write simutaneously while the ALU can start working on something else.
So far, the only latencies involved are that of between the CPU and the cache, not between the cache and the main memory.
The situation is more complicated in modern-day microprocessors, because their cache is organized into cache-lines. When a byte value is written to a cache-line, the complete cache-line has to be loaded because the other part of the cache-line that is not rewritten has to keep its old values.
http://blogs.amd.com/developer/tag/sse4a/
Read
Cache hit: Data is read from the cache line to the target register
Cache miss: Data is moved from memory to the cache, and read into the target register
Write
Cache hit: Data is moved from the register to the cache line
Cache miss: The cache line is fetched into the cache, and the data from the register is moved to the cache line
This is not an answer to your original question on computer-architecture, but might be relevant to your goal.
In this discussion, all array index starts with zero.
Assuming n is much smaller than size, change your algorithm so that it saves two pieces of information:
An array of size
An array of n, and a counter, used to emulate a set container. Duplicate values allowed.
Every time a non-zero value is written to the index k in the full-size array, insert the value k to the set container.
When the full-size array needs to be cleared, get each value stored in the set container (which will contain k, among others), and set each corresponding index in the full-size array to zero.
A similar technique, known as a two-level histogram or radix histogram, can also be used.
Two pieces of information are stored:
An array of size
An boolean array of ceil(size / M), where M is the radix. ceil is the ceiling function.
Every time a non-zero value is written to index k in the full-size array, the element floor(k / M) in the boolean array should be marked.
Let's say, bool_array[j] is marked. This corresponds to the range from j*M to (j+1)*M-1 in the full-size array.
When the full-size array needs to be cleared, scan the boolean array for any marked elements, and its corresponding range in the full-size array should be cleared.

Using Memcache as a counter for multiple objects

I have a photo-hosting website, and I want to keep track of views to the photos. Due to the large volume of traffic I get, incrementing a column in MySQL on every hit incurs too much overhead.
I currently have a system implemented using Memcache, but it's pretty much just a hack.
Every time a photo is viewed, I increment its photo-hits_uuid key in Memcache. In addition, I add a row containing the uuid to an invalidation array also stored in Memcache. Every so often I fetch the invalidation array, and then cycle through the rows in it, pushing the photo hits to MySQL and decrementing their Memcache keys.
This approach works and is significantly faster than directly using MySQL, but is there a better way?
I did some research and it looks like Redis might be my solution. It seems like it's essentially Memcache with more functionality - the most valuable to me is listing, which pretty much solves my problem.
There is a way that I use.
Method 1: (Size of a file)
Every time that someone hits the page, I add one more byte to a file. Then after x seconds or so (I set 600), I will count how many bytes that are in my file, delete my file, then I update it to the MySQL database. This will also allow scalability if multiple servers are adding to a small file in a cache server. Use fwrite to append to the file and you will never have to read that cache file.
Method 2: (Number stored in a file)
Another method is to store a number in a text file that contains the number of hits, but I recommend from using this because if two processes were simultaneously updating, data might be off (maybe same with method1).
I would use method 1 because although it is a bigger file size, it is faster.
I'm assuming you're keeping access logs on your server for this solution.
Keep track of the last time you checked your logs.
Every n seconds or so (where n is less than the time it takes for your logs to be rotated, if they are), scan through the latest log file, ignoring every hit until you find a timestamp after your last check time.
Count how many times each image was accessed.
Add each count to the count stored in the database.
Store the timestamp of the last log entry you processed for next time.