Gemfire/Geode Back-ups - configuration

I'm trying to pin down something in the Gemfire documentation around region back-ups.
http://gemfire.docs.pivotal.io/geode/reference/topics/cache_xml.html#region
Scroll down to the SCOPE attribute...
Using the SCOPE attribute on REGION-ATTRIBUTES, I'm assuming that SCOPE="DISTRIBUTED-ACK" would mean a SYNC back-up operation on a REGION and that SCOPE="DISTRIBUTED-NO-ACK" means an ASYNC back-up operation.
The REGION in question is PARTITIONED. I understand that REPLICATED regions default to DISTRIBUTED-ACK.
Would this assumption be correct? I.e., that Gemfire allows you to configure SYNC or ASYNC back-up operations for REGION entry updates via configuration.

Backups actually operate at the level of disk stores and files, not individual regions. The backup operation will create a copy of all of the disk store files, which may contain data for many regions with different scopes. The gfsh backup disk-store command will always wait for the backup to complete. So the region scope doesn't really affect whether the backup command is synchronous or asynchronous.
If you use DISTRIBUTED_NO_ACK scope, it does mean that a put could complete before all members receive the update, so technically there is no guarantee that a put on a NO_ACK region will be part of a backup that happens after the put.
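For reference, the backup is kicked off from gfsh roughly like this (the locator address and target directory are placeholders, not values from any particular setup):

gfsh> connect --locator=localhost[10334]
gfsh> backup disk-store --dir=/export/backups/2016-06-01

The command blocks until every member has finished writing its disk store files into the backup directory, regardless of the scope of the regions those disk stores hold.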

Can we enable mysql binary logging for a specific table

I want to write a listener which detects DML changes on a table and performs some actions. This listener cannot be embedded in the application; it runs separately.
I thought I could let the application write to a BLACKHOLE table and detect the changes from the binary log file.
But in the docs I found that enabling binary logging slows down MySQL performance slightly. That's why I was wondering: is there a way I can make the MySQL master log only the changes related to a specific table?
Thanks!
SQL is the best way to track DML changes and call functions based on them. But, as you want to explore other options, you may try:
writing a cron job around the general query log, which also includes SELECT / SHOW statements that you don't need
mysqlbinlog: it slows down performance just a little, but it is necessary for point-in-time data recovery and replication (see the sketch below)
Suggestions:
On a prod environment, the MySQL binary log must be enabled and the general query log must be disabled, as the general query log records almost everything, fills up very quickly, and might run out of disk space if not rotated properly.
On a dev/QA environment, the general query log can be enabled with a proper rotation policy.
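For completeness, here is a sketch of the binlog route. MySQL has no per-table binlog filter on the master, but mysqlbinlog can narrow the output to one database when reading the log back (the database name mydb and the log file path are placeholders):

# In my.cnf, enable the binary log (server restart required):
#   [mysqld]
#   log-bin = mysql-bin
#   binlog-format = ROW

# Decode row events for one database only:
mysqlbinlog --database=mydb --base64-output=decode-rows --verbose /var/lib/mysql/mysql-bin.000001

Your external listener could then filter the decoded output down to the one table it cares about.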

GCE randomly changing disk names for additional mounted disks under /dev/disk/by-id?

I see an apparently random problem about once a month that is doing my head in. Google appears to be changing the naming convention for disks additional to the root disk and how they are presented under /dev/disk/by-id/ at boot.
The root disk is always available as /dev/disk/by-id/google-persistent-disk-0.
MOST of the time, the single extra disk we mount is presented as /dev/disk/by-id/google-persistent-disk-1.
We didn't give it this name, but we wrote our provisioning scripts to expect this convention.
Every now and then, on rebooting the VM, our startup scripts fail in executing a safe mount:
/usr/share/google/safe_format_and_mount -m "mkfs.ext4 -F" /dev/disk/by-id/google-persistent-disk-1 /mountpoint
They fail because something has changed the name of the disk. It's no longer /dev/disk/by-id/google-persistent-disk-1; it's now /dev/disk/by-id/google-{the name we gave it when we created it}.
Last time, I updated our startup scripts to use this new naming convention, and it switched back an hour later. WTF?
Any clues appreciated. Thanks.
A naming convention beyond your control is not a stable API. You should not write your management tooling to assume this convention will never change -- as you can see, it's changing for reasons you have nothing to do with, and it's likely that it will change again. If you need access to the list of disks on the system, you should query it through udev. Alternatively, consider using /dev/disk/by-uuid/ instead of /dev/disk/by-id/; the by-uuid path will not change, because the UUID is generated at filesystem creation.
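As a sketch of the by-uuid route (the device path and UUID are made up; read the real ones off your own disk with blkid):

# Discover the filesystem UUID of the data disk:
blkid /dev/sdb
# -> /dev/sdb: UUID="3e6be9de-8139-4c8f-9106-a43f08d823a6" TYPE="ext4"

# Mount by UUID -- this survives any renaming under /dev/disk/by-id/:
mount UUID=3e6be9de-8139-4c8f-9106-a43f08d823a6 /mountpoint

# Or ask udev what it knows about the device:
udevadm info --query=all --name=/dev/sdb

Note that a UUID only exists once the filesystem does, so the very first mkfs run still has to identify the raw device some other way.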

How to perform targeted select queries on main DB instance when using Amazon MySQL RDS and Read replica?

I'm considering using Amazon MySQL RDS with Read Replicas. The only thing disturbing me is replica lag and eventual inconsistency. For example, imagine the case when a user modifies his profile (the UPDATE is performed on the main DB instance) and then refreshes the page to see the changed info (the SELECT might be performed on a Replica which has not received the changes yet due to replica lag).
By accident, I found an Amazon article which mentions it's possible to perform targeted queries. To me it sounds like we can add some parameter or other to tell Amazon to execute a SELECT on the main DB instance instead of on a Replica. The example with the user profile is quite trivial, but the same problem occurs in more realistic cases, for example checkout, when a user performs several steps and needs to see updated info on the next screens. Yes, the application could cache the entire data set on its own, but it would be great if anybody knows how to perform targeted queries on the main DB instance.
I read the link you referenced and didn't find any mention of "target" or anything like that.
But this line might be what you're referring to:
Otherwise, you should spread out the load and read from one of the Read Replicas. You can make this decision on a query-by-query basis within your application. You will probably want to maintain some sort of registry of available Read Replicas within your application, choosing from among them on a round-robin or randomly distributed basis.
If so, then I interpret that line to suggest that you can balance reads in your application by just picking one server from a pool and hitting that one. But it would all be in your application logic.
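As a rough sketch of that query-by-query decision (the hostnames are placeholders, and in a real application this logic would live in your data-access layer rather than a shell script):

PRIMARY="mydb.abc123.us-east-1.rds.amazonaws.com"
REPLICAS=("mydb-replica-1.abc123.us-east-1.rds.amazonaws.com" "mydb-replica-2.abc123.us-east-1.rds.amazonaws.com")

# Read-after-write (e.g. the profile page right after the UPDATE):
# send the SELECT to the primary so replica lag can't bite.
mysql -h "$PRIMARY" -e "SELECT * FROM users WHERE id = 42"

# Ordinary read: pick a replica at random to spread the load.
REPLICA=${REPLICAS[RANDOM % ${#REPLICAS[@]}]}
mysql -h "$REPLICA" -e "SELECT * FROM products LIMIT 10"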

How do I make a snapshot of my boot disk?

I've read multiple times that I can cause read/write errors if I create a snapshot. Is it possible to create a snapshot of the disk my machine is booted off of?
It depends on what you mean by "snapshot".
A snapshot is not a backup; it is a way of temporarily capturing the state of a system so you can make changes, test the results, and revert to the previously known good state if the changes cause issues.
How to take a snapshot varies depending on the OS you're using, whether you're talking about a physical or a virtual system, what virtualization platform you're using, what image types you're using for disks within a given virtualization platform, etc.
Once you have a snapshot, you can make a real backup from it. If it's a database server, you'll want to make sure that you've flushed everything to disk and then write-lock it for the time it takes to make the snapshot (typically seconds). For other systems, you'll similarly need to address things in a way that ensures you have a consistent state.
If you want to make a complete backup of your system drive directly, rather than via a snapshot, then you should shut down and boot off an alternate boot device like a CD or an external drive.
If you don't do that and try to back up a running system directly, you will be leaving yourself open to all manner of potential issues. It might work some of the time, but you won't know until you try to restore it.
If you can provide more details about the system in question, then you'll get more detailed answers.
As far as moving apps and data to different drives goes, data is easy provided you can shut down whatever is accessing it. If it's a database, stop the database, move the data files, tell the database server where to find its files, and start it up.
For applications, it depends. Often it doesn't matter and it's fine to leave it on the system disk. It comes down to how it's being installed.
It looks like that works a little differently here: the first snapshot will create an entire copy of the disk, and subsequent snapshots will act like ordinary snapshots. This means the first snapshot might take a bit longer.
According to this, you ideally want to shut down the system before taking a snapshot of your boot disk. If you can't do that for whatever reason, then you want to minimize the amount of writes hitting the disk and then take the snapshot. Assuming you're using a journaling filesystem (ext3, ext4, XFS, etc.), it should be able to recover without issue.
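On GCE specifically, taking the snapshot is a single command once writes are quiesced; a sketch, with the disk, zone, and snapshot names as placeholders:

# Flush pending writes, then snapshot while the disk is quiet:
sync
gcloud compute disks snapshot my-boot-disk --zone us-central1-a --snapshot-names my-boot-disk-snap-1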
You can use the GCE APIs. Use the Disks:insert API to create the persistent disk. There are some code examples on how to start an instance using Python, and Google has libraries for other programming languages like Java, PHP, and others.
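The gcloud equivalent of that Disks:insert call, restoring a new persistent disk from an existing snapshot (all names are placeholders):

gcloud compute disks create my-restored-disk --zone us-central1-a --source-snapshot my-boot-disk-snap-1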

Why do most relational databases write to logs rather than directly to disk using memory mapping?

There are memory-mapping facilities available for mapping a file writable into memory. I would expect all modern operating systems to reflect changes made in memory to the disk asynchronously, so why do most relational databases use log files/journals instead?
Using memory-mapped files (RAM caching in general), using logs, and writing through to disk are not contradictory.
Keeping heavily used data in RAM will speed up everything. Not writing changes to disk immediately would mean possible data loss on a crash (power outage, ...), for both the main data and/or the log.
And change logs are useful, for example, when using transactions instead of single statements (i.e. multiple actions that have to be executed either completely or not at all): if there is a crash while a transaction is running, there would be inconsistent data on disk after rebooting (only parts of the transaction done). With the change log, the half-finished transaction can be undone again.
edit: As a single insert/update/... doesn't map directly to a disk block, this cannot be solved by caching/writing certain disk blocks.
edit # comment:
No. As I said, DB actions don't map to disk blocks. Let's say there are three values in a table: v1, v2, v3. v1 and v2 are in HDD block b1, v3 in block b2. What the user wants now: first, change v1 to 100, and then add 123 to v2 and v3 in a transaction, i.e. add it to both or neither.
The changes of v1 and v2 go through smoothly, then power outage. Theoretically, v2 now needs a "rollback" to the old value. How would you do this with a journaling FS only? You will probably have the old and the new content of disk block b1. The new content has v2 changed already: bad. If you use the old content, you would undo the change of v1 too: bad. You could take v1 from the new block and v2 from the old one, yes. But how do you know what to take from which block without a DB log?
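To make that concrete, here is a toy sketch of what the DB log buys you (the values and file names are invented, and a real database writes binary log records and fsyncs them rather than running a shell script):

# Log old and new values BEFORE touching the data blocks:
echo "TX1 BEGIN" >> db.log
echo "TX1 SET v2 old=5 new=128" >> db.log
echo "TX1 SET v3 old=7 new=130" >> db.log
sync

# ...apply the changes to blocks b1 and b2...
# A power outage here leaves b1 updated but b2 untouched.

echo "TX1 COMMIT" >> db.log

# On restart, recovery finds no COMMIT for TX1; each "old=" entry says
# exactly which value to restore (v2 back to 5) without disturbing the
# unrelated v1 sitting in the same block.

That per-value before-image is exactly what a journaling FS, which only sees whole blocks, cannot give you.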
edit2: It would be nice if you could leave the original question unchanged in your comment. It was something like "whether a journaling FS doesn't deprecate a DB log".