HBase RowKey Design based on Salting method

HBase RowKey Design based on Salting method - configuration

How can I configure the number of regions on one RegionServer, and how can I configure the number of StoreFiles in a region, using hbase-site.xml or another HBase configuration file?

I'm not sure you can define those values precisely, since HBase tries to manage them for you based on the data you have.
You can look at these properties:
hbase.regionserver.regionSplitLimit
hbase.regionserver.region.split.policy
They define the limit and the policy for creating new regions. I would suggest looking at the default configuration to see what you can use to achieve your end goal.
Check:
https://github.com/apache/hbase/blob/master/hbase-common/src/main/resources/hbase-default.xml

In my case I use RegionSplitter with the hex split algorithm (HexStringSplit); this way I know in advance where my relevant data is located. Thanks for the help :D
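To illustrate why pre-splitting on hex boundaries makes the data position predictable, here is a minimal Python sketch. It is not HBase code: it only approximates a uniform split of an 8-digit hex key space, and the helper names (hex_split_points, salted_key, region_index) and the MD5-based salt are assumptions made for this example.

```python
import hashlib
from bisect import bisect_right


def hex_split_points(num_regions: int, width: int = 8) -> list[str]:
    """Return num_regions - 1 evenly spaced, zero-padded lowercase hex boundaries."""
    if num_regions < 2:
        return []
    key_space = 16 ** width
    return [
        format(key_space * i // num_regions, f"0{width}x")
        for i in range(1, num_regions)
    ]


def salted_key(natural_key: str, width: int = 8) -> str:
    """Prefix the key with a hash-derived hex salt (hypothetical salting scheme)."""
    salt = hashlib.md5(natural_key.encode("utf-8")).hexdigest()[:width]
    return f"{salt}_{natural_key}"


def region_index(row_key: str, split_points: list[str]) -> int:
    """Index of the region holding row_key; each split point is an inclusive start key."""
    return bisect_right(split_points, row_key)


points = hex_split_points(4)
key = salted_key("user42")
print(points, key, region_index(key, points))
```

Because the salt is derived from the key itself, a reader can recompute it and go straight to the right region instead of scanning all of them.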

Related

what are 'NET USE' possible outputs?

The question is: what are the possible outputs of NET USE?
You can drown yourself in websites explaining how to use the NET USE command, but there is not a single one about what comes out of it.
In my case I'm interested in the various error messages and in the interaction with the PowerShell automatic variable $LASTEXITCODE. I want to handle its output correctly, but I don't know what can even happen (and no, I won't use New-PSDrive).
Does anyone know what the outputs are, or where I can find this information?
Thanks

You can use the example in https://www.compatdb.org/forums/topic/20487-net-use-return-code/ to obtain a list of the numerical codes for your evaluation.
If you want to dig deeper, you need to download the Win32 SDK and go through the definitions in the header files (see https://learn.microsoft.com/en-us/windows/win32/api/winnetwk/ns-winnetwk-netresourcea etc.).
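As a rough illustration of evaluating those numerical codes, the sketch below (in Python rather than PowerShell) pulls a system error number out of captured NET USE output. It assumes English-language output containing a line such as "System error 53 has occurred."; the function name interpret_net_use is made up for this example, and the table is a small, non-exhaustive subset of the Win32 error constants defined in the SDK headers.

```python
import re

# Small, non-exhaustive subset of Win32 system error codes (winerror.h).
KNOWN_ERRORS = {
    5: "ERROR_ACCESS_DENIED",
    53: "ERROR_BAD_NETPATH",
    67: "ERROR_BAD_NET_NAME",
    85: "ERROR_ALREADY_ASSIGNED",
    86: "ERROR_INVALID_PASSWORD",
    1219: "ERROR_SESSION_CREDENTIAL_CONFLICT",
    1326: "ERROR_LOGON_FAILURE",
}

# Assumed message format; localized Windows versions word this differently.
_SYSTEM_ERROR = re.compile(r"System error (\d+) has occurred", re.IGNORECASE)


def interpret_net_use(exit_code: int, output: str) -> dict:
    """Combine the process exit code with any system error number found in the text."""
    match = _SYSTEM_ERROR.search(output)
    code = int(match.group(1)) if match else None
    return {
        "succeeded": exit_code == 0,
        "system_error": code,
        "name": KNOWN_ERRORS.get(code, "UNKNOWN") if code is not None else None,
    }
```

The exit code alone only separates success from failure here; the specific reason, when present, comes from the number embedded in the message text.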

How to write a configuration file to tell the AllenNLP trainer to randomly split dataset into train and dev

The official AllenNLP documentation suggests specifying "validation_data_path" in the configuration file, but what if one wants to construct a dataset from a single source and then randomly split it into train and validation datasets with a given ratio?
Does AllenNLP support this? I would greatly appreciate your comments.

AllenNLP does not have this functionality yet, but we are working on some stuff to get there.
In the meantime, here is how I did it for the VQAv2 reader: https://github.com/allenai/allennlp-models/blob/main/allennlp_models/vision/dataset_readers/vqav2.py#L354
This reader supports Python slicing syntax where you, for example, specify a data_path as "my_source_file[:1000]" to take the first 1000 instances from my_source_file. You can also supply multiple paths by setting data_path: ["file1", "file2[:1000]", "file3[1000:]"]. You can probably steal the top two blocks in that file (lines 354 to 369) and put them into your own dataset reader to achieve the same result.
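The slicing idea can be sketched independently of AllenNLP. The code below is not the VQAv2 reader's implementation; parse_sliced_path and random_split are hypothetical helpers you might put in your own dataset reader, and the random split is plain Python, not an AllenNLP feature.

```python
import random
import re

# Matches "path[start:stop]" where start and stop are optional integers.
_SLICE = re.compile(r"^(?P<path>.+)\[(?P<start>-?\d+)?:(?P<stop>-?\d+)?\]$")


def parse_sliced_path(spec: str) -> tuple[str, slice]:
    """Split 'file[:1000]' into ('file', slice(None, 1000)); plain paths get a full slice."""
    match = _SLICE.match(spec)
    if match is None:
        return spec, slice(None)
    start = int(match["start"]) if match["start"] else None
    stop = int(match["stop"]) if match["stop"] else None
    return match["path"], slice(start, stop)


def random_split(instances: list, train_ratio: float, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy deterministically, then cut it into train and dev parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


path, window = parse_sliced_path("my_source_file[:1000]")
train, dev = random_split(list(range(10)), train_ratio=0.8)
print(path, window, len(train), len(dev))
```

Fixing the seed matters: the train and dev readers are usually constructed separately, so both must shuffle identically for the two parts to stay disjoint.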

Is the use of a config file equivalent to the use of globals?

I've read many times, and agree, that globals should be avoided to keep code orthogonal. Is using a config file to hold read-only information that your program uses similar to using globals?

If you're using config files in place of globals, then yes, they are similar.
Config files should only be used in cases where the end-user (presumably a computer-savvy user, like a developer) needs to declare settings for an application or piece of code, while keeping their hands out of the code itself.

My first reaction would be that it is not the same. I think the problem with globals is the read+write scenario. Config files are read-only (at least during execution).
In the same way, constants are not considered bad programming practice. Config files, at least in the way I use them, are just easily changeable constants.
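A minimal sketch of the "config as easily changeable constants" idea, assuming a JSON config and two made-up settings (db_url, timeout_seconds): the values are loaded once into a frozen dataclass, so code can read them but any attempt to write fails.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Read-only view of the settings; field names are illustrative."""
    db_url: str
    timeout_seconds: int


def load_config(text: str) -> AppConfig:
    """Parse JSON text once and freeze the result."""
    raw = json.loads(text)
    return AppConfig(
        db_url=raw["db_url"],
        timeout_seconds=int(raw["timeout_seconds"]),
    )


config = load_config('{"db_url": "sqlite:///app.db", "timeout_seconds": 30}')
try:
    config.timeout_seconds = 60  # writing is rejected at runtime
except AttributeError as error:
    print("read-only:", error)
```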

Well, since a config file and a global variable can both have the effect of propagating changes throughout a system - they are roughly similar.
But... in the case of a configuration file, that change is usually going to take place in a single, highly visible (to the developer) location, whereas global variables can effect change in very sneaky and hard-to-track-down ways, so in this respect the two concepts are not similar.
Having a configuration file usually helps with DRY concepts, and it shouldn't hurt the orthogonality of the system, either.
Bonus points for using the $25 word 'orthogonal'. I had to look that one up in Wikipedia to find out the non-Euclidean definition.

Configuration files are really meant to be easily editable by the end user as a way of telling the program how to run.
A more specialized form of configuration files, user preferences, are used to remember things between program executions.

A global refers to a unique instance of an object that will never change, whereas a config file is a container of reference values for objects within the application that can change.
A "global" object never changes during runtime; the other kind of object is initialized from the config file but can change later on.
Actually, those objects not only can change during the lifetime of the application, they can also monitor the config file in order to support "hot changes" (modification of their values without stopping or restarting the application) when that config file is modified.
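One simple way to get that "hot-change" behaviour is to compare the file's modification time on each read. This is only a sketch under assumptions: the class name ReloadingConfig is invented, the file is JSON, and polling the mtime stands in for whatever file-watching mechanism a real application would use.

```python
import json
import os


class ReloadingConfig:
    """Re-reads a JSON config file whenever its modification time changes."""

    def __init__(self, path: str):
        self._path = path
        self._mtime_ns = None
        self._values = {}
        self.refresh()

    def refresh(self) -> bool:
        """Reload if the file changed since the last load; return True if it did."""
        mtime_ns = os.stat(self._path).st_mtime_ns
        if mtime_ns == self._mtime_ns:
            return False
        with open(self._path, encoding="utf-8") as handle:
            self._values = json.load(handle)
        self._mtime_ns = mtime_ns
        return True

    def get(self, key: str, default=None):
        """Return the current value, picking up edits made to the file."""
        self.refresh()
        return self._values.get(key, default)
```

Checking on every read keeps the sketch short; a long-running application would more likely refresh on a timer or on a file-system notification.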

They are absolutely not the same, nor replacements for each other. A config file or config object can be used non-globally, i.e. passed explicitly.
You can of course have a global variable that refers to a config object, and that would be defeating the purpose.

Suggestions for Data Access Interface Name

I am looking for suggestions for an interface name.
The interface is for the primitive CRUD methods that will be defined later in the DAL, however I need to use it in a lower-level API. The interface itself will just have the four members, Create(), Read(), Update(), and Delete().
I am currently thinking something along the lines of IDataAccessPrimitives, but am very ambivalent about that name. What do you gals/guys suggest?
Thanks.

How about ICantBelieveItsNotButter ?
Or, ICanReadUpsideDown?
Or, (more seriously), IPersistData

Drop "Primitives."
I'd go with IDataAccess unless you need to differentiate from another "primitive" DAL interface.
Use the most straightforward names possible for your commonly used interfaces.

It sounds like you're using the Table Data Gateway pattern. How about ITableDataGateway or IGateway or some other derivative?

I'll go with IDataAccessOperation/IDataAccessService.
This clearly shows the responsibility of the interface.
Another option is to replace Service with Manager in the latter name.

ICrud. Seriously. Why not? Every developer ought to know what CRUD means.

Singleton for Application Configuration

In all my projects until now, I have used the singleton pattern to access application configuration throughout the application. Lately I have seen a lot of articles advising against the singleton pattern, because it does not promote testability and it hides component dependencies.
My question is: what is the best way to store application configuration so that it is easily accessible throughout the application, without passing the configuration object all over the application?
Thanks in Advance
Madhu

I think an application configuration is an excellent use of the Singleton pattern. I tend to use it myself to avoid rereading the configuration each time I want to access it and because I like the configuration to be strongly typed (i.e., not having to convert non-string values each time). I usually build some backdoor methods into my Singleton to support testability, i.e., the ability to inject an XML configuration so I can set it in my test and the ability to destroy the Singleton so that it gets recreated when needed. Typically these are private methods that I access via reflection so that they are hidden from the public interface.
EDIT We live and learn. While I think application configuration is one of the few places to use a Singleton, I don't do this any more. Typically, now, I will create an interface and a standard class implementation using static, Lazy<T> backing fields for the configuration properties. This allows me to have the "initialize once" behavior for each property with a better design for testability.
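The answer describes C# with Lazy<T> backing fields; a loose Python analogue of the same "initialize once per property" behaviour is functools.cached_property behind a small interface. The names Settings, LazySettings and max_connections are invented for this sketch, and note one difference: cached_property caches per instance, whereas the static Lazy<T> fields described above are shared per class.

```python
from functools import cached_property
from typing import Mapping, Protocol


class Settings(Protocol):
    """Interface that consumers depend on, so tests can supply a fake."""

    @property
    def max_connections(self) -> int: ...


class LazySettings:
    """Parses each raw string value the first time it is requested, then caches it."""

    def __init__(self, raw: Mapping[str, str]):
        self._raw = raw
        self.parse_count = 0  # only here to make the caching observable

    @cached_property
    def max_connections(self) -> int:
        self.parse_count += 1
        return int(self._raw["max_connections"])


settings = LazySettings({"max_connections": "10"})
print(settings.max_connections, settings.max_connections, settings.parse_count)
```

The strongly typed value is converted once, and a test can pass any object with a max_connections attribute wherever Settings is expected.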

Use dependency injection to inject the single configuration object into any classes that need it. This way you can use a mock configuration for testing or whatever you want... you're not explicitly going out and getting something that needs to be initialized with configuration files. With dependency injection, you are not passing the object around either.
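A bare-bones sketch of constructor injection without any DI container: the component states what it needs and receives it, so a test can hand in a stand-in. Mailer, Config, StaticConfig and smtp_host are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Protocol


class Config(Protocol):
    """The only part of the configuration that Mailer relies on."""

    @property
    def smtp_host(self) -> str: ...


@dataclass(frozen=True)
class StaticConfig:
    """Trivial implementation; a real one might be populated from a file."""
    smtp_host: str


class Mailer:
    """Receives its configuration instead of fetching it from a singleton."""

    def __init__(self, config: Config):
        self._config = config

    def describe(self) -> str:
        return f"sending via {self._config.smtp_host}"


# Production wiring and test wiring differ only in what is injected.
print(Mailer(StaticConfig(smtp_host="smtp.example.com")).describe())
print(Mailer(StaticConfig(smtp_host="localhost")).describe())
```

With a container, the wiring lines at the bottom are what the container does for you; the classes themselves stay the same.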

For that specific situation I would create one configuration object and pass it around to those who need it.
Since it is the configuration, it should be used only in certain parts of the app and need not be omnipresent.
However, if you haven't had problems using singletons, and don't need to test that thoroughly, you should keep going as you have until today.
Read the discussion about why they are considered harmful. I think most of the problems come when a lot of resources are being held by the singleton.
For the app configuration I think it would be safe to keep it like it is.

The singleton pattern seems to be the way to go. Here's a Setting class that I wrote that works well for me.

If any component relies on configuration that can be changed at runtime (for example theme support for widgets), you need to provide some callback or signaling mechanism to notify about the changed config. That's why it is not enough to pass only the needed parameters to the component at creation time (like color).
You also need to provide access to the config from inside of the component (pass complete config to component), or make a component factory that stores references to the config and all its created components so it can eventually apply the changes.
The former has the big downside that it clutters the constructors or blows up the interface, though it is maybe fastest for prototyping. If you take the "Law of Demeter" into account this is a big no because it violates encapsulation.
The latter has the advantage that components keep their specific interface where components only take what they need, and as a bonus gives you a central place for refactoring (the factory). In the long run code maintenance will likely benefit from the factory pattern.
Also, even if the factory was a singleton, it would likely be used in far fewer places than a configuration singleton would have been.
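The factory variant can be sketched as follows, assuming a single themed setting (color) and invented class names Widget and WidgetFactory. Components still receive only the values they need; the factory remembers what it created and pushes changes to them.

```python
class Widget:
    """Component with a narrow interface: it only knows about its own color."""

    def __init__(self, color: str):
        self.color = color


class WidgetFactory:
    """Holds the config and every widget it created, so it can apply changes later."""

    def __init__(self, config: dict):
        self._config = dict(config)  # private copy of the settings
        self._widgets: list[Widget] = []

    def create(self) -> Widget:
        widget = Widget(self._config["color"])
        self._widgets.append(widget)
        return widget

    def update_config(self, **changes: str) -> None:
        """Record the new values and propagate them to existing widgets."""
        self._config.update(changes)
        for widget in self._widgets:
            widget.color = self._config["color"]


factory = WidgetFactory({"color": "blue"})
first, second = factory.create(), factory.create()
factory.update_config(color="red")
print(first.color, second.color)
```

The list keeps strong references, so widgets live as long as the factory does; a weakref.WeakSet would avoid that if components are created and discarded frequently.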

Here is an example done using Castle.Core >> DictionaryAdapter and StructureMap
