What is a good definition of a "userdata pointer"? - language-agnostic

I have searched for a good explanation but can't find one.
I could try to write one myself, but I'd prefer it if someone with better English could help me explain this for Zan Lynx in the comments here.
...and it seems like there should be a good explanation somewhere, why not here?

When a library manages some data structures on behalf of a program (e.g. windows in a GUI application are managed by the OS), it usually keeps the contents of those structures private. However, it is typically useful for the program to maintain some additional data specific to the program's use of those structures. Therefore, a library will often provide access to a field (often called user data) which it stores with each structure.
A common use of the user data field by a program is to allocate some memory each time the program requests the library to create a structure, and to store the pointer to that memory in the user data field provided by the library, hence the term userdata pointer.
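As a rough sketch of the pattern (the library, the lib_window type, and the callback names here are all hypothetical, not any particular API):

// Library side: each managed structure carries one opaque slot that the
// library stores but never interprets.
struct lib_window {
    // ... fields the library manages privately ...
    void *user_data;
};

void lib_window_set_user_data(lib_window *w, void *p) { w->user_data = p; }
void *lib_window_get_user_data(lib_window *w) { return w->user_data; }

// Program side: attach program-specific state to each structure the
// library creates, and recover it later through the userdata pointer.
struct my_window_state { int clicks = 0; bool dirty = false; };

void on_window_created(lib_window *w) {
    lib_window_set_user_data(w, new my_window_state());
}

void on_window_clicked(lib_window *w) {
    auto *s = static_cast<my_window_state *>(lib_window_get_user_data(w));
    s->clicks++;
}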

Look at sqlite3_exec() – it calls a callback (third parameter) for each retrieved row and passes a pointer you provide (fourth parameter) into this callback. This can be a pointer to whatever object you wish – you have to cast it appropriately before you can access the pointed object. This object is called a userdata object and a pointer to it is called a userdata pointer.
In the case of sqlite3_exec() you can pass a pointer to a container which will store all of the retrieved table rows once the request completes.
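As a minimal sketch of that use (assuming an sqlite3 *db handle that was opened elsewhere), the container is passed as the fourth parameter and cast back inside the callback:

#include <sqlite3.h>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Called once per retrieved row; the first argument is the userdata pointer
// that was passed as the fourth parameter of sqlite3_exec().
static int collect_row(void *userdata, int argc, char **argv, char **colNames) {
    auto *rows = static_cast<std::vector<Row> *>(userdata);
    Row row;
    for (int i = 0; i < argc; ++i)
        row.emplace_back(argv[i] ? argv[i] : "NULL");
    rows->push_back(row);
    return 0;                       // a non-zero return would abort the query
}

void run_query(sqlite3 *db) {       // assumes db was opened elsewhere
    std::vector<Row> rows;          // the "userdata object"
    sqlite3_exec(db, "SELECT * FROM users", collect_row, &rows, nullptr);
    // rows now holds every retrieved row
}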

Related

Tcl extensions: Life Cycle of extensions' ClientData

Non-trivial native extensions will require per-interpreter data
structures that are dynamically allocated.
I am currently using Tcl_SetAssocData, with a key corresponding
to the extension's name and an appropriate deletion routine,
to prevent this memory from leaking away.
However, Tcl_PkgProvideEx also allows one to record such
information. This information can be retrieved by
Tcl_PkgRequireEx. Associating the extension's data structures
with its package seems more natural than in the "grab-bag"
AssocData; yet the Pkg*Ex routines do not provide an
automatically invoked deletion routine. So I think I need
to stay with the AssocData approach.
For which situations were the Pkg*Ex routines designed?
Additionally, the Tcl Library allows one to install
ExitHandlers and ThreadExitHandlers. Paraphrasing the
manual, this is for flushing buffers to disk etc.
Are there any other situations requiring use of ExitHandlers?
When Tcl calls exit, are Tcl_PackageUnloadProcs called?
The whole-extension ClientData is intended for extensions that want to publish their own stub table (i.e., an organized list of functions that represent an exact ABI) that other extensions can build against. This is a very rare thing to want to do; leave at NULL if you don't want it (and contact the Tcl core developers' mailing list directly if you do; we've got quite a bit of experience in this area). Since it is for an ABI structure, it is strongly expected to be purely static data and so doesn't need deletion. Dynamic data should be sent through a different mechanism (e.g., via the Tcl interpreter or through calling functions via the ABI).
Exit handlers (which can be registered at multiple levels) are things that you use when you have to delete some resource at an appropriate time. The typical points of interest are when an interpreter (a Tcl_Interp structure) is being deleted, when a thread is being deleted, and when the whole process is going away. What resources need to be specially deleted? Well, it's usually obvious: file handles, database handles, that sort of thing. It's awkward to answer in general as the details matter very much: ask a more specific question to get tailored advice.
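For reference, a minimal sketch of the AssocData approach the question describes (the key "myext", the ExtState structure, and Myext_Init are placeholders); the deletion routine is invoked automatically when the interpreter goes away:

#include <tcl.h>
#include <string.h>

typedef struct ExtState {            /* per-interpreter data for the extension */
    /* handles, caches, ... */
    int initialized;
} ExtState;

/* Invoked automatically when the interpreter is deleted. */
static void ExtStateDelete(ClientData clientData, Tcl_Interp *interp) {
    ckfree((char *) clientData);
}

int Myext_Init(Tcl_Interp *interp) {
    ExtState *state = (ExtState *) ckalloc(sizeof(ExtState));
    memset(state, 0, sizeof(ExtState));
    Tcl_SetAssocData(interp, "myext", ExtStateDelete, (ClientData) state);
    /* later: ExtState *s = (ExtState *) Tcl_GetAssocData(interp, "myext", NULL); */
    return TCL_OK;
}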
However, package unload callbacks are only called in response to the unload command. Like package load callbacks, they use “special function symbol” registration, and if they are absent then the unload command will refuse to unload the package. Most packages do not use them. The use case is where there are very long-lived processes that need to have extra upgradeable functionality added to them.

What exactly is "handle"?

I've often heard about "handles", what exactly are those?
Edit:
For instance I have heard about:
window handles
event handles
file handles
and so on. Are those things the same, or are they just abstract terms?
A handle is an indirect way to reference an object owned by the OS or a library. When the operating system or a library owns an object but wants to let a client refer to it, it can provide a reference to that object called a handle.
Handles can be implemented in different ways. Typically they are not references in the C++ or C# sense. Often they are pointers cast to some opaque type, or they might be (or contain) an index into a table of objects that are owned by the operating system or library.
For example, in Windows, if you create a window, the OS creates an object that represents the window, but it doesn't return a pointer to that object. Instead, it returns a window handle, which provides an extra layer of indirection. When you pass the window handle back in another OS call, the OS knows which window object to use based on the handle. This prevents your code from directly accessing the window object.
The extra layer of indirection allows the OS or library to do things like move objects around, reference count the objects, and generally control what happens to the object. Like the PIMPL idiom, the implementation may change completely while still preserving the original API and thus not forcing clients to recompile. It's especially useful if you're trying to offer a non-object-oriented API for clients written in procedural languages like C.
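As an illustration only (this is not any real API), a library might hand out opaque integer handles backed by a private table it controls:

#include <cstdint>
#include <string>
#include <unordered_map>

using WindowHandle = std::uint32_t;   // opaque to clients; meaningful only to the library

namespace detail {
    struct Window { std::string title; /* ... */ };
    std::unordered_map<WindowHandle, Window> table;   // the real objects live here
    WindowHandle next = 1;
}

WindowHandle create_window(const std::string &title) {
    WindowHandle h = detail::next++;
    detail::table[h] = detail::Window{title};
    return h;                         // the caller never sees a Window*
}

void set_title(WindowHandle h, const std::string &title) {
    detail::table.at(h).title = title;   // the library resolves the handle internally
}

void destroy_window(WindowHandle h) { detail::table.erase(h); }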
A "handle" is another name for a reference to a resource which is managed by the programmer explicitly instead of automatically by the runtime.
Handles are pointers to unmanaged resources like file handles, database connection handles, window handles, etc. Since they refer to unmanaged resources, in most cases they won't be automatically garbage collected, so you need to make sure you release them properly, or you may find yourself leaking handles.
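A POSIX file descriptor is a concrete example: the int you get back is just a handle (an index into the kernel's table of open files), and it has to be released explicitly:

#include <fcntl.h>
#include <unistd.h>

void read_config() {
    int fd = open("data.txt", O_RDONLY);  // the int is a handle into the kernel's open-file table
    if (fd != -1) {
        // ... use the descriptor ...
        close(fd);                        // release it explicitly, or the handle leaks
    }
}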

Why are copy constructors unnecessary for immutable objects?

Why are copy constructors unnecessary for immutable objects? Please explain this for me.
Because the value cannot change, it's every bit as good to reference the same object in all cases; there's no need for an "extra copy", so to speak.
This is a language-dependent question, especially with respect to lifetime. For a moment let's forget about that.
Copy constructors are valuable in that they allow you to take one object and create a completely independent copy of it. This is valuable because you can now modify the second object independently of the first. Or a component can create a private copy to protect itself from other components changing the object out from under it.
Immutable objects are unchangeable. There is no value in creating a copy of an object that won't change.
Now let's think about lifetime again. In languages like C++, copy constructors also let you work around memory and lifetime issues. For example, suppose I'm writing an API which takes a SomeType* and I want to keep that object around longer than the lifetime of my method. In C++ the most reliable way to do this is to create a copy of the object via a copy constructor.
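A rough C++ sketch of that lifetime case (SomeType and Keeper are placeholder names):

#include <memory>

struct SomeType { int value = 0; /* mutable state */ };

class Keeper {
public:
    // The caller's pointer may not remain valid after this call returns,
    // so take an independent copy via the copy constructor and own it.
    void remember(const SomeType *p) { copy_ = std::make_unique<SomeType>(*p); }
private:
    std::unique_ptr<SomeType> copy_;
};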
This is somewhat language dependent: many languages require a copy constructor, and if you don't provide one, the compiler will implicitly generate one.
With an immutable object, however, this is typically fine, since the default copy constructor (typically) does a shallow copy of all values. With a mutable data type (i.e. one containing internal references to other objects), shallow copying is typically a poor choice, since the copy only copies the reference/pointer encapsulated within it.
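A small sketch of the difference (illustrative types only):

#include <cstddef>

// Immutable: members are fixed at construction, so the implicitly
// generated (shallow) copy constructor is perfectly safe.
struct ImmutablePoint {
    const int x, y;
};

// Mutable, with an internal reference: the implicit shallow copy would
// make two objects share the same allocation, so they can interfere.
struct MutableBuffer {
    char *data;
    std::size_t size;
};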
It's natural: because the value of an immutable object can't be changed, there is nothing to gain by copying it.

Am I overdoing it with my Factory Method?

Part of our core product is a website CMS which makes use of various page widgets. These widgets are responsible for displaying content, listing products, handling event registration, etc. Each widget is represented by a class which derives from the base widget class. When rendering a page, the server grabs the page's widgets from the database and then creates an instance of the correct class for each one. The Factory Method pattern, right?
Private Function WidgetFactory(typeId)
    Dim oWidget
    Select Case typeId
        Case widgetType.ContentBlock
            Set oWidget = New ContentWidget
        Case widgetType.Registration
            Set oWidget = New RegistrationWidget
        Case widgetType.DocumentList
            Set oWidget = New DocumentListWidget
        Case widgetType.DocumentDisplay
            Set oWidget = New DocumentDisplayWidget
    End Select
    Set WidgetFactory = oWidget
End Function
Anyway, this is all fine, but as time has gone on the number of widget types has increased to around 50, meaning the factory method is rather long. Every time I create a new type of widget I go and add another couple of lines to the method, and a little alarm rings in my head that maybe this isn't the best way to do things. I tend to just ignore that alarm, but it's getting louder.
So, am I doing it wrong? Is there a better way to handle this scenario?
I think the question you should ask yourself is: Why am I using a Factory method here?
If the answer is "because of A", and A is a good reason, then continue doing it, even if it means some extra code. If the answer is "I don't know; because I've heard that you are supposed to do it this way?" then you should reconsider.
Let's go over the standard reasons for using factories. Here's what Wikipedia says about the Factory method pattern:
[...], it deals with the problem of creating objects (products) without specifying the exact class of object that will be created. The factory method design pattern handles this problem by defining a separate method for creating the objects, whose subclasses can then override to specify the derived type of product that will be created.
Since your WidgetFactory is Private, this is obviously not the reason why you use this pattern. What about the "Factory pattern" itself (independent of whether you implement it using a Factory method or an abstract class)? Again, Wikipedia says:
Use the factory pattern when:
The creation of the object precludes reuse without significantly duplicating code.
The creation of the object requires access to information or resources not appropriate to contain within the composing object.
The lifetime management of created objects needs to be centralised to ensure consistent behavior.
From your sample code, it does not look like any of this matches your need. So, the question (which only you can answer) is: (1) How likely is it that you will need the features of a centralized Factory for your widgets in the future and (2) how costly is it to change everything back to a Factory approach if you need it in the future? If both are low, you can safely drop the Factory method for the time being.
EDIT: Let me get back to your special case after this generic elaboration: Usually, it's a = new XyzWidget() vs. a = WidgetFactory.Create(WidgetType.Xyz). In your case, however, you have some (numeric?) typeId from a database. As Mark correctly wrote, you need to have this typeId -> className map somewhere.
So, in that case, the good reason for using a factory method could be: "I need some kind of huge ConvertWidgetTypeIdToClassName select-case-statement anyway, so using a factory method takes no additional code plus it provides the factory method advantages for free, if I should ever need them."
As an alternative, you could store the class name of the widget in the database (you probably already have some WidgetType table with primary key typeId anyway, right?) and create the class using reflection (if your language allows for this type of thing). This has a lot of advantages (e.g. you could drop in DLLs with new widgets and don't have to change your core CMS code) but also disadvantages (e.g. "magic string" in your database which is not checked at compile time; possible code injection, depending on who has access to that table).
The WidgetFactory method is really a mapping from a typeId enumeration to concrete classes. In general it's best if you can avoid enumerations entirely, but sometimes (particularly in web applications) you need to round-trip to an environment (e.g. the browser) that doesn't understand polymorphism and you need such measures.
Refactoring contains a pretty good explanation of why switch/select case statements are code smells, but that mainly addresses the case where you have many similar switches.
If your WidgetFactory method is the only place where you switch on that particular enum, I would say that you don't have to worry. You need to have that map somewhere.
As an alternative, you could define the map as a dictionary, but the amount of code lines wouldn't decrease significantly - you may be able to cut the lines of code in half, but the degree of complexity would stay equivalent.
Your application of the factory pattern is correct. You have information which dictates which of N types is created. A factory is what knows how to do that. (It is a little odd as a private method. I would expect it to be on an IWidgetFactory interface.)
Your implementation, though, tightly couples the implementation to the concrete types. If you instead mapped typeId -> widgetType, you could use Activator.CreateInstance(widgetType) to make the factory understand any widget type.
Now, you can define the mappings however you want: a simple dictionary, discovery (attributes/reflection), in the configuration file, etc. You have to know all the types in one place somewhere, but you also have the option to compose multiple sources.
The classic way of implementing a factory is not to use a giant switch or if-ladder, but instead to use a map which maps object type name to an object creation function. Apart from anything else, this allows the factory to be modified at run-time.
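The original is VBScript, but as a rough C++ sketch of that map-based idea (the widget names and ids are illustrative): adding a widget type means adding one registry entry, and entries can also be registered or replaced at run-time.

#include <functional>
#include <map>
#include <memory>

struct Widget { virtual ~Widget() = default; };
struct ContentWidget : Widget { /* ... */ };
struct RegistrationWidget : Widget { /* ... */ };

// typeId -> creation function
std::map<int, std::function<std::unique_ptr<Widget>()>> registry = {
    { 1, [] { return std::make_unique<ContentWidget>(); } },
    { 2, [] { return std::make_unique<RegistrationWidget>(); } },
};

std::unique_ptr<Widget> widget_factory(int typeId) {
    auto it = registry.find(typeId);
    return it != registry.end() ? it->second() : nullptr;
}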
Whether it's proper or not, I've always believed that the time to use a Factory is when the decision of what object type to create will be based upon information that is not available until run-time.
You indicated in a followup comment that the widget type is stored in a database. Since your code does not know what objects will be created until run-time, I think that this is a perfectly valid use of the Factory pattern. By having the factory, you enable your program to defer the decision of which object type to use until the time when the decision can actually be made.
It's been my experience that Factories grow so their dependencies don't have to. If you see this mapping duplicating itself in other places then you have cause for worry.
Try categorizing your widgets, perhaps based on their functionality.
If a few of them logically depend on each other, create them together in a single construction step.

Should persistent objects validate data upon set?

If one has an object which can persist itself across executions (whether to a DB using an ORM, using something like Python's shelve module, etc.), should validation of that object's attributes be placed within the class representing it, or outside?
Or, rather: should the persistent object be dumb and expect whatever is setting its values to be benevolent, or should it be smart and validate the data being assigned to it?
I'm not talking about type validation or user input validation, but rather things that affect the persistent object, such as ensuring that links/references to other objects exist, that numbers are unsigned, that dates aren't out of range, etc.
Validation is part of encapsulation: an object is responsible for its internal state, and validation is part of maintaining that state.
It's like asking "should I let an object do its own work and set its own variables, or should I use getters to get them all, do the work in an external function, and then use setters to set them back?"
Of course you should use a library to do most of the validation: you don't want to implement the "check unsigned values" function in every model, so you implement it in one place and let each model use it in its own code as appropriate.
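As a small sketch of that idea (Order and the helper namespace are made-up names): the model's setter enforces the rule itself but delegates the generic check to a shared helper.

#include <stdexcept>
#include <string>

// Shared validation helpers, written once and reused by every model.
namespace validate {
    inline void non_negative(long v, const std::string &field) {
        if (v < 0) throw std::invalid_argument(field + " must not be negative");
    }
}

class Order {
public:
    void set_quantity(long q) {
        validate::non_negative(q, "quantity");  // the object guards its own state
        quantity_ = q;
    }
private:
    long quantity_ = 0;
};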
The object should validate the data input. Otherwise every part of the application which assigns data has to apply the same set of tests, and every part of the application which retrieves the persisted data will need to handle the possibility that some other module hasn't done their checks properly.
Incidentally I don't think this is an object-oriented thang. It applies to any data persistence construct which takes input. Basically, you're talking Design By Contract preconditions.
My policy is that, for a global code to be robust, each object A should check as much as possible, as early as possible. But the "as much as possible" needs explanation:
The internal coherence of each field B in A (type, range within the type, etc.) should be checked by the field's type B itself. If it is a primitive field, or a reused class, that is not possible, so object A should check it.
The coherence of related fields (if that B field is null, then C must also be) is the typical responsibility of object A.
The coherence of a field B with other codes that are external to A is another matter. This is where the "pojo" approach (in Java, but applicable to any language) comes into play.
The POJO approach says that, with all the responsibilities/concerns we have in modern software (persistence and validation are only two of them), the domain model ends up messy and hard to understand. The problem is that these domain objects are central to understanding the whole application and to communicating with domain experts. Each time you have to read a domain object's code, you have to deal with the complexity of all these concerns, while you might care about none, or only one, of them...
So, in the POJO approach, your domain objects must not carry code related to any of these concerns (which usually means an interface to implement, or a superclass to extend).
All concerns except the domain one are kept out of the object (though some simple information can still be provided, in Java usually via annotations, to parameterize the generic external code that handles one concern).
Also, the domain objects relate only to other domain objects, not to framework classes tied to one concern (such as validation or persistence). So the domain model, with all its classes, can be put in a separate "package" (project or whatever), with no dependencies on technical or concern-related code. This makes it much easier to understand the heart of a complex application, without all the complexity of these secondary aspects.