ruby on rails what is vs what was architecture without papertrail - mysql

Concise explanation
There is a row in the database which shows the current state of 'Umbrella', forged from the Model 'Product'.
You want to access the complete history of what you deem to be relevant changes to Umbrella, involving related models, quickly and painlessly.
The problem is that paper trail doesn't bring in the beef when the events table is tens of thousands of rows long: you can't truncate it as it contains important history, and its performance is woeful as it has to parse thousands of lines of YAML to find 'relevant' changes.
Background reading done, still no idea what the problem is called
This seems like something basic to me, but I see no mention of others tackling it beyond using PaperTrail, so I don't know what it is commonly (non-proprietarily) referred to as, if anything. "ruby on rails what is vs what was architecture without papertrail" was the best title I could think of. Am I creating a one-to-many relationship between models and time?
Have read "Design Patterns in Ruby" (2007), which references the Gang of Four's design patterns; no mention of this problem.
Have tried "paper trail" gem but it doesn't quite solve it
The problem
Assuming you have Products, Companies and Categories, and
Product: id, name, price, barcode, (also company_id and category_id)
Company: id, name, registered_company_number
Category: id, name, some_immutable_field
Company has many Products
Category has many Products
And you need to see the history of each Product, including changes to itself (such as price), changes to which company it belongs to, changes to that company's name, and the same for categories, such as:
date  | event      | company name | cmp id | category name | cat id | name     | price
------|------------|--------------|--------|---------------|--------|----------|------
jan11 | created    | megacorp     | 1      | outdoors      | 101    | umbrella | 10
feb11 | cat change | megacorp     | 1      | fashion       | 102    | umbrella | 10
mar11 | cat rename | megacorp     | 1      | vogue         | 102    | umbrella | 10
apr11 | cmp rename | megacorp inc | 1      | vogue         | 102    | umbrella | 10
may11 | cmp change | ultra & sons | 2      | vogue         | 102    | umbrella | 12
jul11 | cmp change | megacorp     | 1      | vogue         | 102    | umbrella | 12
Note that whilst umbrella was with ultra & sons, megacorp inc changed its name back to megacorp, but we don't show that in this history as it's not relevant to this product. (The name change of company 1 happens in jun11, but is not shown.)
This can be accomplished with papertrail, but the code to do it is either very complex, long and procedural; or if written 'elegantly' in the way papertrail intended, very very slow as it makes many db calls to what is currently a very bloated events table.
Why paper trail is not the right solution here
Paper trail stores all changes in YAML; the database table is polymorphic and stores a lot of data from many different models. This table, and thus this gem, seems suited to identifying who made which changes... but to use it for history the way I need to, it's like a god table that stores all information about what was and has too much responsibility.
The history I am after does not care about all changes to an object, only certain fields. (But we still need to record all the small changes, just not include them in the history of products, so we can't simply not record these things; paper trail has its regular duties identifying who did what, and cannot be optimised solely for this purpose.) Pulling this information requires getting all records where the item_type is Product and the item_id is the currently viewed product_id, parsing the YAML, and checking whether the changes are interesting (is a changed field one we want to see the history of?). Then doing the same for every category and company that the product has been associated with in its lifetime, but only keeping the changes which occur in the windows during which the product was associated with said category/company.
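To make the cost concrete, the first step above boils down to something like the query below; a hedged sketch against PaperTrail's default versions table (the bloated "events table" described here). item_type, item_id and created_at are the gem's standard columns, object_changes is its optional diff column, and 42 is a made-up product id. The relevance filtering cannot be pushed into SQL because the changes live inside YAML blobs:

-- fetch every version ever recorded for this product
SELECT id, event, object_changes, created_at
FROM versions
WHERE item_type = 'Product'
  AND item_id = 42
ORDER BY created_at;

Every returned row's YAML must then be parsed in the application to decide whether a field we care about actually changed, and the whole exercise is repeated for each company and category the product was ever attached to, filtered down to the association windows.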
Paper trail can be turned off quite easily... so if one of your devs were to disable it in the code as an optimisation while some operations run, but forget to write the code to turn it back on, no history gets recorded. And because paper trail is more of a man-on-the-loop than a man-in-the-loop, if it's not running you might not notice (and then have to write overly complex code which catches all the possible scenarios with holey data). A solution which enforces the saving of history is required.
Half baked solution
Conceptually, I think the models should be split between that which persists and that which changes. I am surprised this is not something baked into Rails from the ground up, but there are some issues with it:
Product: id, barcode
Product_period: id, name, price, product_id, start_date, (also company_id and category_id)
Company: id, registered_company_number
Company_period: id, name, company_id, start_date
Category: id, some_immutable_field
Category_period: id, name, category_id, start_date
Every time the price of the product, or the company_id of the product changes, a new row is added to product_period which records the beginning of a new era where the umbrella now costs $11, along with the start_date (well, time) that this auspicious period begins.
Thus in the product model, all calls to things which are immutable, or for which we only care about the most recent value, remain as they are; whereas things which change, and which we care about, have methods which to an outside user (or existing code) appear to operate on the product model, but in fact call the most recent product_period for this product and get the latest values there.
This solves the problem superficially, but it's a little long-winded, and it still has the problem that you have to poke around through company_period and category_period selecting relevant entries (i.e. the company/category experiences a change during a time when the product was associated with it), rather than something more elegant.
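As a sketch of that "poking around", here is roughly what pulling the relevant company name changes for one product looks like. It assumes the period tables above plus an end_date column (NULL for the current period) so the association windows don't have to be derived with self-joins; the product id is made up:

SELECT cp.start_date, cp.name AS company_name
FROM product_periods pp
JOIN company_periods cp
  ON cp.company_id = pp.company_id
 AND cp.start_date >= pp.start_date
 AND (pp.end_date IS NULL OR cp.start_date < pp.end_date)
WHERE pp.product_id = 42
ORDER BY cp.start_date;

Each matching company_periods row is a company change that began while the product was associated with that company; the same shape of join works for category_periods.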
At least the MySQL will run faster, there is more freedom to make indexes, and there are no longer thousands of YAML parses bogging it down.
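For what it's worth, a minimal MySQL sketch of two of the period tables, with the composite indexes that keep the "latest period" and window lookups cheap (column types, and the optional end_date from the sketch above, are assumptions):

CREATE TABLE product_periods (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  product_id  INT UNSIGNED NOT NULL,
  company_id  INT UNSIGNED NOT NULL,
  category_id INT UNSIGNED NOT NULL,
  name        VARCHAR(255) NOT NULL,
  price       DECIMAL(10,2) NOT NULL,
  start_date  DATETIME NOT NULL,
  end_date    DATETIME NULL,                      -- NULL = current period
  KEY idx_product_start (product_id, start_date)  -- history / latest period per product
) ENGINE=InnoDB;

CREATE TABLE company_periods (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  company_id INT UNSIGNED NOT NULL,
  name       VARCHAR(255) NOT NULL,
  start_date DATETIME NOT NULL,
  end_date   DATETIME NULL,
  KEY idx_company_start (company_id, start_date)
) ENGINE=InnoDB;

-- category_periods would follow the same pattern.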
On the quest to write more readable code, are these improvements sufficient? What do other people do? Does this have a name? Is there a more elegant solution or just a quagmire of trade offs?

There are a bunch of other versioning and history gems for rails (I contributed to the first one, 10 years ago!) - find them here, https://www.ruby-toolbox.com/categories/Active_Record_Versioning
They all have different methods of storing history along the lines you suggest above, and some are configurable. I also don't agree with the polymorphic god table for all users, but it's not too slow if you have decent indexes.

Related

How to index database?

This is killing me - everybody says what it is, but no one points to a guide or teaches the basics.
Is it something that is better done from the start, or can you add indexes just as easily later when your loading times are getting longer?
Has anyone found a good starting point for someone who's not a pro in databases? (I mean an indexing starting point; don't worry, I know the basics of databases.) Main rules, good practice, etc.
I'm not here to ask you to write a huge tutorial, but if you're really, really bored - go ahead. :)
I'm using WordPress if that's important to know. Yes, I know that WP uses very basic indexing, but if it's something good to start with from the beginning, I can't see a reason why not to.
It's barely related, but I also didn't find an answer online. I can guess the answer but I'm not 100% sure - what's the more efficient way to store data with the same key: in an array or in separate rows (separate ids but the same key)? There's usually a maximum of 20 items per post & the number of posts could be in the thousands in future. Which would be the better solution?
Different rows, ids & values BUT same key
id | key |values|
--------------------
25 | Bob | 3455 |
--------------------
24 | Bob | 1654 |
--------------------
23 | Bob | 8432 |
Same row, id & key BUT value is serialized array
id | key | values |
------------------------------
23 | Bob | serialized array |
------------------------------
If you want a quick rule of thumb, index any columns in a table that you will be using to look up rows. For example, I may have a table as follows:
id| Name| date |
--------------------
0 | Bob | 11.12.16 |
--------------------
1 | John| 15.12.16 |
--------------------
2 | Tim | 19.12.16 |
So obviously your ID is your primary index, but let's say you have a page that will SORT the whole table by DATE; well, you would then add date as an index.
Basically, indexes make it a lot faster for the engine to find specific records or order them by a specific column. They do a lot more, but when I am designing sites for myself or little tools for the office at work, I usually just go by that.
Large corporate tables can have thousands of indexes and even more relations between tables, but usually for us small peasant folk, what I said should be enough.
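As a concrete, hypothetical illustration of that rule of thumb, assuming the example table above is called people:

-- The primary key already covers lookups by id; add an index for the
-- column the page sorts on.
ALTER TABLE people ADD INDEX idx_people_date (date);

-- This can now use the index instead of scanning and sorting every row:
SELECT id, name, date FROM people ORDER BY date;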
You're asking a really complicated question. But the tl;dr: a database index is a data structure that improves the speed of data retrieval operations on a database table, at the cost of additional writes and storage space to maintain the index data structure.
More detailed info is already provided in the thorough answer here:
How does database indexing work?

MySQL - When should I use a different table for similar data?

Let's say I'm storing play-by-play info for sports: basketball, football, and baseball. The data basically fits the same model:
| play_id | play_type_id | play_description_id | player1_id | player2_id | player3_id |
Those are the basic columns that each sport would share, but there would be several more. Some columns would only be used by certain sports - like player3_id would be used by football for who made a tackle, but never by basketball - there wouldn't be a lot of these limited-use columns, but some.
Each game can have anywhere from 300 - 1000 rows (high estimate), so this table could grow to the billions eventually.
My questions are:
Should I just start off with different tables for each sport, even though there'd be about a 90% overlap of columns?
At what point should I look into partitioning the table? How would I do this? I'm thinking of archiving all the plays from the 2012 season (whether it be a sports specific table or all-inclusive).
Sorry if my post isn't more concise. This is all a hypothetical case, I'm just trying to figure out what the disadvantages of having one massive table would be, obviously performance is a consideration, but at what point does the table's size warrant being divided. Because this isn't a real project, it's hard to determine what the advantages of having a table like this would be. So again, sorry if this is a stupid question.
EDIT/ADDITIONAL QUESTION:
On a somewhat side note, I haven't used NoSQL DBs before, but is that something I should consider for a project like this? Let's say that there'd be a high velocity of reads and return time would be crucial, but it also needs the ability to run complex queries like "how many ground balls has playerA hit to second base, off playerB, in night games, during 2002 - 2013?"
I would separate it into multiple tables. That way it is more flexible.
And if you want to compute some statistics, you are going to be able to do more complex queries than if you had only one table.
It could look like this
Table PLAYER
ID | FIRSTNAME | LASTNAME | DATE_OF_BIRTH
-----------------------------------------
1 | michael | Jordan | 12.5.65
Table SPORT
ID | NAME | DESCRIPTION
------------------------------------------
1 | Basketball | Best sport in the world
2 | Golf | Nice sport too
Table PLAYER_SPORT
SPORT_ID | PLAYER_ID | PLAYER_POSITION_ID
--------------------------------------------
1 | 1 | 1 /* Michael Jordan plays Basketball */
2 | 1 | NULL /* Michael Jordan also plays Golf */
Table PLAYER_POSITION
ID | POSITION | DESCRIPTION | SPORT_ID
-------------------------------------------
1 | Middlefield | Any description.. | 1
As far as your table structure is concerned, the best practice is to have another table mapping play_id and player_id. There is no need for the columns player1_id, player2_id, player3_id; just make a new table which has play_id and player_id columns.
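A hedged sketch of such a mapping table (names and types are assumptions; a role column is added to capture which slot, e.g. tackler, the player filled):

CREATE TABLE play_players (
  play_id   BIGINT UNSIGNED NOT NULL,
  player_id INT UNSIGNED NOT NULL,
  role      VARCHAR(30) NOT NULL,   -- e.g. 'tackler'; replaces the positional player1/2/3 columns
  PRIMARY KEY (play_id, player_id, role),
  KEY idx_player (player_id)
) ENGINE=InnoDB;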
Should I just start off with different tables for each sport, even though there'd be about a 90% overlap of columns?
I don't think that would help you much; the growth-rate problem of a single table will also occur for the split tables, so this kind of distribution will only delay the problem, not solve it. You will also lose integrity and consistency by violating normal forms.
At what point should I look into partitioning the table? How would I do this? I'm thinking of archiving all the plays from the 2012 season (whether it be a sports-specific table or all-inclusive).
You need to use logical database partitioning.
I think a range partition on the match-date field would be helpful.
Documentation about MySQL partitioning can be found here.
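For example, a sketch of yearly RANGE partitioning on an assumed plays table with a match_date column (note that MySQL requires the partitioning column to be part of every unique key on the table):

ALTER TABLE plays
  PARTITION BY RANGE (YEAR(match_date)) (
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION p2012 VALUES LESS THAN (2013),
    PARTITION pmax  VALUES LESS THAN MAXVALUE   -- catch-all for future seasons
  );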
Recommending NoSQL would need more information about your application; by the way, NoSQL comes with its own pros and cons. Having a look at the post may help.

Whether to merge avatar and profile tables?

I have two tables:
Avatars:
Id | UserId | Name | Size
-----------------------------------------------
1 | 2 | 124.png | Large
2 | 2 | 124_thumb.png | Thumb
Profiles:
Id | UserId | Location | Website
-----------------------------------------------
1 | 2 | Dallas, Tx | www.example.com
These tables could be merged into something like:
User Meta:
Id | UserId | MetaKey | MetaValue
-----------------------------------------------
1 | 2 | location | Dallas, Tx
2 | 2 | website | www.example.com
3 | 2 | avatar_lrg | 124.png
4 | 2 | avatar_thmb | 124_thumb.png
This to me could be a cleaner, more flexible setup (at least at first glance). For instance, if I need to allow a "user status message", I can do so without touching the database schema.
However, the user's avatars will be pulled far more than their profile information.
So I guess my real questions are:
What kind of performance hit would this produce?
Is merging these tables just a really bad idea?
This is almost always a bad idea. What you are doing is a form of the Entity Attribute Value model. This model is sometimes necessary when a system needs a flexible attribute system to allow the addition of attributes (and values) in production.
This type of model is essentially built on metadata in lieu of real relational data. This can lead to referential integrity issues, orphan data, and poor performance (depending on the amount of data in question).
As a general matter, if your attributes are known up front, you want to define them as real data (i.e. actual columns with actual types) as opposed to string-based metadata.
In this case, it looks like users may have one large avatar and one small avatar, so why not make those columns on the user table?
We have a similar type of table at work that probably started with good intentions, but is now quite the headache to deal with. This is because it now has 100s of different "MetaKeys", and there is no good documentation about what is allowed and what each does. You basically have to look at how each is used in the code and figure it out from there. Thus, figure out how you will document this for future developers before you go down that route.
Also, to retrieve all the information about each user it is no longer a 1-row query, but an n-row query (where n is the number of fields on the user). Also, once you have that data, you have to post-process each of those based on your meta-key to get the details about your user (which usually turns out to be more of a development effort because you have to do a bunch of String comparisons). Next, many databases only allow a certain number of rows to be returned from a query, and thus the number of users you can retrieve at once is divided by n. Last, ordering users based on information stored this way will be much more complicated and expensive.
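To make that n-row-query point concrete, reconstructing one user's record from the merged key/value table in the question means pivoting rows back into columns, something like this sketch:

SELECT
  UserId,
  MAX(CASE WHEN MetaKey = 'location'    THEN MetaValue END) AS location,
  MAX(CASE WHEN MetaKey = 'website'     THEN MetaValue END) AS website,
  MAX(CASE WHEN MetaKey = 'avatar_lrg'  THEN MetaValue END) AS avatar_lrg,
  MAX(CASE WHEN MetaKey = 'avatar_thmb' THEN MetaValue END) AS avatar_thmb
FROM user_meta                 -- the "User Meta" table from the question
WHERE UserId = 2
GROUP BY UserId;

With the original two tables, the same read is a single row from each.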
In general, I would say that you should make any fields that have specialized functionality or require ordering to be columns in your table. Since they will require a development effort anyway, you might as well add them as an extra column when you implement them. I would say your avatar pics fall into this category, because you'll probably have one of each, and will always want to display the large one in certain places and the small one in others. However, if you wanted to allow users to make their own fields, this would be a good way to do this, though I would make it another table that can be joined to from the user table. Below are the tables I'd suggest. I assume that "Status" and "Favorite Color" are custom fields entered by user 2:
User:
| Id | Name |Location | Website | avatarLarge | avatarSmall
----------------------------------------------------------------------
| 2 | iPityDaFu |Dallas, Tx | www.example.com | 124.png | 124_thumb.png
UserMeta:
Id | UserId | MetaKey | MetaValue
-----------------------------------------------
1 | 2 | Status | Hungry
2 | 2 | Favorite Color | Blue
I'd stick with the original layout. Here are the downsides of replacing your existing table structure with a big table of key-value pairs that jump out at me:
Inefficient storage - since the data stored in the metavalue column is mixed, the column must be declared with the worst-case data type, even if all you would need to hold is a boolean for some keys.
Inefficient searching - should you ever need to do a lookup from the value in the future, the mishmash of data will make indexing a nightmare.
Inefficient reading - reading a single user record now means doing an index scan for multiple rows, instead of pulling a single row.
Inefficient writing - writing out a single user record is now a multi-row process.
Contention - having mixed your user data and avatar data together, you've forced threads that only care about one or the other to operate on the same table, increasing your risk of running into locking problems.
Lack of enforcement - your data constraints have now moved into the business layer. The database can no longer ensure that all users have all the attributes they should, or that those attributes are of the right type, etc.

On a stats-system, should I save little bits of information about single visit on many tables or just one table?

I've been wondering this for a while already. The title stands for my question. What do you prefer?
I made a pic to make my question clearer.
Why am I even thinking of this? Isn't one table the most obvious option? Well, kind of. It's the simplest way, but let's think more practically. When there is a ton of data in one table and a user wants to see only statistics about the browsers the visitors use, this may not work out so well. Taking the browser data out into its own table is naturally better for that.
Multiple tables have disadvantages too: writing data takes more time and resources, whereas with one table only one MySQL query is needed.
Anyway, I figured out a solution which I think makes sense. Data is written to some kind of temporary table, and all of those rows are exported to multiple tables later by a scheduled script. This way the system doesn't add loading time to the user's page, but the data remains fast to browse.
Let's bring some discussion here. I'm hoping to raise some opinions.
Which one is better? Let's find out!
The date, browser and OS are all related on a one-to-one basis... Without more information to require distinguishing records further, I'd be creating a single table rather than two.
Database design is based on creating tables that reflect entities, and I don't see two distinct entities in the example provided. Consider using views to serve data without duplicating the data in the database; a centralized copy of the data makes managing the data much easier...
What you're really thinking of is whether to denormalize the table or use the first normal form. When you're using 1NF you have a table that looks like this:
Table statistic
id | date | browser_id | os_id
---------------------------------------------
1 | 127003727 | 1 | 1
2 | 127391662 | 2 | 2
3 | 127912683 | 3 | 2
And then to explain what browser and os the client used, you need other tables:
Table browser
id | name | company | version
-----------------------------------------------
1 | Firefox | Mozilla | 3.6.8
2 | Safari | Apple | 4.0
3 | Firefox | Mozilla | 3.5.1
Table os
id | name | company | version
-----------------------------------------------
1 | Ubuntu | Canonical | 10.04
2 | Windows | Microsoft | 7
3 | Windows | Microsoft | 3.11
As OMG Ponies already pointed out, this isn't a good example for creating several entities, so one can safely go with one table and then think about how to deal with having to, say, find all the entries with a matching browser name.
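For instance, the "matching browser name" lookup under each design (the single-table names are assumptions):

-- Normalized (1NF) layout from above: join through the browser table.
SELECT s.id, s.date
FROM statistic s
JOIN browser b ON b.id = s.browser_id
WHERE b.name = 'Firefox';

-- Denormalized single table: assumes the browser name is stored inline per visit.
SELECT id, date
FROM visits
WHERE browser_name = 'Firefox';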

Database - Designing an "Events" Table

After reading the tips from this great Nettuts+ article, I've come up with a table schema that would separate highly volatile data from tables subjected to heavy reads, and at the same time lower the number of tables needed in the whole database schema. However, I'm not sure if this is a good idea, since it doesn't follow the rules of normalization, and I would like to hear your advice. Here is the general idea:
I've four types of users modeled in a Class Table Inheritance structure, in the main "user" table I store data common to all the users (id, username, password, several flags, ...) along with some TIMESTAMP fields (date_created, date_updated, date_activated, date_lastLogin, ...).
To quote the tip #16 from the Nettuts+ article mentioned above:
Example 2: You have a "last_login" field in your table. It updates every time a user logs in to the website. But every update on a table causes the query cache for that table to be flushed. You can put that field into another table to keep updates to your users table to a minimum.
Now it gets even trickier, I need to keep track of some user statistics like
how many unique times a user profile was seen
how many unique times a ad from a specific type of user was clicked
how many unique times a post from a specific type of user was seen
and so on...
In my fully normalized database this adds up to about 8 to 10 additional tables, it's not a lot but I would like to keep things simple if I could, so I've come up with the following "events" table:
|------|----------------|----------------|---------------------|-----------|
| ID | TABLE | EVENT | DATE | IP |
|------|----------------|----------------|---------------------|-----------|
| 1 | user | login | 2010-04-19 00:30:00 | 127.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 1 | user | login | 2010-04-19 02:30:00 | 127.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 2 | user | created | 2010-04-19 00:31:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 2 | user | activated | 2010-04-19 02:34:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 2 | user | approved | 2010-04-19 09:30:00 | 217.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 2 | user | login | 2010-04-19 12:00:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15 | user_ads | created | 2010-04-19 12:30:00 | 127.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 15 | user_ads | impressed | 2010-04-19 12:31:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15 | user_ads | clicked | 2010-04-19 12:31:01 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15 | user_ads | clicked | 2010-04-19 12:31:02 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15 | user_ads | clicked | 2010-04-19 12:31:03 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15 | user_ads | clicked | 2010-04-19 12:31:04 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15 | user_ads | clicked | 2010-04-19 12:31:05 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 2 | user | blocked | 2010-04-20 03:19:00 | 217.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 2 | user | deleted | 2010-04-20 03:20:00 | 217.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
Basically the ID refers to the primary key (id) field in the TABLE table, I believe the rest should be pretty straightforward. One thing that I've come to like in this design is that I can keep track of all the user logins instead of just the last one, and thus generate some interesting metrics with that data.
Due to the growing nature of the events table I also thought of making some optimizations, such as:
#9: Since there is only a finite number of tables and a finite (and predetermined) number of events, the TABLE and EVENT columns could be set up as ENUMs instead of VARCHARs to save some space.
#14: Store IPs as UNSIGNED INTs with INET_ATON() instead of VARCHARs.
Store DATEs as TIMESTAMPs instead of DATETIMEs.
Use the ARCHIVE (or the CSV?) engine instead of InnoDB / MyISAM.
Only INSERTs and SELECTs are supported, and data is compressed on the fly.
Overall, each event would only consume 14 (uncompressed) bytes which is okay for my traffic I guess.
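A hedged sketch of the events table with those optimisations applied (the ENUM values are taken from the sample rows above; everything else is an assumption). The fixed-width columns add up to the 14 uncompressed bytes mentioned: 4 + 1 + 1 + 4 + 4.

CREATE TABLE events (
  id      INT UNSIGNED NOT NULL,               -- 4 bytes: PK of the row in `table`
  `table` ENUM('user', 'user_ads') NOT NULL,   -- 1 byte: finite, predetermined set
  event   ENUM('created', 'activated', 'approved', 'login',
               'impressed', 'clicked', 'blocked', 'deleted') NOT NULL,  -- 1 byte
  date    TIMESTAMP NOT NULL,                  -- 4 bytes
  ip      INT UNSIGNED NOT NULL                -- 4 bytes: stored via INET_ATON('127.0.0.2')
) ENGINE=ARCHIVE;                              -- INSERT/SELECT only, compressed on disk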
Pros:
Ability to store more detailed data (such as logins).
No need to design (and code for) almost a dozen additional tables (dates and statistics).
Reduces a few columns per table and keeps volatile data separated.
Cons:
Non-relational (still not as bad as EAV):
SELECT * FROM events WHERE id = 2 AND `table` = 'user' ORDER BY date DESC;
6 bytes overhead per event (ID, TABLE and EVENT).
I'm more inclined to go with this approach since the pros seem to far outweigh the cons, but I'm still a little bit reluctant... Am I missing something? What are your thoughts on this?
Thanks!
#coolgeek:
One thing that I do slightly differently is to maintain an entity_type table, and use its ID in the object_type column (in your case, the 'TABLE' column). You would want to do the same thing with an event_type table.
Just to be clear, you mean I should add an additional table that maps which events are allowed in a table and use the PK of that table in the events table instead of having a TABLE / EVENT pair?
#ben:
These are all statistics derived from existing data, aren't they?
The additional tables are mostly related to statistics, but the data doesn't already exist; some examples:
user_ad_stats: user_ad_id (FK), ip, date, type (impressed, clicked)
user_post_stats: user_post_id (FK), ip, date
If I drop these tables I've no way to keep track of who, what or when; I'm not sure how views can help here.
I agree that it ought to be separate, but more because it's fundamentally different data. What someone is and what someone does are two different things. I don't think volatility is so important.
I've heard it both ways and I couldn't find anything in the MySQL manual that states that either one is right. Anyway, I agree with you that they should be separate tables because they represent different kinds of data (with the added benefit of being more descriptive than a regular approach).
I think you're missing the forest for the trees, so to speak. The predicate for your table would be "User ID from IP IP at time DATE EVENTed to TABLE" which seems reasonable, but there are issues.
What I meant for "not as bad as EAV" is that all records follow a linear structure and they are pretty easy to query, there is no hierarchical structure so all queries can be done with a simple SELECT.
Regarding your second statement, I think you misunderstood me here; the IP address is not necessarily associated with the user. The table structure should read something like this:
IP address (IP) did something (EVENT) to the PK (ID) of the table (TABLE) on date (DATE).
For instance, in the last row of my example above it should read that IP 217.0.0.1 (some admin), deleted the user #2 (whose last known IP is 127.0.0.2) at 2010-04-20 03:20:00.
You can still join, say, user events to users, but you can't implement a foreign key constraint.
Indeed, that's my main concern. However I'm not totally sure what can go wrong with this design that couldn't go wrong with a traditional relational design. I can spot some caveats but as long as the app messing with the database knows what it is doing I guess there shouldn't be any problems.
One other thing that counts in this argument is that I will be storing many more events, more than double compared to the original design, so it makes perfect sense to use the ARCHIVE storage engine here; the only thing is that it doesn't support FKs (nor UPDATEs or DELETEs).
I highly recommend this approach. Since you're presumably using the same database for OLTP and OLAP, you can gain significant performance benefits by adding in some stars and snowflakes.
I have a social networking app that is currently at 65 tables. I maintain a single table to track object (blog/post, forum/thread, gallery/album/image, etc) views, another for object recommends, and a third table to summarize insert/update activity in a dozen other tables.
One thing that I do slightly differently is to maintain an entity_type table, and use its ID in the object_type column (in your case, the 'TABLE' column). You would want to do the same thing with an event_type table.
Clarifying for Alix - Yes, you maintain a reference table for objects, and a reference table for events (these would be your dimension tables). Your fact table would have the following fields:
id
object_id
event_id
event_time
ip_address
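A sketch of that layout (types are assumptions; object_type_id is the entity_type reference mentioned above, even though it isn't in the field list):

CREATE TABLE entity_type (             -- dimension: replaces the free-form TABLE column
  id   TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE event_type (              -- dimension: replaces the free-form EVENT column
  id   TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE events (                  -- fact table
  id             BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  object_type_id TINYINT UNSIGNED NOT NULL,   -- references entity_type.id
  object_id      INT UNSIGNED NOT NULL,       -- PK of the row in that entity's own table
  event_id       TINYINT UNSIGNED NOT NULL,   -- references event_type.id
  event_time     TIMESTAMP NOT NULL,
  ip_address     INT UNSIGNED NOT NULL
) ENGINE=InnoDB;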
It looks like a pretty reasonable design, so I just wanted to challenge a few of your assumptions to make sure you had concrete reasons for what you're doing.
In my fully normalized database this adds up to about 8 to 10 additional tables
These are all statistics derived from existing data, aren't they? (Update: okay, they're not, so disregard the following.) Why wouldn't these simply be views, or even materialized views?
It may seem like a slow operation to gather those statistics, however:
proper indexing can make it quite fast
it's not a common operation, so the speed doesn't matter all that much
eliminating redundant data might make other common operations fast and reliable
I've come up with a table schema that would separate highly volatile data from other tables subjected to heavy reads
I guess you're talking about how the user (just to pick one table) events, which would be pretty volatile, are separated from the user data. I agree that it ought to be separate, but more because it's fundamentally different data. What someone is and what someone does are two different things.
I don't think volatility is so important. The DBMS should already allow you to put the log file and database file on separate devices, which accomplishes the same thing, and contention shouldn't be an issue with row-level locking.
Non-relational (still not as bad as EAV)
I think you're missing the forest for the trees, so to speak.
The predicate for your table would be "User ID from IP IP at time DATE EVENTed to TABLE" which seems reasonable, but there are issues. (Update: Okay, so it's sort of kinda like that.)
You can still join, say, user events to users, but you can't implement a foreign key constraint. That's why EAV is generally problematic; whether or not something is exactly EAV doesn't really matter. It's generally one or two lines of code to implement a constraint in your schema, but in your app it could be dozens of lines of code, and if the same data is accessed in multiple places by multiple apps, it can easily multiply to thousands of lines of code. So, generally, if you can prevent bad data with a foreign key constraint, you're guaranteed that no app will do that.
You might think that events aren't so important, but, as an example, ad impressions are money. I would definitely want to catch any bugs relating to ad impressions as early in the design process as possible.
Further comment
I can spot some caveats but as long as the app messing with the database knows what it is doing I guess there shouldn't be any problems.
And with some caveats you can make a very successful system. With a proper system of constraints, you get to say, "if any app messing with the database doesn't know what it's doing, the DBMS will flag an error." That may require more time and money than you've got, so something simpler that you can have is probably better than something more perfect that you can't. C'est la vie.
I can't add a comment to Ben's answer, so two things...
First, it would be one thing to use views in a standalone OLAP/DSS database; it's quite another to use them in your transaction database. The High Performance MySQL people recommend against using views where performance matters.
WRT data integrity, I agree, and that's another advantage to using a star or snowflake with 'events' as the central fact table (as well as using multiple event tables, like I do). But you cannot design a referential integrity scheme around IP addresses.