Storing a variable number of files' download statistics in a MySQL database

I have a number of private files on my website that are served through PHP. I keep track of the downloads using a MySQL database. Currently I just use a column for each file and insert a new row for every day, which is fine because I don't have many files.
However, I am going to be starting to add and remove files fairly often, and the number of files will be getting very large. As I see it I have two options:
The first is to add and remove columns for each file as they are added and removed. This would quickly lead to the table having very many columns. I am self-taught so I'm not sure, but I think that's probably a very bad thing. Adding and removing columns once there are a lot of rows sounds like a very expensive operation.
I could also create a new database with a generic 'fileID' field, and then add a new row every day for each file, but this would lead to a lot of rows. It would also take a lot of row inserts to create tracking for the next day.
Which would be better? Or is there a third solution that I'm missing? Should I be using something other than MySQL? I want something that can be queried so I can display the stats as graphs on the site.
Thank you very much for your help, and for taking the time to read.

I could also create a new database with a generic 'fileID' field, and then add a new row every day for each file, but this would lead to a lot of rows.
Yes, this is what you need to do, but you mean "a new table", not "a new database".
Basically you'll want a file table, which might look like this:
id | name | created_date | [other fields ...]
----+-----------+--------------+--------------------
1 | foo.txt | 2012-01-26 | ...
2 | bar.txt | 2012-01-27 | ...
and your downloads_by_day table will refer to it:
id | file_id | `date` | download_count
----+---------+------------+----------------
1 | 1 | 2012-01-27 | 17
2 | 2 | 2012-01-27 | 23
3 | 1 | 2012-01-28 | 6
4 | 2 | 2012-01-28 | 195
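A minimal sketch of this two-table design, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The table names follow the answer; the column name `day` (instead of `date`), the UNIQUE constraint and the helper functions are illustrative assumptions. A row is only created when a file is first downloaded on a given day, so there is no need to pre-insert one row per file for the next day. In MySQL, the two statements in `record_download` could be collapsed into a single `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

def create_schema(conn):
    conn.executescript("""
        CREATE TABLE file (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            created_date TEXT NOT NULL
        );
        CREATE TABLE downloads_by_day (
            id INTEGER PRIMARY KEY,
            file_id INTEGER NOT NULL REFERENCES file(id),
            day TEXT NOT NULL,
            download_count INTEGER NOT NULL DEFAULT 0,
            UNIQUE (file_id, day)  -- at most one counter row per file per day
        );
    """)

def record_download(conn, file_id, day):
    # Create the (file, day) row lazily, on the first download of that day...
    conn.execute(
        "INSERT OR IGNORE INTO downloads_by_day (file_id, day, download_count) "
        "VALUES (?, ?, 0)", (file_id, day))
    # ...then bump its counter.
    conn.execute(
        "UPDATE downloads_by_day SET download_count = download_count + 1 "
        "WHERE file_id = ? AND day = ?", (file_id, day))

def daily_counts(conn, file_id):
    # One (day, count) pair per day, ready to be plotted as a graph.
    return conn.execute(
        "SELECT day, download_count FROM downloads_by_day "
        "WHERE file_id = ? ORDER BY day", (file_id,)).fetchall()

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany("INSERT INTO file (id, name, created_date) VALUES (?, ?, ?)",
                 [(1, "foo.txt", "2012-01-26"), (2, "bar.txt", "2012-01-27")])
for _ in range(3):
    record_download(conn, 1, "2012-01-27")
record_download(conn, 1, "2012-01-28")
record_download(conn, 2, "2012-01-28")
print(daily_counts(conn, 1))  # [('2012-01-27', 3), ('2012-01-28', 1)]
```

Adding or removing a file is now just an INSERT or DELETE on `file`; the structure of the tables never changes.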

Related

MySQL full text search matching similar results

I'll try to explain my situation: I'm trying to create a search engine for products on my website, so when a user searches for a product I need to show similar ones. Here's an example.
User searches:
assassins creed OR assassinscreed OR aSsAssIn's CreeD, assuming there are no letter/number misspellings (those 3 queries should produce the same result)
Expected results:
Assassin's Creed AND Assassin's Creed: Unity AND Assassin's Creed: Special Edition
What have I tried so far
I have created a MySQL field for the search engine which contains a parsed name of the product (Assassin's Creed: Unity -> assassinscreedunity)
I parse the search query
I search using MySQL's INSTR()
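The approach described above can be sketched as follows. This is an illustration only: it uses Python's sqlite3 module in place of MySQL (SQLite also provides an `instr()` function), and the `product` table, the `search_name` column and the `normalize` rule (lower-case, keep only letters and digits) are assumptions based on the example in the question.

```python
import re
import sqlite3

def normalize(text):
    # Lower-case and drop everything that is not a letter or digit,
    # e.g. "Assassin's Creed: Unity" -> "assassinscreedunity".
    return re.sub(r"[^a-z0-9]", "", text.lower())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
             "search_name TEXT NOT NULL)")
names = ["Assassin's Creed", "Assassin's Creed: Unity",
         "Assassin's Creed: Special Edition", "Monkey Island"]
conn.executemany("INSERT INTO product (name, search_name) VALUES (?, ?)",
                 [(n, normalize(n)) for n in names])

def search(conn, query):
    needle = normalize(query)
    # instr() returns the 1-based position of needle in search_name, or 0 if absent.
    # The function has to be evaluated for every row, so this is a full scan.
    rows = conn.execute("SELECT name FROM product WHERE instr(search_name, ?) > 0 "
                        "ORDER BY id", (needle,))
    return [name for (name,) in rows]

for q in ["assassins creed", "assassinscreed", "aSsAssIn's CreeD"]:
    print(q, "->", search(conn, q))
```

Because `instr()` must inspect every `search_name`, the query cannot use an ordinary index to narrow down the rows, which is why it slows down as the table grows.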
My problem
I'm fine with this approach, but I've heard it can be slow as the number of rows grows. I've created a full-text index on the table, but I don't think it would help, so I need another solution.
Thanks for any answer, and ask me anything before downvoting.
First of all, you should track performance issues in your queries more precisely than 'heard it can be slow' and 'think it would help'. One starting point may be the Slow Query Log.
If you have a table which contains the same parsed name in more than one row, consider normalizing your database. In the specific case, store unique parsed names in one table, and only the id of the corresponding parsed name in the table you described in your question. This way, you only need to check the smaller table with unique names and can then quickly find all matching entries in the main table by id.
Example:
Consider the following table with your structure
id | product_name | rating
-----------------------------------
1 | assassinscreedunity | 5
2 | assassinscreedunity | 2
3 | monkeyisland | 3
4 | monkeyisland | 5
5 | assassinscreedunity | 4
6 | monkeyisland | 4
you would have to scan all six entries to find relevant rows.
In contrast, consider two tables like this:
id | p_id | rating
--------------------
1 | 1 | 5
2 | 1 | 2
3 | 2 | 3
4 | 2 | 5
5 | 1 | 4
6 | 2 | 4
id | name
--------------------------
1 | assassinscreedunity
2 | monkeyisland
In this case, you only have to scan two entries (compared to six) and can then efficiently look up relevant rows using the integer id.
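A minimal sketch of the two-table layout, again using Python's sqlite3 in place of MySQL. The table names `parsed_name` and `product_rating` are hypothetical (the answer leaves the tables unnamed), and the index on `p_id` is an assumption that lets the join fetch the matching rows by integer id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parsed_name (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE product_rating (
        id INTEGER PRIMARY KEY,
        p_id INTEGER NOT NULL REFERENCES parsed_name(id),
        rating INTEGER NOT NULL
    );
    CREATE INDEX product_rating_p_id ON product_rating (p_id);
""")
conn.executemany("INSERT INTO parsed_name (id, name) VALUES (?, ?)",
                 [(1, "assassinscreedunity"), (2, "monkeyisland")])
conn.executemany("INSERT INTO product_rating (id, p_id, rating) VALUES (?, ?, ?)",
                 [(1, 1, 5), (2, 1, 2), (3, 2, 3), (4, 2, 5), (5, 1, 4), (6, 2, 4)])

def ratings_matching(conn, needle):
    # The substring test runs only against the small table of unique names;
    # the index on product_rating(p_id) then serves the integer lookup.
    rows = conn.execute("""
        SELECT r.rating
        FROM parsed_name n
        JOIN product_rating r ON r.p_id = n.id
        WHERE instr(n.name, ?) > 0
        ORDER BY r.id
    """, (needle,))
    return [rating for (rating,) in rows]

print(ratings_matching(conn, "creed"))  # [5, 2, 4]
```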
To further enhance performance, you could extend the concept of a parsed name and use hashes. For example, you could calculate the SHA-1 hash of your parsed name, which is a 160-bit value. You can find entries in your database for this value very efficiently. To match substrings, you can add them to the second table as well. Since the hash only needs to be computed once, you can still let the database match on a compact fixed-length value (for example a BINARY(20) column) rather than on a string. Another thing to look into might be fuzzy hashing.
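A sketch of the hash idea, under the same sqlite3-instead-of-MySQL assumption. `hashlib.sha1(...).digest()` returns the raw 20-byte (160-bit) value, stored here in an indexed BLOB column. A hash only supports exact-match lookups, so each substring you want to find must be stored (and hashed) as its own entry, as the answer suggests. The table name `name_hash` and the hand-picked substrings are hypothetical.

```python
import hashlib
import sqlite3

def sha1_of(parsed_name):
    # 20 raw bytes = 160 bits.
    return hashlib.sha1(parsed_name.encode("utf-8")).digest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_hash (p_id INTEGER NOT NULL, hash BLOB NOT NULL)")
conn.execute("CREATE INDEX name_hash_by_hash ON name_hash (hash)")

# Each parsed name plus a few hand-picked substrings, all pointing at the same p_id.
entries = {
    1: ["assassinscreedunity", "assassinscreed", "unity"],
    2: ["monkeyisland", "monkey", "island"],
}
for p_id, strings in entries.items():
    conn.executemany("INSERT INTO name_hash (p_id, hash) VALUES (?, ?)",
                     [(p_id, sha1_of(s)) for s in strings])

def lookup(conn, query):
    # Exact match on the fixed-length digest; uses the index.
    rows = conn.execute("SELECT p_id FROM name_hash WHERE hash = ? ORDER BY p_id",
                        (sha1_of(query),))
    return [p_id for (p_id,) in rows]

print(lookup(conn, "assassinscreed"))  # [1]
```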
In addition, you should read up on the Rabin–Karp algorithm or string searching in general.
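For reference, a textbook Rabin-Karp substring search in Python (not tied to MySQL). It keeps a polynomial hash of the current window, updates it in constant time per shift, and only compares characters when the hashes match. The `base` and `mod` values are arbitrary choices for this sketch.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return the first index of pattern in text, or -1 if it does not occur."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for start in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a direct comparison.
        if p_hash == t_hash and text[start:start + m] == pattern:
            return start
        if start + m < n:
            # Roll the window: remove text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return -1

print(rabin_karp("assassinscreedunity", "creed"))  # 9
```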

MySQL user bank account log

I am planning to make a small game where everybody has a bank account. To see how well they manage their money, I want to log each player's balance every hour or every day and display it as a graph.
My question is how I can (or should) log this with MySQL.
I don't think it's very practical to do this:
id user currentMoney 2014.08.22-04 2014.08.22-03 2014.08.22-02 2014.08.22-01
(after currentMoney, there is one column for every hour), so that every hour a new column is created holding the currentMoney. I don't think that's the right way; there must be a better one. Ideally, after one month it would start from the beginning again and overwrite the old entries, but that's only optional.
My second question: is there a jQuery plugin that can create graphs from the database? Or how else can I do this?
Thanks for helping, and sorry for my English.
Populating a database is done by adding rows, not columns.
Adding columns is a structural change, and should happen rarely. A change in the structure typically means a change in the application, which implies a new version of the application.
Add rows. Your log table must look like this:
balance_history
===============
* user_id
* balance_date
current_balance
Sample contents:
user_id | balance_date | current_balance
1 | 2014-08-22 04:00:00 | 1.00
... | ... | ...
1 | 2014-08-23 12:00:00 | 99.99
2 | 2014-08-22 04:00:00 | 1.00
... | ... | ...
2 | 2014-08-23 12:00:00 | 1.23
To purge old data, all you need to do is DELETE FROM balance_history WHERE balance_date < [date_of_your_choice].
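A minimal sketch of this log table, using Python's sqlite3 module in place of MySQL. It assumes the asterisks above mark the composite primary key, stores balances as integer cents to avoid floating-point rounding, and reads the current balance from a hypothetical `account` table. `snapshot` would be run every hour (or day) by a scheduler such as cron; in MySQL you would typically use `DATETIME` and `DECIMAL` columns instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE balance_history (
        user_id INTEGER NOT NULL,
        balance_date TEXT NOT NULL,        -- 'YYYY-MM-DD HH:MM:SS' sorts correctly as text
        current_balance INTEGER NOT NULL,  -- cents
        PRIMARY KEY (user_id, balance_date)
    );
""")

def snapshot(conn, taken_at):
    # One new row per user per run; no columns are ever added.
    conn.execute("INSERT INTO balance_history (user_id, balance_date, current_balance) "
                 "SELECT user_id, ?, balance FROM account", (taken_at,))

def purge_before(conn, cutoff):
    conn.execute("DELETE FROM balance_history WHERE balance_date < ?", (cutoff,))

def history(conn, user_id):
    # The series to feed into a graph: (timestamp, balance) in time order.
    return conn.execute("SELECT balance_date, current_balance FROM balance_history "
                        "WHERE user_id = ? ORDER BY balance_date", (user_id,)).fetchall()

conn.executemany("INSERT INTO account (user_id, balance) VALUES (?, ?)", [(1, 100), (2, 100)])
snapshot(conn, "2014-08-22 04:00:00")
conn.execute("UPDATE account SET balance = 9999 WHERE user_id = 1")
conn.execute("UPDATE account SET balance = 123 WHERE user_id = 2")
snapshot(conn, "2014-08-23 12:00:00")
print(history(conn, 1))  # [('2014-08-22 04:00:00', 100), ('2014-08-23 12:00:00', 9999)]
purge_before(conn, "2014-08-23 00:00:00")
```

Running `purge_before` with a cutoff of "now minus one month" gives the rolling one-month window the question asks for.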

How do I resolve or avoid need for MySQL with multiple AUTO INCREMENT columns?

I have put a lot of effort into my database design, but I think I am
now realizing I made a major mistake.
Background: (Skip to 'Problem' if you don't need background.)
The DB supports a custom CMS layer for a website template. Users of the
template are limited to turning pages on and off, but not creating
their own 'new' pages. Further, many elements are non-editable.
Therefore, if a page has a piece of text I want them to be able to edit,
I would have 'manually' assigned a static ID to it:
<h2><%= CMS.getDataItemByID(123456) %></h2>
Note: The scripting language is not relevant to this question, but the design forces
each table to have unique column names. Hence the convention of 'TableNameSingular_id'
for the primary key etc.
The scripting language would do a lookup on these tables to find the string.
mysql> SELECT * FROM CMSData WHERE CMSData_data_id = 123456;
+------------+-----------------+-----------------------------+
| CMSData_id | CMSData_data_id | CMSData_CMSDataType_type_id |
+------------+-----------------+-----------------------------+
| 1 | 123456 | 1 |
+------------+-----------------+-----------------------------+
mysql> SELECT * FROM CMSDataTypes WHERE CMSDataType_type_id = 1;
+----------------+---------------------+-----------------------+------------------------+
| CMSDataType_id | CMSDataType_type_id | CMSDataType_type_name | CMSDataType_table_name |
+----------------+---------------------+-----------------------+------------------------+
| 1 | 1 | String | CMSStrings |
+----------------+---------------------+-----------------------+------------------------+
mysql> SELECT * FROM CMSStrings WHERE CMSString_CMSData_data_id=123456;
+--------------+---------------------------+-----------------------------------+
| CMSString_id | CMSString_CMSData_data_id | CMSString_string                  |
+--------------+---------------------------+-----------------------------------+
|            1 |                    123456 | The answer to the universe is 42. |
+--------------+---------------------------+-----------------------------------+
The rendered text would then be:
<h2>The answer to the universe is 42.</h2>
This works great for 'static' elements, such as the example above. I used the exact same
method for other data types such as file specifications, EMail Addresses, Dates, etc.
However, it fails when I want to allow the user to dynamically generate content.
For example, there is an 'Events' page, and events will be dynamically created and
deleted by the user by clicking 'Add Event' or 'Delete Event'.
An Event table will use keys to reference other tables with the following data items:
Data Item: Table:
--------------------------------------------------
Date CMSDates
Title CMSStrings (As shown above)
Description CMSTexts (MySQL TEXT data type.)
--------------------------------------------------
Problem:
That means, each time an Event is created, I need to create the
following rows in the CMSData table:
+------------+-----------------+-----------------------------+
| CMSData_id | CMSData_data_id | CMSData_CMSDataType_type_id |
+------------+-----------------+-----------------------------+
| x | y | 6 | (Event)
| x+1 | y+1 | 5 | (Date)
| x+2 | y+2 | 1 | (Title)
| x+3 | y+3 | 3 | (Description)
+------------+-----------------+-----------------------------+
But there is the problem: in MySQL, a table can have only one AUTO_INCREMENT column.
If I query for the highest value of CMSData_data_id and just add 1 to it, there
is a race condition: someone else may grab that value first.
How is this issue typically resolved - or avoided in the first place?
Thanks,
Eric
The id should be meaningless, except to be unique. Your design should work no matter if the block of 4 ids is contiguous or not.
Redesign your implementation to add the parts separately, not as a block of 4. Doing so should simplify things overall, and improve your scalability.
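A minimal sketch of adding the parts separately, using Python's sqlite3 module as a stand-in for MySQL. Each INSERT gets its own auto-generated id, read back via `cursor.lastrowid` (in MySQL the per-connection equivalent is `LAST_INSERT_ID()`), so no `MAX() + 1` guessing and no contiguous block is needed. For brevity it drops the separate `CMSData_data_id` and uses the auto-increment id itself as the data id; the `CMSEvents` table and the short column names are hypothetical simplifications of the schema in the question.

```python
import sqlite3

# Type ids from the question: 6 = Event, 5 = Date, 1 = Title (String), 3 = Description (Text).
EVENT, DATE, STRING, TEXT = 6, 5, 1, 3
VALUE_TABLES = {DATE: "CMSDates", STRING: "CMSStrings", TEXT: "CMSTexts"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CMSData (id INTEGER PRIMARY KEY AUTOINCREMENT, type_id INTEGER NOT NULL);
    CREATE TABLE CMSDates (data_id INTEGER PRIMARY KEY, value TEXT NOT NULL);
    CREATE TABLE CMSStrings (data_id INTEGER PRIMARY KEY, value TEXT NOT NULL);
    CREATE TABLE CMSTexts (data_id INTEGER PRIMARY KEY, value TEXT NOT NULL);
    CREATE TABLE CMSEvents (data_id INTEGER PRIMARY KEY, date_id INTEGER NOT NULL,
                            title_id INTEGER NOT NULL, description_id INTEGER NOT NULL);
""")

def new_item(conn, type_id):
    cur = conn.execute("INSERT INTO CMSData (type_id) VALUES (?)", (type_id,))
    return cur.lastrowid  # the id generated for this insert on this connection

def add_value(conn, type_id, value):
    data_id = new_item(conn, type_id)
    # The table name comes from the fixed VALUE_TABLES dict, never from user input.
    conn.execute(f"INSERT INTO {VALUE_TABLES[type_id]} (data_id, value) VALUES (?, ?)",
                 (data_id, value))
    return data_id

def add_event(conn, date, title, description):
    with conn:  # one transaction: either every row is written or none is
        date_id = add_value(conn, DATE, date)
        title_id = add_value(conn, STRING, title)
        description_id = add_value(conn, TEXT, description)
        event_id = new_item(conn, EVENT)
        conn.execute("INSERT INTO CMSEvents VALUES (?, ?, ?, ?)",
                     (event_id, date_id, title_id, description_id))
    return event_id

event_id = add_event(conn, "2014-09-01", "Launch party", "Details to follow.")
```

The event row simply stores whichever ids the database handed out, so it does not matter whether they are contiguous.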
What about locking the table before writing into it? This way, when you are inserting a row in the CMSData table, you can get the last id.
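A sketch of the locking idea, assuming you keep the original `CMSData_data_id` scheme. It uses SQLite's `BEGIN IMMEDIATE` (which takes the database write lock up front) as a stand-in for MySQL's `LOCK TABLES CMSData WRITE` ... `UNLOCK TABLES`, so the `MAX() + 1` read and the inserts cannot interleave with another writer. The `reserve_block` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE CMSData (CMSData_id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "CMSData_data_id INTEGER NOT NULL UNIQUE, "
             "CMSData_CMSDataType_type_id INTEGER NOT NULL)")

def reserve_block(conn, type_ids):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX()
    try:
        (start,) = conn.execute(
            "SELECT COALESCE(MAX(CMSData_data_id), 0) + 1 FROM CMSData").fetchone()
        conn.executemany(
            "INSERT INTO CMSData (CMSData_data_id, CMSData_CMSDataType_type_id) "
            "VALUES (?, ?)", [(start + i, t) for i, t in enumerate(type_ids)])
        conn.execute("COMMIT")  # releases the lock
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return list(range(start, start + len(type_ids)))

first = reserve_block(conn, [6, 5, 1, 3])   # Event, Date, Title, Description
second = reserve_block(conn, [6, 5, 1, 3])
```

The trade-off is that every writer waits for the lock, which is the scalability cost the other answer warns about.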
Another suggestion would be to not use an incremented id at all, but a uniquely generated one, such as a GUID.
See LOCK TABLES in the MySQL reference manual.
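A sketch of the generated-id alternative: Python's standard `uuid.uuid4()` produces random 128-bit ids on the application side, so the four ids for an event can be created before anything is written and no other writer can take them. MySQL also has a built-in `UUID()` function. The `new_event_ids` helper is hypothetical, and storing the ids as 36-character text is a simplification; a `BINARY(16)` column is more compact.

```python
import uuid

def new_event_ids():
    # One random UUID per part of the event; no database round-trip is needed.
    return {part: str(uuid.uuid4()) for part in ("event", "date", "title", "description")}

ids = new_event_ids()
print(ids)
```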

How to get the right "version" of a database entry?

Update: Question refined, I still need help!
I have the following table structure:
table reports:
ID | time | title | (extra columns)
1 | 1364762762 | xxx | ...
Multiple object tables that have the following structure
ID | objectID | time | title | (extra columns)
1 | 1 | 1222222222 | ... | ...
2 | 2 | 1333333333 | ... | ...
3 | 3 | 1444444444 | ... | ...
4 | 1 | 1555555555 | ... | ...
In the object tables, when an object is updated, a new version with the same objectID is inserted, so the old versions remain available. For example, see the entries with objectID = 1.
In the reports table, a report is inserted but never updated/edited.
What I want to be able to do is the following:
For each entry in my reports table, I want to be able to query the state of all objects, like they were, when the report was created.
For example, let's look at the sample report above with ID 1. At the time it was created (see the time column), the current version of objectID 1 was the entry with ID 1 (entry ID 4 did not exist at that point).
ObjectID 2 also existed, with its current version being entry ID 2.
I am not sure how to achieve this.
I could use a query that selects the object versions by the time column:
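SELECT o.*
FROM objects o
WHERE o.time = (
  -- newest version of this object that existed before the report
  SELECT MAX(o2.time)
  FROM objects o2
  WHERE o2.objectID = o.objectID
    AND o2.time < [reportTime]
)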
Let's not talk about the performance of this query; it is just to make clear what I want to do. My problem is comparing the time columns. I don't think this is a reliable way to make sure I get the right object versions, because the system time may change "for any reason", the time column would then contain wrong data, and that would lead to wrong results.
What would be another way to do so?
I thought about not using a time column for this, but instead a GLOBAL incremental value, so that I know the insertion order across the database tables.
If you are inserting new versions of the object and your problem is the time column (I assume you use it to decide which version is newer), I suggest using an auto-increment ID column for the versions. Even if the time value is not reliable, the ID will be, since it is always increasing: a higher ID means a newer version.
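A minimal sketch of how the ID ordering could answer "which versions existed when this report was created", using Python's sqlite3 in place of MySQL. It assumes a single `objects` table (the question has several; each would need its own recorded maximum) and adds a hypothetical `objects_max_id` column to `reports`, filled in when the report is inserted. It also assumes versions and reports are written one at a time: with concurrent transactions, an auto-increment value is assigned at insert time rather than commit time, so the ordering guarantee is weaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        objectID INTEGER NOT NULL,
        title TEXT NOT NULL
    );
    CREATE INDEX objects_by_object ON objects (objectID, ID);
    CREATE TABLE reports (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        objects_max_id INTEGER NOT NULL  -- newest version ID when the report was written
    );
""")

def add_version(conn, object_id, title):
    return conn.execute("INSERT INTO objects (objectID, title) VALUES (?, ?)",
                        (object_id, title)).lastrowid

def add_report(conn, title):
    (max_id,) = conn.execute("SELECT COALESCE(MAX(ID), 0) FROM objects").fetchone()
    return conn.execute("INSERT INTO reports (title, objects_max_id) VALUES (?, ?)",
                        (title, max_id)).lastrowid

def objects_as_of_report(conn, report_id):
    # For each object, the highest version ID that is not newer than the report.
    return conn.execute("""
        SELECT o.ID, o.objectID, o.title
        FROM objects o
        JOIN reports r ON r.ID = ?
        WHERE o.ID = (SELECT MAX(o2.ID) FROM objects o2
                      WHERE o2.objectID = o.objectID
                        AND o2.ID <= r.objects_max_id)
        ORDER BY o.objectID
    """, (report_id,)).fetchall()

add_version(conn, 1, "object 1, v1")
add_version(conn, 2, "object 2, v1")
report = add_report(conn, "report 1")
add_version(conn, 3, "object 3, v1")
add_version(conn, 1, "object 1, v2")
print(objects_as_of_report(conn, report))
```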

On a stats-system, should I save little bits of information about single visit on many tables or just one table?

I've been wondering about this for a while. The title states my question. Which do you prefer?
I made a pic to make my question clearer.
Why am I even thinking about this? Isn't one table the most obvious option? Well, kind of. It's the simplest way, but let's think more practically. When there is a ton of data in one table and the user only wants to see statistics about the browsers visitors use, this may not perform as well. Keeping browser data in a separate table is naturally better for that.
Multiple tables have disadvantages too. Writing data takes more time and resources. With one table, only one MySQL query is needed.
Anyway, I came up with a solution that I think makes sense. Data is written to a kind of temporary table, and all of those rows are exported to multiple tables later by a scheduled script. This way, logging doesn't add loading time to the user's page, but the data remains fast to browse.
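A minimal sketch of the staging-table idea, using Python's sqlite3 in place of MySQL. The table names and the choice to aggregate into per-browser and per-OS counters are assumptions for illustration; `export_staged` is meant to be called by a scheduled job (for example cron), not on page load. It only moves rows up to the highest id it saw when it started, so rows logged while the export runs are left for the next run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visit_staging (id INTEGER PRIMARY KEY, visited_at INTEGER NOT NULL,
                                browser TEXT NOT NULL, os TEXT NOT NULL);
    CREATE TABLE browser_stats (browser TEXT PRIMARY KEY, visits INTEGER NOT NULL);
    CREATE TABLE os_stats (os TEXT PRIMARY KEY, visits INTEGER NOT NULL);
""")

def log_visit(conn, visited_at, browser, os_name):
    # Cheap single-row insert during the page request.
    conn.execute("INSERT INTO visit_staging (visited_at, browser, os) VALUES (?, ?, ?)",
                 (visited_at, browser, os_name))

def export_staged(conn):
    with conn:  # one transaction: counters updated and staged rows removed together
        (last_id,) = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM visit_staging").fetchone()
        for table, column in (("browser_stats", "browser"), ("os_stats", "os")):
            counts = conn.execute(
                f"SELECT {column}, COUNT(*) FROM visit_staging "
                f"WHERE id <= ? GROUP BY {column}", (last_id,)).fetchall()
            for value, count in counts:
                conn.execute(f"INSERT OR IGNORE INTO {table} ({column}, visits) "
                             "VALUES (?, 0)", (value,))
                conn.execute(f"UPDATE {table} SET visits = visits + ? "
                             f"WHERE {column} = ?", (count, value))
        conn.execute("DELETE FROM visit_staging WHERE id <= ?", (last_id,))

log_visit(conn, 127003727, "Firefox", "Ubuntu")
log_visit(conn, 127391662, "Safari", "Windows")
log_visit(conn, 127912683, "Firefox", "Windows")
export_staged(conn)
```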
Let's bring some discussion here. I'm hoping to raise some opinions.
Which one is better? Let's find out!
The date, browser and OS are all related on a one-to-one basis... Without more information to require distinguishing records further, I'd be creating a single table rather than two.
Database design is based on creating tables that reflect entities, and I don't see two distinct entities in the example provided. Consider using views to serve data without duplicating the data in the database; a centralized copy of the data makes managing the data much easier...
What you're really thinking of is whether to denormalize the table or use the first normal form. When you're using 1NF you have a table that looks like this:
Table statistic
id | date | browser_id | os_id
---------------------------------------------
1 | 127003727 | 1 | 1
2 | 127391662 | 2 | 2
3 | 127912683 | 3 | 2
And then to explain what browser and os the client used, you need other tables:
Table browser
id | name | company | version
-----------------------------------------------
1 | Firefox | Mozilla | 3.6.8
2 | Safari | Apple | 4.0
3 | Firefox | Mozilla | 3.5.1
Table os
id | name | company | version
-----------------------------------------------
1 | Ubuntu | Canonical | 10.04
2 | Windows | Microsoft | 7
3 | Windows | Microsoft | 3.11
As OMG Ponies already pointed out, this isn't a good example to be creating several entities, so one can safely go with one table and then think about how he/she is going to deal with having to, say, find all the entries with a matching browser name.
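A sketch of the normalized layout from the tables above, with Python's sqlite3 standing in for MySQL, showing "find all the entries with a matching browser name" as a join. The data is copied from the sample tables; the index on `browser_id` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE browser (id INTEGER PRIMARY KEY, name TEXT, company TEXT, version TEXT);
    CREATE TABLE os (id INTEGER PRIMARY KEY, name TEXT, company TEXT, version TEXT);
    CREATE TABLE statistic (
        id INTEGER PRIMARY KEY,
        date INTEGER,
        browser_id INTEGER REFERENCES browser(id),
        os_id INTEGER REFERENCES os(id)
    );
    CREATE INDEX statistic_browser_id ON statistic (browser_id);
""")
conn.executemany("INSERT INTO browser VALUES (?, ?, ?, ?)", [
    (1, "Firefox", "Mozilla", "3.6.8"), (2, "Safari", "Apple", "4.0"),
    (3, "Firefox", "Mozilla", "3.5.1")])
conn.executemany("INSERT INTO os VALUES (?, ?, ?, ?)", [
    (1, "Ubuntu", "Canonical", "10.04"), (2, "Windows", "Microsoft", "7"),
    (3, "Windows", "Microsoft", "3.11")])
conn.executemany("INSERT INTO statistic VALUES (?, ?, ?, ?)", [
    (1, 127003727, 1, 1), (2, 127391662, 2, 2), (3, 127912683, 3, 2)])

def visits_by_browser_name(conn, name):
    # Every version of the named browser matches, e.g. Firefox 3.6.8 and 3.5.1.
    rows = conn.execute("""
        SELECT s.id
        FROM statistic s
        JOIN browser b ON b.id = s.browser_id
        WHERE b.name = ?
        ORDER BY s.id
    """, (name,))
    return [row_id for (row_id,) in rows]

print(visits_by_browser_name(conn, "Firefox"))  # [1, 3]
```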