How to structure a MySQL table with lots of data? - mysql

I have statistics for every user in our system. These statistics need to be updated frequently.
The statistics look something like this:
Campaign : Performance
Campaign : Performance
Each campaign has its own data stored in another table in MySQL. The performance metric needs to be updated frequently for each user and each campaign. There are over 1000 campaigns that need to be updated for over 5000 users.
At first I considered simply making a huge VARCHAR column which just stored the data as "campaignId-Performance,campaignId2-Performance,..."
When querying the DB, these campaigns will first be filtered by their COUNTRY and then ordered in descending order of performance. The COUNTRY data is stored in the campaigns table. In my new table I was considering making a column for every COUNTRY and splitting the campaigns, with their respective performances, up like that. I'm unsure whether that would be more efficient or not.
With this poor description I'm hoping someone can suggest an optimal structure. Thanks!

I would think you would want to store this performance in a table with 3 columns: campaignID, userID, performance. This would give you the most flexibility to query the information however you might need it. You could easily aggregate performance by campaign or by user, and with a quick join to the campaign table you could aggregate by country as well.
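As a minimal sketch of that design, assuming the campaigns table has campaignID and country columns (the types and the sample userID are illustrative):

CREATE TABLE campaign_performance (
  campaignID  INT NOT NULL,
  userID      INT NOT NULL,
  performance DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (campaignID, userID),   -- one row per campaign/user pair
  KEY idx_user (userID)               -- supports per-user lookups
);

-- Top-performing campaigns for one user in one country:
SELECT cp.campaignID, cp.performance
FROM campaign_performance cp
JOIN campaigns c ON c.campaignID = cp.campaignID
WHERE cp.userID = 42
  AND c.country = 'US'
ORDER BY cp.performance DESC;

The frequent updates then become single-row UPDATEs (or INSERT ... ON DUPLICATE KEY UPDATE) keyed by the primary key.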

So, you're building a MySQL database that contains 1000 campaigns and 5000 users? Do the campaigns contain the performances for each country? If so, I would suggest making a column named Country followed by a Performance column. Also, it would be helpful to know what you're trying to view the information in.

Related

Best way to implement many-to-many relationship in MySQL

I have 2 tables, users (~1000 users) and country (~50 countries). A user can support many countries, so I am planning to create a mapping table, user_country. However, since I have 1000 users and 50 countries, I will have a maximum of 50000 entries in this table. Is this the best way to implement this, or is there a more appropriate method?
If this is the best way, how can I add a user supporting many countries to this table using only one SQL statement?
For example:
INSERT INTO user_country(userid, countrycode)
VALUES ('user001','US'),
('user001','PH'),
('user001','KR'),
('user001','JP');
The above SQL statement will be too long if a user supports all 50 countries, and I have to do it for 1000 users. Does anyone have ideas on the most efficient way to implement this?
From the point of view of database design, a table like your user_country is the only sensible way to go. 50000 records are a breeze for MySQL, and having them together with the appropriate indexes will open up all sorts of future uses for those data.
As far as I can see, this is unrelated to the problem of many large SQL insert statements. No matter how you represent the data in the database, you will have to write statements containing, for each user, a list of countries.
This is a one-time action, right? So it doesn't need to be a masterpiece of software engineering. What I sometimes do is load the raw data into Excel, line by line, then write a formula that "calculates" the appropriate SQL statement for the first line, and copy this formula down for all lines. Then throw all these statements at the database. Even if there are tens of thousands of them, it's not much effort.
Personally I'd do the insert based on a select:
INSERT INTO user_country SELECT 'user001', countryid FROM countries WHERE countryid IN ('US', 'PH', 'KR', 'JP');
You'll need to adapt it to your column names.
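If several users happen to support the same set of countries, the same pattern extends to loading them all at once; a sketch, again with table and column names you'd need to adapt:

INSERT INTO user_country (userid, countrycode)
SELECT u.userid, c.countrycode
FROM users u
CROSS JOIN countries c                          -- every (user, country) pair
WHERE u.userid IN ('user001', 'user002')
  AND c.countrycode IN ('US', 'PH', 'KR', 'JP');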
The alternative of storing the list of countries in a single column, usercountries VARCHAR(255), as US,FR,KR and so on, would be possible as well - but you'd lose the ability to select users based on the country they support. Strictly speaking you don't lose it - but
SELECT * FROM users WHERE usercountries LIKE '%KR%';
is not a good query in terms of index usage. But as you only have 1000 users, a table scan will be mighty quick as well.

Modelling ownership in MySQL

I have a table Things and I want to add ownership relations to a table Users. I need to be able to quickly query the owners of a thing and the things a user owns. If I know that there will be at most 50 owners, and the probability distribution for the number of owners will probably look like this, should I rather:
1. add 50 columns to the Things table, like CoOwner1Id, CoOwner2Id, …, CoOwner50Id, or
2. model this with an Ownerships table which has UserId and ThingId columns, or
3. create a table for each thing, for example Thing8321Owners, with a row for each owner, or
4. use some combination of these?
The second choice is the correct one: you should create an intermediate table between the table Things and the table Owners (which contains the details of each owner).
This table should have thing_id and owner_id together as the primary key.
So finally, you will have 3 tables:
Things (the things' details and data)
Owners (the owners' details and data)
Ownerships (the assignment of each thing_id to an owner_id)
This is because, in a relational DB, you should not have any redundant data.
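As a sketch, the Ownerships table might look like this (types are assumptions, and the foreign keys presume thing_id and owner_id are the primary keys of Things and Owners):

CREATE TABLE Ownerships (
  thing_id INT NOT NULL,
  owner_id INT NOT NULL,
  PRIMARY KEY (thing_id, owner_id),    -- fast "owners of a thing" lookups
  KEY idx_owner (owner_id, thing_id),  -- fast "things a user owns" lookups
  FOREIGN KEY (thing_id) REFERENCES Things(thing_id),
  FOREIGN KEY (owner_id) REFERENCES Owners(owner_id)
);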
You should definitely go with option 2, because what you are trying to model is a many-to-many relationship. (Many owners can relate to a thing; many things can relate to an owner.) This is commonly accomplished using what I call a bridging table (which is exactly what option 2 is). It is a standard technique in a normalized database.
The other two options are going to give you nightmares trying to query or maintain.
With option 1 you'll need to join the User table to the Thing table on 50 columns to get all of your results. And what happens when you have a really popular thing that 51 people want to own?
Option 3 is even worse. The only way to easily query the data is to use dynamic SQL or write a new query each time, because you don't know which Thing*Owners table to join on until you know the ID value of the thing you're looking for. Or you're going to need to join the User table to every single Thing*Owners table. Adding a new thing means creating a whole new table. But at least a thing wouldn't have a limit on the number of owners it could have.
Now isn't this:
SELECT Users.Name AS UserName, Things.Name AS ThingName
FROM Users
INNER JOIN Ownership ON Users.UserId=Ownership.UserId
INNER JOIN Things ON Things.ThingId=Ownership.ThingId
much easier than any of those other scenarios?

SQL to update column in modified table

I am a reasonably competent SQL programmer but my skills are still pretty much in the domain of simple INSERT, SELECT, UPDATE statements with an occasional LIKE etc thrown in. What I am currently trying to do is rather more complex. Here is the scenario.
I have three tables.
Table 1, *users*, identifies users via a user ID, uid. Users can have one or more subaccounts.
Table 2, *accounts*, keeps a record of the subaccounts for each user with, amongst other things, the columns uid and sid, where uid is the one defined in the *users* table.
Table 3, *data*, currently stores some data in a data column that is associated with a particular subaccount, sid.
The thing I have just realized is that there is no particular reason to block users from using those data across subaccounts. No problem - I can change my data-subset search SQL to work with the uid instead. However, given the frequency of such searches, it seems well worthwhile to simply stick a uid column in *data*.
To do that I would need to write some smart SQL that would get uid,sid pairs from the *accounts* table and use that information to update the newly created uid column in the data table. This I have to admit is beyond my knowledge of SQL.
I should mention that the system using these data is now in production and has several hundred users, so the option of just acting like they are not there is not available. Not terribly relevant, I think, but I should also mention that uid and sid are alphanumeric strings, with both columns being indexed.
I would be most grateful to anyone here who might be able to help out with it.
MySQL can do updates based on joins, and based on a reading of your schema, here's what I'd do...
UPDATE data d
JOIN accounts a ON a.sid = d.sid   -- pair each data row with its subaccount
SET d.uid = a.uid
WHERE d.uid IS NULL;               -- only touch rows not yet backfilled
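Before running that update, the uid column has to be added to *data* (and, given the search frequency mentioned, indexed). A sketch, assuming a short string type that matches accounts.uid:

ALTER TABLE data
  ADD COLUMN uid VARCHAR(32) NULL,   -- match the actual type of accounts.uid
  ADD INDEX idx_data_uid (uid);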

Best way to do a query with a large number of possible joins

On the project I'm working on we have an activity table and each activity can be linked to one of about 20 different "activity details" tables...
e.g. If the activity was of type "work", then it would have a corresponding activity_details_work record, if it was of type "sick leave" then it would have a corresponding activity_details_sickleave record and so on.
Currently we are loading the activities and then for each activity we have a separate query to go fetch the activity details from the relevant table. This obviously doesn't scale well if you have thousands of activities.
So my initial thought was to have a single query which fetches the activities and joins the details in one go e.g.
SELECT * FROM activity
LEFT JOIN activity_details_1_work ON ...
LEFT JOIN activity_details_2_sickleave ON ...
LEFT JOIN activity_details_3_travelwork ON ...
...etc...
LEFT JOIN activity_details_20_yearleave ON ...
But this will result in each record having hundreds of fields, most of which are empty, and that feels nasty.
Lazy-loading the details isn't really an option either as the details are almost always requested in the core logic, at least for the main types anyway.
Is there a super clever way of doing this that I'm not thinking of?
Thanks in advance
My suggestion is to define a view for each ActivityType, that is tailored specifically to that activity.
Then add an index on the Activity table led by the ActivityType field. Cluster that index unless there is an overwhelming need for some other index to be clustered (or performance benchmarking shows some other clustering choice to be more performant).
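As a sketch, one such per-type view might look like this (the detail and join columns are hypothetical):

CREATE VIEW v_activity_work AS
SELECT a.*, d.hours_worked, d.project_id     -- hypothetical detail columns
FROM activity a
JOIN activity_details_1_work d ON d.activity_id = a.activity_id;

Application code then queries the view matching the activity type, so each query touches only the one details table it needs.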
Is there a particular reason why this degree of denormalization was designed in? Is that reason well known?
Chances are your activity tables are like (date_from, date_to, with_who, descr) or something to that effect. As Pieter suggested, consider tossing a type varchar or enum field in there, so as to deal with a single details table.
If there are rational reasons to keep the tables apart, consider adding triggers that maintain boolean/tinyint fields (has_work, has_sickleave, etc), or a bit string (has_activites_of_type where the first position amounts to has_work, the next to has_sickleave, etc.).
Either way, you'll probably be better off by fetching the activity's details in one or more separate queries -- if only to avoid field name collisions.
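As a sketch, one such flag-maintaining trigger might look like this (has_work on activity and activity_id on the details table are hypothetical names):

CREATE TRIGGER activity_details_work_ai
AFTER INSERT ON activity_details_1_work    -- repeat per details table
FOR EACH ROW
  UPDATE activity
  SET has_work = 1                          -- flag that work details exist
  WHERE activity_id = NEW.activity_id;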
I don't think enum is the way to go, because, as you say, there might be thousands of activities, and then altering your activity table every time a new type is added would become an issue.
There is no point doing a left join on a large number of tables either.
So the options that you have are:
See this; the first comment might be useful.
I am guessing that your activity table has a field called activity_type_id.
Build a table called activity_types containing the fields activity_type_id, activity_name, and activity_details_table_name. First, query in the following way:
SELECT a.*, t.activity_details_table_name
FROM activity a
INNER JOIN activity_types t USING (activity_type_id);
This query gives you the table name on which to query for the details.
This way you can add any new activity type just by adding a row in the activity_types table.
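A sketch of that lookup table (field names from the answer; types are assumptions):

CREATE TABLE activity_types (
  activity_type_id            INT PRIMARY KEY,
  activity_name               VARCHAR(50) NOT NULL,
  activity_details_table_name VARCHAR(64) NOT NULL  -- 64 is MySQL's table-name limit
);

The follow-up query against the named details table then has to be built dynamically in application code, since plain SQL cannot parameterize a table name.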

Fast query for too many tables in a database

For very, very large tables, indexing may help a lot. But what is the solution for too many small tables in a database?
What if I have a large DB that has too many tables in it? How can I make queries fast, the way indexes make queries on a single table fast?
Let's talk through a real example.
On stackoverflow.com, say there is a table called "questions" having id, date, and votes, and then a separate table exists for each id in the questions table (named after the numeric id, e.g. "q-45588"). Now it's easy to index the "questions" table, but what about the many child tables, one per question id (each of which may contain ids, answer 1, answer 2, answer 3, comment 1, comment 2, ... votes, down votes, dates, flags, and so many other things)?
This is what happens in typical accounting software, i.e. a debtors account table holding the ids of all debtors, with a separate table for each of those ids (holding further details of that debtor).
Or is it a design problem?
*Update* -----------------
Some people might say to do it all in 3 or 4 tables (which may then have trillions of rows), e.g. a questions table, answers table, comments table, and users table.
Here's an example of a modified Stack Overflow schema:
Category of thread: ----info----
Question
Discussion

Category of thread response: ----info----
A: answer
C: comment

Threads: ----a table----
id (key)
thread id number (long data type)
status (active, normal, closed (visible but not editable), deleted, flagged, etc.)
type (question / discussion)
votes up
votes down
count of views
tag 1
tag 2
tag 3
subject
body
maker id
datetime stamp of creation
datetime stamp of last activity
answer count
comment count

Thread: ----a table---- (table name is the thread id, a long data type, from the Threads table)
id (key)
response text
response type (A: answer / C: comment)
vote up
vote down
abuse count
Typically, indexes are meant to make searching faster by providing an ordered structure to search within. In a very small table, since searching should be fast to begin with, indexing might not make much sense. Your best bet would be to try with and without indexes, and measure accordingly.
That being said, if your small tables have the exact same structure, it might make more sense (from an RDBMS point of view anyway) to merge them into a single entity.
What you have there is a design problem. Having multiple tables with the same columns should set off alarm bells immediately -- having multiple tables with the same unique key should as well.
In the example you give you should have a single child table.
Now, in some cases you might have a table with one or more distinct values that represent a large proportion of the table rows. For example, let's say that you have sales for 50 customers, but one of them is responsible for 40% of the total sales records, with the rest distributed evenly between the other customers. Accessing the smaller customers' data through an index on customer_id makes sense, but it does not for the large customer. In that case you might look at partitioning the table to place the large customer's records in one child table and the other records in another, both being related to a master table (see http://www.postgresql.org/docs/9.2/static/ddl-partitioning.html).
However, in general and for your initial design, you should use a single non-partitioned table for these child records.
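As a sketch, that single child table for the Stack Overflow-style example would key every response to its parent question (names and types are illustrative):

CREATE TABLE responses (
  response_id   BIGINT PRIMARY KEY,
  question_id   BIGINT NOT NULL,              -- replaces one-table-per-question
  response_type ENUM('answer', 'comment') NOT NULL,
  response_text TEXT,
  votes_up      INT DEFAULT 0,
  votes_down    INT DEFAULT 0,
  KEY idx_question (question_id)              -- one index instead of many tables
);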
Maybe this document can help you: http://dev.mysql.com/doc/refman/5.0/en/table-cache.html
Actually, MySQL and other RDBMSs are focused on handling big tables, not many tables. If you want to handle an extremely large number of tables, you should consider NoSQL solutions.