Set child-grandparent relationship - MySQL

I've built a web app (PHP/MySQL) for people to predict soccer games.
For each entry in the leagues table, there are many entries in the matches table.
For each entry in the matches table, there are many entries in the predictions table.
Should I explicitly set a relationship from the predictions table to the leagues table? In other words, should I add a league_id column to the predictions table?
PRO
Easier queries, fewer tables to read in some cases. Example query to look up someone's predictions from a certain league, with the relationship:
SELECT * FROM predictions p
WHERE p.league_id=:league_id AND p.user_id=:user_id
Without the relationship:
SELECT * FROM predictions p
INNER JOIN matches m ON m.match_id=p.match_id AND m.league_id=:league_id
WHERE p.user_id=:user_id
CON
It's data that's already there, so it's duplicate data (makes the database bigger).

Correct normalization of the database requires that duplicates be avoided; above all, avoid storing redundant data just to save a join, because redundant copies can drift out of sync (a prediction's league_id could end up disagreeing with the league of its match). You should therefore, without doubt, prefer the second query you proposed and the schema that goes with it:
SELECT * FROM predictions p
INNER JOIN matches m ON m.match_id=p.match_id AND m.league_id=:league_id
WHERE p.user_id=:user_id
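For reference, a minimal sketch of the normalized schema this implies; the key columns come from the question, everything else is assumed for illustration:
CREATE TABLE leagues (
  league_id INT NOT NULL PRIMARY KEY
  -- name, season, etc.
);
CREATE TABLE matches (
  match_id INT NOT NULL PRIMARY KEY,
  league_id INT NOT NULL,
  FOREIGN KEY (league_id) REFERENCES leagues (league_id)
);
CREATE TABLE predictions (
  prediction_id INT NOT NULL PRIMARY KEY,
  match_id INT NOT NULL,
  user_id INT NOT NULL,
  FOREIGN KEY (match_id) REFERENCES matches (match_id)
);
With indexes on predictions (user_id, match_id) and matches (league_id, match_id), the join version of the query should stay cheap even without a redundant league_id column on predictions.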

Get stats table from a many to many relationship

I have a pivot table for a Many to Many relationship between users and collected_guitars. As you can see a "collected_guitar" is an item that references some data in foreign tables (guitar_models, finish).
My users also have some foreign data in foreign tables (hand_types and genders).
I want to get a derived table that lists data when I look for a particular model_id in "collected_guitar_user".
Let's say "Fender Stratocaster" is model id = 200, where the make is Fender (id = 1 of makes table).
The same guitar could come in a variety of finishes, hence the use of another table, collected_guitars.
One user could have this item in his collection.
Now what I want to find, by looking at model_id (in this case 200) in the pivot table "collected_guitar_user", is the number of Fender Stratocasters that are collected by users who share the same genders.sex and hand_types.type as the logged-in user, and to see how the finishes divide up (some percentage of finish A, finish B, etc.).
So a user who is interested in what others are buying could see some statistics for the model.
What query can derive this kind of table?
You can do aggregate counts by using the GROUP BY syntax, and CROSS JOIN to compute a percentage of the total:
SELECT make.make, models.model_name as model, finish.finish,
COUNT(1) AS number_of_users,
(COUNT(1) / u.total * 100) AS percent_owned
FROM owned_guitar, owned_guitar_users, users, models, make, finish
CROSS JOIN (SELECT COUNT(1) AS total FROM users) u
WHERE users.id = owned_guitar_users.user_id
AND owned_guitar_users.owned_guitar_id = owned_guitar.id
AND owned_guitar.model_id = models.id
AND owned_guitar.make_id = make.id
AND owned_guitar.finish_id = finish.id
GROUP BY owned_guitar.id
Please note, though, that in cases where a user owns more than one guitar, the percentages will no longer necessarily sum to 100% (for example, Jack and John could both own all five guitars, so each of them owns "100%" of the guitars).
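One more thing: the query above doesn't yet apply the filters from the question (model id 200, and only users sharing the logged-in user's genders.sex and hand_types.type). As a sketch, you could append conditions along these lines to the WHERE clause; the exact column names on users are assumptions, since the full schema isn't shown:
AND owned_guitar.model_id = 200
AND users.gender_id = :gender_id -- the logged-in user's gender
AND users.hand_type_id = :hand_type_id -- the logged-in user's hand type
For the percentages to stay meaningful, the CROSS JOIN subquery that computes the total should be restricted by the same user conditions.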
I'm also a little confused by your database design. Why do you have a finish_id and make_id associated directly in the owned_guitar table as well as in the models table?

To create MySQL tables for each specific user, or generalize the tables?

I'm running into all kinds of thought problems while planning my database:
Outline:
The database is a patient database with a large number of patients.
Each patient has tons of data, e.g. blood pressure values on different dates.
Questions:
Would it be easier to create tables per patient, e.g.
"bob_builder_BPvalues", or to create one table for the BP values, e.g. "BP_values", and then have all the patients' values in there, linked via foreign keys?
As I have so much data per patient, it does not seem to make sense to mix the blood pressure values of all patients into one single table, as this would look very messy to a human. Which approach would be faster in terms of processing and sorting through the data?
Let's say you have 10 patients:
With your first approach, you'd end up with 10 different tables always containing the same type of data.
For each query on a single patient, you would have to build a dynamic query joining to the right table:
SELECT ...
FROM patients
INNER JOIN bobby_measures ON ... -- this has to be crafted dynamically each time
WHERE patients.name = 'bobby'
And what if you want to compute some stats on some kind of data for a range of dates across all patients? Querying this becomes a nightmare, even with 10 patients. So guess what happens when you have 1000...
On the other hand, your second choice makes (arguably) human reading of the database more difficult. But being read by a human is not one of the objectives of databases.
With a single patientData table (or as many tables as you want, one per datatype if needed, bloodPressure and such), everything becomes simpler. You can query any patient using the same query, changing only the patient id; you can run all the queries you want for a range of dates, filtering on some datatype, or whatever.
SELECT ...
FROM patients
INNER JOIN patientData ON ...
WHERE patients.name in ('bobby', 'joe'...)
AND patientData.type = 'blood pressure'
AND patientData.date BETWEEN ... AND ...
-- and so on
Using the right indices on the patientData table(s) and an appropriate presentation layer, all this data becomes totally readable by an average user.
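As a sketch of what "the right indices" could look like here (the patientData columns are assumed from the example query):
CREATE INDEX idx_patient_data ON patientData (patient_id, type, date);
With this, MySQL can jump straight to one patient's rows of one datatype within a date range instead of scanning the whole table.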
Have a single table for all patients. This can then link to a BloodPressureResults table using a foreign key. The relationship between them would be:
Patient 1----* BloodPressureResults
So a single patient can have many blood pressure results.
You would then be able to view the blood pressure results for a specific patient by using a simple query...
SELECT * FROM BloodPressureResults
WHERE Patient_Id = '1'
This would then return you all of the blood pressure results for the patient with an Id of 1.
You would then also be able to add other tables, like WeightResults or BloodTestResults, in the same way as the BloodPressureResults table.
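A minimal sketch of that structure in SQL; the table names come from the answer, the measurement columns are assumed for illustration:
CREATE TABLE Patients (
  Id INT NOT NULL PRIMARY KEY
  -- name, date of birth, etc.
);
CREATE TABLE BloodPressureResults (
  Id INT NOT NULL PRIMARY KEY,
  Patient_Id INT NOT NULL,
  Systolic INT, -- assumed measurement columns
  Diastolic INT,
  MeasuredOn DATE,
  FOREIGN KEY (Patient_Id) REFERENCES Patients (Id)
);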

Best structure for tables with more than 10000 columns

I am applying a group of data mining algorithms to a dataset comprised of a set of customers along with a large number of descriptive attributes that summarize various aspects of their past behavior. There are more than 10,000 attributes, each stored as a column in a table with the customer id as the primary key. For several reasons, it is necessary to pre-compute these attributes rather than calculating them on the fly. I generally try to select customers with a specified attribute set. The algorithms can combine any arbitrary number of these attributes together in a single SELECT statement and join the required tables. All the tables have the same number of rows (one per customer).
I am wondering what's the best way to structure these tables of attributes. Is it better to group the attributes into tables of 20-30 columns, requiring more joins on average but fewer columns per SELECT, or have tables with the maximum number of columns to minimize the number of joins, but having potentially all 10K columns joined at once?
I also thought of using one giant 3-column customerID-attribute-value table and storing all the info there, but it would be harder to structure the "select all customers with these attributes"-type query that I need.
I'm using MySQL 5.0+, but I assume this is a general SQL-ish question.
In my experience, using tables with 10,000 columns is a very, very bad idea. What if this number increases in the future?
If there are a lot of attributes, you shouldn't use horizontally scaled tables (with a large number of columns). You should create a new attributes table and place all attribute values in it, then connect this table with a many-to-one relationship to the main entity table.
A second option is to use a NoSQL system (like MongoDB).
As @odiszapc said, you have to use a meta-model structure, for instance:
CREATE TABLE customer(ID INT NOT NULL PRIMARY KEY, NAME VARCHAR(64));
CREATE TABLE customer_attribute(ID INT NOT NULL PRIMARY KEY, ID_CUSTOMER INT NOT NULL, NAME VARCHAR(64), VALUE VARCHAR(1024));
Return basic information about a given customer:
SELECT * FROM customer WHERE name='John';
Return customer(s) matching certain attributes:
SELECT c.*
FROM customer c
INNER JOIN customer_attribute a1 ON a1.id_customer = c.id
AND a1.name = 'address'
AND a1.value = '1078, c/ los gatos madrileños'
INNER JOIN customer_attribute a2 ON a2.id_customer = c.id
AND a2.name = 'age'
AND a2.value = '27'
Your generator should generate the inner joins on the fly.
Proper indexes on the tables should allow all of this to run relatively fast (if we assume 10k attributes per customer and 10k customers, that's actually quite a challenge...).
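For instance, a plausible pair of indexes for the meta-model above (a sketch; VALUE is too wide to index in full, so a prefix is used):
CREATE INDEX idx_attr_customer ON customer_attribute (ID_CUSTOMER, NAME);
CREATE INDEX idx_attr_lookup ON customer_attribute (NAME, VALUE(255), ID_CUSTOMER);
The second index lets each generated INNER JOIN resolve its name/value pair from the index before touching the customer table.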
10,000 columns is a lot. The SELECT statement will be very long and messy unless you use *. I think you can narrow the attributes down to the most useful and meaningful ones and eliminate the others.

MySQL data fetching

I'm having trouble understanding how to fetch data from MySQL. The answer might be simple, but I'm stuck and can't find any help on the internet...
I have three tables; let's say the first table is named series, the second is teams and the third is games.
Table structure is something like this:
series:
id
name
teams:
id
name
games:
series_id (relates to series.id)
hometeam_id (relates to teams.id)
visitorteam_id (relates to teams.id)
So my problem is fetching rows from games, where those ids need to be their names, not ids...
Result should be something like this:
"championship
wolverines
marines"
not like
"1
45
142"
Currently I'm fetching the series and teams tables into hashrefs (in Perl) and looking the ids up there. But there must be a more efficient way to do that in one SQL query instead of three.
You need to join the teams table twice (once for the home team, a second time for the visitors):
SELECT g.*, s.name AS series_name, h.name AS hometeam_name, v.name AS visitorteam_name FROM games g
INNER JOIN series s ON g.series_id=s.id
INNER JOIN teams h ON g.hometeam_id=h.id
INNER JOIN teams v ON g.visitorteam_id=v.id
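If you only want the games of a single series, you can filter on the joined series table in the same query, e.g. using the sample name from the question:
WHERE s.name='championship'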

Is there something more efficient than joining tables in MySQL?

I have a table with entity-attribute-value structure. As an example, as entities I can have different countries. I can have the following attributes: "located in", "has border with", "capital".
Then I want to find all those countries which are "located in Asia" and "has border with Russia". The straightforward way to do that is to join the table with itself, using the entity as the join column, and then to filter with WHERE.
However, if I have 20 rows where Russia is in the entity column, then in the joined table I will have 20*20=400 rows with Russia as the entity. And it is so for every country. So the joined table is going to be huge.
Would it not be more efficient to use the original table to extract all countries which are located in Asia, then to extract all countries which have a border with Russia, and then to take those elements which are in both sets of countries?
You shouldn't end up with a huge number of records, so this should work:
SELECT a.entity,
a.located_in,
a.border
FROM my_table a
WHERE a.border in (SELECT b.entity FROM my_table b WHERE b.entity = 'RUSSIA' )
AND a.located_in = 'ASIA'
You are confusing a join with a Cartesian product. There can never be more rows in the join than there are in the actual data; the only thing being altered is which elements/rows are taken.
So if you have 20 "Russia" rows, the table resulting from the join can never have more than 20 "Russia" entries.
The operation you suggest using is exactly what a join does. Just make sure you have the appropriate indices and let MySQL do the rest.
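Concretely, for an entity-attribute-value table that join could look like this sketch (the column names entity, attribute and value are assumed, since the exact layout isn't shown):
SELECT a.entity
FROM my_table a
INNER JOIN my_table b ON b.entity=a.entity
WHERE a.attribute='located in' AND a.value='Asia'
AND b.attribute='has border with' AND b.value='Russia'
With an index on (attribute, value, entity), MySQL can find the "located in Asia" rows and the "has border with Russia" rows directly from the index and match them up, which is exactly the set intersection you describe, done for you by the join.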