dumping data from multiple tables with common fields - mysql

Hey All.
I have a bunch of tables that have some common fields tying them together, but I can't figure out the right way to dump them in a meaningful way.
Basically, users will be given two tests, and each test may be taken multiple times.
The main table stores information about the user and the test, similar to the below (we'll call this table MAIN):
user_id  test  iteration  completion_time
1        1     1          1:00
1        2     1          1:30
1        1     2          0:49
1        2     2          1:30
Each test page then has its own table to store the answers provided, since some pages have a ton of questions. We'll call this one sample table RESULTS, but there are many tables like this that are basically the same.
user_id  test  iteration  q1  q2  q3
1        1     1          A   B   A
1        2     1          B   B   A
1        1     2          A   B   B
1        2     2          A   B   B
These results tables (again, there are many) basically store the results, plus just enough information to accurately tie the results together across all tables. I set it up this way because to use just one table for the results would have left me with a table with several hundred columns, which I had read was not recommended.
So the problem here is that I can't figure out how best to tie these tables together and get the results out. I've read up on joins and unions and neither one seems right, as far as I can tell, because I need to pull data from ~10-15 tables at once.
I can do a huge, complex select, something along the lines of:
select m.*, a.*, b.*, c.* from main m, results_a a, results_b b, results_c c where a.user_id=m.user_id and b.user_id=m.user_id and c.user_id=m.user_id
and that works, but there's got to be a better way. Keep in mind that I've only given 3 results tables in this example; in my actual application, it's going to be more like 15-20 tables of results.
Beyond being really complicated, it returns duplicates of some rows, and if I want to throw in any extra logic (let's say, for example, that I only want the same data I queried before, but only for test 2) it gets even more complicated. And let's not even talk about sorting.
From what I've read, JOIN is for 2 tables, and UNION combines rows of results, not columns.
I don't claim to be a MySQL expert, but I have looked into this before posting. I feel like I must be close to the right answer, but just not quite hitting on it.
Can anyone give me some guidance?

To use inner joins try:
SELECT m.*, a.*, b.*, c.*
FROM main m
INNER JOIN results_a a ON a.user_id = m.user_id
INNER JOIN results_b b ON b.user_id = m.user_id
INNER JOIN results_c c ON c.user_id = m.user_id
WHERE m.user_id = x
To differentiate column names, explicitly name the columns and assign aliases:
SELECT m.*, a.q1 as A_Q1, a.q2 as A_Q2, b.q1 as B_Q1, b.q2 as B_Q2 ...
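The duplicate rows mentioned in the question come from joining the results tables on user_id alone: every MAIN row then matches every results row for that user, across all tests and iterations. A minimal sketch, using an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, loaded with the sample rows from the question (results_b is a hypothetical second results table that simply reuses the same answers), counts the rows each join style produces:

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the join semantics shown are the same.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE main (user_id INT, test INT, iteration INT, completion_time TEXT);
CREATE TABLE results_a (user_id INT, test INT, iteration INT, q1 TEXT, q2 TEXT, q3 TEXT);
CREATE TABLE results_b (user_id INT, test INT, iteration INT, q1 TEXT, q2 TEXT, q3 TEXT);
""")

# Sample rows from the question.
cur.executemany("INSERT INTO main VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "1:00"), (1, 2, 1, "1:30"), (1, 1, 2, "0:49"), (1, 2, 2, "1:30"),
])
answers = [
    (1, 1, 1, "A", "B", "A"), (1, 2, 1, "B", "B", "A"),
    (1, 1, 2, "A", "B", "B"), (1, 2, 2, "A", "B", "B"),
]
# Hypothetical: results_b simply reuses the same answers as results_a.
cur.executemany("INSERT INTO results_a VALUES (?, ?, ?, ?, ?, ?)", answers)
cur.executemany("INSERT INTO results_b VALUES (?, ?, ?, ?, ?, ?)", answers)

# Join on user_id only: every main row matches every results row of that user.
user_only = cur.execute("""
SELECT COUNT(*) FROM main m
JOIN results_a a ON a.user_id = m.user_id
JOIN results_b b ON b.user_id = m.user_id
""").fetchone()[0]

# Join on the full (user_id, test, iteration) key: one row per attempt.
full_key = cur.execute("""
SELECT COUNT(*) FROM main m
JOIN results_a a ON a.user_id = m.user_id AND a.test = m.test
                AND a.iteration = m.iteration
JOIN results_b b ON b.user_id = m.user_id AND b.test = m.test
                AND b.iteration = m.iteration
""").fetchone()[0]

print(user_only, full_key)  # 64 4
```

With only two results tables the user_id-only join already yields 4 × 4 × 4 = 64 rows for 4 attempts; each additional table multiplies that again, so the full key belongs in every ON clause.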

I think you've designed your database in a way that you're going to have to do joins. It sounds better than the alternative (large, non-normalized table). What are you trying to get out? View all tests and determine a user's best score?
So if I understand the tables correctly, you have multiple users (obviously). Each user can take multiple tests, and each test can be taken multiple times.
select m.*, a.*, b.*, c.* from main m
join results_a a
on a.user_id = m.user_id and a.test_id = m.test_id and a.iteration_id = m.iteration_id
join results_b b
on b.user_id = m.user_id and b.test_id = m.test_id and b.iteration_id = m.iteration_id
join results_c c
on c.user_id = m.user_id and c.test_id = m.test_id and c.iteration_id = m.iteration_id
That should give you one row per iteration, per test, per user. If you want a certain test for a user, add:
where m.user_id = @user_id and m.test_id = @test_id
Then you can look at all of that user's iterations of the test.
If you want the latest iteration (MySQL uses LIMIT rather than SQL Server's TOP):
select ...
where m.user_id = @user_id and m.test_id = @test_id
order by m.iteration_id desc
limit 1
You might consider creating a view for each test when its questions are set up, so you don't have to generate the select statement for a particular test at run time.
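A minimal sketch of the view idea, again with in-memory SQLite (via Python's sqlite3) standing in for MySQL: the view hides the full-key join, so callers only filter, sort and take the latest iteration with ORDER BY ... DESC LIMIT 1. The view name test_results and the a_q1/a_q2 aliases are hypothetical; the column names follow this answer (test_id, iteration_id) rather than the question's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE main (user_id INT, test_id INT, iteration_id INT, completion_time TEXT);
CREATE TABLE results_a (user_id INT, test_id INT, iteration_id INT, q1 TEXT, q2 TEXT);
INSERT INTO main VALUES (1, 1, 1, '1:00'), (1, 2, 1, '1:30'),
                        (1, 1, 2, '0:49'), (1, 2, 2, '1:30');
INSERT INTO results_a VALUES (1, 1, 1, 'A', 'B'), (1, 2, 1, 'B', 'B'),
                             (1, 1, 2, 'A', 'B'), (1, 2, 2, 'A', 'B');

-- Hypothetical view: the join on the full key is written once, here.
CREATE VIEW test_results AS
SELECT m.user_id, m.test_id, m.iteration_id, m.completion_time,
       a.q1 AS a_q1, a.q2 AS a_q2
FROM main m
JOIN results_a a
  ON a.user_id = m.user_id AND a.test_id = m.test_id
 AND a.iteration_id = m.iteration_id;
""")

# Latest iteration of test 1 for user 1: sort descending and keep one row.
latest = cur.execute("""
SELECT iteration_id, completion_time, a_q1, a_q2
FROM test_results
WHERE user_id = ? AND test_id = ?
ORDER BY iteration_id DESC
LIMIT 1
""", (1, 1)).fetchone()

print(latest)  # (2, '0:49', 'A', 'B')
```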

Related

MySQL Query Performance with a Derived Query

I am looking at a few queries for performance and made a change to a query, based on the examples below. The change turned a 6-minute query into one that completes in a few seconds, and I would like to understand why. How has this altered things to such an extent?
In the example, please assume the BOOK table to contain the general details for all books in a library and the FORMATS table contains details, such as HARDBACK, PAPERBACK and eBOOK (allowing for new formats to be added) where there is a key (called FORMATID) linking the two tables.
Query executes in 6 minutes:
select b.bookid, f.formatname
from book b
inner join formats f on f.formatid = b.formatid;
select b.bookid, f.formatname
from book b
left join formats f on f.formatid = b.formatid;
Query executes in 12 seconds:
select b.bookid, (select f.formatname from formats f where f.formatid = b.formatid)
from book b
where b.formatid is not null;
select b.bookid, (select f.formatname from formats f where f.formatid = b.formatid)
from book b;
In the above, the first query of each pair produces INNER JOIN results and the second produces LEFT JOIN results. On my database the two return 295,166 and 295,376 rows respectively; the timing differences remain pretty much the same.
[added] For confirmation, I have tested this (with the same results) by creating the two test tables mentioned here, populating the BOOK table with ~1 million rows and NOT applying any index or other optimisation.
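A minimal sketch of why the rewritten queries return the same rows as the joins, using in-memory SQLite through Python's sqlite3 as a stand-in for MySQL and a few made-up BOOK/FORMATS rows (no indexes, as in the question). A scalar subquery in the select list behaves like a LEFT JOIN as long as FORMATID is unique in FORMATS (if it were not, MySQL would raise a "Subquery returns more than 1 row" error), and adding `where b.formatid is not null` matches the INNER JOIN only if every non-null FORMATID has a match. This says nothing about the timings; comparing EXPLAIN output for both forms on the real tables is the usual way to see how MySQL plans each one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE formats (formatid INT, formatname TEXT);
CREATE TABLE book (bookid INT, formatid INT);
INSERT INTO formats VALUES (1, 'HARDBACK'), (2, 'PAPERBACK'), (3, 'eBOOK');
-- Book 4 has no format, so only the LEFT JOIN style keeps it.
INSERT INTO book VALUES (1, 1), (2, 2), (3, 3), (4, NULL);
""")

SUBQ = "(SELECT f.formatname FROM formats f WHERE f.formatid = b.formatid)"

left_join = cur.execute("""
SELECT b.bookid, f.formatname
FROM book b LEFT JOIN formats f ON f.formatid = b.formatid
ORDER BY b.bookid""").fetchall()
scalar_all = cur.execute(
    f"SELECT b.bookid, {SUBQ} FROM book b ORDER BY b.bookid").fetchall()

inner_join = cur.execute("""
SELECT b.bookid, f.formatname
FROM book b INNER JOIN formats f ON f.formatid = b.formatid
ORDER BY b.bookid""").fetchall()
scalar_not_null = cur.execute(
    f"SELECT b.bookid, {SUBQ} FROM book b "
    "WHERE b.formatid IS NOT NULL ORDER BY b.bookid").fetchall()

print(left_join == scalar_all)        # True
print(inner_join == scalar_not_null)  # True
```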

Getting count from joined table

Here's my problem: I need to get the number of test cases and issues associated with a project that meet certain conditions (test cases that are successful, and issues that are flaws of the application), but for some reason the numbers don't add up. I have 10 test cases in a project, of which 6 are successful, and 8 issues, of which only 4 are flaws. However, the respective COUNT results both show 24, which makes no sense. I did notice, though, that 24 happens to be 6 times 4, but I don't see how the query would multiply them.
Anyway... Can someone help me find which part of my query is wrong? How can I get the correct result? Thanks in advance.
Here's the query:
SELECT
p.codigo_proyecto,
p.nombre,
IFNULL(COUNT(iep.id_incidencia_etapa_proyecto), 0) AS cantidad_defectos,
IFNULL(COUNT(tc.id_test_case), 0) AS test_cases_exitosos,
CASE IFNULL(COUNT(tc.id_test_case), 0) WHEN 0 THEN 'No aplica'
ELSE CONCAT((IFNULL(COUNT(tc.id_test_case), 0) / IFNULL(COUNT(tc.id_test_case), 0)) * 100, '%') END AS tasa_defectos
FROM proyecto p
INNER JOIN etapa_proyecto ep ON p.codigo_proyecto = ep.codigo_proyecto
INNER JOIN incidencia_etapa_proyecto iep ON ep.id_etapa_proyecto = iep.id_etapa_proyecto
INNER JOIN incidencia i ON iep.id_incidencia = i.id_incidencia
INNER JOIN test_case tc ON ep.id_etapa_proyecto = tc.id_etapa_proyecto
INNER JOIN etapa_proyecto ep_ultima ON ep_ultima.id_etapa_proyecto =
(SELECT ep_ultima2.id_etapa_proyecto FROM etapa_proyecto ep_ultima2
WHERE p.codigo_proyecto = ep_ultima2.codigo_proyecto ORDER BY ep_ultima2.fecha_termino_real DESC LIMIT 1)
WHERE p.esta_cerrado = 1
AND i.es_defecto = 1
AND tc.resultado = 'Exitoso'
AND ep_ultima.fecha_termino_real BETWEEN '2015-01-01' AND '2016-12-31';
I would have thought it obvious that you're not going to get the expected output from an aggregate query without a GROUP BY (which suggests you're not really in a position to evaluate any advice given here effectively).
You've not said how the states of your data are represented in the database, so I'm having to make a lot of guesses based on SQL which is clearly very wrong. And I don't speak Spanish/Portuguese or whatever your native language is.
It looks like you are inferring that a defect exists if the primary key of the defects table is null. Primary keys cannot be null. The only way this would make any sort of sense (though it still won't give you the answer you're looking for) is to do a LEFT JOIN rather than an INNER JOIN.
But even then, a simple COUNT(*) will count each null case (no record in the source table) as 1 record in the output set.
Then you've got the problem that your output will contain the product of defects and test cases. Consider the case where you have no defects but 2 test cases (1, 2); the result of an outer join will be:
defect test
------ ----
null 1
null 2
If you just count the rows, you'll get 2 defects in your output.
Taking a simpler schema, this demonstrates the 2 methods for getting the values (a pre-aggregated derived table and a correlated subquery); note that they have very different performance characteristics.
SELECT project.id
, COALESCE(dilv.defects, 0) AS defects
, (SELECT COUNT(*)
FROM test_cases
WHERE test_cases.project_id = project.id) AS tests
FROM project
LEFT JOIN ( SELECT project_id, COUNT(*) AS defects
FROM defect_table
GROUP BY project_id) AS dilv
ON project.id=dilv.project_id
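A minimal sketch reproducing the 24 from the question, with in-memory SQLite (via Python's sqlite3) standing in for MySQL. It uses the simplified schema of this answer; the successful flag on test_cases is a hypothetical column, and defect_table is assumed to hold only the 4 issues that are flaws. Joining both child tables in one query pairs every successful test case with every defect (6 × 4 = 24 rows), so both COUNTs see 24; counting each table separately does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE project (id INT);
CREATE TABLE test_cases (id INT, project_id INT, successful INT);
CREATE TABLE defect_table (id INT, project_id INT);
INSERT INTO project VALUES (1);
""")
# 10 test cases (6 successful) and 4 defects for project 1, as in the question.
cur.executemany("INSERT INTO test_cases VALUES (?, 1, ?)",
                [(i, 1 if i <= 6 else 0) for i in range(1, 11)])
cur.executemany("INSERT INTO defect_table VALUES (?, 1)",
                [(i,) for i in range(1, 5)])

# Both child tables joined at once: 6 x 4 = 24 rows for the project.
fanned_out = cur.execute("""
SELECT COUNT(d.id), COUNT(t.id)
FROM project p
JOIN defect_table d ON d.project_id = p.id
JOIN test_cases t ON t.project_id = p.id AND t.successful = 1
GROUP BY p.id
""").fetchone()

# Each child table counted on its own: derived table plus correlated subquery.
separate = cur.execute("""
SELECT p.id,
       COALESCE(d.defects, 0),
       (SELECT COUNT(*) FROM test_cases t
        WHERE t.project_id = p.id AND t.successful = 1)
FROM project p
LEFT JOIN (SELECT project_id, COUNT(*) AS defects
           FROM defect_table GROUP BY project_id) AS d
  ON d.project_id = p.id
""").fetchone()

print(fanned_out)  # (24, 24)
print(separate)    # (1, 4, 6)
```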

Need help fixing MySql Query

Here is the query I'm trying to run:
SELECT A.*
FROM student_lesson_progress A
LEFT JOIN student_lesson_progress B
ON A.studentId = B.studentId
AND A.lessonId = B.lessonId
WHERE A.lessonStatusTypeId = 2 AND
EXISTS (SELECT * FROM student_lesson_progress WHERE B.lessonStatusTypeID = 4)
Basically I'm not very skilled with SQL, but am trying to return all rows with a lessonStatusTypeID = 2 but only if there is a row with the same studentId and lessonId that has lessonStatusTypeID = 4.
My end goal once I am certain I have the query right, is that if a Student (studentID) has achieved a Status (lessonStatusTypeId) of 4 on a particular lesson (lessonID) I want to delete all the rows where Status is 2 for that particular Student on that particular lesson, as that data is no longer needed.
I pieced together the above query, and it runs alright on a small test DB, and seems to be returning the desired rows. However, when I try and run it on the production DB, where the student_lesson_progress table has around 600,000 rows, it just runs and runs and runs, locks up the database, pins the server cpu at 100%, and never returns data.
My guess is that my query is very poorly put together, and probably overly complicated for what I'm trying to do. I would greatly appreciate any tips or nudges in the right direction with this one.
General rule of thumb: If you're using a sub-select, you're probably not doing it right. This is not always the case, but if you can avoid sub-selects, you should.
This should work for the query. Your sub-select is probably what is killing your performance. You should also index studentId and lessonId, or put a compound index on both columns.
SELECT A.*
FROM student_lesson_progress A
INNER JOIN student_lesson_progress B
ON A.studentId = B.studentId
AND A.lessonId = B.lessonId
WHERE A.lessonStatusTypeId = 2 AND B.lessonStatusTypeID = 4
You need to use a correlated subquery. But first make sure you have the right indexes on the table.
SELECT A.*
FROM student_lesson_progress A
WHERE A.lessonStatusTypeId = 2
AND A.studentId IN (
SELECT B.studentId
FROM student_lesson_progress B
WHERE B.studentId = A.studentId
AND B.lessonId = A.lessonId
AND B.lessonStatusTypeId = 4);
Essentially this means: get me all rows with status 2 whose student also has a row with status 4 for the same lesson. Because IN only checks whether a match exists, each row of A is returned at most once, even if there is more than one status-4 row for that student and lesson, so no DISTINCT is needed.
Hope this works.
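A minimal sketch of the asker's end goal (find, then delete, the status-2 rows made obsolete by a status-4 row for the same student and lesson), using in-memory SQLite via Python's sqlite3 as a stand-in for MySQL. The id column, the index name and the sample rows are hypothetical. The correlation must include lessonId: row 3 below (student 1, lesson 11) would be wrongly selected if only studentId were matched, because student 1 has status 4 on a different lesson. Note that MySQL typically rejects a DELETE whose subquery reads the same table (error 1093), so there the usual equivalent is the multi-table form DELETE A FROM student_lesson_progress A JOIN student_lesson_progress B ON ... WHERE ....

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE student_lesson_progress (
    id INTEGER PRIMARY KEY,
    studentId INT, lessonId INT, lessonStatusTypeId INT
);
-- Compound index so each correlated lookup can use the index.
CREATE INDEX idx_student_lesson
    ON student_lesson_progress (studentId, lessonId, lessonStatusTypeId);
INSERT INTO student_lesson_progress (studentId, lessonId, lessonStatusTypeId) VALUES
    (1, 10, 2), (1, 10, 4),             -- lesson finished: row 1 is obsolete
    (1, 11, 2),                         -- lesson not finished: keep row 3
    (2, 10, 2), (2, 10, 2), (2, 10, 4), -- rows 4 and 5 are obsolete
    (2, 11, 4);
""")

# Status-2 rows with a status-4 row for the same student AND lesson.
obsolete = [row[0] for row in cur.execute("""
SELECT A.id FROM student_lesson_progress A
WHERE A.lessonStatusTypeId = 2
  AND EXISTS (SELECT 1 FROM student_lesson_progress B
              WHERE B.studentId = A.studentId
                AND B.lessonId = A.lessonId
                AND B.lessonStatusTypeId = 4)
ORDER BY A.id
""")]

# Same condition for the clean-up (SQLite allows the self-referencing subquery).
cur.execute("""
DELETE FROM student_lesson_progress
WHERE lessonStatusTypeId = 2
  AND EXISTS (SELECT 1 FROM student_lesson_progress B
              WHERE B.studentId = student_lesson_progress.studentId
                AND B.lessonId = student_lesson_progress.lessonId
                AND B.lessonStatusTypeId = 4)
""")
remaining = cur.execute(
    "SELECT COUNT(*) FROM student_lesson_progress").fetchone()[0]

print(obsolete, remaining)  # [1, 4, 5] 4
```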

Not getting the Join I want

We are doing some pro bono work for a good cause and I'm having a hell of a time with a query. The code has been written by many volunteers over the years, with the inevitable outcome.
I have two tables, A and B. What I need is the sum of score_hours over a join between the two, where each row of A is counted only once.
Please keep in mind that both tables are quite big (10k to 50k+ rows each, depending on the time of the month).
Table A:
id (pk, ai)
uid (int)
scores_date (timestamp (but for some reason only the actual date, not the time))
score_hours (decimal 3,1)
Table B:
id (pk, ai)
uid (int)
shift_date (timestamp)
There are many records in table B with the uid we are looking for, on several dates (the dates are not unique). Table A also has multiple records per uid, but on different days: at most one record per uid per day.
There are obviously more selectors for both tables, but they don't match in any way between the tables (although I do need to query them with simple "AND") so this is what I have to work with. I do need to join them because of the rest of the query, but so far I'm not getting the records I need within a decent time.
My attempts were:
This almost worked, but the execution time was terrible and it failed with some simple selectors.
SELECT SUM(score_hours)
FROM A
WHERE
A.uid IN
(SELECT B.uid
FROM B
WHERE B.uid = "1")
This produces the right kind of result, but it joins one row of B for every matching uid. Normally you could solve that by grouping, but the sum would still count every joined row, so that is not an option:
SELECT SUM(score_hours)
FROM A
LEFT JOIN B ON A.uid = B.uid
WHERE A.uid = "1"
Edit: not only do I need to JOIN on uid, but there also has to be something like this in it:
DISTINCT(date(m.shift_datum)) = DATE(d.dagscores_date)
It is actually a very basic query, except that the SUM is over records that are not unique with respect to the LEFT JOIN, and that I need to JOIN the two tables on two conditions (uid and date) at the same time.
If you need more data please tell me so. I can provide all.
You need to remove the duplicates from the table you're joining with, otherwise the cross-product creates multiple rows that get added into the sum.
SELECT SUM(score_hours)
FROM A
JOIN (SELECT DISTINCT uid
FROM B) AS B
ON A.uid = B.uid
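A minimal sketch that extends this idea to the date condition from the asker's edit, using in-memory SQLite (via Python's sqlite3) as a stand-in for MySQL; the sample rows are made up. Reducing B to one row per (uid, day) before joining means each row of A matches at most once, so the SUM is not inflated. The date-matching variant is an assumption about what the edit asks for, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, uid INT, scores_date TEXT, score_hours REAL);
CREATE TABLE B (id INTEGER PRIMARY KEY, uid INT, shift_date TEXT);
INSERT INTO A (uid, scores_date, score_hours) VALUES
    (1, '2015-03-01', 4.0), (1, '2015-03-02', 2.5), (1, '2015-03-03', 8.0);
-- Several shifts on the same day for uid 1, none on 2015-03-03.
INSERT INTO B (uid, shift_date) VALUES
    (1, '2015-03-01 08:00:00'), (1, '2015-03-01 14:00:00'),
    (1, '2015-03-02 09:00:00'), (1, '2015-03-02 13:00:00'),
    (1, '2015-03-02 18:00:00');
""")

# Plain join: each A row is repeated once per matching B row, inflating the sum.
naive = cur.execute("""
SELECT SUM(A.score_hours) FROM A
JOIN B ON A.uid = B.uid AND DATE(B.shift_date) = DATE(A.scores_date)
WHERE A.uid = 1
""").fetchone()[0]

# De-duplicate B to one row per (uid, day) before joining.
deduped = cur.execute("""
SELECT SUM(A.score_hours) FROM A
JOIN (SELECT DISTINCT uid, DATE(shift_date) AS shift_day FROM B) AS b
  ON A.uid = b.uid AND DATE(A.scores_date) = b.shift_day
WHERE A.uid = 1
""").fetchone()[0]

print(naive, deduped)  # 15.5 6.5
```

The naive join counts 4.0 twice and 2.5 three times (15.5); the de-duplicated join counts each matching day once (4.0 + 2.5 = 6.5), and the day without a shift drops out in both.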

MySQL - Fix multiple records

SELECT cf.FK_collection, c.collectionName,
uf.FK_userMe, uf.FK_userYou,
u.userId, u.username
FROM userFollows as uf
INNER JOIN collectionFollows as cf ON uf.FK_userMe = cf.FK_user
INNER JOIN collections as c ON cf.FK_collection = c.collectionId
INNER JOIN users as u ON uf.FK_userYou = u.userId
WHERE uf.FK_userMe = 2
Hey guys.
I'm trying to write this query, and of course it won't do what I want: it returns multiple rows, which in some way is what I want, and yet it isn't. Let me try to explain:
I'm trying to get both collectionFollows and userFollows, to show a user's activity on the site. But when doing this, I get multiple rows from userFollows even though the user only follows one other user. This happens because I'm following multiple collections.
So when I show my result it will return like this:
John is following 'webdesign'
John is following 'Lisa'
John is following 'programming'
John is following 'Lisa'
I would like to know whether I have to make multiple queries or use a subquery. What would be best practice? And how would I write the query then?
You are actually combining two quite unrelated queries. I would keep them as separate queries, especially since you report them like that too. You could, if you like, use UNION ALL to combine those queries. This way, you have just a list of names of items you follow, regardless of the type of item it is. If you want, you can specify that too.
SELECT
cf.FK_user,
cf.FK_collection as followItem,
c.collectionName as followName,
'collection' as followType
FROM collectionFollows as cf
INNER JOIN collections as c ON cf.FK_collection = c.collectionId
WHERE cf.FK_user = 2
UNION ALL
SELECT
uf.FK_userMe,
u.userId,
u.username,
'user' as followType
FROM userFollows as uf
INNER JOIN users as u ON uf.FK_userYou = u.userId
WHERE uf.FK_userMe = 2
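A minimal sketch contrasting the original single query with the UNION ALL version, using in-memory SQLite via Python's sqlite3 as a stand-in for MySQL. The table and column names follow the question's query (including cf.FK_user); the data is made up to match the example output: John (user 2) follows Lisa and two collections. The combined join repeats Lisa once per followed collection, while UNION ALL returns one row per followed item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users (userId INT, username TEXT);
CREATE TABLE collections (collectionId INT, collectionName TEXT);
CREATE TABLE userFollows (FK_userMe INT, FK_userYou INT);
CREATE TABLE collectionFollows (FK_user INT, FK_collection INT);
INSERT INTO users VALUES (2, 'John'), (3, 'Lisa');
INSERT INTO collections VALUES (10, 'webdesign'), (11, 'programming');
INSERT INTO userFollows VALUES (2, 3);
INSERT INTO collectionFollows VALUES (2, 10), (2, 11);
""")

# Original shape: both follow tables in one join, so 'Lisa' repeats per collection.
joined = cur.execute("""
SELECT c.collectionName, u.username
FROM userFollows uf
JOIN collectionFollows cf ON uf.FK_userMe = cf.FK_user
JOIN collections c ON cf.FK_collection = c.collectionId
JOIN users u ON uf.FK_userYou = u.userId
WHERE uf.FK_userMe = 2
""").fetchall()

# UNION ALL: one row per followed item, tagged with its type.
follows = cur.execute("""
SELECT cf.FK_user, cf.FK_collection AS followItem,
       c.collectionName AS followName, 'collection' AS followType
FROM collectionFollows cf
JOIN collections c ON cf.FK_collection = c.collectionId
WHERE cf.FK_user = 2
UNION ALL
SELECT uf.FK_userMe, u.userId, u.username, 'user'
FROM userFollows uf
JOIN users u ON uf.FK_userYou = u.userId
WHERE uf.FK_userMe = 2
ORDER BY followType, followName
""").fetchall()

print(len(joined))  # 2
print(follows)
```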
An alternative would be to filter unique values in PHP, but even then your query will fail. Because of the inner joins, you will not get any results if a user only follows other users or only follows collections. You need at least one of each to get any results.
You could change INNER JOIN to LEFT JOIN, but then you would still have to post-process the result to filter out duplicates and NULL values.
UNION ALL is fast. It just sticks two query results together without further processing. This is different from UNION, which also removes duplicates (like DISTINCT). In this case that is not needed, because I assume a user can only follow a collection or another user once, so these queries will never return duplicate records. If that is indeed the case, UNION ALL will do just fine and will be faster than UNION.
Apart from UNION ALL, two separate queries would be fine too.