I am developing a database for a payroll application, and one of the features I'll need is a table that stores the list of employees that work at each store, each day of the week.
Each employee has an ID, so my table looks like this:
| Mon | Tue | Wed | Thu | Fri | Sat | Sun
Store 1 | 3,4,5 | 3,4,5 | 3,4,5 | 4,5,7 | 4,5,7 | 4,5,6,7 | 4,5,6,7
Store 2 | 1,8,9 | 1,8,9 | 1,8,9 | 1,8,9 | 1,8,9 | 1,8,9 | 1,8,9
Store 3 | 10,12 | 10,12 | 10,12 | 10,12 | 10,12 | 10,12 | 10,12
Store 4 | 15 | 15 | 15 | 16 | 16 | 16 | 16
Store 5 | 6,11,13 | 6,11,13 | 6,11,13 | 14,18,19| 14,18,19| 14,18,19| 14,18,19
My question is, how do I represent that on my database? I came up with the following ideas:
Idea 1: Pretty much replicate the design above, creating a table with the following columns: [Store_id | Mon | Tue ... | Sat | Sun] and then store the list of employee IDs of each day as a string, with IDs separated by commas. I know that comma-separated lists are not good database design, but sometimes they do look tempting, as in this case.
Store_id | Mon | Tue | Wed | Thu | Fri | Sat
---------+---------+---------+---------+---------+---------+---------
1 | '3,4,5' | '3,4,5' | '3,4,5' | '4,5,7' | '4,5,7' | '4,5,6,7'
2 | '1,8,9' | '1,8,9' | '1,8,9' | '1,8,9' | '1,8,9' | '1,8,9'
Idea 2: Create a table with the following columns: [Store_id | Day | Employee_id]. That way each employee working at a specific store at a specific day would be an entry in this table. The problem I see is that this table would grow quite fast, and it would be harder to visualize the data at the database level.
Store_id | Day | Employee_id
---------+-----+-------------
1 | mon | 3
1 | mon | 4
1 | mon | 5
1 | tue | 3
1 | tue | 4
Any of these ideas sound viable? Any better way of storing the data?
If I were you, I would store the employee data and the store data in separate tables, but still keep the design of your main table. So do something like this:
CREATE TABLE stores (
id INT, -- make it the primary key auto increment.. etc
store_name VARCHAR(255)
-- any other data for your store here.
);
CREATE TABLE schedule (
id INT, -- make it the primary key auto increment.. etc
store_id INT, -- FK to the stores table id
day VARCHAR(20),
emp_id INT -- FK to the employees table id
);
CREATE TABLE employees (
id INT, -- make it the primary key auto increment.. etc
employee_name VARCHAR(255)
-- whatever other employee data you need to store.
);
I would have a table for stores and for employees as that way you can have specific data for each store or employee
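A minimal sketch of the three-table layout with the keys actually declared, assuming SQLite (via Python's sqlite3 module) as a stand-in for MySQL. The employee names and the UNIQUE constraint are made up for illustration; the table and column names follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE stores (
    id INTEGER PRIMARY KEY,
    store_name TEXT NOT NULL
);
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL
);
CREATE TABLE schedule (
    id INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL REFERENCES stores(id),
    day TEXT NOT NULL,
    emp_id INTEGER NOT NULL REFERENCES employees(id),
    UNIQUE (store_id, day, emp_id)  -- assumption: no duplicate shifts
);
""")
conn.execute("INSERT INTO stores (id, store_name) VALUES (1, 'Store 1')")
conn.executemany("INSERT INTO employees (id, employee_name) VALUES (?, ?)",
                 [(3, 'Ann'), (4, 'Bob'), (5, 'Cy')])  # hypothetical names
conn.executemany("INSERT INTO schedule (store_id, day, emp_id) VALUES (?, ?, ?)",
                 [(1, 'mon', 3), (1, 'mon', 4), (1, 'mon', 5), (1, 'tue', 3)])

# Who works at store 1 on Monday?
monday_staff = [row[0] for row in conn.execute(
    "SELECT e.employee_name FROM schedule sh "
    "JOIN employees e ON e.id = sh.emp_id "
    "WHERE sh.store_id = 1 AND sh.day = 'mon' ORDER BY e.id")]

# A schedule row pointing at a non-existent employee is rejected.
try:
    conn.execute("INSERT INTO schedule (store_id, day, emp_id) VALUES (1, 'wed', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

With the foreign keys in place, the database itself refuses schedule rows for unknown stores or employees, which a comma-separated list cannot do.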
BONUS:
If you want a query that shows the store name with the employees' names and their schedule, all you have to do is join the tables:
SELECT s.store_name, sh.day, e.employee_name
FROM schedule sh
JOIN stores s ON s.id = sh.store_id
JOIN employees e ON e.id = sh.emp_id
This query has a limitation, though: there is no meaningful way to order by day, so the days can come back in any order. In reality you also need a days table with specific data for each day, so that you can order the data from the beginning or the end of the week.
if you did want to make a days table it would just be the same thing again
CREATE TABLE days(
id INT,
day_name VARCHAR(20),
day_type VARCHAR(55)
-- any more data you want here
);
where day name would be Mon Tue... and day_type would be Weekday or Weekend
and then all you would have to do for your query is
SELECT s.store_name, d.day_name, e.employee_name
FROM schedule sh
JOIN stores s ON s.id = sh.store_id
JOIN employees e ON e.id = sh.emp_id
JOIN days d ON d.id = sh.day_id
ORDER BY d.id
Notice that the day column in the schedule table would be replaced with a day_id column linked to the days table.
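To illustrate why the days table helps with ordering, here is a small sketch, assuming SQLite through Python's sqlite3 module rather than MySQL, and assuming day ids 1 to 7 run Monday to Sunday. It compares sorting by the day name with sorting by the day id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE days (id INTEGER PRIMARY KEY, day_name TEXT, day_type TEXT);
CREATE TABLE schedule (store_id INTEGER, day_id INTEGER REFERENCES days(id), emp_id INTEGER);
""")
conn.executemany("INSERT INTO days VALUES (?, ?, ?)", [
    (1, 'Mon', 'Weekday'), (2, 'Tue', 'Weekday'), (3, 'Wed', 'Weekday'),
    (4, 'Thu', 'Weekday'), (5, 'Fri', 'Weekday'),
    (6, 'Sat', 'Weekend'), (7, 'Sun', 'Weekend')])
# Rows deliberately inserted out of weekday order.
conn.executemany("INSERT INTO schedule VALUES (?, ?, ?)",
                 [(4, 6, 16), (4, 1, 15), (4, 4, 16), (4, 2, 15)])

# Sorting on the name gives alphabetical order, not week order.
by_name = [r[0] for r in conn.execute(
    "SELECT d.day_name FROM schedule sh JOIN days d ON d.id = sh.day_id "
    "ORDER BY d.day_name")]
# Sorting on the id gives week order.
by_id = [r[0] for r in conn.execute(
    "SELECT d.day_name FROM schedule sh JOIN days d ON d.id = sh.day_id "
    "ORDER BY d.id")]
```

Here by_name comes back as Mon, Sat, Thu, Tue, while by_id comes back as Mon, Tue, Thu, Sat.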
Hope that's helpful!
The second design is correct for a relational database. One employee_id per row, even if it results in multiple rows per store per day.
The number of rows is not likely to get larger than the RDBMS can handle, if your example is accurate. You have no more than 4 employees per store per day, and 5 stores, and up to 366 days per year. So no more than 7320 rows per year, and perhaps fewer.
I regularly see databases in MySQL that have hundreds of millions or even billions of rows in a given table. So you can continue to run those stores for many years before running into scalability problems.
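The arithmetic behind that estimate, as a quick back-of-the-envelope sketch. The 100 million threshold is only an illustrative figure taken from the "hundreds of millions" remark, not a real limit.

```python
max_employees_per_store_per_day = 4
stores = 5
days_per_year = 366  # leap year, the worst case

rows_per_year = max_employees_per_store_per_day * stores * days_per_year

# Whole years of data before the table reaches 100 million rows.
years_to_100_million = 100_000_000 // rows_per_year
```

At 7320 rows per year it would take over thirteen thousand years to reach even the low end of that range.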
I upvoted John Ruddell's answer, which is basically your option #2 with the addition of tables to hold data about the store and the employee. I won't repeat what he said, but let me just add a couple of thoughts that are too long for a comment:
Never ever ever put comma-separated values in a database record. This makes the data way harder to work with.
Sure, either #1 or #2 makes it easy to query to find which employees are working at store 1 on Friday:
Method 1:
select Friday_employees from schedule where store_id='store 1'
Method 2:
select employee_id from schedule where store_id=1 and day='fri'
But suppose you want to know what days employee #7 is working.
With method 2, it's easy:
select day from schedule where employee_id=7
But how would you do that with method 1? You'd have to break the field up into its individual pieces and check each piece. At best that's a pain, and I've seen people screw it up regularly, like writing
where Friday_employees like '%7%'
Umm, except what if there's an employee number 17 or 27? You'll get them too. You could say
where Friday_employees like '%,7,%'
But then if the 7 is the first or the last on the list, it doesn't work.
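Both failure modes are easy to reproduce. The sketch below assumes SQLite through Python's sqlite3 module and a hypothetical schedule_csv table with made-up lists; the correct answer for "which stores have employee 7 on Friday" would be stores 1, 2 and 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule_csv (store_id INTEGER, Friday_employees TEXT)")
conn.executemany("INSERT INTO schedule_csv VALUES (?, ?)", [
    (1, '4,5,7'),      # 7 is last in the list
    (2, '7,8,9'),      # 7 is first in the list
    (3, '1,17,27'),    # no employee 7 at all
    (4, '3,7,12'),     # 7 is in the middle
])

def stores_matching(pattern):
    """Return the store ids whose Friday list matches the LIKE pattern."""
    rows = conn.execute(
        "SELECT store_id FROM schedule_csv "
        "WHERE Friday_employees LIKE ? ORDER BY store_id",
        (pattern,))
    return [r[0] for r in rows]

loose = stores_matching('%7%')      # also matches 17 and 27
strict = stores_matching('%,7,%')   # misses first and last positions
```

The loose pattern returns all four stores, including store 3 where nobody has ID 7, and the strict pattern returns only store 4.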
What if you want the user to be able to select a day and then give them the list of employees working on that day?
With method 2, easy:
select employee_id from schedule where day=@day
Then you use a parameterized query to fill in the value.
With method 1 ...
select case when @day='mon' then Monday_employees when @day='tue' then Tuesday_employees when @day='wed' then Wednesday_employees when @day='thu' then Thursday_employees when @day='fri' then Friday_employees when @day='sat' then Saturday_employees when @day='sun' then Sunday_employees end as day_employees from schedule
That's a beast, and if you do it a lot, sooner or later you're going to make a mistake and leave a day out or accidentally type "when day='thu' then Friday_employees" or some such. I've seen that happen often enough.
Even if you write those long complex queries, performance will suck. If you have a field for employee_id, you can index on it, so access by employee will be fast. If you have a comma-separated list of employees, then a query of the "like '%,7,%'" variety requires a sequential search of every record in the table.
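One way to see the difference is to ask the database for its query plan. This is a sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, not MySQL's EXPLAIN, so the exact wording of the plan is SQLite-specific; the index and table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schedule (store_id INTEGER, day TEXT, employee_id INTEGER);
CREATE INDEX idx_schedule_employee ON schedule (employee_id);
CREATE TABLE schedule_csv (store_id INTEGER, Friday_employees TEXT);
""")

def plan(sql):
    """Join the human-readable detail column of each EXPLAIN QUERY PLAN row."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Equality on an indexed column: SQLite reports an index SEARCH.
indexed_plan = plan("SELECT day FROM schedule WHERE employee_id = 7")

# Leading-wildcard LIKE on a comma-separated list: SQLite reports a full SCAN.
csv_plan = plan(
    "SELECT store_id FROM schedule_csv WHERE Friday_employees LIKE '%,7,%'")
```

In MySQL the equivalent check would be running EXPLAIN on both statements and comparing the access type.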
I am by no means a MySQL expert, so I am looking for any help on this matter.
I need to perform a simple test (in principle), I have this (simplified) table:
tableid | userid | car | From | To
--------------------------------------------------------
1 | 1 | Fiesta | 2015-01-01 | 2015-01-31
2 | 1 | MX5 | 2015-02-01 | 2015-02-28
3 | 1 | Navara | 2015-03-01 | 2015-03-31
4 | 1 | GTR | 2015-03-28 | 2015-04-30
5 | 2 | Focus | 2015-01-01 | 2015-01-31
6 | 2 | i5 | 2015-02-01 | 2015-02-28
7 | 2 | Aygo | 2015-03-01 | 2015-03-31
8 | 2 | 206 | 2015-03-29 | 2015-04-30
9 | 1 | Skyline | 2015-04-29 | 2015-05-31
10 | 2 | Skyline | 2015-04-29 | 2015-05-31
I need to find two things here:
1. If any user has date overlaps in his car assignments of more than one day (end of the assignment can be on the same day as the new assignment start).
2. Did any two users try to get the same car assigned on the same date, or do the date ranges overlap for them on the same car?
So the query (or queries) I am looking for should return those rows:
tableid | userid | car | From | To
--------------------------------------------------------
3 | 1 | Navara | 2015-03-01 | 2015-03-31
4 | 1 | GTR | 2015-03-28 | 2015-04-30
7 | 2 | Aygo | 2015-03-01 | 2015-03-31
8 | 2 | 206 | 2015-03-29 | 2015-04-30
9 | 1 | Skyline | 2015-04-29 | 2015-05-31
10 | 2 | Skyline | 2015-04-29 | 2015-05-31
I feel like I am bashing my head against the wall here, I would be happy with being able to do these comparisons in separate queries. I need to display them in one table but I could always then join the results.
I've done research and a few hours of testing, but I can't get anywhere near the result I want.
SQLFiddle with the above test data
I've tried these posts btw (they were not exactly what I needed but were close enough, or so I thought):
Comparing two date ranges within the same table
How to compare values of text columns from the same table
This was the closest solution I could find but when I tried it on a single table (joining table to itself) I was getting crazy results: Checking a table for time overlap?
EDIT
As a temporary solution I have adopted a different approach, similar to the posts I found during my research (above). I will now check whether the new car rental / assignment date overlaps with any date range within the table. If so, I will save the id(s) of the rows that the date overlaps with. This way at least I will be able to flag overlaps and allow a user to look at the flagged rows and resolve any overlaps manually.
Thanks to everyone who offered their help with this. I will flag philipxy's answer as the chosen one (in the next 24h) unless someone has a better way of achieving this. I have no doubt that by following his answer I will eventually be able to reach the results I need. At the moment, though, I need to adopt any solution that works, as I need to finish my project in the next few days, hence the change of approach.
Edit #2
Both answers are brilliant, and to anyone who finds this post having the same issue as I did: read them both and look at the fiddles! :) A lot of amazing brain-work went into them! Temporarily I had to go with the solution I mention in Edit #1 of mine, but I will be adapting my queries to go with @Ryan Vincent's approach plus @philipxy's edits/comments about ignoring the initial one-day overlap.
Here is the first part: Overlapping cars per user...
SQLFiddle - correlated Query and Join Query
Second part - more than one user in one car at the same time: SQLFiddle - correlated Query and Join Query. Query below...
I use the correlated queries:
You will likely need indexes on 'userid' and 'car'. However, please check the 'explain plan' to see how MySQL is accessing the data. And just try it :)
Overlapping cars per user
The query:
SELECT `allCars`.`userid` AS `allCars_userid`,
`allCars`.`car` AS `allCars_car`,
`allCars`.`From` AS `allCars_From`,
`allCars`.`To` AS `allCars_To`,
`allCars`.`tableid` AS `allCars_id`
FROM
`cars` AS `allCars`
WHERE
EXISTS
(SELECT 1
FROM `cars` AS `overlapCar`
WHERE
`allCars`.`userid` = `overlapCar`.`userid`
AND `allCars`.`tableid` <> `overlapCar`.`tableid`
AND NOT ( `allCars`.`From` >= `overlapCar`.`To` /* starts on or after the day the other ends */
OR `allCars`.`To` <= `overlapCar`.`From`)) /* ends on or before the day the other starts */
ORDER BY
`allCars`.`userid`,
`allCars`.`From`,
`allCars`.`car`;
The results:
allCars_userid allCars_car allCars_From allCars_To allCars_id
-------------- ----------- ------------ ---------- ------------
1 Navara 2015-03-01 2015-03-31 3
1 GTR 2015-03-28 2015-04-30 4
1 Skyline 2015-04-29 2015-05-31 9
2 Aygo 2015-03-01 2015-03-31 7
2 206 2015-03-29 2015-04-30 8
2 Skyline 2015-04-29 2015-05-31 10
Why does it work? Or: how I think about it:
I use the correlated query so I don't have duplicates to deal with and it is probably the easiest to understand for me. There are other ways of expressing the query. Each has advantages and drawbacks. I want something I can easily understand.
Requirement: For each user ensure that they don't have two or more cars at the same time.
So, for each user record (allCars), check the complete table (overlapCar) to see if you can find a different record that overlaps the time of the current record. If we find one, then select the current record we are checking (in allCars).
Therefore the overlap check is:
the allCars userid and the overlapCar userid must be the same
the allCars car record and the overlapCar car record must be different
the allCars time range and the overlapCar time range must overlap.
The time range check:
Instead of checking for overlapping times directly, use positive tests. The easiest approach is to check that the ranges don't overlap, and apply a NOT to it.
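The same "test for no overlap, then negate" idea can be sketched outside SQL. This is an illustrative Python version of the condition used in the query above; the function name and the "touching" range are made up, and the other ranges come from the question's test data.

```python
from datetime import date

def overlaps(a_from, a_to, b_from, b_to):
    """True when the ranges share more than a single hand-over day."""
    # Two positive tests: a starts on/after b's end, or a ends on/before b's start.
    no_overlap = a_from >= b_to or a_to <= b_from
    return not no_overlap

navara = (date(2015, 3, 1), date(2015, 3, 31))
gtr = (date(2015, 3, 28), date(2015, 4, 30))
mx5 = (date(2015, 2, 1), date(2015, 2, 28))
# Hypothetical range that starts on the day the Navara assignment ends.
touching = (date(2015, 3, 31), date(2015, 4, 30))

results = {
    "navara_gtr": overlaps(*navara, *gtr),
    "mx5_navara": overlaps(*mx5, *navara),
    "navara_touching": overlaps(*navara, *touching),
}
```

Only the Navara/GTR pair is reported as overlapping; an assignment that begins on the day the previous one ends is not.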
One car with More than One User at the same time...
The query:
SELECT `allCars`.`car` AS `allCars_car`,
`allCars`.`userid` AS `allCars_userid`,
`allCars`.`From` AS `allCars_From`,
`allCars`.`To` AS `allCars_To`,
`allCars`.`tableid` AS `allCars_id`
FROM
`cars` AS `allCars`
WHERE
EXISTS
(SELECT 1
FROM `cars` AS `overlapUser`
WHERE
`allCars`.`car` = `overlapUser`.`car`
AND `allCars`.`tableid` <> `overlapUser`.`tableid`
AND NOT ( `allCars`.`From` >= `overlapUser`.`To` /* starts on or after the day the other ends */
OR `allCars`.`To` <= `overlapUser`.`From`)) /* ends on or before the day the other starts */
ORDER BY
`allCars`.`car`,
`allCars`.`userid`,
`allCars`.`From`;
The results:
allCars_car allCars_userid allCars_From allCars_To allCars_id
----------- -------------- ------------ ---------- ------------
Skyline 1 2015-04-29 2015-05-31 9
Skyline 2 2015-04-29 2015-05-31 10
Edit:
In view of the comments by @philipxy about time ranges needing 'greater than or equal to' checks, I have updated the code here. I haven't changed the SQLFiddles.
For each input and output table find its meaning. Ie a statement template parameterized by column names, aka predicate, that a row makes into a true or false statement, aka proposition. A table holds the rows that make its predicate into a true proposition. Ie rows that make a true proposition go in a table and rows that make a false proposition stay out. Eg for your input table:
rental [tableid] was user [userid] renting car [car] from [from] to [to]
Then phrase the output table predicate in terms of the input table predicate. Don't use descriptions like your 1 & 2:
If any user has date overlaps in his car assignments of more than one day (end of the assignment can be on the same day as the new assignment start).
Instead find the predicate that an arbitrary row states when in the table:
rental [tableid] was user [userid] renting car [car] from [from] to [to]
in self-conflict with some other rental
For the DBMS to calculate the rows making this true we must express this in terms of our given predicate(s) plus literals & conditions:
-- query result holds the rows where
FOR SOME t2.tableid, t2.userid, ...:
rental [t1.tableid] was user [t1.userid] renting car [t1.car] from [t1.from] to [t1.to]
AND rental [t2.tableid] was user [t2.userid] renting car [t2.car] from [t2.from] to [t2.to]
AND [t1.userid] = [t2.userid] -- userids id the same users
AND [t1.to] > [t2.from] AND ... -- tos/froms id intervals with overlap more than one day
...
(Inside an SQL SELECT statement the cross product of JOINed tables has column names of the form alias.column. Think of . as another character allowed in column names. Finally the SELECT clause drops the alias.s.)
We convert a query predicate to an SQL query that calculates the rows that make it true:
A table's predicate gets replaced by the table alias.
To use the same predicate/table multiple times make aliases.
Changing column old to new in a predicate adds AND old = new.
AND of predicates gets replaced by JOIN.
OR of predicates gets replaced by UNION.
AND NOT of predicates gets replaced by EXCEPT, MINUS or appropriate LEFT JOIN.
AND condition gets replaced by WHERE or ON condition.
For a predicate true FOR SOME columns to drop or when THERE EXISTS columns to drop, SELECT DISTINCT columns to keep.
Etc. (See this.)
Hence (completing the ellipses):
SELECT DISTINCT t1.*
FROM t t1 JOIN t t2
ON t1.userid = t2.userid -- userids id the same users
WHERE t1.to > t2.from AND t2.to > t1.from -- tos/froms id intervals with overlap more than one day
AND t1.tableid <> t2.tableid -- tableids id different rentals
Did any two users try to get the same car assigned on the same date, or do the date ranges overlap for them on the same car?
Finding the predicate that an arbitrary row states when in the table:
rental [tableid] was user [userid] renting car [car] from [from] to [to]
in conflict with some other user's rental
In terms of our given predicate(s) plus literals & conditions:
-- query result holds the rows where
FOR SOME t2.*
rental [t1.tableid] was user [t1.userid] renting car [t1.car] from [t1.from] to [t1.to]
AND rental [t2.tableid] was user [t2.userid] renting car [t2.car] from [t2.from] to [t2.to]
AND [t1.userid] <> [t2.userid] -- userids id different users
AND [t1.car] = [t2.car] -- .cars id the same car
AND [t1.to] >= [t2.from] AND [t2.to] >= [t1.from] -- tos/froms id intervals with any overlap
AND [t1.tableid] <> [t2.tableid] -- tableids id different rentals
The UNION of queries for predicates 1 & 2 returns the rows for which predicate 1 OR predicate 2.
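As a sketch of how the two predicates might translate and combine, the following runs both queries and their UNION against the question's test data. It assumes SQLite through Python's sqlite3 module instead of MySQL, and renames the date columns to from_date and to_date (an assumption made only to avoid quoting reserved words). The SQL for predicate 2 is one possible reading of the conditions listed above, not a query given in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (tableid INTEGER PRIMARY KEY, userid INTEGER, "
             "car TEXT, from_date TEXT, to_date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 'Fiesta', '2015-01-01', '2015-01-31'),
    (2, 1, 'MX5', '2015-02-01', '2015-02-28'),
    (3, 1, 'Navara', '2015-03-01', '2015-03-31'),
    (4, 1, 'GTR', '2015-03-28', '2015-04-30'),
    (5, 2, 'Focus', '2015-01-01', '2015-01-31'),
    (6, 2, 'i5', '2015-02-01', '2015-02-28'),
    (7, 2, 'Aygo', '2015-03-01', '2015-03-31'),
    (8, 2, '206', '2015-03-29', '2015-04-30'),
    (9, 1, 'Skyline', '2015-04-29', '2015-05-31'),
    (10, 2, 'Skyline', '2015-04-29', '2015-05-31'),
])

# Predicate 1: same user, different rentals, overlap of more than one day.
SELF_CONFLICT = """
SELECT DISTINCT t1.tableid
FROM t t1 JOIN t t2
  ON t1.userid = t2.userid
WHERE t1.to_date > t2.from_date AND t2.to_date > t1.from_date
  AND t1.tableid <> t2.tableid
"""

# Predicate 2: different users, same car, any overlap at all.
CAR_CONFLICT = """
SELECT DISTINCT t1.tableid
FROM t t1 JOIN t t2
  ON t1.car = t2.car
WHERE t1.userid <> t2.userid
  AND t1.to_date >= t2.from_date AND t2.to_date >= t1.from_date
  AND t1.tableid <> t2.tableid
"""

self_ids = sorted(r[0] for r in conn.execute(SELF_CONFLICT))
car_ids = sorted(r[0] for r in conn.execute(CAR_CONFLICT))
union_ids = sorted(r[0] for r in conn.execute(SELF_CONFLICT + " UNION " + CAR_CONFLICT))
```

On this data the union yields tableids 3, 4, 7, 8, 9 and 10, which are the rows the question asked for. ISO-formatted date strings compare correctly as text, which is why plain TEXT columns are enough for this sketch.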
Try to learn to express predicates--what rows state when in tables--if only as the goal for intuitive (sub)querying.
PS It is good to always have data checking edge & non-edge cases for a condition being true & being false. Eg try query 1 with GTR starting on the 31st, an overlap of only one day, which should not be a self-conflict.
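A sketch of that edge-case check, again assuming SQLite through Python's sqlite3 module and renamed from_date/to_date columns. It runs query 1 on just the Navara and GTR rows while varying the GTR start date; the helper name is hypothetical.

```python
import sqlite3

SELF_CONFLICT = """
SELECT DISTINCT t1.tableid
FROM t t1 JOIN t t2
  ON t1.userid = t2.userid
WHERE t1.to_date > t2.from_date AND t2.to_date > t1.from_date
  AND t1.tableid <> t2.tableid
"""

def self_conflicts(gtr_start):
    """Return conflicting tableids when GTR starts on the given date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (tableid INTEGER PRIMARY KEY, userid INTEGER, "
                 "car TEXT, from_date TEXT, to_date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
        (3, 1, 'Navara', '2015-03-01', '2015-03-31'),
        (4, 1, 'GTR', gtr_start, '2015-04-30'),
    ])
    ids = sorted(r[0] for r in conn.execute(SELF_CONFLICT))
    conn.close()
    return ids

overlap_case = self_conflicts('2015-03-28')  # several shared days: conflict
edge_case = self_conflicts('2015-03-31')     # only the hand-over day is shared
gap_case = self_conflicts('2015-04-01')      # no shared day at all
```

Only the first case reports rows 3 and 4; the one-day hand-over and the gap both come back empty.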
PPS Querying involving duplicate rows, as with NULLs, has quite complex query meanings. It's hard to say when a tuple goes in or stays out of a table and how many times. For queries to have the simple intuitive meanings per my correspondences they can't have duplicates. Here SQL unfortunately differs from the relational model. In practice people rely on idioms when allowing non-distinct rows & they rely on rows being distinct because of constraints. Eg joining on UNIQUE columns per UNIQUEs, PKs & FKs. Eg: A final DISTINCT step is only doing work at a different time than a version that doesn't need it; time might or might not be an important implementation issue affecting the phrasing chosen for a given predicate/result.
I'm new to relational databases and MySQL. I am trying to develop a database for employees that logs every time an employee accesses the system (by recording the timestamp of each access).
So when the employee accesses the system, the current timestamp is recorded, and the next time they access it, that current timestamp is also recorded. The idea is that I can go back and query how many times an employee accessed the system in a day, or a week, and so on, for any employee.
so far i have:
EMP_ID | F_Name | L_Name | TimeStamp
-------------------------------------
1222 | joe | blogs | 12.03.22
1222 | joe | blogs | 12.44.34
1352 | carl | mansy | 19.33.22
and so on. I would like to know if there is a way to have just one emp_id show up with all the timestamps below it, or do I need another table? Or can I just keep the database like this?
Obviously this will grow in size a lot, so would it be better to have a table for every emp_id?
Thanks in advance Jonny
You should have 2 tables
first one is the employee table
employee : EMP_ID | F_Name | L_Name
the second one is the log table
employee_log : EMP_ID | TimeStamp
the first table will store the data of the employee
the second will store just the log of this employee
and if you want to retrieve the logs you just need to join between these tables
select * from employee
left join employee_log on employee.EMP_ID = employee_log.EMP_ID
You should have 2 tables: one to store employee data and the other to store the log data
employee : EMP_ID | F_Name | L_Name
employee_log : EMP_ID | TimeStamp
Assuming all log data is stored in a table named my_log_table, to see all timestamps for EMP_ID 1222, query it like
select TimeStamp from my_log_table where EMP_ID = 1222
Having all logs in same table would be scalable and easier to use.
Issues with having multiple log tables:
For each new user, you have to manually create a new table
Privileges have to be granted to this script/user to query the new table created for a new user
If EMP_ID changes, then you have to track and change table names
Moreover, you are not saving any space other than the one EMP_ID column of the combined log table
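A sketch of the single shared log table answering the original "how many times per day" question, assuming SQLite through Python's sqlite3 module and full date-and-time values. The dates are invented for illustration, since the question's sample shows only times, and the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (EMP_ID INTEGER PRIMARY KEY, F_Name TEXT, L_Name TEXT);
CREATE TABLE employee_log (
    EMP_ID INTEGER NOT NULL REFERENCES employee(EMP_ID),
    TimeStamp TEXT NOT NULL
);
-- One index serves every employee; no per-employee tables needed.
CREATE INDEX idx_log_emp_time ON employee_log (EMP_ID, TimeStamp);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1222, 'joe', 'blogs'), (1352, 'carl', 'mansy')])
conn.executemany("INSERT INTO employee_log VALUES (?, ?)", [
    (1222, '2015-06-01 12:03:22'),   # hypothetical dates
    (1222, '2015-06-01 12:44:34'),
    (1222, '2015-06-02 09:15:00'),
    (1352, '2015-06-01 19:33:22'),
])

# Number of accesses per employee per calendar day.
accesses_per_day = conn.execute("""
    SELECT e.EMP_ID, date(l.TimeStamp) AS day, COUNT(*) AS accesses
    FROM employee e
    JOIN employee_log l ON l.EMP_ID = e.EMP_ID
    GROUP BY e.EMP_ID, day
    ORDER BY e.EMP_ID, day
""").fetchall()
```

Adding a new employee is one INSERT into employee; the log table and the query stay unchanged.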
I hope it would be better if you maintain your table as below.
EMP_ID | TimeStamp
1222 | 12.03.22
1222 | 12.44.34
1352 | 19.33.22
So, while retrieving you can display as cross table like below
1222 | 1352 |
12.03.22 | 19.33.22
12.44.34 |
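The cross-table display can be produced in application code after fetching the plain rows. A minimal Python sketch, using the sample values above; the variable names are made up.

```python
from itertools import zip_longest

# Rows as they would come back from the (EMP_ID, TimeStamp) table.
log_rows = [(1222, '12.03.22'), (1222, '12.44.34'), (1352, '19.33.22')]

# Group the timestamps under each employee id, keeping first-seen order.
columns = {}
for emp_id, stamp in log_rows:
    columns.setdefault(emp_id, []).append(stamp)

header = list(columns)
# Pad shorter columns with empty strings so every display row has one cell per employee.
body = list(zip_longest(*columns.values(), fillvalue=''))
```

Keeping the pivot in the presentation layer means the stored table stays one row per access.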
What do you think is the better basis, in the sense of "easier to use" with SQL syntax: the first or the second table?
Please give reasons.
table one:
+----+--------------------------------------+
| id | date1 | date2 | date3 |
+----+------------+------------+------------+
| 1 | 2014-02-15 | 2014-03-24 | 2014-03-24 |
| 2 | NULL | NULL | 2014-08-15 |
| 3 | 2014-06-13 | NULL | NULL |
| 4 | 2014-01-10 | 2014-09-14 | 2014-01-12 |
+----+------------+------------+------------+
table two:
+----+------------+-------+-------+-------+
| id | date | one | two | three |
+----+------------+-------+-------+-------+
| 1 | 2015-07-04 | true | true | false |
| 2 | 2014-06-13 | false | true | false |
| 3 | 2014-11-11 | true | false | false |
| 4 | 2017-03-02 | false | true | true |
+----+------------+-------+-------+-------+
(content of tables doesn't match in this example)
I just want to know if it is easier to deal with when you have just one date field and additional boolean fields instead of multiple date fields. For example if you want to have SELECTs like this
That depends what the dates are.
The fact that two fields are both dates tells us nothing about what they have to do with each other, if anything.
If the three dates are totally unrelated and would never be interchangeable in processing, and if they are a fixed set that is not likely to change frequently, like "birth date", "hire date", and "next annual review date", then I would just make them three separate fields. Then when you write queries it would be very straightforward, like
select employee_id, name from employee where next_annual_review_date='2015-02-01'
On the other hand, if you might quite reasonably write a query that would search all three dates, then it makes sense to break the dates out into another table, with a field that identifies the specific date. Like I created a table once for a warehouse system where there were many dates associated with a stock item -- the date it arrived in the warehouse, the date it was sold, inventoried, returned to the warehouse (because the customer returned it, for example), re-sold, lost, damaged, repaired, etc. These dates could come in many possible orders, and many of them could occur multiple times. Like an item might be damaged, repaired, and then damaged and repaired again, or it could be sold, returned, sold again, and returned again, etc. So I created a table for the stock item with the "static" info like part number, description, and the bazillion codes that the user needed to describe the item, and then a separate "stock event" table with the stock item id, event code, the date, and various other stuff. Then there was another table, an event code table, that listed the event codes with descriptions.
This made it easy to construct queries like, "List everything that has happened to this item in the past four years in date order", or "list all items added to the inventory in November", etc.
Your second table seems like an all-around bad idea. I can't think of any advantage to having 3 Boolean fields rather than one field that says what it is. Suppose the three dates are birth date, hire date, and next review date. You could create codes for these -- maybe 1,2, 3; maybe B, H, R; whatever. Then selecting on a specific event is easy enough either way, I guess: select date where hire = true versus select date where event = 'H'.
But listing multiple dates with a description is much easier with a code. You just need a table of codes and descriptions, and then you write
select employee_name, event_code, date
from employee e
join employee_event ev on ev.employee_id=e.employee_id
join event v on v.event_id=ev.event_id
where ... whatever ...
But with the Booleans, you'd need a three-way case/when.
What happens when new event types are added? With an event code, it's just a data change: add a new record to the event code table. With the Booleans, you need to change the database.
You create the potential for ambiguous data. What happens if two of the Booleans are true, or if none of them are true? What does that mean? There's a whole category of error that can't possibly happen with event codes.
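A sketch of the event-code layout, assuming SQLite through Python's sqlite3 module. The codes B, H and R follow the example above; the P code, the event_date column name and all dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (event_code TEXT PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE employee_event (
    employee_id INTEGER NOT NULL,
    event_code TEXT NOT NULL REFERENCES event(event_code),
    event_date TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO event VALUES (?, ?)", [
    ('B', 'Birth date'), ('H', 'Hire date'), ('R', 'Next review date')])
conn.executemany("INSERT INTO employee_event VALUES (?, ?, ?)", [
    (1, 'B', '1980-05-17'), (1, 'H', '2014-02-15'), (1, 'R', '2015-02-01')])

# Adding a new kind of event is a data change, not a schema change.
conn.execute("INSERT INTO event VALUES ('P', 'Promotion date')")
conn.execute("INSERT INTO employee_event VALUES (1, 'P', '2014-09-14')")

# Everything that has happened to employee 1, in date order, with descriptions.
timeline = conn.execute("""
    SELECT ev.event_date, v.description
    FROM employee_event ev
    JOIN event v ON v.event_code = ev.event_code
    WHERE ev.employee_id = 1
    ORDER BY ev.event_date
""").fetchall()
```

Each row carries exactly one code, so the "two Booleans true" and "no Boolean true" states cannot be represented at all.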
Neither of those is normalized. Normalization is a good way to avoid data anomalies and keep things DRY.
What do your dates represent? What does "one", "two", and "three" represent?
I would go with something like this:
create table my_table (
my_table_id int primary key,
a_more_descriptive_word_than_date date not null,
label text not null
);
The data would look like this:
id date label
1 2014-12-23 one
2 2014-12-24 two
3 2014-12-25 three
In my application I have association between two entities employees and work-groups.
This association usually changes over time, so in my DB I have something like:
employees
| EMPLOYEE_ID | NAME |
| ... | ... |
workgroups
| GROUP_ID | NAME |
| ... | ... |
employees_workgroups
| EMPLOYEE_ID | GROUP_ID | DATE |
| ... | ... | ... |
So suppose I have an association between employee 1 and group 1, valid from 2014-01-01 on.
When a new association is created, for example from 2014-02-01 on, the old one is no longer valid.
This structure for the associative table is a bit problematic for queries, but I would rather avoid adding an END_DATE field to the table, because it would be a redundant value and would also require executing an insert + update, or an update on two rows, every time an association changes.
So do you have any idea for a more practical architecture to solve my problem? Is this the better approach?
You have what is called a slowly changing dimension. That means that you need to have dates in the employees_workgroup table in order to find the right workgroup at the right time for a set of employees.
The best way to handle this is to have two dates, which I often call effdate and enddate, on each row. This greatly simplifies queries where you are trying to find the workgroup at a particular point in time. With this structure, such a query might look like:
select ew.*
from employees_workgroup ew
where MYDATE between effdate and enddate;
Now consider the same results using only one date per row. It might be something like this:
select ew.*
from employees_workgroup ew join
(select employee_id, max(date) as maxdate
from employees_workgroup ew2
where ew2.date <= MYDATE
group by employee_id
) as rec
on ew.employee_id = rec.employee_id and ew.date = rec.maxdate;
The expense of doing an update along with the insert is minimal compared to the complexity this will introduce in the queries.
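A sketch of the update-plus-insert pattern, assuming SQLite through Python's sqlite3 module. The '9999-12-31' sentinel for "still current" and the helper names assign and group_on are assumptions for illustration; a NULL enddate is another common convention.

```python
import sqlite3

OPEN_END = '9999-12-31'  # assumption: sentinel meaning "still current"

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees_workgroup (
    employee_id INTEGER NOT NULL,
    group_id INTEGER NOT NULL,
    effdate TEXT NOT NULL,
    enddate TEXT NOT NULL)""")

def assign(employee_id, group_id, effdate):
    """Close the employee's current row and open a new one in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE employees_workgroup SET enddate = date(?, '-1 day') "
            "WHERE employee_id = ? AND enddate = ?",
            (effdate, employee_id, OPEN_END))
        conn.execute(
            "INSERT INTO employees_workgroup VALUES (?, ?, ?, ?)",
            (employee_id, group_id, effdate, OPEN_END))

def group_on(employee_id, mydate):
    """Return the group the employee belonged to on the given date, if any."""
    row = conn.execute(
        "SELECT group_id FROM employees_workgroup "
        "WHERE employee_id = ? AND ? BETWEEN effdate AND enddate",
        (employee_id, mydate)).fetchone()
    return row[0] if row else None

assign(1, 1, '2014-01-01')
assign(1, 2, '2014-02-01')

lookups = [group_on(1, d) for d in
           ('2013-12-31', '2014-01-15', '2014-01-31', '2014-02-01')]
```

Wrapping both statements in one transaction keeps the old and new rows from ever being visible in a half-updated state.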
I have two tables. One table has some information in each row along with a comma separated list of ids that another table contains. Right now I am grabbing the data from Table A (with the comma separated ids), and I want to also grab all of the data from Table B (the table containing additional information). I would like to do this in the most efficient SQL method possible.
I was thinking about joining Table B to Table A based on the ids IN the field, but I was not sure if this is possible. It is also important to note that I am grabbing data from Table A based on another IN statement, so my ultimate goal is to attach all of the rows in Table B to Table A's rows depending on which ids are in the field in Table A's rows (row by row basis)
If someone could follow all of that and knows what I am trying to do I would appreciate a sample query :D
If you need any further clarification I would be happy to provide it.
Thanks
The way Table A is setup now:
`table_a_id` VARCHAR ( 6 ) NOT NULL,
`table_b_ids` TEXT NOT NULL, -- This is a comma separated list at the moment
-- More data here that is irrelevant to this question but i am grabbing
Table B is setup like this:
`table_b_id` VARCHAR ( 6 ) NOT NULL,
`name` VARCHAR ( 128 ) NOT NULL,
-- More data that is not relevant to the question
Also, I want to eventually switch to a NoSQL system like Cassandra. From what I have briefly read, I understand there are no such things as joins in NoSQL? A bonus would be help setting up these tables so I can convert over with fewer conversions and less difficulty.
You need to add another table.
Person -- your Table A
------
PersonID
Thing -- your Table B
------
ThingID
ThingName
PersonThing -- new intersection table
-------
PersonID
ThingID
Then your query becomes
SELECT * from Person
INNER JOIN PersonThing ON Person.PersonID = PersonThing.PersonID
INNER JOIN Thing ON PersonThing.ThingID = Thing.ThingID
So where now you have
001 | Sam Spade | 12,23,14
You would have
Person
001 | Sam Spade
Thing
12 | box
23 | chair
14 | wheel
PersonThing
001 | 12
001 | 23
001 | 14
This is what the other answers mean by "normalizing".
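A sketch of a one-off migration from the comma-separated column to the intersection table, assuming SQLite through Python's sqlite3 module. The table_a layout follows the question and the Person/Thing/PersonThing names follow this answer; the composite primary key is an added assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (table_a_id TEXT PRIMARY KEY, table_b_ids TEXT NOT NULL);
CREATE TABLE Person (PersonID TEXT PRIMARY KEY);
CREATE TABLE Thing (ThingID TEXT PRIMARY KEY, ThingName TEXT NOT NULL);
CREATE TABLE PersonThing (
    PersonID TEXT NOT NULL REFERENCES Person(PersonID),
    ThingID TEXT NOT NULL REFERENCES Thing(ThingID),
    PRIMARY KEY (PersonID, ThingID)  -- assumption: each pair appears once
);
""")
conn.execute("INSERT INTO table_a VALUES ('001', '12,23,14')")
conn.executemany("INSERT INTO Thing VALUES (?, ?)",
                 [('12', 'box'), ('23', 'chair'), ('14', 'wheel')])

# One-off migration: split each list and write one PersonThing row per id.
old_rows = conn.execute("SELECT table_a_id, table_b_ids FROM table_a").fetchall()
for person_id, id_list in old_rows:
    conn.execute("INSERT INTO Person VALUES (?)", (person_id,))
    conn.executemany(
        "INSERT INTO PersonThing VALUES (?, ?)",
        [(person_id, thing_id.strip())
         for thing_id in id_list.split(',') if thing_id.strip()])

# The join from the answer, restricted to one person.
things_for_001 = [r[0] for r in conn.execute(
    "SELECT Thing.ThingName FROM Person "
    "INNER JOIN PersonThing ON Person.PersonID = PersonThing.PersonID "
    "INNER JOIN Thing ON PersonThing.ThingID = Thing.ThingID "
    "WHERE Person.PersonID = '001' ORDER BY Thing.ThingName")]
```

After the migration the table_b_ids column can be dropped, and the lookup by id becomes an ordinary indexed join.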
Edited to add
From what I understand of NoSQL, you would get around the joins like this:
Person -- your Table A
------
PersonID
OtherPersonStuff
Thing -- your Table B
------
ThingID
ThingName
OtherThingStuff
PersonThing -- denormalized table, one record for each Thing held by each Person
-------
PersonID
ThingID
ThingName
OtherThingStuff
In exchange for taking up extra space (by duplicating the Thing information many times) and potential data management headaches (keeping the duplicates in sync), you get simpler, faster queries.
So your last table would look like this:
PersonThing
001 | 12 | box | $2.00
001 | 23 | chair | $3.00
001 | 14 | wheel | $1.00
002 | 12 | box | $2.00
003 | 14 | wheel | $1.00
In this case OtherThingStuff is the value of the Thing.
You should consider normalizing your database schema in order to use a join. Using comma separated lists will not allow you to use any SQL IN commands.
The best way to do it is to store a row for each unique ID, then you can JOIN on TableA.id = TableB.id