I'm writing an application in PHP that uses a MySQL database to store information. One of the pieces of information that it has to store is an array of names, but the array has no set length. It could range from one to many items. Is there a way to store this array in MySQL and be able to retrieve it by specifying just one item from the array?
For example, I want to be able to store something like this in a table:
Foo       | Bar
----------+-----
[baz,car] | fiz
[abc,def] | ghi
Then, I want to be able to tell MySQL just to search for car and SELECT the first row in the table above.
Is there a feasible way to implement this?
The way to implement it is to "normalize" the schema. There are several related ideas that make up schema normalization, but the critical one here is that your data should be represented with two tables and a relationship between them. The relationship is called "one to many", since for one "Bar" there are one or more "Foo".
A necessary step, then, is to add unique identifier columns to your schema (it is easiest just to use auto-incrementing integers), and then query using the JOIN mechanism to relate your data.
This is why we call MySQL (and many others) a "relational" database.
Bar
+----+------+
| id | name |
+----+------+
| 01 | fiz  |
+----+------+
| 02 | ghi  |
+----+------+
Foo
+----+--------+------+
| id | bar_id | name |
+----+--------+------+
| 01 | 01     | baz  |
+----+--------+------+
| 02 | 01     | car  |
+----+--------+------+
| 03 | 02     | abc  |
+----+--------+------+
| 04 | 02     | def  |
+----+--------+------+
And here is what the join looks like to select the "fiz" record based on "car" in the Foo relation:
SELECT
Bar.*
FROM Bar
JOIN Foo ON Bar.id = Foo.bar_id
WHERE Foo.name = 'car'
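And if you want the entire list of Foo names for each Bar that has a matching Foo, filter the Bars in a subquery (filtering the join itself on Foo.name = 'car' would leave only 'car' in each list):
SELECT
Bar.*,
GROUP_CONCAT(Foo.name) AS foo_names
FROM Bar
JOIN Foo ON Bar.id = Foo.bar_id
WHERE Bar.id IN (SELECT bar_id FROM Foo WHERE name = 'car')
GROUP BY Bar.id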
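The two-table design and both queries can be tried out with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also provides GROUP_CONCAT), an in-memory database, and the helper names `bars_containing` and `full_lists_containing`, all of which are assumptions for illustration; the table and column names mirror the example above.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create the Bar/Foo example tables in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Bar (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Foo (
            id     INTEGER PRIMARY KEY,
            bar_id INTEGER NOT NULL REFERENCES Bar(id),
            name   TEXT NOT NULL
        );
        INSERT INTO Bar (id, name) VALUES (1, 'fiz'), (2, 'ghi');
        INSERT INTO Foo (id, bar_id, name) VALUES
            (1, 1, 'baz'), (2, 1, 'car'), (3, 2, 'abc'), (4, 2, 'def');
        """
    )
    return conn


def bars_containing(conn: sqlite3.Connection, item: str) -> list[str]:
    """Names of every Bar that has a Foo row called `item`."""
    rows = conn.execute(
        "SELECT Bar.name FROM Bar JOIN Foo ON Bar.id = Foo.bar_id WHERE Foo.name = ?",
        (item,),
    ).fetchall()
    return [name for (name,) in rows]


def full_lists_containing(conn: sqlite3.Connection, item: str) -> list[tuple[str, list[str]]]:
    """(Bar name, all of its Foo names) for each Bar whose list contains `item`."""
    rows = conn.execute(
        """
        SELECT Bar.name, GROUP_CONCAT(Foo.name)
        FROM Bar
        JOIN Foo ON Bar.id = Foo.bar_id
        WHERE Bar.id IN (SELECT bar_id FROM Foo WHERE name = ?)
        GROUP BY Bar.id
        ORDER BY Bar.id
        """,
        (item,),
    ).fetchall()
    # GROUP_CONCAT order is not guaranteed, so sort each list for stable output.
    return [(bar, sorted(names.split(","))) for bar, names in rows]


conn = build_demo()
print(bars_containing(conn, "car"))        # ['fiz']
print(full_lists_containing(conn, "car"))  # [('fiz', ['baz', 'car'])]
```

The `?` placeholders keep the searched name out of the SQL string. In PHP with PDO, positional `?` placeholders bound through `execute()` work the same way.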
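Alternatively, if Foo holds a plain comma-separated list such as baz,car (no brackets and no spaces after the commas), MySQL's FIND_IN_SET() can match a single item, although it cannot use an index:
SELECT Bar FROM tbl
WHERE FIND_IN_SET('car', Foo)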
In short: we are trying to return certain results from one table based on second-level criteria of another table.
We have a number of source data tables:
Table DataA:
data_id | columns | stuff....
-----------------------------
1 | here | etc.
2 | here | poop
3 | here | etc.
Table DataB:
data_id | columnz | various....
-----------------------------
1 | there | you
2 | there | get
3 | there | the
4 | there | idea.
Table DataC:
data_id | column_s | others....
-----------------------------
1 | where | you
2 | where | get
3 | where | the
4 | where | idea.
Table DataD: etc. There are more, and more will be added over time.
There is also a relational table of visits, which records "visits" to individual rows in the data tables above.
Each of the above tables holds very different sets of data.
It is currently structured like this:
Visits Table:
visit_id | reference | ref_id | visit_data | columns | notes
-------------------------------------------------------------
1 | DataC | 2 | some data | etc. | so this is a reference
| | | | | to a visit to row id
| | | | | 2 on table DataC
2 | DataC | 3 | some data | etc. | ...
3 | DataB | 4 | more data | etc. | so this is a reference
| | | | | to a visit to row id
| | | | | 4 on table DataB
4 | DataA | 1 | more data | etc. | etc. etc.
5 | DataA | 2 | more data | etc. | you get the idea
Currently we list the visits by various user-given criteria, such as visit date.
However, the user can also choose which tables (i.e. data types) they want to view. For example, a user ticks boxes to show they want data from the DataA and DataC tables but not from DataB.
The SQL we currently use works like this. The list of values in the IN condition is generated dynamically from the user's choices:
SELECT visit_id,columns, visit_data, notes
FROM visits
WHERE visit_date < :maxDate AND visits.reference IN ('DataA','DataC')
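Because the IN list is built from user choices, it is safer to generate one placeholder per ticked table than to paste the names into the SQL string. The sketch below shows this in Python with sqlite3 standing in for MySQL. The function name `list_visits`, the whitelist, the cut-down visits schema and the visit dates are all hypothetical.

```python
import sqlite3

# Hypothetical whitelist of the data tables a user is allowed to tick.
ALLOWED_REFERENCES = {"DataA", "DataB", "DataC"}


def list_visits(conn: sqlite3.Connection, max_date: str, references: list[str]) -> list[int]:
    """visit_ids before max_date whose reference is one of the ticked tables."""
    chosen = sorted(set(references) & ALLOWED_REFERENCES)
    if not chosen:
        return []  # nothing ticked: avoid generating an invalid "IN ()"
    placeholders = ", ".join("?" for _ in chosen)  # one ? per ticked value
    sql = (
        "SELECT visit_id FROM visits "
        f"WHERE visit_date < ? AND reference IN ({placeholders}) "
        "ORDER BY visit_id"
    )
    return [visit_id for (visit_id,) in conn.execute(sql, [max_date, *chosen])]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, reference TEXT,
                         ref_id INTEGER, visit_date TEXT);
    -- ISO dates stored as text compare correctly as strings.
    INSERT INTO visits VALUES
        (1, 'DataC', 2, '2020-11-01'), (2, 'DataC', 3, '2020-11-02'),
        (3, 'DataB', 4, '2020-11-03'), (4, 'DataA', 1, '2020-11-04'),
        (5, 'DataA', 2, '2020-11-05');
    """
)
print(list_visits(conn, "2020-12-06", ["DataA", "DataC"]))  # [1, 2, 4, 5]
```

With PDO the same idea applies. You build one `?` per ticked value and pass the values to `execute()`.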
The Issue:
Now we need to go a step further and list the visits by a sub-criterion of one of the "Data" tables.
For example, the DataA table has a reference to something else. The client now wants to list all visits across several reference types but, when the type is DataA, only count a visit if the referenced DataA row matches a given value.
For example:
List all visits to DataC and all visits to DataA where DataA.stuff = poop
We currently handle this with a second SQL query on the results of the first visit listing shown above. This works, but it always returns the full DataA table when we only want a subset of DataA, and we can't restrict it without affecting the other tables.
We can't use a LEFT JOIN because that doesn't trim the results as needed. We also can't use exclusionary joins (RIGHT / INNER) because they then remove everything from DataC or any other table.
We can't find a way to add conditions to the WHERE clause either because, again, that would lose any data from any table other than DataA.
What we kind of need is a JOIN within an IF/CASE clause.
Pseudo SQL:
SELECT visit_id,columns, visit_data, notes
FROM visits
IF(visits.reference = 'DataA')
INNER JOIN DataA ON visits.ref_id = DataA.id AND DataA.stuff = 'poop'
ENDIF
WHERE visit_date < '2020-12-06' AND visits.reference IN ('DataA','DataC')
All criteria in the WHERE clause are set by the user, none are static (This includes the DataA.stuff criteria too).
So with the above example the output would be:
visit_id | reference | ref_id | visit_data | columns | notes
-------------------------------------------------------------
1 | DataC | 2 | some data | etc. |
2 | DataC | 3 | some data | etc. |
5 | DataA | 2 | more data | etc. |
We can't use UNION because the different Data tables contain lots of different details.
Questions:
There may be a very straightforward answer to this, but I can't see it.
How can we approach trying to achieve this sort of partial exclusivity?
I suspect that our overall architecture could be improved (the system's complexity has grown organically over a number of years). If so, what would be a better way of building this?
What we kind of need is a JOIN within an IF/CASE clause.
Well, you should know that's not possible in SQL.
Think of this analogy to function calls in a conventional programming language. You're essentially asking for something like:
What we need is a function call that calls a different function depending on the value you pass as a parameter.
As if you could do this:
call $somefunction(argument);
And which $somefunction you call would be decided by the function being called, based on the value of argument. That doesn't make sense in any conventional programming language.
It is similar in SQL: the tables and columns are fixed at the time the query is parsed, and rows of data are not read until the query is executed. Therefore you can't change the tables depending on the rows that are read.
The simplest answer would be that you must run more than one query:
SELECT visits.visit_id, visits.columns, visits.visit_data, visits.notes
FROM visits
INNER JOIN DataA ON visits.ref_id = DataA.data_id AND DataA.stuff = 'poop'
WHERE visits.visit_date < '2020-12-06' AND visits.reference = 'DataA';
SELECT visit_id, columns, visit_data, notes
FROM visits
WHERE visit_date < '2020-12-06' AND visits.reference = 'DataC';
Not every task must be done in one SQL query. If it's too complex or difficult to combine two tasks into one query, then leave them separate and write code in the client application to combine the results.
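Here is a sketch of combining the two result sets in the client application. It uses Python with sqlite3 as a stand-in for MySQL. The cut-down schema, the visit dates and the function name `filtered_visits` are assumptions; the filter value 'poop' and the visit/DataA rows come from the question.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DataA (data_id INTEGER PRIMARY KEY, stuff TEXT);
        INSERT INTO DataA VALUES (1, 'etc.'), (2, 'poop'), (3, 'etc.');
        CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, reference TEXT,
                             ref_id INTEGER, visit_date TEXT);
        INSERT INTO visits VALUES
            (1, 'DataC', 2, '2020-11-01'), (2, 'DataC', 3, '2020-11-02'),
            (3, 'DataB', 4, '2020-11-03'), (4, 'DataA', 1, '2020-11-04'),
            (5, 'DataA', 2, '2020-11-05');
        """
    )
    return conn


def filtered_visits(conn: sqlite3.Connection, max_date: str, stuff: str) -> list[tuple]:
    """Run one query per reference type, then merge the rows in Python."""
    data_a = conn.execute(
        """SELECT visits.visit_id, visits.reference, visits.ref_id
           FROM visits
           INNER JOIN DataA ON visits.ref_id = DataA.data_id AND DataA.stuff = ?
           WHERE visits.visit_date < ? AND visits.reference = 'DataA'""",
        (stuff, max_date),
    ).fetchall()
    data_c = conn.execute(
        """SELECT visit_id, reference, ref_id
           FROM visits
           WHERE visit_date < ? AND reference = 'DataC'""",
        (max_date,),
    ).fetchall()
    # Neither query guarantees an order, so sort the merged rows by visit_id.
    return sorted(data_a + data_c)


conn = build_demo()
print(filtered_visits(conn, "2020-12-06", "poop"))
# [(1, 'DataC', 2), (2, 'DataC', 3), (5, 'DataA', 2)]
```

Each extra table the user ticks would simply add one more query to the list being merged.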
I have a table with a column for agent names and a column for each of the skills those agents could possibly have. Each skill the agent is assigned shows a 1 in the field under that skill.
Columns look like this:
+---------+----------+----------+----------+
| Name | 'Skill1' | 'Skill2' | 'Skill3' |
+---------+----------+----------+----------+
| John | 1 | | 1 |
| Sam | 1 | 1 | |
| Roberta | 1 | | 1 |
+---------+----------+----------+----------+
I would like to make a query that returns a list of all agent names that have a 1 for each particular skill. The query would return something like this:
+-----------+
| Skill 1 |
+-----------+
| John |
| Sam |
| Roberta |
+-----------+
Additionally, I would like to be able to query a single name and retrieve all skills that agent has (every skill column with a 1 in that agent's row), like this:
+-----------+
| John |
+-----------+
| Skill 1 |
| Skill 3 |
+-----------+
I've done this in Excel using an index but I'm new to Access and not sure how to complete this task.
Thanks in advance.
One reason you are finding this task difficult is that your database is not normalised. Because of the way it is structured, you are working against MS Access, not with it.
Consequently, whilst a solution is still possible with the current data, the resulting queries will be painful to construct. They will either be full of messy iif statements or consist of several union queries performing the same operations over and over again, one for each 'skill'.
Then, if you ever wish to add another skill to the database, all of your queries have to be rewritten!
Whereas, if your database was normalised (as Gustav has suggested in the comments), the task would be a simple one-liner; and what's more, if you add a new skill later on, your queries will automatically output the results as if the skill had always been there.
Your data has a many-to-many relationship: an agent may have many skills, and a skill may be known by many agents.
As such, the most appropriate way to represent this relationship is using a junction table.
Hence, you would have a table of Agents such as:
tblAgents
+-----+-----------+----------+------------+
| ID | FirstName | LastName | DOB |
+-----+-----------+----------+------------+
| 1 | John | Smith | 1970-01-01 |
| ... | ... | ... | ... |
+-----+-----------+----------+------------+
This would only contain information unique to each agent, i.e. minimising the repeated information between records in the table.
You would then have a table of possible Skills, such as:
tblSkills
+-----+---------+---------------------+
| ID | Name | Description |
+-----+---------+---------------------+
| 1 | Skill 1 | Skill 1 Description |
| 2 | Skill 2 | Skill 2 Description |
| ... | ... | ... |
+-----+---------+---------------------+
Finally, you would have a junction table linking Agents to Skills, e.g.:
tblAgentSkills
+----+----------+----------+
| ID | Agent_ID | Skill_ID |
+----+----------+----------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 3 | 2 |
+----+----------+----------+
Now, say you want to find out which agents have Skill 1. The query is simple:
select Agent_ID from tblAgentSkills where Skill_ID = 1
What if you want to find out the skills known by an agent? Equally simple:
select Skill_ID from tblAgentSkills where Agent_ID = 1
Of course, these queries merely return the ID fields present in the junction table. Since an ID uniquely identifies a record in tblAgents or tblSkills, however, the ID is all you need to retrieve any other required information:
select
tblAgents.FirstName,
tblAgents.LastName
from
tblAgentSkills inner join tblAgents on
tblAgentSkills.Agent_ID = tblAgents.ID
where
tblAgentSkills.Skill_ID = 1
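The junction-table design can be checked end to end with a small sketch in Python, using sqlite3 in place of Access. The rows reproduce the question's wide table (John: Skill 1 and 3, Sam: Skill 1 and 2, Roberta: Skill 1 and 3). Dropping LastName/DOB and naming the helpers `agents_with_skill` and `skills_of_agent` are simplifying assumptions. Note that Access SQL needs parentheses around the first join when more than one JOIN is chained, e.g. `FROM (tblAgentSkills INNER JOIN tblAgents ON ...) INNER JOIN tblSkills ON ...`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblAgents (ID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE tblSkills (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tblAgentSkills (
        ID       INTEGER PRIMARY KEY,
        Agent_ID INTEGER REFERENCES tblAgents(ID),
        Skill_ID INTEGER REFERENCES tblSkills(ID)
    );
    INSERT INTO tblAgents VALUES (1, 'John'), (2, 'Sam'), (3, 'Roberta');
    INSERT INTO tblSkills VALUES (1, 'Skill 1'), (2, 'Skill 2'), (3, 'Skill 3');
    -- One row per 1 in the question's wide table.
    INSERT INTO tblAgentSkills (Agent_ID, Skill_ID) VALUES
        (1, 1), (1, 3), (2, 1), (2, 2), (3, 1), (3, 3);
    """
)


def agents_with_skill(skill_name: str) -> list[str]:
    """First names of all agents linked to the named skill."""
    rows = conn.execute(
        """
        SELECT tblAgents.FirstName
        FROM tblAgentSkills
        INNER JOIN tblAgents ON tblAgentSkills.Agent_ID = tblAgents.ID
        INNER JOIN tblSkills ON tblAgentSkills.Skill_ID = tblSkills.ID
        WHERE tblSkills.Name = ?
        ORDER BY tblAgents.ID
        """,
        (skill_name,),
    ).fetchall()
    return [name for (name,) in rows]


def skills_of_agent(first_name: str) -> list[str]:
    """Names of all skills linked to the given agent."""
    rows = conn.execute(
        """
        SELECT tblSkills.Name
        FROM tblAgentSkills
        INNER JOIN tblAgents ON tblAgentSkills.Agent_ID = tblAgents.ID
        INNER JOIN tblSkills ON tblAgentSkills.Skill_ID = tblSkills.ID
        WHERE tblAgents.FirstName = ?
        ORDER BY tblSkills.ID
        """,
        (first_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(agents_with_skill("Skill 1"))  # ['John', 'Sam', 'Roberta']
print(skills_of_agent("John"))       # ['Skill 1', 'Skill 3']
```

Adding a fourth skill only means inserting rows. Neither query changes.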
To get all agents with Skill1, open the query designer, add the Skills table, select the AgentName field and set the criteria for Skill1 to 1.
This will generate the following SQL:
SELECT Skills.AgentName
FROM Skills
WHERE (((Skills.Skill1)=1));
If you adjust the table and field names to match yours, you can also paste this query into the SQL view of the query designer to get the query you want.
To get all the skills an agent has, I chose a parameterized query. Open the query designer and create a new query on the Skills table with AgentName and the skill fields, using [Agent] as the criteria for AgentName.
When you run this query, it will ask you for the name of the agent; make sure to type the agent name exactly. Here is the resulting SQL:
SELECT Skills.AgentName, Skills.Skill1, Skills.Skill2, Skills.Skill3
FROM Skills
WHERE (((Skills.AgentName)=[Agent]));
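The parameterized query returns every skill column, including the empty ones. If you keep the wide layout, the "one union per skill" approach from the first answer can at least be generated instead of hand-written, so that only the skills flagged 1 come back. Here is a Python/sqlite3 sketch. It assumes a Skills table with AgentName and Skill1..Skill3 columns, as in the SQL above, and the helper names `skills_of` and `agents_with` are hypothetical.

```python
import sqlite3

# Fixed list of flag columns; it has to be edited whenever a skill is added.
SKILL_COLUMNS = ("Skill1", "Skill2", "Skill3")

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Skills (AgentName TEXT, Skill1 INTEGER, Skill2 INTEGER, Skill3 INTEGER);
    INSERT INTO Skills VALUES
        ('John', 1, NULL, 1), ('Sam', 1, 1, NULL), ('Roberta', 1, NULL, 1);
    """
)

# One SELECT per flag column, glued together with UNION ALL, yielding
# (AgentName, Skill) pairs. Column names come from the constant above,
# never from user input, so interpolating them is safe here.
UNPIVOT_SQL = " UNION ALL ".join(
    f"SELECT AgentName, '{col}' AS Skill FROM Skills WHERE {col} = 1"
    for col in SKILL_COLUMNS
)


def skills_of(agent: str) -> list[str]:
    sql = f"SELECT Skill FROM ({UNPIVOT_SQL}) AS pairs WHERE AgentName = ? ORDER BY Skill"
    return [skill for (skill,) in conn.execute(sql, (agent,))]


def agents_with(skill: str) -> list[str]:
    sql = f"SELECT AgentName FROM ({UNPIVOT_SQL}) AS pairs WHERE Skill = ? ORDER BY AgentName"
    return [name for (name,) in conn.execute(sql, (skill,))]


print(skills_of("John"))      # ['Skill1', 'Skill3']
print(agents_with("Skill1"))  # ['John', 'Roberta', 'Sam']
```

The constant `SKILL_COLUMNS` is exactly the maintenance burden the first answer warns about.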
If you continue working with this data, I would improve the table design by splitting your table into a skills table, an agents table and a skills-and-agents table, then linking the skills and agents tables to the skills-and-agents table in a many-to-many relationship. The query to get all of an agent's skills then becomes a simple join across the three tables in the designer.
I need a list of user IDs (course_user_ids) that is currently stored in a single field of a larger table.
I have a table called courses that contains course information with course_id and course_students as such:
-----------------------------------------------------------
| course_id | course_students |
----------------------------------------------------------
| 1 | a:3:{i:0;i:12345;i:1;i:22345;i:2;i:323456;} |
-----------------------------------------------------------
| 2 | a:32:{ … } |
-----------------------------------------------------------
The course_students part contains 3 chunks of information:
the number of students (a:3:{…) -- not needed
the order/key for the array of each student ({i:0;… i:1;… i:2; …}) -- also not needed
the course_user_id (i:12345; … i:22345;… i:323456;)
I only need the course_user_id and the original course_id, resulting in a new table that I can use for joins/subqueries like this:
------------------------------
| course_id | course_user_id |
------------------------------
| 1 | 12345 |
------------------------------
| 1 | 22345 |
------------------------------
| 1 | 323456 |
------------------------------
(ideally able to continue to break out values for other course_ids and course_user_ids, but not a priority:)
| … | … |
------------------------------
| 2 | … |
------------------------------
| 2 | … |
------------------------------
| 97 | … |
------------------------------
| 97 | … |
------------------------------
| … | … |
------------------------------
Note: the course_user_id can vary in length (some are 5 digits, some are 6)
Any ideas would be much appreciated!
Update
My user table does have user_id which can be mapped to course_students or course_user_id, so that is a very helpful observation from below.
I also think I need to use a LEFT JOIN because some students are registered in multiple courses, and I'd like to see each instance/combo.
Let us assume that you have a table named users which contains all user data, including user_id.
Now you can join the courses and users tables in the following manner:
select c.course_id, u.user_id
from
courses c
join users u
on instr(c.course_students, concat(':', u.user_id, ';')) > 0
This gives the result you asked for.
Verify at http://sqlfiddle.com/#!9/3667d/2
Note: the query above works as long as no user_id coincides with an array index; for example, a user with ID 2 would also match the key i:2;. If they can overlap, filter the data further with a WHERE clause.
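Another way around the key/value overlap is to parse the column once in application code and store the result in a proper course_id/course_user_id table. PHP's own unserialize() would do this. Below is an equivalent Python sketch that assumes the exact format shown in the question (`a:N:{i:key;i:value;...}` with integer keys and integer values only). The function name `course_user_pairs` is hypothetical.

```python
import re


def course_user_pairs(course_id: int, serialized: str) -> list[tuple[int, int]]:
    """Turn a PHP-serialized list of integer IDs into (course_id, user_id) rows.

    Assumes every key and every value is an integer (i:...;). Anything else,
    such as string values (s:...), raises ValueError instead of guessing.
    """
    match = re.fullmatch(r"a:(\d+):\{(.*)\}", serialized)
    if match is None:
        raise ValueError(f"not a serialized array: {serialized!r}")
    count, body = int(match.group(1)), match.group(2)
    tokens = re.findall(r"i:(-?\d+);", body)
    if len(tokens) != 2 * count:
        raise ValueError("element count does not match the a:N header")
    values = tokens[1::2]  # keys sit at even positions, user IDs at odd ones
    return [(course_id, int(v)) for v in values]


print(course_user_pairs(1, "a:3:{i:0;i:12345;i:1;i:22345;i:2;i:323456;}"))
# [(1, 12345), (1, 22345), (1, 323456)]
```

Because keys and values are told apart by position, a user whose ID equals an array index cannot be matched by mistake.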
If I understand your goal correctly, you have a users table, and the values in {i:0;i:12345;i:1;i:22345;i:2;i:323456;} correspond to users.id=12345, users.id=22345, etc.
If my guess is correct you can try this solution:
http://sqlfiddle.com/#!9/cfef27/5
SELECT * FROM courses
LEFT JOIN users u
ON courses.course_students LIKE CONCAT('%i:',u.id,';%')
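The LIKE join can be reproduced locally with Python's sqlite3. SQLite concatenates strings with `||`; in MySQL's default mode `||` means logical OR, so keep CONCAT() there. The hypothetical user with ID 2 is not enrolled. It is included only to show the key/value overlap caveat from the previous answer, because the pattern '%i:2;%' also matches the array key i:2;.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, course_students TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    INSERT INTO courses VALUES (1, 'a:3:{i:0;i:12345;i:1;i:22345;i:2;i:323456;}');
    -- User 2 is hypothetical and NOT enrolled; it collides with the key i:2;
    INSERT INTO users VALUES (2), (12345), (22345), (323456);
    """
)

rows = conn.execute(
    """
    SELECT courses.course_id, u.id
    FROM courses
    LEFT JOIN users u
        ON courses.course_students LIKE '%i:' || u.id || ';%'
    ORDER BY u.id
    """
).fetchall()
print(rows)  # [(1, 2), (1, 12345), (1, 22345), (1, 323456)]
```

The first row is a false positive, which is why the ID range of your users matters before relying on either string-matching join.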
So far we have been storing change information as follows.
Imagine a changeset table for something that gets changed, which we call an object. The object is connected to, say, a foreign element by a foreign key. When the object is created, the table looks like this:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
Now we change the name; after the change, the table looks like this:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
2015-04-29 23:30:01 | 2 | null | foo | null
This structure is minimal: each row contains exactly the change we made. But to get the current version of the object, we have to add up all the changes, e.g.:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
2015-04-29 23:30:01 | 2 | null | foo | null
*2015-04-29 23:30:01 | 2 | 123 | foo | none
The * marks the final version, which does not exist in the DB.
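"Adding up the changes" amounts to a fold over the delta rows in changeset order: start from an empty state and overwrite each column whose value is not NULL. This is a minimal Python sketch of that fold. The function name and column labels are illustrative, SQL NULL is modelled as `None`, and the string "none" is an ordinary stored value as in the table above. One limitation of this delta scheme shows up in the code: a NULL cannot mean both "unchanged" and "explicitly cleared".

```python
# Columns that may change between changesets (labels are illustrative).
COLUMNS = ("foreign_key", "name", "description")


def current_version(deltas: list[tuple]) -> dict[int, dict]:
    """Fold delta rows (changeset_id, object_id, *COLUMNS) into the latest state per object."""
    state: dict[int, dict] = {}
    # Timestamps in this format sort correctly as strings.
    for changeset_id, object_id, *fields in sorted(deltas, key=lambda row: row[0]):
        merged = dict(state.get(object_id, {}))
        for column, value in zip(COLUMNS, fields):
            if value is not None:  # NULL means "not changed in this changeset"
                merged[column] = value
        merged["changeset_id"] = changeset_id
        state[object_id] = merged
    return state


deltas = [
    ("2015-04-29 23:28:52", 2, 123, "none", "none"),
    ("2015-04-29 23:30:01", 2, None, "foo", None),
]
print(current_version(deltas)[2])
# {'foreign_key': 123, 'name': 'foo', 'description': 'none', 'changeset_id': '2015-04-29 23:30:01'}
```

The result corresponds to the starred row in the table above.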
So if we only store the changes, we have more work to do, especially when coming from a foreign object f. If I have a number of objects f and want to get all changes to their objects from our table, I have to write some rather ugly SQL. This obviously gets worse the more foreign objects you have.
Basically I have to do:
Select all F that I want and
Select all objects WHERE foreignKey = foreignId
OR Select all objects that have objectId in (Select all objects that have foreignKey = foreignId)
E.g. I have to select the rows that have foreignKey 123, or rows that have foreignKey null but share their objectId with a row whose foreignKey is 123.
Obviously, the more dependencies there are, the uglier this SQL gets.
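For a single foreign key, the selection described above can be expressed with one IN subquery. The `foreignKey = 123 OR` branch is already implied, because every row with foreignKey 123 has an objectId in the subquery's result. This sketch uses Python's sqlite3. The table name `changes`, the helper `changes_for_foreign` and the third row (object 3) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE changes (changesetId TEXT, objectId INTEGER, foreignKey INTEGER,
                          name TEXT, description TEXT);
    INSERT INTO changes VALUES
        ('2015-04-29 23:28:52', 2, 123, 'none', 'none'),
        ('2015-04-29 23:30:01', 2, NULL, 'foo', NULL),
        ('2015-04-29 23:31:00', 3, 456, 'other', 'none');
    """
)


def changes_for_foreign(foreign_id: int) -> list[tuple]:
    """All delta rows of every object that was ever linked to `foreign_id`."""
    return conn.execute(
        """
        SELECT changesetId, objectId, foreignKey, name, description
        FROM changes
        WHERE objectId IN (SELECT objectId FROM changes WHERE foreignKey = ?)
        ORDER BY objectId, changesetId
        """,
        (foreign_id,),
    ).fetchall()


for row in changes_for_foreign(123):
    print(row)
```

Each further level of foreign objects adds another nested subquery, which is the growth the question describes.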
Did I make myself clear?
Wouldn't it be much easier to always keep all fields in every version?
E.g. a simple name change would give:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
2015-04-29 23:30:01 | 2 | 123 | foo | none
Now, to create a diff I have to compare both versions, but I don't have to do the extra work of selecting the right rows or computing the final version at a given timestamp.
What do you consider the proven best solution?
How does SVN do it?
For your use case, the method you suggest seems better. Key-value stores built on LSM trees do much the same: they write a newer version of the object without deleting the older version. If, at any point in time, you need the change that was made, you can just diff two adjacent versions.
The second method might use more space if you have a lot of variable-length text fields, but that's the trade-off you make for speed and maintainability.
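With full snapshots, reading the current version means taking the newest row, and a change log comes from diffing neighbours. This minimal Python sketch shows the diffing step. The dict-based rows and the function name `diff_versions` are assumptions; ignoring changesetId is a design choice so that only data columns are reported.

```python
def diff_versions(older: dict, newer: dict, ignore: tuple = ("changesetId",)) -> dict:
    """Return {column: (old, new)} for every data column whose value changed."""
    columns = (older.keys() | newer.keys()) - set(ignore)
    return {
        column: (older.get(column), newer.get(column))
        for column in sorted(columns)
        if older.get(column) != newer.get(column)
    }


# Full snapshots, one per changeset, as proposed in the question.
history = [
    {"changesetId": "2015-04-29 23:28:52", "objectId": 2, "foreignKey": 123,
     "name": "none", "description": "none"},
    {"changesetId": "2015-04-29 23:30:01", "objectId": 2, "foreignKey": 123,
     "name": "foo", "description": "none"},
]
latest = history[-1]  # the current version is simply the newest snapshot
changes = [diff_versions(a, b) for a, b in zip(history, history[1:])]
print(changes)  # [{'name': ('none', 'foo')}]
```

Selecting by foreignKey also becomes a plain WHERE clause, since every snapshot carries the key.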
Say I have a product_attribute table with the following rows:
================================================================
| product_attribute_id | product_id | name | value |
================================================================
| 1 | 25 | Author | John Doe |
----------------------------------------------------------------
| 2 | 25 | Author | Jane Doe |
----------------------------------------------------------------
| 3 | 55 | Publisher | ABC Corp |
----------------------------------------------------------------
| 4 | 55 | Release Date | 20100125 |
----------------------------------------------------------------
I'm looking into implementing Solr for full-text searching and I think this table potentially has important information that should be indexed. So, I think this table needs to be pivoted (using product_id as the pivot point) so I can combine it with other tables that have information that should be indexed.
Questions:
How do I pivot this in MySQL?
I do not know in advance what all the name/value pairs are going to be. Will this be a problem?
Some attributes have identical names (e.g. "Author" in the example above). Will this be a problem?
That's a pretty standard implementation:
SELECT
product_id,
GROUP_CONCAT(IF(name = 'Author', value, NULL)) AS 'Author',
GROUP_CONCAT(IF(name = 'Publisher', value, NULL)) AS 'Publisher',
GROUP_CONCAT(IF(name = 'Release Date', value, NULL)) AS 'Release Date'
FROM product_attribute
GROUP BY product_id;
You first have to run
SELECT DISTINCT name FROM product_attribute;
so you can build the query above from its results.
But no, it will not keep identical names apart: GROUP_CONCAT concatenates their values, so both Author values for product 25 end up in the single Author column, separated by a comma.
I've seen an implementation that adds a column, populates it with incrementing values and then pivots the table using variables and a counter, but I don't have that in MySQL.
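The two steps described in this answer can be sketched as: read the distinct names, then generate one GROUP_CONCAT column per name. This example uses Python with sqlite3 standing in for MySQL, with CASE WHEN in place of MySQL's IF() so that it runs on both. The helper names are hypothetical. The attribute names used in the comparisons are bound as parameters, and only the column aliases are spliced into the SQL, backtick-quoted (which both MySQL and SQLite accept).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_attribute (
        product_attribute_id INTEGER PRIMARY KEY,
        product_id INTEGER, name TEXT, value TEXT
    );
    INSERT INTO product_attribute VALUES
        (1, 25, 'Author', 'John Doe'), (2, 25, 'Author', 'Jane Doe'),
        (3, 55, 'Publisher', 'ABC Corp'), (4, 55, 'Release Date', '20100125');
    """
)


def quote_identifier(name: str) -> str:
    """Backtick-quote a column alias, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_pivot_sql(conn: sqlite3.Connection) -> tuple[str, list[str]]:
    """Step 1: read the distinct names. Step 2: one GROUP_CONCAT column per name."""
    names = [n for (n,) in conn.execute(
        "SELECT DISTINCT name FROM product_attribute ORDER BY name")]
    select_list = ["product_id"] + [
        f"GROUP_CONCAT(CASE WHEN name = ? THEN value END) AS {quote_identifier(n)}"
        for n in names
    ]
    sql = ("SELECT " + ",\n       ".join(select_list) +
           "\nFROM product_attribute\nGROUP BY product_id\nORDER BY product_id")
    return sql, names  # one bound parameter per generated column


sql, params = build_pivot_sql(conn)
for row in conn.execute(sql, params):
    print(row)
```

New attribute names are picked up automatically on the next run, and repeated names such as Author still collapse into one comma-separated cell, as noted above.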