Find records where CSV column values match

Find records where CSV column values match - mysql

I am making a website. In the database I have a table of articles that kind of looks like this:
id | name | cats | etc.
------------------------------------------------------
1 | "alice" | "this, that, those, them" |
2 | "bob" | "this, that, those" |
3 | "carol" | "this, banana, cupcake" |
4 | "dave" | "other, unrelated, words" |
5 | "errol" | "those, them, fishstick" |
When viewing an article I want to also show some of the most related articles, based on the amount of categories in common.
For example, if I was viewing the Alice article I would want to pick out (in order of preference) Bob (3 cats in common), Errol(2), Carol(1).
I am aware that this would be easier if the data was normalised (I could for example do this) but unfortunately that's not really an option.

I ended up creating a couple of extra tables and populating them with properly normalized data every time something was saved. These run alongside the existing tables so it's not the cleanest of solutions but it works and the query speeds are excellent.

Related

How do I handle linking a record to another table?

I'm very new to Access and my teacher is... hard to follow. So I feel like there's something pretty basic I'm probably missing here. I think the biggest problem I'm having with this question is that I'm struggling to find the words to communicate what I actually need to do, which is really putting a damper on my google-fu.
In terms of what I think I want to do, I want to make a record reference another table in its entirety.
Main
+----+-------+--------+-------+----------------------------+
| PK | Name | Phone# | [...] | Cards |
+----+-------+--------+-------+----------------------------+
| 1 | Bob | [...] | [...] | < Reference to 2nd table > |
| 2 | Harry | [...] | [...] | [...] |
| 3 | Ted | [...] | [...] | [...] |
+----+-------+--------+-------+----------------------------+
Bob's Cards
+----+-------------+-----------+-------+-------+-------+
| PK | Card Name | Condition | Year | Price | [...] |
+----+-------------+-----------+-------+-------+-------+
| 1 | Big Slugger | Mint | 1987 | .20 | [...] |
| 2 | Quick Pete | [...] | [...] | [...] | [...] |
| 3 | Mac Donald | [...] | [...] | [...] | [...] |
+----+-------------+-----------+-------+-------+-------+
This would necessitate an entire new table for each record in the main table though, if it's even possible.
But the only alternative solution I can think of is to add 'Card1, Condition1, [...], Card2, Condition2, [...], Card3, [...]' fields to the main table and having to add another set of fields any time someone increases the maximum number of cards stored.
So I'm sort of left believing there is some other approach I should be taking that our teacher has failed to properly explain. We haven't even touched on forms and reports yet so I don't need to worry about working them in.
Any pointers?
(Also, the entirety of this data and structure is only a rough facsimile of my own, as I'd rather learn how to do it and apply it myself than be like 'here's my data, pls fix.')

Third option successfully found in comments by the helpful Minty.
This depends on a number of things, however to keep it simple you
would normally add one field to the cards table, with an number data
type called CardOwnerID. In your example it would be 1 indicating Bob.
This is known as a foreign key. (FK) - However if you have a table of
cards and multiple possible owners then you need a third table - a
Junction table. This would consist of the Main Person ID and the Card
ID. – Minty

Many to many relationship with different data types

I am trying to create a Database for different types of events. Each event has arbitrary, user created, properties of different types. For example "number of guests", "special song to play", "time the clown arrives". Not every event has a clown but one user could still have different events with a clown. My basic concept is
propID | name | type
------ | ---- | -----
1 |#guest| number
2 |clown | time
and another table with every event with a unique eventID. The Problem is that a simple approach like
eventID | propID | value
------ | ------ | -----
1 | 1 | 20
1 | 2 | 10:00
does not really work because of the different DataTypes.
Now I thought about some possible solutions but I don't really know which one is best, or if there is an even better solution?
1. I store all values as strings and use the datatype in the property table. I think this is called EAV and is not considered good practice.
2. There are only a limited amount of meaningful datatypes, which could lead to a table like this:
eventID | propID | stringVal | timeVal | numberVal
------ | ------ | --------- | ------- | --------
1 | 1 | null | null | 20
1 | 2 | null | 10:00 | null
3. Use the possible datatypes for multiple tables like:
propDateEvent propNumberEvent
-------------------------- --------------------------
eventID | propId | value eventID | propId | value
--------|--------|-------- --------|--------|--------
1 | 2 | 10:00 1 | 1 | 20
Somehow I think every solution has its ups and downs. #1 feels like the simplest but least robust. #3 seems like the cleanest solution, but pretty complicated if I wanted to add e.g. a priority for the properties per event.

All the options you propose are variations on entity/attribute/value or EAV. The basic concept is that you store entities (in your case events), their attributes (#guest, clown), and the values of those attributes as rows, not columns.
There are lots of EAV questions on Stack Overflow, discussing the benefits and drawbacks.
Your 3 options provide different ways of storing the data - but you don't address the ways in which you want to retrieve that data, or verify the data you're about to store. This is the biggest problem with EAV.
How will you enforce the rule that all events must have "#guests" as a mandatory field (for instance)? How will you find all events that have at least 20 guest, and no clown booke? How will you show a list of events between 2 dates, ordered by date, and number of guests?
If those requirements don't matter to you, EAV is fine. If they do, consider using a document to store this user-defined data (JSON or XML). MySQL can query those documents natively, you can enforce business logic much more easily, and you won't have to write horribly convoluted queries for even the simplest business cases.

MySQL Table structure: Multiple attributes for each item

I wanted to ask you which could be the best approach creating my MySQL database structure having the following case.
I've got a table with items, which is not needed to describe as the only important field here is the ID.
Now, I'd like to be able to assign some attributes to each item - by its ID, of course. But I don't know exactly how to do it, as I'd like to keep it dynamic (so, I do not have to modify the table structure if I want to add a new attribute type).
What I think
I think - and, in fact, is the structure that I have right now - that I can make a table items_attributes with the following structure:
+----+---------+----------------+-----------------+
| id | item_id | attribute_name | attribute_value |
+----+---------+----------------+-----------------+
| 1 | 1 | place | Barcelona |
| 2 | 2 | author_name | Matt |
| 3 | 1 | author_name | Kate |
| 4 | 1 | pages | 200 |
| 5 | 1 | author_name | John |
+----+---------+----------------+-----------------+
I put data as an example for you to see that those attributes can be repeated (it's not a relation 1 to 1).
The problem with this approach
I have the need to make some querys, some of them for statistic purpouses, and if I have a lot of attributes for a lot of items, this can be a bit slow.
Furthermore - maybe because I'm not an expert on MySQL - everytime I want to make a search and find "those items that have 'place' = 'Barcelona' AND 'author_name' = 'John'", I end up having to make multiple JOINs for every condition.
Repeating the example before, my query would end up like:
SELECT *
FROM items its
JOIN items_attributes attr
ON its.id = attr.item_id
AND attr.attribute_name = 'place'
AND attr.attribute_value = 'Barcelona'
AND attr.attribute_name = 'author_name'
AND attr.attribute_value = 'John';
As you can see, this will return nothing, as an attribute_name cannot have two values at once in the same row, and an OR condition would not be what I'm searching for as the items MUST have both attributes values as stated.
So the only possibility is to make a JOIN on the same repeated table for every condition to search, which I think it's very slow to perform when there are a lot of terms to search for.
What I'd like
As I said, I'd like to be able to keep the attributes types dynamical, so by adding a new input on 'attribute_name' would be enough, without having to add a new column to a table. Also, as they are 1-N relationship, they cannot be put in the 'items' table as new columns.
If the structure, in your opinion, is the only one that can acheive my interests, if you could light up some ideas so the search queries are not a ton of JOINs it would be great, too.
I don't know if it's quite hard to get it as I've been struggling my head until now and I haven't come up with a solution. Hope you guys can help me with that!
In any case, thank you for your time and attention!
Kind regards.

You're thinking in the right direction, the direction of normalization. The normal for you would like to have in your database is the fifth normal form (or sixth, even). Stackoverflow on this matter.
Table Attribute:
+----+----------------+
| id | attribute_name |
+----+----------------+
| 1 | place |
| 2 | author name |
| 3 | pages |
+----+----------------+
Table ItemAttribute
+--------+----------------+
| item_id| attribute_id |
+--------+----------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
+--------+----------------+
So for each property of an object (item in this case) you create a new table and name it accordingly. It requires lots of joins, but your database will be highly flexible and organized. Good luck!

In my Opinion it should be something like this, i know there are a lot of table, but actually it normilizes your DB
Maybe that is why because i cant understant where you get your att_value column, and what should contains this columns

More efficient to have more columns or more rows?

I'm currently redesigning a database which could contain a lot of data - I have the option to either include a number of different columns in the database or use a lot of rows instead. It's probably easier if I did some kind of outline below:
item_id | user_id | title | description | content | category | template | comments | status
-------------------------------------------------------------------------------------------
1 | 1 | ABC | DEF | GHI | 1 | default | 1 | 1
2 | 1 | ZYX | | QWE | 2 | default | 0 | 1
3 | 1 | A | | RTY | 2 | default | 0 | 0
4 | 2 | ABC | DEF | GHI | 3 | custom | 1 | 1
5 | 2 | CBA | | GHI | 3 | custom | 1 | 1
Versus something in the following structure:
item_id | user_id | attribute | value
---------------------------------------
1 | 1 | title | ABC
1 | 1 | description | DEF
1 | 1 | content | GHI
... | ... | ... | ...
I may want to create additional attributes in the future (50 for arguments sake) - so there could be a lot of empty cells if using multiple columns. The attribute names would be reused, where possible, across different types of content - say a blog entry, event, and gallery - title would easily be reused.
So my question is, is it more efficient to use multiple columns or multiple rows - in terms of query speed and disk space. Or would you instead recommend relationship tables, so there's a table for blogs, a table for events, etc. I'm just trying to come up with an easily expandable solution, where I ideally do not want to create a table for every kind of content as I'm thinking of developers creating new kinds of content via an app/API system (with attributes being tightly controlled).
Supplementary Question if Multiple Rows
How could I, in MySQL, convert multiple rows into a usable column format (I guess temporary tables) - so I could do some filtering by content type, as an example.

Basically, mysql has a variable row length as long as one does not change the on a per table level. Thus, empty cols will not use any space (well, almost).
But with blobs or text columns, it might be better to normalize those, as these may have large data to store and this needs to be read / skipped every time a table is scanned. Even if the column is not in the result set and you're doing queries outside of an index, it will take it's time on a large amount of rows.
As a good practice I think it will be fast to put all administrative and often used cols in one table and normalize all the rest. A kind of "vertical" design as in your second example will be complex to read and as soon as you work with temporary tables you will run in to performance issues sooner or later.

For a traditional row-based store, the cost of spooling through rows will depend on their width, so scanning a table with wide rows will take longer than one with narrow rows.
That said, it you're using an index to locate the rows that are of interest, this won't be so much of an issue.
If you normalise your data by replacing columns with keys to rows in other tables, you can reduce the amount of storage if the linked tables end up being significantly smaller than the original table, however any query will need to include the cost of required joins into the related table.
As with all these things, it's a balancing act that depends on your requirements, but understanding what's going on under the hood can certainly help you to make more informed decisions.

This question is very difficult to answer as it all comes down to what you are looking for and how your database will grow in size and complexity over time. I find the best way to answer these types of questions is to read case studies from other successful sites. For example Reddit would be a case study where they use a lot of rows but very little tables and/or columns. The article is here and a question on it is here.
There is also the option of exploring a NoSQL solution which may be more applicable to what you are trying to achieve.
Google case studies of sites that would have a similar structure to your own and see how they accomplished it as they have most likely encountered all the issues you will and already overcome them.

SQL - database design, need suggestion

I'm building website where each user has different classes
+----+-----------+---------+
| id | subject | user_id |
+----+-----------+---------+
| 1 | Math 140 | 2 |
| 2 | ART 240 | 2 |
+----+-----------+---------+
Each class then will have bunch of Homework files, Class-Papers files and so on.
And here I need your help. What will be the better approach: Build one table like that:
+----+-----------+--------------------------------------------------+--------------+
| id | subject | Homework | Class-Papers |
+----+-----------+-----------------------------------------------------------------+
| 1 | Math 140 | www.example.com/subjects/Math+140/file_name.pdf | bla-bla |
| 2 | Math 140 | www.example.com/subjects/Math+140/file_name.pdf | bla-bla |
| 3 | Math 140 | www.example.com/subjects/Math+140/file_name.pdf | bla-bla |
| 4 | ART 240 | www.example.com/subjects/ART +240/file_name.pdf | bla-bla |
| 5 | ART 240 | www.example.com/subjects/ART +240/file_name.pdf | bla-bla |
+----+-----------+--------------------------------------------------+--------------+
And than just separate the content when I want to display it,
OR build a table for every single subject and than just load necessary table?
Or if you can suggest something better or more common/useful/efficient please go ahead.

You should read about normalization and relational design before attempting this.
This is a one-to-many relationship - model it as such.
A table for every subject is crazy. You'll have to add a new table for every subject.
A better solution will make it possible to add new subjects simply by adding data. That's what the relational model is all about.
Don't worry about tables; think about it in natural language first.
A SUBJECT(calculus) can have many COURSES(differential, integral, multi-variable).
A COURSE(differential calculus) can have many SECTIONs (Mon 9-10 am in room 2 of the math building).
A STUDENT(first name, last name, student id) can sign up for zero or more SECTIONs. The list of SECTIONs for a given STUDENT is a TRANSCRIPT.
Each STUDENT has one TRANSCRIPT per semester (fall 2012).
A SECTION can have zero or more ASSIGNMENTs.
These are the tables you'll need for this simple problem. Worry about the names and how they relate before you start writing SQL. You'll be glad you did.

I most definitely woudl NOT create a separate table for each subject. If you did, then if you wanted a query like "list all the homework for student X", you would have to access different tables depending on which subjects that student was enrolled in. Worse, anytime someone added a new subject, you would have to create a new table. If down the road you decide you need a new attribute of homework, instead of updating one table, you would have to update every one of these subject tables. It's just bad news all around.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Find records where CSV column values match - mysql

I ended up creating a couple of extra tables and populating them with properly normalized data every time something was saved. These run alongside the existing tables so it's not the cleanest of solutions but it works and the query speeds are excellent.

Related

How do I handle linking a record to another table?

Many to many relationship with different data types

MySQL Table structure: Multiple attributes for each item

More efficient to have more columns or more rows?

SQL - database design, need suggestion

Categories

Resources