I’m developing a web app with a search functionality. Generally speaking, the user can search under specific categories or groups of categories. Example:
Mammal (group)
Cat (category)
Dog (category)
Mammal, Cat, and Dog are tables in the DB, and are represented by their own class in the source code. Common fields between Cat and Dog are stored in Mammal; both Cat and Dog have a set of unique fields. I’m trying to figure out the best way to execute a query (or queries) when a user searches under a group (rather than a specific category). For example, the user searches for “all mammals under the age of 4”. As part of the response, I want to return all the fields in the tables belonging to the Mammal category (Cat and Dog, in this case).
Given that the tables Cat and Dog have unique fields, it seems (according to my googling) that I would need to run multiple queries (one for each category). Is this, indeed, the case? If so, what is the most efficient way of doing this? And if not, how would I run such a request with a single query?
Essentially, my question is this: What is the most efficient way of executing a query for the situation I’ve described above?
[EDIT]
DB example w/ queries:
https://www.db-fiddle.com/f/na9ctPmi6CjyDB4MNnjycb/3
In the example in the link above, there are two queries which, together, can get all the data for the user's search (described above). I'm wondering if there's a way to do this with a single query, or at least a single call to the DB.
So far, I've tried the naive approach with the multiple query calls. This works fine (insofar as there aren't any errors). My concern is that when the stored data accumulates to hundreds or thousands of rows, this approach will become too slow. Furthermore, my current approach requires additional data processing in the source code. For example, if the user wants the top 5 results from the search, then I have to get the top 5 results from each table, create an aggregated collection of the DB results, sort the collection, and pick the top 5 results from the sorted collection to return to the user. I'm wondering if there is a way to accomplish all this on the DB side (assuming that it would be faster).
From your db-fiddle:
select * from cat left join mammal on (cat.id = mammal.id) where age < 4
union all
select * from dog left join mammal on (dog.id = mammal.id) where age < 4
A few points:
You only want left join if, for example, you'd want to show lines for mammals that don't have corresponding dogs or cats. In your case, it doesn't seem like that's what you want, so it would be better to use an inner join (or simply join, which defaults to an inner one). That way, if there are no dogs, and only 2 cats, the results would only show 2 lines, instead of 2 cats + 1 non-existing dog.
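Putting that together, here is a hedged sketch of a single query for your search. Since cat and dog have different unique fields, the UNION branches must project a common column list, so I'm assuming the shared columns (including age) live in mammal; a single ORDER BY/LIMIT over the whole union also covers your "top 5 across both tables" case in one DB call. Returning the type-specific fields too in one result set would mean padding each branch with NULLs for the other type's columns.

select mammal.*, 'cat' as species
from cat
join mammal on cat.id = mammal.id
where mammal.age < 4
union all
select mammal.*, 'dog' as species
from dog
join mammal on dog.id = mammal.id
where mammal.age < 4
order by age   -- applies to the whole union in MySQL
limit 5;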
Making one table per mammal type won't scale. What happens if your users now want to add coyotes? Or rabbits? Every single time a new mammal needs to be added to the system, you need to create a new table. The proper way to do this would be to model the animal types and their attributes as data, in a set of generic tables:
A table animal_type, with lines something like this:
animal_type_id | name
Then you'd have a separate table called attribute_type with something like this:
attribute_id | name
A separate table called animals:
animal_id | animal_type_id
And then finally you'd have a separate table called animal_attributes:
id | animal_id | attribute_id | value
Now, you can simply add animals by inserting a row in animals, specifying the animal_type_id, which could be cat/dog/whatever. You simply need to have created the animal_type beforehand. Then, you add attributes to your animal by creating rows in animal_attributes, referencing the animal_id you just created, as well as the proper attribute_id, which could be a shared attribute like color, length, or size, or a cat- or dog-specific field.
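A minimal DDL sketch of that design (MySQL-flavoured; the names come from the answer above, the column types are my assumptions):

create table animal_type (
    animal_type_id int auto_increment primary key,
    name varchar(50) not null
);

create table attribute_type (
    attribute_id int auto_increment primary key,
    name varchar(50) not null
);

create table animals (
    animal_id int auto_increment primary key,
    animal_type_id int not null,
    foreign key (animal_type_id) references animal_type (animal_type_id)
);

create table animal_attributes (
    id int auto_increment primary key,
    animal_id int not null,
    attribute_id int not null,
    value varchar(255),
    foreign key (animal_id) references animals (animal_id),
    foreign key (attribute_id) references attribute_type (attribute_id)
);

-- "all animals under the age of 4", whatever their type;
-- note the cast: this layout stores every value as a string
select a.animal_id
from animals a
join animal_attributes aa on aa.animal_id = a.animal_id
join attribute_type att on att.attribute_id = aa.attribute_id
where att.name = 'age'
  and cast(aa.value as unsigned) < 4;

The flexibility comes at a price: every attribute lookup is a join, and values lose their native types; this is the classic trade-off of the entity-attribute-value pattern.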
So I am building a swingers site. The users can search other users by their interests. This is only part of a number of parameters used to search a user. The thing is there are like 100 different interests. When searching another user they can select all the interests the user must share. While I can think of ways to do this, I know it is important the search be as efficient as possible.
The backend uses JDBC to connect to a MySQL database. Java is the backend programming language.
I have debated using multiple columns for interests, but the thing is, the SQL query need not check them all if those columns are not addressed in the JSON object sent to the server describing the search criteria. Also, I worry I may have to make painful modifications to the table at a later point if I add new columns.
Another thing I thought about was having some kind of long byte array, or a number (used like a byte array), stored in a single column. I could AND (&) this with another number corresponding to the interests the user is searching for, but I read somewhere that this is actually quite inefficient, despite it making good sense to my mind :/
And all of this has to be part of one big SQL query with multiple tables joined into it.
One of the issues with using multiple columns would be the computing power used to run statement.setBoolean on what could be 40 columns.
I also thought about generating an XML string in the client and then processing that in the SQL query.
Any suggestions?
I think the correct term is a bitmask. I could maybe have one table that maps the user's id to the bitmask, for querying users' interests, and another with one entry per interest per user id, for efficiently looking up which user has which interests, if I later require this?
Basically, it would be great to have a separate table with all the interests, 2 columns: id and interest.
Then, have a table that links the users to the interests: user_interests, which would have the following columns: id, user_id, interest_id. Here, some knowledge about many-to-many relations helps a lot.
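A hedged sketch of that schema plus the "must share ALL selected interests" search, using the standard GROUP BY/HAVING trick (the interest ids 3, 7, 42 are placeholders your JDBC code would bind from the JSON criteria):

create table interests (
    id int primary key,
    interest varchar(100) not null
);

create table user_interests (
    id int auto_increment primary key,
    user_id int not null,
    interest_id int not null,
    unique (user_id, interest_id)  -- one row per user/interest pair
);

-- users who have ALL of interests 3, 7 and 42:
select user_id
from user_interests
where interest_id in (3, 7, 42)
group by user_id
having count(distinct interest_id) = 3;  -- 3 = number of interests requested

This joins cleanly into a bigger query against the users table, and adding a new interest is just a new row rather than a schema change.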
Hope it helps!
My site shows collections of links on different subjects. These links are divided into two types: web and images. My database will have millions (probably more than ten million) of these records. When the page loads, I need to show the user the web and image links for the particular subject of that page. So the first question is:
Do I create two separate, smaller tables, one each for the web and image links, and then make a query to each? Or do I create one huge table (with correct indexes) for both and make one query? Where will I get better performance? If the one table and one query turn out to be more efficient, then my next question is:
What would be the most efficient way to subdivide the two types for presentation? Should I use group by, or should I use PHP to divide my result array into the two types?
TIA!
You can get similar performance using one table for all objects, or one table each for image links and web links. If you have two separate tables, doing a UNION of the results would return everything you need.
The main reason to divide them into two tables is whether they are really different (from your application's point of view). That is, if you are going to end up using a lot of queries like
select * from objects where type='image';
then it might make sense to have two tables.
Note that group by is not a way of separating the different results; it is a way of aggregating them.
So, for instance, you can use
select type, count(*) from objects group by type
to get
| image | 100000 |
| web | 2000000 |
but it will not return the objects separated. To get them "grouped", you can either use a query for each one, or use an ordering and then have the logic in the application to divide the results.
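For instance, a sketch of the single-query-plus-ordering variant (subject_id and link are hypothetical column names for "the subject of the page" and the stored URL):

select type, link
from objects
where subject_id = ?
order by type;

-- the application walks the ordered rows once,
-- splitting them into the image list and the web list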
It's possible you'll get slightly better performance from just one table, but this decision should be primarily guided by whether the nature of data or constraints is different or not.
There is another (more important from the performance perspective) decision you'll have to make: how do you want to cluster the data (all InnoDB tables are clustered)?
If you want excellent performance when getting all the links of a given page, use an identifying relationship, producing a natural key in the link table(s).
The LINK table is effectively just a single B-tree, with the page PK [1] at its leading edge, which physically groups together the rows that belong to the same page. The following query can be satisfied by a simple index range scan and minimal I/O:
SELECT URL
FROM LINK
WHERE PAGE_ID = <whatever>
If you used separate tables, you can just have two different queries. Many client APIs support executing two queries in a single database round-trip. If PHP doesn't, you can UNION the two queries to save one database round-trip:
SELECT *
FROM (
    SELECT 1 AS LINK_TYPE, URL
    FROM IMAGE_LINK
    WHERE PAGE_ID = <whatever>
    UNION ALL
    SELECT 2, URL
    FROM WEB_LINK
    WHERE PAGE_ID = <whatever>
) t  -- MySQL requires an alias on a derived table
ORDER BY LINK_TYPE
The above query will give you...
LINK_TYPE URL
1 http://somesite.com/foo.jpeg
1 http://somesite.com/bar.jpeg
1 http://somesite.com/baz.jpeg
...
2 http://somesite.com/foo.html
2 http://somesite.com/bar.html
2 http://somesite.com/baz.html
...
...which will be very easy to separate at the client level.
If you don't use separate tables, you can then separate the URLs by their extension at the client level, or introduce an additional field in the LINK PK: {PAGE_ID, LINK_TYPE, URL}, which should make the following query very efficient:
SELECT LINK_TYPE, URL
FROM LINK
WHERE PAGE_ID = <whatever>
ORDER BY LINK_TYPE
Note that the order of fields in the PK matters: placing the LINK_TYPE at the end (after URL) would prevent the DBMS from satisfying the ORDER BY straight from the index range scan.
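A sketch of that table in MySQL (names and types are my assumptions; the composite PK is what makes InnoDB physically group a page's links together):

create table link (
    page_id   int not null,
    link_type tinyint not null,       -- 1 = image, 2 = web
    url       varchar(512) not null,  -- kept short enough for InnoDB's key-length limit
    primary key (page_id, link_type, url)
) engine = innodb;  -- InnoDB clusters rows by the primary key

-- satisfied by a single index range scan, already in the right order:
select link_type, url
from link
where page_id = ?
order by link_type;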
[1] Whatever it may be; I just used the PAGE_ID as an example.
It depends on how close the web data is to the img data. If the data is basically just the link, one table fits better, with a column to differentiate between web and img (and possibly other types later, like css, js, ...):
Links: (id, link, type)
Adding an index on type, or on (type, link), will help the grouping (by type) and matching searches by (type, link).
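A sketch of that single-table variant (column types assumed):

create table Links (
    id   int auto_increment primary key,
    link varchar(512) not null,
    type enum('web', 'img') not null
);

-- serves both GROUP BY type and searches by (type, link):
create index idx_links_type_link on Links (type, link);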
If, however, web and img data are different in such a way that you don't want to mix apples and oranges, like
Web: (wid, wlink, rating, ...)
Img: (iid, ilink, width, height, mbsize, camera, datetaken, hasexif...)
in this case, besides the link itself, the two tables don't have much in common. Since image links and web links are different, there is not even a "gain" from having the same link row serve both kinds of data. Another advantage (also possible with one table, but which makes more sense here) is to link the two kinds of data in another table:
Relations: (wid,iid)
which allows you to maintain the relation between web sites and images, since an image may be used by several web sites, and a web site may use several images. Index on wid and on iid.
My preference goes to the two tables (with optional Relations link).
Regarding queries from PHP, using UNION you can obtain the data from two tables in one query.
Do I create two separate, smaller tables or one huge table?
Go for one table.
What would be the most efficient way to subdivide the two types for presentation?
It depends on the particular search criteria.
I have a database that has many tables. In this database, there are a subset of tables that store information for similar (but distinct) rows of data, and one table that contains common search attributes that can be applied to each table.
There are 18 columns of searchable variables, and I'm not sure which is the best way to set up the indexes. Do I create a single index over all the pertinent columns, or one index for each?
As you cannot use Solr or similar, you need to emulate it using MySQL.
To do so, you create one table into which you denormalize all the other tables. Note that this table does not replace the other ones; consider it like a VIEW.
For a simple example: I have a product, and that product can come in several color codes.
Normalization requires me to have 3 tables: one for products, one for colors (name | code), and one table to link them all.
Denormalize it into one table: product_code (pk) | name | color_code_1 | color_name_1 | color_code_2 | color_name_2 ... Now I believe it will be easy for you to decide what to index (based on the queries you actually run against that table).
Obviously it isn't optimal, but you need to play with the toys you have.
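A rough sketch of such a denormalized search table (all names and types are illustrative; it would be kept in sync with the normalized tables by your application code):

create table product_search (
    product_code varchar(20) primary key,
    name         varchar(100),
    color_code_1 varchar(10),
    color_name_1 varchar(50),
    color_code_2 varchar(10),
    color_name_2 varchar(50)
    -- ... one pair of columns per supported color slot
);

-- index only the columns your searches actually filter on, e.g.:
create index idx_product_search_name on product_search (name);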
Something else to consider, which is very similar, is using star schemas.
I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I'm stuck on the database table design for handling this functionality. The solution is trivial if we only do this for one type of thing (e.g. photos), but I need to enable it for 5 different things (for now; I assume this number can grow as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is: how do I properly, efficiently, and elastically design the database so that it can store comments for different tables, likes for different tables, and tags for them? A design pattern as an answer would be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged-in user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationship tables: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same goes for comments.
c) I will create tables LikedPhotos [idUser, idPhoto], LikedArticles [idUser, idArticle], LikedPlace [idUser, idPlace]. The number of likes will be calculated by queries (which, I assume, is bad). And...
I really don't like this design for the last part; it smells bad to me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement], and the same for Comments and Tags, with the proper columns for each. Now, when I want to mark a photo as liked, I will insert:
INSERT INTO LikedElement (idUser, idElementType, idLikedElement)
SELECT @userId, idType, @photoId  -- @userId/@photoId come from the app
FROM ElementType
WHERE TypeName = 'Photo';
and for places:
INSERT INTO LikedElement (idUser, idElementType, idLikedElement)
SELECT @userId, idType, @placeId
FROM ElementType
WHERE TypeName = 'Place';
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
Lastly, I also wonder where the best place is to store the counter for how many times an element was liked. I can think of only two ways (both sketched just below this list):
in the element (Photo/Article/Place) table
by SELECT COUNT(*).
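Both options in rough SQL (column and table names are only illustrative):

-- option 1: a counter column on the element, updated in the
-- same transaction that records the like
update Photo set likeCount = likeCount + 1 where idPhoto = @photoId;

-- option 2: compute on demand; simpler, but a full count per read
select count(*)
from LikedElement
where idElementType = @photoTypeId
  and idLikedElement = @photoId;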
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
The entity-relationship term for this is "category" (see the ERwin Methods Guide, section "Subtype Relationships").
Assuming a user can like multiple entities, the same tag can be used for more than one entity, but a comment is entity-specific, your model could look like the sketch shown after the list below.
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the model described above).
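A rough MySQL sketch of that third option (all names are mine, not a prescribed schema): one abstract entity table, concrete tables sharing its primary key, and the like/tag/comment tables referencing only the base:

create table entity (
    entity_id   int auto_increment primary key,
    entity_type enum('photo', 'article', 'place') not null
);

create table photo (
    entity_id int primary key,  -- same value as entity.entity_id
    url varchar(512) not null,
    foreign key (entity_id) references entity (entity_id)
);
-- article and place follow the same pattern

create table `like` (  -- LIKE is a reserved word, hence the backticks
    user_id   int not null,
    entity_id int not null,
    primary key (user_id, entity_id),  -- a user likes an entity at most once
    foreign key (entity_id) references entity (entity_id)
);

Adding a new kind of entity is then a new concrete table (plus an enum value); the like, tag, and comment tables need no changes at all.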
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
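For example, a minimal sketch of one such procedure in MySQL (the comment table and its columns are assumptions, not a given schema):

delimiter //

create procedure add_comment(
    in p_user_id   int,
    in p_entity_id int,
    in p_body      text
)
begin
    -- the schema is known only here, never in application code
    insert into comment (user_id, entity_id, body, created_at)
    values (p_user_id, p_entity_id, p_body, now());
end //

delimiter ;

-- callers never touch the tables directly:
call add_comment(42, 1001, 'Nice photo!');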
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea; please don't pay much attention to the field-name styling, but more to the relations and structure.
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
  AND actions.typeStuff = 'photo'
  AND actions.typeAction = 'comment';
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use COUNT(*) to get just the number of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
  AND actions.typeStuff = 'photo'
  AND actions.typeAction = 'like';
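For reference, the actions table those queries imply would look roughly like this (id_User and content are my additions; the answer's original diagram isn't reproduced here):

create table actions (
    id         int auto_increment primary key,
    id_Stuff   int not null,          -- id of the photo/article/place
    typeStuff  varchar(20) not null,  -- 'photo', 'article', ...
    typeAction varchar(20) not null,  -- 'comment', 'like', 'tag'
    id_User    int not null,          -- who performed the action
    content    text null              -- comment body or tag; null for likes
);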
As far as I understand, several tables are required, with a many-to-many relation between them:
A table which stores user data such as name, surname, and birth date, with an identity field.
A table which stores the data types. These types may be photos, shares, links. Each type must have its own table; therefore, there is a relation between the individual type tables and this one.
Each different data type gets its own table: for example, status updates, photos, links.
The last table is for the many-to-many relation, storing an id, user id, data type, and data id.
Look at the access patterns you are going to need. Do any of them seem to be made particularly difficult or inefficient by one design choice or the other?
If not, favour the one that requires fewer tables.
In this case:
Add comment: you either pick a particular many-to-many table or insert into a common table with a known specific identifier for what is being commented on. I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems a little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using a table per entity for comments etc. More tables mean better sharding and scaling. It's not a problem to manage many similar tables in any framework I know.
One day you'll need to optimize reads from such a structure. You can easily create aggregating tables over the base ones and lose a bit on writes.
One big table with a type dictionary may become unmanageable one day.
Definitely go with the second approach, where you have one table and store the element type for each row; it will give you a lot more flexibility. Basically, when something can logically be done with fewer tables, it is almost always better to go with fewer tables. One advantage that comes to mind for your particular case: if you want to delete all liked elements of a certain user, the first approach needs one query per element type, while the second approach does it in a single query. And when you want to add a new element type, the first approach requires creating a new table for each type, while with the second approach you don't need to change the schema at all.
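For example, the "delete all liked elements of a user" case, using the table names from the question (@userId is a placeholder bound by the application):

-- second approach: one statement covers every element type
delete from LikedElement where idUser = @userId;

-- first approach: one statement per element type
delete from LikedPhotos   where idUser = @userId;
delete from LikedArticles where idUser = @userId;
delete from LikedPlace    where idUser = @userId;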