I'm working on a website that is based around voting for posts. The website contains a 'hot' page, which basically sorts posts by most votes in the last 24 hours.
I've been thinking for the last few days about how to set up the tables, and I'm wondering what the best option would be performance-wise.
At the moment I'm thinking of a table that gets a new row for every vote, holding the date the vote was made and an ID linked to the post; all other details, like title and author, would come from another table. But since the table holding the votes will get really big really fast this way, would it be a good idea to remove all the rows that won't be used anymore after 24 hours?
So basically: is it a good idea to make a cron job that removes all rows created more than 24 hours ago?
It might be a good idea to remove old rows from the table, but remember that SQL databases work fine with millions of rows, as long as the columns you filter on are indexed.
In your case, I'd run a cron job every day that removes the rows created more than 48 hours ago. I think that would be enough.
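As a rough sketch of the setup discussed above (one row per vote, a 'hot' query over the last 24 hours, and a daily cleanup of rows older than 48 hours), something like the following could work. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table name votes, the columns post_id and voted_at, the index, and both helper functions are assumptions made for illustration, and the MySQL syntax would differ slightly.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: one row per vote, as described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE votes (
        id       INTEGER PRIMARY KEY,
        post_id  INTEGER NOT NULL,
        voted_at TEXT    NOT NULL   -- 'YYYY-MM-DD HH:MM:SS' timestamp
    );
    -- The hot-page query filters on time and groups by post.
    CREATE INDEX idx_votes_time_post ON votes (voted_at, post_id);
""")

now = datetime(2024, 1, 10, 12, 0, 0)  # fixed "now" so the example is repeatable


def ts(hours_ago):
    """Timestamp string for a vote cast `hours_ago` hours before `now`."""
    return (now - timedelta(hours=hours_ago)).isoformat(sep=" ")


conn.executemany(
    "INSERT INTO votes (post_id, voted_at) VALUES (?, ?)",
    [(1, ts(1)), (1, ts(2)), (2, ts(3)), (1, ts(30)), (2, ts(50)), (3, ts(72))],
)


def hot_posts(conn, now):
    """Posts ordered by number of votes in the last 24 hours."""
    cutoff = (now - timedelta(hours=24)).isoformat(sep=" ")
    return conn.execute(
        """
        SELECT post_id, COUNT(*) AS votes
        FROM votes
        WHERE voted_at >= ?
        GROUP BY post_id
        ORDER BY votes DESC, post_id
        """,
        (cutoff,),
    ).fetchall()


def purge_old_votes(conn, now, keep_hours=48):
    """What the daily cron job would run: drop votes older than keep_hours."""
    cutoff = (now - timedelta(hours=keep_hours)).isoformat(sep=" ")
    cur = conn.execute("DELETE FROM votes WHERE voted_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


hot = hot_posts(conn, now)
removed = purge_old_votes(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM votes").fetchone()[0]
```

Keeping 48 hours instead of exactly 24 leaves some slack, so the hot page never depends on the cron job running at a precise moment.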
Related
Is it a good idea to store the like count in the following format?
like table:
u_id | post_id | user_id
And then use count(u_id) to get the number of likes for a post?
What if there were thousands of likes for each post? The like table is going to be filled with billions of rows after a few months.
What are other efficient ways to do so?
In short, the answer is: yes, it is OK to store a row for each like any user gives to any post.
But I'd like to split this into several questions:
Q. Is there another way to get count(u_id)? Or, more precisely, an alternative to:
SELECT COUNT(u_id) FROM likes WHERE post_id = ?
A. Why not? You can keep the count in your posts table and increase or decrease it every time a user likes or unlikes the post. You can set up a trigger (or stored procedure) to automate this. Then, to get the counter, you just need:
SELECT counter FROM posts WHERE post_id = ?
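A minimal sketch of that trigger idea follows, using Python's built-in sqlite3 module as a stand-in so it can be run as-is. The trigger names and sample data are made up for illustration, and MySQL's CREATE TRIGGER syntax differs somewhat (it needs FOR EACH ROW, for example), so treat this as an outline rather than something to paste into MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        counter INTEGER NOT NULL DEFAULT 0   -- cached like count
    );
    CREATE TABLE likes (
        u_id    INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(post_id),
        user_id INTEGER NOT NULL
    );

    -- Hypothetical trigger names; they keep posts.counter in step with likes.
    CREATE TRIGGER likes_after_insert AFTER INSERT ON likes
    BEGIN
        UPDATE posts SET counter = counter + 1 WHERE post_id = NEW.post_id;
    END;

    CREATE TRIGGER likes_after_delete AFTER DELETE ON likes
    BEGIN
        UPDATE posts SET counter = counter - 1 WHERE post_id = OLD.post_id;
    END;
""")

conn.execute("INSERT INTO posts (post_id) VALUES (1)")
conn.executemany(
    "INSERT INTO likes (post_id, user_id) VALUES (1, ?)",
    [(10,), (11,), (12,)],
)
conn.execute("DELETE FROM likes WHERE post_id = 1 AND user_id = 11")  # an "unlike"

# Cheap read of the cached counter...
counter = conn.execute(
    "SELECT counter FROM posts WHERE post_id = ?", (1,)
).fetchone()[0]
# ...which should agree with the full count over the likes table.
counted = conn.execute(
    "SELECT COUNT(u_id) FROM likes WHERE post_id = ?", (1,)
).fetchone()[0]
```

The cached counter trades a little extra work on every like for a much cheaper read, which usually suits pages that are viewed far more often than they are liked.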
If you like the previous Q/A and think it is a good idea, I have a follow-up question:
Q. Why do we need likes table then?
A. That depends on your application design and requirements. Judging by the columns you posted (u_id, post_id, user_id; I would even add a timestamp column), your requirement is to store which user liked which post, and when. That means you can tell whether a user has already liked a post and refuse multiple likes. If you don't care about multiple likes, a historical timeline, or stats, you can drop your likes table.
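One way to refuse multiple likes is to let the database enforce it with a unique constraint on (post_id, user_id). The sketch below assumes that approach; the liked_at column and the like() helper are hypothetical, and sqlite3 is used only to keep the example runnable. In MySQL the equivalent would be a UNIQUE KEY on the same two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE likes (
        u_id     INTEGER PRIMARY KEY,
        post_id  INTEGER NOT NULL,
        user_id  INTEGER NOT NULL,
        liked_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- assumed extra column
        UNIQUE (post_id, user_id)   -- at most one like per user per post
    )
""")


def like(conn, post_id, user_id):
    """Record a like; return False if this user already liked this post."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO likes (post_id, user_id) VALUES (?, ?)",
                (post_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = like(conn, 1, 10)    # new like
second = like(conn, 1, 10)   # same user, same post: refused
other = like(conn, 1, 11)    # different user: accepted
total = conn.execute(
    "SELECT COUNT(*) FROM likes WHERE post_id = 1"
).fetchone()[0]
```

The unique index also makes the "has this user already liked this post?" lookup fast.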
Last question I see here:
Q. The like table is going to be filled with billions of rows after a few months, isn't it?
A. I wish you that success, but IMHO you are 99% wrong. To get just 1M records you need 1000 active users (which is a very, very good number for a personal startup; are you building the whole app with no architect or designer involved?), and EVERY one of those users would have to like EVERY one of 1000 posts, if you even have that many.
My point here is: fortunately, you have plenty of time before your database becomes big enough to hurt your application. Until your table reaches 10-20M records, you don't need to worry about size and performance.
I am hosting a forum with "forum gold".
People trade this a lot, gift it, award people with it to "thank" or "like" posts, or to increase reputation.
However, I am concerned that there might be some exploit that allows people to hack gold into their forum account, so I added logging on EVERY forum gold transaction.
It works well. I can perform sum queries to ensure that no unknown sources are introducing forum gold into the system, and that all forum gold awarded to users is accounted for.
However, it totally blew up. Within just a couple of days, I have more than 100,000 entries in the table. I also got mail from my webhost with a slow MySQL query warning for what is just a simple SELECT of a single record from that table: no joins, no ordering, no functions like date_add(), nothing at all.
So I want to completely export AND empty the table with the logs. Now, I normally back up the rest of my database via the "export" feature in phpmyadmin. However, this table is highly active, anywhere from 10 up to 50 new rows are added every second, but I want to keep the integrity and accuracy of my computations by not losing any records.
Is there an "atomic" way I can export then delete all records, with no transactions getting in between?
Okay, so I just ended up:
creating a new TEMP table,
selecting everything from the LOG table,
inserting it into the new TEMP table,
then deleting from LOG every record that also exists in the TEMP table,
exporting the TEMP table,
doing a global replace of "INSERT INTO `temp`" with "INSERT INTO `log`" in the exported file.
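The copy-then-delete steps above could be sketched roughly as follows, with both statements wrapped in one transaction so that a row is only deleted once it has been copied. This is an illustration under assumptions, not the poster's actual code: sqlite3 stands in for MySQL, log_archive plays the role of the TEMP table, and the columns and the archive_log() helper are invented for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are exactly the ones written out.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE log         (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
    CREATE TABLE log_archive (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
""")
conn.executemany(
    "INSERT INTO log (user_id, amount) VALUES (?, ?)",
    [(1, 50), (2, -50), (3, 10)],
)


def archive_log(conn):
    """Copy all current log rows to the archive, then delete only those rows."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("INSERT INTO log_archive SELECT * FROM log")
        # Delete by id, so a row that was not copied is never removed.
        cur = conn.execute(
            "DELETE FROM log WHERE id IN (SELECT id FROM log_archive)"
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return cur.rowcount


moved = archive_log(conn)

# A transaction logged after the archive run stays in the live table.
conn.execute("INSERT INTO log (user_id, amount) VALUES (4, 5)")
live_rows = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
archived_rows = conn.execute("SELECT COUNT(*) FROM log_archive").fetchone()[0]
archived_sum = conn.execute("SELECT SUM(amount) FROM log_archive").fetchone()[0]
```

Because the delete is keyed on the ids that reached the archive, rows inserted while the job runs are left alone and picked up by the next run.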
I'm busy building a matchmaking (dating) site. The tables and queries are set up in a way I'm not happy with. It's still under construction, so I'm happy to change anything in order to get it to work properly. It was very speedy until I decided to fill the database with loads of records to see how it performs.
Beware, this is a rather long post with lots of (I hope the right) information.
I have been googling and reading books for days now, but A) the query performs very slowly (no wonder, I don't think it's any good) and B) I'm not getting any further with it.
So I hope somebody can tell me what I'm doing wrong, what I should be doing, and how to speed the query up. It takes as long as 10-20 seconds to generate a result set, and that's not good, as anybody knows.
The information:
I have a table called 'profiles' with about 500,000 records in it (at this time; there could be more in the future).
Profiles table:
example contents:
Every answer to a question gets inserted into the profiles table as a row. Some questions are multiple choice, so every answer selected by the member is inserted as a new row in the profiles table; other questions can only have one answer, and thus one row in the profiles table.
Also, there is the table 'status', in which I keep a record of members blocking or favoriting each other:
example contents:
Once the member fills in his/her criteria, php dynamically builds the query which needs to fetch records from the profiles table:
You can imagine how big this select query will get if all 90 questions are in the sql statement above..
Explain tells me this:
Basically I want to query the profiles table, fetching members matching the given criteria. The criteria are:
the result(s) should contain the members who match the criteria entered by the member who is searching
the current member (the one who searches) should not be in the result(s)
members who are present in the status table with a status of 'block' should not be in the result(s)
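Since the actual table definitions are not shown here, the following is only a guess at how the three criteria could be expressed in one query. Every name in it (member_id, question_id, answer_id, target_id, the index, and the find_matches() helper) is hypothetical, the block rule is assumed to apply in both directions, and sqlite3 is used purely to make the sketch runnable. The idea is to match rows per question, group by member, and keep only members who matched every question asked about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: one row per (member, question, chosen answer).
    CREATE TABLE profiles (
        member_id   INTEGER NOT NULL,
        question_id INTEGER NOT NULL,
        answer_id   INTEGER NOT NULL,
        PRIMARY KEY (member_id, question_id, answer_id)
    );
    CREATE INDEX idx_profiles_qa ON profiles (question_id, answer_id, member_id);
    CREATE TABLE status (member_id INTEGER, target_id INTEGER, status TEXT);

    INSERT INTO profiles VALUES
        (1, 1, 2), (1, 2, 5),             -- the searcher
        (2, 1, 2), (2, 2, 5),             -- matches both questions
        (3, 1, 3), (3, 2, 6),             -- fails question 2
        (4, 1, 2), (4, 1, 3), (4, 2, 5),  -- matches, but is blocked
        (5, 1, 3), (5, 2, 5);             -- matches, only favorited
    INSERT INTO status VALUES (1, 4, 'block'), (1, 5, 'favorite');
""")


def find_matches(conn, searcher_id, criteria):
    """criteria maps question_id -> acceptable answer_ids (must be non-empty)."""
    conditions, params = [], []
    for question_id, answer_ids in criteria.items():
        marks = ", ".join("?" for _ in answer_ids)
        conditions.append(f"(p.question_id = ? AND p.answer_id IN ({marks}))")
        params.extend([question_id, *answer_ids])
    sql = f"""
        SELECT p.member_id
        FROM profiles AS p
        WHERE ({' OR '.join(conditions)})
          AND p.member_id <> ?                  -- not the searcher
          AND NOT EXISTS (                      -- no block in either direction
              SELECT 1 FROM status AS s
              WHERE s.status = 'block'
                AND ((s.member_id = ? AND s.target_id = p.member_id)
                  OR (s.member_id = p.member_id AND s.target_id = ?))
          )
        GROUP BY p.member_id
        HAVING COUNT(DISTINCT p.question_id) = ?  -- matched every question
        ORDER BY p.member_id
    """
    params.extend([searcher_id, searcher_id, searcher_id, len(criteria)])
    return [row[0] for row in conn.execute(sql, params)]


matches = find_matches(conn, 1, {1: [2, 3], 2: [5]})
```

With this shape the statement grows by one short condition per question instead of one subquery per question, and the bound parameters keep user input out of the SQL text.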
I'm aware (now) that the 'where in' clauses are not very fast, that the indexes could be wrong, and that maybe my whole table layout needs to be different, but I hope somebody can point me in the right direction. After a day of trying and googling, I'm at my wits' end.
If you need more information, just shout! Hopefully somebody can help me.
I'm writing an application to allow users to create a Poll. They ask a question and set n number of predefined answers to the question. Other users can vote on the answers provided for that question.
Current structure
I have designed the database like this:
Storing the vote count
Current thinking is, I create a new column called vote_count on the link table and every time that answer gets voted, it updates the record.
This works. But is it right? I'm new to database systems, so I can't imagine I'm doing much right. What are some more efficient ways to achieve this?
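For what it's worth, the vote_count idea could look something like the sketch below. The table design is not shown here, so the link table name question_answer, its columns, and the cast_vote() helper are all assumptions, and sqlite3 stands in for the real database. The point it illustrates is doing the increment inside a single UPDATE rather than reading the value, adding one in application code, and writing it back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE question_answer (          -- hypothetical link table
        question_id INTEGER NOT NULL,
        answer_id   INTEGER NOT NULL,
        vote_count  INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (question_id, answer_id)
    )
""")
conn.executemany(
    "INSERT INTO question_answer (question_id, answer_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)


def cast_vote(conn, question_id, answer_id):
    """Increment the counter in one statement; False if the answer is unknown."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            UPDATE question_answer
            SET vote_count = vote_count + 1
            WHERE question_id = ? AND answer_id = ?
            """,
            (question_id, answer_id),
        )
    return cur.rowcount == 1


for _ in range(3):
    cast_vote(conn, 1, 1)
cast_vote(conn, 1, 2)
unknown = cast_vote(conn, 1, 99)   # no such answer, nothing is updated

results = conn.execute(
    "SELECT answer_id, vote_count FROM question_answer "
    "WHERE question_id = 1 ORDER BY answer_id"
).fetchall()
```

A counter alone cannot tell who voted, so if one vote per user matters, a separate votes table with a unique constraint would still be needed alongside it.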
As far as it goes yes that's OK. However these tables will be incomplete. When your second quiz is created, you'll have to extend the QUESTIONS table. If this second quiz's Q1 also has a yes/no answer, you're going to have to extend the LINK/VOTES table.
You also have to think about how it's going to be queried and design indexes to support those queries.
Cheers -
I am making a database for employee scheduling. I am, for the first time ever, making a relational MySQL database so that I can efficiently manage all of the data. I have been using MySQL Workbench to help me visualize how this is going to go. Here is what I have so far:
What I have pictured in my head is that, based on the drawing, I would set the schedule in the schedule table, which uses references from the other tables as shown. Then, when I need to display this schedule, I would pull everything from the schedule table. Whenever I've worked with a database in the past, it hasn't been normalized, so I would just enter the data into one table and then pull it back out of that one table. Now that I'm tackling a much larger project, I am sure that having all of the tables split (normalized) like this is the way to go, but I'm having trouble seeing how everything comes together in the end. I have a feeling it doesn't work the way I have it pictured. @grossvogel pointed out what I believe is critical to making this all work: using joins to pull the data.
The reason I started with a relational database was so that if I made a change to (for example) the shift table and instead of record 1 being "AM" I wanted it to be "Morning", it would then automatically change the relevant sections through the cascade option.
The reason I'm posting this here is because I am hoping someone can help fill in the blanks and to point me in the right direction so I don't spend a lot of hours only to find out I made a wrong turn at the beginning.
Maybe the piece you're missing is the idea of using a query with joins to pull in data from multiple tables. For instance (just incorporating a couple of your tables):
SELECT Dept_Name, Emp_Name, Stat_Name ...
FROM schedule
INNER JOIN departments ON schedule.Dept_ID = departments.Dept_ID
INNER JOIN employees ON schedule.Emp_ID = employees.Emp_ID
INNER JOIN status ON schedule.Stat_ID = status.Stat_ID
...
WHERE ....
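To see the join in action, here is a small runnable sketch. The column names follow the query above, but the Sched_ID key, the sample rows, and the rename at the end are made up for illustration, and sqlite3 is used in place of MySQL. It also shows that renaming a value in a lookup table shows up in the joined result right away, because the schedule rows only store the ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (Dept_ID INTEGER PRIMARY KEY, Dept_Name TEXT NOT NULL);
    CREATE TABLE employees   (Emp_ID  INTEGER PRIMARY KEY, Emp_Name  TEXT NOT NULL);
    CREATE TABLE status      (Stat_ID INTEGER PRIMARY KEY, Stat_Name TEXT NOT NULL);
    CREATE TABLE schedule (
        Sched_ID INTEGER PRIMARY KEY,          -- assumed surrogate key
        Dept_ID  INTEGER NOT NULL REFERENCES departments(Dept_ID),
        Emp_ID   INTEGER NOT NULL REFERENCES employees(Emp_ID),
        Stat_ID  INTEGER NOT NULL REFERENCES status(Stat_ID)
    );

    -- Made-up sample data.
    INSERT INTO departments VALUES (1, 'Front Desk'), (2, 'Kitchen');
    INSERT INTO employees   VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO status      VALUES (1, 'Working'), (2, 'Off');
    INSERT INTO schedule (Dept_ID, Emp_ID, Stat_ID) VALUES (1, 1, 1), (2, 2, 2);
""")

JOIN_SQL = """
    SELECT Dept_Name, Emp_Name, Stat_Name
    FROM schedule
    INNER JOIN departments ON schedule.Dept_ID = departments.Dept_ID
    INNER JOIN employees   ON schedule.Emp_ID  = employees.Emp_ID
    INNER JOIN status      ON schedule.Stat_ID = status.Stat_ID
    ORDER BY schedule.Sched_ID
"""

before = conn.execute(JOIN_SQL).fetchall()

# Rename one lookup value; no schedule row has to change.
conn.execute("UPDATE status SET Stat_Name = 'On Shift' WHERE Stat_ID = 1")
after = conn.execute(JOIN_SQL).fetchall()
```

The schedule table never stores the names themselves, which is why a rename needs no cascade: only changes to the key values would.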
Note also that a schedule table that contains all of the information needed to be displayed on the final page is not in the spirit of relational data modeling. You want each table to model some entity in your application, so it might be more appropriate to rename schedule to something like shifts if each row represents a shift. (I usually use singular names for tables, but there are multiple perspectives there.)
This is, frankly, a very difficult question to answer because you could get a million different answers, each with their own merits. I'd suggest you take a look at these (there are probably better links out there too, these just seemed like good points to note) :
http://www.devshed.com/c/a/MySQL/Designing-a-MySQL-Database-Tips-and-Techniques/
http://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form
http://www.sitepoint.com/forums/showthread.php?66342-SQL-and-RDBMS-Database-Design-DO-s-and-DON-Ts
I'd also suggest you try explaining what it is you want to achieve in more detail rather than just post the table structure and let us try to figure out what you meant by what you've done.
Often by trying to explain something verbally you may come to the realisations you need without anyone else's input at all!
One thing I will mention is that you don't have to denormalise a table to report certain values together; you should be considering views for that kind of thing.
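A view for that kind of report could look roughly like this. The shifts table, the Work_Date column, the view name schedule_report, and the sample rows are all assumptions for the sake of the example, and sqlite3 is used here only so the sketch runs without a MySQL server; CREATE VIEW works much the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (Emp_ID   INTEGER PRIMARY KEY, Emp_Name   TEXT NOT NULL);
    CREATE TABLE shifts    (Shift_ID INTEGER PRIMARY KEY, Shift_Name TEXT NOT NULL);
    CREATE TABLE schedule (
        Sched_ID  INTEGER PRIMARY KEY,
        Emp_ID    INTEGER NOT NULL REFERENCES employees(Emp_ID),
        Shift_ID  INTEGER NOT NULL REFERENCES shifts(Shift_ID),
        Work_Date TEXT    NOT NULL
    );

    -- The view stores only the query, not a copy of the data.
    CREATE VIEW schedule_report AS
        SELECT s.Work_Date, e.Emp_Name, sh.Shift_Name
        FROM schedule AS s
        INNER JOIN employees AS e  ON s.Emp_ID   = e.Emp_ID
        INNER JOIN shifts    AS sh ON s.Shift_ID = sh.Shift_ID;

    -- Made-up sample data.
    INSERT INTO employees VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO shifts    VALUES (1, 'Morning'), (2, 'Evening');
    INSERT INTO schedule (Emp_ID, Shift_ID, Work_Date) VALUES
        (1, 1, '2024-01-10'),
        (2, 2, '2024-01-10'),
        (1, 2, '2024-01-11');
""")

# The page that displays the schedule can read the view like a single table.
report = conn.execute(
    "SELECT Work_Date, Emp_Name, Shift_Name FROM schedule_report "
    "ORDER BY Work_Date, Emp_Name"
).fetchall()
```

The tables stay normalized, while the display code gets the flat "one table" shape it is used to.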