I am a website developer and I need help for an analyse: My (future) website is more or less a villa directory. People can add their villas there. Each villa will be stored in database.
I need to show 15 villas per page but I want a "turn over" (not sure it's the correct word in English) of the villas: every hour the villa that appears first on first page becomes the last villa of last page (so every villa rank increase of 1 except the first one that become the last). I want every villa to have the same chance (more or less) to appear on the first page. I don’t want a totally random system.
I need help on how to make a simple system that would not take a lot of resources (should be working with a few millions of records).
Note: I don’t want to use the ID of the villa because if a person posts 3 different villas at the same time, they will be all shown next to each other.
My proposition:
I create a field (INTEGER) called “random_order” for each villa and I put a random number between 0 and Max(INTEGER) and I create an Index on the column “random_order”.
Then to get the records in the order I want, I store (dunno where yet) a variable that point to a record in the index. Then every hours, I increase by 1 this variable (with a modulo).
I’m not an expert on indexes so I’m not really sure if it’s possible to do it and how to do it. I don’t know if there is a better way to do it as well…
Could you please tell me if this is correct or if you have better ideas?
Thank you
Another thing you could do, is store a count variable - from 0 to MAX, and constantly update that. Then query the server for the top 15 villas (using ORDER BY ASC/DESC) on (random_order + count). This will prevent the need to update the column every hour - only the count variable needs to be updated.
EDIT:
First you would get the count (from where you have stored it) and store it in a variable - count.
Then execute a query like
SELECT *, (random_order + <count>)%MAX_VAL AS villa_order
FROM villa_table
ORDER BY villa_order ASC
LIMIT 15
This will prevent constant unnecessary updations to your indexed column.
EDIT 2:
Ok after further analyzing, this is how i would do this.
Execute a simple select query
SELECT * FROM villa_table
WHERE random_order > count
ORDER BY random_order
LIMIT 15
If the number of rows in the result set is < 15 then fill in the remaining records from the beginning using.
SELECT *
FROM villa_table
ORDER BY random_order ASC
LIMIT <number of rows to be filled>
Even on 20m rows on an indexed column this takes < .5s.
Related
I have a table of > 250k rows of 'names' (and ancillary info) which I am displaying using jQuery Datatables.
My Users can choose any 'name' (Row), which is then flagged as 'taken' (and timestamped).
A (very) cut down version of the table is:
Key, Name, Taken, Timestamp
I would like to be able to display the 'taken' rows (in timestamp order) first and then the untaken records in their key order [ASC] next.
The problem would be simple, but, because of size constraints (both visual UI & data set size) My display mechanism paginates - 10 / 20 / 50 / 100 rows (user choice)
Which means a) the total number of 'taken' will vary and b) the pagination length varies.
Thus I can see no obvious method of keeping track of the pagination.
(My Datatable tells me the count of the start record and the length of the displayed records)
My SQL (MySQL) at this level is weak, and I have no idea how to return a record set that accounts for the 'taken' offset without some kind of new (or internal MySQL) numeric indices to paginate to.
I thought of:
Creating a temporary table with the key and a new numeric indices on
each pagination.
Creating a trigger that re-ordered the table when the row was
'taken'.
Having a "Running order" column that was updated on each new 'taken'
Some sort of cursor based procedure (at this point my hair was
ruffled as the explanations shot straight over the top of my head!)
All seem excessive.
I also thought of doing a lot of manipulation in PHP (involving separate queries, dependant on the pagination size, amount of names already taken, and keeping a running record of the pagination position.)
To the Human Computer (Brain) the problem is untaxing - but translating it into SQL has foxed me, as has coming up with a fast alternative to 1-3 (the test case on updating the "Running order" solution took almost three minutes to complete!)
It 'feels' like there should be a smart SQL query answer to this, but all efforts with ORDER BY, LIMITS, and the like fall over unless I return the whole dataset and do a lot of nasty counting.
Is there something like a big elephant in the room I am missing - or am I stuck with the hard slog to get what I need.
A query that displays the 'taken' rows (in timestamp order) first and then the untaken records in their key order [ASC] next:
SELECT *
FROM `table_name`
ORDER BY `taken` DESC, IF(`taken` = 1, `Timestamp`, `Key`) ASC
LIMIT 50, 10
The LIMIT values: 10 is the page size, 50 is the index of the first element on page 6.
Change the condition on IF(taken = 1,Timestamp,Key) with the correct condition to match the values you store in column taken. I assumed you store 1 when the row is 'taken' and 0 otherwise.
I am trying to find an optimal solution for my Database (MySQL), but I'm stuck over the decision whether or not to store a Total column.
This is the simplified version of my database :
I have a Team table, a Game table and a 'Score' table. Game will have {teamId, scoreId,...} while Score table will have {scoreId, Score,...} (Here ... indicates other columns in the tables).
On the home page I need to show the list of Teams with their scores. Over time the number of Teams will grow to 100s while the list of Score(s) will grow to 100000s. Which is the preferred way:
Should I sum up the scores and show along with teams every time the page is requested. (I don't want to cache because the scores will keep changing) OR
Should I have a total_score field in the Team table where I update the total_score of a team every time a new score is added to the Scores table for that group?
Which of the two is a better option or is there any other better way?
I use two guidelines when deciding to store a calculated value. In the best of all worlds, both of these statements will be true.
1) The value must be computationally expensive.
2) The value must have a low probability of changing.
If the cost of calculating the value is very high, but it changes daily, I might consider making a nightly job that updates the value.
Start without the total column and only add it if you start having performance issues.
Calculating sum at request time is better for accuracy but worse for efficiency.
Caching total in a field (dramatically) improves performance of certain queries, but increases code complexity or may show stale data (if you update cached value not at the same time, but via cron job).
It's up to you! :)
I agree that computed values should not be used except for special situations such as month end snapshots of databases.
I would simply create a view with one column in the view equal to your computed total column. Then you can query the view instead of the base tables.
Depending on how often your scores gets updated and what exactly the "score" means
Case1: Score is a LIVE score
If the "score" is the live score like "runs scored in cricket or baseball" or "score of vollyball match or tabletennis" then I really dont understand the need of showing the "sum" of the "running" scores. However, this may be a requirements also in some cases like showing the total runs scored by a team till now + the runs scored so far in the on going (live) game.
In this case I suggest you another option which is combination of your 1st and 2nd option
Total_score in the team table would be good with slight change in your data model. which is
Add a new column in the scores table called LIVE which will be 0 for a finished match 1 for a live match (and optionally -1 indicating match is about to start but the scores wont get update)
Now union two tables something like
select team_id,sum(total_sore) from (
select team_id,total_score from team
union
select team_id,sum(score) total_score from scores where live = 1 group by team_id)subquery
group by team_id
Case2: Score is just a RESULT
Well just query the db directly (your 1st option) as because the result will be updated only after the game ends and the update infact it will be a new entry in the score table.
If my assumption is correct, the scores get updated only after the game is finished. Moreover the update can be even less often when considered the games played by a team.
Question is: How to rank keywords that have been used in search queries in my web application based on time and number of search?
A user types his search query in the text box. Via AJAX I need to return some suggestions to the user. These suggestions are based on number of search done for that keyword, and should be sorted by most recently searched.
For example if a user enters the search term as "hang" the suggestions should be in this order: "hangover part 2", "hangover".
How should I design the database to store the search queries?How should I write the sql query to get the suggestions?
For query suggestion a good way is to count the number of occurrences of each search query (it is probably better to not count repeated queries made by the same user). You'll have a file/table/something (query, count) like this:
"britney spears" 12
"kelly clarkson" 5
"billy joel" 27
"query abcdef" 2
"lady gaga" 39
...
Then you can sort by descending order of occurrence:
"lady gaga" 39
"billy joel" 27
"britney spears" 12
"lady xyz" 5
"query abcdef" 2
...
Then when someone is searching "lady", for example, do a prefix search on all strings from the top of the file/table/something to the bottom. If you only want K suggestions you'll go only until you find the Top-K suggestions.
You could implement this using a simple file, or you can also have a counting query table and do a query similar to:
SELECT q.query from (SELECT * from search_queries order by query_count DESC) as q where q.query LIKE "prefix%" LIMIT 0,K
Two notes:
There are better (and more difficult) ways of doing this. Amazon, for example, has a pretty nice query suggestion.
The provided solution will only suggest queries that starts with the user query. Like:
"lady" => ["lady gaga", "lady xyz"]
Query "lady" won't match "gaga lady". For them to match you will need query indexing, through the Full-Text Search support of your database or an external library such as Lucene.
Ideally, you'd sort on something like the following:
order by sum(# of searches / (how long ago that search was performed + 1))
This would have to be modified so that how long ago would be base on an appropriate base time. For example, if you want searches to count as half after a week, you'd make a week = 1.
This will clearly be inefficient, because calculating how long ago each search was performed for all search results will be time consuming. Thus, you might want to keep a running total for each search and multiply the totals by a certain value each time period. For example, if you want searches to count as half after a week, you would add one to that column for every search. Then, you would have a process that multiplies the search column by .5 every week. Then you just sort on that column.
Do you need something like autosuggestion? There is an JQuery plugin called autocomplete which only looks for similar words as soon as the user types in the letters. However, if you want to get the suggestions based on the number of times that keyword is searched by user, then you need to store the keywords in a separate table and then fetch it later for other user?
I am trying to come up with a database design to hold the "Top 10" results for some calculations that are being done. Basically, when all is said in done, there will be 3 "Top 10" categories, which I am fine with all being separate tables, however I need to be able to go back and later pull historical data about what was in the Top 10 at certain times, hence the need for a database, although a flat-file would work, this has the potential to hold years worth of data.
Now, it's been awhile since I have done anything serious with a database, other than something that had a couple of simple tables, so I am having some issues thinking through this design. If someone could help me with the design of it, I know enough MySQL to get the rest done.
So, in essence, I need to store: A group of 10 names, a % of the total points each name had, the rank they held in the Top 10 and a time associated with that Top 10 (So I can later query for that time)
I would think I need a table for for the Top 10 with 11 columns, one for the ID and 10 for the Foreign Key of the 'Names' table, that holds every name ever used with a PK, Name, %, and Rank. This seems clunky to me, anyone else have a suggestion?
edit:The 'Top 10' is associated with a specific set of data for 5-minute intervals, and each interval is completely independent from the previous or future intervals.
I don't recommend your solution, because then if you want to ask the database "How often has Joe been in the top 10," you have to write 10 queries of the form
SELECT Date FROM Top10 WHERE FirstPlace = 'joe'
SELECT Date FROM Top10 WHERE SecondPlace = 'joe'
...
Instead, how about a Rankings table, with fields:
id
Date
Person
Rank
Then if you want the Top 10 list for a certain date, the query is
SELECT * FROM Rankings WHERE Date = ...
and if you want to know someone's historical ranking, the query is
SELECT * FROM Rankings WHERE Person = ...
and if you want to know all the historical leaders, the query is
SELECT * FROM Rankings WHERE Rank = 1
The downside to this is that you might accidentally make two different people 8th place, and your database would allow the anomaly. But I have good news for you -- people might actually tie for 8th place, so you might actually want that to be possible!
I assume that your "Top 10" is a snapshot data in certain time. And your business logic is that "every 5 minutes" so that the time is the parent entity for table design
top_10_history
th_id - the primary key
th_time - the time point when taking the snapshot data of "Top 10"
top_10_detail
td_th_id - the FK to top_10_history
td_name_id - the FK to name
td_percentage - the "%"
td_rank - the rank
If the sequence of "Top 10" could be calculated from columns in "top_10_detail", you don't need a column to keep the sequence of it. Otherwise, you need a column to persist the sequence for it.
If you need more complicated query such as "The top 10 at 12:00 AM in last 30 days", using individual columns for "day", "hour", and "minute" would be a better idea for performance(with suitable indexes).
Let's say I have a screensaver website. I want to display the CURRENT top 100 screensavers on the front page of the website.
What I mean is, "RECENT" top 100 screensavers. What would be an example query to do this?
My current one is:
SELECT * FROM tbl_screensavers WHERE WEEK(tbl_screensavers.DateAdded) = WEEK('".date("Y-m-d H:i:s",strtotime("-1 week"))."') ORDER BY tbl_screensavers.ViewsCount, tbl_screensavers.DateAdded
This will select the most viewed ("tbl_screensavers.ViewsCount") screensavers that were added ("tbl_screensavers.DateAdded") in the last week.
However, in some cases there are no screensavers, or less than 100 screensavers, submitted in that week.
So, how can I perform a query which would select "RECENT" top 100 screensavers? Hopefully you have an idea of what I'm try to accomplish when I say "RECENT" or "CURRENT" top screensavers. -- aka. the most viewed, recently - not the most viewed, all-time.
Given no other algorithm to weigh the value of a view vs. a recent view, you would just simply want
SELECT * FROM tbl_screensavers ORDER BY ViewsCount limit 100
However, to capture the concept of "recent" you may want to introduce an algorithm to weigh the recent-ness of a particular view. One way to do that is to assign a daysOld score to each view and show the 100 with the lowest score (with this mechanism, low score is good like in golf).
I'm not enough of a MySQL guru to write the query for that, but it would involve summing up the score, computed based on daysOld=today-dateOfScore and then ordering the result set based on that score, with a limit of 100.