I have a table of > 250k rows of 'names' (and ancillary info) which I am displaying using jQuery Datatables.
My Users can choose any 'name' (Row), which is then flagged as 'taken' (and timestamped).
A (very) cut down version of the table is:
Key, Name, Taken, Timestamp
I would like to be able to display the 'taken' rows (in timestamp order) first and then the untaken records in their key order [ASC] next.
The problem would be simple, but, because of size constraints (both visual UI & data set size) My display mechanism paginates - 10 / 20 / 50 / 100 rows (user choice)
Which means a) the total number of 'taken' will vary and b) the pagination length varies.
Thus I can see no obvious method of keeping track of the pagination.
(My Datatable tells me the count of the start record and the length of the displayed records)
My SQL (MySQL) at this level is weak, and I have no idea how to return a record set that accounts for the 'taken' offset without some kind of new (or internal MySQL) numeric indices to paginate to.
I thought of:
Creating a temporary table with the key and a new numeric indices on
each pagination.
Creating a trigger that re-ordered the table when the row was
'taken'.
Having a "Running order" column that was updated on each new 'taken'
Some sort of cursor based procedure (at this point my hair was
ruffled as the explanations shot straight over the top of my head!)
All seem excessive.
I also thought of doing a lot of manipulation in PHP (involving separate queries, dependant on the pagination size, amount of names already taken, and keeping a running record of the pagination position.)
To the Human Computer (Brain) the problem is untaxing - but translating it into SQL has foxed me, as has coming up with a fast alternative to 1-3 (the test case on updating the "Running order" solution took almost three minutes to complete!)
It 'feels' like there should be a smart SQL query answer to this, but all efforts with ORDER BY, LIMITS, and the like fall over unless I return the whole dataset and do a lot of nasty counting.
Is there something like a big elephant in the room I am missing - or am I stuck with the hard slog to get what I need.
A query that displays the 'taken' rows (in timestamp order) first and then the untaken records in their key order [ASC] next:
SELECT *
FROM `table_name`
ORDER BY `taken` DESC, IF(`taken` = 1, `Timestamp`, `Key`) ASC
LIMIT 50, 10
The LIMIT values: 10 is the page size, 50 is the index of the first element on page 6.
Change the condition on IF(taken = 1,Timestamp,Key) with the correct condition to match the values you store in column taken. I assumed you store 1 when the row is 'taken' and 0 otherwise.
Related
What I want to achieve:
I am developing website with a catalog of products.
This is normalized model (simplified) of entities which are related to my question:
So some product features exist (like size and type in this example), which all have predefined sets of values (e.g. sizes 1, 2 and 3 exist, and type may be 1, 2 or 3 (these sets do not have to be equal, just example.)).
Relationship between Product and each of features is "many-to-many" - different values of one feature do not exclude each other.
My task is to build form which will allow user to filter search results, based on features of products. Example screenshot:
Multiple checked values of one feature are mixed using "AND" logic, so if I have sizes One and Three checked, I need all products, which have both sizes (+ may have any other sizes, that doesn't matter, but selected ones must be present).
Number near each value of feature represents amount of products, which is returned if user checks this value right now. So it is effectively a number of products satisfying filter "current active filter + this one value applied".
When user checks/unchecks any value, counters must be updated considering new "current filter".
Problem:
Real use case is: ~200k products, ~6 features with ~5-15 values each.
My COUNT queries, (especially with decent number of selected options) are too slow, and to render the form I need as many of these counts as there are values of all filters - in total that gives unacceptable response time.
What I have tried:
Query to retrieve results:
select * from products p, product_size ps
where p.id = ps.product_id
and (ps.size_id IN (1, 2, 3, 5))
group by p.id
having count(p.id) = 4;
(this is to select products which have sizes 1, 2, 3 and 5 at the same time).
It completes in ~0.360 sec on 120k products, almost same time with COUNT wrapped around it. And this query does not allow more than one feature (but I could place values of all features in one table).
Another query to retrieve the same set:
SELECT ps1.product_id
FROM product_size AS ps1, (SELECT id FROM size AS s1 WHERE id IN (1, 2, 3, 5)) AS t
WHERE ps1.size_id = t.id
GROUP BY ps1.product_id
HAVING COUNT(ps1.size_id) = (SELECT COUNT(id) FROM (SELECT id FROM size AS s2 WHERE id IN (1, 2, 3, 5)) AS t2);
It completes in ~0.230 sec (same time when wrapped in COUNT) and does not allow multiple features too.
It is modified query I found here: https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/ (second query in "Division with a Remainder" part).
Alternative schema:
Denormalized model, where value of each feature is a boolean column in Products table.
The query is obvious here:
select * from products
where `size_1` = 1 and `size_2` = 1
and `size_3` = 1 and `size_5` = 1;
Weird and harder to maintain in application's code, but completes in ~0.056 sec when COUNT-ing.
None of these methods are acceptable per se because multiplied ~30 times (to populate all counters in form) that gives inadequate response time.
Caching and precomputing
Data in DB is going to be updated only few times a day (like, may be, even 2), so I could probably precompute counts for all combinations of filters when data is updated (I haven't measured necessary time to be honest), but it is anyway not going to work too - search form has fields with arbitrary values (like min/max price and text search by the product's name), which I can't precompute for.
Load counters in form dynamically
Render form, but fetch numbers through AJAX, so user would be able to see page, and then, after quite long waiting, numbers. This is my last thought, but it seems like poor quality of service for me (may be it is worse than no counters at all).
I am stuck. Any hints? May be I am not seeing some bigger picture? I would be very glad to any advice.
UPDATE: if we forget about counters, what is the effective and usually used way (query) for just retrieving results with such a filters (or what am I doing wrong)? Like "find post with all requested tags" model, that is equivalent. I suspect it can be faster than my 0.230 sec (query #2), considering small (?) amount of rows for MySQL.
You can
Create one table which will store all possible combinations (product_id <> size_id <> type_id)
Update this table when Admin will make any changes in product from backend (assuming there will be a backend management)
In frontend, for filters, use this table instead of product tables, and extract product ids once filter query is fired
Once you have list of product ids for result, you can fetch actual data by using those product Ids
I have used this before, and it worked for me, you can first make table and try running query to check response time.
Hope this helps.
I am trying to find an optimal solution for my Database (MySQL), but I'm stuck over the decision whether or not to store a Total column.
This is the simplified version of my database :
I have a Team table, a Game table and a 'Score' table. Game will have {teamId, scoreId,...} while Score table will have {scoreId, Score,...} (Here ... indicates other columns in the tables).
On the home page I need to show the list of Teams with their scores. Over time the number of Teams will grow to 100s while the list of Score(s) will grow to 100000s. Which is the preferred way:
Should I sum up the scores and show along with teams every time the page is requested. (I don't want to cache because the scores will keep changing) OR
Should I have a total_score field in the Team table where I update the total_score of a team every time a new score is added to the Scores table for that group?
Which of the two is a better option or is there any other better way?
I use two guidelines when deciding to store a calculated value. In the best of all worlds, both of these statements will be true.
1) The value must be computationally expensive.
2) The value must have a low probability of changing.
If the cost of calculating the value is very high, but it changes daily, I might consider making a nightly job that updates the value.
Start without the total column and only add it if you start having performance issues.
Calculating sum at request time is better for accuracy but worse for efficiency.
Caching total in a field (dramatically) improves performance of certain queries, but increases code complexity or may show stale data (if you update cached value not at the same time, but via cron job).
It's up to you! :)
I agree that computed values should not be used except for special situations such as month end snapshots of databases.
I would simply create a view with one column in the view equal to your computed total column. Then you can query the view instead of the base tables.
Depending on how often your scores gets updated and what exactly the "score" means
Case1: Score is a LIVE score
If the "score" is the live score like "runs scored in cricket or baseball" or "score of vollyball match or tabletennis" then I really dont understand the need of showing the "sum" of the "running" scores. However, this may be a requirements also in some cases like showing the total runs scored by a team till now + the runs scored so far in the on going (live) game.
In this case I suggest you another option which is combination of your 1st and 2nd option
Total_score in the team table would be good with slight change in your data model. which is
Add a new column in the scores table called LIVE which will be 0 for a finished match 1 for a live match (and optionally -1 indicating match is about to start but the scores wont get update)
Now union two tables something like
select team_id,sum(total_sore) from (
select team_id,total_score from team
union
select team_id,sum(score) total_score from scores where live = 1 group by team_id)subquery
group by team_id
Case2: Score is just a RESULT
Well just query the db directly (your 1st option) as because the result will be updated only after the game ends and the update infact it will be a new entry in the score table.
If my assumption is correct, the scores get updated only after the game is finished. Moreover the update can be even less often when considered the games played by a team.
I am a website developer and I need help for an analyse: My (future) website is more or less a villa directory. People can add their villas there. Each villa will be stored in database.
I need to show 15 villas per page but I want a "turn over" (not sure it's the correct word in English) of the villas: every hour the villa that appears first on first page becomes the last villa of last page (so every villa rank increase of 1 except the first one that become the last). I want every villa to have the same chance (more or less) to appear on the first page. I don’t want a totally random system.
I need help on how to make a simple system that would not take a lot of resources (should be working with a few millions of records).
Note: I don’t want to use the ID of the villa because if a person posts 3 different villas at the same time, they will be all shown next to each other.
My proposition:
I create a field (INTEGER) called “random_order” for each villa and I put a random number between 0 and Max(INTEGER) and I create an Index on the column “random_order”.
Then to get the records in the order I want, I store (dunno where yet) a variable that point to a record in the index. Then every hours, I increase by 1 this variable (with a modulo).
I’m not an expert on indexes so I’m not really sure if it’s possible to do it and how to do it. I don’t know if there is a better way to do it as well…
Could you please tell me if this is correct or if you have better ideas?
Thank you
Another thing you could do, is store a count variable - from 0 to MAX, and constantly update that. Then query the server for the top 15 villas (using ORDER BY ASC/DESC) on (random_order + count). This will prevent the need to update the column every hour - only the count variable needs to be updated.
EDIT:
First you would get the count (from where you have stored it) and store it in a variable - count.
Then execute a query like
SELECT *, (random_order + <count>)%MAX_VAL AS villa_order
FROM villa_table
ORDER BY villa_order ASC
LIMIT 15
This will prevent constant unnecessary updations to your indexed column.
EDIT 2:
Ok after further analyzing, this is how i would do this.
Execute a simple select query
SELECT * FROM villa_table
WHERE random_order > count
ORDER BY random_order
LIMIT 15
If the number of rows in the result set is < 15 then fill in the remaining records from the beginning using.
SELECT *
FROM villa_table
ORDER BY random_order ASC
LIMIT <number of rows to be filled>
Even on 20m rows on an indexed column this takes < .5s.
I am trying to come up with a database design to hold the "Top 10" results for some calculations that are being done. Basically, when all is said in done, there will be 3 "Top 10" categories, which I am fine with all being separate tables, however I need to be able to go back and later pull historical data about what was in the Top 10 at certain times, hence the need for a database, although a flat-file would work, this has the potential to hold years worth of data.
Now, it's been awhile since I have done anything serious with a database, other than something that had a couple of simple tables, so I am having some issues thinking through this design. If someone could help me with the design of it, I know enough MySQL to get the rest done.
So, in essence, I need to store: A group of 10 names, a % of the total points each name had, the rank they held in the Top 10 and a time associated with that Top 10 (So I can later query for that time)
I would think I need a table for for the Top 10 with 11 columns, one for the ID and 10 for the Foreign Key of the 'Names' table, that holds every name ever used with a PK, Name, %, and Rank. This seems clunky to me, anyone else have a suggestion?
edit:The 'Top 10' is associated with a specific set of data for 5-minute intervals, and each interval is completely independent from the previous or future intervals.
I don't recommend your solution, because then if you want to ask the database "How often has Joe been in the top 10," you have to write 10 queries of the form
SELECT Date FROM Top10 WHERE FirstPlace = 'joe'
SELECT Date FROM Top10 WHERE SecondPlace = 'joe'
...
Instead, how about a Rankings table, with fields:
id
Date
Person
Rank
Then if you want the Top 10 list for a certain date, the query is
SELECT * FROM Rankings WHERE Date = ...
and if you want to know someone's historical ranking, the query is
SELECT * FROM Rankings WHERE Person = ...
and if you want to know all the historical leaders, the query is
SELECT * FROM Rankings WHERE Rank = 1
The downside to this is that you might accidentally make two different people 8th place, and your database would allow the anomaly. But I have good news for you -- people might actually tie for 8th place, so you might actually want that to be possible!
I assume that your "Top 10" is a snapshot data in certain time. And your business logic is that "every 5 minutes" so that the time is the parent entity for table design
top_10_history
th_id - the primary key
th_time - the time point when taking the snapshot data of "Top 10"
top_10_detail
td_th_id - the FK to top_10_history
td_name_id - the FK to name
td_percentage - the "%"
td_rank - the rank
If the sequence of "Top 10" could be calculated from columns in "top_10_detail", you don't need a column to keep the sequence of it. Otherwise, you need a column to persist the sequence for it.
If you need more complicated query such as "The top 10 at 12:00 AM in last 30 days", using individual columns for "day", "hour", and "minute" would be a better idea for performance(with suitable indexes).
I found a weard problem with my MySQL DB.
sometime when I insert new data into it, the way it arranges the data is like a stack, for example
4 (newest)
3
2
1 (oldest)
...
how can I make it arranged like this?
1 (newest)
2
3
4 (oldest)
thanks all.
SELECT *
FROM TABLE
ORDER BY ID
You have to remember that when viewing/selecting data from a table without any ORDER BY specified does not garuantee any specific order.
The way you are view the data (unordered) can be due to any one of a lot of factos (the database engine, schema, page storage, page fragmentation, indexes, primary keys, or simply execution plan optimization).
The SQL standards specifically states that tables do not have a "natural" order. Therefore, the database engine is free to return a request without an ORDER BY in any order it wants to. The order may change from one request to another because most engines choose to return the data in whatever order they can get it to you the most rapidly.
It follows, therefore, that if you want the data out in a particular order you must include a column in your table whose job is to proxy for the order in which you added records to the table. Two common ways of doing this are using an autoincrement field which will be in numerical order from oldest to newest record, and a TIMESTAMP column which does just what it says. Once you have such a column you can use ORDER BY ColumnName when searching to get an ordered result set.