I have a simple MySQL table made up of words and an associated number. The numbers are unique for each word. I want to find the first word whose index is larger than a given number. As an example:
-----------------------
| WORD: | F_INDEX: |
|---------------------|
| a | 5 |
| cat | 12 |
| bat | 4002 |
-----------------------
If I was given the number "9" I would want "cat" returned, as it is the first word whose index is larger than 9.
I know that I can get a full list of sorted rows by querying:
SELECT * FROM table_name ORDER BY f_index;
But would, instead, like to make a MySQL query that does this. (The confusion lies in the fact that I'm unsure as to how to keep track of the current row in my query). I know can loop with something like this:
CREATE PROCEDURE looper(desired_index INT)
BEGIN
DECLARE current_index int DEFAULT 0
// Loop here, setting current_index to whatever the next rows index is,
// then do a comparison to check it to our desired_index, breaking out
// if it is greater.
END;
Any help would be greatly appreciated.
Try this:
SELECT t.word
, t.f_index
FROM table_name t
WHERE t.f_index > 9
ORDER
BY t.f_index
LIMIT 1
It is much more efficient to have the database return the row you need, than it is to pull a whole bunch of rows and figure out which one you need.
For best performance of this query, you will want an index ON table_name (f_index,word).
Why don't you just use MYSQL statement to retrieve the first item you found from f_index where the f_index is greater than the value your pass in.
For example :
select word from table_name
where f_index > desired_index
order by f_index
limit 1
Related
What I mean by literal order is that, altough the IDs are auto-increment, through business logic, it might end up that 8 comes after 4 when 5 should've been there. That is to say, if a deletion if ID happens, there's no re-indexing
This is how my rows look (table name is wp_posts):
+-----+-------------+----+--+--+--+
| ID | post_author | .. | | | |
+-----+-------------+----+--+--+--+
| 4 | .. | | | | |
+-----+-------------+----+--+--+--+
| 8 | .. | | | | |
+-----+-------------+----+--+--+--+
| 124 | .. | | | | |
+-----+-------------+----+--+--+--+
| 672 | .. | | | | |
+-----+-------------+----+--+--+--+
| 673 | .. | | | | |
+-----+-------------+----+--+--+--+
| 674 | .. | | | | |
+-----+-------------+----+--+--+--+
ID is an int that has the auto-increment characteristic, but when a post is deleted, there is no re-assignment of IDs. It will just simply get deleted and because it's auto-increment, you can still assume that, vertically, the items that come after the one you're looking at are always bigger than the ones before.
I'm querying for ID: SELECT ID FROM wp_posts to get a list of all the IDs I need. Now, it just so happens that I need to batch all of this, using AJAX requests because once I retrieve the IDs, I need to operate on them.
Thing is, I don't really understand how to pass my data back to AJAX. What LIMIT does is, if I provide 2 arguments, such as: SELECT ID FROM wp_posts LIMIT 1,3, it'll return back 4,8,124 because it looks at row number. But what do I do on the next call? Yes, the first call always starts with 1, but once I need to launch the second AJAX request to perform yet another SELECT, how do I know where I should start? In my case, I'd want to start again at 4, so, my second query would be SELECT ID FROM wp_posts LIMIT 4, 7 and so on.
Do I really need to send that counter (even if I can automate it, since, you see, it's an increment of 3) back?
Is there no way for SQL to handle this automatically?
You have many confusions in your question. Let me try to clear up some basic ones.
First, the auto-incremented key is the primary key for the table. You do not need to worry about gaps. In fact, the key should basically be meaningless. It fulfills the following:
It is guaranteed to be unique.
It is guaranteed to be in insertion order.
Gaps are allowed and of no concern. There is no re-indexing. It is a bad idea because:
Primary keys uniquely identify each row and this mapping should be consistent across time.
Primary keys are used in other tables to refer to values, so re-indexing would either invalidate those relationships or require massive changes to many tables.
Re-indexes pre-supposes that the value means something, when it doesn't.
Second, a query such as:
SELECT ID
FROM wp_posts
LIMIT 1, 3;
Can return any three rows. Why? Because you have no specified an ORDER BY and SQL result sets without ORDER BY are unordered. There are no guarantees. So you should always be in the habit of using an ORDER BY.
Third, if you want to essentially "page" through results, then use the OFFSET feature in LIMIT (as you have above):
SELECT ID
FROM wp_posts
ORDER BY ID
LIMIT #offset, 3;
This will allow you to reset the #offset value and go to which rows you want.
First query:
SELECT ID FROM wp_posts ORDER BY ID LIMIT 3
This returns 4,8,124 as you said. In your client, save the largest ID value in a variable.
Subsequent queries:
SELECT ID FROM wp_posts WHERE ID > ? ORDER BY ID LIMIT 3
Send a parameter into this query using the greated ID value from the previous result. It's still in a variable.
This also helps make the query faster, because it doesn't have to skip all those initial rows every time. Paging through a large dataset using LIMIT/OFFSET is pretty inefficient. SQL has to actually read all those rows even though it's not going to return them.
But if you use WHERE ID > ? then SQL can efficiently start the scan in the right place, on the first row that would be included in the result.
Seems, you want to return the first three rows of your query ordered by currently existing ID values(whatever they're after all DML statement's applied on the table wp_posts).
Then, Consider using an auxiliary iteration variable #i to provide an ordered integer value set starting from 1 and increasing as 2,3,... without any gaps :
select t.*
from
(
select #i := #i + 1 as rownum, t1.*
from tab t1
join (select #i:=0) t2
) t
order by rownum
limit 0,3;
Demo
Suppose we have 2 numbers of 3 bits each attached together like '101100', which basically represents 5 and 4 combined. I want to be able to perform aggregation functions like SUM() or AVG() on this column separately for each individual 3-bit column.
For instance:
'101100'
'001001'
sum(first three column) = 6
sum(last three column) = 5
I have already tried the SUBSTRING() function, however, speed is the issue in that case as this query will run on millions of rows regularly. And string matching will slow the query.
I am also open for any new databases or technologies that may support this functionality.
You can use the function conv() to convert any part of the string to a decimal number:
select
sum(conv(left(number, 3), 2, 10)) firstpart,
sum(conv(right(number, 3), 2, 10)) secondpart
from tablename
See the demo.
Results:
| firstpart | secondpart |
| --------- | ---------- |
| 6 | 5 |
With the current understanding I have of your schema (which is next to none), the best solution would be to restructure your schema so that each data point is its own record instead of all the data points being in the same record. Doing this allows you to have a dynamic number of data points per entry. Your resulting table would look something like this:
id | data_type | value
ID is used to tie all of your data points together. If you look at your current table, this would be whatever you are using for the primary key. For this answer, I am assuming id INT NOT NULL but yours may have additional columns.
Data Type indicates what type of data is stored in that record. This would be the current tables column name. I will be using data_type_N as my values, but yours should be a more easily understood value (e.g. sensor_5).
Value is exactly what it says it is, the value of the data type for the given id. Your values appear to be all numbers under 8, so you could use a TINYINT type. If you have different storage types (VARCHAR, INT, FLOAT), I would create a separate column per type (val_varchar, val_int, val_float).
The primary key for this table now becomes a composite: PRIMARY KEY (id, data_type). Since your previously single record will become N records, the primary key will need to adjust to accommodate that.
You will also want to ensure that you have indexes that are usable by your queries.
Some sample values (using what you placed in your question) would look like:
1 | data_type_1 | 5
1 | data_type_2 | 4
2 | data_type_1 | 1
2 | data_type_2 | 1
Doing this, summing the values now becomes trivial. You would only need to ensure that data_type_N is summed with data_type_N. As an example, this would be used to sum your example values:
SELECT data_type,
SUM(value)
FROM my_table
WHERE id IN (1,2)
GROUP BY data_type
Here is an SQL Fiddle showing how it can be used.
If I have a large table with floating numbers, can it help in reading speed if I add a column that represent the int value of each float? maybe if the int value will be an index, then when I need to select all the floats that starts with certain int it will "filter" the values that are surely not necessary?
For example if there are 10,000 numbers, 5000 of which begin with 14: 14.232, 14.666, etc, is there an sql statement that can increase the selecting speed if I add the int value column?
id | number | int_value |
1 | 11.232 | 11 |
2 | 30.114 | 30 |
3 | 14.888 | 14 |
.. | .. | .. |
3005 | 14.332 | 14 |
You can create a non clustered index on number column itself. and when selecting the data from table you can filtered out with like operator. No need of additional column,
Select * from mytable
where number like '14%'
First of all: Do you have performance issues? If not then why worry?
Then: You need to store decimals, but you are sometimes only interested in the integer part. Yes?
So you have one or more queries of the type
where number >= 14 and number < 15
or
where truncate(number, 0) = 14
Do you already have indexes on the number? E.g.
create index idx on mytable(number);
The first mentioned WHERE clause would probably benefit from it. The second doesn't, because when you invoke a function on the column, the DBMS doesn't see the relation to the index anymore. This shows it can make a difference how you write the query.
If the first WHERE clause is still too slow in spite of the index, you can create a computed column (ALTER TABLE mytable ADD numint int GENERATED ALWAYS AS truncate(number, 0) STORED), index that, and access it instead of the number column in your query. But I doubt that would speed things up noticeably.
As to your example:
if there are 10,000 numbers, 5000 of which begin with 14
This is not called a large table, but a small one. And as you'd want half of the records anyway, the DBMS would simply read all records sequentially and look at the number. It doesn't make a difference whether it looks at an integer or a decimal number. (Well, some nanoseconds maybe, but nothing you would notice.)
I have a field for comments used to store the title of the item sold on the site as well as the bid number (bid_id). Unfortunately, the bid_id is not stored on its own in that table.
I want to query items that have a number (the bid_id) greater than 4,000 for example.
So, what I have is:
select * from mysql_table_name where comment like '< 4000'
I know this won't work, but I need something similar that works.
Thanks a lot!
Just get your bid_id column cleaned up. Then index is.
create table `prior`
( id int auto_increment primary key,
comments text not null
);
insert `prior` (comments) values ('asdfasdf adfas d d 93827363'),('mouse cat 12345678');
alter table `prior` add column bid_id int; -- add a nullable int column
select * from `prior`; -- bid_id is null atm btw
update `prior` set bid_id=right(comments,8); -- this will auto-cast to an int
select * from `prior`;
+----+-----------------------------+----------+
| id | comments | bid_id |
+----+-----------------------------+----------+
| 1 | asdfasdf adfas d d 93827363 | 93827363 |
| 2 | mouse cat 12345678 | 12345678 |
+----+-----------------------------+----------+
Create the index:
CREATE INDEX `idxBidId` ON `prior` (bid_id); -- or unique index
select * from mysql_table_name where substring(comment,start,length, signed integer) < 4000
This will work, but I suggest create new column and put the bid value in it then compare.
To update value in new column you can use
update table set newcol = substring(comment,start,length)
Hope this will help
There is nothing ready that works like that.
You could write a custom function or loadable UDF, but it would be a significant work, with significant impact on the database. Then you could run WHERE GET_BID_ID(comment) < 4000.
What you can do more easily is devise some way of extracting the bid_id using available string functions.
For example if the bid_id is always in the last ten characters, you can extract those, and replace all characters that are not digits with nil. What is left is the bid_id, and that you can compare.
Of course you need a complex expression with LENGTH(), SUBSTRING(), and REPLACE(). If the bid_id is between easily recognizable delimiters, then perhaps SUBSTRING_INDEX() is more your friend.
But better still... add an INTEGER column, initialize it to null, then store there the extracted bid_id. Or zero if you're positive there's no bid_id. Having data stored in mixed contexts is evil (and a known SQL antipattern to boot). Once you have the column available, you can select every few seconds a small number of items with new_bid_id still NULL and subject those to extraction, thereby gradually amending the database without overloading the system.
In practice
This is the same approach one would use with more complicated cases. We start by checking what we have (this is a test table)
SELECT commento FROM arti LIMIT 3;
+-----------------------------------------+
| commento |
+-----------------------------------------+
| This is the first comment 100 200 42500 |
| Another 7 Q 32768 |
| And yet another 200 15 55332 |
+-----------------------------------------+
So we need the last characters:
SELECT SUBSTRING(commento, LENGTH(commento)-5) FROM arti LIMIT 3;
+-----------------------------------------+
| SUBSTRING(commento, LENGTH(commento)-5) |
+-----------------------------------------+
| 42500 |
| 32768 |
| 55332 |
+-----------------------------------------+
This looks good but it is not; there's an extra space left before the ID. So 5 doesn't work, SUBSTRING is 1-based. No matter; we just use 4.
...and we're done.
mysql> SELECT commento FROM arti WHERE SUBSTRING(commento, LENGTH(commento)-4) < 40000;
+-------------------+
| commento |
+-------------------+
| Another 7 Q 32768 |
+-------------------+
mysql> SELECT commento FROM arti WHERE SUBSTRING(commento, LENGTH(commento)-4) BETWEEN 35000 AND 55000;
+-----------------------------------------+
| commento |
+-----------------------------------------+
| This is the first comment 100 200 42500 |
+-----------------------------------------+
The problem is if you have a number not of the same length (e.g. 300 and 131072). Then you need to take a slice large enough for the larger number, and if the number is short, you will get maybe "1 5 300" in your slice. That's where SUBSTRING_INDEX comes to the rescue: by capturing seven characters, from " 131072" to "1 5 300", the ID will always be in the last space separated token of the slice.
IN THIS LAST CASE, when numbers are not of the same length, you will find a problem. The extracted IDs are not numbers at all - to MySQL, they are strings. Which means that they are compared in lexicographic, not numerical, order; and "17534" is considered smaller than "202", just like "Alice" comes before "Bob". To overcome this you need to cast the string as unsigned integer, which further slows down the operations.
WHERE CAST( SUBSTRING(...) AS UNSIGNED) < 4000
Is there a way I can store multiple values in a single cell instead of different rows, and search for them?
Can I do:
pId | available
1 | US,UK,CA,SE
2 | US,SE
Instead of:
pId | available
1 | US
1 | UK
1 | CA
1 | SE
Then do:
select pId from table where available = 'US'
You can do that, but it makes the query inefficient. You can look for a substring in the field, but that means that the query can't make use of any index, which is a big performance issue when you have many rows in your table.
This is how you would use it in your special case with two character codes:
select pId from table where find_in_set('US', available)
Keeping the values in separate records makes every operation where you use the values, like filtering and joining, more efficient.
you can use the like operator to get the result
Select pid from table where available like '%US%'