add int value of float to increase selecting speed - mysql

If I have a large table of floating-point numbers, can reading speed be improved by adding a column that holds the int value of each float? Maybe if the int value is indexed, then when I need to select all the floats that start with a certain int, it will "filter" out the values that are surely not necessary?
For example, if there are 10,000 numbers, 5,000 of which begin with 14 (14.232, 14.666, etc.), is there an SQL statement that can increase the selecting speed if I add the int value column?
id   | number | int_value |
1    | 11.232 | 11        |
2    | 30.114 | 30        |
3    | 14.888 | 14        |
..   | ..     | ..        |
3005 | 14.332 | 14        |

You can create a non-clustered index on the number column itself, and when selecting the data you can filter with the LIKE operator. No need for an additional column:
SELECT * FROM mytable
WHERE number LIKE '14%'

First of all: Do you have performance issues? If not then why worry?
Then: You need to store decimals, but you are sometimes only interested in the integer part. Yes?
So you have one or more queries of the type
where number >= 14 and number < 15
or
where truncate(number, 0) = 14
Do you already have indexes on the number? E.g.
create index idx on mytable(number);
The first WHERE clause would probably benefit from it. The second doesn't, because once you invoke a function on the column, the DBMS no longer sees the relation to the index. This shows that how you write the query can make a difference.
If the first WHERE clause is still too slow in spite of the index, you can create a computed column (ALTER TABLE mytable ADD numint INT GENERATED ALWAYS AS (TRUNCATE(number, 0)) STORED), index that, and access it instead of the number column in your query. But I doubt that would speed things up noticeably.
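A sketch of the whole approach (the index name is arbitrary):
ALTER TABLE mytable ADD numint INT GENERATED ALWAYS AS (TRUNCATE(number, 0)) STORED;
CREATE INDEX idx_numint ON mytable(numint);
-- the query can then filter on the indexed integer column
SELECT * FROM mytable WHERE numint = 14;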
As to your example:
if there are 10,000 numbers, 5000 of which begin with 14
This is not called a large table, but a small one. And as you'd want half of the records anyway, the DBMS would simply read all records sequentially and look at the number. It doesn't make a difference whether it looks at an integer or a decimal number. (Well, some nanoseconds maybe, but nothing you would notice.)

Related

Is there a way in MySQL to use aggregate functions in a sub section of binary column?

Suppose we have two 3-bit numbers concatenated, like '101100', which represents 5 and 4 combined. I want to be able to perform aggregate functions like SUM() or AVG() on this column separately for each individual 3-bit sub-column.
For instance:
'101100'
'001001'
sum(first three bits) = 6
sum(last three bits) = 5
I have already tried the SUBSTRING() function; however, speed is an issue in that case, as this query will run on millions of rows regularly, and string matching will slow the query.
I am also open to any new databases or technologies that may support this functionality.
You can use the function conv() to convert any part of the string to a decimal number:
select
  sum(conv(left(number, 3), 2, 10)) firstpart,
  sum(conv(right(number, 3), 2, 10)) secondpart
from tablename
See the demo.
Results:
| firstpart | secondpart |
| --------- | ---------- |
| 6         | 5          |
With the current understanding I have of your schema (which is next to none), the best solution would be to restructure your schema so that each data point is its own record instead of all the data points being in the same record. Doing this allows you to have a dynamic number of data points per entry. Your resulting table would look something like this:
id | data_type | value
ID is used to tie all of your data points together. If you look at your current table, this would be whatever you are using for the primary key. For this answer, I am assuming id INT NOT NULL but yours may have additional columns.
Data Type indicates what type of data is stored in that record. This would be the current table's column name. I will be using data_type_N as my values, but yours should be a more easily understood value (e.g. sensor_5).
Value is exactly what it says it is, the value of the data type for the given id. Your values appear to be all numbers under 8, so you could use a TINYINT type. If you have different storage types (VARCHAR, INT, FLOAT), I would create a separate column per type (val_varchar, val_int, val_float).
The primary key for this table now becomes a composite: PRIMARY KEY (id, data_type). Since your previously single record will become N records, the primary key will need to adjust to accommodate that.
You will also want to ensure that you have indexes that are usable by your queries.
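A minimal sketch of such a table (names and types are assumptions based on the values above):
CREATE TABLE my_table (
  id        INT NOT NULL,
  data_type VARCHAR(32) NOT NULL, -- e.g. 'sensor_5'
  value     TINYINT NOT NULL,
  PRIMARY KEY (id, data_type)
);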
Some sample values (using what you placed in your question) would look like:
1 | data_type_1 | 5
1 | data_type_2 | 4
2 | data_type_1 | 1
2 | data_type_2 | 1
Doing this, summing the values now becomes trivial. You would only need to ensure that data_type_N is summed with data_type_N. As an example, this would be used to sum your example values:
SELECT data_type,
SUM(value)
FROM my_table
WHERE id IN (1,2)
GROUP BY data_type
Here is an SQL Fiddle showing how it can be used.

Is it better to have many columns or a single column bit string for many checkboxes

I have the following scenario:
A form with many checkboxes, around 100.
I have 2 ideas on how to save them in the database:
1. Multicolumn
I create a table looking like this:
id | box1 | box2 | ... | box100 | updated| created
id: int
box1: bit(1)
SELECT * FROM table WHERE box1 = 1 AND box22 = 1 ...
2. Single data column
Table is simply:
id | data | updated | created
data: varchar(100)
SELECT * FROM table WHERE data LIKE '_______1___ ... ____1____1'
where data looks like 0001100101010......01 each character representing if value was checked or not.
Considering that the table will have 200k+ rows, which is a more scalable solution?
3. Single data column of type JSON
I have no good information about this yet.
Or...
4. A few SETs
5. A few INTs
These are much more compact: about 8 checkboxes per byte.
They are a bit messy to set/test.
Since they are limited to 64 bits, you would need more than one SET or INT. I recommend grouping the bits in some logical way, based on the app.
Be aware of FIND_IN_SET().
Be aware of (1 << $n) for creating the value 2^n.
Be aware of | and & Operators.
Which of the 5 is best? That depends on the queries you need to run -- for searching (if necessary?), for inserting, for updating (if necessary?), and for selecting.
An example: for INTs, WHERE (bits & 0x2C08) = 0x2C08 would simultaneously check for 4 flags being 'ON'. That constant could either be constructed in app code, or as ((1<<13) | (1<<11) | (1<<10) | (1<<3)) for bits 3, 10, 11, 13. Meanwhile, the other flags are ignored. If you need those flags to be 'OFF', the test would be WHERE (bits & 0x2C08) = 0. If either of these kinds of tests is your main activity, then Choice 5 is probably the best for both performance and space, though it is somewhat cryptic to read.
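A sketch of those tests (assuming a column flags INT UNSIGNED holding the checkbox bits; names are invented):
-- all four of bits 3, 10, 11, 13 ON
SELECT * FROM mytable WHERE (flags & 0x2C08) = 0x2C08;
-- all four of those bits OFF
SELECT * FROM mytable WHERE (flags & 0x2C08) = 0;
-- turn bit 10 ON for one row
UPDATE mytable SET flags = flags | (1 << 10) WHERE id = 42;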
When adding another option, SET requires an ALTER TABLE. INT usually has some spare bits (TINYINT UNSIGNED has 8 bits, ... BIGINT UNSIGNED has 64). So, about one time in 8, you would need an ALTER to get a bigger INT or add another INT. Deleting an option: suggest just abandoning that SET element or bit of INT.

Iterating through MySQL rows

I have a simple MySQL table made up of words and an associated number. The numbers are unique for each word. I want to find the first word whose index is larger than a given number. As an example:
-----------------------
| WORD: | F_INDEX: |
|---------------------|
| a | 5 |
| cat | 12 |
| bat | 4002 |
-----------------------
If I was given the number "9" I would want "cat" returned, as it is the first word whose index is larger than 9.
I know that I can get a full list of sorted rows by querying:
SELECT * FROM table_name ORDER BY f_index;
But I would, instead, like to make a MySQL query that does this. (The confusion lies in the fact that I'm unsure how to keep track of the current row in my query.) I know I can loop with something like this:
CREATE PROCEDURE looper(desired_index INT)
BEGIN
  DECLARE current_index INT DEFAULT 0;
  -- Loop here, setting current_index to whatever the next row's index is,
  -- then do a comparison to check it against our desired_index, breaking out
  -- if it is greater.
END;
Any help would be greatly appreciated.
Try this:
SELECT t.word
, t.f_index
FROM table_name t
WHERE t.f_index > 9
ORDER
BY t.f_index
LIMIT 1
It is much more efficient to have the database return the row you need, than it is to pull a whole bunch of rows and figure out which one you need.
For best performance of this query, you will want an index ON table_name (f_index,word).
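For example (the index name is arbitrary):
CREATE INDEX ix_table_name_f_index_word ON table_name (f_index, word);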
Why not just use a MySQL statement to retrieve the first item where f_index is greater than the value you pass in?
For example :
select word from table_name
where f_index > desired_index
order by f_index
limit 1

Improve performance of count and sum when already indexed

First, here is the query I have:
SELECT
COUNT(*) as velocity_count,
SUM(`disbursements`.`amount`) as summation_amount
FROM `disbursements`
WHERE
`disbursements`.`accumulation_hash` = '40ad7f250cf23919bd8cc4619850a40444c5e90c978f88635a09ccf66a82ffb38e39ea51cdfd651b0ebdac5f5ca37cd7a17e0f60fea6cbce1397ccff5fa37346'
AND `disbursements`.`caller_id` = 1
AND `disbursements`.`active` = 1
AND (version_hash != '86b4111677294b27a1805643d193b8d437b6ddb170b4ed5dec39aa89bf070d160cbbcd697dfc1988efea8429b1f1557625bf956180c65d3dcd3a318280e0d2da')
AND (`disbursements`.`created_at` BETWEEN '2012-12-15 23:33:22'
AND '2013-01-14 23:33:22') LIMIT 1
Explain extended returns the following:
+----+-------------+---------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+---------+------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+---------+------+--------+----------+--------------------------+
| 1 | SIMPLE | disbursements | range | unique_request_index,index_disbursements_on_caller_id,disbursement_summation_index,disbursement_velocity_index,disbursement_version_out_index | disbursement_summation_index | 1543 | NULL | 191422 | 100.00 | Using where; Using index |
+----+-------------+---------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+---------+------+--------+----------+--------------------------+
The actual query counts about 95,000 rows. If I explain another query that hits ~50 rows the explain is identical, just with fewer rows estimated.
The index being chosen covers accumulation_hash, caller_id, active, version_hash, created_at, amount in that order.
I've tried playing around with doing COUNT(id) or COUNT(caller_id) since these are non-null fields and return the same thing as count(*), but it doesn't have any impact on the plan or the run time of the actual query.
This is also a heavy insert table, essentially every single query will have had a row inserted or updated since the last time it was run, so the mysql query cache isn't entirely useful.
Before I go and make some sort of bucketed time sequence cache with something like memcache or redis, is there an obvious solution to getting this to work much faster? A normal ~50 row query returns in 5 ms; the ones across 90k+ rows are taking 500-900 ms, and I really can't afford anything much past 100 ms.
I should point out the dates are a rolling 30 day window that needs to be essentially real time. Expiration could probably happen with ~1 minute granularity, but new items need to be seen immediately upon commit. I'm also on RDS, Read IOPS are essentially 0, and cpu is about 60-80%. When I'm not querying the giant 90,000+ record items, CPU typically stays below 10%.
You could try an index that has created_at before version_hash (you might get a better shot at an index range scan). It's not clear how that non-equality predicate on version_hash affects the plan, but I suspect it disables a range scan on the created_at column.
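A sketch of that reordering, using the columns from the query (the index name is made up):
-- equality columns first, then the range column, then the rest to keep the index covering
CREATE INDEX disbursement_summation_alt
  ON disbursements (accumulation_hash, caller_id, active, created_at, version_hash, amount);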
Other than that, the query and the index look about as good as you are going to get, the EXPLAIN output shows the query being satisfied from the index.
And the performance of the statement doesn't sound too unreasonable, given that it's aggregating 95,000+ rows, especially given the key length of 1543 bytes. That's a much larger size than I normally deal with.
What are the datatypes of the columns in the index, and what is the cluster key or primary key?
accumulation_hash - 128-character representation of 512-bit value
caller_id - integer or numeric (?)
active - integer or numeric (?)
version_hash - another 128-characters
created_at - datetime (8 bytes) or timestamp (4 bytes)
amount - numeric or integer
95,000 rows at 1543 bytes each is on the order of 140MB of data.

Get first available numbers not listed in my MySQL database table

Is there any SQL to get the first numbers not listed in my MySQL database table?
Ex:
Table:
Users
ID | Name | Number
------------------------
1 | John | 1456
2 | Phil | 345
3 | Jenny | 345612
In this case the SQL must return the free numbers: 1 to 344, 346 to 1455, and 1457 to 345611.
Any suggestions? Maybe with some procedure?
I like the answer by @pst, but would suggest another alternative.
Create a new table of unassigned numbers, insert a few thousand rows or so in there.
Present some of those numbers to the user.
When a number is used, delete it from the unassigned numbers table.
Periodically generate more unassigned numbers as needed.
The generation of those unassigned numbers could use the random method suggested by @pst, but with this method you move the uncertainty of how long it takes to generate a list of unassigned numbers into a batch task, rather than having to do it at the front end while the user is waiting. This probably isn't an issue if the usage of the number space is sparse, but as more of the number space becomes used, it becomes a bigger issue.
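A rough sketch of that pool (all names are invented):
CREATE TABLE unassigned_numbers (
  number BIGINT PRIMARY KEY
);
-- hand a few candidates to the user
SELECT number FROM unassigned_numbers LIMIT 6;
-- when one is taken, delete it in the same transaction that assigns it
DELETE FROM unassigned_numbers WHERE number = 346;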
Given the comment(s), my first approach would be to use a "random number" probe. This approach assumes:
Number is indexed; and
There are "significantly less" users than available numbers
Approach:
Choose N (e.g. 1-10) numbers at random on the client;
Query the database for Number IN (ns..), or Number = n for N=1; then
Whether a number is available can be detected by not finding the requested record(s).
A size of N=1 is likely "okay" in this case, and it is the most trivial to implement, although it will require at least 6 database requests to find 6 free numbers. A larger N would decrease the number of trips to the database.
Make sure to use transactions.
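A sketch of the probe itself, using the Users table from the question (the candidate numbers would be generated in the application):
-- which of these randomly chosen candidates are already taken?
SELECT Number FROM Users WHERE Number IN (18, 42, 88);
-- any candidate NOT returned by this query is free; wrap the
-- subsequent INSERT in a transaction to guard against races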
SELECT 'start', 1 AS number FROM tableA
UNION
SELECT 'min', number - 1 number FROM tableA
UNION
SELECT 'max', number + 1 number FROM tableA
ORDER BY number
You can check the answer at http://www.sqlfiddle.com/#!2/851de/6
Then you can make a comparison of missing numbers when you populate the next time.
Just use an auto increment column. The database will assign the next number automatically. You don't need to even know what it is at the time of the insert. Just tell the user the number he got, don't give him a choice at all.
Based on your comments, the approach below might work for you. It doesn't really answer your specific question, but it probably meets your requirements.
I'm going to assume your requirements cannot change (e.g., presenting users with 6 possible id choices). Frankly I think it's a bit of a weird requirement, but it makes for some interesting SQL. :-)
Here's my approach: generate 10 random numbers. Filter out any already in the database. Present 6 of these random numbers to your user. Random id numbers have very nice properties with respect to transactionality compared to sequential id numbers, so this should scale very nicely should your app become popular.
SELECT
temp.i
FROM
(
SELECT 18 AS i -- 10 random
UNION SELECT 42 -- numbers.
UNION SELECT 88
UNION SELECT 191 -- Let's assume
UNION SELECT 192 -- you generated
UNION SELECT 193 -- these in the
UNION SELECT 1000 -- application
UNION SELECT 123456 -- layer.
UNION SELECT 1092930
UNION SELECT 9892919
) temp
LEFT JOIN
mytable ON (temp.i = mytable.i)
WHERE
mytable.i IS NULL -- filter out collisions
LIMIT
6 -- limit results to 6
SQL pop quiz time!!!
Why does the line "WHERE mytable.i IS NULL" filter collisions? (Hint: How can mytable.i be null when it's a primary key?)
Here's some test data:
CREATE TABLE mytable (i BIGINT PRIMARY KEY) ;
INSERT INTO mytable VALUES (88), (3), (192), (123456) ;
Run the query above, and here's the result. Notice that 88, 192, and 123456 were filtered out, since they would be collisions against the test data.
+---------+
| i |
+---------+
| 18 |
| 42 |
| 191 |
| 193 |
| 1000 |
| 1092930 |
+---------+
And how to generate those random numbers? Probably rand() * 9223372036854775807 would work. (Assuming you don't want negative numbers!)
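For instance, a candidate could be drawn in MySQL itself (a sketch; RAND() returns a value in [0, 1), and double precision is approximate near the top of the BIGINT range):
SELECT FLOOR(1 + RAND() * 9223372036854775806) AS candidate;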