Store all possible combinations of a specific number range - mysql

Suppose I have 1000 numbers, 1 -> 1000, and a user can have any combination of them (e.g. 4, 25, 353, ...).
How can I efficiently store that combination in a MySQL DB?
What I thought: I can use powers of 2 and store each number as one bit of a really large int, like:
1 -> 01
2 -> 10
4 -> 100
etc.
So if I happen to get the number 6 (110) I know the user has the combination of numbers 2, 4 (2 | 4 = 6).
So we can have 2^1000 combinations, which takes 125 bytes. But that is not efficient at all, since BIGINT has only 8 bytes and I can't store that in MySQL without using VARCHARs etc. Node.js can't handle a number that big either (and neither can I), with 2^53-1 being the max safe integer.
Why I am asking this question: can I do the above with base 10 instead of base 2 to minimize the max bytes the int can take? That was silly; I think switching to base 10, or any base other than 2, changes nothing.
Edit: Additional thoughts;
So one possible solution is to split them into sets of 16-digit numbers, convert those to strings, concatenate them with a delimiter, and store that instead of numbers. (Potentially replace runs of 1's or 0's with a certain character to make it even smaller. I have a feeling that falls into the field of compression, but nothing better has come to mind.)

Based on your question, I am assuming you are optimizing for space.
If most users have many numbers from the set, then 125 bytes the way you described is the best you can do. You can store that in a BINARY(125) column, though. In Node.js you could use a Buffer (a plain string would also work, but a Buffer is the better choice) to operate on the 125-byte bit-field.
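A minimal sketch of that bit-field, written in Python rather than Node.js to keep it short. The helper names `pack` and `unpack` are made up for illustration; the resulting 125-byte value is what would go into the BINARY(125) column.

```python
def pack(numbers, size=1000):
    """Pack a set of numbers in 1..size into a fixed-width bit-field."""
    field = bytearray((size + 7) // 8)      # 1000 bits -> 125 bytes
    for n in numbers:
        if not 1 <= n <= size:
            raise ValueError(f"{n} is outside 1..{size}")
        index = n - 1                       # number 1 lives in bit 0
        field[index // 8] |= 1 << (index % 8)
    return bytes(field)


def unpack(field):
    """Return the sorted list of numbers whose bit is set."""
    return [
        byte_index * 8 + bit + 1
        for byte_index, byte in enumerate(field)
        for bit in range(8)
        if byte & (1 << bit)
    ]


packed = pack([4, 25, 353])
restored = unpack(packed)
```

Working on bytes instead of one huge integer sidesteps the 2^53-1 limit entirely, which is the same reason a Buffer is suggested for Node.js.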
If most users have only a few elements in the set then it will take less space to have a separate table with two columns such as:
user_id | has_element (SMALLINT)
---------------------
1 | 4
1 | 25
1 | 353
2 | 7
2 | 25
2 | 512
2 | 756
2 | 877
This will also make queries cleaner and more efficient for doing simple queries like SELECT user_id FROM user_elements WHERE has_element = 25;. You should probably add an index on has_element if you do queries like that to make them many times more efficient than storing a bitfield in a column.
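Here is a rough, runnable sketch of that two-column layout. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for MySQL, so the column types differ from what you would declare there (SMALLINT for has_element). The table name comes from the query above; the index name `idx_has_element` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_elements ("
    " user_id INTEGER NOT NULL,"
    " has_element INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, has_element))"
)
# Secondary index so lookups by element do not scan the whole table.
conn.execute("CREATE INDEX idx_has_element ON user_elements (has_element)")

rows = [(1, 4), (1, 25), (1, 353), (2, 7), (2, 25), (2, 512), (2, 756), (2, 877)]
conn.executemany("INSERT INTO user_elements VALUES (?, ?)", rows)

# Which users have element 25?
users_with_25 = [
    r[0]
    for r in conn.execute(
        "SELECT user_id FROM user_elements WHERE has_element = 25 ORDER BY user_id"
    )
]

# Which elements does user 1 have?
elements_of_user_1 = [
    r[0]
    for r in conn.execute(
        "SELECT has_element FROM user_elements WHERE user_id = 1 ORDER BY has_element"
    )
]
```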

Related

Correct DB design to store huge amount of stock cryptocurrencies data in DB

I want to store a large amount of cryptocurrency data in a DB. Then I want to show nice JavaScript price graphs with historical prices on a webpage.
The problem is that I am not sure what database design is best for this. I was thinking about MySQL, but maybe NoSQL databases are better in this case; I don't know.
What I need:
I need to track at least 100 cryptocurrencies with historical and current prices and other stock information like volume etc.
I am going to insert new data every 10 minutes for each crypto (6 records/hour * 24 h * 365 days * 100 cryptos = 5,256,000 new records per year).
I need to query various time ranges for each coin to draw a graph on the webpage.
My idea:
I came up with this solution, but I need to know if it is OK or if I am completely wrong and naive.
In this case I would have 2 tables: a parent table where I would store all necessary info about the coins, and a child table holding all the prices. This child table would have to contain a huge amount of data, which worries me.
My table structure example:
tbl_coin_detail:
id. |Tick_name | Name |Algorithm |Icon
1 | BTC |Bitcoin |SHA256 |path/to/img
2 | ETH |Ethereum |Ethash |path/to/img
.
.
.
tbl_prices:
id | price_USD | price_EUR | datetime | Volume_Day_BTC | FK_coin
1 | 6537.2 | 5632.28 | 2018-07-01 15:00:00 | 62121.7348556964 | 1
2 | 466.89 | 401.51 | 2018-07-01 15:01:00 | 156373.79481106618 | 2
.
.
.
Another idea is to make a separate table for each coin's prices. That would mean 100 tables with all historical and current prices and stock info instead of one huge table.
I am really not sure which is better. Keeping all prices in one table is good for simple querying, but I guess it could be a huge performance bottleneck. Querying separate tables is more awkward, because I would need to write a query for each table, but it might help with performance.
Can you point me in the right direction? SQL or NoSQL, which is better?
Thank you in advance.
MySQL recommendations...
You have Volume_Day_BTC, yet you say "6 records/hour" -- is the record daily or more fine-grained?
The volume of data is not that great, but it will be beneficial to shrink the datatypes before you get started.
id is unnecessary; use PRIMARY KEY(coin, datetime) instead.
Think carefully about the datatype for prices and volumes. At one extreme is space (hence, somewhat, speed); at the other, precision.
DOUBLE -- 8 bytes, about 16 significant digits, large range
DECIMAL(17, 11) -- 8 bytes, limited to $1M and 11 decimal places (not enough?)
DECIMAL(26, 13) -- 12 bytes, maybe big enough?
etc.
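The byte counts quoted for DECIMAL follow from how MySQL packs it: the integer and fractional parts are stored separately, each full group of 9 digits takes 4 bytes, and leftover digits take 1 to 4 bytes. A small sketch to check other precisions (`decimal_storage_bytes` is just a helper name made up here, not a MySQL function):

```python
# Bytes needed for the digits left over after full groups of 9.
LEFTOVER_BYTES = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}


def decimal_storage_bytes(precision, scale):
    """Storage size in bytes of a MySQL DECIMAL(precision, scale) column."""
    def part(digits):
        return (digits // 9) * 4 + LEFTOVER_BYTES[digits % 9]

    # Integer digits and fractional digits are packed independently.
    return part(precision - scale) + part(scale)


size_17_11 = decimal_storage_bytes(17, 11)   # 6 integer digits + 11 fractional
size_26_13 = decimal_storage_bytes(26, 13)   # 13 integer digits + 13 fractional
```

DECIMAL(17, 11) leaves 6 digits before the decimal point, which is where the "limited to $1M" remark comes from.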
Would it be OK to summarize data over, say, one month to save space? Hourly or daily avg/hi/low, etc. This would be very useful for speeding up fetching data for graphing.
In particular, I recommend keeping a Summary table by coin+day with volume, price, etc. Consider using FLOAT (4 bytes, 7 significant digits, sufficient range) as more than good enough for graphing.
So, I am recommending 3 tables:
Coins -- 100 rows with meta info about the currencies.
Prices -- 5M rows/year of details -- unless trimmed (400MB/year)
Summary -- 36500 rows/year for graphing range more than, say, a week. (4MB/yr)
It may be worth it to have an hourly summary table for shorter-range graphs. There is no need to go with weekly or monthly summaries; they can be derived from the daily with sufficient efficiency.
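To illustrate what the daily Summary table would hold, here is a sketch of the roll-up in Python. In practice this would more likely be an INSERT ... SELECT ... GROUP BY run inside MySQL; the function name `summarize_daily` and the row shape (coin, datetime, price) are assumptions for the example.

```python
from collections import defaultdict
from datetime import date, datetime


def summarize_daily(rows):
    """Roll (coin, datetime, price) rows up to one entry per coin and day."""
    buckets = defaultdict(list)
    for coin, ts, price in rows:
        buckets[(coin, ts.date())].append(price)
    return {
        key: {
            "avg": sum(prices) / len(prices),
            "hi": max(prices),
            "lo": min(prices),
            "n": len(prices),
        }
        for key, prices in buckets.items()
    }


rows = [
    ("BTC", datetime(2018, 7, 1, 15, 0), 10.0),
    ("BTC", datetime(2018, 7, 1, 15, 10), 30.0),
    ("BTC", datetime(2018, 7, 2, 0, 0), 20.0),
    ("ETH", datetime(2018, 7, 1, 15, 0), 5.0),
]
summary = summarize_daily(rows)
btc_first_day = summary[("BTC", date(2018, 7, 1))]
```

Keeping the row count `n` alongside the average makes it possible to derive weekly or monthly averages from the daily rows later.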
Use InnoDB.
Summary tables
To be honest, that's far from 'huge'. We aren't talking billions of records here, so any properly indexed DB will do just fine.

Get all unset values with spring hibernate jpa

Let's say there is a table like the following:
id | number
----|----------
1 | 1
4 | 6
5 | 2
14 | 3
Now I need all numbers that are not set between 1 and the highest number that is set. Here the highest number is 6, so I need all numbers between 1 and 6 that are not set: 4 and 5.
But how can I achieve this with spring and hibernate jpa with a MySQL database?
The highest number is easy. Sort numbers DESC and then the first one. But then return all missing numbers that are not in the database? Is this possible?
One way: select each number that is smaller than the highest number and check whether the returned object is null. So first check 5, then 4, then 3, ... But this is of course very slow on big databases.
So another idea was: get all numbers that are set and get the missing numbers on java side with the difference of two lists (one list with the numbers out of the database, the other list with the numbers from 1 to the highest number of the database). But on big databases it is also dumb to get everything. (Let's say, there are 1 million entries and only one number is missing.)
The third idea: something like SELECT ... WHERE number IS NULL would be perfect. But for this the database would have to be initialized with every possible number in advance, so that is not possible either.
Is there a possible way? Am I overlooking something?
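For reference, the second idea (fetch the numbers that are set and take the difference on the application side) boils down to something like the following. It is sketched in Python rather than Java to keep it short, and `missing_numbers` is a made-up name; it still has the drawback already mentioned of loading every stored number.

```python
def missing_numbers(present):
    """Numbers between 1 and max(present) that do not occur in present."""
    present = set(present)
    if not present:
        return []
    return [n for n in range(1, max(present) + 1) if n not in present]


gaps = missing_numbers([1, 6, 2, 3])
```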

Finding Recurring Number Combinations in Column of Numbers

I have searched and found discussions and solutions to similar problems, but nothing quite the same or as complex as what I'm trying to figure out.
I have an Access table which consists of two columns, Draw Number and Number Drawn, as shown below. Draw Number is repeated 20 times, to correspond to the 20 numbers that are drawn in each particular draw.
I'm trying to figure a way to determine the most frequent occurring combination of numbers (5 numbers) for all of the draws in each of the 20 number sets. So for instance, 12341 occurs n x, 12342 occurs nx, 12343 occurs n x, etc.
I've created parameter queries which allow me to search for different number combinations from 2 to 10 numbers, and they work OK, returning the number of occurrences of a combination of numbers that I input through a simple UI. But the goal is to figure out programmatically what the optimum combination of numbers is.
Hope this makes sense. And by the way, there are 36 million or so rows in the table. The parameter queries work quite well; however, it takes just over a second to return results for each number added. So, query two numbers = 2 second wait, three numbers = 3 second wait, etc.
I've been thinking about a loop of some type but don't know how to get started? Processing time isn't an issue; can take a day if required!
This is written in VBA and has an assortment of queries, temp tables, etc to get the job done.
The text says Access, but the tags say MySql, which is it? – RBarryYoung 21 hours ago
This part confuses me: I'm trying to figure a way to determine the most frequent occurring combination of numbers (5 numbers) for all of the draws in each of the 20 number sets. So for instance, 12341 occurs n x, 12342 occurs nx, 12343 occurs n x, etc. – Newd 21 hours ago
^What do you mean five numbers? No where in your sample data do I see 12341. Please explain using the data you have, and give expected results using that data. – McAdam331 21 hours ago
drosberg - clarification:
thanks for the response. It is an Access application, but as a first-time poster Stackoverflow recommends tags?
By five numbers I mean the most frequently occurring group of five numbers (I used five as an example, could be groups of 2 to 10 numbers) which occur in each draw, where a draw consists of 20 drawn numbers from a total of 80 numbers. So the data that I posted was intended as an example. The sample provided only has 50, 51 in common. I can plug 50 and 51 into the parameter query and it will tell me that this combination occurs 60,000 times (or whatever), but perhaps 50 and 57 occurs 65,000 times.
If I were to do this manually, and assuming I'm looking for the most frequent 5-number combination, I would enter the following in the parameter query:
1,2,3,4,1 group = 30,000 occurrences
1,2,3,4,2 group = 31,000 occurrences
1,2,3,4,3 group = 31,050 occurrences
1,2,3,4,4 group = 29,050 occurrences
etc.
but I would have to do this for every combination of 5 numbers that can be derived from the numbers 1 through 80. I'm hoping to have a program do the work!
thanks
don
DRAW NUMBER NUMBER DRAWN
1 1
1 28
1 19
1 3
1 38
1 46
1 43
1 29
1 13
1 22
1 20
1 11
1 50
1 51
1 53
1 54
1 57
1 64
1 76
1 78
2 29
2 14
2 2
2 1
2 35
2 40
2 39
2 30
2 10
2 27
2 21
2 6
2 42
2 50
2 51
2 53
2 54
2 61
2 65
2 69
I wrote a post a while ago about generating permutations with and without repetition using Excel. Perhaps you can use it.
https://michiel.wordpress.com/2015/03/29/permutations-with-repetition-using-excel/
Here's how it works. I am using strings, but you can easily modify that for numbers (since you say you need 5).
You can use the MID function to grab a single char from a string, and generate permutations from it.
=MID(Pattern,MOD([N]/[P],Length)+1,1)
N refers to the column N
P refers to the horizontal row (1,4,16). You can generate these with a formula like =4^.
After putting in the code, you can make a list of all permutations in Excel and in the cell next to it generate a sql query that you can perform as well from VBA.
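The same MOD trick outside Excel, as a sketch. I am assuming here that [N]/[P] in the formula is meant as an integer division and that P runs over the powers of the pattern length (1, 4, 16 for a 4-character pattern); `nth_arrangement` is a made-up name.

```python
def nth_arrangement(pattern, n, width):
    """Return the n-th arrangement (with repetition) of `width` characters."""
    length = len(pattern)
    # Position `pos` cycles through the pattern once every length**pos rows,
    # which is what MOD(N / P, Length) does with P = Length^pos.
    return "".join(
        pattern[(n // length**pos) % length] for pos in range(width)
    )


pattern = "ABCD"
width = 3
arrangements = [
    nth_arrangement(pattern, n, width) for n in range(len(pattern) ** width)
]
```

Each value of n from 0 to Length^width - 1 yields a different string, so the list covers all 64 arrangements exactly once.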
Example: Looking up Access database in Excel
Or find a commercial tool like http://thingiequery.com/
I don't know if there are any open source tools for it.
I'm thinking that you should consider:
Say there are 100 balls.
Set up a table with one row for each "Draw number" and 100 columns of type boolean, one for every possible number.
When you look to see which draws had number 23 you just add a
WHERE Column23 = true.
For numbers 23 and 56
WHERE Column23 = true AND Column56 = true
This should massively simplify and speed up your SQL.
You set up a table with every possible combination of numbers.
You run SQL to find the counts.
Harvey
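If the data can be pulled out of the database, counting every group directly is another option. The sketch below is Python, not VBA, and `most_common_groups` plus the tiny sample are made up for illustration. With 20 numbers per draw there are C(20,5) = 15,504 five-number groups per draw, and up to C(80,5) = 24,040,016 distinct groups overall, so on the full table this needs a lot of time and memory.

```python
from collections import Counter
from itertools import combinations


def most_common_groups(draws, group_size, top=1):
    """Count every group of `group_size` numbers that occurs within a draw.

    draws: dict mapping draw number -> list of numbers drawn.
    Returns the `top` most frequent groups as (group, count) pairs.
    """
    counts = Counter()
    for numbers in draws.values():
        # Sorting makes (1, 2) and (2, 1) count as the same group.
        counts.update(combinations(sorted(set(numbers)), group_size))
    return counts.most_common(top)


sample_draws = {
    1: [1, 2, 3, 4],
    2: [3, 1, 2, 5],
    3: [7, 2, 6, 1],
}
best_pair = most_common_groups(sample_draws, group_size=2)
```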

Selecting rows if the total sum of a row is equal to X

I have a table that holds items and their "weight" and it looks like this:
items
-----
id weight
---------- ----------
1 1
2 5
3 2
4 9
5 8
6 4
7 1
8 2
What I'm trying to get is a group where the sum(weight) is exactly X, while honouring the order in which they were inserted.
For example, if I were looking for X = 3, this should return:
id weight
---------- ----------
1 1
3 2
Even though the sum of ids 7 and 8 is 3 as well.
Or if I were looking for X = 7 should return
id weight
---------- ----------
2 5
3 2
Although the sum of the ids 1, 3 and 6 also sums 7.
I'm kind of lost in this problem and haven't been able to come up with a query that does at least something similar, but thinking this problem through, it might get extremely complex for the RDBMS to handle. Could this be done with a query? if not, what's the best way I can query the database to get the minimum amount of data to work with?
Edit: As Twelfth says, I need to return the sum, regardless of the amount of rows it returns, so if I were to ask for X = 20, I should get:
id weight
---------- ----------
1 1
3 2
4 9
5 8
This could turn out to be very difficult in SQL. What you're attempting to do is solve the knapsack problem, which is non-trivial.
The knapsack problem is interesting from the perspective of computer science for many reasons:
The decision problem form of the knapsack problem (Can a value of at least V be achieved without exceeding the weight W?) is NP-complete, thus there is no possible algorithm both correct and fast (polynomial-time) on all cases, unless P=NP.
While the decision problem is NP-complete, the optimization problem is NP-hard: its resolution is at least as difficult as the decision problem, and there is no known polynomial algorithm which can tell, given a solution, whether it is optimal (which would mean that there is no solution with a larger V, thus solving the NP-complete decision problem).
There is a pseudo-polynomial time algorithm using dynamic programming.
There is a fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine.
Many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly.
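A sketch of the pseudo-polynomial dynamic programming idea applied to the table in the question, done in application code rather than SQL. The question does not spell out how to choose between several valid groups, so the tie-break used here (fewest rows first, then the earliest ids) is an assumption that merely happens to reproduce the three expected results; `earliest_subset_with_sum` is a made-up name.

```python
def earliest_subset_with_sum(items, target):
    """Return ids of a subset whose weights sum to target, or None.

    items: list of (id, weight) in insertion order, weights positive ints.
    Tie-break (an assumption): fewest rows first, then earliest ids.
    """
    best = [None] * (target + 1)    # best[s] = tuple of ids summing to s
    best[0] = ()
    for item_id, weight in items:
        # Walk the sums downwards so each row is used at most once.
        for s in range(target, weight - 1, -1):
            prev = best[s - weight]
            if prev is None:
                continue
            cand = prev + (item_id,)
            if best[s] is None or (len(cand), cand) < (len(best[s]), best[s]):
                best[s] = cand
    return best[target]


items = [(1, 1), (2, 5), (3, 2), (4, 9), (5, 8), (6, 4), (7, 1), (8, 2)]
for_3 = earliest_subset_with_sum(items, 3)
for_7 = earliest_subset_with_sum(items, 7)
for_20 = earliest_subset_with_sum(items, 20)
```

The running time grows with the number of rows times X, which is fine for small targets but is exactly why the algorithm is only pseudo-polynomial.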

MySQL: how to search as much as substrings matches in a table of millions of strings

Let's say I have these strings in a MySQL table:
id | hash
1 | 462a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654
2 | 5c2670355b6e503f39427a435a423d6d4c7c5156344c336c6c244a7234
3 | 35785c5f45373c495b70522452564b6f4531792b275e40642854772764
...
millions of records !
Now I have a set of substrings (6 character size), for example this:
["76e262", "435a42", "75e406", "95b705", "344c33"]
What I want is to know how many of these substrings are in each string, so the result could be:
id | matches
63 | 5
34 | 5
123 | 3
153 | 3
13 | 2
9 | 1
How can I achieve this in a fast way?
Real numbers and sizes are:
1) Table with 100.000/200.000 hashes
2) Main Hash size: 256 bytes
3) Substrings (mini-hashes): 16 of them, 32 each
NOTE: I'd like to avoid LIKE "%...%", since that means 16 LIKEs for each row, across millions of rows
You can accomplish this by using the Aho-Corasick algorithm: http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
MySQL doesn't have a function for that, so you'd need to write your own or consider using a language like Java or C to massage the data.
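A compact sketch of Aho-Corasick for this exact task, counting how many of the substrings occur in each hash. It is written in Python for brevity, the names `build_automaton` and `count_matches` are made up, and it assumes the hashes have already been fetched from MySQL. The automaton is built once and every hash is then scanned in a single pass, however many substrings there are.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie, failure links and output sets for the patterns."""
    goto, fail, out = [{}], [0], [set()]
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(idx)
    # Breadth-first pass: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]   # inherit matches ending here
    return goto, fail, out


def count_matches(text, automaton):
    """Number of distinct patterns that occur somewhere in text."""
    goto, fail, out = automaton
    state, found = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return len(found)


substrings = ["76e262", "435a42", "75e406", "95b705", "344c33"]
hashes = {
    1: "462a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654",
    2: "5c2670355b6e503f39427a435a423d6d4c7c5156344c336c6c244a7234",
    3: "35785c5f45373c495b70522452564b6f4531792b275e40642854772764",
}
automaton = build_automaton(substrings)
matches = {row_id: count_matches(h, automaton) for row_id, h in hashes.items()}
```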
How about a different approach?
You could also consider having a shifting mechanism for your data and the check on the shifting. For example, if your key is 462a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654 and you know that your hash will have 58 chars, then you would have these variations:
62a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f26544
2a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f265446
a276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654462
276e262067573e553b5f6a2b4a323e35272d3c6b6227417c4f2654462a
...
Each one of these would be in a column, every one of them would be indexed.
So your query would be simply:
Select * from table where hash like "a276e262%" or s1 like "a276e262%" ...
Note that this would be MUCH faster than LIKE "%value%" as the column is indexed and the LIKE is only checking the begins with.
There are many disadvantages to this solution: space required for the extra columns, insertion and update time would increase because of the time spent calculating the shifted columns, and time required to process the result of the select. But you wouldn't need to implement the algorithm in MySQL.
You could also require that the minimum length of the string being searched is 6 chars, so you won't need to shift the whole string, only to keep the first 6 digits. If a match is found then you keep looking for the next 6 digits on the next match.